Add additional Pacbio Processing Path #106

Closed
apeltzer opened this issue Dec 5, 2019 · 12 comments · Fixed by #168
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@apeltzer
Member

apeltzer commented Dec 5, 2019

We discussed adding this with Anders @andand, Daniel @erikrikarddaniel and Jeanette @jtangrot at the Stockholm hackathon.

It would be an additional path in the existing ampliseq workflow, for a future release, adding capabilities that are currently missing.

@apeltzer apeltzer added help wanted Extra attention is needed question Further information is requested labels Dec 5, 2019
@apeltzer
Member Author

apeltzer commented Dec 5, 2019

Adding in @d4straub to get him aboard too :-)

@erikrikarddaniel
Member

We have discussed how to technically integrate this, and lean towards writing one or more R scripts that do long read denoising. The idea would be to run this as part of the workflow, after primer removal with cutadapt, when users specify PacBio or similar, instead of the normal QIIME2 processing. At the end of the script, a QIIME2 artefact would be output and the rest of the workflow could continue. Probably, this would be just before taxonomy assignment.

(Primer removal might be somewhat different, since PacBio reads apparently contain sequence upstream of the primers (both forward and reverse) that should also be removed. A flag to cutadapt, I suppose.)
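Removing the primers together with everything outside them is what a cutadapt linked adapter written as `FWD...REV_RC` is meant for: the forward primer is removed along with everything upstream of it, and the reverse-complemented reverse primer along with everything downstream. The following is only a minimal Python sketch of that idea, not cutadapt's algorithm. It assumes reads are already in forward orientation, primers contain no degenerate bases, and matching is exact (cutadapt tolerates errors); `trim_linked_primers` is a hypothetical name.

```python
from typing import Optional


def trim_linked_primers(read: str, fwd: str, rev_rc: str) -> Optional[str]:
    """Keep only the region between the forward primer and the reverse-complemented
    reverse primer; return None (discard) if either primer is missing.

    Hypothetical helper: exact matching, non-degenerate primers, forward-oriented reads.
    """
    start = read.find(fwd)
    if start == -1:
        return None  # forward primer not found: discard the read
    insert_start = start + len(fwd)  # drops the primer and everything upstream of it
    end = read.find(rev_rc, insert_start)
    if end == -1:
        return None  # reverse primer (as reverse complement) not found downstream
    return read[insert_start:end]  # drops the reverse primer and everything downstream
```

For example, with flanking `TTTT`/`AAAA`, `trim_linked_primers("TTTTACGTACGGGCCCTTGCAAAAA", "ACGTAC", "TTGCA")` returns `"GGGCCC"`.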

We will start working on the R script(s) after Christmas and after we agree on a plan. We're all happy to discuss!

(Adding @DiegoBrambilla too.)

@d4straub
Collaborator

d4straub commented Dec 6, 2019

Probably dada2:::removePrimers is better suited than cutadapt. This function can also re-orient PacBio reads in the same step.
e.g. dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)
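In a read in forward orientation, the reverse primer appears as its reverse complement at the 3' end, which is why it is passed as `dada2:::rc(primerRV)`. With `orient=TRUE`, dada2 re-orients reads whose reverse complement is a better match to the primers. The following is only a minimal Python sketch of that orientation idea, not dada2's implementation. It assumes exact matching of a non-degenerate forward primer (dada2 tolerates mismatches), and `revcomp` and `orient_read` are hypothetical names.

```python
from typing import Optional

# IUPAC complement table, so degenerate primer bases (R, Y, ...) are complemented correctly.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA/IUPAC sequence (similar in spirit to dada2:::rc)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orient_read(read: str, fwd: str) -> Optional[str]:
    """Return the read in forward orientation, or None if the forward primer
    is found in neither orientation (exact substring match only)."""
    fwd = fwd.upper()
    if fwd in read.upper():
        return read  # already in forward orientation
    rc_read = revcomp(read)
    if fwd in rc_read:
        return rc_read  # read was sequenced in reverse orientation: flip it
    return None  # primer not found either way: a real pipeline would discard it
```

For example, `revcomp("AGRGTT")` gives `"AACYCT"`, and `orient_read("GGGCCCGTACGT", "ACGTAC")` flips the read to `"ACGTACGGGCCC"`.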

@nbargues

nbargues commented Jan 6, 2020

I'm currently working on a workflow to analyse ONT full-length 16S sequences using QIIME2, based on this project: https://github.com/DeniRibicic/q2ONT. Are you considering adding Nanopore data to the processing path?

@d4straub
Collaborator

d4straub commented Jan 13, 2020

Hi @nbargues , this looks interesting.
Yes, Nanopore processing should be included one day. As far as I can see you are using vsearch but dada2 is planned for this pipeline. Sorry, I am wrong, that's not planned at the moment.

@nbargues

@d4straub Thanks for the response. I read that Nanopore data is currently not supported by DADA2, nor by the other denoising method, Deblur. That's why vsearch is used.

@d4straub
Collaborator

@nbargues Oh! I am so sorry, I somehow mixed PacBio and Nanopore!!
You are right, DADA2 only supports Illumina and PacBio but not Nanopore. And I have to correct myself, Nanopore is currently not meant to be included.
I'll edit the comment above so that nobody else is deceived!

@jtangrot
Contributor

> Probably dada2:::removePrimers is better suited than cutadapt. This function can also immediately re-orient PacBio reads.
> e.g. dada2:::removePrimers(rawFILES[i], trimmedFILES[i], primer.fwd=primerFW, primer.rev=dada2:::rc(primerRV), orient=TRUE)

In our experience, cutadapt does a better job of recognising the primers, but maybe that's not what you have seen? Anyway, with the (relatively new) --rc option, cutadapt can also re-orient the reads during primer removal.

@d4straub
Collaborator

Thanks for this remark. I have not specifically compared the performance of dada2:::removePrimers with cutadapt. I also have not yet used cutadapt's --rc option, but it seems like a valid solution as well.

@jtangrot
Contributor

What was perhaps not clear from the discussions at the Stockholm hackathon is that we would also like to add support for ITS (which is our current use case for PacBio data). Should a separate issue be opened for that?

@d4straub
Collaborator

I have close to no experience with analysing ITS sequences. Would the processing path for ITS differ much from 16S analysis with DADA2 (apart from the taxonomic database, obviously)? If so, it would definitely be worth opening a separate issue.

@d4straub d4straub linked a pull request Oct 27, 2020 that will close this issue
@d4straub d4straub added this to the V1.2 Teal Bronze Lion milestone Oct 28, 2020
@d4straub
Collaborator

This was solved in #168, thanks @jtangrot!

5 participants