Add process that identifies sequences still containing primers #157

Closed
erikrikarddaniel opened this issue Jul 10, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@erikrikarddaniel
Member

In some cases, sequences contain duplicated primers. cutadapt removes only the first copy, so the processed sequence still contains a primer sequence. This is potentially an artifact from the PCR step.

Add a process that searches all ASV sequences for primers and outputs 1) a table (TSV file) listing the affected ASVs and 2) filtered tables containing information about all ASVs except those that contain primers.
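
A minimal sketch of what such a search could look like, not the pipeline's implementation. It assumes exact matching (cutadapt itself tolerates mismatches), expands IUPAC degenerate bases into regex character classes, and also checks the reverse complement. The function names and the TSV column names are made up for illustration.

```python
import csv
import io
import re

# IUPAC nucleotide codes mapped to regex character classes, so degenerate
# primers (e.g. containing N or Y) match every base they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Compile a regex matching the primer or its reverse complement."""
    forward = primer.upper()
    revcomp = forward.translate(COMPLEMENT)[::-1]
    patterns = ["".join(IUPAC[base] for base in p) for p in (forward, revcomp)]
    return re.compile("|".join(patterns))


def split_asvs(asvs, primers):
    """Split {asv_id: sequence} into (affected rows, clean ASVs)."""
    regexes = {name: primer_regex(seq) for name, seq in primers.items()}
    affected, clean = [], {}
    for asv_id, seq in asvs.items():
        hits = [name for name, rx in regexes.items() if rx.search(seq.upper())]
        if hits:
            affected.append((asv_id, ",".join(hits)))
        else:
            clean[asv_id] = seq
    return affected, clean


def affected_tsv(affected):
    """Render the affected ASVs as a TSV string (hypothetical column names)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["ASV_ID", "primers_found"])
    writer.writerows(affected)
    return out.getvalue()
```

The `clean` dictionary returned by `split_asvs` holds the IDs that the filtered tables would be restricted to.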

@erikrikarddaniel erikrikarddaniel added the enhancement New feature or request label Jul 10, 2020
@erikrikarddaniel
Member Author

One possibility is to run cutadapt twice: the first run lets sequences through only if they contain the primers, and the second run discards sequences that still contain primers.
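
The two-pass idea can be illustrated on plain strings. This is a simplified in-memory sketch, assuming a single non-degenerate 5' primer and exact matching; it mimics what two cutadapt runs would do but does not call cutadapt. The function name `two_pass_filter` is made up for illustration.

```python
def two_pass_filter(reads, primer):
    """Sort reads into kept (trimmed), primer-free, and double-primer groups."""
    kept, no_primer, double_primer = [], [], []
    for read in reads:
        # Pass 1: require the primer, then trim it and anything upstream of it.
        pos = read.find(primer)
        if pos == -1:
            no_primer.append(read)
            continue
        trimmed = read[pos + len(primer):]
        # Pass 2: discard the read if a primer copy survived the trimming.
        if primer in trimmed:
            double_primer.append(read)
        else:
            kept.append(trimmed)
    return kept, no_primer, double_primer
```

Only reads in `kept` would continue to the downstream steps; the `double_primer` group corresponds to the sequences this issue is about.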

@erikrikarddaniel
Member Author

There's a --discard-trimmed option to cutadapt that can be used to clean up sequence pairs containing a second primer. @d4straub, this would be easy enough to implement, but it raises questions:

  1. Do we want this step to be non-mandatory? If so, one needs to create another if block...
  2. Does the --retain_untrimmed option to the workflow make sense?

I'd like to make the second cutadapt run mandatory and drop the --retain_untrimmed option, to simplify the workflow.

@d4straub
Collaborator

d4straub commented Sep 7, 2020

Ad 1: I am unsure how that second cutadapt run (--discard-trimmed) would affect results. Generally, I agree that after the first cutadapt run the reads should not contain any primer sequences (regardless of whether --retain_untrimmed is used), so a second cutadapt run that discards primer-containing reads should not break the analysis in any case. However, there might be unforeseen complications, and to keep the pipeline as broadly applicable as possible it might be good to make such a second run non-mandatory. It seems OK to me to make it the default, though.

Ad 2: Yes, this option makes sense, because only primer-trimmed reads might be available to the pipeline user. I'd like to keep it in the pipeline, if at all possible.
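
To make the option combinations in this thread concrete, here is a hedged sketch that only builds the cutadapt argument lists and does not execute anything. It is not the code from the linked PR. The `second_pass` toggle and all output file names are hypothetical; the cutadapt flags used (`-g`, `-G`, `-o`, `-p`, `--discard-untrimmed`, `--discard-trimmed`) are existing cutadapt options.

```python
def cutadapt_commands(fwd, rev, r1, r2, retain_untrimmed=False, second_pass=True):
    """Build argument lists for one or two paired-end cutadapt runs."""
    # First run: trim the 5' primers from both reads of each pair.
    first = ["cutadapt", "-g", fwd, "-G", rev]
    if not retain_untrimmed:
        # Drop pairs in which no primer was found.
        first.append("--discard-untrimmed")
    first += ["-o", "trimmed_1.fastq.gz", "-p", "trimmed_2.fastq.gz", r1, r2]
    commands = [first]
    if second_pass:
        # Second run: drop pairs in which a primer is still found.
        commands.append([
            "cutadapt", "-g", fwd, "-G", rev, "--discard-trimmed",
            "-o", "clean_1.fastq.gz", "-p", "clean_2.fastq.gz",
            "trimmed_1.fastq.gz", "trimmed_2.fastq.gz",
        ])
    return commands
```

With the defaults shown, the second run is on unless explicitly switched off, which matches the "non-mandatory but default" suggestion above. Each returned list could be passed to `subprocess.run`.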

@d4straub
Collaborator

nf-core/test-datasets#195 by @emnilsson prepared tests for this issue.

@d4straub
Collaborator

This was addressed by the linked PR.
