How do I specify to trim partial sequences when supplying a fasta file of the sequences to be trimmed?
I have amplicon sequencing data and I want to use cutadapt to remove FWD and REV amplicon primer sequences. I've read through the recipe in the documentation (https://cutadapt.readthedocs.io/en/stable/recipes.html#trimming-amplicon-primers-from-paired-end-reads), which anchors the primer sequences. However, what if only part of the primer is present at the start of a read? For example, if the full primer sequence is ADAPTOR, I would want APTORmysequence ---> mysequence, so that APTOR is removed even though the leading AD is missing. I'm asking because I used the anchoring method and not all primer sequences were removed, so I'm thinking there might be partial primer sequences at the beginning of the reads.
When specifying a file, I see in the documentation how to anchor the sequences:
cutadapt -g ^ATTCCGTAC # if there was no file specified
cutadapt -g ^file:primer.fa # when a file is specified
However, when I tried to apply the same logic for non-internal/partial adapter sequences:
cutadapt -g XATTCCGTAC # if there was no file specified
cutadapt -g Xfile:primer.fa # when a file is specified
# my full command with bash variables
cutadapt --cores=4 -g ^file:${FWD} -G ^file:${REV} -o ${out1} -p ${out2} ${R1} ${R2}
I get the following error:
"
Character 'F' in adapter sequence 'FNLE:/HOME/FWD.PRNMERS.FASTA' is not a valid IUPAC code. Use only characters 'ABCDGHIKMNRSTUVWXY'.
"
Something weird about the error is that FWD.PRNMERS.FASTA is not the name of the file I specified. The name is FWD.primers.fasta (there is an "N" substituted for the "i" in primers).
How do I specify to trim partial sequences when supplying a fasta file of the sequences to be trimmed?
Versions
cutadapt v4.9
python v3.10.14
installed via conda
Hi, the Xfile: syntax is not supported at the moment (see #361). You would need to manually add the X to each sequence in your fwd.primers.fasta file instead.
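The manual step of adding the X to each sequence could be scripted. The following is a minimal sketch, not part of Cutadapt: the function name `add_noninternal_marker` and the file names in the usage note are hypothetical, and it assumes a plain FASTA file in which every record should become a non-internal 5' adapter.

```python
def add_noninternal_marker(fasta_text: str) -> str:
    """Prefix the sequence of every FASTA record with 'X'.

    Header lines (starting with '>') are kept as they are. Only the first
    sequence line of each record gets the marker, so sequences wrapped
    over several lines still receive a single leading 'X'.
    """
    out = []
    first_seq_line = False
    for line in fasta_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            first_seq_line = True
            out.append(stripped)
        elif stripped:
            # Do not add a second marker if one is already present.
            if first_seq_line and not stripped.startswith("X"):
                stripped = "X" + stripped
            first_seq_line = False
            out.append(stripped)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">fwd_primer_1\nATTCCGTAC\n>fwd_primer_2\nGGATTACCA\n"
    print(add_noninternal_marker(example), end="")
```

To convert a real file, read it with `open("fwd.primers.fasta").read()`, pass the text to the function, and write the result to a new file (for example a hypothetical `fwd.primers.X.fasta`) that is then given to cutadapt as `-g file:fwd.primers.X.fasta`, without any prefix before `file:`.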
That said, because you just want to check whether there might be partial primer occurrences, I would just try it without the X first as an initial check. That doesn’t restrict where the 5' primer is allowed to be located at all. So it is less strict than using the X. If running it without the X does not give you an improvement, then adding the X will not help either.
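To illustrate why the plain form is the least strict of the three, here is a deliberately simplified toy model of the 5' adapter types. It is not Cutadapt's algorithm: it uses exact string matching only (no error tolerance), the function `trim_5prime` and its mode names are made up for this example, and the `min_overlap` default of 3 is an assumption chosen to mirror Cutadapt's default minimum overlap.

```python
def trim_5prime(read: str, primer: str, mode: str, min_overlap: int = 3) -> str:
    """Toy model of 5' adapter trimming with exact matches only.

    mode = "anchored"     like -g ^PRIMER: full primer must start the read
    mode = "non-internal" like -g XPRIMER: full primer or a partial primer
                          (missing leading bases) must start the read
    mode = "regular"      like -g PRIMER:  as non-internal, plus the full
                          primer may occur anywhere inside the read
    """
    if mode == "anchored":
        return read[len(primer):] if read.startswith(primer) else read

    if mode == "regular":
        idx = read.find(primer)
        if idx >= 0:
            # Remove the primer and everything before it.
            return read[idx + len(primer):]

    if mode in ("regular", "non-internal"):
        # Try the full primer first, then ever shorter suffixes of it.
        for start in range(0, len(primer) - min_overlap + 1):
            suffix = primer[start:]
            if read.startswith(suffix):
                return read[len(suffix):]
        return read

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("anchored", "non-internal", "regular"):
        print(mode, trim_5prime("APTORmysequence", "ADAPTOR", mode))
```

In this model every read trimmed in "non-internal" mode is also trimmed in "regular" mode, which is the reasoning behind trying the command without the X first.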
Character 'F' in adapter sequence 'FNLE:/HOME/FWD.PRNMERS.FASTA' is not a valid IUPAC code. Use only characters 'ABCDGHIKMNRSTUVWXY'.
Something weird about the error is that FWD.PRNMERS.FASTA is not the name of the file I specified.
Yeah, that looks a bit weird because Cutadapt did not understand that you wanted it to read the adapters from a file. Instead, it interpreted the file:/home/fwd.primers.fasta string directly as an adapter sequence. It then did a couple of transformations (for example, converting all characters to uppercase) and complains about the first character that it doesn’t know how to interpret.
The name is FWD.primers.fasta (there is an "N" substituted for the "i" in primers).
Yes, the "I" is for inosine, which Cutadapt treats like an "N" wildcard, see #546.
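The transformations described above can be reproduced in a few lines. This is only a sketch that mimics the observed behaviour, not Cutadapt's actual code: the helper names are hypothetical, and stripping the leading X before validation is an assumption based on the X not appearing in the error message.

```python
# Characters listed as valid in the error message.
VALID_CODES = set("ABCDGHIKMNRSTUVWXY")


def normalize_adapter_spec(spec: str) -> str:
    """Mimic the transformations visible in the error message."""
    if spec.startswith("X"):
        # Assumption: the leading X is consumed as the non-internal marker.
        spec = spec[1:]
    # Uppercase everything, then treat inosine ("I") as the wildcard "N".
    return spec.upper().replace("I", "N")


def first_invalid_character(sequence: str):
    """Return the first character that is not a valid code, or None."""
    for ch in sequence:
        if ch not in VALID_CODES:
            return ch
    return None


if __name__ == "__main__":
    normalized = normalize_adapter_spec("Xfile:/home/fwd.primers.fasta")
    print(normalized)                           # FNLE:/HOME/FWD.PRNMERS.FASTA
    print(first_invalid_character(normalized))  # F
```

The "i" in "file" and in "primers" both become "N", and "F" is the first character that is not in the list of valid codes, which matches the reported message.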