Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error at GroupReadsByUmi due to read order? #52

Open
davidmasp opened this issue May 28, 2024 · 4 comments
Open

Error at GroupReadsByUmi due to read order? #52

davidmasp opened this issue May 28, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@davidmasp
Copy link

Description of the bug

Hi

I have hit an error trying to run one of the test datasets that merges multiple fastq files per sample (samplesheet.multi_lanes.csv).

I have to say first that when running with the testing dataset that includes only the chr17.fa the pipeline runs great. When i run it with my local version of the hg38 igenomes fasta however it hits an error at the GroupReadsByUmi step. The file should be accessible by downloading the hg38 version in this link - here.

I download the sample sheet from the test-dataset repo and I run the pipeline using:

nextflow run nf-core/fastquorum -r 1.0.0 -params-file params.yaml -w work.nobackup -resume

And the params.yaml file is:

input: ./samplesheet.multi_lanes.csv
outdir: ./results.nobackup
duplex_seq: yes
fasta: <<path to>>/iGenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa
call_min_reads: 6 3
filter_min_reads: 3 3

The error is in the GroupReadsByUmi step, it claims that a given read is not sorted correctly (SRR6109255.382418).

Command error:
  INFO:    Environment variable SINGULARITYENV_TMP is set, but APPTAINERENV_TMP is preferred
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  [2024/05/27 03:00:20 | FgBioMain | Info] Executing GroupReadsByUmi from fgbio version 2.0.2 as <<USER>>@<<machine>> on JRE 11.0.9.1-internal+0-adhoc..src with snappy, IntelInflater, and IntelDeflater
  [2024/05/27 03:00:20 | GroupReadsByUmi | Info] Filtering the input.
  [2024/05/27 03:00:20 | GroupReadsByUmi | Info] Assigning reads to UMIs and outputting.
  [2024/05/27 03:00:20 | FgBioMain | Fatal] ########################################################################################
  [2024/05/27 03:00:20 | FgBioMain | Fatal] Execution failed!
  [2024/05/27 03:00:20 | FgBioMain | Fatal] Template SRR6109255.382418 has only one read, paired-reads required for paired strategy.
  [2024/05/27 03:00:20 | FgBioMain | Fatal] ########################################################################################
  [2024/05/27 03:00:20 | FgBioMain | Info] GroupReadsByUmi failed. Elapsed time: 0.04 minutes.

When going into the directory I can see that the problematic read (the first read in the file) is actually properly paired but and not in order?

> samtools view SRR6109255_three_lanes.bam | cut -f 1,2,3,4 | head
SRR6109255.382418       99      chr1    10466962
SRR6109255.143407       163     chr1    1913705
SRR6109255.143407       83      chr1    1913705
SRR6109255.441042       99      chr1    18777854
SRR6109255.382418       147     chr1    10467504
SRR6109255.441042       147     chr1    18777853
SRR6109255.58832        163     chr1    26281528
SRR6109255.58832        83      chr1    26281528

When I track down the merged bam one level more I reach the MERGE_BAM process. The individual bams seem to be all sorted correctly but the merged one (see above) may not be?

> samtools view 1/SRR6109255_three_lanes.mapped.bam |  cut -f 1,2,3,4 | head
SRR6109255.382418       99      chr1    10466962
SRR6109255.382418       147     chr1    10467504
SRR6109255.145917       99      chr1    28242786
SRR6109255.145917       147     chr1    28243430

> samtools view 2/SRR6109255_three_lanes.mapped.bam |  cut -f 1,2,3,4 | head
SRR6109255.143407       163     chr1    1913705
SRR6109255.143407       83      chr1    1913705
SRR6109255.764263       113     chr1    30571796
SRR6109255.764263       177     chr1    197441624

> samtools view 3/SRR6109255_three_lanes.mapped.bam |  cut -f 1,2,3,4 | head
SRR6109255.441042       99      chr1    18777854
SRR6109255.441042       147     chr1    18777853
SRR6109255.58832        163     chr1    26281528
SRR6109255.58832        83      chr1    26281528

I am not sure if I am doing smth wrong.

Command used and terminal output

nextflow run nf-core/fastquorum -r 1.0.0 -params-file params.yaml -w work.nobackup -resume

(information in the description)

Relevant files

No response

System information

HPC, running on SGE

Ubuntu Linux

N E X T F L O W
version 24.04.1 build 5913
created 20-05-2024 09:48 UTC (02:48 PDT)
cite doi:10.1038/nbt.3820
http://nextflow.io

nf-core/fastquorum = 1.0.0

@nh13
Copy link
Member

nh13 commented May 28, 2024

@davidmasp looks like a bug (that I introduced) in samtools: samtools/samtools#2062.

@nh13
Copy link
Member

nh13 commented Jul 15, 2024

Fixed in samtools/samtools#2062, but we'll need a samtools release before we can fix it here

@SPPearce
Copy link
Contributor

SPPearce commented Aug 3, 2024

Fixed in #68

@SPPearce SPPearce closed this as completed Aug 3, 2024
@SPPearce SPPearce reopened this Aug 3, 2024
@SPPearce
Copy link
Contributor

SPPearce commented Aug 3, 2024

But going to keep this one around until we have the samtools release and we can undo this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants