Young edited this page Mar 1, 2024 · 9 revisions

Usage

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet <path to sample sheet>

There are many steps involved in getting an assembled bacterial genome. Donut Falls supports Nanopore sequencing of isolates, with or without corresponding Illumina fastq files. For metagenomic samples, we recommend nf-core/mag or other workflows. Our typical use case is sequencing isolates on a GridION and using MinKNOW for basecalling and fastq.gz file generation.

---
Basic nanopore workflow
---
flowchart LR

A[isolate bacteria] --> D[sequence]
D --> E[basecalling]
E --> F[Donut Falls]
F --> G[analysis]

Final results are written to the directory set by params.outdir (default = 'donut_falls'), which can be changed on the command line (--outdir) or in a config file.

Prepare input files

We tried matching Illumina reads to Nanopore reads automatically in a variety of ways, but this proved too difficult to maintain. Instead, the input is a sample sheet that pairs each sample's Nanopore reads with its Illumina reads.

The sample sheet is a CSV file with two required columns and two optional columns:

  • 'sample' (required): the name of the isolate that was sequenced
  • 'fastq' (required): the Nanopore fastq.gz file
  • 'fastq_1' and 'fastq_2' (optional): the forward and reverse Illumina reads

A typical sample sheet with both Nanopore and Illumina reads:

sample,fastq,fastq_1,fastq_2
test,nanopore.fastq.gz,illumina_1.fastq.gz,illumina_2.fastq.gz

An acceptable sample sheet with only Nanopore reads:

sample,fastq
test,long_reads_low_depth.fastq.gz

An acceptable sample sheet where one sample does not have Illumina reads:

sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
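If your read files are named after their samples, a sample sheet in the format above can be generated with a short shell function. This is a minimal sketch and not part of Donut Falls: the function name make_sample_sheet and the naming convention it assumes (Nanopore reads as `<sample>.fastq.gz` in one directory, Illumina reads as `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` in another) are hypothetical, so adjust the patterns to match your files.

```shell
# Hypothetical helper (not part of Donut Falls): build a sample sheet from file names.
# Assumes Nanopore reads are <sample>.fastq.gz in directory $1 and, optionally,
# Illumina reads are <sample>_R1.fastq.gz / <sample>_R2.fastq.gz in directory $2.
make_sample_sheet() {
    nanopore_dir=$1
    illumina_dir=${2:-}
    echo "sample,fastq,fastq_1,fastq_2"
    for fastq in "$nanopore_dir"/*.fastq.gz; do
        [ -e "$fastq" ] || continue              # no matches: the glob stays literal, skip it
        sample=$(basename "$fastq" .fastq.gz)    # strip the directory and extension
        r1="" r2=""
        if [ -n "$illumina_dir" ] \
            && [ -e "$illumina_dir/${sample}_R1.fastq.gz" ] \
            && [ -e "$illumina_dir/${sample}_R2.fastq.gz" ]; then
            r1="$illumina_dir/${sample}_R1.fastq.gz"
            r2="$illumina_dir/${sample}_R2.fastq.gz"
        fi
        # Samples without a complete Illumina pair get empty fastq_1/fastq_2 columns.
        echo "$sample,$fastq,$r1,$r2"
    done
}
```

For example, `make_sample_sheet "$PWD/nanopore" "$PWD/illumina" > sample_sheet.csv` writes one row per Nanopore file; passing absolute directory paths keeps the sheet valid regardless of where the workflow is launched.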

Recommended ext.args to adjust

The default workflow should run fine without changes, but adjusting some tool arguments can improve results. Because Donut Falls is designed to be portable, it has few input parameters: any such parameters would have to be inherited from the larger workflow it is part of. Instead, tool arguments are adjusted through ext.args in a config file. Admittedly, this does make customization more difficult. Sorry.

WARNING: Changing ext.args in a config file changes those values for every sample in the workflow. Samples that need different values must be run separately.

rasusa

By default, the workflow gives rasusa a genome size of 5 Mb and subsamples reads to 150X coverage. The recommended coverage for assembly is 100X, but the defaults needed to work for most use cases. This works for many organisms sequenced at UPHL and in public health in general (e.g. Escherichia coli, Salmonella enterica, and even Pseudomonas aeruginosa), but it may be problematic for much larger genomes (like Sorangium cellulosum at ~13 Mb) or much smaller ones (like Campylobacter jejuni at ~1.7 Mb). rasusa targets genome size × coverage bases, so the defaults keep about 750 Mb of reads: only about 58X for a 13 Mb genome, but up to about 440X for a 1.7 Mb genome.

For these cases, we recommend adjusting the ext.args for rasusa in a config file.

process {
    withName: rasusa {
        ext.args = "--genome-size 8.5mb --coverage 150"
    }
}
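Before choosing rasusa values, it can help to check how many bases a sample actually has. The function below is a minimal sketch and not part of Donut Falls: estimate_depth is a hypothetical name, and it assumes gzipped, standard four-line FASTQ records and a genome size given in bases.

```shell
# Hypothetical helper (not part of Donut Falls): estimate read depth.
# Usage: estimate_depth <reads.fastq.gz> <genome size in bases>
# Assumes 4-line FASTQ records, so every 2nd line of 4 is a sequence.
estimate_depth() {
    reads=$1
    genome_size=$2
    gzip -dc "$reads" | awk -v g="$genome_size" '
        NR % 4 == 2 { bases += length($0) }    # sum sequence lengths
        END { printf "%d bases, ~%.1fX\n", bases, bases / g }
    '
}
```

For example, `estimate_depth sample1.fastq.gz 1700000` reports total bases and approximate depth for a ~1.7 Mb genome. If that is well above 150X, setting `--genome-size 1.7mb` in the rasusa ext.args lets subsampling bring it down.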

medaka

Medaka performs best when told which basecaller model was used to generate the reads. The model name generally has the format {pore}_{device}_{caller variant}_{caller version} and is specified with -m.

  • Example for data from MinION R9.4.1 flow cells basecalled with the fast Guppy basecaller, version 3.0.3: '-m r941_min_fast_g303'
process {
    withName: medaka {
        ext.args = "-m r941_min_fast_g303"
    }
}
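Recent MinKNOW/Guppy FASTQ headers may record the basecalling model in a tag such as `basecall_model_version_id=`, which can help when picking a medaka model. This is a hedged sketch: basecaller_model is a hypothetical helper, not every FASTQ file carries such a tag, and the basecaller's model name still has to be matched to a medaka model name by hand (`medaka tools list_models` lists the models your medaka installation provides).

```shell
# Hypothetical helper (not part of Donut Falls): print the basecaller model
# recorded in the first FASTQ header, assuming a tag ending in
# "model_version_id=<model>". Prints nothing if no such tag is found.
basecaller_model() {
    gzip -dc "$1" | head -n 1 \
        | tr ' ' '\n' \
        | sed -n 's/^.*model_version_id=//p'
}
```

For example, `basecaller_model sample1.fastq.gz` might print something like `dna_r9.4.1_450bps_fast`, which would point toward an R9.4.1 fast medaka model.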

Switching assemblers

There are currently several assembly options available in Donut Falls, selected with 'params.assembler'.

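De novo assembly of Nanopore reads (with or without polishing):

  • flye (default)
  • raven

Hybrid assembly (requires Illumina reads):

  • unicycler

Each assembler only needs to be listed once. To run several, separate them with commas (e.g. --assembler flye,raven).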
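Because hybrid assembly with unicycler requires Illumina reads, it can be worth checking which samples in a sample sheet lack them before adding unicycler to the assembler list. The awk-based function below is a minimal sketch and not part of Donut Falls; samples_without_illumina is a hypothetical name, and it assumes a plain CSV with a header row and no quoted commas.

```shell
# Hypothetical helper (not part of Donut Falls): list samples with no Illumina reads.
# Columns are located by header name, so a Nanopore-only sheet also works.
samples_without_illumina() {
    awk -F',' '
        { sub(/\r$/, "") }                                          # tolerate CRLF line endings
        NF == 0 { next }                                            # skip blank lines
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }     # map header names to columns
        # Missing Illumina columns, or empty values in them, mean no Illumina reads.
        !("fastq_1" in col) || !("fastq_2" in col) || $(col["fastq_1"]) == "" || $(col["fastq_2"]) == "" {
            print $(col["sample"])
        }
    ' "$1"
}
```

For example, `samples_without_illumina SampleSheet.csv` prints the name of each sample that has no Illumina reads for a hybrid assembly.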

Choosing a profile

Donut Falls has two profiles for "easy" container management from the command line.

  • docker: uses Docker to manage containers in the workflow, and sets
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
  • singularity: uses Singularity to manage containers in the workflow, and sets
singularity.enabled = true
singularity.autoMounts = true
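If the same command is run on machines with different container engines, the profile can be chosen from whatever is installed. This is a minimal sketch, not part of Donut Falls; choose_profile is a hypothetical helper that simply prefers Singularity (as in the Usage example) and falls back to Docker.

```shell
# Hypothetical helper (not part of Donut Falls): pick a profile based on
# which container engine is on PATH. Prefers singularity, then docker.
choose_profile() {
    if command -v singularity >/dev/null 2>&1; then
        echo singularity
    elif command -v docker >/dev/null 2>&1; then
        echo docker
    else
        echo "neither singularity nor docker found on PATH" >&2
        return 1
    fi
}
```

For example: `nextflow run UPHL-BioNGS/Donut_Falls -profile "$(choose_profile)" --sample_sheet sample_sheet.csv`.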

Using a config file

Config files are a reproducible way to ensure that the same parameters are shared each time a workflow is run. It is common to specify paths to databases and solidify parameter values in config files.

To get a copy of a template config file with every editable parameter, run the following command

nextflow run UPHL-BioNGS/Donut_Falls --config_file true

This creates a config file named edit_me.config in the current directory. This file can be renamed and edited without altering the original workflow. Every parameter (param) in this file is preceded by //, which comments it out. Remove the // from a line for the workflow to use it.

To use this config file during runtime, simply specify the config file with -c on the command line.

nextflow run UPHL-BioNGS/Donut_Falls -c edit_me.config

The same template config file can also be found at Donut_Falls/configs/donut_falls_template.config.
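Uncommenting a parameter can also be done from the shell. This sketch assumes each commented-out line in edit_me.config starts with // (optionally indented) followed by params.<name> =; enable_param is a hypothetical helper, so check the generated file and adjust the pattern if its layout differs.

```shell
# Hypothetical helper (not part of Donut Falls): uncomment "//params.<name> = ..."
# lines in a Nextflow config file.
# Usage: enable_param <config file> <param name>
# Writes to a temporary file first, so it works with both GNU and BSD sed.
enable_param() {
    config=$1
    name=$2
    sed "s|^\([[:space:]]*\)//[[:space:]]*\(params\.$name[[:space:]]*=\)|\1\2|" "$config" > "$config.tmp" \
        && mv "$config.tmp" "$config"
}
```

For example, `enable_param edit_me.config assembler` activates the params.assembler line, whose value can then be edited as usual.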

Relevant parameters (params), including input files, directories, and outputs

// optional: sequencing summary file from the Nanopore sequencing run
params.sequencing_summary          = ''
// sample sheet with information about samples and their corresponding files
params.sample_sheet                = ''
// assembler(s) to use: 'flye', 'raven', and/or 'unicycler', comma-separated (default = 'flye')
params.assembler                   = 'flye'
// where the results are saved (default = 'donut_falls')
params.outdir                      = 'donut_falls'
// specifies if test files should be downloaded
params.test                        = false
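The same params can also be collected in a YAML file and passed with Nextflow's -params-file option instead of on the command line. This is a minimal sketch: write_params_file is a hypothetical helper, and the sample sheet name and assembler values are placeholders taken from the examples on this page.

```shell
# Hypothetical helper (not part of Donut Falls): write a Nextflow params file.
# Keys match the params.* names above; the values are placeholders.
write_params_file() {
    cat > "$1" <<'EOF'
sample_sheet: "SampleSheet.csv"
assembler: "flye,raven"
outdir: "donut_falls"
EOF
}
```

For example, after `write_params_file params.yml`, run `nextflow run UPHL-BioNGS/Donut_Falls -profile singularity -params-file params.yml`.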

Examples

Running a test profile (can be used to test different assemblers)

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity,test

Default usage: flye assembly with files listed in sample_sheet.csv

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler flye

Assembly with flye and raven using a sample sheet named 'SampleSheet.csv'

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet SampleSheet.csv --assembler flye,raven

Hybrid assembly with unicycler using docker to manage containers and a sample sheet named 'SampleSheet.csv'

nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler unicycler

Using a config file to set all params, including container management and sample sheet

The config file

docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
params.assembler = 'flye'
params.flye_options = '--meta'
params.sample_sheet = 'SampleSheet.csv'

The command line

nextflow run UPHL-BioNGS/Donut_Falls -c config.config