Young edited this page Mar 1, 2024 · 10 revisions

Welcome to the wiki for Donut Falls!

All "good" bioinformatic tools and workflows are attempting to solve a problem. The problem we ran into was that we could not find a completed workflow on nf-core, and we needed a simple workflow to assemble nanopore sequencing reads, with and without corresponding Illumina reads, for downstream analyses.

Notable goals:

  • works for R10.4 and above flow cells
  • portability for easy incorporation into other workflows
  • assembly of nanopore reads with and without polishing
  • assembly with unicycler
  • assembly with more than one assembler
  • easy-to-access QC metrics
    • coverage and circular status are in the resultant fasta files
    • all results go into a multiqc report
  • no gene annotation (our genomes are submitted to PGAP)
  • subsampling to reduce assembly artifacts (the original goal was 100X depth; the default is actually 150X)
  • removal of short nanopore reads
  • medaka run with 1 CPU, because we were seeing an increase in errors after medaka polishing
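
The subsampling goal comes down to simple arithmetic: the number of bases to keep is the target depth multiplied by the genome size. The sketch below works through the default of 150X for a 5 Mb genome. The function names are hypothetical and exist only for illustration; this is not how rasusa is implemented, only the calculation behind the depth target.

```python
def target_bases(genome_size_bp: int, depth: float) -> int:
    """Number of bases needed to reach `depth` over a genome of the given size."""
    return int(genome_size_bp * depth)


def keep_fraction(total_bases: int, genome_size_bp: int, depth: float) -> float:
    """Approximate fraction of the input bases to retain to reach `depth`.

    Returns 1.0 when the input is already at or below the target depth,
    because there is nothing to remove in that case.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return min(1.0, target_bases(genome_size_bp, depth) / total_bases)


if __name__ == "__main__":
    genome_size = 5_000_000  # 5 Mb genome, as assumed in the workflow diagram
    depth = 150              # default subsampling depth

    print(target_bases(genome_size, depth))
    # A hypothetical run that yielded 1.5 Gb of nanopore reads
    print(keep_fraction(1_500_000_000, genome_size, depth))
```

A run that produced fewer bases than the target is left unchanged, which is why the fraction is capped at 1.0.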

Missed goals:

  • Time filtering with ontime. We have noticed that we get better reads if we keep only those sequenced shortly after the run has started but before the reagents are depleted. Time information, however, is removed from reads in the SRA, so this is difficult to test. We may find ways to incorporate this feature at a later date. This information can also be difficult to obtain automatically, but we look forward to more developments in this area.
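
To make the idea of time filtering concrete, here is a minimal sketch. It assumes each FASTQ header carries a `start_time=` field with an ISO 8601 timestamp, which is an assumption about how the reads were basecalled, and which is exactly the information missing from SRA reads. The function names and the window values are hypothetical, and this is not how ontime itself works.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed header field, e.g. "start_time=2024-03-01T02:00:00Z"
START_TIME = re.compile(r"start_time=(\S+)")


def parse_start_time(header: str) -> Optional[datetime]:
    """Return the read's start time, or None if the header has no timestamp."""
    match = START_TIME.search(header)
    if match is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    return datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))


def within_window(header: str, run_start: datetime,
                  skip: timedelta, keep_until: timedelta) -> bool:
    """True if the read started between `skip` and `keep_until` after run start."""
    started = parse_start_time(header)
    if started is None:
        return False  # no time information, so the read cannot be placed
    elapsed = started - run_start
    return skip <= elapsed <= keep_until


if __name__ == "__main__":
    run_start = datetime.fromisoformat("2024-03-01T00:00:00+00:00")
    header = "@read1 runid=abc start_time=2024-03-01T02:00:00Z"
    # Hypothetical window: drop the first hour, keep reads up to 48 hours
    print(within_window(header, run_start,
                        timedelta(hours=1), timedelta(hours=48)))
```

Reads without a timestamp are rejected here; a real implementation might prefer to pass them through unfiltered.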

Nanopore sequence processing is an actively developing field, so tools were chosen for their acceptance in the field and drawn from the tutorials written by Dr. Ryan Wick in the Trycycler wiki and the Perfect bacterial genome tutorial wiki.

The generated consensus files can then be used in multiple applications, including phylogenetic analysis with Grandeur or submission to NCBI via the genome submissions portal.

This wiki will cover the rationale and steps of this workflow.

Basic diagram of the workflow

---
Donut Falls
---
flowchart TD

subgraph S0[input files]
A[/nanopore fastq/]
M[/"illumina reads (optional)"/]
end



subgraph S1[flye or raven]
B["fastp for length filtering (1,000 bp) and quality (Q12)"]
B --> C[rasusa for random downsampling to 150X coverage for 5M sized genome]
C --> D[assembly with flye and/or raven]
D --> F[rotation with dnaapler]
F --> H[medaka for polishing]
end

A --> B

subgraph S2[unicycler]
O[unicycler hybrid assembly]
end

A --> S2
M --> S2

subgraph S4[polishing]
H --> J[bwa]
J --> K[polypolish]
K --> L[pypolca]
M --> N[fastp for default filtering]
N --> J
end

subgraph S5[quality metrics]
T[nanopore fastq files] --> Nanoplot
T --> CirculoCov
R --> CirculoCov
R[fasta files] --> G[busco]
S[gfa files] --> I[bandage for assembly visualization]
S --> Q[gfastats]
end


S2 --> S5
S1 --> S5
S4 --> S5

S5 --> V[MultiQC report]

style S1 stroke-width:4px
%% mermaid help : https://mermaid.js.org/syntax/flowchart.html