Commit

AWS ParallelCluster user committed Feb 16, 2021
2 parents ac3f913 + 56d8a40 commit 2d405ee
Showing 3 changed files with 86 additions and 14 deletions.
41 changes: 41 additions & 0 deletions README.md
@@ -1,2 +1,43 @@
# covid_sequencing_analysis_pipeline
AWS-optimized pipeline based on https://github.com/niemasd/SD-COVID-Sequencing

Pipeline version 0.1.0 uses the following external software programs:

* ivar 1.3.1
* minimap2 2.17-r941
* samtools 1.11
* QualiMap v.2.2.2-dev
* FastQC v0.11.9

These programs are pre-installed on the Amazon Web Services EBS snapshot snap-094df268c9d5d3ef0 in region us-east-2 (Ohio).

To set up the pipeline on a fresh instance instead, first create and activate a conda environment, then run:

```
conda install numpy
conda install boto3
conda install -c bioconda fastqc
conda install -c bioconda qualimap
conda install -c bioconda minimap2
conda install -c bioconda samtools
```

Then install the remaining Python packages with pip:
```
pip install multiqc
pip install nwalign3
```

Finally, install ivar from source (see https://github.com/andersen-lab/ivar).
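Before launching jobs, it may help to confirm that the programs installed above, plus the `pigz`, `zip`, and `aws` commands that `covid.sh` also calls, are reachable on `PATH`. The helper below is a hypothetical sketch, not part of this repository; it assumes each executable name matches the name used in `covid.sh` and the install commands (for example, that the pip-installed MultiQC provides a `multiqc` command).

```python
import shutil

# Assumed executable names: tools from the README install list, plus
# pigz, zip and aws, which covid.sh invokes directly.
REQUIRED_TOOLS = [
    "ivar", "minimap2", "samtools", "qualimap", "fastqc", "multiqc",
    "pigz", "zip", "aws",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

Run it inside the activated conda environment, since the conda and pip installs only place their executables on `PATH` there.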

The pipeline is optimized to run on an AWS EC2 cluster with the following AWS ParallelCluster settings:
```
master_instance_type = t2.medium
compute_instance_type = r5d.24xlarge
cluster_type = ondemand
ebs_settings = custom
base_os = ubuntu1604
scheduler = sge
compute_root_volume_size = 500
```
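These keys follow the AWS ParallelCluster 2.x INI format, in which they sit under a `[cluster <name>]` section and `ebs_settings = custom` refers to a separate `[ebs custom]` section. The sketch below assumes a cluster section named `default` and an `[ebs custom]` section with `shared_dir = /shared` (the mount point is a guess based on the `/shared/workspace` paths in `covid.sh`). It loads the file with Python's `configparser` and reports any setting that differs from the values listed above.

```python
import configparser

# Hypothetical config: the section names and the [ebs custom] contents are
# assumptions; only the [cluster default] key/value pairs come from the README.
EXAMPLE_CONFIG = """
[cluster default]
master_instance_type = t2.medium
compute_instance_type = r5d.24xlarge
cluster_type = ondemand
ebs_settings = custom
base_os = ubuntu1604
scheduler = sge
compute_root_volume_size = 500

[ebs custom]
shared_dir = /shared
"""

EXPECTED = {
    "master_instance_type": "t2.medium",
    "compute_instance_type": "r5d.24xlarge",
    "cluster_type": "ondemand",
    "base_os": "ubuntu1604",
    "scheduler": "sge",
    "compute_root_volume_size": "500",
}


def check_cluster_config(text, cluster="default"):
    """Return {key: (expected, actual)} for every README setting that differs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[f"cluster {cluster}"]
    mismatches = {}
    for key, expected in EXPECTED.items():
        actual = section.get(key)  # None if the key is absent
        if actual != expected:
            mismatches[key] = (expected, actual)
    # ebs_settings names another section, which must also exist.
    ebs_name = section.get("ebs_settings")
    if ebs_name is not None and not parser.has_section(f"ebs {ebs_name}"):
        mismatches["ebs_settings"] = (f"[ebs {ebs_name}] section", None)
    return mismatches


print(check_cluster_config(EXAMPLE_CONFIG))  # {}
```

Calling `check_cluster_config(open(path).read())` on a real ParallelCluster 2.x config file (by default `~/.parallelcluster/config`) returns an empty dict when every setting matches.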
31 changes: 19 additions & 12 deletions covid.sh
@@ -1,27 +1,28 @@
#!/bin/bash

export PATH=/shared/workspace/software/SD-COVID-Sequencing:/shared/workspace/software/ivar/bin:/shared/workspace/software/anaconda3/bin:$PATH
export PATH=/shared/workspace/software/ivar/bin:/shared/workspace/software/anaconda3/envs/covid1.1/bin:$PATH

THREADS=1
REF_FAS="/scratch/reference/NC_045512.2.fas"
REF_MMI="/scratch/reference/NC_045512.2.fas.mmi"
REF_GFF="/scratch/reference/NC_045512.2.gff3"
PRIMER_BED="/scratch/reference/sarscov2_v2_primers.bed"
PRIMER_BED="/scratch/reference/nCoV-2019.primer.bed"
WORKSPACE=/scratch/$SAMPLE
mkdir -p $WORKSPACE
mkdir -p $WORKSPACE/fastqc

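`covid.sh` uses `$SAMPLE` and `$S3DOWNLOAD` without defining them, so each job must receive them from its environment. The submission command is not part of this commit. Assuming jobs go through the SGE scheduler named in the README's cluster settings, one option is `qsub -v`, which exports the listed variables into the job. The helper below is a hypothetical sketch that builds such a command; the bucket name is invented.

```python
import shlex


def qsub_command(sample, s3_download, script="covid.sh"):
    """Build an SGE qsub command that passes SAMPLE and S3DOWNLOAD to the job."""
    for value in (sample, s3_download):
        # qsub -v separates variables with commas, so a comma inside a value
        # would be misread as the start of another variable.
        if "," in value:
            raise ValueError("value must not contain a comma: " + value)
    env = f"SAMPLE={sample},S3DOWNLOAD={s3_download}"
    return ["qsub", "-v", env, script]


# Hypothetical sample ID and S3 prefix.
print(shlex.join(qsub_command("S1", "s3://example-bucket/run1")))
```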
if [[ ! -f "$REF_FAS" ]]; then
mkdir -p /scratch/reference/
cp /shared/workspace/software/SD-COVID-Sequencing/reference_genome/NC_045512.2.fas $REF_FAS
cp /shared/workspace/software/SD-COVID-Sequencing/reference_genome/NC_045512.2.fas.mmi $REF_MMI
cp /shared/workspace/software/SD-COVID-Sequencing/reference_genome/NC_045512.2.gff3 $REF_GFF
cp /shared/workspace/software/SD-COVID-Sequencing/primers/swift/sarscov2_v2_primers.bed $PRIMER_BED
cp /shared/workspace/projects/covid/data/primers/nCoV-2019.primer.bed $PRIMER_BED
fi

# Step 0: Download fastq
aws s3 cp $S3DOWNLOAD/"$SAMPLE"_R1_001.fastq.gz $WORKSPACE/
aws s3 cp $S3DOWNLOAD/"$SAMPLE"_R2_001.fastq.gz $WORKSPACE/
aws s3 cp $S3DOWNLOAD/ $WORKSPACE/ --recursive --exclude "*" --include "$SAMPLE*fastq.gz"

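Both the new S3 download filter (`--include "$SAMPLE*fastq.gz"`) and the later shell globs (`"$SAMPLE"*fastq.gz`) match any file name that merely starts with the sample ID. The sketch below uses Python's `fnmatch` rules, assumed here to behave like the AWS CLI filters for simple `*` patterns. It shows that sample `S1` would also pull in the reads of a sample named `S10`, which later globs in that workspace would then pick up. The object names are invented; only the `<sample>_R1_001.fastq.gz` layout comes from the original download lines.

```python
from fnmatch import fnmatchcase

# Invented object names following the <sample>_R{1,2}_001.fastq.gz layout.
KEYS = [
    "S1_R1_001.fastq.gz",
    "S1_R2_001.fastq.gz",
    "S10_R1_001.fastq.gz",
    "S10_R2_001.fastq.gz",
]


def selected(sample, pattern_template):
    """Return the sorted keys that a filter built from the template would keep."""
    pattern = pattern_template.format(sample=sample)
    return sorted(key for key in KEYS if fnmatchcase(key, pattern))


# Pattern shape used by covid.sh: also matches sample S10.
print(selected("S1", "{sample}*fastq.gz"))
# Anchoring on the underscore keeps only S1's reads.
print(selected("S1", "{sample}_*fastq.gz"))
```

Anchoring on the delimiter (`"$SAMPLE"_*fastq.gz`) is one way to narrow the match, assuming sample IDs never differ only by a suffix that starts with `_`.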
# Fastqc
fastqc $WORKSPACE/"$SAMPLE"*fastq.gz -o $WORKSPACE/fastqc

# Step 1: Map Reads + Sort
{ time ( minimap2 -t $THREADS -a -x sr $REF_MMI $WORKSPACE/"$SAMPLE"*.fastq.gz | samtools sort --threads $THREADS -o $WORKSPACE/"$SAMPLE".sorted.bam ) ; } 2> $WORKSPACE/"$SAMPLE".log.1.map.log
@@ -49,12 +50,18 @@ for x in sorted trimmed.sorted ; do
{ time ( qualimap bamqc -bam $WORKSPACE/"$SAMPLE".$x.bam -nt $THREADS --java-mem-size=4G -outdir $WORKSPACE/"$SAMPLE".$x.stats && tar c $WORKSPACE/"$SAMPLE".$x.stats | pigz -9 -p $THREADS > $WORKSPACE/"$SAMPLE".$x.stats.tar.gz && rm -rf $WORKSPACE/"$SAMPLE".$x.stats ) ; } > $WORKSPACE/"$SAMPLE".log.8.qualimap.$x.log 2>&1
done

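Inside the Qualimap loop, the `&&` chain archives the report only if `qualimap bamqc` succeeds and deletes the report directory only after archiving. In bash, however, the exit status of `tar c ... | pigz ...` is that of `pigz` alone unless `set -o pipefail` is in effect. The sketch below is a hypothetical single-threaded Python equivalent of the archive-then-delete step using `tarfile` (no parallel compression as with `pigz -p`), in which any archiving error raises before the directory is removed.

```python
import os
import shutil
import tarfile


def archive_and_remove(stats_dir):
    """Write <stats_dir>.tar.gz at gzip level 9, then delete stats_dir.

    rmtree only runs if the archive was written without raising, mirroring
    the intent of the `... && rm -rf` chain in covid.sh.
    """
    stats_dir = stats_dir.rstrip(os.sep)
    archive_path = stats_dir + ".tar.gz"
    with tarfile.open(archive_path, "w:gz", compresslevel=9) as tar:
        # Store entries under the directory's own name, not its absolute path.
        tar.add(stats_dir, arcname=os.path.basename(stats_dir))
    shutil.rmtree(stats_dir)
    return archive_path
```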
# QC
python /shared/workspace/software/covid_sequencing_analysis_pipeline/test_sarscov2_consensus_qc.py $WORKSPACE/"$SAMPLE".trimmed.sorted.pileup.consensus.fa $WORKSPACE/"$SAMPLE".trimmed.sorted.depth.txt $REF_FAS

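The new QC step passes the consensus FASTA, the depth table, and the reference to `test_sarscov2_consensus_qc.py`, whose criteria are not shown in this diff. As a hedged illustration of the kind of metric such a check might use (not the script's actual logic), the sketch below compares consensus length with reference length and measures the fraction of ambiguous `N` bases. The function names and both thresholds are hypothetical.

```python
def read_fasta_sequence(text):
    """Join the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def consensus_metrics(consensus_fasta, reference_fasta):
    """Return (consensus length / reference length, fraction of N bases)."""
    consensus = read_fasta_sequence(consensus_fasta).upper()
    reference = read_fasta_sequence(reference_fasta).upper()
    if not consensus:
        # An empty consensus is treated as fully ambiguous.
        return 0.0, 1.0
    return len(consensus) / len(reference), consensus.count("N") / len(consensus)


def passes_qc(consensus_fasta, reference_fasta,
              min_length_ratio=0.95, max_n_fraction=0.05):
    """Hypothetical pass/fail rule; the real thresholds are not in this diff."""
    length_ratio, n_fraction = consensus_metrics(consensus_fasta, reference_fasta)
    return length_ratio >= min_length_ratio and n_fraction <= max_n_fraction
```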
# Step 9: Zip
cd $WORKSPACE && zip -9 "$SAMPLE".zip "$SAMPLE"*

aws s3 cp $WORKSPACE/"$SAMPLE".zip $S3DOWNLOAD/results/zip/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.pileup.variants.tsv $S3DOWNLOAD/results/variants/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.pileup.consensus.fa $S3DOWNLOAD/results/consensus/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.depth.txt $S3DOWNLOAD/results/depth/
aws s3 cp $WORKSPACE/"$SAMPLE".sorted.stats.tar.gz $S3DOWNLOAD/results/stats/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.stats.tar.gz $S3DOWNLOAD/results/stats/
aws s3 cp $WORKSPACE/"$SAMPLE".zip $S3DOWNLOAD/results_20210212/zip/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.pileup.variants.tsv $S3DOWNLOAD/results_20210212/variants/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.pileup.consensus.fa $S3DOWNLOAD/results_20210212/consensus/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.depth.txt $S3DOWNLOAD/results_20210212/depth/
aws s3 cp $WORKSPACE/"$SAMPLE".sorted.stats.tar.gz $S3DOWNLOAD/results_20210212/qualimap/
aws s3 cp $WORKSPACE/"$SAMPLE".trimmed.sorted.stats.tar.gz $S3DOWNLOAD/results_20210212/qualimap/
aws s3 cp $WORKSPACE/fastqc/ $S3DOWNLOAD/results_20210212/fastqc/ --recursive
aws s3 cp $WORKSPACE/ $S3DOWNLOAD/results_20210212/logs/ --recursive --exclude "*" --include "*log"
aws s3 cp $WORKSPACE/"$SAMPLE".passfail.tsv $S3DOWNLOAD/results_20210212/qc
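When `aws s3 cp` uploads a single local file, a destination ending in `/` is treated as a prefix and the file keeps its own name, whereas a destination without a trailing slash becomes the full object key. The last command above targets `results_20210212/qc` with no trailing slash, so every sample's `passfail.tsv` would be written to the same key ending in `qc`, each upload replacing the previous one. The sketch below models that rule; `s3_object_key` is a hypothetical helper and the bucket and paths are invented.

```python
import posixpath


def s3_object_key(local_path, destination):
    """Return the S3 URI that `aws s3 cp <file> <destination>` would write to.

    Models the single-file rule: a trailing "/" keeps the local file name;
    otherwise the destination itself is the object key.
    """
    if destination.endswith("/"):
        return destination + posixpath.basename(local_path)
    return destination


# Hypothetical stand-ins for $WORKSPACE, $SAMPLE and $S3DOWNLOAD.
dest_root = "s3://example-bucket/run1/results_20210212"
for sample in ["S1", "S2"]:
    local = f"/scratch/{sample}/{sample}.passfail.tsv"
    print(s3_object_key(local, f"{dest_root}/qc"))   # same key for every sample
    print(s3_object_key(local, f"{dest_root}/qc/"))  # one key per sample
```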
28 changes: 26 additions & 2 deletions covid_custom_config.yaml
@@ -1,7 +1,25 @@
table_columns_visible:
Qualimap:
avg_gc: True
median_insert_size: True
10_x_pc: True
p25_insert_size: True
median_coverage: True
mean_coverage: True
percent_aligned: True
mapped_reads: True
total_reads: False
general_error_rate: False
FastQC:
percent_duplicates: False
percent_gc: True
avg_sequence_length: True
percent_fails: False
total_sequences: False
iVar:
reads_too_short_after_trimming: False
reads_outside_primer_region: False
trimmed_reads: True
mapped_reads: True

qualimap_config:
general_stats_coverage:
@@ -10,4 +28,10 @@ qualimap_config:
- 1
- 5
- 30
- 50
- 50

custom_plot_config:
qualimap_coverage_histogram:
logswitch: True
logswitch_active: True
logswitch_label: 'Log10'
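In MultiQC's Qualimap module, `general_stats_coverage` lists the coverage thresholds X for which a "percentage of the reference covered at ≥ X×" column is added to the General Statistics table, and entries such as `10_x_pc: True` under `table_columns_visible` control which of those columns are shown. MultiQC derives these values from Qualimap's own output. To make the metric concrete, the sketch below recomputes it from a per-position depth table instead. It assumes `$SAMPLE.trimmed.sorted.depth.txt` uses the three-column `samtools depth` layout (chromosome, position, depth), which this diff does not confirm, and it keys positions on a single contig.

```python
def coverage_fractions(depth_lines, genome_length, thresholds=(1, 5, 30, 50)):
    """Percent of genome positions with depth >= each threshold.

    Assumes "<chrom>\t<pos>\t<depth>" lines for a single-contig genome.
    Positions absent from the input (zero depth when `samtools depth -a`
    was not used) count as uncovered because we divide by genome_length.
    """
    depths = {}
    for line in depth_lines:
        if not line.strip():
            continue
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        depths[int(pos)] = int(depth)
    return {
        t: 100.0 * sum(1 for d in depths.values() if d >= t) / genome_length
        for t in thresholds
    }
```

For the NC_045512.2 reference, which is 29,903 bases long, a call would look like `coverage_fractions(fh, genome_length=29903)` with `fh` an open handle on the depth file.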
