LifeBit code challenge

Info

The pipeline uses two Docker containers:
garcianacho/lb_base
garcianacho/lb_vep_full
These containers contain all the dependencies and scripts necessary to run the pipeline.

The container garcianacho/lb_base uses a different scatter script from the one used with NextFlow, to deal with the way WDL parses the paths to the R script.

Run

To run the pipeline you must clone this repo.

git clone garcia-nacho code-challenge-nextflow-wdl-annotation

Then run the following command inside the code-challenge-nextflow-wdl-annotation folder:

miniwdl run ./lb_challenge.wdl InputVcf=./VCFsubset.vcf

Note that you need miniwdl installed on your system. You can get it via pip install miniwdl

Interestingly, the same WDL code doesn't work when using Cromwell (v.85). Under Cromwell, the input files for the last process are located inside subfolders, so the R script can't find them. Under miniwdl, all the files are located together inside the same folder, which is what the R script expects. In other words, miniwdl's behaviour matches NextFlow's and differs from Cromwell's.
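The difference between the two layouts can be illustrated with a small Python sketch. This is not the R script shipped in the containers, and the function names are hypothetical. It only shows why a lookup that lists a single folder finds nothing when the files sit in subfolders, while a recursive lookup still finds them.

```python
from pathlib import Path


def find_vcfs_flat(folder):
    """List .vcf files directly inside `folder` (the layout described for miniwdl)."""
    return sorted(Path(folder).glob("*.vcf"))


def find_vcfs_recursive(folder):
    """List .vcf files in `folder` and all of its subfolders (the layout described for Cromwell)."""
    return sorted(Path(folder).rglob("*.vcf"))
```

A gather step written with the recursive variant would, under these assumptions, work with either layout.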

Input

As input, I have subsampled a VCF file from the 1000 Genomes Project: 1000Genomes/trio/HG00702_SH089_CHS. To speed up testing, I gathered only a few variants from each chromosome, as required.

Under the hood

The command runs a script that splits the input VCF file into several parts. Given the small size of the input VCF, each part contains just 10 variants. This size can be easily adjusted. Next, all the chunks are sent to the vep command. VEP runs the following plugins:

BLOSUM62
CSN
DownstreamProtein
ProteinLengthChange
HGVS_IntronEndOffset
HGVS_IntronStartOffset
LOVD
NearestExonJB
ReferenceQuality
SpliceRegion
TSSDistance
FlagLRG
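The splitting step that precedes VEP can be sketched in Python as follows. This is an illustration only, not the scatter script shipped in garcianacho/lb_base. The function name, the chunk file names and the choice to repeat the full header in every chunk are assumptions.

```python
from pathlib import Path


def split_vcf(vcf_path, out_dir, chunk_size=10):
    """Split a VCF into chunks of at most `chunk_size` variants.

    Every chunk repeats the full header so it is a valid VCF on its own.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    header, variants = [], []
    with open(vcf_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            # Meta-information and column-name lines start with '#'.
            (header if line.startswith("#") else variants).append(line)

    chunk_paths = []
    for start in range(0, len(variants), chunk_size):
        # Hypothetical naming scheme: chunk_0000.vcf, chunk_0001.vcf, ...
        path = out_dir / f"chunk_{start // chunk_size:04d}.vcf"
        path.write_text("".join(header + variants[start:start + chunk_size]))
        chunk_paths.append(path)
    return chunk_paths
```

Changing `chunk_size` is the equivalent of adjusting the number of variants per file mentioned above.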

In the last step, the pipeline gathers all the results and generates a single VCF file inside the Results folder, which is created by the pipeline.
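The idea behind the gather step can be sketched in Python. The pipeline itself uses an R script for this, so the code below is only an illustration: the function name is hypothetical, and it assumes that every annotated chunk carries the same header, so the header of the first chunk is kept and the others are dropped.

```python
from pathlib import Path


def gather_vcfs(chunk_paths, out_path):
    """Concatenate VCF chunks into one file, keeping only the first header."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w") as out:
        for index, chunk in enumerate(chunk_paths):
            with open(chunk) as handle:
                for line in handle:
                    # Skip header lines from every chunk except the first.
                    if index > 0 and line.startswith("#"):
                        continue
                    out.write(line)
    return out_path
```

The chunks are written in the order given, so the caller is responsible for passing them in the original variant order.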

Output

The output file is called Results.vcf and its md5sum is 2a5f79e048b74f8ab98f10b47725c7dc
It is located inside the ./XXXX_UUUU_RunVep/out/gather_vep.finaloutput folder.
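To confirm that a run reproduced the expected result, the checksum can be compared programmatically. The sketch below uses only the Python standard library, and the function name is hypothetical. The run folder name in the path above is a placeholder, so the actual path to Results.vcf must be supplied by the user.

```python
import hashlib

# Checksum reported above for Results.vcf.
EXPECTED_MD5 = "2a5f79e048b74f8ab98f10b47725c7dc"


def md5_of_file(path, block_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing `md5_of_file(path_to_results)` with `EXPECTED_MD5` then tells whether the output matches, where `path_to_results` stands for the location of Results.vcf in the run folder.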

garcia-nacho/WDL_CodeChallenge_LifeBit