Breakpoint-pairs

Minor research internship of Sarah Mehrem.

The task of this project was to optimise the hyperparameters of DeepSV to increase its performance, and afterwards to explore its theoretical capabilities as an SV caller. For this I used existing calls from GRIDSS, Manta, Lumpy and Delly. I trained and tested the DeepSV CNN on these variants (per caller), investigated aspects such as performance with different negative sets and channel importance, and compared its performance to that of the traditional SV callers using the gold-standard callset from svclassify:

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/svclassify_Manuscript/Supplementary_Information/Personalis_1000_Genomes_deduplicated_deletions.bed
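Benchmarking against the svclassify set starts with loading its deletions as intervals. The following is a minimal sketch that assumes (this README does not confirm it) that the first three whitespace-separated columns are chromosome, start and end in standard 0-based, half-open BED coordinates. The function name read_bed_deletions is hypothetical and not part of this repository.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end); assumed standard BED coordinates.
Interval = Tuple[str, int, int]


def read_bed_deletions(lines: Iterable[str]) -> List[Interval]:
    """Parse BED-style lines into (chrom, start, end) tuples."""
    deletions: List[Interval] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and common BED comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # Skip a column-name header row (non-numeric start coordinate).
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        deletions.append((fields[0], int(fields[1]), int(fields[2])))
    return deletions
```

Passing an open file handle of the downloaded BED file to read_bed_deletions would yield the gold-standard deletions as a list of intervals.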

General workflow

I started by creating the training and test data per caller. For that, I generated genome-wide channel data for NA12878 once. The genomic coordinates of the negative sets were generated using MakeNegative_NoBP_NoBP.py, MakeNegative_CR_CR.py, MakeNegative_CR_TrueSV.py and MakeNegative_TrueSV_TrueSV.py.
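Judging only by its name, the NoBP_NoBP negative set pairs two positions that are both away from any breakpoint; the README does not describe the actual procedure. The sketch below illustrates that reading under stated assumptions: pairs are intrachromosomal, and "away from a breakpoint" means at least a hypothetical min_distance base pairs from every known breakpoint. The function sample_nobp_pairs and its parameters are illustrative only.

```python
import random
from typing import Dict, List, Tuple


def sample_nobp_pairs(
    chrom_lengths: Dict[str, int],
    breakpoints: Dict[str, List[int]],
    n_pairs: int,
    min_distance: int,
    seed: int = 0,
    max_attempts: int = 1_000_000,
) -> List[Tuple[str, int, int]]:
    """Sample intrachromosomal position pairs where neither position lies
    within `min_distance` bp of a known breakpoint (hypothetical helper)."""
    rng = random.Random(seed)
    chroms = list(chrom_lengths)

    def is_clear(chrom: str, pos: int) -> bool:
        # True if `pos` is at least `min_distance` away from every breakpoint.
        return all(abs(pos - bp) >= min_distance for bp in breakpoints.get(chrom, []))

    pairs: List[Tuple[str, int, int]] = []
    for _ in range(max_attempts):
        if len(pairs) == n_pairs:
            break
        chrom = rng.choice(chroms)
        # Draw two positions and order them so that a < b.
        a, b = sorted(rng.randrange(chrom_lengths[chrom]) for _ in range(2))
        if a != b and is_clear(chrom, a) and is_clear(chrom, b):
            pairs.append((chrom, a, b))
    if len(pairs) < n_pairs:
        raise RuntimeError("could not sample enough breakpoint-free pairs")
    return pairs
```

A fixed seed keeps the negative set reproducible across runs, which matters when several CNNs are compared on the same negatives.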

We used window pairs (breakpoint-breakpoint) of channel windows as input. These were created with the scripts MakeWindowPairs_genomewide.sh and MakeWindowPairs_genomewide_negative.sh, using the VCF files of the respective callers and the genomic coordinates of the negative sets.
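To make the idea of a window pair concrete, here is a minimal sketch that cuts a fixed-size window of channel data around each of the two breakpoints and joins them side by side along the position axis. The channels-by-positions layout, the half_width parameter and the side-by-side concatenation are assumptions for illustration (the repository also contains "StackedWindows" variants, which may combine windows differently); the function names are hypothetical.

```python
from typing import List

# Channel data for one chromosome: one row per channel, one column per position.
Matrix = List[List[float]]


def extract_window(channels: Matrix, center: int, half_width: int) -> Matrix:
    """Return the columns [center - half_width, center + half_width) of every channel."""
    start, end = center - half_width, center + half_width
    if start < 0 or end > len(channels[0]):
        raise ValueError("window extends past the chromosome edge")
    return [row[start:end] for row in channels]


def make_window_pair(channels: Matrix, bp1: int, bp2: int, half_width: int) -> Matrix:
    """Concatenate the windows around two breakpoints, channel by channel."""
    left = extract_window(channels, bp1, half_width)
    right = extract_window(channels, bp2, half_width)
    # Each channel row becomes [window around bp1 | window around bp2].
    return [l_row + r_row for l_row, r_row in zip(left, right)]
```

For a deletion call, bp1 and bp2 would be the start and end coordinates taken from the caller's VCF record.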

The final training and test sets were created using MakeTrainingData.py.
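A common way to turn positive and negative window pairs into training and test sets is to label them (1 for a true SV, 0 for a negative), shuffle, and hold out a fraction for testing. The sketch below shows that generic approach; it is not a description of what MakeTrainingData.py actually does, and make_labelled_split, test_fraction and seed are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_labelled_split(
    positives: Sequence[T],
    negatives: Sequence[T],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Tuple[T, int]], List[Tuple[T, int]]]:
    """Label positives 1 and negatives 0, shuffle, and split into (train, test)."""
    if not 0.0 <= test_fraction < 1.0:
        raise ValueError("test_fraction must be in [0, 1)")
    examples = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    # Seeded shuffle so the split is reproducible.
    random.Random(seed).shuffle(examples)
    n_test = int(round(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]
```

If positives and negatives should keep the same ratio in both sets, a stratified split (shuffling and splitting each class separately) would be the safer variant.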

Next, the DeepSV CNN was trained with this script: OC_cross_validation_dropout.py
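The script name indicates cross-validated training with dropout; the README does not state the number of folds or how they are formed. As a sketch of the cross-validation part only, the generator below yields plain k-fold train/validation index splits over an already shuffled dataset. The function kfold_indices is hypothetical, and the CNN itself is not shown.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Each fold would train a fresh model on train and report its performance on val; averaging over folds gives a less noisy estimate than a single split.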

To calculate channel importance, this script was modified into OC_cross_validation_dropout_zeroed.py.
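The "zeroed" suffix suggests measuring importance by setting one channel to zero and observing how much performance drops. The sketch below implements that ablation idea for an arbitrary predict function. Accuracy as the metric, the channels-by-positions input layout and the helper names are assumptions, not taken from the repository.

```python
from typing import Callable, Dict, List, Sequence

# One input example: one row per channel, one column per position.
Matrix = List[List[float]]


def zero_channel(x: Matrix, channel: int) -> Matrix:
    """Return a copy of `x` with every value of `channel` set to 0."""
    return [[0.0] * len(row) if i == channel else list(row) for i, row in enumerate(x)]


def accuracy(predict: Callable[[Matrix], int], xs: Sequence[Matrix], ys: Sequence[int]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    return sum(int(predict(x) == y) for x, y in zip(xs, ys)) / len(ys)


def channel_importance(
    predict: Callable[[Matrix], int], xs: Sequence[Matrix], ys: Sequence[int]
) -> Dict[int, float]:
    """Map each channel index to the accuracy lost when that channel is zeroed."""
    baseline = accuracy(predict, xs, ys)
    n_channels = len(xs[0])
    return {
        c: baseline - accuracy(predict, [zero_channel(x, c) for x in xs], ys)
        for c in range(n_channels)
    }
```

A large positive value means the model relies on that channel; a value near zero means the channel can be removed with little effect.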

To compare the performance of DeepSV with that of the other callers, these R scripts were used: BenchmarkCallerCNNsSummarisedNegSets.R for the mean performance across negative sets; BenchmarkGridssCNN.R for each negative set individually; and OverlapOver10Runs.R for the overlap with the Mills and svclassify reference sets.
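The benchmarking itself is done in R, but the underlying comparison can be sketched in Python for illustration. The sketch below matches deletion calls to a reference set by reciprocal overlap and reports precision, recall and F1. The 50% reciprocal-overlap threshold is a common convention assumed here, not necessarily what the R scripts use, and all function names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) of a deletion.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval's length (0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall_f1(
    calls: List[Interval], truth: List[Interval], min_ro: float = 0.5
) -> Tuple[float, float, float]:
    """Score calls against a reference set using a reciprocal-overlap match."""
    # A call is a true positive if it matches any reference deletion.
    tp_calls = sum(any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    # A reference deletion is recovered if any call matches it.
    found = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth)
    precision = tp_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Counting true positives on the call side and recovered variants on the reference side keeps precision and recall well defined even when several calls overlap the same reference deletion.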

Folders and files

RScript
AnalyseModelHistory.py
CNN_Train_DellyDels_Spl_Epo_LR.sh
CNN_Train_DellyDels_Spl_Epo_LR_Dropout.sh
CNN_Train_DellyDels_Spl_Epo_LR_Dropout_Stack.sh
ChannelImportance_Calc.py
ChannelImportance_Figures-20191016T152552Z-001.zip
ChannelImportance_Paras.txt
ChannelImportance_ParasNewGridss.txt
ChannelImportance_Run.sh
ChannelImportance_Testing.py
ChannelImportance_Testing_Zeroed.py
ChannelImportance_Tetr_Paras.txt
ChannelImportance_Tetr_Paras_onlynewgridss.txt
ChannelImportance_Tetr_Run.sh
CheckChannelMaker.py
ConvertFLT_to_symb.py
CountVariantsUsed.py
DropoutNewModels_Done_Parameters
Dropout_Test_Paras.txt
Dropout_Test_Paras_NewBestModels.txt
Dropout_Test_Paras_OneLine.txt
Excluded_SVs_delly.log
Excluded_SVs_gridss.log
Excluded_SVs_lumpy.log
Excluded_SVs_manta.log
ExtractWindowSizes.py
FilterVCF_bySize.py
GetNucleotide.py
HPC_RunChannelImportance.sh
HPC_RunChannelImportance_Tetr.sh
HPC_TrainCNN_spawn_jobs.sh
HPC_TrainCNN_spawn_jobs_Dropout.sh
HPC_TrainCNN_spawn_jobs_Dropout_Stack.sh
LICENSE
MakeChannelData_perChr.sh
MakeDropoutParameters.py
MakeHyperparameter_StackedWindows.py
MakeNegative_NoBP_NoBP.py
MakeParameterFile_HPC.py
MakeTrainingData.py
MakeVCFsforOverlap.py
MakeVCFsforOverlap_MajorityScore.py
MakeWindowPairs_genomewide.sh
MakeWindowPairs_genomewide_negative.sh
Negative_NoBP_NoBP.txt
OC_cross_validation.py
OC_cross_validation_dropout.py
OC_cross_validation_dropout_zeroed.py
Parameters_HPC.txt
Parameters_StackedWindows.txt
PlotWindows.py
Plot_HPTweaking.py
Plot_HPTweaking_Boxplots.py
Plot_HPTweaking_DrpGradient.py
Plot_PR_Dist.py
README.md
candidate_pairs.py
channel_maker.py
coverage.py
functions.py
labels.py
parallel_windowmaker.sh

GooglingTheCancerGenome/breakpoint-pairs
