eugen-vusak/finding-mutations-using-3th-gen-sequencing


Finding mutations using third-generation sequencing

This is a project for a Bioinformatics class. It is about finding mutations using a third-generation sequencing method.

Mario Štrbac, Eugen Vušak
Mentor: mag. ing. Robert Vaser

Test data

https://www.dropbox.com/s/iqi3nbf839wrshh/Bioinfo_19_20_train_data.tar.gz?dl=0

Description

ecoli.fasta - reference genome
ecoli_mutated.csv - list of mutations to find
ecoli_mutated.report - read statistics and the reference implementation's Jaccard score
ecoli_simulated_reads.fasta - reads obtained by sequencing the mutated genome
jaccard.py - script for computing the Jaccard score

The second dataset, lambda, contains the same set of files.
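
For orientation, the Jaccard score of two sets is the size of their intersection divided by the size of their union. The sketch below is not the bundled jaccard.py; it is a minimal illustration that assumes each mutation can be represented as a hashable tuple. The `(type, position, base)` records and the function name `jaccard_score` are hypothetical and chosen only for this example.

```python
def jaccard_score(found, expected):
    """Return |found ∩ expected| / |found ∪ expected| for two collections."""
    found, expected = set(found), set(expected)
    union = found | expected
    if not union:
        # Both sets empty: treated as a perfect match (a convention assumed here).
        return 1.0
    return len(found & expected) / len(union)


# Hypothetical mutation records: (type, position, base).
expected = {("X", 10, "A"), ("I", 25, "G"), ("D", 40, "-")}
found = {("X", 10, "A"), ("I", 25, "G"), ("X", 77, "T")}

print(jaccard_score(found, expected))  # 2 shared / 4 distinct = 0.5
```

A score of 1.0 means the found mutations match the expected list exactly; every missed or spurious mutation lowers the score.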

Running code

Run make to build the program.

Run make run to build and run the program.

To run the program after it has been built, run bin/main on Linux or macOS, or bin\main.exe on Windows.
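
If the built program is launched from a script, the platform-dependent binary name can be picked automatically. The following is a small sketch, assuming it is executed from the repository root after a successful build; the helper names `binary_path` and `run_program` are hypothetical and not part of the project.

```python
import os
import subprocess
import sys


def binary_path(platform=sys.platform):
    """Return the path of the built program for the given platform string."""
    # sys.platform is "win32" on Windows; Linux and macOS use the plain name.
    name = "main.exe" if platform.startswith("win") else "main"
    return os.path.join("bin", name)


def run_program():
    """Run the built binary and return its exit code."""
    return subprocess.run([binary_path()], check=False).returncode
```

Calling `run_program()` starts the binary with no extra arguments, which mirrors running bin/main by hand.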

Configuration

To configure the program, edit config.json to specify the parameters and filenames that will be used.
In config.json, all parameters with the program prefix are arguments of the program.
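
As an illustration of the prefix convention, the sketch below filters a parsed configuration down to the entries whose key starts with "program". The exact key names and layout of config.json are not shown here, so the flat structure and the keys in the example (`program_kmer_length`, `program_reads`, `output`) are assumptions made up for this sketch, as is the helper name `program_arguments`.

```python
import json


def program_arguments(config, prefix="program"):
    """Return the entries of `config` whose key starts with `prefix`."""
    return {key: value for key, value in config.items() if key.startswith(prefix)}


# Hypothetical config content; the real keys in config.json may differ.
example = json.loads(
    '{"program_kmer_length": 15, "program_reads": "reads.fasta", "output": "out.csv"}'
)

print(program_arguments(example))
```

To inspect the real file the same way, the result of `json.load` on config.json could be passed in place of `example`, provided the file uses a flat key layout.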

References

[1] Heng Li. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32(14):2103–2110, March 2016.

[2] Michael Roberts, Wayne Hayes, Brian R. Hunt, Stephen M. Mount, and James A. Yorke. Reducing storage requirements for biological sequence comparison. Bioinformatics, 20(18):3363–3369, July 2004.

[3] Mirjana Domazet-Lošo and Mile Šikić. Bioinformatika.