GitHub - miRTop/isomir_accuracy_meta_analysis: meta-analysis of public dataset to measure isomiR accuracy

Draft

https://docs.google.com/document/d/1VQN_7j0omnzKNc566jfe39stZVhXsfG2feNEkaJ4cFc/edit

Goals

Optimize GFF format definition and usability
Measure how isomiR detection accuracy depends on the analysis tools and on some experimental steps in the protocols.

Data

DSRG data

Still to be published. This is another study comparing protocols using the miRXplore sample.

Fratta data

Evaluation of methodologies for microRNA biomarker detection by next generation sequencing https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6161688/ Issue: the ID number given to download the data is wrong. The author has been contacted to get the data.

To be added if we get the data

Systematic assessment of commercially available low-input miRNA library preparation kits https://www.biorxiv.org/content/10.1101/702456v1.full No data yet.

Processed data

Trimming was done with the smadann Nextflow pipeline.

The following command was used for each study and type of data:

nextflow run mirtop/smadann --csv totrim.csv -c ../../om-profile.config --outdir trimmed -qs 10

Analysis was done with bowtie + mirtop, driven by a snakemake file located in the folder of each study and data type:

snakemake -p -s run.snakefile

standard analysis of synthetic

The miRXplore reference was parsed to keep only synthetic sequences with an edit distance of 4 or more from any other synthetic sequence, and the alignments were filtered to keep only reads that mapped to those unique synthetic sequences with a maximum of 4 changes. The code used for this is in the analysis folder.
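The two filtering steps described above can be sketched as follows. This is a minimal illustration, not the code in the analysis folder: it assumes Levenshtein distance as the edit distance, assumes a synthetic sequence is kept only when it is at least 4 edits away from every other reference sequence, and uses hypothetical function names and toy sequences.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unique_references(refs: Dict[str, str], min_dist: int = 4) -> Dict[str, str]:
    """Keep references at least min_dist edits away from every other reference.

    Assumption: this all-against-all rule is one reading of the text above.
    """
    kept = {}
    for name, seq in refs.items():
        others = (s for n, s in refs.items() if n != name)
        if all(edit_distance(seq, other) >= min_dist for other in others):
            kept[name] = seq
    return kept


def keep_read(read: str, reference: str, max_changes: int = 4) -> bool:
    """Keep a read only if it differs from its reference by at most max_changes edits."""
    return edit_distance(read, reference) <= max_changes


# Toy sequences, made up for illustration only.
toy_refs = {"syn_a": "AAAAAAAA", "syn_b": "AAAAAAAT", "syn_c": "CCCCGGGG"}
toy_unique = unique_references(toy_refs)
```

Here `syn_a` and `syn_b` differ by a single base, so both are dropped and only `syn_c` remains as a unique reference. The all-against-all comparison is quadratic in the number of references, which is acceptable for a reference set of about a thousand short sequences.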

The data are publicly available in the miRTop AWS space.

It currently contains the Tewari, Wright, Kim and DSRG data.

biological samples

For human data we use miRBase22 to map all sequences. The same filtering steps were used here.

The data are publicly available in the miRTop AWS space.

Tools

bcbio smallRNA-seq pipeline + isomiRs - In charge: Lorena Pantano
isomiR-SEA - In charge: Gianvito Urgese
ChimiRa, miRge - In charge: Marc Halushka
sRNAbench - In charge: Michael Hackenberg
Prost - In charge: Thomas Desvignes
miRge - In charge: Marc K. Halushka
(Add your tool here and the person who will run it)

Questions to address

Reproducibility of replicates
Reproducibility of protocols
Reproducibility of tools
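One way to put numbers on these questions is to compare the isomiR counts from two runs (two replicates, two protocols, or two tools on the same sample). The sketch below is an assumption about how this could be measured, not the metric used in the report: it computes the Jaccard overlap of detected isomiRs and the Pearson correlation of log2 counts. The function names and the toy counts are hypothetical.

```python
import math
from typing import Dict


def jaccard(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Share of isomiRs detected (count > 0) in both runs among those detected in either."""
    detected_a = {k for k, v in a.items() if v > 0}
    detected_b = {k for k, v in b.items() if v > 0}
    union = detected_a | detected_b
    return len(detected_a & detected_b) / len(union) if union else 1.0


def log_pearson(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Pearson correlation of log2(count + 1) over the union of isomiRs."""
    keys = sorted(set(a) | set(b))
    x = [math.log2(a.get(k, 0) + 1) for k in keys]
    y = [math.log2(b.get(k, 0) + 1) for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for constant counts
    return cov / (sx * sy)


# Toy counts keyed by isomiR identifier, made up for illustration only.
rep1 = {"iso1": 10, "iso2": 0, "iso3": 5}
rep2 = {"iso1": 8, "iso3": 0, "iso4": 2}
overlap = jaccard(rep1, rep2)
correlation = log_pearson(rep1, rep2)
```

The same two functions apply to any pair of runs, so the three questions differ only in which pairs are compared.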

Results

Updated report can be found here

Milestones:

Set up

Select random public data
Run with all the tools listed above
Put data in common space
Adapt output tools to GFF format
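Adapting a tool's output to GFF mostly means emitting nine tab-separated columns with the annotation packed into the last one. The sketch below reads one such line back into a dictionary. It is a generic GFF-style parser written for illustration, not the miRTop implementation: the column names follow the general GFF3 layout, the attribute keys and values in the example record are placeholders rather than the miRTop format definition, and both `key=value` and `key value` attribute styles are accepted because the source does not state which one is used.

```python
from typing import Dict

COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase"]


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split the attribute column into a dict of key/value strings."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field:
            key, value = field.split("=", 1)   # "key=value" style
        else:
            key, _, value = field.partition(" ")  # "key value" style
        attrs[key.strip()] = value.strip()
    return attrs


def parse_gff_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated GFF-style line with exactly nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    record: Dict[str, object] = dict(zip(COLUMNS, cols[:8]))
    record["attributes"] = parse_attributes(cols[8])
    return record


# Hypothetical record, made up for illustration only.
example_line = "\t".join([
    "hsa-let-7a-1", "miRBase22", "isomiR", "6", "27", ".", "+", ".",
    "UID=example01; Name=hsa-let-7a-5p; Variant=iso_3p:-1; Expression=120",
])
example_record = parse_gff_line(example_line)
```

Reading every tool's output through one parser of this kind makes it easy to check that the converted files agree on column order and attribute naming before comparing results.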

Random sample

Sample SRR5756178 is a whole blood small RNA-seq run from this manuscript https://academic.oup.com/nar/article/4080663 and is part of project PRJNA391912. It has ~2.8 million reads, of which ~2.6 million are miRNAs.

Synthetic data

The benchmark was done with synthetic isomiRs for one human miRNA; see results.
