Commit

Merge pull request #10 from mortazavilab/supplementary_figures
supplementary figure updates
MuhammedHasan committed Jan 11, 2023
2 parents daea519 + 7ae347e commit 1530627
Showing 87 changed files with 3,108 additions and 988 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -142,4 +142,5 @@ data/
!tests/data

.snakemake
-*~
+*~
+logs/*
10 changes: 8 additions & 2 deletions README.md
@@ -5,7 +5,7 @@
[![codecov](https://codecov.io/gh/mortazavilab/lapa/branch/master/graph/badge.svg?token=MJQ88T8JWK)](https://codecov.io/gh/mortazavilab/lapa)
[![Documentation Status](https://readthedocs.org/projects/lapa/badge/?version=latest)](https://lapa.readthedocs.io/en/latest/?badge=latest)

-Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.
+Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.

![method](docs/method.png)

@@ -146,5 +146,11 @@ Colab tutorials (analysis of myoblast myotube cell differentiation): https://col
If you are using LAPA on academic studies cite the following paper:

```
-coming soon...
+@article{celik2022analysis,
+title={Analysis of alternative polyadenylation from long-read or short-read RNA-seq with LAPA},
+author={Celik, Muhammed Hasan and Mortazavi, Ali},
+journal={bioRxiv},
+year={2022},
+publisher={Cold Spring Harbor Laboratory}
+}
```
15 changes: 7 additions & 8 deletions configs/config.yaml
@@ -96,6 +96,7 @@ encode:
fastq_read_count: data/results/encode/read_count/{encode_id}_fastq.txt
fasta_read_count: data/results/encode/read_count/{encode_id}_fasta.txt
bam_read_count: data/results/encode/read_count/{encode_id}_bam.txt
+bam_aligned_len: data/results/encode/aligned_len/{encode_id}_bam.txt

minimap:
sam: data/results/minimap/sam/{encode_id}.sam
@@ -118,11 +119,6 @@ talon:
abundance_corrected: data/results/talon/{library_prep}_{platform}_{counting}_abundance_filtered.tsv

c2c12:
-read_annot_bulk_sc: data/resources/c2c12/c2c12_bulk_sc_talon_read_annot.tsv
-read_annot_bulk: data/resources/c2c12/c2c12_bulk_talon_read_annot.tsv
-read_annot_sc: data/resources/c2c12/c2c12_sc_talon_read_annot.tsv
-abundance: data/resources/c2c12/bulk_talon_abundance.tsv

bam:
myoblast:
- ENCFF772LYG
@@ -131,7 +127,10 @@ c2c12:
- ENCFF699KOR
- ENCFF731HHB

-lapa_bulk_dir: data/results/c2c12/lapa/bulk/
-lapa_bulksc_dir: data/results/c2c12/lapa/bulksc/

+config: data/results/c2c12/lapa_config.csv
+lapa_dir: data/results/c2c12/lapa/
+diff_polya: data/results/c2c12/diff_polya.csv

+stats: data/results/c2c12/c2c12_stats.csv

+abundance: data/resources/c2c12/c2c12_bulk_talon_abundance.tsv
2 changes: 2 additions & 0 deletions environment.yml
@@ -14,6 +14,7 @@ dependencies:
- snakemake
- pip:
- pdf2image
+- openpyxl
- biopython
- jupyter
- notebook
@@ -26,5 +27,6 @@ dependencies:
- adjustText
- patchworklib
- sklearn
+- more_itertools
- git+https://github.com/nboley/idr.git
- -e .
Binary file modified reports/figures/benchmark_pr_curve.png
Binary file added reports/figures/benchmark_pr_curve_counting.png
Binary file modified reports/figures/c2c12_boxplot_gene.png
Binary file modified reports/figures/c2c12_heatmap.png
Binary file added reports/figures/c2c12_scatterplot_gene.png
Binary file modified reports/figures/c2c12_volcona.png
Binary file modified reports/figures/correction/R2C2_ONT_end_overlap_annotation_both.png
Binary file modified reports/figures/correction/cDNA_ONT_end_overlap_annotation_both.png
Binary file modified reports/figures/correction/correction_percentage.png
Binary file modified reports/figures/correction/correction_tss_percentage.png
Binary file modified reports/figures/hist_read_counts.png
Binary file added reports/figures/pas/replication_rate.png
Binary file modified reports/figures/percentage_of_reads_tes.png
Binary file added reports/figures/plot_cluster_annotation.png
Binary file added reports/figures/read_aligned_len.png
Binary file modified reports/figures/read_nums_datasets.png
Binary file modified reports/paper_figures/figure-2.png
Binary file modified reports/paper_figures/figure-4.png
Binary file modified reports/paper_figures/figure-4_low.png
Binary file modified reports/paper_figures/figure-5.png
Binary file modified reports/paper_figures/figure-5_low.png
Binary file modified reports/paper_figures/figure-6.png
Binary file modified reports/paper_figures/figure-6_low.png
Binary file added reports/paper_supp_figures/supp_figure-1.png
Binary file modified reports/paper_supp_figures/supp_figure-2.png
Binary file modified reports/paper_supp_figures/supp_figure-2_low.png
Binary file added reports/paper_supp_figures/supp_figure-4.png
Binary file added reports/paper_supp_figures/supp_figure-5.png
Binary file modified reports/paper_supp_figures/supp_figure-6.png
Binary file modified reports/paper_supp_figures/supp_figure-7.png
Binary file added reports/paper_supp_figures/supp_figure-8.png
Binary file added reports/supp_table.xlsx
Binary file not shown.
7 changes: 7 additions & 0 deletions reports/tables/benchmark_pr_curve.csv
@@ -0,0 +1,7 @@
Recall,Precision,Data Source
0.8521647307286166,0.6820919175911252,cDNA PacBio
0.48943754951148666,0.738593345287906,CapTrap PacBio
0.40644820295983086,0.8338303063160748,cDNA ONT
0.3554821664464993,0.7468776019983348,CapTrap ONT
0.5132170235263018,0.6988840892728582,R2C2 ONT
0.813993399339934,0.786981493299298,dRNA ONT
15 changes: 15 additions & 0 deletions reports/tables/cluster_annotation_counts.csv
@@ -0,0 +1,15 @@
three_prime_utr,intron,exon,Data Source
0.9766783762685401,0.01707650273224044,0.00624512099921936,"cDNA
PacBio"
0.9558885904142513,0.02530750742259296,0.018803902163155663,"CapTrap
PacBio"
0.9490369661442757,0.028950831439167365,0.022012202416557006,"cDNA
ONT"
0.8914192761285076,0.06560932628439745,0.04297139758709503,"CapTrap
ONT"
0.982239247779906,0.009054501131812642,0.008706251088281386,"R2C2
ONT"
0.9872727272727273,0.00787878787878788,0.0048484848484848485,"dRNA
ONT"
0.8766264304749961,0.09648847781783979,0.026885091707164133,"Quantseq
Illumina"
13 changes: 7 additions & 6 deletions reports/tables/median_tail_lengths.csv
@@ -1,6 +1,7 @@
-PacBio cDNA,35
-PacBio CapTrap,16
-ONT cDNA,17
-ONT CapTrap,12
-ONT R2C2,29
-ONT dRNA,5
+,Data Source,median_tail_length
+0,PacBio cDNA,35
+1,PacBio CapTrap,16
+2,ONT cDNA,17
+3,ONT CapTrap,12
+4,ONT R2C2,29
+5,ONT dRNA,5
14 changes: 7 additions & 7 deletions reports/tables/percentage_of_reads_tes.csv
@@ -1,9 +1,9 @@
data source,"% of read in proximity
of annotated TES"
-cDNA PacBio,0.344148133904957
-CapTrap PacBio,0.44965999493700437
-cDNA ONT,0.27175495923795706
-CapTrap ONT,0.33365620190367457
-R2C2 ONT,0.30707398255387813
-dRNA ONT,0.34497714958945036
-Quantseq3,0.3442720499153513
+cDNA PacBio,0.3461727396681412
+CapTrap PacBio,0.45726749846077747
+cDNA ONT,0.2735622217359923
+CapTrap ONT,0.3367984027217065
+R2C2 ONT,0.3058779598719411
+dRNA ONT,0.34217521863269224
+Quantseq3,0.34506585784256794
7 changes: 7 additions & 0 deletions reports/tables/read_aligned_len.csv
@@ -0,0 +1,7 @@
data source,count,mean,std,min,25%,50%,75%,max
ONT CapTrap,38118062,738,418,80,465,587,897,8327
ONT R2C2,1984417,2701,769,80,2174,2536,3070,9001
ONT cDNA,27433659,912,647,80,482,682,1179,11007
ONT dRNA,2730704,1200,966,81,551,884,1530,16744
PacBio CapTrap,5773371,1042,596,80,539,894,1390,7518
PacBio cDNA,7147240,2525,1482,80,1463,2241,3329,20313
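
The column names in this table (`count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, `max`) match the labels that pandas `describe()` produces, which suggests the summary was generated that way, although the notebook that writes it is not shown in this diff. A minimal standard-library sketch of the same statistics, using hypothetical toy lengths rather than the ENCODE alignments:

```python
import statistics


def describe(lengths):
    """Summarize aligned read lengths using the table's column names."""
    # method="inclusive" interpolates linearly between data points.
    q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        "std": statistics.stdev(lengths),  # sample standard deviation
        "min": min(lengths),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(lengths),
    }


# Hypothetical toy lengths, not taken from the BAM files.
summary = describe([100, 200, 300, 400, 500])
print(summary)
```

In the real table each row would come from one data source's aligned lengths, presumably read from the new `bam_aligned_len` files added to `configs/config.yaml`.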
22 changes: 22 additions & 0 deletions reports/tables/supp_table1.txt
@@ -0,0 +1,22 @@
\begin{tabular}{lll}
\toprule
data source & read & count \\
\midrule
Illumina RNA-seq & All Reads & 34,940,279 ± 3,565,959 \\
Illumina RNA-seq & Poly(A) usable reads & 66,276 ± 9,391 \\
ONT CapTrap & All Reads & 16,967,434 ± 2,732,784 \\
ONT CapTrap & Poly(A) usable reads & 12,706,021 ± 2,141,450 \\
ONT R2C2 & All Reads & 663,342 ± 110,955 \\
ONT R2C2 & Poly(A) usable reads & 656,352 ± 110,504 \\
ONT cDNA & All Reads & 17,064,845 ± 6,345,987 \\
ONT cDNA & Poly(A) usable reads & 9,144,553 ± 3,696,101 \\
ONT dRNA & All Reads & 996,143 ± 722,993 \\
ONT dRNA & Poly(A) usable reads & 902,966 ± 653,236 \\
PacBio CapTrap & All Reads & 2,133,211 ± 340,192 \\
PacBio CapTrap & Poly(A) usable reads & 1,906,174 ± 323,949 \\
PacBio cDNA & All Reads & 2,474,974 ± 698,776 \\
PacBio cDNA & Poly(A) usable reads & 2,373,264 ± 703,259 \\
Quantseq3 & All Reads & 105,423,691 ± 16,713,839 \\
Quantseq3 & Poly(A) usable reads & 26,515,255 ± 3,443,010 \\
\bottomrule
\end{tabular}
12 changes: 12 additions & 0 deletions reports/tables/supp_table2.txt
@@ -0,0 +1,12 @@
\begin{tabular}{lrrrrrrrr}
\toprule
data source & count & mean & std & min & 25\% & 50\% & 75\% & max \\
\midrule
ONT CapTrap & 38118062 & 738 & 418 & 80 & 465 & 587 & 897 & 8327 \\
ONT R2C2 & 1984417 & 2701 & 769 & 80 & 2174 & 2536 & 3070 & 9001 \\
ONT cDNA & 27433659 & 912 & 647 & 80 & 482 & 682 & 1179 & 11007 \\
ONT dRNA & 2730704 & 1200 & 966 & 81 & 551 & 884 & 1530 & 16744 \\
PacBio CapTrap & 5773371 & 1042 & 596 & 80 & 539 & 894 & 1390 & 7518 \\
PacBio cDNA & 7147240 & 2525 & 1482 & 80 & 1463 & 2241 & 3329 & 20313 \\
\bottomrule
\end{tabular}
12 changes: 12 additions & 0 deletions reports/tables/supp_table3.txt
@@ -0,0 +1,12 @@
\begin{tabular}{rlr}
\toprule
Unnamed: 0 & Data Source & median\_tail\_length \\
\midrule
0 & PacBio cDNA & 35 \\
1 & PacBio CapTrap & 16 \\
2 & ONT cDNA & 17 \\
3 & ONT CapTrap & 12 \\
4 & ONT R2C2 & 29 \\
5 & ONT dRNA & 5 \\
\bottomrule
\end{tabular}
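
The `Unnamed: 0` column in this table most likely comes from the row index that this commit writes into `reports/tables/median_tail_lengths.csv` (its new header starts with an empty field). pandas assigns the name `Unnamed: 0` to a column whose header is empty when it reads such a file back. A minimal standard-library sketch, not the repository's code, of dropping that index column before exporting the table:

```python
import csv
import io

# First rows of the updated median_tail_lengths.csv; the leading header
# field is empty because an unnamed row index was written out.
raw = """,Data Source,median_tail_length
0,PacBio cDNA,35
1,PacBio CapTrap,16
"""

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# Drop the leading index column when its header is empty.
if header[0] == "":
    header = header[1:]
    body = [row[1:] for row in body]

print(header)
print(body)
```

With pandas, writing the CSV without the index or reading it with the first column as the index would have the same effect.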
13 changes: 13 additions & 0 deletions reports/tables/supp_table4.txt
@@ -0,0 +1,13 @@
\begin{tabular}{rrrl}
\toprule
three\_prime\_utr & intron & exon & Data Source \\
\midrule
0.976678 & 0.017077 & 0.006245 & cDNA\textbackslash nPacBio \\
0.955889 & 0.025308 & 0.018804 & CapTrap\textbackslash nPacBio \\
0.949037 & 0.028951 & 0.022012 & cDNA\textbackslash nONT \\
0.891419 & 0.065609 & 0.042971 & CapTrap\textbackslash nONT \\
0.982239 & 0.009055 & 0.008706 & R2C2\textbackslash nONT \\
0.987273 & 0.007879 & 0.004848 & dRNA\textbackslash nONT \\
0.876626 & 0.096488 & 0.026885 & Quantseq\textbackslash nIllumina \\
\bottomrule
\end{tabular}
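
The `\textbackslash n` in the `Data Source` column appears to be an escaped line break: the labels in `reports/tables/cluster_annotation_counts.csv` contain a newline inside the quoted field (presumably for plot tick labels), and it survived into the LaTeX export. The header of `supp_table5.txt` shows the same pattern. A minimal sketch, assuming the labels should read as a single line in the table, of normalizing them before export:

```python
import csv
import io

# Two rows in the shape of cluster_annotation_counts.csv (values rounded),
# with a line break inside the quoted "Data Source" field.
raw = '''three_prime_utr,intron,exon,Data Source
0.976678,0.017077,0.006245,"cDNA
PacBio"
0.987273,0.007879,0.004848,"dRNA
ONT"
'''

rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    # Collapse any whitespace, including the embedded newline, to one space.
    row["Data Source"] = " ".join(row["Data Source"].split())

labels = [row["Data Source"] for row in rows]
print(labels)
```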
13 changes: 13 additions & 0 deletions reports/tables/supp_table5.txt
@@ -0,0 +1,13 @@
\begin{tabular}{lr}
\toprule
data source & \% of read in proximity\textbackslash nof annotated TES \\
\midrule
cDNA PacBio & 0.346173 \\
CapTrap PacBio & 0.457267 \\
cDNA ONT & 0.273562 \\
CapTrap ONT & 0.336798 \\
R2C2 ONT & 0.305878 \\
dRNA ONT & 0.342175 \\
Quantseq3 & 0.345066 \\
\bottomrule
\end{tabular}
13 changes: 13 additions & 0 deletions reports/tables/supp_table6.txt
@@ -0,0 +1,13 @@
\begin{tabular}{lr}
\toprule
Data Source & threshold \\
\midrule
cDNA PacBio & 6 \\
CapTrap PacBio & 5 \\
cDNA ONT & 9 \\
CapTrap ONT & 5 \\
R2C2 ONT & 6 \\
dRNA ONT & 6 \\
Quantseq Illumina & 12 \\
\bottomrule
\end{tabular}
12 changes: 12 additions & 0 deletions reports/tables/supp_table7.txt
@@ -0,0 +1,12 @@
\begin{tabular}{rrl}
\toprule
Recall & Precision & Data Source \\
\midrule
0.852165 & 0.682092 & cDNA PacBio \\
0.489438 & 0.738593 & CapTrap PacBio \\
0.406448 & 0.833830 & cDNA ONT \\
0.355482 & 0.746878 & CapTrap ONT \\
0.513217 & 0.698884 & R2C2 ONT \\
0.813993 & 0.786981 & dRNA ONT \\
\bottomrule
\end{tabular}
12 changes: 12 additions & 0 deletions reports/tables/supp_table8.txt
@@ -0,0 +1,12 @@
\begin{tabular}{rrl}
\toprule
Uncorrected & Corrected & Data Source \\
\midrule
0.598391 & 0.918254 & cDNA ONT \\
0.686895 & 0.875743 & CapTrap ONT \\
0.776502 & 0.962058 & R2C2 ONT \\
0.552042 & 0.908499 & dRNA ONT \\
0.609074 & 0.960579 & cDNA PacBio \\
0.767461 & 0.920187 & CapTrap PacBio \\
\bottomrule
\end{tabular}
12 changes: 12 additions & 0 deletions reports/tables/supp_table9.txt
@@ -0,0 +1,12 @@
\begin{tabular}{rrl}
\toprule
Uncorrected & Corrected & Data Source \\
\midrule
0.664829 & 0.906510 & cDNA ONT \\
0.633491 & 0.928476 & CapTrap ONT \\
0.716032 & 0.959905 & R2C2 ONT \\
0.747163 & 0.979217 & dRNA ONT \\
0.735696 & 0.971293 & cDNA PacBio \\
0.703576 & 0.964382 & CapTrap PacBio \\
\bottomrule
\end{tabular}
7 changes: 7 additions & 0 deletions reports/tables/tes_correction.csv
@@ -0,0 +1,7 @@
Uncorrected,Corrected,Data Source
0.5983907354489397,0.9182544041231887,cDNA ONT
0.6868952898450279,0.8757425177061915,CapTrap ONT
0.776502470907062,0.9620583787746873,R2C2 ONT
0.5520416818126742,0.908498525276082,dRNA ONT
0.6090743602226785,0.960579379983376,cDNA PacBio
0.7674612771347958,0.9201874205844981,CapTrap PacBio
8 changes: 8 additions & 0 deletions reports/tables/threshold_min_read_replication.csv
@@ -0,0 +1,8 @@
Data Source,threshold
cDNA PacBio,6
CapTrap PacBio,5
cDNA ONT,9
CapTrap ONT,5
R2C2 ONT,6
dRNA ONT,6
Quantseq Illumina,12
7 changes: 7 additions & 0 deletions reports/tables/tss_correction.csv
@@ -0,0 +1,7 @@
Uncorrected,Corrected,Data Source
0.6648292976387427,0.9065100756199569,cDNA ONT
0.6334913550127989,0.9284755540324423,CapTrap ONT
0.7160316701206227,0.9599046877147963,R2C2 ONT
0.7471626479260067,0.9792166815282255,dRNA ONT
0.7356962936308025,0.971292676169073,cDNA PacBio
0.7035757471065814,0.9643821473951716,CapTrap PacBio
3 changes: 2 additions & 1 deletion setup.py
@@ -8,6 +8,7 @@
requirements = [
'setuptools',
'tqdm',
+'numpy<=1.23',
'click',
'pandas',
'pybigwig',
@@ -26,7 +27,7 @@

setup(
name='lapa',
-version='0.0.4',
+version='0.0.5',

author="M. Hasan Çelik",
author_email='muhammedhasancelik@gmail.com',
12 changes: 12 additions & 0 deletions slurm/config.yaml
@@ -0,0 +1,12 @@
jobs: 16
latency-wait: 60
restart-times: 0
keep-going: True
cluster:
sbatch
--account=seyedam_lab
--cluster-constraint=fastscratch3
--cpus-per-task={threads}
--mem={resources.mem_gb}G
--job-name=smk-{rule}
--output=logs/slurm-{rule}_%j.out
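
This file has the shape of a Snakemake profile, so it is presumably selected with something like `snakemake --profile slurm`. Snakemake fills the `{threads}`, `{resources.mem_gb}` and `{rule}` placeholders for each job before submitting it, while `%j` is left for Slurm, which substitutes the job ID in the log file name. The sketch below only mimics that substitution with `str.format` and is not Snakemake's implementation; the `threads=1` value is an assumption, and `mem_gb=4` follows the value visible in `workflow/benchmark/Snakefile`.

```python
from types import SimpleNamespace

# Template fragments copied from slurm/config.yaml.
template = " ".join([
    "sbatch",
    "--cpus-per-task={threads}",
    "--mem={resources.mem_gb}G",
    "--job-name=smk-{rule}",
    "--output=logs/slurm-{rule}_%j.out",
])

# Hypothetical job: threads=1 is assumed, mem_gb=4 as in the benchmark rule.
command = template.format(
    threads=1,
    resources=SimpleNamespace(mem_gb=4),
    rule="benchmark_lapa",
)
print(command)
```

The `logs/` output path ties in with the `logs/*` entry this commit adds to `.gitignore`.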
10 changes: 10 additions & 0 deletions workflow/benchmark/Snakefile
@@ -8,6 +8,14 @@ rule benchmark_lapa:
platform=['ONT'], counting=['end'],
library_prep=config['long_read']['ONT'])
],
+lapa_tail_dir = [
+*expand(config['lapa']['lapa_dir'],
+platform=['PacBio'], counting=['tail'],
+library_prep=config['long_read']['PacBio']),
+*expand(config['lapa']['lapa_dir'],
+platform=['ONT'], counting=['tail'],
+library_prep=['R2C2'])
+],
quantseq = expand(config['lapa']['lapa_dir'],
platform='Illumina', counting='end',
library_prep='quantseq')
@@ -16,6 +24,8 @@ rule benchmark_lapa:
mem_gb = 4
output:
pr_curve_plot = 'reports/figures/benchmark_pr_curve.png',
+pr_table = 'reports/tables/benchmark_pr_curve.csv',
+pr_curve_counting = 'reports/figures/benchmark_pr_curve_counting.png',
heatmap = 'reports/figures/overlap_heatmap.png'
notebook:
'./benchmark_lapa.ipynb'
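
The new `lapa_tail_dir` input relies on Snakemake's `expand`, which by default fills a path pattern with every combination of the given wildcard values. The sketch below imitates that behavior in plain Python so the resulting input list is easy to see. The path pattern and the PacBio library preps are hypothetical, since the actual values of `config['lapa']['lapa_dir']` and `config['long_read']['PacBio']` do not appear in this diff, and `expand_sketch` is an illustrative helper rather than Snakemake's function.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    keys = list(wildcards)
    # Accept a bare string as a single value, as in platform='Illumina'.
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*values)]


# Hypothetical pattern standing in for config['lapa']['lapa_dir'].
pattern = "lapa/{library_prep}_{platform}_{counting}/"

lapa_tail_dir = [
    *expand_sketch(pattern, platform=["PacBio"], counting=["tail"],
                   library_prep=["cDNA", "CapTrap"]),  # assumed PacBio preps
    *expand_sketch(pattern, platform=["ONT"], counting=["tail"],
                   library_prep=["R2C2"]),
]
print(lapa_tail_dir)
```

Under these assumptions the rule would receive one tail-counting directory per PacBio library prep plus one for ONT R2C2.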
