USAGE.md

  • Run bash wrapper script

    pavfinder genome [`find_sv_genome.py` parameters]
    pavfinder fusion [`find_sv_transcriptome.py` parameters]
    pavfinder splice [`map_splice.py` parameters]
    
  • Run PAVFinder (PV) to detect genomic structural variants (translocations, inversions, duplications, insertions, deletions, etc.)

    find_sv_genome.py <contigs_to_genome.bam> <contigs_fasta> <genome_fasta> <outdir> --r2c <reads_to_contigs.bam>
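
Because the command takes four positional arguments plus `--r2c`, it is easy to swap two paths. The sketch below is a hypothetical helper (it is not part of PAVFinder) that checks that the input files are readable and then calls `find_sv_genome.py` with the argument order shown above. The function name and the `DRY_RUN` variable are assumptions made for illustration, and creating the output directory beforehand is a precaution rather than a documented requirement.

```shell
# Hypothetical wrapper, not shipped with PAVFinder.
# Usage: run_find_sv_genome <c2g.bam> <contigs.fa> <genome.fa> <outdir> <r2c.bam>
# Set DRY_RUN=1 to print the command instead of executing it.
run_find_sv_genome() {
    if [ "$#" -ne 5 ]; then
        echo "usage: run_find_sv_genome <c2g.bam> <contigs.fa> <genome.fa> <outdir> <r2c.bam>" >&2
        return 2
    fi
    c2g_bam=$1
    contigs_fa=$2
    genome_fa=$3
    outdir=$4
    r2c_bam=$5

    # Fail early if any input file is missing or unreadable.
    for f in "$c2g_bam" "$contigs_fa" "$genome_fa" "$r2c_bam"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            return 1
        fi
    done

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo find_sv_genome.py "$c2g_bam" "$contigs_fa" "$genome_fa" "$outdir" --r2c "$r2c_bam"
        return 0
    fi

    # Creating the output directory is a precaution, not a documented requirement.
    mkdir -p "$outdir" &&
        find_sv_genome.py "$c2g_bam" "$contigs_fa" "$genome_fa" "$outdir" --r2c "$r2c_bam"
}
```

Running it once with `DRY_RUN=1` prints the exact command line, which can be reviewed before starting the real analysis.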
    
  • Generate the transcripts FASTA needed for detecting transcriptome structural variants

    extract_transcript_sequence.py <tabix-indexed gtf> <output transcripts fasta> <reference genome> --index --only_longest
    
    • The GTF file must be compressed with bgzip and indexed with Tabix
    • The GTF and the reference genome must contain the same chromosomes (with matching names)
    • Add --index to also create a BWA index on the output FASTA, which is needed to generate the contigs-to-transcripts (c2t) BAM file
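
The requirement that the GTF and the reference genome share the same chromosomes can be checked before running `extract_transcript_sequence.py`. The following is a minimal sketch using only standard tools. The function names are hypothetical, and it assumes that "same chromosomes" means identical sequence names (for example `chr1` in both files rather than `chr1` in one and `1` in the other).

```shell
# Hypothetical helpers, not part of PAVFinder.

# List the distinct chromosome names (column 1) in a compressed GTF.
# bgzip output is gzip-compatible, so gzip -dc can read it.
gtf_chroms() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && NF { print $1 }' | sort -u
}

# List the distinct sequence names in a FASTA (header text up to the first space).
fasta_chroms() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print every chromosome name used in the GTF that has no sequence in the FASTA.
# Empty output means every GTF chromosome was found.
chroms_missing_from_fasta() {
    tmpdir=$(mktemp -d) || return 1
    gtf_chroms "$1" > "$tmpdir/gtf.txt"
    fasta_chroms "$2" > "$tmpdir/fasta.txt"
    comm -23 "$tmpdir/gtf.txt" "$tmpdir/fasta.txt"
    rm -rf "$tmpdir"
}
```

For example, `chroms_missing_from_fasta annotation.gtf.gz genome.fa` (file names are placeholders) prints nothing when the names agree. If the GTF is not yet prepared, the usual route is to sort it by chromosome and start position, compress it with `bgzip`, and index it with `tabix -p gff`.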
  • Run PV to detect transcriptome structural variants (fusions, read-throughs, ITDs, PTDs, InDels)

    find_sv_transcriptome.py --gbam <contigs_to_genome_bam> --tbam <contigs_to_transcripts_bam> --transcripts_fasta <indexed_transcripts_fasta> --genome_index <GMAP index genome directory and name> --r2c <reads_to_contigs_bam> <contigs_fasta> <gtf> <genome_fasta> <outdir>
    
  • Run PV to detect novel splice variants (exon_skipping, novel_exon, novel_intron, novel_donor, novel_acceptor, retained_intron)

    map_splice.py <contigs_to_genome_bam> <contigs_fasta> <gtf> <genome_fasta> <outdir> --r2c <reads_to_contigs_bam> [--suppl_annot supplemental.gtf.gz] [--genome_bam genome.bam]
    
  • Run full (assembly + analysis) TAP or TAP2 in targeted mode

    tap <sample> <outdir> --bf <target_genes.bf> --fq_list <file_listing_FASTQ_pairs> --k <space-delimited k values> --readlen <read_length> --nprocs <number_of_processes> --params <parameters_file>
    
    tap2 <sample> <outdir> --bf <target_genes.bf> --fq_list <file_listing_FASTQ_pairs> --readlen <read_length> --nprocs <number_of_processes> --params <parameters_file>
    
  • Run full (assembly + analysis) TAP for entire transcriptome

    tap <sample> <outdir> --fq_list <file_listing_FASTQ_pairs> --k <space-delimited k values> --readlen <read_length> --nprocs <number_of_processes> --params <parameters_file>
    
  • Run TAP for just de novo assembly

    tap <sample> <outdir> --fq_list <file_listing_FASTQ_pairs> --k <space-delimited k values> --readlen <read_length> --nprocs <number_of_processes> --only_assembly
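
Since `--k` takes several values separated by spaces, each k has to reach `tap` as its own argument. The sketch below shows one way to assemble the assembly-only command from shell variables. Every value in it (sample name, paths, k values, read length, process count) is an illustrative assumption, and `tap_cmd` is a hypothetical helper that only prints the command.

```shell
# Illustrative values only; replace them with your own.
sample=sampleA
outdir=tap_out
fq_list=fastq_pairs.txt
k_values="32 62 92"   # several k values, separated by spaces
readlen=100
nprocs=8

# Print the tap command so it can be inspected before running it.
# $k_values is deliberately left unquoted so that each k value
# becomes a separate argument after --k.
tap_cmd() {
    echo tap "$sample" "$outdir" --fq_list "$fq_list" --k $k_values \
        --readlen "$readlen" --nprocs "$nprocs" --only_assembly
}
```

Quoting the list as `--k "$k_values"` would pass a single argument containing spaces, which is not what the usage line above describes.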
    
  • Run fusion-bloom for fusion calling

    source <fusion-bloom.profile>
    fusion-bloom profile=<fusion-bloom.profile> left=<fastq.gz> right=<fastq.gz> readlen=<read_length> outdir=<outdir> name=<prefix>
    

    Example <fusion-bloom.profile>:

    export NUM_THREADS=12
    export SAMTOOLS_SORT_MEM=20G
    export GENOME=hg19
    export GMAPDB=/path/to/gmapdb_sarray/hg19
    export GTF=/path/to/gencode.v26.annotation.transcripts.sorted.gtf.gz
    export TRANSCRIPTS_FASTA=/path/to/gencode.v26.annotation.transcripts.sorted.gtf.fa
    export GENOME_FASTA=/path/to/hg38.fa
    export RNABLOOM_PARAMS='-fpr 0.005 -chimera -extend -tiplength 5'
    export PAVFINDER_PARAMS='--only_fusions --include_non_exon_bound_fusion --min_support 2'
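
After sourcing a profile, it can help to confirm that every variable from the example above is actually set before launching `fusion-bloom`. The sketch below is a hypothetical check. The variable names are copied from the example profile, but whether `fusion-bloom` strictly requires all of them is an assumption.

```shell
# Variable names copied from the example profile; the helper itself is hypothetical.
PROFILE_VARS="NUM_THREADS SAMTOOLS_SORT_MEM GENOME GMAPDB GTF TRANSCRIPTS_FASTA GENOME_FASTA RNABLOOM_PARAMS PAVFINDER_PARAMS"

# Print the name of every profile variable that is unset or empty.
# No output means all listed variables have a value.
missing_profile_vars() {
    for name in $PROFILE_VARS; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

A typical use would be to source the profile and then run `missing_profile_vars`, stopping if it prints anything.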