
Intermediates and Filtering

Keiran Raine edited this page Jan 17, 2022 · 5 revisions

After the initial group calling, the BRASS flow progressively filters the groups down using several methods.

Some of the intermediate files are available in *.brass.intermediates.tar.gz, in the following order:

  1. .groups.gz - raw brass group output
  2. .groups.filtered.bedpe
  3. .groups.clean.bedpe

groups.gz

This is the raw output from the grouping algorithm:

  • cols 1-4: LOW chr, strand, start, stop
  • cols 5-8: HIGH chr, strand, start, stop
  • N columns of read counts, one per sample (see #NSAMPLES in header), ordered as per the header
  • Repeat hit (. unless repeat filtering was used)
  • N columns of read names, one per sample (see #NSAMPLES in header), ordered as per the header
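
The column layout above can be sketched as a small parser. This is a minimal illustration, not part of BRASS: it assumes fields are tab-separated and that the caller has already read the sample count from the #NSAMPLES header line. The names BrassGroup and parse_group_line are hypothetical, and read-name columns are kept as raw strings because their internal separator is not described here.

```python
from typing import List, NamedTuple, Tuple


class BrassGroup(NamedTuple):
    """One record from groups.gz (hypothetical container, not a BRASS type)."""
    low: Tuple[str, str, int, int]   # chr, strand, start, stop
    high: Tuple[str, str, int, int]  # chr, strand, start, stop
    read_counts: List[int]           # one count per sample, header order
    repeat: str                      # "." unless repeat filtering was used
    read_names: List[str]            # one raw string per sample, header order


def parse_group_line(line: str, n_samples: int) -> BrassGroup:
    """Split one data line into the fields described above.

    Assumes tab-separated fields; n_samples comes from #NSAMPLES.
    """
    fields = line.rstrip("\n").split("\t")
    expected = 8 + n_samples + 1 + n_samples
    if len(fields) < expected:
        raise ValueError(f"expected at least {expected} fields, got {len(fields)}")
    low = (fields[0], fields[1], int(fields[2]), int(fields[3]))
    high = (fields[4], fields[5], int(fields[6]), int(fields[7]))
    counts = [int(x) for x in fields[8:8 + n_samples]]
    repeat = fields[8 + n_samples]
    names = fields[9 + n_samples:9 + 2 * n_samples]
    return BrassGroup(low, high, counts, repeat, names)
```

Header lines (assumed here to start with #) would need to be skipped before calling parse_group_line on each remaining line.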

Intermediate files

The *.brass.intermediates.tar.gz contains files that are useful for debugging and deep investigation.

Content is shown as listed by tar ztf *.brass.intermediates.tar.gz, with the following placeholders:

  • ${T} = Tumour
  • ${N} = Normal
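
For scripted inspection, the same listing can be produced with Python's standard tarfile module. This is a minimal sketch; the function name list_intermediates is a hypothetical helper, and the archive path is whatever your run produced.

```python
import tarfile
from typing import List


def list_intermediates(archive_path: str) -> List[str]:
    """Return member names of a gzip-compressed tar archive.

    Equivalent in spirit to `tar ztf <archive>`; nothing is extracted.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Member names come back in archive order, so they can be compared directly against the file listing on this page.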

Many files are extended versions of the window GC reference input bed file (tagged as WGC-fmt in the table below). The format is as follows:

| Column | Description |
| --- | --- |
| 1 | Chromosome/contig |
| 2 | Start (0-based) |
| 3 | End (1-based) |
| 4 | b.p. of non-N sequence in window |
| 5 | Fraction of bases GC (gc_bp / non_n_bp); NA when column 4 is 0 |
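
A WGC-fmt line can be read along these lines. This is a sketch under stated assumptions: fields are tab-separated, and any columns after the fifth (for example the read counts added in the ngscn files) are returned untouched as strings. WgcWindow and parse_wgc_line are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class WgcWindow(NamedTuple):
    """One window in WGC-fmt (hypothetical container)."""
    chrom: str
    start: int                    # 0-based
    end: int                      # 1-based
    non_n_bp: int                 # bases in the window that are not N
    gc_fraction: Optional[float]  # None when the file holds NA
    extra: List[str]              # any additional columns, unparsed


def parse_wgc_line(line: str) -> WgcWindow:
    """Parse one tab-separated WGC-fmt line, mapping NA to None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    gc = None if fields[4] == "NA" else float(fields[4])
    return WgcWindow(fields[0], int(fields[1]), int(fields[2]),
                     int(fields[3]), gc, fields[5:])
```

Because the start is 0-based and the end 1-based, end - start gives the window length in bases.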

File listing, ordered by creation:

| File | Description |
| --- | --- |
| intermediates/samp_stats.txt | Purity/ploidy and male/female status; inputs provided at execution |
| intermediates/${T}_vs_${N}.groups.gz | Primary grouping with normal panel filtering |
| intermediates/${T}_vs_${N}.groups.filtered.bedpe | Groups passing basic blat and read support filtering |
| intermediates/${T}.insert_size_distr | Corrected insert size distribution, using samtools view -f 66 -F 3868 as the read filter, based on chr5/5 |
| intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.gz | BedGraph version of absolute copy number |
| intermediates/${T}_vs_${N}.ngscn.segments.abs_cn.bg.gz | Segmented absolute copy number |
| intermediates/${N}.ngscn.bed.gz | Normal: WGC-fmt + count of properly paired reads |
| intermediates/${N}.ngscn.fb_reads.bed.gz | Normal: as intermediates/${N}.ngscn.bed.gz + reads on same strand and contig (fold-back) |
| intermediates/${T}.ngscn.bed.gz | Tumour: WGC-fmt + count of properly paired reads |
| intermediates/${T}.ngscn.fb_reads.bed.gz | Tumour: as intermediates/${T}.ngscn.bed.gz + reads on same strand and contig (fold-back) |
| intermediates/${T}_vs_${N}.is_fb_artefact.txt | List of event IDs considered to be fold-back artefacts (metropolis_hastings_inversions.R) |
| intermediates/${T}_vs_${N}.r2 | Filters small deletions and fold-back artefacts (filter_small_deletions_and_fb_artefacts.R) |
| intermediates/${T}_vs_${N}.r3 | Identifies groups that should be merged |
| intermediates/${T}_vs_${N}.r4 | Corrects breakpoints using clipped reads (get_abs_bkpts_from_clipped_reads.pl) |
| intermediates/${T}_vs_${N}.r5[.scores] | Filters events due to microbial or viral sequences (filter_with_microbes_and_remapping.pl) |
| intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.rg_cns.gz | Combined segmentation of tumour/normal, taking into account cent/telo and purity data (get_rg_cns.R) |
| intermediates/${T}_vs_${N}.r6 | Adds a flag where an event hits a copy-number change as defined in intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.rg_cns.gz |
| intermediates/${T}_vs_${N}.cn_filtered | Filtered version of r5, downstream input |
| intermediates/${T}_vs_${N}.groups.clean.bedpe | Annotates translocations, occurrences, copy-number changepoints, L v H range blat scores |
| intermediates/${T}_vs_${N}.inversions.pdf | Plot showing inversions, insert size distribution and bad groupings. Intended for debugging. |
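
The samtools filter quoted for the insert size distribution (-f 66 -F 3868) can be unpacked using the standard SAM flag bits. The sketch below is illustrative only and does not reproduce BRASS code; decode_flag and passes_filter are hypothetical helper names. It relies on the documented samtools semantics: -f requires all listed bits to be set, -F rejects a read if any listed bit is set.

```python
# Standard SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(value: int) -> list:
    """Return the names of all flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def passes_filter(flag: int, require: int = 66, exclude: int = 3868) -> bool:
    """Mimic `samtools view -f require -F exclude` for a single read flag."""
    return (flag & require) == require and (flag & exclude) == 0


print(decode_flag(66))    # bits required by -f 66
print(decode_flag(3868))  # bits rejected by -F 3868
```

Decoded this way, the filter keeps first-in-pair reads from proper pairs that are mapped on the forward strand with a mapped mate, and drops secondary, supplementary, QC-failed and duplicate alignments.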