
To reassign small contigs after manual grouping by small contig removal, rescue & optimization


Youpu-Chen/small-contig-reassignment

 
 


Background

Complete phasing with Hi-C data is sometimes difficult when genetic divergence between haplotypes is low. For example, ALLHiC may group homologous contigs into one large cluster (Fig. 1a). Although these contigs can be adjusted manually, placing the small contigs is tedious and time-consuming (Fig. 1b).

Figure 1

Aim & Strategy

Here is a simple strategy for reassigning these small contigs after manual grouping (see Fig. 2a): (1) remove small contigs using remove_small_contigs.py, provided in this repo; (2) run ALLHiC_rescue to rescue (reassign) the removed contigs; (3) run allhic optimize to determine ordering and orientation.

This strategy may also have potential for general global refinement.
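
Step (1) of the strategy can be illustrated with a minimal Python sketch. It is not the code of remove_small_contigs.py. It assumes the usual .assembly layout (header lines of the form ">name index length", followed by one line of signed contig indices per group) and a clusters file with the header "#Group nContigs Contigs", tab-separated. The function names, the "groupN" naming, and the choice to keep contigs whose length is greater than or equal to the cutoff are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def parse_assembly(text: str) -> Tuple[Dict[int, Tuple[str, int]], List[List[int]]]:
    """Parse .assembly text into {index: (name, length)} and groups of signed indices."""
    contigs: Dict[int, Tuple[str, int]] = {}
    groups: List[List[int]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: ">name index length"
            name, index, length = line[1:].split()
            contigs[int(index)] = (name, int(length))
        else:
            # Group line: signed indices; the sign encodes orientation
            groups.append([int(tok) for tok in line.split()])
    return contigs, groups


def drop_small_contigs(
    contigs: Dict[int, Tuple[str, int]],
    groups: List[List[int]],
    len_cutoff: int = 100_000,
) -> List[List[str]]:
    """Keep contig names with length >= len_cutoff (assumed rule); drop emptied groups."""
    kept: List[List[str]] = []
    for group in groups:
        names = [contigs[abs(i)][0] for i in group if contigs[abs(i)][1] >= len_cutoff]
        if names:
            kept.append(names)
    return kept


def format_clusters(kept: List[List[str]]) -> str:
    """Render the kept groups as tab-separated clusters text (assumed layout)."""
    lines = ["#Group\tnContigs\tContigs"]
    for n, names in enumerate(kept, start=1):
        lines.append(f"group{n}\t{len(names)}\t{' '.join(names)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy assembly: ctgB (40 kbp) falls below the 100 kbp cutoff
    example = ">ctgA 1 250000\n>ctgB 2 40000\n>ctgC 3 120000\n1 -2\n3\n"
    contigs, groups = parse_assembly(example)
    print(format_clusters(drop_small_contigs(contigs, groups)), end="")
```

In this sketch the removed contigs are simply absent from the clusters text, which leaves them free to be placed again in step (2).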

Figure 2

Result

As shown in Fig. 1c, the remaining manual work is greatly reduced after running this pipeline.

Usage of remove_small_contigs.py

Help message:

$ ./remove_small_contigs.py -h
usage: remove_small_contigs.py [-h] [--fasta FASTA] [--counts COUNTS]
                               [--len_cutoff LEN_CUTOFF]
                               assembly

positional arguments:
  assembly              *.review.assembly (output file of juicebox manual
                        grouping), used to generate new prunning.clusters.txt

optional arguments:
  -h, --help            show this help message and exit
  --fasta FASTA         input fasta file of contigs, this parameter will
                        remove contigs not in .review.assembly, optional
  --counts COUNTS       input prunning.counts_RE.txt, this parameter will
                        remove contigs not in .review.assembly, optional
  --len_cutoff LEN_CUTOFF
                        length cutoff, default: 100 Kbp

Command:

$ ./remove_small_contigs.py groups.manual.review.assembly --fasta seq.fasta --counts prunning.counts_GATC.txt
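
The effect of the optional --fasta and --counts inputs, which the help message describes as removing contigs not in .review.assembly, can be approximated with the sketch below. It is an illustration under stated assumptions, not the script's own code: the function names are hypothetical, the counts file is assumed to be whitespace-delimited with the contig name in the first column and "#"-prefixed header lines, and the set of names to keep is supplied by the caller.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only FASTA records whose ID (first word after '>') is in keep."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Switch output on or off at each record header
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line


def filter_counts(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield header lines plus rows whose first column is a kept contig name."""
    for line in lines:
        fields = line.split()
        if line.startswith("#") or (fields and fields[0] in keep):
            yield line


if __name__ == "__main__":
    # Hypothetical toy inputs; real files would be read line by line from disk
    keep = {"ctgA", "ctgC"}
    fasta = [">ctgA\n", "ACGT\n", ">ctgB\n", "GGCC\n", ">ctgC\n", "TTAA\n"]
    counts = ["#Contig\tRECounts\tLength\n", "ctgA\t10\t250000\n", "ctgB\t2\t40000\n"]
    print("".join(filter_fasta(fasta, keep)), end="")
    print("".join(filter_counts(counts, keep)), end="")
```

Both functions are generators that stream line by line, so large assemblies do not need to be held in memory.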
