Derep Seqs

Dereplicate looooooong sequences!

If you want to get rid of duplicate long sequences (i.e. contigs that are exact substrings of some other contigs), derep_seqs is the tool for you!

Install

Download the source code (either with git clone or by downloading a release), cd into the source directory, and then use make to build it.

git clone https://github.com/mooreryan/derep_seqs.git
cd derep_seqs
make

This will install derep_seqs to the bin directory in the source directory. You can now move derep_seqs and sort_fasta to somewhere on your path if you'd like.

Usage

derep_seqs <num worker threads> <seqs.fasta> > seqs.derep.fa

Example

The fasta file must be sorted by increasing sequence length. The program sort_fasta (included in the bin directory) will do this for you.

$ bin/derep_seqs 10 <(bin/sort_fasta contigs.fasta) > contigs.derep.fa

That's it!

Error codes

0: Success
1: Argument error
2: Couldn't open a file
3: Error creating thread
4: Error joining thread

Versions

v0.1.0: First release
v0.2.0: Sort on decreasing seq length. Use greedy algorithm. Prefilter. Use hash3 instead of SSEF.
v0.3.0: Use hashing for prefiltering.
v0.4.0: Don't store hash vals...uses way less memory :) but it's slow again :(
v0.5.0: Use pthreads for multithreading!
v0.6.0: Make prefilter length a tunable option
v0.7.0: Use Rabin-Karp search for filtering

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Derep Seqs

Install

Usage

Example

Error codes

Versions

Files

README.md

Latest commit

History

README.md

File metadata and controls

Derep Seqs

Install

Usage

Example

Error codes

Versions