use seqkit for nonredundant prodigal #34

tijeco · 2021-09-26T23:17:17Z

The current protocol uses pandas, which is pretty memory intensive and probably won't scale well unless we swap to some out-of-core alternative to pandas. I think it may be best to just use seqkit. Currently the expanded nonredundant file has the columns ,pephash,sample,contig,start,stop,strand,allStandardAA,seq. This can be handled using seqkit fx2tab with --seq-hash, then feeding that into seqkit tab2fx with the hash as the header. What we have is fine for now, but this will definitely be needed when scaling. Honestly, at that point we should probably also use seqkit to split the nr data into max_threads files for parallelization (though that is an entirely different issue).
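
To make the idea concrete, here is a rough sketch in plain Python of the streaming, hash-keyed dedup described above. It is not seqkit and not the current pipeline code: the function names (`read_fasta`, `nonredundant`, `write_fasta`) are made up for illustration, MD5 is assumed as the hash (which, as far as I know, is what seqkit's seq-hash prints), and the per-record metadata columns (sample, contig, start, stop, strand) are dropped here for brevity.

```python
import hashlib
from typing import IO, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines, one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def nonredundant(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (pephash, seq) once per distinct sequence, in first-seen order."""
    seen = set()  # holds only the hashes, not the sequences or a full table
    for _header, seq in records:
        # Assumption: MD5 of the raw sequence string stands in for pephash.
        pephash = hashlib.md5(seq.encode("ascii")).hexdigest()
        if pephash not in seen:
            seen.add(pephash)
            yield pephash, seq


def write_fasta(records: Iterable[Tuple[str, str]], handle: IO[str]) -> None:
    """Write records as FASTA with the hash as the header."""
    for pephash, seq in records:
        handle.write(f">{pephash}\n{seq}\n")
```

Usage would look something like `write_fasta(nonredundant(read_fasta(open("proteins.faa"))), open("nr.faa", "w"))`, where both file names are placeholders. The point is that only the set of hashes stays in memory, rather than a whole DataFrame.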

tijeco added the enhancement (New feature or request) label on Oct 7, 2021