Releases: marbl/MashMap

v3.1.3

04 Jan 16:40
1a07d0e

Previously, the -f one-to-one plane-sweep filter was applied to all mappings at once. When users mapped multiple query genomes to one or more target sequences with the --skipPrefix # flag, the filter treated all query sequences as part of the same genome/group.

This patch applies the one-to-one plane-sweep filter to each pair of query and reference groups independently, ensuring that -n mappings are retained for each pair. A "group" is the set of sequences that share the same prefix up to the last occurrence of the character c, where --skipPrefix c is specified.
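The grouping rule can be illustrated with a short Python sketch. This is not MashMap's code: the function names, the sample sequence names, and the handling of names that lack the delimiter are all assumptions made for illustration.

```python
from collections import defaultdict

def group_key(name: str, delim: str) -> str:
    """Return the prefix of `name` up to and including the last `delim`.

    ASSUMPTION: a name without the delimiter forms its own group; the
    release note does not say how such names are handled.
    """
    idx = name.rfind(delim)
    return name if idx == -1 else name[: idx + 1]

def group_sequences(names, delim="#"):
    """Bucket sequence names by their prefix group."""
    groups = defaultdict(list)
    for n in names:
        groups[group_key(n, delim)].append(n)
    return dict(groups)

# Hypothetical sequence names in a sample#haplotype#contig layout.
names = ["sampleA#1#chr1", "sampleA#1#chr2", "sampleB#1#chr1"]
groups = group_sequences(names)
for key, members in groups.items():
    print(key, members)
```

Under this reading, the one-to-one filter would run once per (query group, reference group) pair rather than once over all mappings.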

MashMap v3.1.2

30 Nov 16:45
  • #64 - Fixes a bug where the kc tag was incorrect for chained mappings

MashMap v3.1.1

22 Aug 13:25
  • In order to maintain a default build that requires the same libraries as previous versions, building with htslib is now optional.
  • To build with htslib, call cmake with -DUSE_HTSLIB=ON
  • htslib is useful for the --targetList and --targetPrefix options, which allow users to only index specific contigs from a reference fasta file.

MashMap v3.1.0

21 Aug 20:31
  • When filtering matches, the "score" of a match no longer takes into account the length. Previously, the score for a mapping was len*ANI, meaning that a 1000bp mapping with 100% identity (score 1000) would be tossed out in favor of a 1112bp mapping with 90% identity (score 1000.8).
  • Fixes a rare bug that caused a crash when the very first minmer in the index is a hit.
  • Fixes a bug where the --kmerThreshold CLI option ignored the user's argument and always used 1.
  • Low complexity segments are tossed out before stage 1 mapping.
  • Mappings now use 32-bit integers instead of 64-bit integers to store positions. If you need mashmap to work with contigs larger than 2^31, you can pass -DLARGE_CONTIG=1 to CMake when building.
  • Reads shorter than the block length are now split, instead of being aligned in one piece.
  • Added --targetPrefix and --targetList CLI options, which allow the users to specify subsets of the reference file to be indexed. Requires htslib!
  • Added --lowerTriangular CLI option which only computes mappings between sequence i and sequence j if i > j (meant to be used when reference and query files are identical).
  • Limits the size of the DP filter so that large sketch sizes don't incur a huge setup time.
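
The scoring change in the first item can be checked with a few lines of Python. The old score is taken directly from the note (len*ANI); the new score is an assumption, since the note only says that length is no longer considered. Function and variable names are hypothetical.

```python
# (length in bp, identity as a fraction): the two mappings from the note.
short_exact = (1000, 1.00)
long_diverged = (1112, 0.90)

def old_score(length_bp, ani):
    """Pre-3.1.0 score as described in the note: length times ANI."""
    return length_bp * ani

def new_score(length_bp, ani):
    """ASSUMPTION: identity alone; the note only says length is dropped."""
    return ani

candidates = [short_exact, long_diverged]
old_winner = max(candidates, key=lambda m: old_score(*m))
new_winner = max(candidates, key=lambda m: new_score(*m))
print("old winner:", old_winner)
print("new winner:", new_winner)
```

With the length-weighted score the longer 90% mapping narrowly wins; with an identity-only score the exact 1000bp mapping is kept.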

MashMap v3.0.6

10 Jul 18:14
4f4df5d

Changelog:

  • Uses the chaining algorithm from wfmash
  • --splitPrefix now performs the filtering on each prefix-group independently.
  • Does not sketch or winnow k-mers w/ ambiguous nucleotides.
  • Added kc:f tag for the estimated k-mer complexity, defined as the ratio of the estimated number of distinct k-mers in a segment to the total number of k-mers in a segment (this estimate can be greater than 1.0).
  • Added a flag --kmerComplexity x to filter out segments with estimated kmer complexity less than x.
  • Mapping progress now updates for each segment mapped, as opposed to each contig mapped.
  • Added --reportPercentage option to report ANI as a percentage instead of in [0, 1] range (necessary for use w/ wfmash)
  • Fixes #54
  • Does not split sequences smaller than the block length.
  • Added address sanitizer for Debug build
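
As a rough illustration of the k-mer complexity ratio behind the kc tag and the --kmerComplexity filter, the sketch below computes the exact ratio of distinct k-mers to total k-mers in a segment. This is an assumption-laden simplification: MashMap reports an estimate (which, as noted above, can exceed 1.0), whereas this exact version cannot, and it does not canonicalize k-mers or skip ambiguous nucleotides. The function names and example segments are hypothetical.

```python
def kmer_complexity(segment: str, k: int) -> float:
    """Exact ratio of distinct k-mers to total k-mers in `segment`.

    Illustrative only: MashMap uses an estimate of the distinct count.
    """
    total = len(segment) - k + 1
    if total <= 0:
        return 0.0
    distinct = {segment[i:i + k] for i in range(total)}
    return len(distinct) / total

def filter_segments(segments, k, min_complexity):
    """Keep segments whose complexity is at least `min_complexity`."""
    return [s for s in segments if kmer_complexity(s, k) >= min_complexity]

# A homopolymer run is low complexity; a mixed segment scores higher.
segments = ["AAAAAAAA", "ACGTTGCA"]
for seg in segments:
    print(seg, round(kmer_complexity(seg, 3), 3))
kept = filter_segments(segments, k=3, min_complexity=0.5)
```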

MashMap v3.0.5

28 Jun 05:37
c6978dd

Changelog:

  • Removed sanity-check filters that were actually dropping desired mappings
  • Sort query minmers upon recruitment using the heap as opposed to sorting for every stage 1 hit
  • Add -DPROFILE flag to compile w/ debug symbols and no inline (also removed inline keyword from some functions)
  • Cast jaccard to float now that it is no longer multiplied by 100.0.

MashMap v3.0.4

18 May 17:28
7ba5173
  • Add --legacy flag for MashMap2 style output
  • Add -v/--version flag
  • Output id and jc tags in [0,1] range instead of [0, 100]
  • Improves stderr header output

Conda package (and MacOS binary) to be added once Bioconda CI is fixed

MashMap v3.0.2

16 May 16:46
85347a7
  • Clarified block-length help string
  • Fixed bug for block-length filter
  • Removed some optimization flags

MashMap v3.0.1

12 Apr 16:18

MashMap3 Changelog

  • Instead of indexing the locations of minimizers, we index the windows for which a k-mer has one of the s lowest hashes in the window, where s is the sketch size. These k-mers are termed "minmers."

  • The first-pass filtering stage computes the number of shared minmers for each candidate mapping in linear time. Regions with significantly high counts of shared minmers are passed on to stage 2.

  • The second stage of filtering, where the minhash score of each mapping in the candidate region is calculated, uses a std::vector to keep track of the rolling minhash score as opposed to the std::map used in MashMap2. The details can be seen in slidingMap.hpp.

  • While the mapping stage is faster, particularly for lower ANI cutoffs (90% and below), the indexing stage does require a bit more time than before. To avoid spending time recomputing the index, users can save the index via --saveIndex PREFIX, and then reuse it in a later run with --loadIndex PREFIX.

  • The default parameter for the sketch size depends on the value of the minimum ANI threshold (pi) and the segment length (L). Decreasing the sketch size will decrease runtime in a linear fashion at the cost of increasing the variance in the ANI estimation error.

  • Frequent seeds are filtered out based on how many minmer-intervals they have as opposed to how many times the k-mer actually occurs in the reference. This adds some noise to frequent-k-mer filtering, as it's possible for a less frequent k-mer to have more intervals than a more frequent k-mer.

  • The binomial model is used to estimate ANI from Jaccard instead of the Poisson model.

  • k-mer size is no longer limited to <=16, as the hash values are 64 bits instead of 32 bits. The default kmer size is now 19.

  • Numerous interface updates were copied over from wfmash, including a progress meter and usage of the samtools .fai index.

  • The output of MashMap3 is now in PAF format, with id and jc tags which represent the estimated ANI and the estimated Jaccard similarity, respectively. The jc tag is only present for mappings where chaining is disabled.

  • There is now an option for significantly denser sketching, --dense
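
The minmer definition in the first item can be made concrete with a naive Python sketch. It is not MashMap's indexing code: the hash function (BLAKE2b truncated to 64 bits), the window parameter w, the tie-breaking by position, and all names are assumptions for illustration, and each window is re-sorted rather than maintained incrementally.

```python
import hashlib

def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (ASSUMPTION: any well-mixed hash would do)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minmer_positions(seq: str, k: int, w: int, s: int):
    """Positions of k-mers that rank among the s lowest hashes in at
    least one window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = range(start, start + w)
        # Ties are broken by position so the choice is deterministic.
        lowest = sorted(window, key=lambda i: (hashes[i], i))[:s]
        selected.update(lowest)
    return sorted(selected)

seq = "ACGTTGCATGCAAGTCCGATTACG"  # hypothetical sequence
print(minmer_positions(seq, k=5, w=4, s=2))
```

With s = 1 this reduces to a minimizer scheme, which is one way to see minmers as a generalization of minimizers.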

MashMap v2.0

03 Feb 21:32

Now generalized for computing approximate local alignments between long DNA sequences. This will be useful for fast genome-to-genome mapping or split-read mapping of long reads.