About

Multi-SpaM is a program to infer phylogenies for a set of genomes. It is based on sets of four spaced words (blocks) that are highly likely to represent true homologies. Each block is evaluated with the Maximum-Likelihood program RAxML, and the resulting trees of 4 sequences, or quartet trees, are amalgamated into a supertree using the Quartet MaxCut tool. Since the number of blocks used is limited, the approach is suitable even for large datasets.
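As a small illustration of what a quartet tree is: four sequences admit exactly three unrooted, fully resolved topologies, and evaluating a block yields one of them. The sketch below only enumerates these three topologies in Newick notation. The function name and the sequence labels are made up for this example, and it says nothing about how multi-SpaM represents trees internally.

```python
def quartet_topologies(a, b, c, d):
    """Return the three possible unrooted quartet topologies as Newick strings.

    Hypothetical helper for illustration only; not part of multi-SpaM.
    """
    return [
        f"(({a},{b}),({c},{d}));",  # a and b are neighbours
        f"(({a},{c}),({b},{d}));",  # a and c are neighbours
        f"(({a},{d}),({b},{c}));",  # a and d are neighbours
    ]


if __name__ == "__main__":
    # Labels A-D are placeholders for four genome names.
    for newick in quartet_topologies("A", "B", "C", "D"):
        print(newick)
```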

Additional information can be found in our paper, which was published at RECOMB-CG 2018:

Dencker T., Leimeister CA., Gerth M., Bleidorn C., Snir S., Morgenstern B. (2018) Multi-SpaM: A Maximum-Likelihood Approach to Phylogeny Reconstruction Using Multiple Spaced-Word Matches and Quartet Trees. In: Blanchette M., Ouangraoua A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham

Installation and Usage

Currently, multi-SpaM is only available for 64-bit Linux distributions (due to a limitation of the Quartet MaxCut tool).

To install multi-SpaM, run the following in the base directory:

$ make

The program itself can then be run via the Python script multispam.py. Example usage:

$ python multispam.py -t <number of threads> -i <input file> -o <output file>

where the input file is a FASTA file containing multiple genomes. The output file will be a tree file in Newick format.
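For scripted runs, the call above can be wrapped in a few lines of Python. This is a minimal sketch, not part of multi-SpaM: the helper names `build_multispam_command` and `run_multispam` and the file names are hypothetical, and only the `-t`, `-i` and `-o` flags from the usage line are used.

```python
import subprocess


def build_multispam_command(input_fasta, output_tree, threads=1,
                            python="python", script="multispam.py"):
    """Return the argument list for one multispam.py run (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        python, script,
        "-t", str(threads),       # number of threads
        "-i", str(input_fasta),   # FASTA file containing multiple genomes
        "-o", str(output_tree),   # tree file in Newick format
    ]


def run_multispam(input_fasta, output_tree, threads=1):
    """Run multispam.py from the base directory and fail loudly on errors."""
    cmd = build_multispam_command(input_fasta, output_tree, threads)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # File names are placeholders; replace them with your own paths.
    print(" ".join(build_multispam_command("genomes.fasta", "tree.nwk", threads=4)))
```

Building the argument list separately from running it keeps the command easy to inspect or log before a long run is started.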

Options

Option	Description
-i	Input file in FASTA format
-o	Output file in Newick format
-w / -k	Weight of the pattern, i.e. the number of match positions (at most 16)
-d	Number of don't care positions (i.e. the number of positions that don't have to match)
-t	Number of threads used
-n	Number of sampled blocks
--mem-save	Memory save mode (longer runtime, but much lower RAM usage for large files)
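To make the `-w` / `-k` and `-d` options concrete: a spaced-word pattern can be written as a string of match positions (`1`) and don't care positions (`0`), so its length is the weight plus the number of don't care positions. The sketch below draws one such pattern at random. It is only an illustration: the function name, the decision to fix the first and last position as match positions, the minimum weight of 2 and the example values are assumptions, not multi-SpaM's actual pattern generation.

```python
import random


def random_pattern(weight, dont_care, seed=None):
    """Return a random 0/1 pattern with `weight` ones and `dont_care` zeros.

    Hypothetical helper for illustration only; not part of multi-SpaM.
    """
    if not 2 <= weight <= 16:
        # The upper bound of 16 comes from the options table;
        # the lower bound of 2 is an assumption of this sketch.
        raise ValueError("weight must be between 2 and 16")
    if dont_care < 0:
        raise ValueError("dont_care must not be negative")

    rng = random.Random(seed)
    length = weight + dont_care
    # Assumption: the pattern starts and ends with a match position,
    # so only the remaining weight - 2 match positions are placed at random.
    inner = set(rng.sample(range(1, length - 1), weight - 2))
    return "".join(
        "1" if i in (0, length - 1) or i in inner else "0"
        for i in range(length)
    )


if __name__ == "__main__":
    # Example values only: weight 12 and 50 don't care positions.
    print(random_pattern(12, 50, seed=1))
```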

Tips:

In general, the default parameters do not need to be changed. Only the number of threads, the input file and the output file need to be specified.
If the resulting trees seem unreasonable, you can try lowering the number of don't care positions to 50.
For large input files, it is recommended to increase the weight to 12 or even higher.
If RAM is limited, you can use the memory save mode. For input files larger than about 200 MB, the required RAM will exceed 8 GB. With the memory save mode, the RAM requirement for a 4.8 GB dataset could be reduced to 10.5 GB, at the cost of doubling the runtime.
The number of sampled blocks does not need to be increased, except possibly for very large datasets.
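The tips above can be turned into a rough rule of thumb for choosing extra flags. The following is a sketch under stated assumptions, not a recommendation from the authors: the helper name `suggest_options` is hypothetical, and using the 200 MB mark as the cutoff for "large" input when raising the weight is an assumption, since the tips only tie that figure to RAM usage.

```python
import os

# From the tips: inputs above roughly 200 MB need more than 8 GB of RAM.
MEM_SAVE_INPUT_BYTES = 200 * 1000 ** 2
RAM_LIMIT_GB = 8


def suggest_options(input_size_bytes, available_ram_gb,
                    large_input_bytes=MEM_SAVE_INPUT_BYTES):
    """Return extra multispam.py flags suggested by the tips (hypothetical helper).

    `large_input_bytes` is an assumed cutoff for "large input files";
    the tips do not define one.
    """
    flags = []
    if input_size_bytes > large_input_bytes:
        # Tip: increase the weight to 12 or higher for large inputs.
        flags += ["-w", "12"]
    if input_size_bytes > MEM_SAVE_INPUT_BYTES and available_ram_gb <= RAM_LIMIT_GB:
        # Tip: use memory save mode when RAM is limited.
        flags.append("--mem-save")
    return flags


if __name__ == "__main__":
    # "genomes.fasta" is a placeholder path; 8 GB of RAM is an example value.
    path = "genomes.fasta"
    if os.path.exists(path):
        print(suggest_options(os.path.getsize(path), available_ram_gb=8))
```

The returned flags can simply be appended to the usual `-t`, `-i` and `-o` arguments of multispam.py.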

License

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. The full license text is available at http://gnu.org/licenses/gpl.html.

Some files may be licensed differently.

Contact

In case of bugs or unexpected errors, don't hesitate to send me an email: thomas.dencker@stud.uni-goettingen.de
