Get L. pneumophila ST from long-read or hybrid assemblies.


ONTmompS performs in silico sequence-based typing (SBT) of Legionella pneumophila from long-read or hybrid assemblies.

Quick usage

python -a assembly.fasta


Clone this repository and install the dependencies. We recommend installing in a conda (mamba) environment:

git clone
mamba create -n ontmomps_env -c bioconda -c conda-forge pandas blast emboss biopython


This tool was built as an in silico approach to identify the Sequence Type (ST) of Legionella pneumophila genomes from long-read or hybrid assemblies. It first identifies the mompS1 and mompS2 alleles and then assigns allele numbers and ST. We recommend using long-read or hybrid assemblies that have circular chromosomes when running this tool. It is not intended for short-read assemblies.


Sequence-based typing (SBT) of Legionella pneumophila is a valuable tool in epidemiological studies and outbreak investigations of Legionnaires’ disease. In the L. pneumophila SBT scheme, mompS2 is one of seven genes that determine the ST. The Legionella genome typically contains two copies of mompS (designated mompS1 and mompS2). When they are non-identical, it can be challenging to determine the mompS2 allele, and subsequently the ST, from Illumina sequences, due to the short read-length. Using long-read sequencing from Oxford Nanopore Technologies (ONT) Kit12/Kit9 chemistry and R10.4/R9.4.1 flow cells, together with Trycycler v0.5.3 and Medaka v1.7.2 for long-read assembly and polishing, we were able to identify the mompS2 allele and subsequently the L. pneumophila ST of 81/81 genomes when using this tool.


If you use ONTmompS, please cite the paper: Krøvel AV, Hetland MAK, Bernhoff E, et al. Long-read sequencing for reliably calling the mompS allele in Legionella pneumophila sequence-based typing. Front. Cell. Infect. Microbiol. (2023).


With default settings, two output files are created:

  • LpST_ONTmompS.tsv reports the ST and allele numbers of the seven SBT loci.
  • mompS_alleles_ONTmompS.tsv reports the mompS alleles.

The ST and alleles are annotated if there are mismatching or missing alleles:

  • A complete ST is reported if allele matches are found for all seven SBT genes in the database (e.g. ST560).
  • If there are fewer than three inexact matches, the nearest matching ST is reported together with the number of locus variants (LVs), e.g. ST560-1LV.
  • For allele matches with <100% sequence identity, the nearest matching allele is marked with "*".
  • For incomplete coverage of an allele, the nearest matching allele is marked with "?".
  • For loci with no allele matches, sequence identity <90% or sequence coverage <80%, the allele number is reported as "-".
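
The reporting rules listed above can be sketched as two small Python functions. This is an illustrative sketch, not the actual ONTmompS code: the function names annotate_allele and st_label, their arguments, and the order in which the checks are applied are assumptions. Only the thresholds and symbols come from the list above. The text does not say what is reported when three or more loci are inexact, so the sketch returns None in that case.

```python
from typing import Optional


def annotate_allele(allele: Optional[str], identity: float, coverage: float) -> str:
    """Label one locus call; identity and coverage are percentages (0-100)."""
    # No hit, or a hit below the identity/coverage thresholds: no allele call.
    if allele is None or identity < 90.0 or coverage < 80.0:
        return "-"
    # Incomplete coverage: nearest allele flagged with "?"
    # (checking coverage before identity is an assumption of this sketch).
    if coverage < 100.0:
        return f"{allele}?"
    # Full-length but inexact match: nearest allele flagged with "*".
    if identity < 100.0:
        return f"{allele}*"
    return allele


def st_label(nearest_st: str, n_inexact: int) -> Optional[str]:
    """Format the ST label from the nearest ST and the number of inexact loci."""
    if n_inexact == 0:
        return f"ST{nearest_st}"
    if n_inexact < 3:
        return f"ST{nearest_st}-{n_inexact}LV"
    # Behaviour for three or more inexact loci is not described in the text.
    return None
```

For example, annotate_allele("15", 99.5, 100.0) gives "15*", and st_label("560", 1) gives "ST560-1LV".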

Allele sequences can be written to files with the following flags: --store_novel_alleles stores only novel allele sequences, --store_mompS_alleles stores the mompS allele sequences, and --store_all_alleles stores all allele sequences. By default, the files are placed in a folder named ONTmompS_allele_sequences.

Full usage

usage: [-h] [-v] -a ASSEMBLIES [ASSEMBLIES ...] [--db DB]
                   [--store_mompS_alleles] [--store_novel_alleles]
                   [--store_all_alleles] [--verbose] [-l LOG]
                   [--ST_outfile ST_OUTFILE] [--mompS_outfile MOMPS_OUTFILE]
                   [-o OUTDIR]

In silico SBT of Legionella pneumophila from long-read or hybrid assemblies

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

Input options (required):
  -a ASSEMBLIES [ASSEMBLIES ...]
                        FASTA file(s) for assemblies (*.fasta)

Optional flags:
  --db DB               Provide a path to database location if different than
                        that provided by this tool.
  --store_mompS_alleles
                        Print mompS alleles to files named
  --store_novel_alleles
                        Print novel alleles to files named
  --store_all_alleles   Print all alleles (7 genes in SBT scheme + mompS1) to
                        files named {assembly}_{allele}.fna.
  --verbose             Log more details and keep intermediate files for
  -l LOG, --log LOG     Write logging to specified file name instead of stdout

Output options:
  --ST_outfile ST_OUTFILE
                        Output filename for STs. Default: ./LpST_ONTmompS.tsv
  --mompS_outfile MOMPS_OUTFILE
                        Output filename for mompS copy allele numbers.
                        Default: ./mompS_alleles_ONTmompS.tsv
  -o OUTDIR, --outdir OUTDIR
                        Output directory to store novel alleles in. Default is
                        current working directory

Update database

The database in this repository is the same version as that in Please contact the Legionella-SBT team at UKHSA if you want to obtain a more recent database version. When you have your desired database, you need to make sure the files follow the same format as those in the db for this repo. If the sequences are provided in csv format with 'sequence,number', you can run the commands below to make the database compatible with ONTmompS:

unzip ; cd sbt_schema_10_11_2022_12_59 
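# Convert the ST profile table to tab-separated text with an "ST" header
sed -e 's/st/ST/g' -e 's/,/\t/g' sbt.csv >> lpneumophila.txt ; rm sbt.csv
# Merge the neuAh alleles into the neuA allele file
cat neuAh.csv >> neuA.csv ; rm neuAh.csv
# Convert each 'sequence,number' CSV into a FASTA file with headers >{locus}_{number}
for csv in *.csv ; do f="${csv%.csv}" ; paste <(cut -d"," -f2 "$csv" | sed "s/^/>${f}_/") <(cut -d"," -f1 "$csv") | sed 's/\t/\n/g' | grep -v "number\|sequence" >> "${f}.fna" ; done ; rm *.csv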

Once you have converted the database files, you can either 1) specify the path to your new db with the flag --db /path/to/db or 2) move your new db files to this repo's db directory (i.e. mv *.fna lpneumophila.txt /path/to/ONTmompS/db/).
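
For readers who prefer Python over the shell loop, the CSV-to-FASTA step could be sketched roughly as follows. This is a minimal sketch and not part of ONTmompS: the function name csv_to_fasta is hypothetical, and it assumes each CSV has a 'sequence,number' header followed by one allele per row, as described above. The sequences in the usage example are toy values.

```python
import csv
import io


def csv_to_fasta(csv_text: str, locus: str) -> str:
    """Turn 'sequence,number' rows into FASTA records named >{locus}_{number}."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or short lines and the 'sequence,number' header row.
        if len(row) < 2 or row[0].strip().lower() == "sequence":
            continue
        sequence, number = row[0].strip(), row[1].strip()
        records.append(f">{locus}_{number}\n{sequence}\n")
    return "".join(records)


# Toy input for illustration only; real files come from the SBT database export.
example = "sequence,number\nATGGCA,1\nATGGCC,2\n"
fasta_text = csv_to_fasta(example, "neuA")
```

The resulting text could then be written to a file such as neuA.fna in the db directory, matching the naming used by the shell commands above.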