forked from AbeelLab/ptolemy

Reference-free tool for the analysis of gene and structural diversity of microbial genome architectures via multiple genome alignment based on synteny analysis.


jlobatop/ptolemy

 
 


Overview

Ptolemy is a reference-free approach for analysing microbial genome architectures, in particular their gene and structural diversity. In a nutshell, it uses a "top-down" approach to align multiple genomes via synteny analysis. The output is a gene-based population genome graph describing the genes and structural variants that are unique to, or shared across, a population. It requires a set of FASTA-formatted assemblies and the corresponding GFF-formatted annotations.

You can read more about it in our publication.

Experimental branch: bacterial-phage metagenomics

This is an experimental branch, part of an ongoing collaborative project studying the genome architectures of bacterial phages.

Aside from some optimizations, it adds an experimental, standalone aligner for (noisy) long reads. In essence: use Ptolemy to build gene-based population genome graphs of available bacterial-phage genomes, then align long reads from a metagenomic sequencing run to identify existing or new architectures.

As an example, the figure below shows a graph of all available Pseudomonas genomes (146) from NCBI, followed by alignment of a barcoded sample from a metagenomic nanopore sequencing run generated by undergraduate students:

[Figure: Pseudomonas population genome graph with aligned long reads]

Executable JAR

Executable jar files are available under releases.

Dependencies

Ptolemy requires minimap2, which it uses for pairwise gene alignment during database creation and syntenic anchoring.
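Before running anything, it can help to confirm that the required executables are reachable. The sketch below assumes that Ptolemy locates minimap2 through PATH, which this README does not state explicitly; `check_deps` is a hypothetical helper name, not part of Ptolemy.

```shell
# check_deps TOOL...
# Reports every named executable that cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise.
# Assumption: Ptolemy finds minimap2 via PATH (not confirmed by this README).
check_deps() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_deps java minimap2` would then fail early, with a list of what is missing, instead of failing partway through database creation.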

Running Ptolemy

Ptolemy requires a tab-delimited file with three columns: a unique sample identifier, the path to the assembly, and the path to the gene annotations. For example:

Genome1 path/to/assembly/genome1.fa path/to/annotations/genome1.gff
Genome2 path/to/assembly/genome2.fa path/to/annotations/genome2.gff
Genome3 path/to/assembly/genome3.fa path/to/annotations/genome3.gff
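For more than a handful of genomes, the list can be generated rather than typed. The following is a minimal sketch that assumes a hypothetical layout in which every sample has `<sample>.fa` in one directory and `<sample>.gff` in another; `make_genome_list` and `validate_genome_list` are illustrative helper names, not Ptolemy commands.

```shell
# make_genome_list ASSEMBLY_DIR ANNOTATION_DIR
# Prints one line per sample: identifier <TAB> assembly path <TAB> annotation path.
# Assumption (hypothetical layout): files are named <sample>.fa and <sample>.gff.
make_genome_list() {
    asm_dir=$1
    gff_dir=$2
    for fa in "$asm_dir"/*.fa; do
        [ -e "$fa" ] || continue            # glob matched nothing
        sample=$(basename "$fa" .fa)
        gff="$gff_dir/$sample.gff"
        if [ -f "$gff" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$fa" "$gff"
        else
            echo "warning: no annotation for $sample, skipping" >&2
        fi
    done
}

# validate_genome_list FILE
# Checks that every line has exactly three tab-separated fields and that
# sample identifiers are unique. Returns non-zero if any problem is found.
validate_genome_list() {
    awk -F '\t' '
        NF != 3    { printf "line %d: expected 3 tab-separated fields, got %d\n", NR, NF; bad = 1 }
        seen[$1]++ { printf "line %d: duplicate sample identifier %s\n", NR, $1; bad = 1 }
        END        { exit bad }
    ' "$1"
}
```

Typical use would be `make_genome_list assemblies annotations > genome_list.txt` followed by `validate_genome_list genome_list.txt`, where both directory names are placeholders.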

There are three main steps in Ptolemy:

  1. Database creation ( java -jar ptolemy.jar extract ... )
  2. Multiple-genome alignment via syntenic anchoring ( java -jar ptolemy.jar syntenic-anchors ... )
  3. Canonical graph construction ( java -jar ptolemy.jar canonical-quiver ... )

The experimental steps:

  1. Index canonical quiver ( java -jar ptolemy.jar index-graph ... )
  2. Long-read alignment ( java -jar ptolemy.jar align-reads ... )

A typical workflow:

# Graph construction
java -jar ptolemy.jar extract -g genome_list.txt -o ptolemy_db
java -jar ptolemy.jar syntenic-anchors --db ptolemy_db -o .
java -jar ptolemy.jar canonical-quiver -s syntenic_anchors.txt --db ptolemy_db -o .

# Long-read alignment (uses the database created during graph construction)
java -jar ptolemy.jar index-graph -c canonical_quiver.gfa --db ptolemy_db
java -jar ptolemy.jar align-reads -r reads.fa -c canonical_quiver.gfa --db ptolemy_db -o . -p alignment
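The graph-construction commands above can be wrapped so that a failed step stops the run instead of feeding a missing file to the next step. This is only a sketch: `build_graph` and the `PTOLEMY` variable are hypothetical names, and it assumes, as the workflow above suggests, that `syntenic-anchors` writes `syntenic_anchors.txt` into its output directory.

```shell
# Command used to invoke Ptolemy; override it if the jar lives elsewhere.
PTOLEMY=${PTOLEMY:-"java -jar ptolemy.jar"}

# build_graph GENOME_LIST DB_DIR OUT_DIR
# Runs the three graph-construction steps in order and stops at the first failure.
# Assumption: syntenic-anchors writes OUT_DIR/syntenic_anchors.txt.
build_graph() {
    genome_list=$1
    db_dir=$2
    out_dir=$3
    # $PTOLEMY is deliberately unquoted so it splits into command and arguments.
    $PTOLEMY extract -g "$genome_list" -o "$db_dir" || return 1
    $PTOLEMY syntenic-anchors --db "$db_dir" -o "$out_dir" || return 1
    $PTOLEMY canonical-quiver -s "$out_dir/syntenic_anchors.txt" --db "$db_dir" -o "$out_dir" || return 1
}
```

For example, `build_graph genome_list.txt ptolemy_db .` would reproduce the three graph-construction commands of the typical workflow.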

The graph is stored as a GFA-formatted file and can be visualized with graph visualizers such as Bandage.
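For a quick sanity check without opening a visualizer, the node and edge counts can be read straight from the file. The sketch below assumes the output follows the usual GFA convention of tab-separated records whose first field is `S` for a segment (node) and `L` for a link (edge); `gfa_summary` is a hypothetical helper name.

```shell
# gfa_summary FILE
# Counts segment (S) and link (L) records in a GFA file.
# Assumption: records are tab-separated with the record type in the first field.
gfa_summary() {
    awk -F '\t' '
        $1 == "S" { segments++ }
        $1 == "L" { links++ }
        END       { printf "segments\t%d\nlinks\t%d\n", segments, links }
    ' "$1"
}
```

Running `gfa_summary canonical_quiver.gfa` prints two tab-separated lines, one with the number of segments and one with the number of links.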

Test data are available in the 'testing_data' directory, which contains full PacBio assemblies of a single yeast chromosome from three genomes.
