Skip to content

ajm/swiftlink

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SwiftLink: Parallel MCMC linkage analysis

SwiftLink performs multipoint parametric linkage analysis on large consanguineous pedigrees and is primarily targeted at pedigrees that cannot be analysed by a Lander-Green algorithm based program, i.e. many markers, but larger pedigrees. The current version of SwiftLink only supports SNP markers.

The SwiftLink source code is licensed under the GPLv3.

If you use SwiftLink in your work please cite Medlar et al, 2013

@article{medlar2013swiftlink,
  title={SwiftLink: parallel MCMC linkage analysis using multicore CPU and GPU},
  author={Medlar, Alan and G{\l}owacka, Dorota and Stanescu, Horia and Bryson, Kevin and Kleta, Robert},
  journal={Bioinformatics},
  volume={29},
  number={4},
  pages={413--419},
  year={2013},
  publisher={Oxford Univ Press}
}

Contact

If you have an questions about installing or running SwiftLink, please contact Alan Medlar

Installation

SwiftLink's only mandatory dependency is GNU scientific library for the Mersenne Twister pseudo random number generator. Optionally, SwiftLink can be compiled with CUDA support.

Download source code:

git clone git://github.com/ajm/swiftlink.git

Build without CUDA support:

cd swiftlink/src
make

Build with CUDA support:

cd swiftlink/src
make -f Makefile.cuda

Build under Mac OS (using homebrew for dependencies):

brew install gsl libiomp clang-omp
cd swiftlink/src
make -f Makefile.macos

Input Files

SwiftLink expects three input files: pedigree file, map file and locus data file, in LINKAGE format. We have mostly tested it on input files generated by Alohomora for Allegro.

Update: If you are using Mega2 to generate your input files, you must select "Allegro Format" to generate compatible pedigree and locus data files. Mega2 seems to confirm that the examples I have written (found in the example directory) when used as input is in LINKAGE format, but selecting "Linkage Format" as the output format generates something incompatible.

CUDA versions

The current version of SwiftLink has been tested on Linux with CUDA version 7.5 (tested Sept. 2016).

Older versions of SwiftLink are known to not work properly with CUDA versions 4.1 and 4.2 due to a known slow down bug in cudaMalloc (anecdotally, CUDA 4.0 in 32-bit mode did not seem to suffer from this bug).

Examples

All the input files used in the following commands can be found in the examples directory. We use the pedigree from Bockenhauer et al, 2009, but the data and map are simulated.

Expected LOD (ELOD) score

SwiftLink can calculate an expected LOD (ELOD) score for your pedigree assuming a recessive trait with complete penetrance (the default):

swift -p east.ped --elod

For an X-linked recessive trait with complete penetrance:

swift -p xlinked.ped --elod -X

For a dominant trait with complete penetrance (for illustrative purposes only, the file dominant.ped is not provided):

swift -p dominant.ped --elod --penetrance=0.0,1.0,1.0

The ELOD score can be used both as a power analysis and as an additional quality control post-analysis. If the maximum LOD score differs considerably from the ELOD, then this could point to an unidentified problem with the input data or other model misspecification.

Linkage analysis

The simplest way to run a linkage analysis with SwiftLink, i.e. with default parameters, is the following:

swift -p east.ped -m east.map -d east.dat -o results.txt

This will perform either an autosomal or X-linked analysis dependent on whether it is specified in the first line of the DAT file (SwiftLink can be forced to perform an X-linked analysis with the -X flag, see options). By default SwiftLink only uses a single CPU core and only performs a single replicate.

Using multiple CPUs

SwiftLink can be efficiently run across multiple CPU cores. Here we perform the same analysis using four CPUs:

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 

Performing multiple runs

SwiftLink has a builtin function to run multiple Markov chains and output LOD scores averaged over all replicates. For a majority of projects we have been involved in ~10 replicates is sufficient:

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 -R 10

Affected-only analysis

SwiftLink can easily perform an affected-only analysis, forcing all negative affection statuses to unknown status:

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 -a

Using the GPU

If you have a CUDA-compatible GPU and have the CUDA drivers installed (see CUDA installation guide), SwiftLink can offload LOD score calculations to the GPU and speed up the overall runtime. The GPU code only supports autosomal linkage analysis:

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 -g

MCMC options and convergence diagnostics

The examples directory contains mostly toy examples, but depending on the complexity of your project you may need to spend some time ensuring that the Markov chain has converged to the stationary distribution to ensure your inferences are trustworthy.

This command runs SwiftLink for 1,000,000 iterations of burnin, followed by 1,000,000 iterations of simulation, sampling every 100th descent graph for LOD score estimation:

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 -b 1000000 -i 1000000 -x 100

This command performs 4 separate runs and, for each run, outputs a log file starting with the prefix "log" that can be used as input to the CODA R package (see next subsection):

swift -p east.ped -m east.map -d east.dat -o results.txt -c 4 -R 4 --trace --traceprefix log

Using CODA R package to perform diagnostics

This is an example to perform convergence diagnostics on the output given by the previous command in R using the CODA package. (further details about the interpretation of plots can be found on the web, for example):

# install package if not present
# install.packages('coda')

library(coda)

chain0 <- read.table('log.ped1.run0', header=T)
chain1 <- read.table('log.ped1.run1', header=T)
chain2 <- read.table('log.ped1.run2', header=T)
chain3 <- read.table('log.ped1.run3', header=T)

chains <- mcmc.list(mcmc(chain0$likelihood), mcmc(chain1$likelihood), mcmc(chain2$likelihood), mcmc(chain3$likelihood))

plot(chains)
gelman.diag(chains)
gelman.plot(chains)

Options

Usage: ./swift [OPTIONS] -p pedfile -m mapfile -d datfile
       ./swift [OPTIONS] -p pedfile --elod

Input files:
  -p pedfile, --pedigree=pedfile
  -m mapfile, --map=mapfile
  -d datfile, --dat=datfile

Output files:
  -o outfile, --output=outfile            (default = 'swiftlink.out')

MCMC options:
  -i NUM,     --iterations=NUM            (default = 50000)
  -b NUM,     --burnin=NUM                (default = 50000)
  -s NUM,     --sequentialimputation=NUM  (default = 1000)
  -x NUM,     --scoringperiod=NUM         (default = 10)
  -l FLOAT,   --lsamplerprobability=FLOAT (default = 0.5)
  -n NUM,     --lodscores=NUM             (default = 5)
  -R NUM,     --runs=NUM                  (default = 1)

MCMC diagnostic options:
  -T,         --trace
  -P PREFIX,  --traceprefix=PREFIX        (default = 'trace')

ELOD options:
  -e          --elod
  -f FLOAT    --frequency=FLOAT           (default = 1.0e-04)
  -w FLOAT    --separation=FLOAT          (default = 0.0500)
  -k FLOAT,FLOAT,FLOAT --penetrance=FLOAT,FLOAT,FLOAT(default = 0.00,0.00,1.00)
  -u NUM      --replicates=NUM            (default = 1000000)

Runtime options:
  -c NUM,     --cores=NUM                 (default = 1)
  -g,         --gpu

Misc:
  -X,         --sexlinked
  -a,         --affectedonly
  -q NUM,     --peelseqiter=NUM           (default = 1000000)
  -r seedfile,--randomseeds=seedfile
  -v,         --verbose
  -h,         --help