vibansal/ancestry: program to estimate admixture coefficients from individual genotype or sequence data

iAdmix: USING POPULATION ALLELE FREQUENCIES FOR COMPUTING INDIVIDUAL ADMIXTURE ESTIMATES:

Inference of ancestry is important both in disease association studies and for understanding population history. We have developed a fast and accurate method for estimating an individual's admixture proportions from genotype or sequence data, using population allele frequencies from a set of parental/reference populations. The method works with genotype data or with sequence data (aligned sequence reads in a BAM file) from low-coverage whole-genome sequencing, exome sequencing, or even targeted sequencing experiments. It optimizes the likelihood function with the L-BFGS-B code (a limited-memory BFGS algorithm with bound constraints) and is extremely fast.
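To make the likelihood concrete, here is a minimal sketch of an admixture likelihood for genotype data, assuming Hardy-Weinberg genotype probabilities and that the individual's allele frequency at each SNP is the admixture-weighted average of the reference populations' frequencies. The function names and the layout of `freqs` are hypothetical, and a coarse grid search over two populations stands in for the L-BFGS-B optimizer that iAdmix actually uses.

```python
import math
from typing import Sequence, Tuple


def genotype_loglik(q: Sequence[float], freqs: Sequence[Sequence[float]],
                    genos: Sequence[int], eps: float = 1e-6) -> float:
    """Log-likelihood of genotypes under admixture proportions q.

    genos[j] is the number of copies (0, 1, 2) of the counted allele at SNP j;
    freqs[j][k] is that allele's frequency in reference population k (assumed layout).
    """
    ll = 0.0
    for f_j, g in zip(freqs, genos):
        # Individual-specific allele frequency: q-weighted mixture of population frequencies.
        p = sum(qk * fk for qk, fk in zip(q, f_j))
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if g == 0:
            ll += 2 * math.log(1 - p)
        elif g == 1:
            ll += math.log(2 * p * (1 - p))
        elif g == 2:
            ll += 2 * math.log(p)
        else:
            raise ValueError(f"invalid genotype {g}")
    return ll


def grid_estimate_two_pops(freqs: Sequence[Sequence[float]], genos: Sequence[int],
                           steps: int = 100) -> Tuple[float, float]:
    """Coarse grid search over q = (a, 1 - a); a stand-in for a bound-constrained optimizer."""
    best_a, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        a = i / steps
        ll = genotype_loglik((a, 1 - a), freqs, genos)
        if ll > best_ll:
            best_a, best_ll = a, ll
    return best_a, 1 - best_a
```

For example, an individual who is homozygous for an allele that is common in population 1 and rare in population 2 at every SNP gets q close to (1, 0). A gradient-based method such as L-BFGS-B scales far better than a grid once there are more than two or three populations.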

The method is described in the paper "Fast individual ancestry inference from DNA sequence data leveraging allele frequencies from multiple populations" by Vikas Bansal and Ondrej Libiger, published in BMC Bioinformatics (2015).

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-014-0418-7

INPUT:

A sorted BAM file (sequence data) or a simple genotype file (rsid genotype pairs)
Population allele frequencies for common SNPs (generated from HapMap3 genotypes or other genotype datasets)
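The genotype file is described only as rsid genotype pairs. The sketch below assumes one whitespace-separated pair per line with two-letter genotypes such as `AG`; both that encoding and the helper names are hypothetical, so check them against runancestry.py before relying on them. It shows how a genotype string becomes the 0/1/2 allele count that a frequency-based likelihood needs.

```python
from typing import Dict, Iterable, Optional


def parse_genotype_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'rsid genotype' pairs, e.g. 'rs123 AG' (assumed two-letter encoding)."""
    genotypes: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip blank, malformed or comment lines
        genotypes[fields[0]] = fields[1].upper()
    return genotypes


def allele_count(geno: str, allele: str) -> Optional[int]:
    """Copies (0, 1 or 2) of `allele` in a two-letter genotype; None if missing or invalid."""
    if len(geno) != 2 or any(base not in "ACGT" for base in geno):
        return None
    return geno.count(allele.upper())
```

Missing calls (for example `--`) come back as `None` so that they can be skipped rather than silently treated as homozygous.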

OUTPUT:

Admixture coefficients for each reference population

HOW TO RUN THE PROGRAM:

To compile the code, run 'make all' in the source code directory. This should create the executables 'ANCESTRY' and 'calculateGLL' (the latter in the parsebam sub-directory).

Running 'python runancestry.py' without arguments lists all options for the program.

Example for a BAM file: python runancestry.py -f populations.frequencies.txt --bam sample.sorted.bam -o sample.output --path path_directory_with_executable
Example for a genotype file: python runancestry.py --freq populations.frequencies.txt --geno sample.genotypes --out sample.ancestry
Example for PLINK genotype files: python runancestry.py --freq populations.frequencies.txt --plink sample.genotypes --out sample.ancestry
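When running many samples it can help to build these command lines programmatically. The sketch below uses only the options shown in the examples, and assumes, as the examples suggest, that -f/--freq and -o/--out are interchangeable short and long forms. The helper names `build_runancestry_cmd` and `run_ancestry` are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def build_runancestry_cmd(freq_file: str, out_file: str, *, bam: Optional[str] = None,
                          geno: Optional[str] = None, plink_prefix: Optional[str] = None,
                          exe_dir: Optional[str] = None,
                          script: str = "runancestry.py") -> List[str]:
    """Assemble a runancestry.py command line from the options shown in the README examples."""
    inputs = [(flag, value) for flag, value in
              (("--bam", bam), ("--geno", geno), ("--plink", plink_prefix)) if value]
    if len(inputs) != 1:
        raise ValueError("give exactly one of bam, geno or plink_prefix")
    flag, value = inputs[0]
    cmd = [sys.executable, script, "--freq", freq_file, flag, value, "--out", out_file]
    if exe_dir is not None:
        # Directory containing the compiled 'ANCESTRY' executable.
        cmd += ["--path", exe_dir]
    return cmd


def run_ancestry(cmd: List[str]) -> None:
    """Launch the analysis; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    demo = build_runancestry_cmd("populations.frequencies.txt", "sample.output",
                                 bam="sample.sorted.bam",
                                 exe_dir="path_directory_with_executable")
    print(" ".join(demo))
```

Keeping the arguments in a list rather than a single shell string avoids quoting problems with paths that contain spaces.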

For PLINK input, the program assumes that the files sample.genotypes.ped and sample.genotypes.map exist; that is, --plink takes the common file prefix.
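A quick pre-flight check can catch a wrong prefix before the run starts. This is a minimal sketch, with a hypothetical helper name, that lists which of the two expected PLINK text files are missing for a given prefix.

```python
from pathlib import Path
from typing import List


def missing_plink_files(prefix: str) -> List[str]:
    """Return the expected PLINK files (<prefix>.ped, <prefix>.map) that do not exist."""
    expected = [Path(prefix + ext) for ext in (".ped", ".map")]
    return [str(path) for path in expected if not path.is_file()]
```

An empty list means both files are in place and the same prefix can be passed to --plink.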

NOTES

The allele frequency file should be sorted by chromosome and position.
To run iAdmix, provide the path to the directory containing the 'ANCESTRY' executable using the --path option of runancestry.py. This should be the directory where you downloaded and compiled the source code.
To run on BAM files, you need to calculate genotype likelihoods from the reads that overlap the variant sites. iAdmix provides a program called 'calculateGLL' for this. A precompiled binary (built on Ubuntu x86_64) is available in the GitHub repository; the source code is also in the repository and is compiled by the 'make all' command.
Running the program directly on a VCF file is not recommended: VCFs typically omit homozygous-reference (0/0) genotypes, and this missing information may bias the ancestry inference.
Make sure that the chromosome names ('chr1' vs '1') and the reference genome version (hg18 vs hg19) in the BAM file match those in the allele frequency file. If your chromosome names have the 'chr' prefix, use the command-line option "--addchr=True" for the runancestry.py script.
The '-c' and '-m' options are experimental and intended only for genotype data with multiple individuals.
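To illustrate what a genotype-likelihood step computes for sequence data, here is a minimal sketch using a simple independent-error model: each read samples one of the two chromosomes with equal probability, and a base matches its source allele with probability 1 - e, where e comes from the Phred base quality. This model and all function names are assumptions for illustration, not necessarily what calculateGLL implements. The per-site likelihood then sums over the three genotypes, weighting each by its Hardy-Weinberg probability under the individual's admixture-weighted allele frequency.

```python
import math
from typing import Sequence, Tuple

GL = Tuple[float, float, float]


def genotype_likelihoods(bases: Sequence[Tuple[str, int]], ref: str, alt: str) -> GL:
    """P(reads | genotype) for ref/ref, ref/alt and alt/alt (0, 1, 2 copies of alt).

    bases holds (base, phred_quality) pairs from reads overlapping the SNP. Probabilities
    are multiplied directly, which is fine for a few reads; at high depth, sum logs instead.
    """
    liks = []
    for alleles in ((ref, ref), (ref, alt), (alt, alt)):
        lik = 1.0
        for base, qual in bases:
            e = 10 ** (-qual / 10)
            # Average over the two chromosomes the read could have come from.
            lik *= sum((1 - e) if base == a else e / 3 for a in alleles) / 2
        liks.append(lik)
    return liks[0], liks[1], liks[2]


def site_loglik(gl: GL, p_alt: float) -> float:
    """log of sum over g of P(reads | g) * P(g | p_alt), with Hardy-Weinberg priors."""
    priors = ((1 - p_alt) ** 2, 2 * p_alt * (1 - p_alt), p_alt ** 2)
    return math.log(sum(lik * w for lik, w in zip(gl, priors)))


def admixed_site_loglik(gl: GL, q: Sequence[float], pop_alt_freqs: Sequence[float]) -> float:
    """Site log-likelihood with the alt frequency mixed across reference populations by q."""
    p_alt = sum(qk * fk for qk, fk in zip(q, pop_alt_freqs))
    return site_loglik(gl, p_alt)
```

Because the genotype is marginalized rather than called, low-coverage sites still contribute information without a hard genotype call.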

