
Releases: choishingwan/PRSice

Temporary Fix

20 Sep 18:42

Update Log

This is a temporary fix released while we restructure PRSice for unit testing and improved extensibility. Unfortunately, the changes in this fix span a rather long period of time, so my log of changes may not be accurate. Here is what I remember:

  • Fix --perm, which should now run
  • --prevalence should now report the PRS.R2.adj information correctly
  • We changed the calculation of PRS.R2 when covariates are provided. Previously, it was calculated as Full.R2 - Null.R2; it is now calculated as 1 - (1 - Full.R2) / (1 - Null.R2)
  • Slight changes to the code so that printing the full score matrix should be faster when enough memory is available
  • Attempted to reduce memory usage for the bgen format, with mixed success (it still requires quite a lot of memory)
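The old and new PRS.R2 formulas can be compared with a small Python sketch. The function names are hypothetical and the R² values are made up for illustration. Algebraically, the new formula equals (Full.R2 - Null.R2) / (1 - Null.R2): the share of the variance left unexplained by the covariate-only (null) model that the PRS explains.

```python
def prs_r2(full_r2: float, null_r2: float) -> float:
    """New PRS.R2: 1 - (1 - Full.R2) / (1 - Null.R2)."""
    if not 0.0 <= null_r2 < 1.0:
        raise ValueError("Null.R2 must lie in [0, 1)")
    return 1.0 - (1.0 - full_r2) / (1.0 - null_r2)


def prs_r2_old(full_r2: float, null_r2: float) -> float:
    """Previous PRS.R2: the plain difference Full.R2 - Null.R2."""
    return full_r2 - null_r2


# Made-up example: covariates alone explain 20%, covariates + PRS explain 28%.
print(round(prs_r2_old(0.28, 0.20), 4))  # 0.08
print(round(prs_r2(0.28, 0.20), 4))      # 0.1
```

With no covariates (Null.R2 = 0) both formulas give the same value; the difference only appears when the covariates already explain part of the phenotype.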

Known unfixed issue

  • bgen input does not work without --allow-inter
  • bgen input requires more memory than it should
  • For some reason, we have to manually modify the Eigen library to compile on Windows, so the Windows build may contain bugs.

I will try to fix these whenever I have time, but my main focus will be on restructuring PRSice.

BGEN sample selection bug fix

05 Aug 04:09

Update Log

  • Thanks to a report from @charlisech, we were able to pinpoint a bug related to sample selection when using bgen data.

Minor Bug Fix

15 Jul 14:37

Update Log

  • Fix off-by-one error in PRSet best score output
  • Fix a problem with sample selection on bgen files that contain sample information

Update bug fix

23 May 15:55
  • Update the Rscript so that it matches the features of the executable (avoiding problems in plotting)
  • Fix a bug where PRSice would crash when covariates were missing

Major Release - Increased unit testing

18 May 14:54

Update Log (2020-05-21)

  • The previous bug fix solved the problem with --no-regress, but caused all normal PRSice runs to fail.

Update Log (2020-05-19)

  • Fix output error where PRSice always reported 0 valid phenotypes for continuous traits
  • Fix a problem with permutation where PRSice would crash if the input was rank deficient
  • Fix a problem where, when a binary phenotype file was provided with a fam file containing -9 as the phenotype, PRSice would wrongly state that no phenotype was present
  • Fix a problem in the Rscript where, if sample IDs were numeric and started with 0, the best file would not merge with the phenotype file, resulting in 0 valid PRS
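The last fix above is an instance of a general pitfall: if sample IDs are parsed as numbers in one file but kept as text in the other, leading zeros are lost and the merge finds no common samples. The Python sketch below reproduces this failure mode with made-up IDs; the helper names are hypothetical and this is not the Rscript's actual code.

```python
def read_ids(raw_ids, as_numeric):
    """Parse sample IDs; numeric parsing silently drops leading zeros."""
    return [str(int(x)) if as_numeric else x for x in raw_ids]


def merge_on_id(left_ids, right_ids):
    """Inner join on sample ID: the IDs present in both files."""
    return sorted(set(left_ids) & set(right_ids))


raw = ["0101", "0102", "0103"]  # made-up IDs as written in both files

# One file read as numbers ("0101" -> "101"), the other as text: no overlap.
print(merge_on_id(read_ids(raw, as_numeric=True), read_ids(raw, as_numeric=False)))   # []
# Reading the IDs as text in both files keeps them identical, so all samples merge.
print(merge_on_id(read_ids(raw, as_numeric=False), read_ids(raw, as_numeric=False)))  # ['0101', '0102', '0103']
```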

Update Log

  • We now support multi-threaded clumping (separated by chromosome)
    • Genotypes are stored in memory during clumping (this increases memory usage but significantly speeds up clumping)
  • Only one .prsice file is now generated for all phenotypes
    • The .prsice file now has an additional column called "Pheno"
  • Introduced --chr-id, which generates rs IDs based on a user-provided formula
  • The format of --base-maf and --base-info has changed from <name>,<value> to <name>:<value>
  • Fix a bug related to ambiguous allele dosage flipping when --keep-ambig is used
  • Better mismatch handling. For example, if your base file only provides the effective allele A, without the non-effective allele, PRSice will now perform dosage flipping if your target file has G or C as the effective allele and A or T as the non-effective allele (previously, such a SNP was considered a mismatch)
  • Fix a bug in 2.2.13 where PRSice did not output error messages during the command parsing stage
  • If the user provides --stat and that column cannot be found, PRSice will now error out instead of trying to look for BETA or OR in the file.
  • PRSice should now better recognize whether the phenotype file contains a header
  • Various small bug fixes
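The mismatch handling described above can be illustrated with a minimal Python sketch of allele matching that tries both strands and falls back to dosage flipping. It assumes the base file may omit the non-effective allele (`base_a2=None`); all function names are hypothetical and this is not PRSice's implementation. Ambiguous A/T and C/G SNPs, whose strand cannot be inferred, are deliberately not treated specially here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele):
    """Strand-flip a single-base allele; other alleles are returned unchanged."""
    return COMPLEMENT.get(allele, allele)


def match_alleles(base_a1, target_a1, target_a2, base_a2=None):
    """Compare base and target alleles for one SNP.

    base_a1 is the base effective allele; base_a2 may be None when the base
    file omits the non-effective allele. Returns "match" (use the target
    dosage as-is), "flip" (dosage flip: count target_a2 instead of target_a1)
    or "mismatch" (exclude the SNP).
    """
    candidates = [
        (base_a1, base_a2),                                    # same strand
        (complement(base_a1),                                  # opposite strand
         complement(base_a2) if base_a2 is not None else None),
    ]
    for b1, b2 in candidates:
        if b1 == target_a1 and b2 in (None, target_a2):
            return "match"
        if b1 == target_a2 and b2 in (None, target_a1):
            return "flip"
    return "mismatch"


def base_allele_dosage(target_dosage, status):
    """Convert a dosage of target_a1 (0..2) into a dosage of base_a1."""
    if status == "match":
        return target_dosage
    if status == "flip":
        return 2 - target_dosage
    raise ValueError("mismatched SNPs should be excluded, not scored")
```

For example, `match_alleles("A", "G", "A")` returns `"flip"` (the base effective allele is the target's non-effective allele), and `match_alleles("A", "C", "T")` also returns `"flip"` after the strand complement A -> T is considered.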

New Compilation and Unit Testing

10 Mar 19:10

Update Log

  • Implement unit tests for the command parsing module
    • Command parsing should now be more consistent and less likely to be a source of bugs
  • Now allow the use of --a1 and --a2 instead of --A1 and --A2 to save one Shift key press
  • Can now properly handle bgen files with phasing information
  • Re-implement code for covariate parsing. --cov-factor and --cov-col should now have better-defined behaviour
  • --full-back no longer requires an argument (the expected behaviour)
  • Fixed the default distance for --clump-kb: the default was in Mb instead of kb (only affects version 2.2.12)
  • Correctly capture negative values in --binary-target
  • Correctly capture out-of-bound p-values and other parameters
  • Using --memory will no longer cause PRSice to error out unexpectedly
  • Behaviour change for --keep-ambig: previously, when --keep-ambig was set, PRSice kept all ambiguous SNPs and did not perform any form of flipping, e.g. strand flipping A/C to T/G or dosage flipping A/C to C/A. Now, when --keep-ambig is set, PRSice performs dosage flipping but NOT strand flipping, i.e. if Base = A/T and Target = T/A, the Target dosage is changed from a 0,1,2 count of T to a 0,1,2 count of A. You should only use --keep-ambig if you are certain that the strand information in your base and target data is identical
  • The separator for --base-info and --base-maf has changed from , to :
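The new --keep-ambig rule (dosage flipping only, never strand flipping) can be sketched in Python as follows. The function names are hypothetical and the sketch assumes `target_dosage` counts copies of the target's first allele; it only mirrors the Base = A/T, Target = T/A example given above and is not PRSice's code.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1, a2):
    """A/T and C/G SNPs read the same on both strands, so strand cannot be inferred."""
    return COMPLEMENT.get(a1) == a2


def keep_ambig_dosage(base_a1, base_a2, target_a1, target_a2, target_dosage):
    """Dosage of base_a1 for an ambiguous SNP kept with --keep-ambig (sketch).

    target_dosage counts copies of target_a1 (0, 1 or 2). Only dosage flipping
    is applied; strand flipping is never attempted, so the base and target
    strands are assumed to be identical. Returns None if the alleles do not
    line up without strand flipping.
    """
    if not is_ambiguous(base_a1, base_a2):
        raise ValueError("not an ambiguous (A/T or C/G) SNP")
    if (target_a1, target_a2) == (base_a1, base_a2):
        return target_dosage            # same coding: use as-is
    if (target_a1, target_a2) == (base_a2, base_a1):
        return 2 - target_dosage        # dosage flip: count base_a1 instead
    return None
```

With Base = A/T and Target = T/A, a sample carrying zero copies of T gets `keep_ambig_dosage("A", "T", "T", "A", 0) == 2` copies of A. Because an A/T SNP on the opposite strand looks exactly like the same SNP with swapped alleles, this only gives the right answer when both files use the same strand, which is why the option should be used with care.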

Bug fix and update

21 Feb 22:01

Update Log

  1. We have fixed some problems observed in the beta version of 2.2.12
    a. Clumping now functions as expected
    b. The calculated standard errors should now always be correct
  2. Fix a problem where PRSice did not honor the --model setting
  3. Fix INFO score and MAF filtering in the base data
  4. Fix output of --no-regress. --no-regress should now also generate a *.prsice file containing the number of SNPs included in the PRS
  5. Fix a problem related to set-based permutation
    • Set-based permutation is also drastically faster when the --ultra option is used (requires more memory).
  6. PRSice should now be able to handle special characters in the base file
  7. Add --num-auto. Users can now change the number of autosomes in their samples (Note: we assume all autosomes are diploid)
  8. Add --keep-ambig-as-is. When set, ambiguous SNPs that are kept will never be flipped, giving the user slightly finer control
  9. Completely remove --pearson, as we don't have the manpower to maintain this feature
  10. Also remove --enable-mmap, as it did not help much
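To illustrate what the --model setting changes, here is a small Python sketch of genotype recoding under different genetic models. The model names add, dom, rec and het and their codings are assumed from our reading of the PRSice documentation; the dictionary and function names are hypothetical.

```python
# Genotype coding for each --model option, indexed by the number of copies of
# the effective allele (0, 1 or 2). Codings are an assumption for illustration.
MODEL_CODING = {
    "add": (0, 1, 2),  # additive (default)
    "dom": (0, 1, 1),  # dominant: any copy counts as 1
    "rec": (0, 0, 1),  # recessive: only two copies count
    "het": (0, 1, 0),  # heterozygous only
}


def recode(genotypes, model="add"):
    """Recode effective-allele counts under the chosen genetic model."""
    coding = MODEL_CODING[model]
    return [coding[g] for g in genotypes]
```

For example, `recode([0, 1, 2], "dom")` gives `[0, 1, 1]`; if --model is silently ignored, every run effectively uses the additive row.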

Note

  • Windows builds are completely failing and I have no idea why. We will try to figure out the problem, but it is unlikely to be resolved any time soon. As a result, the Windows build will be unavailable until further notice.
  • Because PRSice is so flexible, a large number of functions need to be tested before we can be confident that PRSice works as expected. Therefore, until we complete the unit tests for all features of PRSice (which is extremely time consuming; our current coverage is less than 5%), I hope we will not add any new features.

Fixing Standardization

14 Oct 16:22

Update Log

  • We have fixed the problem where the parameters --score, --missing and --model were not honored by PRSice
  • In addition, --score con_std should now work as expected (standardizing among controls only)
  • Fix a problem where PRSice did not automatically remove invalid covariates.

Update (Nightly build)

  • Fix memory and distance unit parsing

Note

  • Standardization and control standardization calculate the mean and SD using only samples with valid phenotypes and covariates (i.e. samples included in the regression model)
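The note above can be sketched in Python: the mean and SD are estimated from the samples in the regression model (and, for con_std, only from controls), then used to standardize the scores. Assumptions, all hypothetical: binary phenotypes coded 0 = control and 1 = case, `None` marks a sample excluded from the regression, the sample SD (n - 1) is used, and the resulting mean and SD are applied to every sample.

```python
from statistics import mean, stdev


def standardize(scores, phenotypes, controls_only=False):
    """Standardize PRS using reference samples only (sketch of std / con_std).

    scores, phenotypes: parallel lists. A phenotype of None marks a sample
    excluded from the regression (missing phenotype or covariates). With
    controls_only=True, only samples with phenotype 0 form the reference.
    """
    ref = [s for s, y in zip(scores, phenotypes)
           if y is not None and (not controls_only or y == 0)]
    if len(ref) < 2:
        raise ValueError("need at least two reference samples")
    mu, sd = mean(ref), stdev(ref)
    return [(s - mu) / sd for s in scores]
```

The excluded sample with score 10.0 in `standardize([1.0, 2.0, 3.0, 4.0, 10.0], [0, 0, 1, 1, None])` does not shift the mean or SD, so the four included samples still average to zero after standardization.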

Quick Fix on Permutation

09 Oct 14:50

Update Log

  • In 2.2.10, a bug caused 0 permutations to be performed even when --perm or --set-perm was set to a higher number. This is now fixed
  • Introduced --score con_std, which performs standardization using only control samples (introduced in 2.2.10, but it had some serious bugs that were only fixed in 2.2.10.b. Note: this is untested)
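As background for --perm, here is a minimal Python sketch of a permutation test for the best-threshold PRS: the phenotype is shuffled, every threshold is re-evaluated, and the best statistic per permutation is compared with the observed best. To keep it self-contained, the sketch uses absolute Pearson correlation instead of a regression p-value and the (hits + 1) / (n_perm + 1) convention; both are simplifying assumptions, and the function names are hypothetical, not PRSice's implementation.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def best_stat(prs_by_threshold, pheno):
    """Strongest association over all p-value thresholds (the "best" PRS)."""
    return max(abs_corr(scores, pheno) for scores in prs_by_threshold.values())


def empirical_p(prs_by_threshold, pheno, n_perm=1000, seed=1):
    """Permutation p-value for the best threshold.

    Each permutation shuffles the phenotype, re-evaluates every threshold and
    keeps the best statistic, so the search over thresholds is accounted for.
    """
    rng = random.Random(seed)
    observed = best_stat(prs_by_threshold, pheno)
    shuffled = list(pheno)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_stat(prs_by_threshold, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)
```

With `n_perm` set to 0, as happened in the 2.2.10 bug, this estimate degenerates to 1 regardless of the data, which is one reason the number of permutations actually performed matters.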

Regression, Set based clumping and Refactoring

08 Oct 20:27

Update Log

  • Refactored almost the whole code base to make the code cleaner and easier to read, which will hopefully reduce the number of bugs (code related to permutation has not been refactored yet)
  • A bug was found in set-based clumping. When more than 62 (or 30 on 32-bit machines) sets were provided, only the last few sets were properly clumped, possibly leaving some correlated SNPs in earlier sets
  • The new glm algorithm in PRSice was sensitive to collinearity and could give very different results from those calculated in R. This problem is now fixed
  • Problem regarding the --target-list in the Rscript is now fixed
  • Some changes to the log to make things a bit clearer
  • Add some more unit tests
  • Fix a problem when bgen files are used for --ld, where the sample size could be wrong if no external sample was provided and a phenotype file was provided for the target.

Manually tested features

PRSice has a lot of functionality, and I have not been able to write unit tests for most of its features (current unit test coverage is less than 20%). The following features were tested manually using some toy data:

  • Binary PLINK input
    1. Clumping should generate identical results to PLINK 1.9
    2. PRS calculation should be identical to PLINK 1.9 (after accounting for flipping and when using the same input)
    3. MAF filtering should be identical to PLINK (when there are no founders)
    4. Genotype missingness should be identical to that calculated by PLINK
    5. Clumping with a reference panel should generate identical results to PLINK
    6. Filtering on the LD reference panel works as expected
  • Binary GEN input
    1. Clumping should generate identical results to PLINK 1.9 (regardless of whether we use --hard or --allow-inter)
    2. Automatic hard coding (--hard) should generate PRS identical to those calculated using PLINK
    3. PRS calculated using dosage scoring (without --hard) are highly correlated with those generated with --hard
    4. Geno filtering and MAF filtering on the target sample worked as expected.
    5. Geno filtering and MAF filtering on the reference sample worked as expected.
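For readers unfamiliar with what "identical results to PLINK 1.9" clumping means, here is a simplified Python sketch of greedy LD clumping in the spirit of PLINK's --clump: SNPs are visited from most to least significant, each unclumped SNP becomes an index SNP, and nearby SNPs in LD with it are absorbed. The LD lookup is passed in as a callback, the parameter names and default values are illustrative assumptions, and details such as PLINK's secondary p-value threshold are omitted.

```python
def clump(snps, r2, kb=250, r2_threshold=0.1, p_threshold=1.0):
    """Greedy LD clumping (simplified sketch).

    snps: list of (snp_id, chromosome, position_bp, p_value) tuples.
    r2:   callable returning the LD r-squared between two SNP IDs.
    Returns index SNP IDs in order of increasing p-value.
    """
    ordered = sorted(snps, key=lambda s: s[3])   # most significant first
    clumped = set()                              # SNPs absorbed into a clump
    index_snps = []
    for sid, chrom, pos, p in ordered:
        if p > p_threshold:
            break                                # the rest are less significant
        if sid in clumped:
            continue
        index_snps.append(sid)
        for oid, ochrom, opos, _ in ordered:
            if oid == sid or oid in clumped or oid in index_snps:
                continue
            same_window = ochrom == chrom and abs(opos - pos) <= kb * 1000
            if same_window and r2(sid, oid) > r2_threshold:
                clumped.add(oid)
    return index_snps
```

Only pairs within `kb` kilobases on the same chromosome are compared, which is also why clumping can be parallelized by chromosome.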

Things that we have not tested

  1. We have not tested data with founder samples
  2. We have only used the default --missing and --score parameters for our testing
  3. All permutation algorithms (--perm and --set-perm)
  4. Only tested the default genetic model (additive) of the --model parameter
  5. The Windows compilation is not tested
  6. --no-regress and --all-score were also not tested, but should in theory be ok

Features that might be problematic (use with caution)

  1. The INFO scores calculated for --info filtering differ from those calculated by qctool (they are correlated, but in some situations differ quite a lot). We have contacted the author of qctool to find out whether this is an algorithmic difference or a bug in PRSice (we have tried to follow the algorithm in the MaCH paper, and in our manual testing, the values calculated by PRSice and those calculated by hand are identical)
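For reference, one common way to write a MaCH-style imputation quality score is the ratio of the observed variance of the allele dosages to the variance expected under Hardy-Weinberg equilibrium, 2p(1 - p), with p estimated from the mean dosage. The Python sketch below follows that formulation; the function name is hypothetical, the population variance (dividing by n) is an assumption, and this is not necessarily PRSice's exact code.

```python
def mach_rsq(dosages):
    """MaCH-style imputation quality for one SNP (sketch).

    dosages: expected allele counts (0..2), one per sample.
    Returns observed dosage variance / expected binomial variance 2p(1-p),
    or None for a monomorphic SNP where the ratio is undefined.
    """
    n = len(dosages)
    mean_d = sum(dosages) / n
    p = mean_d / 2                      # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        return None
    observed_var = sum((d - mean_d) ** 2 for d in dosages) / n
    return observed_var / (2 * p * (1 - p))
```

Hard calls consistent with Hardy-Weinberg equilibrium give a score of 1, while dosages that are all pulled towards the mean (e.g. every sample at 1.0) carry no information and give 0.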

Note

I try my best to test as many features as possible and am trying to implement as many unit tests as possible. However, given our limited manpower, there will always be features that I have missed or that have not been thoroughly tested.

Please let us know if there are any problems or if PRSice does not generate the expected results.