LiftOver function not working for hg38 to hg19? #7

Open
DavisCammann opened this issue Jan 19, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@DavisCammann

Hello, I am having trouble converting a GWAS file from genome build 38 (hg38) to genome build 37 (hg19). To do this, I specified hg38 for "build" in the .json file of the GWAS, then provided the two prepared dbSNP files for hg19 (--dbsnp-1 and --dbsnp-2) and a --chain-file that specifies conversion from hg38 to hg19.

With this setup, the program finishes at Step 3 without performing LiftOver, as shown below:

SumStatsRehab v1.2.1 - fix command
input build: hg38
=== Step 1: Format the GWAS SS file ===
the SumStats file is a gzip. Unpacking
  Step 1 finished in 67.09121966362 seconds

=== Step 2: Validate entries in the formatted GWAS SS file and save the report ===
number of lines in the file: 20155434
validating entries : 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [02:08<00:00, 157126.27it/s]
calculating reports: 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [01:15<00:00, 268250.15it/s]
generating reports
found issues:
    rsID: 790496/20155433 (3.92%)
  Step 2 finished in 205.60818552970886 seconds

=== Step 3: Analyze the report and prepare for REHAB ===
  Step 3 finished in 0.0001633167266845703 seconds

The input file has nothing to resolve

To see whether it would make a difference, I instead set the "build" of my hg38 GWAS to hg19 in the .json file and provided the same dbSNP and --chain-file files as above. This time LiftOver was performed successfully:

SumStatsRehab v1.2.1 - fix command
input build: hg19
=== Step 1: Format the GWAS SS file ===
the SumStats file is a gzip. Unpacking
  Step 1 finished in 68.19040560722351 seconds

=== Step 2: Validate entries in the formatted GWAS SS file and save the report ===
number of lines in the file: 20155434
validating entries : 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [02:07<00:00, 157908.08it/s]
calculating reports: 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [01:15<00:00, 265598.63it/s]
generating reports
found issues:
    rsID: 790496/20155433 (3.92%)
  Step 2 finished in 205.75064539909363 seconds

=== Step 3: Analyze the report and prepare for REHAB ===
    lifting over   : 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [01:29<00:00, 226464.01it/s]
finished liftover to hg38 (saved report)
number of lines in the file: 20155434
validating entries : 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [02:10<00:00, 154159.25it/s]
calculating reports: 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [01:16<00:00, 262338.98it/s]
generating reports
found issues:
    rsID: 790496/20155433 (3.92%)
    Chr: 834211/20155433 (4.14%)
    BP: 826569/20155433 (4.1%)
790496/20155433 entries are missing rsID
Going to sort the GWAS SS file by Chr and BP
Sorted by Chr and BP
  Step 3 finished in 341.96107244491577 seconds

=== Step 4: REHAB: loopping through the GWAS SS file and fixing entries ===
     loop-fix      : 100%|████████████████████████████████████████████████████████████████████████| 20155433/20155433 [26:07<00:00, 12858.91it/s]
  Step 4 finished in 1567.431545972824 seconds

=== Step 5: Validate entries in the fixed GWAS SS file and save the report ===
number of lines in the file: 20155434
validating entries : 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [02:11<00:00, 153484.10it/s]
calculating reports: 100%|███████████████████████████████████████████████████████████████████████| 20155433/20155433 [01:17<00:00, 259726.21it/s]
generating reports
found issues:
    rsID: 784818/20155433 (3.89%)
    Chr: 834211/20155433 (4.14%)
    BP: 826569/20155433 (4.1%)
  Step 5 finished in 211.63463640213013 seconds

=== Step 6: Analyze the report after REHAB ===
lost 834211 (4.14%) "Chr" fields after liftover
lost 826569 (4.1%) "BP" fields after liftover
restored 5678 (0.03%) "rsID" fields
  Step 6 finished in 0.00021147727966308594 seconds

Those issues which were possible to resolve have been resolved
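
The percentages in the Step 5 and Step 6 reports can be re-derived from the raw counts printed in the log. Here is a small Python sketch of that arithmetic, using only numbers copied from the output above. The variable and function names are just for illustration and are not part of SumStatsRehab.

```python
# Counts copied from the log above.
TOTAL = 20155433               # validated entries reported at every step
MISSING_RSID_BEFORE = 790496   # Step 2 / Step 3 report
MISSING_RSID_AFTER = 784818    # Step 5 report
LOST_CHR = 834211              # Step 6: "Chr" fields lost after liftover
LOST_BP = 826569               # Step 6: "BP" fields lost after liftover


def pct(count, total=TOTAL):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# rsIDs restored = missing before the fix minus missing after the fix.
restored_rsid = MISSING_RSID_BEFORE - MISSING_RSID_AFTER

print(restored_rsid, pct(restored_rsid))  # 5678 0.03
print(pct(LOST_CHR))                      # 4.14
print(pct(LOST_BP))                       # 4.1
print(TOTAL - LOST_CHR)                   # 19321222 entries still have a Chr value
```

So roughly 4% of the entries end up without a chromosome or position after the liftover, which is worth keeping in mind before using the converted file downstream.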

This produces a file that appears to have been converted successfully from hg38 to hg19. Is the LiftOver function of SumStatsRehab only intended to be from lower genome builds to hg38?
It seems that declaring an hg38 GWAS as hg19, and pairing it with dbSNP files for hg19 and a chain file that goes from hg38 to hg19, works. However, declaring the hg38 GWAS as its actual build does nothing when the goal is to convert it to hg19.
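
For context, a chain file is itself directional: each chain maps ungapped blocks of a source assembly onto a destination assembly. The sketch below is a deliberately simplified, standard-library illustration of that mapping on an invented toy chain in the UCSC chain format. It is not SumStatsRehab's implementation, and real tools handle many chains and index them for speed. `TOY_CHAIN`, `parse_blocks`, and `lift` are hypothetical names.

```python
# Toy chain in UCSC chain format. Header fields: chain score
#   tName tSize tStrand tStart tEnd   qName qSize qStrand qStart qEnd   id
# followed by "size dt dq" lines and a final "size" line.
# The names, sizes, and offsets here are invented for illustration only.
TOY_CHAIN = """\
chain 1000 chrA 1000 + 100 400 chrA 900 + 50 330 1
100 50 30
150
"""


def parse_blocks(chain_text):
    """Yield one tuple per ungapped block of every chain in chain_text."""
    for line in chain_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # Start of a new chain: remember names, strand, and start offsets.
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            continue
        size = int(fields[0])
        yield t_name, t_pos, q_name, q_pos, size, q_strand, q_size
        if len(fields) == 3:
            # Skip the gap that follows this block on each side.
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])


def lift(chain_text, chrom, pos):
    """Map a 0-based source position to (chrom, pos) in the destination build.

    Returns None when the position falls in a gap or outside every chain.
    """
    for t_name, t_start, q_name, q_start, size, q_strand, q_size in parse_blocks(chain_text):
        if t_name == chrom and t_start <= pos < t_start + size:
            q_pos = q_start + (pos - t_start)
            if q_strand == "-":
                # Minus-strand coordinates count from the other end.
                q_pos = q_size - 1 - q_pos
            return q_name, q_pos
    return None


print(lift(TOY_CHAIN, "chrA", 120))  # ('chrA', 70)
print(lift(TOY_CHAIN, "chrA", 300))  # ('chrA', 230)
print(lift(TOY_CHAIN, "chrA", 220))  # None, the position lies in a gap
```

In this picture the direction of the conversion comes entirely from the chain, which fits the observation above that an hg38-to-hg19 chain yields hg19 coordinates once the liftover step actually runs. Positions that fall in gaps have no counterpart, which is one common reason for Chr and BP values to go missing after a liftover.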

@nyrvelli

Is the LiftOver function of SumStatsRehab only intended to be from lower genome builds to hg38?

Yes. It was originally designed to lift over earlier builds to build 38. We did not really consider using it in the opposite direction.
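
Until the reverse direction is supported, the workaround from the original report amounts to declaring the input as hg19 in the GWAS .json file while supplying the hg38-to-hg19 chain file. Below is a minimal sketch of preparing such a config copy. It assumes "build" is a top-level key of the .json file. The file names, the `note` key, and the helper `write_config_with_build` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_config_with_build(src, dst, build):
    """Copy a JSON config from src to dst, overriding only its "build" value."""
    config = json.loads(Path(src).read_text())
    config["build"] = build  # assumption: "build" is a top-level key
    Path(dst).write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway directory with a placeholder config.
with tempfile.TemporaryDirectory() as tmp:
    original_path = Path(tmp) / "gwas.json"           # hypothetical file name
    workaround_path = Path(tmp) / "gwas.as_hg19.json"  # hypothetical file name
    original_path.write_text(json.dumps({"build": "hg38", "note": "placeholder key"}))

    updated = write_config_with_build(original_path, workaround_path, "hg19")
    reread = json.loads(workaround_path.read_text())
    untouched = json.loads(original_path.read_text())

print(updated["build"])    # hg19
print(untouched["build"])  # hg38, the original file is left as it was
```

Writing a separate copy keeps the true build on record. Note that with this workaround the log still reports "finished liftover to hg38" even though the chain file targets hg19, so the output is best labelled by hand.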

@Kukuster added the enhancement (New feature or request) label on Apr 24, 2024