Merge pull request #55 from UofS-Pulse-Binfo/Documenation_update_pos_…
…search

Documentation update for position search
laceysanderson committed Feb 12, 2024
2 parents f3dbc6a + e65975d commit 906cf4d
Showing 16 changed files with 162 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/configuration/optional_info.rst
@@ -22,7 +22,7 @@ Germplasm From Header
The names of all germplasm (individuals) in this vcf file. The germplasm list must be new line separated without any header or empty lines.

.. note::
If this textarea is not filled, the module is able to find the list from selected VCF fiels. However, waiting time of extracting germplasm list from a selected file can be sifnificant for large VCF files.
If this textarea is left empty, the module can extract the list from the selected VCF files. However, extracting the germplasm list from a selected file can take a significant amount of time for large VCF files.
``Loading time for a 10G VCF file will be about 3 seconds.``

Since the germplasm list can be generated, it's not necessary to generate such a list for configuration otherwise. We can leave this section blank, select this file and copy generated list back to configuration.
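
The list can also be produced on the command line before filling in the form. The following is a minimal sketch, assuming an uncompressed, tab-separated VCF file whose sample names start at column 10 of the ``#CHROM`` header line, as the VCF format lays out. The function name ``list_germplasm`` and the file names are placeholders and are not part of this module.

```shell
# Hypothetical helper (not part of the module): print one sample name per line
# from the #CHROM header line of an uncompressed, tab-separated VCF file.
list_germplasm() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Usage: list_germplasm your_vcf_file.vcf > germplasm_list.txt
```

For a compressed file, ``bcftools query -l your_vcf_file.vcf.gz`` prints the same kind of list if bcftools is installed.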
1 change: 1 addition & 0 deletions docs/index.rst
@@ -11,3 +11,4 @@ This modules provides a form interface so users can custom filter existing VCF f
features
install
configuration
position_search
12 changes: 12 additions & 0 deletions docs/position_search.rst
@@ -0,0 +1,12 @@
Position Search
===============
VCF Position Search is a companion tool for VCF Bulk Loader. It lets users search for a variant across all available VCF files and provides links back to VCF Bulk Loader for variant-specific filtering.


.. toctree::
   :maxdepth: 2
   :caption: In Detail:

   position_search/ps_features
   position_search/ps_setup
   position_search/ps_troubleshoot
Binary file added docs/position_search/position_search.1.search.png
14 changes: 14 additions & 0 deletions docs/position_search/ps_features.rst
@@ -0,0 +1,14 @@
Features
========

Search for a single variant by Variant Name, or by Backbone Name and Position.

.. image:: position_search.1.search.png

Every file in VCF Bulk Loader that contains this variant is listed in Search Results. Each file name links back to VCF Bulk Loader.

.. image:: position_search.2.results.png

The VCF Bulk Loader page opens with the chosen file selected and the variant position filled in under the Regions filter criteria.

.. image:: position_search.3.link2bulkloader.png
49 changes: 49 additions & 0 deletions docs/position_search/ps_setup.rst
@@ -0,0 +1,49 @@
How to Set Up
=============

VCF Position Search requires Samtools to be installed; see `Samtools <http://www.htslib.org/>`_ for details. The commands on this page use bgzip, tabix and bcftools, which are distributed from the same site.

Prepare files
-------------
Each vcf file needs a corresponding bgzip-compressed file and an index file. The compressed VCF file can be generated by:

.. code:: bash

   bgzip -c your_vcf_file.vcf > your_vcf_file.vcf.gz

Generate a tbi format index (``bcftools index`` writes a csi index by default, so ``-t`` is needed):

.. code:: bash

   bcftools index -t your_vcf_file.vcf.gz

Or, generate a csi format index:

.. code:: bash

   tabix -C your_vcf_file.vcf.gz

.. note::

   Please move the prepared files to the directory provided in VCF Bulk Loader.
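
When several vcf files need preparing, the compress and index steps can be wrapped in a small function. This is only a sketch under stated assumptions: ``prepare_vcf`` is a hypothetical name, the csi index is chosen arbitrarily over tbi, and ``bgzip`` and ``tabix`` are assumed to be on ``PATH``.

```shell
# Hypothetical helper (not part of the module): compress one VCF file with bgzip
# and build a csi index next to it, leaving the original .vcf file untouched.
prepare_vcf() {
  vcf="$1"
  # Stop early if either tool is missing from PATH.
  command -v bgzip >/dev/null 2>&1 || { echo "bgzip not found" >&2; return 1; }
  command -v tabix >/dev/null 2>&1 || { echo "tabix not found" >&2; return 1; }
  bgzip -c "$vcf" > "$vcf.gz" || return 1
  # -p vcf names the format explicitly; -C requests a csi index ("$vcf.gz.csi").
  tabix -p vcf -C "$vcf.gz"
}

# Usage: for f in /path/to/vcf_dir/*.vcf; do prepare_vcf "$f"; done
```

The function stops at the first failing step, so a partly written ``.vcf.gz`` is never indexed.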

Test with test files
--------------------

Test files are included in this module and it's recommended to test before use.

.. note::

   Test files (in vcf_filter/tests/test_files/) include: test_short1.vcf, test_short1.vcf.gz and test_short1.vcf.gz.csi for test1; test_short2.vcf, test_short2.vcf.gz and test_short2.vcf.gz.tbi for test2.


- add both tests to VCF Bulk Loader under admin->Tripal->Extensions->VCF Filter->add, naming them test1 and test2

- give proper access to both tests, and check that the compressed vcf file and index file are provided for each

- search in Position Search: 19p111 should return results for both test1 and test2, Xp9 should return a result for test1 only, and 20p120 should return a result for test2 only

- delete both tests from VCF Bulk Loader
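
The expected results of the search step can be cross-checked against the uncompressed test files without the web interface. The sketch below assumes that search terms follow the ``<backbone>p<position>`` pattern seen in the examples above (e.g. 19p111) and that the vcf file is tab-separated. ``has_variant`` is a hypothetical name and not part of this module.

```shell
# Hypothetical helper: succeed if an uncompressed VCF file has a record at the
# position encoded in a search term such as "19p111" (backbone 19, position 111).
has_variant() {
  term="$1"; vcf="$2"
  chrom="${term%p*}"     # text before the last "p"
  pos="${term##*p}"      # text after the last "p"
  awk -F'\t' -v c="$chrom" -v p="$pos" \
    '!/^#/ && $1 == c && $2 == p { found = 1; exit } END { exit !found }' "$vcf"
}

# Usage: has_variant 19p111 test_short2.vcf && echo "found"
```

A term that succeeds for one test file and fails for the other corresponds to the "test1 only" and "test2 only" cases in the list.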
28 changes: 28 additions & 0 deletions docs/position_search/ps_troubleshoot.rst
@@ -0,0 +1,28 @@
Troubleshoot
============

If VCF Position Search is not working as expected, please check:

- whether bcftools works properly

- whether vcf files can be downloaded properly in VCF Bulk Loader

- whether the compressed vcf file (your_file.vcf.gz) and index file (your_file.vcf.gz.tbi or your_file.vcf.gz.csi) are in the right directory

- whether your compressed vcf files are intact, by running:

  .. code:: bash

     bcftools view your_file.vcf.gz

- whether a single variant (e.g. 19:111) can be retrieved, by running:

  .. code:: bash

     cd directory_containing_vcf_files
     bcftools view --no-header -r 19:111 your_file.vcf.gz

.. note::

   If you have any questions or suggestions, please contact us at knowpulse@usask.ca.
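
The check for the compressed file and index file in the list above can be scripted when many files are involved. This is a rough sketch: ``check_companions`` is a hypothetical name, and it only tests that the files exist next to the given vcf file, not that they are valid.

```shell
# Hypothetical helper: report which companion files are missing for one VCF file.
# Prints nothing and returns 0 when the .vcf.gz file and a .tbi or .csi index exist.
check_companions() {
  vcf="$1"; rc=0
  if [ ! -f "$vcf.gz" ]; then
    echo "missing: $vcf.gz"; rc=1
  fi
  if [ ! -f "$vcf.gz.tbi" ] && [ ! -f "$vcf.gz.csi" ]; then
    echo "missing: $vcf.gz.tbi or $vcf.gz.csi"; rc=1
  fi
  return "$rc"
}

# Usage: for f in /path/to/vcf_dir/*.vcf; do check_companions "$f"; done
```

Files that pass this check can still be corrupt, so the ``bcftools view`` checks remain the more reliable test.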
File renamed without changes.
34 changes: 34 additions & 0 deletions tests/test_files/test_short1.vcf
@@ -0,0 +1,34 @@
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##ALT=<ID=DEL:ME:ALU,Description="Deletion of ALU element">
##ALT=<ID=CNV,Description="Copy number variable region">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
19 111 . A C 9.6 . . GT:HQ 0|0:10,10 0|0:10,10 0/1:3,3
19 112 . A G 10 . . GT:HQ 0|0:10,10 0|0:10,10 0/1:3,3
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3:.,.
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4:.,.
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:.:56,60 0|0:48:4:51,51 0/0:61:2:.,.
20 1234567 microsat1 G GA,GAC 50 PASS NS=3;DP=9;AA=G;AN=6;AC=3,1 GT:GQ:DP 0/1:.:4 0/2:17:2 1/1:40:3
20 1235237 . T . . . . GT 0/0 0|0 ./.
X 9 . A T 12.1 . . GT 0 0/1 1/0
X 10 rsTest AC A,ATG 10 PASS . GT 0 0/1 0|2
X 11 rsTest2 T A,<DEL:ME:ALU> 10 q10;s50 . GT:DP:GQ .:3:10 ./.:.:. 0|2:3:.
X 12 . T A 13 . . GT 0 1/0 1/1
Binary file added tests/test_files/test_short1.vcf.gz
Binary file not shown.
Binary file added tests/test_files/test_short1.vcf.gz.csi
Binary file not shown.
23 changes: 23 additions & 0 deletions tests/test_files/test_short2.vcf
@@ -0,0 +1,23 @@
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FILTER=<ID=q10,Description="Quality below 10">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A
19 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:99:35
19 111 . C T,G 1792 PASS DP=32 GT:GQ:DP 0/1:99:32
19 111 . CAAA C 1792 PASS DP=32 GT:GQ:DP 0/1:99:32
19 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
19 130 . G T 1016 PASS DP=22 GT:GQ:DP 0/1:99:22
19 130 . GAA GG 1016 PASS DP=22 GT:GQ:DP 0/1:99:22
19 140 . GT G 727 PASS DP=30 GT:GQ:DP 0/1:99:30
19 150 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
19 160 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
20 100 . GTTT G 1806 q10 DP=35 GT:GQ:DP 0/1:99:35
20 110 . CAAA C 1792 PASS DP=32 GT:GQ:DP 0/1:99:32
20 120 . GA G 628 q10 DP=21 GT:GQ:DP 1/1:21:21
20 130 . GAA G 1016 PASS DP=22 GT:GQ:DP 0/1:99:22
20 140 . GT G 727 PASS DP=30 GT:GQ:DP 0/1:99:30
20 150 . TAAAA TA,T 246 PASS DP=10 GT:GQ:DP 1/2:12:10
20 160 . TAAAA TA,TC,T 246 PASS DP=10 GT:GQ:DP 0/2:12:10
Binary file added tests/test_files/test_short2.vcf.gz
Binary file not shown.
Binary file added tests/test_files/test_short2.vcf.gz.tbi
Binary file not shown.
