Simplify the Gen3-DRS download option #287

Closed
Vlad-Dembrovskyi opened this issue Nov 18, 2021 · 4 comments · Fixed by #300 or #304
Labels
enhancement New feature or request P0 Top priority: Breaks pipeline or implements wrong command, must be fixed asap

Comments

@Vlad-Dembrovskyi
Contributor

Currently we have to edit the manifest file by hand before passing it to the pipeline, so that it only includes the samples of interest. We need a way to provide just the manifest and the samples of interest, so that the pipeline subsets the manifest itself. Example code to follow shortly.

@Vlad-Dembrovskyi Vlad-Dembrovskyi added enhancement New feature or request P0 Top priority: Breaks pipeline or implements wrong command, must be fixed asap labels Nov 18, 2021
@angarb
Collaborator

angarb commented Nov 23, 2021

Currently this is the method:
image

@angarb
Collaborator

angarb commented Nov 23, 2021

I would like to:

  1. Add an input parameter called manifest (with the input being the .json manifest file downloaded from GTEX)
  2. For the reads.csv, I would like to input a list of the files I want (either bams or crams)
    image
  3. Then, perhaps the subsetting of the manifest file can be done automatically. We would subset the manifest for the bam entries of interest in the reads.csv.

Example of manifest.json file (we will use the reads.csv to subset on "file_name" in the manifest; there could be several file types in the manifest):
image

Example of manifest.csv file (we will use the reads.csv to subset the second column/"file_name" in the manifest):
image
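The subsetting step proposed above could be sketched roughly as follows. This is only an illustration, not the pipeline's actual `filter_manifest.py`: it assumes the manifest is a JSON list of objects with a "file_name" key, and that reads.csv has a header row with the file names in its first column. The `object_id` values, the sample names and both function names are made up for the example.

```python
import csv
import io
import json


def read_wanted_names(reads_csv_text):
    """Return file names from the first column of reads.csv, skipping the header row."""
    rows = list(csv.reader(io.StringIO(reads_csv_text)))
    return [row[0].strip() for row in rows[1:] if row and row[0].strip()]


def subset_manifest(manifest, wanted_names):
    """Keep only manifest entries whose "file_name" is in wanted_names."""
    wanted = set(wanted_names)
    return [entry for entry in manifest if entry.get("file_name") in wanted]


# Made-up example data; only the "file_name" key comes from the discussion above.
manifest_json = json.dumps([
    {"object_id": "example-id-1",
     "file_name": "sample_A.Aligned.sortedByCoord.out.patched.md.bam"},
    {"object_id": "example-id-2",
     "file_name": "sample_A.Aligned.sortedByCoord.out.patched.md.bam.bai"},
    {"object_id": "example-id-3",
     "file_name": "sample_B.Aligned.sortedByCoord.out.patched.bam"},
])
reads_csv = "file_name\nsample_B.Aligned.sortedByCoord.out.patched.bam\n"

subset = subset_manifest(json.loads(manifest_json), read_wanted_names(reads_csv))
print(json.dumps(subset, indent=2))
```

Matching on the exact file name means the .bam.bai entries drop out on their own unless they are listed in reads.csv too.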

@angarb
Collaborator

angarb commented Nov 23, 2021

Alternatively, we could give the specimen ID (GTEX-XXXX-XXXX-XX-XXXXX) and the pipeline would subset the manifest for the matching .bam file entries.

This is possibly preferred, but it is important to note:

  1. There could be .bam and .bam.bai files in the manifest.
  2. Some files end in *.Aligned.sortedByCoord.out.patched.md.bam and some in *.Aligned.sortedByCoord.out.patched.bam.
  3. This means that bam would be the only option (though that seems fine as the cram files appear to only be for DNA seq in GTEX)
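A rough sketch of the specimen-ID variant, taking the three caveats above into account. It assumes each file name starts with the specimen ID followed by a dot, which is a guess based on the suffixes quoted above rather than something confirmed in this thread. The function name and the IDs are hypothetical.

```python
def bam_entries_for_specimens(manifest, specimen_ids):
    """Return manifest entries that are .bam files for the given specimen IDs."""
    selected = []
    for entry in manifest:
        name = entry.get("file_name", "")
        # endswith(".bam") rejects .bam.bai index files, and accepts both
        # *.patched.md.bam and *.patched.bam.
        if not name.endswith(".bam"):
            continue
        # Assumption: the file name is "<specimen id>.<rest of name>".
        if any(name.startswith(specimen_id + ".") for specimen_id in specimen_ids):
            selected.append(entry)
    return selected


# Made-up manifest entries for illustration.
manifest = [
    {"file_name": "GTEX-AAAA-0001-SM-00001.Aligned.sortedByCoord.out.patched.md.bam"},
    {"file_name": "GTEX-AAAA-0001-SM-00001.Aligned.sortedByCoord.out.patched.md.bam.bai"},
    {"file_name": "GTEX-BBBB-0002-SM-00002.Aligned.sortedByCoord.out.patched.bam"},
    {"file_name": "GTEX-CCCC-0003-SM-00003.Aligned.sortedByCoord.out.patched.bam"},
]

picked = bam_entries_for_specimens(
    manifest, ["GTEX-AAAA-0001-SM-00001", "GTEX-BBBB-0002-SM-00002"]
)
picked_names = [entry["file_name"] for entry in picked]
print(picked_names)
```

Appending the dot before comparing keeps one ID from matching another that merely shares its prefix.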

@Vlad-Dembrovskyi
Contributor Author

Addition:
We can save the file names that were requested but not found in the original manifest file into a not_found_GTEX_samples.txt file. We should also print them as warnings to stdout.
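Something along these lines could cover the not-found case. It is a minimal sketch under the same assumption as before (manifest entries carry a "file_name" key); the function names and the warning wording are illustrative, and the demo writes to a temporary directory rather than the pipeline's output folder.

```python
import os
import tempfile


def find_missing(requested_names, manifest):
    """Return requested file names that have no entry in the manifest, in order."""
    present = {entry.get("file_name") for entry in manifest}
    return [name for name in requested_names if name not in present]


def report_missing(missing, out_path):
    """Print one warning per missing name to stdout and save the names to out_path."""
    for name in missing:
        print(f"WARNING: {name} was requested but not found in the manifest")
    # Only create the file when there is something to report.
    if missing:
        with open(out_path, "w") as handle:
            handle.write("\n".join(missing) + "\n")


# Made-up example data.
manifest = [{"file_name": "sample_A.bam"}, {"file_name": "sample_B.bam"}]
requested = ["sample_A.bam", "sample_X.bam", "sample_Y.bam"]

out_path = os.path.join(tempfile.mkdtemp(), "not_found_GTEX_samples.txt")
missing = find_missing(requested, manifest)
report_missing(missing, out_path)
```

Skipping the file when nothing is missing is one way to keep the txt output optional for downstream steps.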

@cgpu cgpu linked a pull request Jan 12, 2022 that will close this issue
imendes93 added a commit to lifebit-ai/splicing-pipelines-nf that referenced this issue Jan 19, 2022
imendes93 added a commit to lifebit-ai/splicing-pipelines-nf that referenced this issue Jan 19, 2022
imendes93 added a commit that referenced this issue Jan 19, 2022
@imendes93 imendes93 linked a pull request Jan 19, 2022 that will close this issue
Vlad-Dembrovskyi added a commit that referenced this issue Feb 22, 2022
* Update usage.md

* Update run_on_sumner.md

* add dockerfile for csvtoolkit

* add process to convert manifest json to csv

* add process to filter manifest by file passed through --reads

* update help message

* fix bug on variable declaration

* Update nextflow.config - fix typo

* Revert "Merge branch 'master' into dev-v2.1-#287"

This reverts commit be2c2ab, reversing
changes made to 04285ef.

* Update main.nf

* patch projectDir error

* Fix publishDir path for manifest

* Fix publishDir path for manifest

* Fix typo

* Update filter_manifest.py

* Update filter_manifest.py

* fix bug on saving filenames that were not in manifest file

* Update filter_manifest.py

* remove logging of samples not found in manifest

* Update filter_manifest.py

* Makes filter_manifest txt output optional

Co-authored-by: angarb <62404570+angarb@users.noreply.github.com>
Co-authored-by: Vlad-Dembrovskyi <64809705+Vlad-Dembrovskyi@users.noreply.github.com>
Co-authored-by: Vlad-Dembrovskyi <vlad@lifebit.ai>
ilevantis pushed a commit that referenced this issue May 19, 2022
* Fixes env gtex issue #290 (#294)

* Change env() to stdout to save sample_name in gen3_drs

* Fix No such property: baseName for class: String

* Gen3-DRS prints md5 "file is good" to log not stdout

* Improves gen3-drs md5 error message

* Changes gtex input to support new manifest file format [#289] (#296)

* Updates ch_gtex_gen3_ids items #289

* Remove duplicate val(obj_id) in input of gen3-drs

Co-authored-by: cgpu <38183826+cgpu@users.noreply.github.com>

* Comments out fasta requirement for gen3-drs input (#297)

* Comments out fasta requirement for gen3-drs input

* Update usage.md that genome_fasta is only for CRAM

* Update usage.md typo

* Fix missing file from path issue

* change GLS executor from parameter to scope (#305)

* Remove gtex (#299)

* Remove mentions of old GTEX download option from main.nf

* Remove mentions of old GTEX download option from help

* Remove mentions of old GTEX download option from usage.md

* Renames Gen3-DRS into new GTEX download option

* Renames Gen3-DRS into new GTEX download opt in usage.md

* Dev v2.1 #287 - Simplify the Gen3-DRS download option (#304)

* Update usage.md

* Update run_on_sumner.md

* add dockerfile for csvtoolkit

* add process to convert manifest json to csv

* add process to filter manifest by file passed through --reads

* update help message

* fix bug on variable declaration

* Update nextflow.config - fix typo

* Revert "Merge branch 'master' into dev-v2.1-#287"

This reverts commit be2c2ab, reversing
changes made to 04285ef.

* Update main.nf

* patch projectDir error

* Fix publishDir path for manifest

* Fix publishDir path for manifest

* Fix typo

* Update filter_manifest.py

* Update filter_manifest.py

* fix bug on saving filenames that were not in manifest file

* Update filter_manifest.py

* remove logging of samples not found in manifest

* Update filter_manifest.py

* Makes filter_manifest txt output optional

Co-authored-by: angarb <62404570+angarb@users.noreply.github.com>
Co-authored-by: Vlad-Dembrovskyi <64809705+Vlad-Dembrovskyi@users.noreply.github.com>
Co-authored-by: Vlad-Dembrovskyi <vlad@lifebit.ai>

* Rename examples/gen3/README.md to examples/GTEX/README.md

Editing folder name to match new "download_from" name.

* Update and rename GEN3_DRS_config.md to GTEX_config.md

Updating parameters

* Delete examples/gen3 directory

* Update usage.md

Moving this information

* Update README.md

* Update README.md

* Delete PRJNA453538.SraRunTable.txt

Not needed

* Delete MCF10_MYCER.datafiles.csv

Not needed

* Create reads.csv

Adding reads.csv example

* Update README.md

* Create manifest.json

Adding example manifest.json

* Update README.md

* Update run_on_cloudos.md

* Update Copying_Files_From_Sumner_to_Cloud.md

Made neater

* Create Star_Index_Generation.md

Co-authored-by: cgpu <38183826+cgpu@users.noreply.github.com>
Co-authored-by: imendes93 <73831087+imendes93@users.noreply.github.com>
Co-authored-by: angarb <62404570+angarb@users.noreply.github.com>