Enhancement snakemake analysis #472

Open
wants to merge 128 commits into
base: master

adamcantor22
Member

@adamcantor22 adamcantor22 commented Jul 23, 2024

Pull Request Template for MMEDS

What has changed

This PR moves MMEDS analysis from a structure built on "Tool" classes (one file per process, each modified independently) to a central Analysis class that runs snakemake workflows. Two workflows are included in this release: 'core_pipeline_taxonomic', which replaces and improves on the old "Qiime2" tool, and 'lefse', a vast improvement over the old lefse tool.
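
To illustrate the shape of this change, here is a minimal sketch of a central class that selects a workflow by name and builds the snakemake call for it. It is not the MMEDS implementation: the `Analysis` fields, the `snakemake_workflows` directory, and the `<workflow>.Snakefile` naming are all assumptions made for the example. Only the snakemake flags (`--snakefile`, `--configfile`, `--directory`, `--cores`) are real CLI options.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Analysis:
    """Hypothetical stand-in for a central analysis class (not the MMEDS code)."""

    workflow: str       # e.g. "core_pipeline_taxonomic" or "lefse"
    config_file: Path   # YAML config for this run
    workdir: Path       # directory the workflow runs in
    cores: int = 1
    # Assumed layout: one Snakefile per workflow under a shared directory.
    snakefile_root: Path = Path("snakemake_workflows")

    def command(self) -> list[str]:
        """Build the snakemake invocation without running it."""
        snakefile = self.snakefile_root / f"{self.workflow}.Snakefile"
        return [
            "snakemake",
            "--snakefile", str(snakefile),
            "--configfile", str(self.config_file),
            "--directory", str(self.workdir),
            "--cores", str(self.cores),
        ]


analysis = Analysis("lefse", Path("lefse_config.yaml"), Path("runs/example"), cores=4)
print(" ".join(analysis.command()))
```

Under this layout, adding a workflow means adding a Snakefile rather than a new class, which is the main contrast with a one-file-per-tool structure.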

Checklist of pre-requisites

  • Does the code run?
  • Does the code follow the repository style?
  • Is the code tested?

How to use the feature

Analyses are started the same way as before: by running the script run_analysis.py, or by navigating to the Analysis page of the website and entering the information there. New config templates can be found in resources under config_file.yaml and lefse_config.yaml.

Additional notes

I've decided to deprecate summaries for now. They're odd at best and usually completely useless in their current form. I'm leaving the summary file in, entirely commented out, for future reference or support.

Issue Closings

Closes #466
Closes #457 (will move the relevant comments to a new issue)
Closes #449
Closes #386
Closes #400
Closes #430
Closes #438
Closes #437
Closes #382

@adamcantor22 adamcantor22 added this to the 0.9.0 milestone Aug 8, 2024
@adamcantor22 adamcantor22 marked this pull request as ready for review August 8, 2024 19:42
@@ -1036,7 +1036,6 @@ def get_sequencing_run_locations(self, metadata, user, column=("RawDataProtocol"
    for run in df[column]:
        if run not in runs:
Collaborator

this conditional confuses me

Member Author

you're right this is bad, it's an old bit of code from before I knew about the unique() function, fixed it now.
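
For readers following the thread, the sketch below contrasts the membership-check pattern with a one-step de-duplication. It assumes the original loop appended each unseen run to `runs`; the diff context does not show that line. The sample values are made up, and the stdlib `dict.fromkeys` stands in for the pandas call so the snippet runs without dependencies.

```python
# Hypothetical values standing in for one metadata column.
runs_column = ["run_A", "run_B", "run_A", "run_C", "run_B"]

# Manual pattern: check membership before appending (assumed original behaviour).
runs = []
for run in runs_column:
    if run not in runs:
        runs.append(run)

# Same result in one step, keeping first-seen order.
# On a pandas column the equivalent is df[column].unique().
unique_runs = list(dict.fromkeys(runs_column))

print(unique_runs)
```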

Collaborator

Is this the file where parallelization fails?

Member Author

no, this file just has the functions that define what splits will be requested when snakemake is called. nothing actually running from this when those errors occur
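
As a rough illustration of what "defining the splits requested" could look like, the sketch below turns a list of metadata columns into target paths and passes them to snakemake as positional targets. The function name, column names, and path layout are hypothetical and not taken from the MMEDS code. Nothing here runs a workflow; it only builds the request.

```python
def requested_targets(split_columns, results_dir="results"):
    """Return one hypothetical output path per metadata column to split on."""
    return [f"{results_dir}/split_{column}/done.txt" for column in split_columns]


# Hypothetical metadata columns; snakemake accepts target files as positional arguments.
targets = requested_targets(["Site", "Timepoint"])
command = ["snakemake", "--cores", "1", *targets]
print(command)
```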

iterations: 5
permutations: 5
type: spearman
alpha_metrics: all
Collaborator

where do the parameters for sparCC go?

Member Author

those SparCC parameters are example parameters from earlier MMEDS, where you would specify additional analysis in this config file. Basically it was trying to say "after running standard analysis, also run SparCC". So that's no longer a valid example. As for running SparCC otherwise, I haven't gotten to that yet; it will be part of a future PR.

2 participants