Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "1m" timespan in Nextstrain profiles #1027

Merged
merged 5 commits into from
Nov 24, 2022
Merged

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Nov 2, 2022

Description of proposed changes

Motivated by this Slack thread.

Everything is copied from entries defining the existing "2m" timespan. Comments and other timespan references updated accordingly.

Preview staging builds

Note that these won't be ready until the trial runs have finished uploading.

Related issue(s)

N/A

Testing

Release checklist

If this pull request introduces backward incompatible changes, complete the following steps for a new release of the workflow N/A

If this pull request introduces new features, complete the following steps:

  • Update docs/src/reference/change_log.md in this pull request to document these changes by the date they were added.

@corneliusroemer
Copy link
Member

corneliusroemer commented Nov 3, 2022

Here are the job ids so one can check logs, open should be done by now but isn't...

nextstrain build --aws-batch --no-download --attach a3447a21-283d-4439-aef0-4f837dbcc89e .
nextstrain build --aws-batch --no-download --attach e64c991b-5d71-4eab-853a-9f1b757f1cf5 .

Jobs failed @victorlin

log1.txt
log2.txt

Some errors I don't understand:

./scripts/upload-to-s3 results/reference/reference_subsampled_sequences.fasta.xz s3://nextstrain-staging/files/ncov/open/trial/1m-timespan/reference/sequences.fasta.>
  19207 [batch] [2022-11-02T22:15:34+01:00] An error occurred (404) when calling the HeadObject operation: Not Found
  19208 [batch] [2022-11-02T22:15:34+01:00] 663b2d01b1d92c7ebafd75ae86e492b0c448e1165d294dee2c245ee883223306 results/reference/reference_subsampled_sequences.fasta.xz
  19209 [batch] [2022-11-02T22:15:34+01:00] 0000000000000000000000000000000000000000000000000000000000000000 s3://nextstrain-staging/files/ncov/open/trial/1m-timespan/reference/sequences.fasta.xz
  19210 [batch] [2022-11-02T22:15:34+01:00] Uploading results/reference/reference_subsampled_sequences.fasta.xz → s3://nextstrain-staging/files/ncov/open/trial/1m-timespan/reference/sequences.fasta.xz
  19211 [batch] [2022-11-02T22:15:35+01:00] upload: results/reference/reference_subsampled_sequences.fasta.xz to s3://nextstrain-staging/files/ncov/open/trial/1m-timespan/reference/sequences.fasta.xz
  19212 [batch] [2022-11-02T22:15:35+01:00] 

Open seems to have worked apart from some uploads that failed.

GISAID meanwhile failed like this:

[Wed Nov  2 21:47:54 2022]
   6822 [batch] [2022-11-02T22:47:54+01:00] Error in rule logistic_growth:
   6823 [batch] [2022-11-02T22:47:54+01:00]     jobid: 279
   6824 [batch] [2022-11-02T22:47:54+01:00]     output: results/africa_1m/logistic_growth.json
   6825 [batch] [2022-11-02T22:47:54+01:00]     log: logs/logistic_growth_africa_1m.txt (check log file(s) for error message)
   6826 [batch] [2022-11-02T22:47:54+01:00]     shell:
   6827 [batch] [2022-11-02T22:47:54+01:00]         
   6828 [batch] [2022-11-02T22:47:54+01:00]         python3 scripts/calculate_delta_frequency.py             --tree results/africa_1m/tree.nwk             --frequencies results/africa_1m/tip-frequencies.json  >
   6829 [batch] [2022-11-02T22:47:54+01:00]         
   6830 [batch] [2022-11-02T22:47:54+01:00]         (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
   6831 [batch] [2022-11-02T22:47:54+01:00] Logfile logs/logistic_growth_africa_1m.txt:
   6832 [batch] [2022-11-02T22:47:54+01:00] Traceback (most recent call last):
   6833 [batch] [2022-11-02T22:47:54+01:00]   File "/nextstrain/build/scripts/calculate_delta_frequency.py", line 86, in <module>
   6834 [batch] [2022-11-02T22:47:54+01:00]     delta_time = pivots[last_pivot_index] - pivots[first_pivot_index]
   6835 [batch] [2022-11-02T22:47:54+01:00] IndexError: index -7 is out of bounds for axis 0 with size 5
   6836 [batch] [2022-11-02T22:48:12+01:00] augur traits is using TreeTime version 0.9.4
   6837 [batch] [2022-11-02T22:48:12+01:00] Assigned discrete traits to 1338 out of 1338 taxa.
   6838 [batch] [2022-11-02T22:48:12+01:00] NOTE: previous versions (<0.7.0) of this command made a 'short-branch
   6839 [batch] [2022-11-02T22:48:12+01:00] length assumption. TreeTime now optimizes the overall rate numerically
   6840 [batch] [2022-11-02T22:48:12+01:00] and thus allows for long branches along which multiple changes
   6841 [batch] [2022-11-02T22:48:12+01:00] accumulated. This is expected to affect estimates of the overall rate
   6842 [batch] [2022-11-02T22:48:12+01:00] while leaving the relative rates mostly unchanged.
   6843 [batch] [2022-11-02T22:48:12+01:00] Inferred ancestral states of discrete character using TreeTime:
   6844 [batch] [2022-11-02T22:48:12+01:00]     Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
   6845 [batch] [2022-11-02T22:48:12+01:00]     Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731
   6846 [batch] [2022-11-02T22:48:12+01:00] results written to results/africa_1m/traits.json

We should probably have an option --keep-going enabled in Snakemake so not the entire build stops when a single build has an issue.

Copy link
Member

@corneliusroemer corneliusroemer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both trial runs unfortunately failed, see comment

@victorlin
Copy link
Member Author

Thanks for digging into that @corneliusroemer. I'm about to sign out for the day, so feel free to make changes on the PR branch to your desire. Otherwise I can pick up on it tomorrow.

@victorlin
Copy link
Member Author

As @trvrb mentioned in Slack, logistic growth is set to 6 weekly pivots which is too much when frequencies is limited by --min-date 1M. I've pushed c220e1d which should fix this. Will run new trial builds.

@victorlin
Copy link
Member Author

Also @corneliusroemer it's the same error for both GISAID and open.

@victorlin victorlin force-pushed the victorlin/use-weekly-grouping-for-2m branch 2 times, most recently from e747ab8 to 081485d Compare November 3, 2022 17:50
Base automatically changed from victorlin/use-weekly-grouping-for-2m to master November 3, 2022 17:53
@corneliusroemer
Copy link
Member

Excellent, now it works. I think we can make the bucket size bigger to get more recent sequences for the open build. We only have 1.3k sequences from last 1M and 2k sequences in total.

A build with 4k sequences is a good size, so we can add another 2k sequences from the most recent month to make it as useful as possible.

Everything is copied from entries defining the existing "2m" timespan.
Comments and other timespan references updated accordingly.
This avoids an uncaught IndexError that had occurred with
pivots[first_pivot_index] when first_pivot_index was out of bounds.
@victorlin
Copy link
Member Author

@corneliusroemer I've just rebased onto latest master so it has your changes to upload builds as they are done. I think it'd be best if you take over this branch now and tweak to your desire. Would you be willing to do that?

@@ -5,6 +5,10 @@ We also use this change log to document new features that maintain backward comp

## New features since last version update

- X November 2022: Add "2m" timespan in Nextstrain profile builds. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- X November 2022: Add "2m" timespan in Nextstrain profile builds. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)
- 8 November 2022: Add "1m" timespan in Nextstrain profile builds. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)

@@ -5,6 +5,10 @@ We also use this change log to document new features that maintain backward comp

## New features since last version update

- X November 2022: Add "2m" timespan in Nextstrain profile builds. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)

- X November 2022: calculate_delta_frequency: Allow script to work with fewer pivots available than requested with `--delta-pivots`. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- X November 2022: calculate_delta_frequency: Allow script to work with fewer pivots available than requested with `--delta-pivots`. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)
- 8 November 2022: calculate_delta_frequency: Allow script to work with fewer pivots available than requested with `--delta-pivots`. [PR 1027](https://github.com/nextstrain/ncov/pull/1027)

@corneliusroemer corneliusroemer marked this pull request as ready for review November 8, 2022 22:58
@corneliusroemer corneliusroemer requested a review from a team November 8, 2022 22:58
@victorlin victorlin force-pushed the victorlin/1m-timespan branch 2 times, most recently from df21d8e to 2bd8471 Compare November 23, 2022 20:49
@corneliusroemer corneliusroemer merged commit 271f3c8 into master Nov 24, 2022
@corneliusroemer corneliusroemer deleted the victorlin/1m-timespan branch November 24, 2022 14:41
@victorlin
Copy link
Member Author

@corneliusroemer FYI, I'm deleting your branch very-recent-1m at ca23d68 (it looks like an old test branch related to this PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants