Commit
Merge branch 'develop' into sps_use_new_quota_table
salonishah11 committed Sep 3, 2024
2 parents 63ab888 + 9b2e3c6 commit f21b11f
Showing 116 changed files with 711 additions and 205 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/azure_e2e_run_workflow.yml
@@ -1,7 +1,5 @@
name: 'Azure e2e - Run Workflow'
on:
schedule:
- cron: '0 16 * * *' # UTC 4pm, EST 11am, EDT 12pm
workflow_dispatch:

env:
27 changes: 18 additions & 9 deletions .github/workflows/chart_update_on_merge.yml
@@ -7,7 +7,7 @@ on:

jobs:
chart-update:
name: Cromwhelm Chart Auto Updater
name: Cromwell Version Auto Updater
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
steps:
@@ -24,7 +24,9 @@ jobs:
with:
distribution: 'temurin'
java-version: '17'
- name: Clone Cromwhelm
- name: (DISABLED) Clone Cromwhelm
# WX-1837 disabling CI for this chart, used by AoU RWB only
if: false
uses: actions/checkout@v2
with:
repository: broadinstitute/cromwhelm
@@ -73,7 +75,9 @@ jobs:
repository: broadinstitute/terra-helmfile
event-type: update-service
client-payload: '{"service": "cromiam", "version": "${{ env.CROMWELL_VERSION }}", "dev_only": false}'
- name: Edit & push cromwhelm chart
- name: (DISABLED) Edit & push cromwhelm chart
# WX-1837 disabling CI for this chart, used by AoU RWB only
if: false
env:
BROADBOT_GITHUB_TOKEN: ${{ secrets.BROADBOT_GITHUB_TOKEN }}
run: |
@@ -90,29 +94,34 @@ jobs:
git push https://broadbot:$BROADBOT_GITHUB_TOKEN@github.com/broadinstitute/cromwhelm.git main
cd -
- name: Clone terra-helmfile
### WX-1836 Steps below here are disabled Azure CI

- name: (DISABLED) Clone terra-helmfile
uses: actions/checkout@v3
if: false
with:
repository: broadinstitute/terra-helmfile
token: ${{ secrets.BROADBOT_GITHUB_TOKEN }} # Has to be set at checkout AND later when pushing to work
path: terra-helmfile

- name: Update workflows-app in terra-helmfile
- name: (DISABLED) Update workflows-app in terra-helmfile
if: false
run: |
set -e
cd terra-helmfile
sed -i "s|image: broadinstitute/cromwell:.*|image: broadinstitute/cromwell:$CROMWELL_VERSION|" charts/workflows-app/values.yaml
cd -
- name: Update cromwell-runner-app in terra-helmfile
- name: (DISABLED) Update cromwell-runner-app in terra-helmfile
if: false
run: |
set -e
cd terra-helmfile
sed -i "s|image: broadinstitute/cromwell:.*|image: broadinstitute/cromwell:$CROMWELL_VERSION|" charts/cromwell-runner-app/values.yaml
cd -
- name: Make PR in terra-helmfile
- name: (DISABLED) Make PR in terra-helmfile
if: false
env:
BROADBOT_TOKEN: ${{ secrets.BROADBOT_GITHUB_TOKEN }}
GH_TOKEN: ${{ secrets.BROADBOT_GITHUB_TOKEN }}
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -20,11 +20,13 @@ be found [here](https://cromwell.readthedocs.io/en/stable/backends/HPC/#optional

- The `genomics` configuration entry was renamed to `batch`, see [ReadTheDocs](https://cromwell.readthedocs.io/en/stable/backends/GCPBatch/) for more information.
- Fixes a bug with not being able to recover jobs on Cromwell restart.
- Fixes machine type selection to match the Google Cloud Life Sciences backend, including defaulting to n1 non-shared-core machine types and correctly handling `cpuPlatform` to select n2 or n2d machine types as appropriate.
- Fixes preemption error handling so that the correct error message is printed; other potential exit codes are now handled as well.
- Fixes error message reporting for failed jobs.
- Fixes the "retry with more memory" feature.
- Fixes pulling Docker image metadata from private GCR repositories.
- Fixed `google_project` and `google_compute_service_account` workflow options not taking effect when using the GCP Batch backend.
- Added a way to use a custom LogsPolicy for job execution. Setting `backend.providers.batch.config.batch.logs-policy` to "CLOUD_LOGGING" (the default) keeps the current behavior; setting it to "PATH" writes the logs to the mounted disk and, at the end of the job, copies that log file to the Google Cloud Storage bucket under the name "task.log" (see the config sketch after this diff).

### Improved handling of Life Sciences API quota errors

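For context on the GCP Batch configuration changes noted in the changelog above (the `genomics` entry renamed to `batch`, and the new `logs-policy` setting), here is a minimal sketch of a Cromwell backend stanza in HOCON. Only the `backend.providers.<name>.config.batch.logs-policy` path and its "CLOUD_LOGGING"/"PATH" values come from the changelog; the provider name, actor-factory class, project, root, auth, and location values are illustrative assumptions, not part of this commit.

backend {
  providers {
    GCPBATCH {  # provider name is illustrative
      actor-factory = "cromwell.backend.google.batch.GcpBatchBackendLifecycleActorFactory"  # assumed class name
      config {
        project = "my-google-project"        # illustrative
        root = "gs://my-bucket/executions"   # illustrative
        batch {                              # formerly the `genomics` entry
          auth = "application-default"       # illustrative
          location = "us-central1"           # illustrative
          # "CLOUD_LOGGING" (default) keeps the current behavior;
          # "PATH" writes logs to the mounted disk, then copies the file to the GCS bucket as "task.log".
          logs-policy = "CLOUD_LOGGING"
        }
      }
    }
  }
}
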
1 change: 1 addition & 0 deletions build.sbt
@@ -237,6 +237,7 @@ lazy val googlePipelinesV2Beta = (project in backendRoot / "google" / "pipelines

lazy val googleBatch = (project in backendRoot / "google" / "batch")
.withLibrarySettings("cromwell-google-batch-backend")
.dependsOn(core)
.dependsOn(backend)
.dependsOn(gcsFileSystem)
.dependsOn(drsFileSystem)
@@ -1,7 +1,7 @@
name: call_cache_cha_cha_papi
testFormat: CromwellRestartWithRecover
callMark: call_cache_cha_cha.sleep_during_restart
backends: [Papi]
backends: [Papi, GCPBATCH]
tags: [engine_upgrade]
retryTestFailures: false

@@ -6,7 +6,7 @@ testFormat: workflowsuccess
# `ignore`ing for now but hopefully can be re-enabled in Jenkins if full metadata fetches can be sidestepped (BT-380)
# or perhaps migrated to the Cromwell perf environment.
ignore: true
backends: [ Papiv2 ]
backends: [ Papiv2, GCPBATCH ]

files {
workflow: scale/lots_of_inputs_scattered/lots_of_inputs_scattered.wdl
@@ -1,6 +1,6 @@
name: cromwell_restart
testFormat: CromwellRestartWithRecover
backends: [Papiv1]
backends: [Papiv1, GCPBATCH]
callMark: cromwell_restart.cromwell_killer
retryTestFailures: false

@@ -1,6 +1,6 @@
name: papi_upgrade
testFormat: PapiUpgrade
backends: [Papi]
backends: [Papi, GCPBATCH]
tags: [papi_upgrade]
callMark: papi_upgrade.cromwell_killer
retryTestFailures: false
@@ -4,6 +4,7 @@ tags: ["wdl_biscayne"]

# This test should only run in the Local suite, on its default `Local` backend. Unfortunately the `Local` backend
# leaks into other suites, so require an irrelevant `LocalNoDocker` backend that is only found in Local suite.
backendsMode: all
backends: [Local, LocalNoDocker]

files {
@@ -1,6 +1,6 @@
name: bucket_name_with_trailing_slash
testFormat: workflowfailure
backends: [Papiv2]
backends: [Papiv2, GCPBATCH]

files {
workflow: attempt_to_localize_bucket_as_file/attempt_to_localize_bucket_as_file.wdl
@@ -1,6 +1,8 @@
name: call_cache_capoeira_jes
testFormat: workflowsuccess
backends: [Papi, Local, GCPBATCH]
# hashing backend name, doesn't match
# -Metadata mismatch for calls.call_cache_capoeira.read_array_files.callCaching.hashes.backend name - expected: "36EF4A8AB268D1A1C74D8108C93D48ED" but got: "F9B949AB11D336FE12AEEF8C8DB7D3F9"
backends: [Papi, GCPBATCH_NEEDS_ALT]
retryTestFailures: false

files {
@@ -3,7 +3,8 @@
# should not see the first's cache entries.
name: call_cache_hit_prefixes_two_roots_empty_hint_cache_hit_papi
testFormat: runthriceexpectingcallcaching
backends: [Papi, GCPBATCH]
# don't know
backends: [Papi, GCPBATCH_FAIL]

files {
workflow: call_cache_hit_prefixes/call_cache_hit_prefixes.wdl
@@ -2,7 +2,12 @@
# Each run has different "jes_gcs_root"s so they should not see each other's cache entries.
name: call_cache_hit_prefixes_two_roots_empty_hint_cache_miss_papi
testFormat: runtwiceexpectingnocallcaching
backends: [Papi, GCPBATCH]
# not sure why failing, backend names should be different from PAPI so should not cross-talk
#
# should NOT call cache the second run of call_cache_hit_prefixes_two_roots_empty_hint_cache_miss_papi *** FAILED *** (11 minutes, 11 seconds)
# centaur.test.CentaurTestException: Found unexpected cache hits for call_cache_hit_prefixes_two_roots_empty_hint_cache_miss_papi:
# calls.call_cache_hit_prefixes.yo.callCaching.result: Cache Hit: 19e522ed-685e-4c53-9d49-949d8b05a2a9:call_cache_hit_prefixes.yo:-1
backends: [Papi, GCPBATCH_FAIL]

files {
workflow: call_cache_hit_prefixes/call_cache_hit_prefixes.wdl
@@ -1,6 +1,7 @@
name: check_network_in_vpc
testFormat: workflowsuccess
backends: [Papiv2-Virtual-Private-Cloud-Labels, Papiv2-Virtual-Private-Cloud-Literals]
# alt exists but is not exactly as capable as PAPI v2, see https://github.com/broadinstitute/cromwell/pull/7505
backends: [Papiv2-Virtual-Private-Cloud-Labels, Papiv2-Virtual-Private-Cloud-Literals, GCPBATCH_ALT]

files {
workflow: virtual_private_cloud/check_network_in_vpc.wdl
@@ -1,6 +1,20 @@
name: custom_mount_point
testFormat: workflowsuccess
backends: [Papi, GCPBATCH]

# Error returned to Centaur:
#
# Task custom_mount_point.t:NA:1 failed: Job state is set from RUNNING to FAILED for job projects/blah/locations/us-central1/jobs/job-blah-blah.Job failed due to task failure. Specifically, task with index 0 failed due to the following task event: \"Task state is updated from RUNNING to FAILED on zones/us-central1-b/instances/blah with exit code 125.
#
# Exit code 125 appears to be a Docker invocation error. Seeing the following in the logs which seems to suggest the mount point could not be created:
# docker: Error response from daemon: error while creating mount source path '/some/mnt': mkdir /some: read-only file system.
#
# The /some/mnt volume appears to be mounted read-write:
# "volumes": [
# "/mnt/disks/cromwell_root:/mnt/disks/cromwell_root:rw",
# "/some/mnt:/some/mnt:rw"
# ]

backends: [Papi, GCPBATCH_FAIL]

files {
workflow: custom_mount_point/custom_mount_point.wdl
@@ -1,6 +1,7 @@
name: dedup_localizations_papi_v2
testFormat: workflowsuccess
backends: [Papiv2, GCPBATCH]
# don't know
backends: [Papiv2, GCPBATCH_FAIL]

files {
workflow: dedup_localizations_papi_v2/dedup_localizations_papi_v2.wdl
@@ -1,6 +1,6 @@
name: do_not_retry_rc0
testFormat: workflowsuccess
backends: [Papiv2]
backends: [Papiv2, GCPBATCH_ALT]

files {
workflow: retry_with_more_memory/do_not_retry_rc0.wdl
@@ -1,6 +1,6 @@
name: do_not_retry_rc1
testFormat: workflowsuccess
backends: [Papiv2, GCPBATCH]
backends: [Papiv2, GCPBATCH_ALT]

files {
workflow: retry_with_more_memory/do_not_retry_rc1.wdl
@@ -0,0 +1,13 @@
task dockerhub {
command {
echo "hello"
}
runtime {
docker: "broadinstitute/cloud-cromwell:dev"
backend: "Papiv2NoDockerHubConfig"
}
}

workflow docker_hash_dockerhub_private {
call dockerhub
}
@@ -0,0 +1,13 @@
task dockerhub {
command {
echo "hello"
}
runtime {
docker: "broadinstitute/cloud-cromwell:dev"
backend: "Papiv2USADockerhub"
}
}

workflow docker_hash_dockerhub_private {
call dockerhub
}
@@ -1,6 +1,7 @@
name: docker_hash_dockerhub_private
testFormat: workflowsuccess
backends: [Papi, GCPBATCH]
# see https://github.com/broadinstitute/cromwell/pull/7515
backends: [Papi, GCPBATCH_FAIL]

files {
workflow: docker_hash/docker_hash_dockerhub_private.wdl
@@ -1,9 +1,10 @@
name: docker_hash_dockerhub_private_config_usa_wf_options
testFormat: workflowsuccess
backends: [Papiv2USADockerhub]
# see https://github.com/broadinstitute/cromwell/pull/7515
backends: [Papiv2USADockerhub, GCPBATCH_FAIL, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_hash/docker_hash_dockerhub_private.wdl
workflow: docker_hash/docker_hash_dockerhub_private_usa_dockerhub.wdl
# Updated the options to read_from_cache: false for
# https://github.com/broadinstitute/cromwell/issues/3998
options-dir: "Error: BA-6546 The environment variable CROMWELL_BUILD_RESOURCES_DIRECTORY must be set/export pointing to a valid path such as '${YOUR_CROMWELL_DIR}/target/ci/resources'"
@@ -1,9 +1,10 @@
name: docker_hash_dockerhub_private_wf_options
testFormat: workflowsuccess
backends: [Papiv2NoDockerHubConfig]
# see https://github.com/broadinstitute/cromwell/pull/7515
backends: [Papiv2NoDockerHubConfig, GCPBATCH_FAIL]

files {
workflow: docker_hash/docker_hash_dockerhub_private.wdl
workflow: docker_hash/docker_hash_dockerhub_private_no_dockerhub_config.wdl
# Updated the options to read_from_cache: false for
# https://github.com/broadinstitute/cromwell/issues/3998
options-dir: "Error: BA-6546 The environment variable CROMWELL_BUILD_RESOURCES_DIRECTORY must be set/export pointing to a valid path such as '${YOUR_CROMWELL_DIR}/target/ci/resources'"
@@ -1,6 +1,7 @@
name: docker_image_cache_false_false
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if we're bringing over Docker image cache
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_false.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_false_true
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if we're keeping Docker image cache
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_false.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_false_unspecified
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if Docker image caches
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_false.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_true_false
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if we're doing Docker image caches in GCPBATCH
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_true.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_true_true
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if Docker image caches
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_true.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_true_unspecified
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# This will need an alt if we want to bring Docker image cache to GCPBATCH which I'm not sure we do
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_true.wdl
@@ -1,6 +1,6 @@
name: docker_image_cache_unspecified_false
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_unspecified.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_unspecified_true
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# if Docker image caches
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_unspecified.wdl
@@ -1,6 +1,7 @@
name: docker_image_cache_unspecified_unspecified
testFormat: workflowsuccess
backends: [Papiv2-Docker-Image-Cache]
# needs an alt if we're going to keep Docker image cache tests which I'm not sure we are
backends: [Papiv2-Docker-Image-Cache, GCPBATCH_NEEDS_ALT]

files {
workflow: docker_image_cache/docker_image_cache_unspecified.wdl
@@ -1,6 +1,9 @@
name: docker_size_dockerhub
testFormat: workflowsuccess
backends: [Papiv2, GCPBATCH]
# Not testing this for GCP Batch since Batch seems to give us a 30 GiB boot volume even when we ask for 10 GiB.
# Honestly that's fine, 30 GiB should be big enough to keep us out of trouble with large Docker images without being
# noticeably more expensive.
backends: [Papiv2, GCPBATCH_SKIP]

files {
workflow: docker_size/docker_size_dockerhub.wdl
(Diff listing truncated; the remaining changed files are not shown.)
