Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: gate hpc by request #9198

Merged
merged 5 commits into from
May 1, 2024
Merged

ci: gate hpc by request #9198

merged 5 commits into from
May 1, 2024

Conversation

eecsliu
Copy link
Contributor

@eecsliu eecsliu commented Apr 18, 2024

Ticket

Description

Quick solution so that slurm tests aren't run by default every push. We ideally do not want to run HPC tests on every commit to decrease spend and runtime, so for now we will ask users to enter circleCI and manually approve of requests to run these tests. A future improvement is to automatically decide which tests to run based on the modified files

Test Plan

Check CircleCI and make sure that under test-e2e, the slurm and hpc tests are not running and are awaiting approval

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@eecsliu eecsliu requested review from a team as code owners April 18, 2024 21:05
@cla-bot cla-bot bot added the cla-signed label Apr 18, 2024
@determined-ci determined-ci requested a review from a team April 18, 2024 21:06
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci added the documentation Improvements or additions to documentation label Apr 18, 2024
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

Copy link

netlify bot commented Apr 18, 2024

Deploy Preview for determined-ui ready!

Name Link
🔨 Latest commit 7358578
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/663276af0e79a900088fb43c
😎 Deploy Preview https://deploy-preview-9198--determined-ui.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Apr 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 44.58%. Comparing base (b78020d) to head (7358578).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9198      +/-   ##
==========================================
+ Coverage   44.56%   44.58%   +0.01%     
==========================================
  Files        1275     1275              
  Lines      156216   156216              
  Branches     2451     2451              
==========================================
+ Hits        69625    69647      +22     
+ Misses      86351    86329      -22     
  Partials      240      240              
Flag Coverage Δ
backend 41.74% <ø> (+0.04%) ⬆️
harness 64.09% <ø> (ø)
web 35.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

see 6 files with indirect coverage changes

@determined-ci determined-ci removed the documentation Improvements or additions to documentation label Apr 18, 2024
@eecsliu eecsliu removed request for a team, ashtonG and jgongd April 18, 2024 21:48
@@ -4207,11 +4207,16 @@ workflows:
only:
- main

- request-e2e-hpc-tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests won't run on main automatically with this change right?

I think we need to duplicate the jobs into test-e2e-longrunning and put the request-e2e-hpc-tests there.

@@ -4286,6 +4293,7 @@ workflows:
mark: ["e2e_slurm and not parallel and not gpu_required"]
requires:
- build-go
- request-e2e-hpc-tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have another layer of skipping tests that only runs certain slurm tests with github pr labels
https://github.com/determined-ai/determined/blob/e5ba597858bcc6cf37d6db42d2720a21a9c3e10e/.circleci/real_config.yml#L2556-L2561C15

test-e2e-hpc-gcp will run every time though?

Personally I'm not against just removing the label running mechanism

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a look

host: localhost
port: 5432
name: determined
password: launcher

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use circleci env vars for passwords

Copy link

@molinamelendezj molinamelendezj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one NIT.

- unless:
condition:
or:
- equal: [ true, <<parameters.always-run>> ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think always-run is unused now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will remove

mark: ["e2e_pbs and not parallel and not gpu_required"]
extra-pytest-flags: ["-k 'not test_slurm_verify_home'"]
requires:
- build-go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I think the dependencies are mixed up? build-go is behind test-perf-feature branch so that it doesn't run on every commit twice

package and publish system is behind the package test requests.

Ideally I think approving the hpc tests should be enough to run them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the dependencies can be fixed pretty easily, but I'm not sure what you meant by build-go would get run twice on a commit. Would it be an issue just to move it outside of the request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example build-react runs in e2e-tests and also in the test-e2e-longrunning. In a perfect world we would only build go / react once per commit. These tests run on every commit a lot so it is wasteful.

I don't really know an easy solution besides duplicating build go so if you just want to move it outside of the request that is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicholasBlaskey sorry got caught up in release stuff. I think I fixed the dependencies issue and updated

Copy link
Contributor

@NicholasBlaskey NicholasBlaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eecsliu eecsliu merged commit 9ee9270 into main May 1, 2024
83 of 96 checks passed
@eecsliu eecsliu deleted the gate-hpc-by-request branch May 1, 2024 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants