test: fix tensorboard reattach k8s flake [RM-39] #8906

Merged: 2 commits merged into main from fix_e2e_k8s_flake on Feb 27, 2024


NicholasBlaskey
Contributor

Description

Fix flake https://app.circleci.com/pipelines/github/determined-ai/determined/51619/workflows/5358e0ec-7fc1-4893-b1d3-5422a43be44a/jobs/2298153

The root cause of the flake was that the experiment's TensorBoard uploads would wait on GCS 429 (Too Many Requests) errors, making the experiment take longer than 60 seconds.

https://circleci.com/api/v1.1/project/github/determined-ai/determined/2293269/output/145/0?file=true&allocation-id=65dcd1db5c29034450ab1342-0-build%2FABCDEFGH

My understanding of why this started happening recently is that we changed how retrying works in #8780.

Before that change, we would just skip TensorBoard uploads that failed due to 429s, but now we retry them and block on the retries.
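To illustrate why moving from skipping to retrying can lengthen a run, here is a minimal Python sketch. It is not the code from #8780: `RateLimited`, `upload_skipping`, `upload_retrying`, `make_flaky_upload`, and the backoff values are all hypothetical, chosen only to show the difference in blocking time.

```python
from typing import Callable, List


class RateLimited(Exception):
    """Hypothetical stand-in for a storage backend returning HTTP 429."""


def upload_skipping(upload: Callable[[], None]) -> bool:
    """Skip behavior: give up on the first 429 and return immediately."""
    try:
        upload()
        return True
    except RateLimited:
        return False


def upload_retrying(
    upload: Callable[[], None],
    sleep: Callable[[float], None],
    max_attempts: int = 5,   # assumed value, for illustration only
    base_delay: float = 1.0,  # assumed value, for illustration only
) -> bool:
    """Retry behavior: exponential backoff that blocks the caller between attempts."""
    for attempt in range(max_attempts):
        try:
            upload()
            return True
        except RateLimited:
            if attempt == max_attempts - 1:
                return False
            sleep(base_delay * 2 ** attempt)
    return False


def make_flaky_upload(failures: int) -> Callable[[], None]:
    """Build an upload function that raises RateLimited `failures` times, then succeeds."""
    remaining = [failures]

    def upload() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RateLimited()

    return upload


# Record the requested delays instead of actually sleeping.
waits: List[float] = []
ok = upload_retrying(make_flaky_upload(3), sleep=waits.append)
total_wait = sum(waits)
skipped_ok = upload_skipping(make_flaky_upload(3))
```

In this sketch, three consecutive 429s cost the retrying path 1 + 2 + 4 seconds of blocking before the upload succeeds, while the skipping path returns at once without uploading. With many uploads per experiment, delays like these can add up to enough to push a run past a fixed time budget.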

The fix is simply to run a shorter experiment, since the experiment itself isn't what this test is exercising.

Test Plan

Merge it and see whether the flakes keep happening.


Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.


netlify bot commented Feb 27, 2024

Deploy Preview for determined-ui canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | e47dad9 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/determined-ui/deploys/65de35602b2aee0008a5b1fc |

@NicholasBlaskey changed the title from "test: fix tensorboard reattach k8s flake" to "test: fix tensorboard reattach k8s flake [RM-39]" on Feb 27, 2024

codecov bot commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.53%. Comparing base (ca96da1) to head (e47dad9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8906      +/-   ##
==========================================
- Coverage   47.53%   47.53%   -0.01%     
==========================================
  Files        1066     1066              
  Lines      170230   170230              
  Branches     2235     2237       +2     
==========================================
- Hits        80917    80916       -1     
- Misses      89155    89156       +1     
  Partials      158      158              
| Flag | Coverage Δ |
|---|---|
| backend | 43.36% <ø> (-0.01%) ⬇️ |
| harness | 63.77% <ø> (ø) |
| web | 42.50% <ø> (ø) |

Flags with carried forward coverage won't be shown.

see 3 files with indirect coverage changes

@NicholasBlaskey NicholasBlaskey merged commit 8f82087 into main Feb 27, 2024
79 of 92 checks passed
@NicholasBlaskey NicholasBlaskey deleted the fix_e2e_k8s_flake branch February 27, 2024 22:14