-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Pause & Resume run #9129
feat: Pause & Resume run #9129
Conversation
✅ Deploy Preview for determined-ui canceled.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9129 +/- ##
==========================================
- Coverage 49.00% 48.96% -0.04%
==========================================
Files 1234 1234
Lines 159671 159836 +165
Branches 2779 2778 -1
==========================================
+ Hits 78248 78268 +20
- Misses 81248 81393 +145
Partials 175 175
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
||
var results []*apiv1.RunActionResult | ||
visibleIDs := set.New[int32]() | ||
expIDs := set.New[int32]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pausing a run is not the same as pausing a search.
we need to go back to product about this feature. as is it doesn't make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion we will only be pausing runs part of single-trial experiments
@AmanuelAaron are we going to close this PR? |
master/internal/api_runs.go
Outdated
Column("r.id"). | ||
ColumnExpr("COALESCE((r.archived OR e.archived OR p.archived OR w.archived), FALSE) AS archived"). | ||
ColumnExpr("r.experiment_id as exp_id"). | ||
ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to look at the searcher type not at the number of runs. asha doesn't create all its trials at once. also it have a side benefit that it's not an agg and should be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably make sure searchRuns
also behaves like this so the value is consistent throughout. should i make a ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created ticket here: https://hpe-aiatscale.atlassian.net/browse/ET-296
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm modulo final feedback (the comment about checking searcher type instead should be considered blocking but i dont feel the need to re-review how it is implemented).
i don't quite agree with the bff style here, i think we'd be better having pause/resume on the runs table do a search runs and then call bulk resume/pause experiments when the experiment ids for the single serachers directly. but we can agree to disagree.
master/internal/api_runs.go
Outdated
} | ||
expIDs.Insert(*cand.ExpID) | ||
} | ||
if filter == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes intentions clearer i think
if filter == nil { | |
if len(runIds) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a shared bool as there is another similar check above
master/internal/api_runs.go
Outdated
var results []*apiv1.RunActionResult | ||
visibleIDs := set.New[int32]() | ||
expIDs := set.New[int32]() | ||
expToRun := make(map[int32][]int32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be a single mapping or there is an error?
expToRun := make(map[int32][]int32) | |
expToRun := make(map[int32]int32) |
making this clear in the code here would simplify code later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over from when we allowed multi-trial. Refactored now, including a few things below.
master/internal/api_runs.go
Outdated
for _, expRes := range expResults { | ||
val, ok := expToRun[expRes.ID] | ||
if !ok { | ||
val = []int32{-1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 seems much less useful than a message saying "some invariant in the code was broken and we don't to an action on an experiment we didn't expect to"..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Description
Adding pause & resume run endpoints to Runs API
Test Plan
e2e tests should pass
Checklist
docs/release-notes/
.See Release Note for details.
Ticket
ET-91