-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Create MoveRun endpoint #9001
Conversation
✅ Deploy Preview for determined-ui canceled.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9001 +/- ##
==========================================
+ Coverage 46.82% 46.83% +0.01%
==========================================
Files 1156 1156
Lines 142447 142611 +164
Branches 2387 2387
==========================================
+ Hits 66698 66796 +98
- Misses 75559 75625 +66
Partials 190 190
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! 🚀
master/internal/api_runs.go
Outdated
err = db.Bun().NewUpdate().Table("runs"). | ||
Set("experiment_id = ?", cloneExpID). | ||
Where("id = ?", runID). | ||
Returning("runs.id"). | ||
Model(&acceptedIDs). | ||
Scan(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to clone the run as well. this is to avoid situations where an experiment has no runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to clone run and set to terminal state in destination project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two main major blocking review comments
// The ids of the runs being moved. | ||
repeated int32 run_ids = 1; | ||
// The id of the current parent project. | ||
int32 source_project_id = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this source_project_id?
Move experiment doesn't have this constraint that all runs must be from the same project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source_project_id is used to cover the case where a user tries to move a run that exists but doesn't exist in the source project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an API perspective I don't see the issue moving runs from different projects.
I'm not understanding when that case would come up and the reason we need to prevent this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A scenario could be: If user 1 moves a run from project A to project B and user 2 (who's table hasn't refreshed yet, and does not have permissions on project B) tries to move the run from project A to project C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case they would fail the permission check for delete / view on that run independent of any source_project_id
I'm curious why MoveExperiment doesn't need it and this does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason why we would need the source project id is for the filter. MoveExperiments uses BulkFilter which limits how it can be used by frontend. We want to have the same filtering that is available to SearchRuns.
master/internal/api_runs.go
Outdated
getQ := db.Bun().NewSelect(). | ||
ModelTableExpr("runs AS r"). | ||
Model(&runChecks). | ||
Column("r.id"). | ||
ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived"). | ||
ColumnExpr("r.experiment_id as exp_id"). | ||
ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial"). | ||
Join("LEFT JOIN experiments e ON r.experiment_id=e.id"). | ||
Join("JOIN projects p ON r.project_id = p.id"). | ||
Join("JOIN workspaces w ON p.workspace_id = w.id"). | ||
Where("r.project_id = ?", req.SourceProjectId) | ||
|
||
if req.Filter == nil { | ||
getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds)) | ||
} else { | ||
var efr experimentFilterRoot | ||
err := json.Unmarshal([]byte(*req.Filter), &efr) | ||
if err != nil { | ||
return nil, err | ||
} | ||
getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery { | ||
_, err = efr.toSQL(q) | ||
return q | ||
}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery { | ||
if !efr.ShowArchived { | ||
return q.Where(`e.archived = false`) | ||
} | ||
return q | ||
}) | ||
if err != nil { | ||
return nil, err | ||
} | ||
} | ||
|
||
err = getQ.Scan(ctx) | ||
if err != nil { | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just call the search runs API from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are adding a permissions check in our query that does not exist in search runs.
if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ,
[]rbacv1.PermissionType{
rbacv1.PermissionType_PERMISSION_TYPE_VIEW_EXPERIMENT_METADATA,
rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT,
}); err != nil {
return nil, err
}
master/internal/api_runs.go
Outdated
} | ||
|
||
var results []*apiv1.RunActionResult | ||
var visibleIDs []int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be more natural for visibileIDs/validIDs
to be master/pkg/set
or map[int32]bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated visibleIDs
as we are doing a Contains check. However validIds
is simply used in a bun.In()
and is better off left as a slice
master/internal/api_runs.go
Outdated
} | ||
} | ||
if len(validIDs) > 0 { | ||
tx, err := db.Bun().BeginTx(ctx, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error
instead so we don't have to worry about all this defer and logging errors if the transcation can't commit / rollback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
master/internal/api_runs.go
Outdated
if err != nil { | ||
// Delete cloned experiment if run cloning fails | ||
err = db.Bun().NewDelete().Table("experiments").Where("id = ?", cloneExpID).Scan(ctx) | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could swallow the error from clone run into dest project
if the delete on the experiments table succeeded.
Callers would think we did create a new experiment when we didn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to show both error on delete error
master/internal/api_runs.go
Outdated
var cloneExpID int32 | ||
|
||
// clone experiment into dest project | ||
err := db.Bun().NewRaw(`INSERT INTO experiments(state, config, model_definition, start_time, end_time, archived, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use a transaction here and the insert run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
master/internal/api_runs.go
Outdated
|
||
var cloneRunID int32 | ||
// clone run into dest project | ||
err = db.Bun().NewRaw(`INSERT INTO runs(experiment_id, state, start_time, end_time, hparams, warm_start_checkpoint_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: this clone doesn't include metrics or checkpoints or tasks right?
Does anything in the ui work? I wouldn't expect logs or charts to be populated on this cloned run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just copies the row that exists in the run table with a new id. This simply allows a user to view the run in a different project in the Run Table. The original run and associated metrics/checkpoints/tasks are unchanged and no new ones are made
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used the runs table but I'm assuming there is going to be a link to the runs detail page or the experiment page?
This experiment / run detail page won't be complete because of the shallow copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, we have decided to pause on implementing the clone feature. For now we can either skip moving the multi-trial or move the entire experiment.
return nil, err | ||
} | ||
// check that user can view source project | ||
srcProject, err := a.GetProjectByID(ctx, req.SourceProjectId, *curUser) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking: do we check that the user can delete the run?
MoveExperiments does the check here
rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added permission check below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
} | ||
} | ||
|
||
if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about runs not tied to experiments? Maybe make a ticket for when we support that this will need to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding the implementation of FilterExperimentsQuery
filters on all the workspaces in which the current user has the the provided permission type. So if the run is in a project that is in one of those workspaces it should work, even if it doesn't have an experiment.
master/internal/api_runs.go
Outdated
failedExpMoveIds := []int32{-1} | ||
for _, res := range expMoveResults { | ||
if res.Error != nil { | ||
failedExpMoveIds = append(failedExpMoveIds, res.ID) | ||
} | ||
} | ||
var acceptedIDs []int32 | ||
if _, err = tx.NewUpdate().Table("runs"). | ||
Set("project_id = ?", req.DestinationProjectId). | ||
Where("runs.id IN (?)", bun.In(validIDs)). | ||
Where("runs.experiment_id NOT IN (?)", bun.In(failedExpMoveIds)). | ||
Returning("runs.id"). | ||
Model(&acceptedIDs). | ||
Exec(ctx); err != nil { | ||
return fmt.Errorf("updating run's project IDs: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
experiment.MoveExperiments should already set this so I don't think we need this update
Set("project_id = ?", destinationProjectID). |
We might still need a way to map expMoveResult to the run API response but we might be able to use runChecks to convert expMoveResult
to run IDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a query to populate move results with failed experiment moves
master/internal/api_runs.go
Outdated
} | ||
} | ||
if len(validIDs) > 0 { | ||
err = db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one query in this transaction so it doesn't really have a purpose and should remove it
experiment.MoveExperiment isn't using the transaction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
master/internal/api_runs.go
Outdated
var runChecks []archiveRunOKResult | ||
getQ := db.Bun().NewSelect(). | ||
ModelTableExpr("runs AS r"). | ||
Model(&runChecks). | ||
Column("r.id"). | ||
ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived"). | ||
ColumnExpr("r.experiment_id as exp_id"). | ||
ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial"). | ||
Join("LEFT JOIN experiments e ON r.experiment_id=e.id"). | ||
Join("JOIN projects p ON r.project_id = p.id"). | ||
Join("JOIN workspaces w ON p.workspace_id = w.id"). | ||
Where("r.project_id = ?", req.SourceProjectId) | ||
|
||
if req.Filter == nil { | ||
getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds)) | ||
} else { | ||
var efr experimentFilterRoot | ||
err := json.Unmarshal([]byte(*req.Filter), &efr) | ||
if err != nil { | ||
return nil, err | ||
} | ||
getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery { | ||
_, err = efr.toSQL(q) | ||
return q | ||
}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery { | ||
if !efr.ShowArchived { | ||
return q.Where(`e.archived = false`) | ||
} | ||
return q | ||
}) | ||
if err != nil { | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be a helper shared between this and searchruns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added helper for the filter logic as that part was the exact same for both. Since we have different models and columns for each query I don't think a helper for the entire highlight code.
exp1 := createTestExpWithProjectID(t, api, curUser, projectIDInt) | ||
exp2 := createTestExpWithProjectID(t, api, curUser, projectIDInt) | ||
|
||
task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()} | ||
require.NoError(t, db.AddTask(ctx, task1)) | ||
require.NoError(t, db.AddTrial(ctx, &model.Trial{ | ||
State: model.PausedState, | ||
ExperimentID: exp1.ID, | ||
StartTime: time.Now(), | ||
}, task1.TaskID)) | ||
|
||
task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()} | ||
require.NoError(t, db.AddTask(ctx, task2)) | ||
require.NoError(t, db.AddTrial(ctx, &model.Trial{ | ||
State: model.PausedState, | ||
ExperimentID: exp2.ID, | ||
StartTime: time.Now(), | ||
}, task2.TaskID)) | ||
|
||
req := &apiv1.SearchRunsRequest{ | ||
ProjectId: &sourceprojectID, | ||
Sort: ptrs.Ptr("id=asc"), | ||
} | ||
resp, err := api.SearchRuns(ctx, req) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can probably replace this with
trial, _ := createTestTrial(t, api, curUser)
I don't think the test changes at all if you create the run in uncategorized and check that is moved to the workspace you created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
exp := createTestExpWithProjectID(t, api, curUser, projectIDInt) | ||
|
||
task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()} | ||
require.NoError(t, db.AddTask(ctx, task1)) | ||
require.NoError(t, db.AddTrial(ctx, &model.Trial{ | ||
State: model.PausedState, | ||
ExperimentID: exp.ID, | ||
StartTime: time.Now(), | ||
}, task1.TaskID)) | ||
|
||
task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()} | ||
require.NoError(t, db.AddTask(ctx, task2)) | ||
require.NoError(t, db.AddTrial(ctx, &model.Trial{ | ||
State: model.PausedState, | ||
ExperimentID: exp.ID, | ||
StartTime: time.Now(), | ||
}, task2.TaskID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a lot of boilerplate setup in these tests. Can we make some helpers for creating multitrial experiments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Description
Creating a MoveRun endpoint to move runs from one project to another. Input can be either a list of run ids or a filter.
Test Plan
Integration tests should pass
Checklist
docs/release-notes/
.See Release Note for details.
Ticket
ET-67