Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Create MoveRun endpoint #9001

Merged
merged 26 commits into from
Apr 4, 2024
Merged

feat: Create MoveRun endpoint #9001

merged 26 commits into from
Apr 4, 2024

Conversation

AmanuelAaron
Copy link
Contributor

@AmanuelAaron AmanuelAaron commented Mar 14, 2024

Description

Creating a MoveRun endpoint to move runs from one project to another. Input can be either a list of run ids or a filter.

Test Plan

Integration tests should pass

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

ET-67

@cla-bot cla-bot bot added the cla-signed label Mar 14, 2024
Copy link

netlify bot commented Mar 14, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 54b58de
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/660efa6d716dca000857a716

@AmanuelAaron AmanuelAaron marked this pull request as ready for review March 20, 2024 17:03
@AmanuelAaron AmanuelAaron requested a review from a team as a code owner March 20, 2024 17:03
Copy link

codecov bot commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 62.28571% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 46.83%. Comparing base (f03a8a8) to head (54b58de).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9001      +/-   ##
==========================================
+ Coverage   46.82%   46.83%   +0.01%     
==========================================
  Files        1156     1156              
  Lines      142447   142611     +164     
  Branches     2387     2387              
==========================================
+ Hits        66698    66796      +98     
- Misses      75559    75625      +66     
  Partials      190      190              
Flag Coverage Δ
backend 43.29% <74.59%> (+0.07%) ⬆️
harness 64.03% <33.96%> (-0.05%) ⬇️
web 38.16% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
master/internal/api_runs.go 81.38% <74.59%> (-5.12%) ⬇️
harness/determined/common/api/bindings.py 39.98% <33.96%> (-0.03%) ⬇️

... and 2 files with indirect coverage changes

Copy link
Contributor

@corban-beaird corban-beaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🚀

master/internal/api_runs.go Outdated Show resolved Hide resolved
@corban-beaird corban-beaird self-requested a review March 25, 2024 15:33
Comment on lines 427 to 432
err = db.Bun().NewUpdate().Table("runs").
Set("experiment_id = ?", cloneExpID).
Where("id = ?", runID).
Returning("runs.id").
Model(&acceptedIDs).
Scan(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to clone the run as well. this is to avoid situations where an experiment has no runs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to clone run and set to terminal state in destination project

Copy link
Contributor

@NicholasBlaskey NicholasBlaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two main major blocking review comments

// The ids of the runs being moved.
repeated int32 run_ids = 1;
// The id of the current parent project.
int32 source_project_id = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this source_project_id?

Move experiment doesn't have this constraint that all runs must be from the same project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_project_id is used to cover the case where a user tries to move a run that exists but doesn't exist in the source project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an API perspective I don't see the issue moving runs from different projects.

I'm not understanding when that case would come up and the reason we need to prevent this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A scenario could be: If user 1 moves a run from project A to project B and user 2 (who's table hasn't refreshed yet, and does not have permissions on project B) tries to move the run from project A to project C.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case they would fail the permission check for delete / view on that run independent of any source_project_id

I'm curious why MoveExperiment doesn't need it and this does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason why we would need the source project id is for the filter. MoveExperiments uses BulkFilter which limits how it can be used by frontend. We want to have the same filtering that is available to SearchRuns.

Comment on lines 286 to 323
getQ := db.Bun().NewSelect().
ModelTableExpr("runs AS r").
Model(&runChecks).
Column("r.id").
ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived").
ColumnExpr("r.experiment_id as exp_id").
ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial").
Join("LEFT JOIN experiments e ON r.experiment_id=e.id").
Join("JOIN projects p ON r.project_id = p.id").
Join("JOIN workspaces w ON p.workspace_id = w.id").
Where("r.project_id = ?", req.SourceProjectId)

if req.Filter == nil {
getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds))
} else {
var efr experimentFilterRoot
err := json.Unmarshal([]byte(*req.Filter), &efr)
if err != nil {
return nil, err
}
getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
_, err = efr.toSQL(q)
return q
}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
if !efr.ShowArchived {
return q.Where(`e.archived = false`)
}
return q
})
if err != nil {
return nil, err
}
}

err = getQ.Scan(ctx)
if err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just call the search runs API from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding a permissions check in our query that does not exist in search runs.

if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ,
		[]rbacv1.PermissionType{
			rbacv1.PermissionType_PERMISSION_TYPE_VIEW_EXPERIMENT_METADATA,
			rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT,
		}); err != nil {
		return nil, err
	}

}

var results []*apiv1.RunActionResult
var visibleIDs []int32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be more natural for visibileIDs/validIDs to be master/pkg/set or map[int32]bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated visibleIDs as we are doing a Contains check. However validIds is simply used in a bun.In() and is better off left as a slice

master/internal/api_runs.go Show resolved Hide resolved
}
}
if len(validIDs) > 0 {
tx, err := db.Bun().BeginTx(ctx, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error instead so we don't have to worry about all this defer and logging errors if the transcation can't commit / rollback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if err != nil {
// Delete cloned experiment if run cloning fails
err = db.Bun().NewDelete().Table("experiments").Where("id = ?", cloneExpID).Scan(ctx)
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could swallow the error from clone run into dest project if the delete on the experiments table succeeded.

Callers would think we did create a new experiment when we didn't

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to show both error on delete error

var cloneExpID int32

// clone experiment into dest project
err := db.Bun().NewRaw(`INSERT INTO experiments(state, config, model_definition, start_time, end_time, archived,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use a transaction here and the insert run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


var cloneRunID int32
// clone run into dest project
err = db.Bun().NewRaw(`INSERT INTO runs(experiment_id, state, start_time, end_time, hparams, warm_start_checkpoint_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: this clone doesn't include metrics or checkpoints or tasks right?

Does anything in the ui work? I wouldn't expect logs or charts to be populated on this cloned run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just copies the row that exists in the run table with a new id. This simply allows a user to view the run in a different project in the Run Table. The original run and associated metrics/checkpoints/tasks are unchanged and no new ones are made

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used the runs table but I'm assuming there is going to be a link to the runs detail page or the experiment page?

This experiment / run detail page won't be complete because of the shallow copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, we have decided to pause on implementing the clone feature. For now we can either skip moving the multi-trial or move the entire experiment.

return nil, err
}
// check that user can view source project
srcProject, err := a.GetProjectByID(ctx, req.SourceProjectId, *curUser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking: do we check that the user can delete the run?

MoveExperiments does the check here

rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added permission check below

Copy link
Contributor

@NicholasBlaskey NicholasBlaskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

}
}

if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about runs not tied to experiments? Maybe make a ticket for when we support that this will need to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding the implementation of FilterExperimentsQuery filters on all the workspaces in which the current user has the the provided permission type. So if the run is in a project that is in one of those workspaces it should work, even if it doesn't have an experiment.

Comment on lines 373 to 388
failedExpMoveIds := []int32{-1}
for _, res := range expMoveResults {
if res.Error != nil {
failedExpMoveIds = append(failedExpMoveIds, res.ID)
}
}
var acceptedIDs []int32
if _, err = tx.NewUpdate().Table("runs").
Set("project_id = ?", req.DestinationProjectId).
Where("runs.id IN (?)", bun.In(validIDs)).
Where("runs.experiment_id NOT IN (?)", bun.In(failedExpMoveIds)).
Returning("runs.id").
Model(&acceptedIDs).
Exec(ctx); err != nil {
return fmt.Errorf("updating run's project IDs: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

experiment.MoveExperiments should already set this so I don't think we need this update

Set("project_id = ?", destinationProjectID).

We might still need a way to map expMoveResult to the run API response but we might be able to use runChecks to convert expMoveResult to run IDs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a query to populate move results with failed experiment moves

}
}
if len(validIDs) > 0 {
err = db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one query in this transaction so it doesn't really have a purpose and should remove it

experiment.MoveExperiment isn't using the transaction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines 283 to 315
var runChecks []archiveRunOKResult
getQ := db.Bun().NewSelect().
ModelTableExpr("runs AS r").
Model(&runChecks).
Column("r.id").
ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived").
ColumnExpr("r.experiment_id as exp_id").
ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial").
Join("LEFT JOIN experiments e ON r.experiment_id=e.id").
Join("JOIN projects p ON r.project_id = p.id").
Join("JOIN workspaces w ON p.workspace_id = w.id").
Where("r.project_id = ?", req.SourceProjectId)

if req.Filter == nil {
getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds))
} else {
var efr experimentFilterRoot
err := json.Unmarshal([]byte(*req.Filter), &efr)
if err != nil {
return nil, err
}
getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
_, err = efr.toSQL(q)
return q
}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
if !efr.ShowArchived {
return q.Where(`e.archived = false`)
}
return q
})
if err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be a helper shared between this and searchruns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added helper for the filter logic as that part was the exact same for both. Since we have different models and columns for each query I don't think a helper for the entire highlight code.

Comment on lines 256 to 280
exp1 := createTestExpWithProjectID(t, api, curUser, projectIDInt)
exp2 := createTestExpWithProjectID(t, api, curUser, projectIDInt)

task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
require.NoError(t, db.AddTask(ctx, task1))
require.NoError(t, db.AddTrial(ctx, &model.Trial{
State: model.PausedState,
ExperimentID: exp1.ID,
StartTime: time.Now(),
}, task1.TaskID))

task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
require.NoError(t, db.AddTask(ctx, task2))
require.NoError(t, db.AddTrial(ctx, &model.Trial{
State: model.PausedState,
ExperimentID: exp2.ID,
StartTime: time.Now(),
}, task2.TaskID))

req := &apiv1.SearchRunsRequest{
ProjectId: &sourceprojectID,
Sort: ptrs.Ptr("id=asc"),
}
resp, err := api.SearchRuns(ctx, req)
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably replace this with

	trial, _ := createTestTrial(t, api, curUser)

I don't think the test changes at all if you create the run in uncategorized and check that is moved to the workspace you created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines +327 to +343
exp := createTestExpWithProjectID(t, api, curUser, projectIDInt)

task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
require.NoError(t, db.AddTask(ctx, task1))
require.NoError(t, db.AddTrial(ctx, &model.Trial{
State: model.PausedState,
ExperimentID: exp.ID,
StartTime: time.Now(),
}, task1.TaskID))

task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
require.NoError(t, db.AddTask(ctx, task2))
require.NoError(t, db.AddTrial(ctx, &model.Trial{
State: model.PausedState,
ExperimentID: exp.ID,
StartTime: time.Now(),
}, task2.TaskID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a lot of boilerplate setup in these tests. Can we make some helpers for creating multitrial experiments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@AmanuelAaron AmanuelAaron merged commit d6059e9 into main Apr 4, 2024
68 of 81 checks passed
@AmanuelAaron AmanuelAaron deleted the move-run branch April 4, 2024 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants