feat: Create MoveRun endpoint #9001

AmanuelAaron · 2024-03-14T18:40:21Z

Description

Creating a MoveRun endpoint to move runs from one project to another. Input can be either a list of run ids or a filter.

Test Plan

Integration tests should pass

Checklist

Changes have been manually QA'd
User-facing API changes need the "User-facing API Change" label.
Release notes should be added as a separate file under docs/release-notes/.
See Release Note for details.
Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

ET-67

netlify · 2024-03-14T18:40:41Z

✅ Deploy Preview for determined-ui canceled.

Name	Link
🔨 Latest commit	`54b58de`
🔍 Latest deploy log	https://app.netlify.com/sites/determined-ui/deploys/660efa6d716dca000857a716

codecov · 2024-03-20T17:04:56Z

Codecov Report

Attention: Patch coverage is 62.28571% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 46.83%. Comparing base (f03a8a8) to head (54b58de).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9001      +/-   ##
==========================================
+ Coverage   46.82%   46.83%   +0.01%     
==========================================
  Files        1156     1156              
  Lines      142447   142611     +164     
  Branches     2387     2387              
==========================================
+ Hits        66698    66796      +98     
- Misses      75559    75625      +66     
  Partials      190      190

Flag	Coverage Δ
backend	`43.29% <74.59%> (+0.07%)`	⬆️
harness	`64.03% <33.96%> (-0.05%)`	⬇️
web	`38.16% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
master/internal/api_runs.go	`81.38% <74.59%> (-5.12%)`	⬇️
harness/determined/common/api/bindings.py	`39.98% <33.96%> (-0.03%)`	⬇️

... and 2 files with indirect coverage changes

corban-beaird

Looks good! 🚀

master/internal/api_runs.go

ashtonG · 2024-03-25T17:16:11Z

master/internal/api_runs.go

+	err = db.Bun().NewUpdate().Table("runs").
+		Set("experiment_id = ?", cloneExpID).
+		Where("id = ?", runID).
+		Returning("runs.id").
+		Model(&acceptedIDs).
+		Scan(ctx)


this needs to clone the run as well. this is to avoid situations where an experiment has no runs.

Updated to clone run and set to terminal state in destination project

NicholasBlaskey

Left two main major blocking review comments

NicholasBlaskey · 2024-04-01T14:51:02Z

proto/src/determined/api/v1/run.proto

+  // The ids of the runs being moved.
+  repeated int32 run_ids = 1;
+  // The id of the current parent project.
+  int32 source_project_id = 2;


do we need this source_project_id?

Move experiment doesn't have this constraint that all runs must be from the same project.

source_project_id is used to cover the case where a user tries to move a run that exists but doesn't exist in the source project.

From an API perspective I don't see the issue moving runs from different projects.

I'm not understanding when that case would come up and the reason we need to prevent this case.

A scenario could be: If user 1 moves a run from project A to project B and user 2 (who's table hasn't refreshed yet, and does not have permissions on project B) tries to move the run from project A to project C.

In that case they would fail the permission check for delete / view on that run independent of any source_project_id

I'm curious why MoveExperiment doesn't need it and this does?

Another reason why we would need the source project id is for the filter. MoveExperiments uses BulkFilter which limits how it can be used by frontend. We want to have the same filtering that is available to SearchRuns.

NicholasBlaskey · 2024-04-01T14:57:53Z

master/internal/api_runs.go

+	getQ := db.Bun().NewSelect().
+		ModelTableExpr("runs AS r").
+		Model(&runChecks).
+		Column("r.id").
+		ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived").
+		ColumnExpr("r.experiment_id as exp_id").
+		ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial").
+		Join("LEFT JOIN experiments e ON r.experiment_id=e.id").
+		Join("JOIN projects p ON r.project_id = p.id").
+		Join("JOIN workspaces w ON p.workspace_id = w.id").
+		Where("r.project_id = ?", req.SourceProjectId)
+
+	if req.Filter == nil {
+		getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds))
+	} else {
+		var efr experimentFilterRoot
+		err := json.Unmarshal([]byte(*req.Filter), &efr)
+		if err != nil {
+			return nil, err
+		}
+		getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
+			_, err = efr.toSQL(q)
+			return q
+		}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
+			if !efr.ShowArchived {
+				return q.Where(`e.archived = false`)
+			}
+			return q
+		})
+		if err != nil {
+			return nil, err
+		}
+	}
+
+	err = getQ.Scan(ctx)
+	if err != nil {
+		return nil, err
+	}


can we just call the search runs API from here?

We are adding a permissions check in our query that does not exist in search runs.

if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ, []rbacv1.PermissionType{ rbacv1.PermissionType_PERMISSION_TYPE_VIEW_EXPERIMENT_METADATA, rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT, }); err != nil { return nil, err }

NicholasBlaskey · 2024-04-01T15:06:27Z

master/internal/api_runs.go

+	}
+
+	var results []*apiv1.RunActionResult
+	var visibleIDs []int32


might be more natural for visibileIDs/validIDs to be master/pkg/set or map[int32]bool

I updated visibleIDs as we are doing a Contains check. However validIds is simply used in a bun.In() and is better off left as a slice

master/internal/api_runs.go

NicholasBlaskey · 2024-04-01T15:10:38Z

master/internal/api_runs.go

+		}
+	}
+	if len(validIDs) > 0 {
+		tx, err := db.Bun().BeginTx(ctx, nil)


use db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error instead so we don't have to worry about all this defer and logging errors if the transcation can't commit / rollback

NicholasBlaskey · 2024-04-01T15:14:33Z

master/internal/api_runs.go

+	if err != nil {
+		// Delete cloned experiment if run cloning fails
+		err = db.Bun().NewDelete().Table("experiments").Where("id = ?", cloneExpID).Scan(ctx)
+		return err


we could swallow the error from clone run into dest project if the delete on the experiments table succeeded.

Callers would think we did create a new experiment when we didn't

updated to show both error on delete error

NicholasBlaskey · 2024-04-01T15:14:42Z

master/internal/api_runs.go

+	var cloneExpID int32
+
+	// clone experiment into dest project
+	err := db.Bun().NewRaw(`INSERT INTO experiments(state, config, model_definition, start_time, end_time, archived,


can we use a transaction here and the insert run

NicholasBlaskey · 2024-04-01T15:34:02Z

master/internal/api_runs.go

+
+	var cloneRunID int32
+	// clone run into dest project
+	err = db.Bun().NewRaw(`INSERT INTO runs(experiment_id, state, start_time, end_time, hparams, warm_start_checkpoint_id,


blocking: this clone doesn't include metrics or checkpoints or tasks right?

Does anything in the ui work? I wouldn't expect logs or charts to be populated on this cloned run

This just copies the row that exists in the run table with a new id. This simply allows a user to view the run in a different project in the Run Table. The original run and associated metrics/checkpoints/tasks are unchanged and no new ones are made

I haven't used the runs table but I'm assuming there is going to be a link to the runs detail page or the experiment page?

This experiment / run detail page won't be complete because of the shallow copy?

After discussion, we have decided to pause on implementing the clone feature. For now we can either skip moving the multi-trial or move the entire experiment.

NicholasBlaskey · 2024-04-01T15:37:17Z

master/internal/api_runs.go

+		return nil, err
+	}
+	// check that user can view source project
+	srcProject, err := a.GetProjectByID(ctx, req.SourceProjectId, *curUser)


blocking: do we check that the user can delete the run?

MoveExperiments does the check here

determined/master/internal/experiment/bulk_action.go

Line 654 in b726cf9

rbacv1.PermissionType_PERMISSION_TYPE_DELETE_EXPERIMENT,

Added permission check below

NicholasBlaskey

lgtm

NicholasBlaskey · 2024-04-04T13:57:06Z

master/internal/api_runs.go

+		}
+	}
+
+	if getQ, err = experiment.AuthZProvider.Get().FilterExperimentsQuery(ctx, *curUser, nil, getQ,


what about runs not tied to experiments? Maybe make a ticket for when we support that this will need to be updated

From my understanding the implementation of FilterExperimentsQuery filters on all the workspaces in which the current user has the the provided permission type. So if the run is in a project that is in one of those workspaces it should work, even if it doesn't have an experiment.

NicholasBlaskey · 2024-04-04T14:02:16Z

master/internal/api_runs.go

+			failedExpMoveIds := []int32{-1}
+			for _, res := range expMoveResults {
+				if res.Error != nil {
+					failedExpMoveIds = append(failedExpMoveIds, res.ID)
+				}
+			}
+			var acceptedIDs []int32
+			if _, err = tx.NewUpdate().Table("runs").
+				Set("project_id = ?", req.DestinationProjectId).
+				Where("runs.id IN (?)", bun.In(validIDs)).
+				Where("runs.experiment_id NOT IN (?)", bun.In(failedExpMoveIds)).
+				Returning("runs.id").
+				Model(&acceptedIDs).
+				Exec(ctx); err != nil {
+				return fmt.Errorf("updating run's project IDs: %w", err)
+			}


experiment.MoveExperiments should already set this so I don't think we need this update

determined/master/internal/experiment/bulk_action.go

Line 721 in bc8c31c

Set("project_id = ?", destinationProjectID).

We might still need a way to map expMoveResult to the run API response but we might be able to use runChecks to convert expMoveResult to run IDs

added a query to populate move results with failed experiment moves

NicholasBlaskey · 2024-04-04T14:02:54Z

master/internal/api_runs.go

+		}
+	}
+	if len(validIDs) > 0 {
+		err = db.Bun().RunInTx(ctx, nil, func(ctx context.Context, tx bun.Tx) error {


There is only one query in this transaction so it doesn't really have a purpose and should remove it

experiment.MoveExperiment isn't using the transaction

NicholasBlaskey · 2024-04-04T14:07:57Z

master/internal/api_runs.go

+	var runChecks []archiveRunOKResult
+	getQ := db.Bun().NewSelect().
+		ModelTableExpr("runs AS r").
+		Model(&runChecks).
+		Column("r.id").
+		ColumnExpr("COALESCE((e.archived OR p.archived OR w.archived), FALSE) AS archived").
+		ColumnExpr("r.experiment_id as exp_id").
+		ColumnExpr("((SELECT COUNT(*) FROM runs r WHERE e.id = r.experiment_id) > 1) as is_multitrial").
+		Join("LEFT JOIN experiments e ON r.experiment_id=e.id").
+		Join("JOIN projects p ON r.project_id = p.id").
+		Join("JOIN workspaces w ON p.workspace_id = w.id").
+		Where("r.project_id = ?", req.SourceProjectId)
+
+	if req.Filter == nil {
+		getQ = getQ.Where("r.id IN (?)", bun.In(req.RunIds))
+	} else {
+		var efr experimentFilterRoot
+		err := json.Unmarshal([]byte(*req.Filter), &efr)
+		if err != nil {
+			return nil, err
+		}
+		getQ = getQ.WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
+			_, err = efr.toSQL(q)
+			return q
+		}).WhereGroup(" AND ", func(q *bun.SelectQuery) *bun.SelectQuery {
+			if !efr.ShowArchived {
+				return q.Where(`e.archived = false`)
+			}
+			return q
+		})
+		if err != nil {
+			return nil, err
+		}


maybe this should be a helper shared between this and searchruns

Added helper for the filter logic as that part was the exact same for both. Since we have different models and columns for each query I don't think a helper for the entire highlight code.

NicholasBlaskey · 2024-04-04T14:10:33Z

master/internal/api_runs_intg_test.go

+	exp1 := createTestExpWithProjectID(t, api, curUser, projectIDInt)
+	exp2 := createTestExpWithProjectID(t, api, curUser, projectIDInt)
+
+	task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
+	require.NoError(t, db.AddTask(ctx, task1))
+	require.NoError(t, db.AddTrial(ctx, &model.Trial{
+		State:        model.PausedState,
+		ExperimentID: exp1.ID,
+		StartTime:    time.Now(),
+	}, task1.TaskID))
+
+	task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
+	require.NoError(t, db.AddTask(ctx, task2))
+	require.NoError(t, db.AddTrial(ctx, &model.Trial{
+		State:        model.PausedState,
+		ExperimentID: exp2.ID,
+		StartTime:    time.Now(),
+	}, task2.TaskID))
+
+	req := &apiv1.SearchRunsRequest{
+		ProjectId: &sourceprojectID,
+		Sort:      ptrs.Ptr("id=asc"),
+	}
+	resp, err := api.SearchRuns(ctx, req)
+	require.NoError(t, err)


can probably replace this with

trial, _ := createTestTrial(t, api, curUser)

I don't think the test changes at all if you create the run in uncategorized and check that is moved to the workspace you created

NicholasBlaskey · 2024-04-04T14:12:16Z

master/internal/api_runs_intg_test.go

+	exp := createTestExpWithProjectID(t, api, curUser, projectIDInt)
+
+	task1 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
+	require.NoError(t, db.AddTask(ctx, task1))
+	require.NoError(t, db.AddTrial(ctx, &model.Trial{
+		State:        model.PausedState,
+		ExperimentID: exp.ID,
+		StartTime:    time.Now(),
+	}, task1.TaskID))
+
+	task2 := &model.Task{TaskType: model.TaskTypeTrial, TaskID: model.NewTaskID()}
+	require.NoError(t, db.AddTask(ctx, task2))
+	require.NoError(t, db.AddTrial(ctx, &model.Trial{
+		State:        model.PausedState,
+		ExperimentID: exp.ID,
+		StartTime:    time.Now(),
+	}, task2.TaskID))


there is a lot of boilerplate setup in these tests. Can we make some helpers for creating multitrial experiments?

create move endpoint

50c2465

cla-bot bot added the cla-signed label Mar 14, 2024

AmanuelAaron added 5 commits March 15, 2024 13:53

add validation check

ab7c000

move run

79b4e71

Merge branch 'main' into move-run

4f859d4

fix after merge

15555c1

add tests and filtering

b8f1221

AmanuelAaron requested review from ashtonG, stoksc and corban-beaird March 20, 2024 17:01

AmanuelAaron marked this pull request as ready for review March 20, 2024 17:03

AmanuelAaron requested a review from a team as a code owner March 20, 2024 17:03

corban-beaird approved these changes Mar 22, 2024

View reviewed changes

master/internal/api_runs.go Outdated Show resolved Hide resolved

AmanuelAaron added 3 commits March 25, 2024 10:49

experiment move logic

7291f8d

remove user not found check

bed0735

update test

c5e7d0e

corban-beaird self-requested a review March 25, 2024 15:33

option to move multitrial

d233d30

ashtonG reviewed Mar 25, 2024

View reviewed changes

AmanuelAaron added 4 commits March 25, 2024 13:44

removed deleted exp columns

097d578

fix test equal order

c8d2d85

clone run

caaa063

don't change state if in terminal state for run cloning

540a82c

AmanuelAaron requested a review from NicholasBlaskey April 1, 2024 14:14

NicholasBlaskey reviewed Apr 1, 2024

View reviewed changes

AmanuelAaron added 3 commits April 1, 2024 13:35

small fixes

1d2cc1f

add workspace perimission check

60fd350

use set

9674de7

AmanuelAaron added 3 commits April 2, 2024 10:41

transaction + fixes

a48113e

use tx

69adabe

skip run

f21c3b1

NicholasBlaskey approved these changes Apr 4, 2024

View reviewed changes

AmanuelAaron added 6 commits April 4, 2024 11:56

fix tests

2b78ee1

fixes

82ce413

Merge branch 'main' into move-run

937aef9

updated api build

093e623

update -> select

bb70979

fix select

54b58de

AmanuelAaron merged commit d6059e9 into main Apr 4, 2024
68 of 81 checks passed

AmanuelAaron deleted the move-run branch April 4, 2024 19:57

feat: Create MoveRun endpoint #9001

feat: Create MoveRun endpoint #9001

Conversation

AmanuelAaron commented Mar 14, 2024 • edited Loading

Description

Test Plan

Checklist

Ticket

netlify bot commented Mar 14, 2024 • edited Loading

✅ Deploy Preview for determined-ui canceled.

codecov bot commented Mar 20, 2024 • edited Loading

Codecov Report

corban-beaird left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NicholasBlaskey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NicholasBlaskey left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AmanuelAaron commented Mar 14, 2024 •

edited

Loading

netlify bot commented Mar 14, 2024 •

edited

Loading

codecov bot commented Mar 20, 2024 •

edited

Loading