
Track the execution state of every integration connection on the backend #1220

Merged: 2 commits merged into main from eng-2782-track-the-status-of-a-resource on Apr 20, 2023

Conversation

@kenxu95 kenxu95 commented Apr 17, 2023

@likawind as lead reviewer. @hsubbaraj-spiral as secondary to maybe help answer Qs. The meat of the changes is in connect_integration.go.

Describe your changes and why you are making these changes

Conda and Lambda integrations already manage the integration lifecycle from registration and setup through completion. This PR makes the same pattern available to all integrations. This means that when we register an integration:

  1. An "exec_state" key in the pending state is always written to the initial integration table entry.
  2. This integration table entry will be updated to the appropriate terminal state. For all non-conda and non-lambda integrations, this happens synchronously.

The list_integrations endpoint now returns the execution state instead of the validated flag. The validated column is dropped from the integrations table, as it is no longer useful. To keep this PR well scoped, we keep the data model of everything else (e.g. the createdAt column) unchanged.
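For illustration only, here is a minimal Go sketch of that two-step write for the synchronous case, assuming hypothetical IntegrationRepo and validateConnection helpers rather than the project's actual repository interfaces:

```go
package handler

import (
	"context"
	"errors"
	"fmt"
)

// ExecStatus mirrors the states described above; names are illustrative.
type ExecStatus string

const (
	Pending   ExecStatus = "pending"
	Succeeded ExecStatus = "succeeded"
	Failed    ExecStatus = "failed"
)

// IntegrationRepo is a hypothetical stand-in for the real integration repository.
type IntegrationRepo interface {
	CreateWithState(ctx context.Context, name string, s ExecStatus) (id string, err error)
	UpdateState(ctx context.Context, id string, s ExecStatus) error
}

// registerIntegration always records a pending exec_state first, then moves the
// same entry to a terminal state. For everything except Conda and Lambda, the
// terminal write happens synchronously, before the handler returns.
func registerIntegration(ctx context.Context, repo IntegrationRepo, name string) error {
	id, err := repo.CreateWithState(ctx, name, Pending) // step 1: pending entry
	if err != nil {
		return err
	}

	if err := validateConnection(ctx, name); err != nil {
		_ = repo.UpdateState(ctx, id, Failed) // step 2a: synchronous failure
		return fmt.Errorf("connecting %s: %w", name, err)
	}
	return repo.UpdateState(ctx, id, Succeeded) // step 2b: synchronous success
}

// validateConnection is a placeholder for the service-specific connection check.
func validateConnection(_ context.Context, name string) error {
	if name == "" {
		return errors.New("missing integration name")
	}
	return nil
}
```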

This is what the database now looks like (after a couple of successful connections and a Lambda integration that failed to set up):

[screenshot: integrations table]

This is what the describe() and list_integrations() SDK methods now look like:

[screenshots: describe() and list_integrations() output]

As an immediate follow-up, I make the SDK error if we attempt to preview or publish against any integration that has not been successfully registered. Note that this change is purely on the SDK and backend, and has no effect on our contract with the UI (failed integration cards will still show up normally for now).

Related issue number (if any)

ENG-2782

Loom demo (if any)

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

// TODO(ENG-2523): move base conda env creation outside of ConnectIntegration.
if args.Service == shared.Conda {
	condaDB, err := database.NewDatabase(DB.Config())
@kenxu95 kenxu95 (Contributor, Author) commented Apr 17, 2023

@likawind @hsubbaraj-spiral why do we define a new database here (and in lambda)? I copied this from existing code. I would like to document the reason here.

Contributor

You can't pass the DB object to a separate goroutine; otherwise it will error, because the underlying connection is lost once the parent scope that created the DB object goes away. Here, you'd need to call NewDatabase as part of the go func() to ensure you initialize the DB and use it within the same scope.
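A minimal sketch of that scope rule, using database/sql with a SQLite driver as a stand-in for the project's database.NewDatabase(DB.Config()): the handle used by the async work is opened inside the go func() so its lifetime belongs to the goroutine rather than to the request handler.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // hypothetical driver choice for the sketch
)

// setupAsync stands in for the long-running Conda/Lambda setup work.
func setupAsync(db *sql.DB) error {
	return db.Ping()
}

func connectIntegration(dsn string) {
	// Per the comment above, the handler-scoped DB object is torn down once the
	// request returns, so the async path opens its own handle inside go func()
	// and uses it entirely within that scope.
	go func() {
		asyncDB, err := sql.Open("sqlite3", dsn) // analogous to database.NewDatabase(DB.Config())
		if err != nil {
			log.Printf("failed to open DB for async setup: %v", err)
			return
		}
		defer asyncDB.Close()

		if err := setupAsync(asyncDB); err != nil {
			log.Printf("async setup failed: %v", err)
		}
	}()
}

func main() {
	connectIntegration("file:server.db?cache=shared")
	time.Sleep(time.Second) // toy example only: give the goroutine time to finish
}
```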

Contributor

I think both testing and editing an integration should update the exec_state. We can track the source of the change so we can display "Tested on" vs "Edited on" vs "Created on" timestamps. What do y'all think? cc: @vsreekanti This ofc means that testing an integration and finding out that it is failing may cause SDK operations that are using that integration to fail. But tbh that seems like proper behavior, since execution is going to fail anyway (it'll just fail earlier now).

Maybe "Last Verified" would make sense here. Or even have "Last Used" and add an entry to the table each time the integration is invoked. That would give a sense of stability even without using the test-integration button.

kenxu95 (Contributor, Author)

I like the "Last Used" idea, yeah. By making an entry to the table each time, you mean inserting an entry into the integrations table?

@likawind likawind (Contributor) left a comment

Overall looks good! Have you verified with conda or lambda to ensure we have no regression in async cases?

Blocking mainly for the part initializing the DB in goroutines. I have a few other comments, which can be addressed as follow-ups:

  • Why don't we add a new column to track exec state if it applies to all integrations? This reduces the need to deserialize the values.
  • I don't have the impression that you can drop a column in SQLite, but if tests are passing it's probably fine.
  • I assume this information will show up in the UI in the new project. If it does, what do we do if the user clicks 'test connect integration'? Should the exec state be updated if the test-connect failed?

src/golang/cmd/server/handler/connect_integration.go (outdated; resolved)
return
condaErr := setupCondaAsync(integrationRepo, integrationObject.ID, publicConfig, condaDB)
if condaErr != nil {
	log.Errorf("Conda setup failed: %v", condaErr)
Contributor

Where do we handle updating the failure exec state here?

kenxu95 (Contributor, Author)

Inside setupCondaAsync()

// ... update that entry to reflect the failure.
defer func() {
	if err != nil {
		execution_state.UpdateOnFailure(
Contributor

Just to confirm my understanding - this is only useful for sync connections?

kenxu95 (Contributor, Author)

Yep, I'll make a note of that.

@kenxu95 kenxu95 (Contributor, Author) commented Apr 17, 2023

Actually, no, that's not true. It's only useful for the synchronous connection part for all integrations. Even Lambda and Conda have additional setup that is done after this. Since the goroutines are the last thing started in this method, we can be assured that there are no update races between sync and async. I'll document that.
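A short sketch of that ordering under hypothetical helper names (not the actual handler code): the deferred failure update only guards the synchronous section, and the async goroutine is the last thing started, so the two never write exec_state concurrently.

```go
package handler

import "context"

// Hypothetical stubs standing in for the repo / execution_state helpers.
func writePendingEntry(ctx context.Context, svc string) error    { return nil }
func markSucceeded(ctx context.Context, svc string) error        { return nil }
func updateOnFailure(ctx context.Context, svc string, err error) {}
func setupAsync(svc string)                                      {}

func connectIntegration(ctx context.Context, svc string) (err error) {
	// Guards only the synchronous section below: if err is non-nil when the
	// method returns, the entry is flipped to a failed terminal state.
	defer func() {
		if err != nil {
			updateOnFailure(ctx, svc, err) // stand-in for execution_state.UpdateOnFailure(...)
		}
	}()

	if err = writePendingEntry(ctx, svc); err != nil {
		return err
	}

	if svc == "conda" || svc == "lambda" {
		// Started only after every synchronous write, so from here on the
		// goroutine is the sole writer of this entry's exec_state and cannot
		// race with the synchronous path above.
		go setupAsync(svc)
		return nil
	}

	return markSucceeded(ctx, svc)
}
```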

@kenxu95 kenxu95 (Contributor, Author) commented Apr 17, 2023

Why don't we add a new column to track exec state if it applies to all integrations? This reduces the need to deserialize the values.

I initially made a separate table for connections called integration_connections. Then I realized that we only needed the exec_state of the latest connection, so I did what you're suggesting and made a new exec_state column. Then I saw all the code for Conda and Lambda and realized it would take quite a bit of effort to get everything backfilled correctly and working the same way for all our integrations, so I took the easier route of reusing the execution state helpers and the config column for now. I can make a task for migrating this to its own exec_state column as a follow-up, but I don't see it as completely necessary right now.
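A small sketch of what reusing the config column looks like, assuming an illustrative ExecState struct and an "exec_state" key, and showing the extra deserialization cost likawind is pointing at:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ExecState is an illustrative shape for what gets stored under the key.
type ExecState struct {
	Status    string    `json:"status"` // "pending", "succeeded", or "failed"
	UpdatedAt time.Time `json:"updated_at"`
}

// setExecState serializes the state under an "exec_state" key in the same map
// that already backs the integration's config column.
func setExecState(config map[string]string, state ExecState) error {
	raw, err := json.Marshal(state)
	if err != nil {
		return err
	}
	config["exec_state"] = string(raw)
	return nil
}

// getExecState shows the deserialization cost being discussed: every reader
// has to unmarshal the blob instead of selecting a typed column.
func getExecState(config map[string]string) (ExecState, error) {
	var state ExecState
	raw, ok := config["exec_state"]
	if !ok {
		return state, fmt.Errorf("no exec_state recorded")
	}
	err := json.Unmarshal([]byte(raw), &state)
	return state, err
}

func main() {
	config := map[string]string{"host": "localhost"}
	_ = setExecState(config, ExecState{Status: "pending", UpdatedAt: time.Now()})

	state, err := getExecState(config)
	if err != nil {
		panic(err)
	}
	fmt.Println(state.Status) // "pending"
}
```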

I don't have the impression that you can drop a column in SQLite, but if tests are passing it's probably fine.

I think you can; we have other migrations that do this.
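For reference, a hedged sketch of such a drop-column migration in Go; the driver choice, table name, and database path are assumptions, not the project's actual migration code. SQLite supports ALTER TABLE ... DROP COLUMN natively only from version 3.35.0; older versions need the create-new-table / copy / rename approach.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // hypothetical driver choice for the sketch
)

// dropValidatedColumn drops the now-unused validated column (table name assumed).
func dropValidatedColumn(db *sql.DB) error {
	_, err := db.Exec(`ALTER TABLE integration DROP COLUMN validated;`)
	return err
}

func main() {
	db, err := sql.Open("sqlite3", "server.db") // hypothetical database path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := dropValidatedColumn(db); err != nil {
		log.Fatalf("migration failed: %v", err)
	}
}
```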

I assume this information will show up in the UI in the new project. If it does, what do we do if the user clicks 'test connect integration'? Should the exec state be updated if the test-connect failed?

I think both testing and editing an integration should update the exec_state. We can track the source of the change so we can display "Tested on" vs "Edited on" vs "Created on" timestamps. What do y'all think? cc: @vsreekanti This ofc means that testing an integration and finding out that it is failing may cause SDK operations that are using that integration to fail. But tbh that seems like proper behavior, since execution is going to fail anyway (it'll just fail earlier now).

@kenxu95 kenxu95 requested a review from likawind April 17, 2023 23:28
@kenxu95 kenxu95 (Contributor, Author) commented Apr 17, 2023

@likawind addressed the database thing. Good to know! I haven't fully tested it E2E but will do that soon.

@likawind likawind (Contributor) left a comment

This PR LGTM!

@likawind likawind (Contributor)

If there's too much work, it makes sense to keep exec_state in the dict.

Generally agree with your proposals on test-connection. Following that, should we update integrations to the failure state if we detect that such a connection failed during operator execution?

@kenxu95 kenxu95 force-pushed the eng-2782-track-the-status-of-a-resource branch from bbd06a7 to b4d3694 on April 18, 2023 15:41
some fixes

shoot. Maybe it's better to keep it two tables

moved it to an integration_connection table, because of editing

the migration works. Now to worry about the read and write paths

added the drop column one

ok now I'm confused. Unwinding back to a single table approach.

there are so many things wrong with this but at least I see a path through now.

made some progress. Write path internals in a good state

unwound a bunch of stuff

prune all the stuff that is unnecessary now

Moved the transaction back inside

reverted a bunch of stuff back, still need to drop validated column

added validated column drop. Ready for testing now.

removed sdk changes

last things, lint and block usage of broken integrations.

lint

done

small formatting fixes

fixed gh actions

made exec_state optional on SDK

cr

fixed err

removed elem

update backend
@kenxu95 kenxu95 force-pushed the eng-2782-track-the-status-of-a-resource branch from f516f10 to fb1047e on April 19, 2023 15:33
@kenxu95 kenxu95 added the run_integration_test Triggers integration tests label Apr 19, 2023
@kenxu95 kenxu95 merged commit 1bcf1fc into main Apr 20, 2023
@kenxu95 kenxu95 deleted the eng-2782-track-the-status-of-a-resource branch April 20, 2023 17:30