Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1/4] Eng 2711 Implement operator and artifact node views to capture structure information #1164

Merged
merged 19 commits into from
Apr 14, 2023

Conversation

likawind
Copy link
Contributor

@likawind likawind commented Apr 3, 2023

Describe your changes and why you are making these changes

Base on our API 2.0 Design, we will return up / downstream information when querying operator / artifacts. As a result, we are foreseeing the need of having a DB view that consolidate this inputs / outputs information so that we can save lots of application logic on map manipulation.

This PR introduces two views, operator_node and artifact_node that additionally add dag_id, inputs and outputs for op and artf objects. The idea is to use GROUP BY to aggregate all dag edges starting from / ends at op / artf to construct the list. In subsequent PR, we will consume these two views in our DB query library.

One thing to note is that since json_group_array has non-deterministic behavior on ordering, we explicitly capture the idx of each field and will push the sorting to go side.

Related issue number (if any)

ENG 2711

Tests (if any)

All new DB queries have unit tests. Performed manual QA with full UI build

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@likawind likawind added the skip_integration_test Skips required integration test (only documentation/UI changes) label Apr 3, 2023
@@ -0,0 +1,141 @@
package _000026_add_operator_artifact_node_views

const upSqliteScript = `
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Views in SQLite are just stored SQL statements, so there isn't any performance benefit of having precomputed values ready to go. I feel like it may be more flexible to just define this as a regular query inside of a method in one of our repos instead of defining a database view for it.

That way if we need to make changes to the query it can have no overhead as opposed to updating a view with another migration step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, would you suggest by pre-running these CREATE VIEW while executing the queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, do you see any draw-back if this view creation is running multiple times?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the CREATE VIEW part completely and just run the query, because SQLite won't update the view when the underlying tables change: https://www.sqlite.org/lang_createview.html.

This is different from something like Postgres which has support for updateable views, but we don't support Postgres right now, so we should optimize for Postgres when we do add support for it: https://www.percona.com/blog/postgresql-updatable-views-performing-schema-updates-with-minimal-downtime/#:~:text=PostgreSQL%20has%20introduced%20automatically%20updatable,as%20on%20a%20regular%20table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main thing is running CREATE VIEW doesn't add any value for us right now, and running the view creation multiple times is extra overhead for SQLite, as opposed to just running the query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! So should we just do WITH (%s) AS operator_nodes and put this in a helper?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can start from WITH edge_ordered AS... and keep everything else the same. I don't think you need to define operator_nodes since that is what you are returning

Copy link
Contributor Author

@likawind likawind Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, @saurav-c this PR and #1165 should be ready!

@likawind likawind requested a review from saurav-c April 11, 2023 01:03
@@ -0,0 +1,59 @@
package views

const ArtifactNodeViewSubQuery = `
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to the top of these files so describe what it does?

A little confused about what the json_group_array contains. For the outputs json_group_array, does value=all operators that take in this artifact as input and the inputs json_group_array take in all the operators that create the artifact as output? If so, shouldn't it just be one operator that generates the artifact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for artifact inputs we could make it unique. In general json_group_array operates on group by clause and put together elements that are 'grouped' as an json array. We make use of it to put together inputs / outputs list which were separate rows before group by.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, updated query with unique artf input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query changes will follow up in #1165

@likawind likawind changed the title [1/n] Eng 2711 Implement operator and artifact node views to capture structure information [1/4] Eng 2711 Implement operator and artifact node views to capture structure information Apr 12, 2023
@@ -0,0 +1,59 @@
package views

const ArtifactNodeViewSubQuery = `
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like us to only keep struct definitions in lib/models/views. Any SQL queries should go under the proper repos directory. Something like this should belong in the artifact repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Would you mind reviewing #1165 and #1199 as well? Now that the stack is high and there will be lots of rebases, so I'd like to get all comments and address them all at once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup reviewing rn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

@@ -15,6 +15,57 @@ import (
"github.com/google/uuid"
)

const ArtifactNodeViewSubQuery = `
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is public because linter complaints if private const is not used

@likawind likawind added the run_integration_test Triggers integration tests label Apr 14, 2023
…g-2711-m1-implement-node-routes-for-operator-1
@likawind likawind removed the skip_integration_test Skips required integration test (only documentation/UI changes) label Apr 14, 2023
…g-2711-m1-implement-node-routes-for-operator-1
@likawind likawind merged commit d1d84cb into main Apr 14, 2023
@vsreekanti vsreekanti deleted the eng-2711-m1-implement-node-routes-for-operator-1 branch April 18, 2023 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_integration_test Triggers integration tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants