-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1/4] Eng 2711 Implement operator and artifact node views to capture structure information #1164
[1/4] Eng 2711 Implement operator and artifact node views to capture structure information #1164
Conversation
…g-2698-m1-implement-v2-api-framework
…g-2711-m1-implement-node-routes-for-operator-1
@@ -0,0 +1,141 @@ | |||
package _000026_add_operator_artifact_node_views | |||
|
|||
const upSqliteScript = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Views in SQLite are just stored SQL statements, so there isn't any performance benefit of having precomputed values ready to go. I feel like it may be more flexible to just define this as a regular query inside of a method in one of our repos instead of defining a database view for it.
That way if we need to make changes to the query it can have no overhead as opposed to updating a view with another migration step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, would you suggest by pre-running these CREATE VIEW
while executing the queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, do you see any draw-back if this view creation is running multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the CREATE VIEW
part completely and just run the query, because SQLite won't update the view when the underlying tables change: https://www.sqlite.org/lang_createview.html.
This is different from something like Postgres which has support for updateable views, but we don't support Postgres right now, so we should optimize for Postgres when we do add support for it: https://www.percona.com/blog/postgresql-updatable-views-performing-schema-updates-with-minimal-downtime/#:~:text=PostgreSQL%20has%20introduced%20automatically%20updatable,as%20on%20a%20regular%20table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main thing is running CREATE VIEW
doesn't add any value for us right now, and running the view creation multiple times is extra overhead for SQLite, as opposed to just running the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! So should we just do WITH (%s) AS operator_nodes
and put this in a helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can start from WITH edge_ordered AS...
and keep everything else the same. I don't think you need to define operator_nodes
since that is what you are returning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,59 @@ | |||
package views | |||
|
|||
const ArtifactNodeViewSubQuery = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment to the top of these files so describe what it does?
A little confused about what the json_group_array contains. For the outputs json_group_array, does value=all operators that take in this artifact as input and the inputs json_group_array take in all the operators that create the artifact as output? If so, shouldn't it just be one operator that generates the artifact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for artifact inputs we could make it unique. In general json_group_array
operates on group by
clause and put together elements that are 'grouped' as an json array. We make use of it to put together inputs / outputs list which were separate rows before group by
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, updated query with unique artf input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query changes will follow up in #1165
@@ -0,0 +1,59 @@ | |||
package views | |||
|
|||
const ArtifactNodeViewSubQuery = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like us to only keep struct definitions in lib/models/views
. Any SQL queries should go under the proper repos
directory. Something like this should belong in the artifact
repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup reviewing rn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
@@ -15,6 +15,57 @@ import ( | |||
"github.com/google/uuid" | |||
) | |||
|
|||
const ArtifactNodeViewSubQuery = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is public because linter complaints if private const is not used
…g-2711-m1-implement-node-routes-for-operator-1
…g-2711-m1-implement-node-routes-for-operator-1
Describe your changes and why you are making these changes
Base on our API 2.0 Design, we will return up / downstream information when querying operator / artifacts. As a result, we are foreseeing the need of having a DB view that consolidate this inputs / outputs information so that we can save lots of application logic on map manipulation.
This PR introduces two views,
operator_node
andartifact_node
that additionally adddag_id
,inputs
andoutputs
for op and artf objects. The idea is to useGROUP BY
to aggregate all dag edges starting from / ends at op / artf to construct the list. In subsequent PR, we will consume these two views in our DB query library.One thing to note is that since
json_group_array
has non-deterministic behavior on ordering, we explicitly capture theidx
of each field and will push the sorting to go side.Related issue number (if any)
ENG 2711
Tests (if any)
All new DB queries have unit tests. Performed manual QA with full UI build
Checklist before requesting a review
python3 scripts/run_linters.py -h
for usage).run_integration_test
: Runs integration testsskip_integration_test
: Skips integration tests (Should be used when changes are ONLY documentation/UI)