Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sketches Schema UI view #1015

Merged
merged 6 commits into from
Jul 16, 2024
Merged

Sketches Schema UI view #1015

merged 6 commits into from
Jul 16, 2024

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Jul 4, 2024

Original description follows. This has been fixed up to work correctly.

This is hacky UI code. It does not handle "run ids".

  1. Adds to pandas & pyspark schema capture using h_schema.py.
  2. Adds SchemaView and related types to try to show a table...

So it's either we've set something up with too many things, or this is just a factor of using typescript... Basically I dont like how many things I had to add to do this..

Otherwise it's non-obvious to me what the pattern should be to handle comparing runs -- this UI change for the schema table doesn't take that into account -- it should probably use Generic Table but I couldn't wrap my head around it.

Screen Shot 2024-07-04 at 3 18 26 PM
Screen Shot 2024-07-04 at 3 17 54 PM
Screen Shot 2024-07-04 at 3 26 52 PM

Changes

  • SDK
  • DataObservabilityView

How I tested this

  • locally

Notes

  • need help to handle multiple run Ids and styling it appropriately.
  • we might not need to use h_schema since I kind of already have some of this... we should align on what the "schema" view actually takes in too.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz marked this pull request as draft July 4, 2024 22:34
@skrawcz
Copy link
Collaborator Author

skrawcz commented Jul 4, 2024

To complete this:

  • decide on how to create schema stuff — do we use pyarrow stuff or not… (it would mean a dependency on pyarrow is required)
  • redo the table view to enable comparing multiple run IDs.
  • fix SDK tests

@skrawcz skrawcz added UI Related to the Hamilton UI SDK Related to hamilton SDK for th UI labels Jul 4, 2024
skrawcz and others added 2 commits July 11, 2024 16:17
This is hacky UI code. It does not handle "run ids".

1. Adds to pandas & pyspark schema capture using `h_schema.py`.
2. Adds SchemaView and related types to try to show a table...

So it's either we've set something up with too many things, or this
is just a factor of using typescript... Basically I dont like how many
things I had to add to do this..

Otherwise it's non-obvious to me what the pattern should be to
handle comparing runs -- this UI change for the schema table
doesn't take that into account -- it should probably use Generic Table
but I couldn't wrap my head around it.

Otherwise open question whether we reuse the h_schema pyarrow
stuff, or just roll our own again...
This will make it look like the others, make it expandable, etc...
This allows us to hide attributes now that we have multiple
@elijahbenizzy
Copy link
Collaborator

Fixed up schema view:
image

Added expand/contract now that we have multiple:
image

We now have three categories:
1. Result summaries -- we can output one and if not it will be
   unsupported in the UI
2. Schema -- we can output one or none
3. Additional -- as many as we want

We use a single dispatch function for each, and it makes the code a lot
cleaner. We no longer put stuff in lists unless its an additional
result. Note they can also supply their own name, if not, we will
generate a unique attribute name.

In the long term we'll likely want to stop using single dispatch as we
need to register multiple with roles (and single dispatch requires one
function per role). For now this is clean enough and easy to work with,
however.
Comment on lines 150 to 189
// dataTypeDisplay={(item: string) => {
// return (
// <RunLink
// projectId={props.projectId}
// runId={parseInt(item) as number}
// setHighlightedRun={() => void 0}
// highlightedRun={null}
// ></RunLink>
// );
// }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out?

Comment on lines 92 to 108
{
"observability_type": "primitive",
"observability_value": {
"type": str(str),
"value": o_value["cost_explain"],
},
"observability_schema_version": "0.0.1",
"name": "Cost Explain",
},
{
"observability_type": "primitive",
"observability_value": {
"type": str(str),
"value": o_value["extended_explain"],
},
"observability_schema_version": "0.0.1",
"name": "Extended Explain",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these added to additional...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I just stuck in the o_value there cause I figured it was easier. Doesn't allow us to compare, though, so it might be nice to have. Will add it to additional

@elijahbenizzy elijahbenizzy force-pushed the schema_viz branch 2 times, most recently from 1c142f9 to 32b1ef1 Compare July 13, 2024 17:05
This just treats all primitives as strings, which can show the view.
This works with strings and will actually just look fine otherwise. It
uses the ReactDiffViewer component which is simple, and we use elsewhere
in the app.

We can probably tune this a bit (the interaction is a little jumpy), but
for now this is OK.
We had captured all of them. Now we capture individual data as well,
which allows for easy comparison. It's duplicated, so we use an
lru_tools cache (which should cache based on the pyspark dataframe ID)
@elijahbenizzy elijahbenizzy marked this pull request as ready for review July 16, 2024 18:39
@elijahbenizzy elijahbenizzy self-requested a review July 16, 2024 19:55
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed up + approved

Comment on lines 92 to 108
{
"observability_type": "primitive",
"observability_value": {
"type": str(str),
"value": o_value["cost_explain"],
},
"observability_schema_version": "0.0.1",
"name": "Cost Explain",
},
{
"observability_type": "primitive",
"observability_value": {
"type": str(str),
"value": o_value["extended_explain"],
},
"observability_schema_version": "0.0.1",
"name": "Extended Explain",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I just stuck in the o_value there cause I figured it was easier. Doesn't allow us to compare, though, so it might be nice to have. Will add it to additional

@elijahbenizzy elijahbenizzy merged commit 7a3dee8 into main Jul 16, 2024
28 checks passed
@elijahbenizzy elijahbenizzy deleted the schema_viz branch July 16, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SDK Related to hamilton SDK for th UI UI Related to the Hamilton UI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants