Sketches Schema UI view #1015

skrawcz · 2024-07-04T22:34:33Z

Original description follows. This has been fixed up to work correctly.

This is hacky UI code. It does not handle "run ids".

Adds schema capture for pandas & pyspark using `h_schema.py`.
Adds SchemaView and related types to try to show a table...

So either we've set things up with too many pieces, or this is just a consequence of using TypeScript. Basically, I don't like how many things I had to add to do this.

Otherwise, it's not obvious to me what the pattern should be for comparing runs. This UI change for the schema table doesn't take that into account; it should probably use Generic Table, but I couldn't wrap my head around it.

Changes

SDK
DataObservabilityView

How I tested this

Locally.

Notes

Need help handling multiple run IDs and styling them appropriately.
We might not need to use `h_schema`, since I already have some of this. We should also align on what the "schema" view actually takes as input.
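To make the "what does the schema view take in" question concrete, here is a minimal sketch of rolling our own schema capture without pyarrow. It relies only on the fact that pandas `df.dtypes.items()` and pyspark `df.dtypes` both yield `(column name, dtype)` pairs. The function name `build_schema_payload`, the `"schema"` type tag, and the `columns` layout are hypothetical; only the outer keys mirror the primitive observability dicts used in `pyspark_stats.py`.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_schema_payload(
    dtype_pairs: Iterable[Tuple[str, Any]], name: str = "Schema"
) -> Dict[str, Any]:
    """Builds a hypothetical schema observability payload from (column, dtype) pairs.

    Accepts pandas ``df.dtypes.items()`` or pyspark ``df.dtypes`` directly,
    so no pyarrow dependency is needed.
    """
    columns: List[Dict[str, str]] = [
        # str() normalizes numpy dtypes (pandas) and type strings (pyspark) alike.
        {"name": str(col), "type": str(dtype)}
        for col, dtype in dtype_pairs
    ]
    return {
        "observability_type": "schema",  # hypothetical type tag
        "observability_value": {"columns": columns},
        "observability_schema_version": "0.0.1",
        "name": name,
    }


# pyspark's df.dtypes returns a list shaped like this one.
payload = build_schema_payload([("age", "int"), ("name", "string")])
print(payload["observability_value"]["columns"])
```

Keeping the payload to name/type pairs is one possible answer to the alignment question; nullability or other pyarrow-specific details would need extra fields.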

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

skrawcz · 2024-07-04T22:37:30Z

To complete this:

Decide how to create the schema: do we use the pyarrow-based `h_schema.py` code or not? (Using it would make pyarrow a required dependency.)
Redo the table view to enable comparing multiple run IDs.
Fix the SDK tests.
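Comparing multiple run IDs in one table essentially means pivoting per-run schemas into one row per column. A minimal sketch, assuming each run's schema is available as a `{column: type}` dict; `pivot_schemas` and the row shape are hypothetical, not the shape GenericGroupedTable actually consumes.

```python
from typing import Dict, List, Optional


def pivot_schemas(schemas_by_run: Dict[int, Dict[str, str]]) -> List[dict]:
    """Turns {run_id: {column: type}} into one row per column, with one type per run."""
    run_ids = sorted(schemas_by_run)
    # Union of columns across runs, keeping first-seen order (dicts preserve insertion order).
    columns = list(
        dict.fromkeys(col for run_id in run_ids for col in schemas_by_run[run_id])
    )
    rows = []
    for col in columns:
        # None marks a column that is missing from that run's schema.
        types: Dict[int, Optional[str]] = {
            run_id: schemas_by_run[run_id].get(col) for run_id in run_ids
        }
        rows.append(
            {
                "column": col,
                "types": types,
                # Flag rows where runs disagree (type changed or column missing).
                "differs": len(set(types.values())) > 1,
            }
        )
    return rows


rows = pivot_schemas(
    {
        2: {"a": "float64", "b": "object", "c": "bool"},
        1: {"a": "int64", "b": "object"},
    }
)
```

A per-row `differs` flag is one way to support styling or collapsing unchanged columns when several runs are shown.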


elijahbenizzy · 2024-07-12T00:12:08Z

Fixed up the schema view.

Added expand/contract of attributes now that we have multiple.

We now have three categories:
1. Result summaries: we can output one; if we don't, the UI shows it as unsupported.
2. Schema: we can output one or none.
3. Additional: as many as we want.

We use a single-dispatch function for each, which makes the code a lot cleaner. We no longer put results in lists unless they are additional results. Note that users can also supply their own name; if they don't, we generate a unique attribute name.

In the long term we'll likely want to stop using single dispatch, since we need to register multiple functions with roles (and single dispatch requires one function per role). For now, however, this is clean enough and easy to work with.
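A minimal sketch of the three-category dispatch using `functools.singledispatch`. The function names (`compute_result_summary`, `compute_schema`, `compute_additional`, `collect_stats`), the dict shapes, and the `additional_{i}` naming scheme are hypothetical illustrations, not the SDK's actual API.

```python
import functools
from typing import Any, Dict, List, Optional


@functools.singledispatch
def compute_result_summary(result: Any) -> Optional[Dict[str, Any]]:
    """Category 1: at most one summary. None means the UI shows "unsupported"."""
    return None


@compute_result_summary.register(list)
def _(result: list) -> Optional[Dict[str, Any]]:
    return {"name": "Length", "value": len(result)}


@functools.singledispatch
def compute_schema(result: Any) -> Optional[Dict[str, Any]]:
    """Category 2: one schema or none."""
    return None


@compute_schema.register(dict)
def _(result: dict) -> Optional[Dict[str, Any]]:
    return {"name": "Schema", "columns": {k: type(v).__name__ for k, v in result.items()}}


@functools.singledispatch
def compute_additional(result: Any) -> List[Dict[str, Any]]:
    """Category 3: any number of extra attributes; "name" is optional."""
    return []


@compute_additional.register(dict)
def _(result: dict) -> List[Dict[str, Any]]:
    # The second entry has no name, so collect_stats generates one.
    return [{"name": "Key Count", "value": len(result)}, {"value": sorted(result)}]


def collect_stats(result: Any) -> Dict[str, Any]:
    stats: Dict[str, Any] = {"result_summary": compute_result_summary(result)}
    schema = compute_schema(result)
    if schema is not None:  # schema is optional, so omit it rather than store None
        stats["schema"] = schema
    for i, item in enumerate(compute_additional(result)):
        # Use the supplied name, else generate one that is unique within this result.
        name = item.get("name") or f"additional_{i}"
        stats[name] = item
    return stats
```

Supporting a new type means registering up to three functions, one per role, which is the per-role limitation of single dispatch described above.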

skrawcz · 2024-07-13T16:15:33Z

ui/frontend/src/components/dashboard/Runs/Task/result-summaries/DataObservability.tsx

+        // dataTypeDisplay={(item: string) => {
+        //   return (
+        //     <RunLink
+        //       projectId={props.projectId}
+        //       runId={parseInt(item) as number}
+        //       setHighlightedRun={() => void 0}
+        //       highlightedRun={null}
+        //     ></RunLink>
+        //   );
+        // }}


commented out?

skrawcz · 2024-07-13T16:20:25Z

ui/sdk/src/hamilton_sdk/tracking/pyspark_stats.py

-        {
-            "observability_type": "primitive",
-            "observability_value": {
-                "type": str(str),
-                "value": o_value["cost_explain"],
-            },
-            "observability_schema_version": "0.0.1",
-            "name": "Cost Explain",
-        },
-        {
-            "observability_type": "primitive",
-            "observability_value": {
-                "type": str(str),
-                "value": o_value["extended_explain"],
-            },
-            "observability_schema_version": "0.0.1",
-            "name": "Extended Explain",


I don't see these added to additional...


This just treats all primitives as strings so the view can display them. That works for strings and looks fine for other primitive types too. It uses the ReactDiffViewer component, which is simple and which we already use elsewhere in the app. We can probably tune this a bit (the interaction is a little jumpy), but for now this is OK.
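The idea of rendering every primitive as a string so one diff view handles all types can be illustrated in Python with `difflib.unified_diff`. This is only a stand-in for what ReactDiffViewer does in the frontend; `as_display_string` and `diff_primitives` are hypothetical helpers.

```python
import difflib
from typing import Any, List


def as_display_string(value: Any) -> str:
    """Treats every primitive as a string, so one text-diff view covers all types."""
    return str(value)


def diff_primitives(
    old: Any, new: Any, old_label: str = "run A", new_label: str = "run B"
) -> List[str]:
    """Returns unified-diff lines between two runs' values (empty list if identical)."""
    return list(
        difflib.unified_diff(
            as_display_string(old).splitlines(),
            as_display_string(new).splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


for line in diff_primitives("a\nb", "a\nc"):
    print(line)
```

Multi-line values such as explain plans benefit most from this; for single numbers the diff degenerates to one removed and one added line.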


elijahbenizzy

This has been fixed up + approved

elijahbenizzy · 2024-07-15T21:57:27Z

Oh, I just stuck the whole `o_value` in there because I figured it was easier. That doesn't let us compare runs, though, so it would be nice to have. Will add them to additional.
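Splitting the combined `o_value` back into individual "additional" entries could look like the sketch below. The dict layout copies the removed primitive entries from the `pyspark_stats.py` diff; `EXPLAIN_FIELDS`, `primitive_entry`, and `explain_entries` are hypothetical names.

```python
from typing import Any, Dict, List

# Maps keys in the combined o_value to the display names used in the diff.
EXPLAIN_FIELDS = {
    "cost_explain": "Cost Explain",
    "extended_explain": "Extended Explain",
}


def primitive_entry(name: str, value: Any) -> Dict[str, Any]:
    """Builds one primitive observability dict in the shape shown in the diff."""
    return {
        "observability_type": "primitive",
        "observability_value": {"type": str(str), "value": value},
        "observability_schema_version": "0.0.1",
        "name": name,
    }


def explain_entries(o_value: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One entry per explain output, so each can be compared across runs."""
    return [
        primitive_entry(label, o_value[key])
        for key, label in EXPLAIN_FIELDS.items()
        if key in o_value
    ]


entries = explain_entries({"cost_explain": "cost plan", "extended_explain": "ext plan"})
```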

skrawcz marked this pull request as draft July 4, 2024 22:34

skrawcz added the UI (Related to the Hamilton UI) and SDK (Related to hamilton SDK for the UI) labels Jul 4, 2024

skrawcz and others added 2 commits July 11, 2024 16:17

Gets the schema view to use GenericGroupedTable

ecc9337

This will make it look like the others, make it expandable, etc...

elijahbenizzy force-pushed the schema_viz branch from d34c9dc to ecc9337 July 12, 2024 00:01

Adds expand/contract of attributes

1c7190c

This allows us to hide attributes now that we have multiple

skrawcz commented Jul 13, 2024

elijahbenizzy force-pushed the schema_viz branch 2 times, most recently from 1c142f9 to 32b1ef1 July 13, 2024 17:05

elijahbenizzy added 2 commits July 16, 2024 11:39

Adds back in pyspark individual metrics

161b028

We had already captured all of them together. Now we also capture each one individually, which allows for easy comparison. The computation is duplicated, so we use an `lru_tools` cache (which should cache based on the pyspark DataFrame ID).
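A minimal sketch of caching the duplicated work with `functools.lru_cache`, so each individual metric reuses one computation per DataFrame. It assumes the DataFrame class uses default identity-based hashing, so the cache effectively keys on the object's identity; `FakeDataFrame` and its `explain_all` method are hypothetical stand-ins, not pyspark APIs.

```python
import functools
from typing import Dict


class FakeDataFrame:
    """Hypothetical stand-in for a pyspark DataFrame; hashes by identity."""

    def __init__(self, plan: str) -> None:
        self.plan = plan
        self.explain_calls = 0

    def explain_all(self) -> Dict[str, str]:
        # Stands in for an expensive step that produces every explain output at once.
        self.explain_calls += 1
        return {
            "cost_explain": f"cost({self.plan})",
            "extended_explain": f"extended({self.plan})",
        }


@functools.lru_cache(maxsize=32)
def _explain_outputs(df: FakeDataFrame) -> Dict[str, str]:
    # The cache key is the df object itself; with default hashing that is its identity.
    # Note the cache holds a strong reference to each df until it is evicted.
    return df.explain_all()


def cost_explain(df: FakeDataFrame) -> str:
    return _explain_outputs(df)["cost_explain"]


def extended_explain(df: FakeDataFrame) -> str:
    return _explain_outputs(df)["extended_explain"]


df = FakeDataFrame("scan")
cost_explain(df)
extended_explain(df)  # served from the cache; explain_all ran only once
```

A small `maxsize` bounds how many DataFrames stay referenced by the cache.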

elijahbenizzy force-pushed the schema_viz branch from 71ed230 to 161b028 July 16, 2024 18:39

elijahbenizzy marked this pull request as ready for review July 16, 2024 18:39

elijahbenizzy self-requested a review July 16, 2024 19:55

elijahbenizzy approved these changes Jul 16, 2024

elijahbenizzy merged commit 7a3dee8 into main Jul 16, 2024
28 checks passed

elijahbenizzy deleted the schema_viz branch July 16, 2024 19:56

skrawcz commented Jul 4, 2024 (edited by elijahbenizzy)
