`ui` sklearn attr #979

zilto · 2024-06-21T20:59:45Z

Display scikit-learn objects as a dictionary of estimator/model parameters and an HTML widget (produce by scikit-learn).

Changes

SDK: Track scikit-learn objects
SDK: the adapter can handle more than one "stats/result/attribute"
frontend/backend: HTML attribute type that can be tracked and displayed
Hit an issue with compute_stats_dict() recursive inspection for scikit-learn's function that returns a list rather than a dict

How I tested this

manual testing with examples/hamilton_ui. Other attributes don't seem broken either.

Note

commit history is a bit messy. Needed to use cherry-pick to avoid merge conflict with the addition of pydantic in the list of supported "attributes"

skrawcz · 2024-06-22T14:36:47Z

ui/sdk/src/hamilton_sdk/tracking/scikit_learn_stats.py

+        {
+            "name": "Parameters",
+            "observability_type": "dict",
+            "observability_value": {
+                "type": str(type(result)),
+                "value": result.get_params(deep=True),
+            },
+            "observability_schema_version": "0.0.2",
+        },
+        {
+            "name": "Components",
+            "observability_type": "html",
+            "observability_value": {"html": result._repr_html_inner()},  # get_params(deep=True),
+            "observability_schema_version": "0.0.1",
+        },


I'm not sure I like this pattern of it returning more than one thing.

Can we not frame these two as one thing? Why do they have to be separate? Or split them - one for state, the other as "schema"?

stats == summary of object
schema == structural representation

The solution was built according to the current constraints:

the current use of process_results() and @singledispatch means I can't have 2 functions for the same object type. All attributes must be returned from the same function

these Python dictionaries are strongly coupled with their UI display. To display both a JSON dict and an HTML object, I need to output two dict

Thoughts:

Regarding 2., we want to give responsibility to the UI to display the object. Creating an "scikit-learn" object that has both JSON and HTML in the same bundle would mean we need to create this bundle in the Python SDK and unbundle it from the TypeScript frontend, making it difficult to manage

I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?

In all honesty, this change is hacky, but I think it's high value to deliver a lot of UI components shortly to at least cover "all common types" used in our examples to avoid "not implemented yet" messages.

So really this is only a hack with regard to "recursion" of a result, e.g. a list or dictionary. Otherwise this works fine, right?

We do have a design decision:

thematically group things and have explicitly multiple calls

put them all in a single function

My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...

Otherwise:

I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?

Yes we could instead have more: summary of object (compressing data), schema/structure, debug/extra, etc. This too could be registration driven. e.g. for this type we want these things to run...

The "hacky" part to me is in the adapter code where I assume that the first attribute returned is the result_summary (if it's a list rather than a dictionary). Once we redo the data model tracking for requests & responses, we could have things requested by names (e.g., "summary", "schema", "size", "path", "parameters").

My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...

I agree that we'll need an API to give back control to the user. From general user feedback, people really appreciate "smart defaults" that just "do the right thing" in 80% of cases. Then, giving full control and supporting "custom" features is necessary 20% of cases and more committal (we previously might have been to quick here).

For this PR, the scikit-learn introspection has almost 0 performance and metadata storage costs (inspect a dictionary and generate an HTML string). Also, changes are not user facing. Given the design decisions for multiple attributes and the required public API will require time and refactoring anyways, I don't think it should block this PR

I think the problem is that we're calling it "stats" whereas it just got expanded to "observations".

This API is largely internal (users cannot yet register their own), so I'm not too concerned yet about doing this the wrong way -- we can always change it.

The contract with the server/UI is that we can have as many attributes as we want with unique names (the results_summary view will be at the top), so I think its OK to have multiple returned by this function (for now)?

I agree with @skrawcz, however, we'll want to add configuration on what, specifically we capture. One way would be to have "tags" on each attribute that you can turn on/off (schema, results_summary, etc...). TBD how.

System contract looks like this:

SDK captures n attributes per node (n>=0)

Server stores it (doesn't care about what n is or the schema or whatever)

UI renders it

So this is how to do (1) -- which we can improve later on.

A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema, results_summary, etc...).

A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema, results_summary, etc...).

Same thoughts here. I could edit the PR slightly to ensure that a result_summary attribute must exist (as the sole dict returned or included in a list of dict) and other "attributes" have unique names

Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)

skrawcz · 2024-06-24T22:18:08Z

ui/sdk/src/hamilton_sdk/tracking/scikit_learn_stats.py

+        {
+            "name": "Parameters",
+            "observability_type": "dict",
+            "observability_value": {
+                "type": str(type(result)),
+                "value": result.get_params(deep=True),
+            },
+            "observability_schema_version": "0.0.2",
+        },
+        {
+            "name": "Components",
+            "observability_type": "html",
+            "observability_value": {"html": result._repr_html_inner()},  # get_params(deep=True),
+            "observability_schema_version": "0.0.1",
+        },


Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)

zilto added 2 commits June 21, 2024 16:56

working HTML view

ec00844

added scikit-learn support

8b1a72e

zilto added enhancement New feature or request UI Related to the Hamilton UI labels Jun 21, 2024

skrawcz reviewed Jun 22, 2024

View reviewed changes

skrawcz approved these changes Jun 24, 2024

View reviewed changes

zilto merged commit de80383 into main Jun 24, 2024
27 checks passed

zilto deleted the ui/sklearn branch June 24, 2024 23:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ui` sklearn attr #979

`ui` sklearn attr #979

zilto commented Jun 21, 2024 •

edited

Loading

skrawcz Jun 22, 2024

zilto Jun 23, 2024 •

edited

Loading

zilto Jun 23, 2024

skrawcz Jun 24, 2024

zilto Jun 24, 2024

elijahbenizzy Jun 24, 2024

zilto Jun 24, 2024

skrawcz Jun 24, 2024

skrawcz Jun 24, 2024

ui sklearn attr #979

ui sklearn attr #979

Conversation

zilto commented Jun 21, 2024 • edited Loading

Changes

How I tested this

Note

Choose a reason for hiding this comment

zilto Jun 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

`ui` sklearn attr #979

`ui` sklearn attr #979

zilto commented Jun 21, 2024 •

edited

Loading

zilto Jun 23, 2024 •

edited

Loading