Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui sklearn attr #979

Merged
merged 2 commits into from
Jun 24, 2024
Merged

ui sklearn attr #979

merged 2 commits into from
Jun 24, 2024

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Jun 21, 2024

Display scikit-learn objects as a dictionary of estimator/model parameters and an HTML widget (produce by scikit-learn).

image

Changes

  • SDK: Track scikit-learn objects
  • SDK: the adapter can handle more than one "stats/result/attribute"
  • frontend/backend: HTML attribute type that can be tracked and displayed
  • Hit an issue with compute_stats_dict() recursive inspection for scikit-learn's function that returns a list rather than a dict

How I tested this

  • manual testing with examples/hamilton_ui. Other attributes don't seem broken either.

Note

  • commit history is a bit messy. Needed to use cherry-pick to avoid merge conflict with the addition of pydantic in the list of supported "attributes"

@zilto zilto added enhancement New feature or request UI Related to the Hamilton UI labels Jun 21, 2024
Comment on lines +15 to +29
{
"name": "Parameters",
"observability_type": "dict",
"observability_value": {
"type": str(type(result)),
"value": result.get_params(deep=True),
},
"observability_schema_version": "0.0.2",
},
{
"name": "Components",
"observability_type": "html",
"observability_value": {"html": result._repr_html_inner()}, # get_params(deep=True),
"observability_schema_version": "0.0.1",
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this pattern of it returning more than one thing.

Can we not frame these two as one thing? Why do they have to be separate? Or split them - one for state, the other as "schema"?

stats == summary of object
schema == structural representation

Copy link
Collaborator Author

@zilto zilto Jun 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution was built according to the current constraints:

  1. the current use of process_results() and @singledispatch means I can't have 2 functions for the same object type. All attributes must be returned from the same function
  2. these Python dictionaries are strongly coupled with their UI display. To display both a JSON dict and an HTML object, I need to output two dict

Thoughts:

  • Regarding 2., we want to give responsibility to the UI to display the object. Creating an "scikit-learn" object that has both JSON and HTML in the same bundle would mean we need to create this bundle in the Python SDK and unbundle it from the TypeScript frontend, making it difficult to manage
  • I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all honesty, this change is hacky, but I think it's high value to deliver a lot of UI components shortly to at least cover "all common types" used in our examples to avoid "not implemented yet" messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So really this is only a hack with regard to "recursion" of a result, e.g. a list or dictionary. Otherwise this works fine, right?

We do have a design decision:

  • thematically group things and have explicitly multiple calls
  • put them all in a single function

My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...

Otherwise:

I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?

Yes we could instead have more: summary of object (compressing data), schema/structure, debug/extra, etc. This too could be registration driven. e.g. for this type we want these things to run...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "hacky" part to me is in the adapter code where I assume that the first attribute returned is the result_summary (if it's a list rather than a dictionary). Once we redo the data model tracking for requests & responses, we could have things requested by names (e.g., "summary", "schema", "size", "path", "parameters").

My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...

I agree that we'll need an API to give back control to the user. From general user feedback, people really appreciate "smart defaults" that just "do the right thing" in 80% of cases. Then, giving full control and supporting "custom" features is necessary 20% of cases and more committal (we previously might have been to quick here).

For this PR, the scikit-learn introspection has almost 0 performance and metadata storage costs (inspect a dictionary and generate an HTML string). Also, changes are not user facing. Given the design decisions for multiple attributes and the required public API will require time and refactoring anyways, I don't think it should block this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that we're calling it "stats" whereas it just got expanded to "observations".

This API is largely internal (users cannot yet register their own), so I'm not too concerned yet about doing this the wrong way -- we can always change it.

The contract with the server/UI is that we can have as many attributes as we want with unique names (the results_summary view will be at the top), so I think its OK to have multiple returned by this function (for now)?

I agree with @skrawcz, however, we'll want to add configuration on what, specifically we capture. One way would be to have "tags" on each attribute that you can turn on/off (schema, results_summary, etc...). TBD how.

System contract looks like this:

  1. SDK captures n attributes per node (n>=0)
  2. Server stores it (doesn't care about what n is or the schema or whatever)
  3. UI renders it

So this is how to do (1) -- which we can improve later on.

A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema, results_summary, etc...).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema, results_summary, etc...).

Same thoughts here. I could edit the PR slightly to ensure that a result_summary attribute must exist (as the sole dict returned or included in a list of dict) and other "attributes" have unique names

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)

Comment on lines +15 to +29
{
"name": "Parameters",
"observability_type": "dict",
"observability_value": {
"type": str(type(result)),
"value": result.get_params(deep=True),
},
"observability_schema_version": "0.0.2",
},
{
"name": "Components",
"observability_type": "html",
"observability_value": {"html": result._repr_html_inner()}, # get_params(deep=True),
"observability_schema_version": "0.0.1",
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)

@zilto zilto merged commit de80383 into main Jun 24, 2024
27 checks passed
@zilto zilto deleted the ui/sklearn branch June 24, 2024 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request UI Related to the Hamilton UI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants