-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ui
sklearn attr
#979
ui
sklearn attr
#979
Conversation
{ | ||
"name": "Parameters", | ||
"observability_type": "dict", | ||
"observability_value": { | ||
"type": str(type(result)), | ||
"value": result.get_params(deep=True), | ||
}, | ||
"observability_schema_version": "0.0.2", | ||
}, | ||
{ | ||
"name": "Components", | ||
"observability_type": "html", | ||
"observability_value": {"html": result._repr_html_inner()}, # get_params(deep=True), | ||
"observability_schema_version": "0.0.1", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like this pattern of it returning more than one thing.
Can we not frame these two as one thing? Why do they have to be separate? Or split them - one for state, the other as "schema"?
stats == summary of object
schema == structural representation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution was built according to the current constraints:
- the current use of
process_results()
and@singledispatch
means I can't have 2 functions for the same object type. All attributes must be returned from the same function - these Python dictionaries are strongly coupled with their UI display. To display both a JSON dict and an HTML object, I need to output two dict
Thoughts:
- Regarding 2., we want to give responsibility to the UI to display the object. Creating an "scikit-learn" object that has both JSON and HTML in the same bundle would mean we need to create this bundle in the Python SDK and unbundle it from the TypeScript frontend, making it difficult to manage
- I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all honesty, this change is hacky, but I think it's high value to deliver a lot of UI components shortly to at least cover "all common types" used in our examples to avoid "not implemented yet" messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So really this is only a hack with regard to "recursion" of a result, e.g. a list or dictionary. Otherwise this works fine, right?
We do have a design decision:
- thematically group things and have explicitly multiple calls
- put them all in a single function
My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...
Otherwise:
I strongly doubt that we can fit any Python objects under a "schema" and "stats" or any narrow framework. For instance, these two scikit-learn objects represent the exact same information, so it would be confusing to split them between schema/stats, but it remains useful to have both presentation in the UI. How would you categorize a PNG of a query plan?
Yes we could instead have more: summary of object (compressing data), schema/structure, debug/extra, etc. This too could be registration driven. e.g. for this type we want these things to run...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "hacky" part to me is in the adapter code where I assume that the first attribute returned is the result_summary
(if it's a list rather than a dictionary). Once we redo the data model tracking for requests & responses, we could have things requested by names (e.g., "summary", "schema", "size", "path", "parameters").
My worry is being able to control and tweak them with configuration. So we should consider what the impact this design has on that. E.g. we could more explicitly control registration of functions in various ways -- some could be cleaner than others -- I'm thinking we control registration to know what to allow to register or not...
I agree that we'll need an API to give back control to the user. From general user feedback, people really appreciate "smart defaults" that just "do the right thing" in 80% of cases. Then, giving full control and supporting "custom" features is necessary 20% of cases and more committal (we previously might have been to quick here).
For this PR, the scikit-learn introspection has almost 0 performance and metadata storage costs (inspect a dictionary and generate an HTML string). Also, changes are not user facing. Given the design decisions for multiple attributes and the required public API will require time and refactoring anyways, I don't think it should block this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is that we're calling it "stats" whereas it just got expanded to "observations".
This API is largely internal (users cannot yet register their own), so I'm not too concerned yet about doing this the wrong way -- we can always change it.
The contract with the server/UI is that we can have as many attributes as we want with unique names (the results_summary
view will be at the top), so I think its OK to have multiple returned by this function (for now)?
I agree with @skrawcz, however, we'll want to add configuration on what, specifically we capture. One way would be to have "tags" on each attribute that you can turn on/off (schema, results_summary, etc...). TBD how.
System contract looks like this:
- SDK captures n attributes per node (
n>=0
) - Server stores it (doesn't care about what
n
is or the schema or whatever) - UI renders it
So this is how to do (1) -- which we can improve later on.
A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema
, results_summary
, etc...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A middle ground could also be having one function to do capturing of the results summary and another to get any additional attributes. Or register using a custom single dispatch for attribute "name" (E.G. schema, results_summary, etc...).
Same thoughts here. I could edit the PR slightly to ensure that a result_summary
attribute must exist (as the sole dict returned or included in a list of dict) and other "attributes" have unique names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)
{ | ||
"name": "Parameters", | ||
"observability_type": "dict", | ||
"observability_value": { | ||
"type": str(type(result)), | ||
"value": result.get_params(deep=True), | ||
}, | ||
"observability_schema_version": "0.0.2", | ||
}, | ||
{ | ||
"name": "Components", | ||
"observability_type": "html", | ||
"observability_value": {"html": result._repr_html_inner()}, # get_params(deep=True), | ||
"observability_schema_version": "0.0.1", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool the main thing I wanted is to think about it. I don't think anything is a blocker. If we do change / revert things we'll just need to make the frontend is backwards compatible :)
Display scikit-learn objects as a dictionary of estimator/model parameters and an HTML widget (produce by scikit-learn).
Changes
compute_stats_dict()
recursive inspection for scikit-learn's function that returns a list rather than a dictHow I tested this
examples/hamilton_ui
. Other attributes don't seem broken either.Note
cherry-pick
to avoid merge conflict with the addition ofpydantic
in the list of supported "attributes"