sklearn stores cross validation (e.g. from cross_validate or GridSearchCV) results in a specific tabular format. We can easily add those results to the model card with the help of tabulate. As a user, I can thus easily see all the different outcomes for the splits and hyperparameters.
One potential issue is that the resulting table can become quite big. We could either provide arguments to reduce its size or, at the very least, document how users can shrink it (say, by showing only the top k best scores from a grid search when there are hundreds of rows).
We could also consider making this feature work with pandas DataFrames, since df = pd.DataFrame(grid_search.cv_results_) is probably a common pattern.
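To make the "top k" idea concrete, here is a minimal sketch of how a user could shrink a cv_results_-style table before adding it to the card. The helper name top_k_rows and the toy values are hypothetical and exist only for illustration; the only thing taken from sklearn is the rank_test_score column, where 1 is the best rank.

```python
from typing import Any


def top_k_rows(
    cv_results: dict[str, list[Any]],
    k: int,
    rank_key: str = "rank_test_score",
) -> dict[str, list[Any]]:
    """Keep only the k best-ranked rows of a column-oriented table."""
    ranks = cv_results[rank_key]
    # Row indices sorted by rank (1 is best in sklearn's convention).
    best = sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]
    return {col: [values[i] for i in best] for col, values in cv_results.items()}


# Hypothetical, hand-written values shaped like GridSearchCV.cv_results_.
toy_results = {
    "param_C": [0.1, 1.0, 10.0, 100.0],
    "mean_test_score": [0.81, 0.90, 0.88, 0.79],
    "rank_test_score": [3, 1, 2, 4],
}

small = top_k_rows(toy_results, k=2)
```

The result is still a plain dict of lists, so it keeps the same tabular shape as the input. With a DataFrame, something like df.nsmallest(k, "rank_test_score") should achieve the same thing.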
Solves skops-dev#87
This feature adds a new method to the Card class called add_table. It
adds a generic table to the end of the model card by creating a new
section (similar to plots).
Any generic table of type dict[str, list[Any]] is supported, as well as
DataFrames. However, the main use case is for adding CV results, as e.g.
created by GridSearchCV.
Implementation
I hope you can stomach the type-y implementation Adrin ;)
I wanted to make the extra sections more generic in PR skops-dev#89 as a
preparation for this PR (thus avoiding a big PR with refactor + new
feature). However, the changes in skops-dev#89 were not sufficient (I guess
that's the danger of splitting the work).
The problem was that the formatting of adding the plots in the save
method was very specific to plots, as a markdown link would be added.
This, of course, does not work for tables.
My solution is that _extra_sections now takes ExtraSection objects as
values instead of strings. These objects only need to implement the
format method (so technically, str would still be okay). So for a
plot, the PlotSection class has a format method that returns a markdown
link. And for tables, the TableSection class has a format method that
uses tabulate to return a table as string.
This approach should hopefully be flexible enough for us to later add
new section types if needed.
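As a rough illustration of the idea (not the actual code in this PR), the sketch below shows sections that each know how to format themselves. The field names, the render helper, and the markdown image syntax are assumptions, and the hand-rolled pipe table stands in for the tabulate call so that the snippet has no dependencies.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class ExtraSection(Protocol):
    """Anything with a format() method that returns markdown text."""

    def format(self) -> str: ...


@dataclass
class PlotSection:
    # Hypothetical fields, chosen for this sketch only.
    alt_text: str
    path: str

    def format(self) -> str:
        # A plot is rendered as a markdown image link.
        return f"![{self.alt_text}]({self.path})"


@dataclass
class TableSection:
    table: dict[str, list[Any]]

    def format(self) -> str:
        # Hand-rolled pipe table; stands in for the tabulate call.
        headers = list(self.table)
        rows = zip(*self.table.values())
        lines = [
            "| " + " | ".join(headers) + " |",
            "|" + "|".join(" --- " for _ in headers) + "|",
        ]
        lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
        return "\n".join(lines)


def render(sections: dict[str, ExtraSection]) -> str:
    # No per-type branching: every section formats itself.
    return "\n\n".join(
        f"## {title}\n\n{section.format()}" for title, section in sections.items()
    )
```

Because render only ever calls format, adding a new section type means adding a new class, and a plain str (without curly braces) still passes through unchanged.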
At first glance, the implementation might look overly complex. However,
I think it's better than having a bunch of if...else inside of save that
formats each possible type differently. But please LMK if you have a
better idea, I'm open to suggestions.
Examples
Here is the hyperparameter table generated by plot_model_card.py:
https://huggingface.co/skops/hf_hub_example-b959cadc-ffb3-4f07-856b-2d05d2d8f8e0#hyperparameter-search-results