Adding CV results to model cards #87

Closed
BenjaminBossan opened this issue Aug 10, 2022 · 0 comments

sklearn stores cross-validation results (e.g. from cross_validate or GridSearchCV) in a specific tabular format. We can easily add those results to the model card with the help of tabulate. As a user, I can then see at a glance the outcomes for all splits and hyperparameter combinations.

One potential issue is that the resulting table can become quite big. We could either provide arguments to reduce the size or, at the very least, document how the user can shrink it (say, by showing only the top k best scores from a grid search when there are hundreds of rows).
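
To make the "top k" idea concrete, here is a rough sketch of how a user could trim the table before adding it. It assumes a column-oriented dict with a `rank_test_score` column, which is what `GridSearchCV.cv_results_` provides. The helper name `top_k_rows` and the sample values are made up for illustration and are not part of skops or sklearn.

```python
from typing import Any


def top_k_rows(
    cv_results: dict[str, list[Any]],
    k: int,
    rank_key: str = "rank_test_score",
) -> dict[str, list[Any]]:
    """Keep only the k best-ranked rows of a column-oriented table.

    Hypothetical helper, not part of skops or sklearn.
    """
    ranks = cv_results[rank_key]
    # Row indices ordered from best (rank 1) to worst, cut off after k rows.
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]
    # Apply the same row selection to every column.
    return {col: [values[i] for i in order] for col, values in cv_results.items()}


# Hand-written stand-in for grid_search.cv_results_ (values are invented).
cv_results = {
    "param_C": [0.1, 1.0, 10.0, 100.0],
    "mean_test_score": [0.81, 0.93, 0.90, 0.85],
    "rank_test_score": [4, 1, 2, 3],
}

trimmed = top_k_rows(cv_results, k=2)
print(trimmed)
```

Since the helper only indexes into each column, it should work the same way on the numpy arrays that sklearn actually stores, but the result would then contain plain lists.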

We could also consider making this feature work with pandas DataFrames, since df = pd.DataFrame(grid_search.cv_results_) is probably a common pattern.

@merveenoyan merveenoyan self-assigned this Aug 10, 2022
BenjaminBossan added a commit to BenjaminBossan/skops that referenced this issue Aug 10, 2022
Solves skops-dev#87

This feature adds a new method to the Card class called add_table. It
adds a generic table to the end of the model card by creating a new
section (similar to plots).

Any generic table of type dict[str, list[Any]] is supported, as well as
DataFrames. However, the main use case is for adding CV results, as e.g.
created by GridSearchCV.

Implementation

I hope you can stomach the type-y implementation Adrin ;)

I wanted to make the extra sections more generic in PR skops-dev#89 as a
preparation for this PR (thus avoiding a big PR with refactor + new
feature). However, the changes in skops-dev#89 were not sufficient (I guess
that's the danger of splitting the work).

The problem was that the formatting applied in the save method was very
specific to plots: it always added a markdown link. This, of course,
does not work for tables.

My solution is that _extra_sections now takes ExtraSection objects as
values instead of strings. These objects only need to implement the
format method (so technically, str would still be okay). So for a
plot, the PlotSection class has a format method that returns a markdown
link. And for tables, the TableSection class has a format method that
uses tabulate to return a table as string.
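
A simplified sketch of that design, for illustration only. The names
ExtraSection, PlotSection, TableSection and format come from the
description above; the fields, the image-style link, the hand-rolled
table formatter (a stand-in for tabulate) and the render helper are
assumptions and not the actual skops code.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class ExtraSection(Protocol):
    """Anything that can render itself as a string for the model card."""

    def format(self) -> str: ...


@dataclass
class PlotSection:
    alt_text: str  # assumed field name
    path: str  # assumed field name

    def format(self) -> str:
        # Assumption: the plot is embedded as a markdown image link.
        return f"![{self.alt_text}]({self.path})"


@dataclass
class TableSection:
    table: dict[str, list[Any]]

    def format(self) -> str:
        # Stand-in for tabulate: build a minimal markdown pipe table by hand.
        headers = list(self.table)
        lines = [
            "| " + " | ".join(headers) + " |",
            "| " + " | ".join("---" for _ in headers) + " |",
        ]
        # zip(*columns) turns the column-oriented dict into rows.
        for row in zip(*self.table.values()):
            lines.append("| " + " | ".join(str(v) for v in row) + " |")
        return "\n".join(lines)


def render(sections: dict[str, ExtraSection]) -> str:
    """Hypothetical stand-in for the part of save that writes extra sections."""
    return "\n\n".join(
        f"## {title}\n\n{section.format()}" for title, section in sections.items()
    )


sections = {
    "Confusion matrix": PlotSection("confusion matrix", "cm.png"),
    "Scores": TableSection({"split": [0, 1], "score": [0.9, 0.8]}),
}
print(render(sections))
```

The point of the sketch is that render never checks the section type;
it only calls format, so a new section type needs no change there.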

This approach should hopefully be flexible enough for us to later add
new section types if needed.

At first glance, the implementation might look overly complex. However,
I think it's better than having a bunch of if...else inside of save that
formats each possible type differently. But please LMK if you have a
better idea, I'm open to suggestions.

Examples

Here is the hyperparameter table generated by plot_model_card.py:

https://huggingface.co/skops/hf_hub_example-b959cadc-ffb3-4f07-856b-2d05d2d8f8e0#hyperparameter-search-results