Adding CV results to model cards #87

BenjaminBossan · 2022-08-10T09:39:12Z

sklearn stores cross validation (e.g. from cross_validate or GridSearchCV) results in a specific tabular format. We can easily add those results to the model card with the help of tabulate. As a user, I can thus easily see all the different outcomes for the splits and hyperparameters.

One potential issue is that the resulting table can become quite big. We could either provide arguments to reduce the size or, at the very least, document how the user can shrink it (say, by showing only the top k scores from a grid search when there are hundreds of rows).
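
One way the "top k" trimming could be documented is sketched below. The helper top_k_rows is hypothetical (not part of any library), and it assumes single-metric scoring so that a rank_test_score column exists; the sample values are made up.

```python
from typing import Any


def top_k_rows(cv_results: dict[str, list[Any]], k: int = 5) -> dict[str, list[Any]]:
    """Keep only the k best-ranked rows of a cv_results_-like dict (hypothetical helper)."""
    ranks = cv_results["rank_test_score"]
    # Row indices ordered by rank (1 is best), truncated to the first k.
    keep = sorted(range(len(ranks)), key=lambda i: ranks[i])[:k]
    # Apply the same row selection to every column.
    return {key: [values[i] for i in keep] for key, values in cv_results.items()}


# Made-up stand-in for grid_search.cv_results_.
cv_results = {
    "param_C": [0.1, 1.0, 10.0, 100.0],
    "mean_test_score": [0.81, 0.90, 0.88, 0.85],
    "rank_test_score": [4, 1, 2, 3],
}
trimmed = top_k_rows(cv_results, k=2)
```

The trimmed dict keeps the same column layout, so it can be passed to whatever renders the table without further changes.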

We could also consider making this feature work with pandas DataFrames, since df = pd.DataFrame(grid_search.cv_results_) is probably a common pattern.

Solves skops-dev#87

This feature adds a new method to the Card class called add_table. It adds a generic table to the end of the model card by creating a new section (similar to plots). Any generic table of type dict[str, list[Any]] is supported, as well as DataFrames. However, the main use case is for adding CV results, as e.g. created by GridSearchCV.

Implementation

I hope you can stomach the type-y implementation Adrin ;)

I wanted to make the extra sections more generic in PR skops-dev#89 as a preparation for this PR (thus avoiding a big PR with refactor + new feature). However, the changes in skops-dev#89 were not sufficient (I guess that's the danger of splitting the work). The problem was that the formatting of adding the plots in the save method was very specific to plots, as a markdown link would be added. This, of course, does not work for tables.

My solution is that _extra_sections now takes ExtraSection objects as values instead of strings. These objects only need to implement the format method (so technically, str would still be okay). So for a plot, the PlotSection class has a format method that returns a markdown link. And for tables, the TableSection class has a format method that uses tabulate to return a table as string. This approach should hopefully be flexible enough for us to later add new section types if needed.

At first glance, the implementation might look overly complex. However, I think it's better than having a bunch of if...else inside of save that formats each possible type differently. But please LMK if you have a better idea, I'm open to suggestions.

Examples

Here is the hyperparameter table generated by plot_model_card.py: https://huggingface.co/skops/hf_hub_example-b959cadc-ffb3-4f07-856b-2d05d2d8f8e0#hyperparameter-search-results
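
The ExtraSection design described in the implementation notes could look roughly like the sketch below. Only the class names ExtraSection, PlotSection and TableSection and the format method come from the description; the fields, the render function and the image-link syntax are assumptions for illustration. To keep the sketch dependency-free, TableSection builds a simple pipe table by hand as a stand-in for the tabulate call mentioned in the PR.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class ExtraSection(Protocol):
    """Anything with a format() method that returns the section body."""

    def format(self) -> str: ...


@dataclass
class PlotSection:
    alt_text: str  # assumed field
    path: str  # assumed field

    def format(self) -> str:
        # Markdown image link pointing at the saved plot.
        return f"![{self.alt_text}]({self.path})"


@dataclass
class TableSection:
    table: dict[str, list[Any]]

    def format(self) -> str:
        # Hand-rolled pipe table; the PR uses tabulate for this step.
        headers = list(self.table)
        lines = [
            "| " + " | ".join(headers) + " |",
            "|" + "|".join("---" for _ in headers) + "|",
        ]
        for row in zip(*self.table.values()):
            lines.append("| " + " | ".join(str(v) for v in row) + " |")
        return "\n".join(lines)


def render(extra_sections: dict[str, ExtraSection]) -> str:
    """Hypothetical stand-in for the part of save that writes extra sections."""
    # No if/else on the section type: each section formats itself.
    return "\n\n".join(
        f"## {title}\n\n{section.format()}"
        for title, section in extra_sections.items()
    )


sections: dict[str, ExtraSection] = {
    "Confusion matrix": PlotSection("Confusion matrix", "confusion_matrix.png"),
    "Hyperparameter search results": TableSection(
        {"param_C": [0.1, 1.0], "mean_test_score": [0.81, 0.9]}
    ),
}
output = render(sections)
```

Adding a new section type then only requires a new class with a format method; render stays untouched.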

merveenoyan self-assigned this Aug 10, 2022

BenjaminBossan mentioned this issue Aug 10, 2022

ENH Add table to model cards #90

Merged

BenjaminBossan closed this as completed Aug 12, 2022
