ENH Add table to model cards #90

BenjaminBossan · 2022-08-10T14:56:46Z

Solves #87

Description

This feature adds a new method to the Card class called add_table. It
adds a generic table to the end of the model card by creating a new
section (similar to plots).

Any generic table of type dict[str, list[Any]] is supported, as well as
DataFrames. However, the main use case is for adding CV results, as e.g.
created by GridSearchCV.

Implementation

I hope you can stomach the type-y implementation Adrin ;)

I wanted to make the extra sections more generic in PR #89 as a
preparation for this PR (thus avoiding a big PR with refactor + new
feature). However, the changes in #89 were not sufficient (I guess
that's the danger of splitting the work).

The problem was that the formatting of adding the plots in the save
method was very specific to plots, as a markdown link would be added.
This, of course, does not work for tables.

My solution is that _extra_sections now takes ExtraSection objects as
values instead of strings. These objects only need to implement the
format method (so technically, str would still be okay). So for a
plot, the PlotSection class has a format method that returns a markdown
link. And for tables, the TableSection class has a format method that
uses tabulate to return a table as string.

This approach should hopefully be flexible enough for us to later add
new section types if needed.

At first glance, the implementation might look overly complex. However,
I think it's better than having a bunch of if...else inside of save that
formats each possible type differently. But please LMK if you have a
better idea, I'm open to suggestions.

Tangential changes

So far, the plots were just added without an extra section to the bottom.
Now, there is an "# Additioinal Content" section and each plot and table
gets their own subsection.

Examples

Here is the hyperparameter table generated by plot_model_card.py:

Solves skops-dev#87 This feature adds a new method to the Card class called add_table. It adds a generic table to the end of the model card by creating a new section (similar to plots). Any generic table of type dict[str, list[Any]] is supported, as well as DataFrames. However, the main use case is for adding CV results, as e.g. created by GridSearchCV. Implementation I hope you can stomach the type-y implementation Adrin ;) I wanted to make the extra sections more generic in PR skops-dev#89 as a preparation for this PR (thus avoiding a big PR with refactor + new feature). However, the changes in skops-dev#89 were not sufficient (I guess that's the danger of splitting the work). The problem was that the formatting of adding the plots in the save method was very specific to plots, as a markdown link would be added. This, of course, does not work for tables. My solution is that _extra_sections now takes ExtraSection objects as values instead of strings. These objects only need to implement the format method (so technically, str would still be okay). So for a plot, the PlotSection class has a format method that returns a markdown link. And for tables, the TableSection class has a format method that uses tabulate to return a table as string. This approach should hopefully be flexible enough for us to later add new section types if needed. At first glance, the implementation might look overly complex. However, I think it's better than having a bunch of if...else inside of save that formats each possible type differently. But please LMK if you have a better idea, I'm open to suggestions. Examples Here is the hyperparameter table generated by plot_model_card.py: https://huggingface.co/skops/hf_hub_example-b959cadc-ffb3-4f07-856b-2d05d2d8f8e0#hyperparameter-search-results

BenjaminBossan · 2022-08-10T15:42:39Z

@skops-dev/maintainers ready for review.

Regarding failing codecov: One instance is spurious, the other comes back to #53, which we haven't tackled yet.

(Note:

You may wonder why this is inside of _model_card.py instead of fixes.py:

if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    from typing_extensions import Protocol

I actually tried to put it in fixes.py and it didn't work, resulting in a mypy error, so I had to put it in the module where it's used. If you have an idea what's going on, please let me know. What I tried but didn't help: remove # noqa, grouping and not grouping imports.)

adrinjalali · 2022-08-10T16:57:41Z

Instead of an extra type, I'd just store the tables, and have a little function which takes the table and return what should be rendered in the md file. We don't need a protocol for adding tables to the model card.

BenjaminBossan · 2022-08-10T17:07:22Z

Instead of an extra type, I'd just store the tables, and have a little function which takes the table and return what should be rendered in the md file. We don't need a protocol for adding tables to the model card.

I assume you mean inside the save method. How would you know at that point what function to use to render the value? Sure, we could say str -> plot and dict -> table but I don't like that too much and it would probably not work with more types being added. IMO it also doesn't add much complexity -- you can directly see how different types of section are formatted and don't need to figure out which function is being called on them.

As to the protocol, it is not strictly needed, it's just to add a bit of extra type safety.

adrinjalali · 2022-08-10T17:12:24Z

skops/card/_model_card.py

+        """
+        for key, val in kwargs.items():
+            section = TableSection(table=val)
+            self._extra_sections[key] = section


here we could add to self._extra_tables instead and then we know they're tables. Each method adds to their corresponding attribute.

I wanted to avoid having separate containers for each type of addition. One reason is that it makes the code more lengthy (we'll later have to loop over each one separately), the other is that we lose the information of which order the extra sections are stored (e.g. I add a plot, a table, then a plot, but at the end I get plot, plot, table).

Another design I wanted to avoid is formatting already inside the add_* method, to not lock us in into a specific format at that point.

okay, so if the order is important (which I agree with), we should make it a list of named tuples instead of a dict (and relying on the dict preserving insertion order during iteration).

If we remove the protocol and not typing our private attribute, then I'm fine with the PR

Happy to make those changes.

Regarding the dict to list of tuples proposal: From my understanding, as long as you only add to the dict, the order should be preserved, right? So the difference would mainly be if we want to later allow deletion, or if a user wants to have multiple sections with the same name?

that's true in modern new python versions. I think of it as an implementation detail, and rather not rely on it really.

When the dict change was made in Python 3.6, it was indeed so:

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

(link)

With Python 3.7 (which is our min version), the order is officially part of the language:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.

(link)

So this wouldn't be my main concern here. I'd rather think about these implications:

We can have duplicate section names in the extra sections

Time complexity to retrieve items is now O(N) and there is no guarantee that only a single key matches (due to 1.)

More control about order (e.g. when deleting and reinserting items)

Due to the nature of the extra sections, which can contain anything, I think the trade-off is okay, whereas template sections should remain a dict.

I know all these details about python's dict, but not many people do, and a "dictionary" isn't necessarily ordered in other languages. I tend to avoid things which are somewhat common in different languages but slightly differ in python for some specific reason.

- _extra_sections is now a list of key, val tuples - Remove the ExtraSection protocol

adrinjalali

I really like it now.

Things left:

user guide for the added functionality
changelog entry

BenjaminBossan · 2022-08-11T08:58:27Z

I'll work on that and let you know once it's ready for review.

BenjaminBossan · 2022-08-11T10:12:11Z

@adrinjalali This is now ready for review.

In the changelog, I have not yet added subsection for new features, bugfixes, etc. LMK if now is a good time.

adrinjalali

Otherwise LGTM!

docs/changes.rst

examples/plot_model_card.py

adrinjalali · 2022-08-11T11:28:47Z

skops/card/tests/test_card.py

+    def test_raise_error_if_no_cols(self):
+        msg = "Empty table added"
+        with pytest.raises(ValueError, match=msg):
+            TableSection(table={})
+
+    def test_raise_error_if_no_rows(self):
+        msg = "Empty table added"
+        with pytest.raises(ValueError, match=msg):
+            TableSection(table={"col": []})
+
+    def test_raise_error_df_empty(self):
+        pd = pytest.importorskip("pandas")
+        df = pd.DataFrame([])
+        msg = "Empty table added"
+        with pytest.raises(ValueError, match=msg):
+            TableSection(table=df)


these could be a single parameterized test.

Done. Please review again.

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

BenjaminBossan added 5 commits August 10, 2022 16:40

Fix errors in example in docstring

6724f8f

Satisfy mypy, black, flake8

419b9f1

Make Protocol import work with Py 3.7

d209fc7

Make test pass on Windoofs

ad578eb

adrinjalali reviewed Aug 10, 2022

View reviewed changes

Make extra_sections a list of tuples, untype it

ec40653

- _extra_sections is now a list of key, val tuples - Remove the ExtraSection protocol

adrinjalali reviewed Aug 11, 2022

View reviewed changes

Add unit test for missing pandas import

29e0f3f

BenjaminBossan and others added 3 commits August 11, 2022 11:53

Add entry the changes.rst

091c9f9

Update documentation about adding tables

a3c4028

Merge branch 'main' into add-table-to-model-cards

5822e1d

adrinjalali reviewed Aug 11, 2022

View reviewed changes

BenjaminBossan and others added 3 commits August 11, 2022 13:42

Update author in changes.rst

6bb4d12

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

Update section title

6a1bf73

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

Parametrize unit tests for empty tables

55e33b2

adrinjalali approved these changes Aug 11, 2022

View reviewed changes

adrinjalali changed the title ~~Add table to model cards~~ ENH Add table to model cards Aug 11, 2022

adrinjalali merged commit 5a8184e into skops-dev:main Aug 11, 2022

BenjaminBossan deleted the add-table-to-model-cards branch August 11, 2022 13:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH Add table to model cards #90

ENH Add table to model cards #90

BenjaminBossan commented Aug 10, 2022 •

edited

Loading

BenjaminBossan commented Aug 10, 2022 •

edited

Loading

adrinjalali commented Aug 10, 2022

BenjaminBossan commented Aug 10, 2022

adrinjalali Aug 10, 2022

BenjaminBossan Aug 10, 2022

adrinjalali Aug 11, 2022

BenjaminBossan Aug 11, 2022

adrinjalali Aug 11, 2022

BenjaminBossan Aug 11, 2022

adrinjalali Aug 11, 2022

adrinjalali left a comment

BenjaminBossan commented Aug 11, 2022

BenjaminBossan commented Aug 11, 2022

adrinjalali left a comment

adrinjalali Aug 11, 2022

BenjaminBossan Aug 11, 2022

ENH Add table to model cards #90

ENH Add table to model cards #90

Conversation

BenjaminBossan commented Aug 10, 2022 • edited Loading

Description

Implementation

Tangential changes

Examples

BenjaminBossan commented Aug 10, 2022 • edited Loading

adrinjalali commented Aug 10, 2022

BenjaminBossan commented Aug 10, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adrinjalali left a comment

Choose a reason for hiding this comment

BenjaminBossan commented Aug 11, 2022

BenjaminBossan commented Aug 11, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BenjaminBossan commented Aug 10, 2022 •

edited

Loading

BenjaminBossan commented Aug 10, 2022 •

edited

Loading