Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT: refactor model card to render sections lazily #310

Conversation

BenjaminBossan
Copy link
Collaborator

Currently, the way model cards work is that if a user adds new content,
a Section object is created and added as a (sub)section. That object
contains the formatted content of that section as str, which is then
used when rendering.

So if the user adds, e.g., a table, the table is eagerly formatted to a
str and during rendering, the str is simply returned.

With this refactor, when a user adds new content, a specific type of
section is added instead, which does not eagerly format the string.

For simple text content, this is just a PlainSection which holds the content
as a str, so no big change. However, for more complex content like
tables or figures, a special section type is added that does not contain
the final content as rendered.

When the model card is rendered, the section.format() method is called
to render the actual content, i.e. the formatting is now called lazily.

The main benefit of this refactor is that the sections in a model card
are now "aware" of what type of section they are. This allows us to more
easily modify them later. E.g. let's say we want to modify the path to a
figure or a row in a table: Right now, this would require some quite
complex regex to modify the already formatted str. With this refactor,
we can just access the attributes of the section, e.g. section.path or
section.table, and modify them.

A minor benefit could also be performance, because we don't render until
it's strictly needed, but that's probably very minor.

From the user perspective, this refactor shouldn't change anything as
long as they were sticking with the official API. When calling .render()
or .save(), the output is the same as before. Only if the users were
directly interacting with Sections might there be a breakage.

A small feature added in this PR is that .add_plot and .add_table now take
an additional argument, description. This way, consistent with other .add_*
methods, users can now add a description to their plot or table. If a user adds
multiple plots/tables with a single call, they will all get the same
description. So if that's not desired, users need to call the methods multiple
times.

On top of that, for plots, users can now also pass the alt_text argument,
which allows them to set a different alt texts. Same caveats as for
description apply when adding multiple figures.

Another minor visible change this PR brings as well is that the repr of
the card changes slightly. Now it indicates what type of section we're
dealing with, e.g. TableSection(...), except if it's a PlainSection,
which, same as previously, just returns the repr of its content.

Comment:

The classes TableSection and PlotSection were moved further down in the
file. Therefore, the diff might look bigger than it actually is. Still, there
are some changes to the classes as well, so please review them too.

Currently, the way model cards work is that if a user adds new content,
a Section object is created and added as a (sub)section. That object
contains the *formatted* content of that section as str, which is then
used when rendering.

So if the user adds, e.g., a table, the table is eagerly formatted to a
str and during rendering, the str is simply returned.

With this refactor, when a user adds new content, a specific type of
section is added instead, which does not eagerly format the string.

For simple text content, this is just a PlainSection which holds the content
as a str, so no big change. However, for more complex content like
tables or figures, a special section type is added that does not contain
the final content as rendered.

When the model card is rendered, the section.format() method is called
to render the actual content, i.e. the formatting is now called lazily.

The main benefit of this refactor is that the sections in a model card
are now "aware" of what type of section they are. This allows us to more
easily modify them later. E.g. let's say we want to modify the path to a
figure or a row in a table: Right now, this would require some quite
complex regex to modify the already formatted str. With this refactor,
we can just access the attributes of the section, e.g. section.path or
section.table, and modify them.

A minor benefit could also be performance, because we don't render until
it's strictly needed, but that's probably very minor.

From the user perspective, this refactor shouldn't change anything as
long as they were sticking with the official API. When calling render()
or save(), the output is the same as before. Only if the users were
directly interacting with sections might there be a breakage.

One minor visible change this PR brings as well is that the repr of the
card changes slightly. Now it indicates what type of section we're
dealing with, e.g. TableSection(...), except if it's a PlainSection,
which, same as previously, just returns the repr of its content.

This commit is WIP because I also added new "description" parameters to
add_* methods that were missing it so far. Now, if a user adds, say, a
plot, they can optionally add a description for the plot too. This
functionality lacks unit tests as of now.
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers Ready for review. After 5 restarts or so, codecov also passed (the "uncovered line" is a NotImplementedError in a base class...). Adrin, this refactor is what we shortly discussed earlier today.

(btw., this refactor was greatly facilitated by mypy)

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Are you also adding description to all add_ methods?

Comment on lines 269 to 270
title, description = "", ""
res = TableSection(title, description, table=table).format()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably have done

Suggested change
title, description = "", ""
res = TableSection(title, description, table=table).format()
res = TableSection(title="", description="", table=table).format()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (though the argument isn't always description, even if it might serve the purpose, which is why I had it that way).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there missing commits you haven't pushed? here you say done, but I don't see a change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just missed this one. Should be done now.

skops/card/_model_card.py Outdated Show resolved Hide resolved
skops/card/_model_card.py Outdated Show resolved Hide resolved
skops/card/_model_card.py Outdated Show resolved Hide resolved
The add_permutation_importances method now also has a description
argument.

I also reworked the tests for permutation importances a bit:

- added a new test for description
- put all related tests inside a test class
- moved the permutation importance computation into a fixture
- moved to the new testing method of selecting section and checking its
  content exactly instead of checking a substring in model_card.render()
Now, Section is a concrete class (taking the role of PlainSection).
Copy link
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed your comments.

Are you also adding description to all add_ methods?

I only see add_permutation_importance, right? For add itself, it wouldn't make sense.

While making the change for permutation importance, I also reworked the tests a bit:

  • put all related tests inside a test class
  • moved the permutation importance computation into a fixture
  • moved to the new testing method of selecting section and checking its content exactly instead of checking a substring in model_card.render() (more precise testing, consistent with other tests)

But the content itself of the tests remains the same as previously.

Comment on lines 269 to 270
title, description = "", ""
res = TableSection(title, description, table=table).format()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (though the argument isn't always description, even if it might serve the purpose, which is why I had it that way).

skops/card/_model_card.py Outdated Show resolved Hide resolved
skops/card/_model_card.py Outdated Show resolved Hide resolved
skops/card/_model_card.py Outdated Show resolved Hide resolved
Missing from last commit.
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this.

@adrinjalali adrinjalali merged commit 9f73d47 into skops-dev:main Mar 7, 2023
BenjaminBossan added a commit to BenjaminBossan/skops that referenced this pull request Mar 7, 2023
We merged skops-dev#298 and skops-dev#310 shortly after each other but they contained an
incompatibility that broke the fairlearn tests (the code itself was
fine). This PR fixes this incompatibility.

On top, I added the description argument to add_fairlearn_metric_frame,
to be consistent with all the other methods, and also as a test for it.

Finally, 2 small fixes:

- Added type annotation to transpose argument
- Changed order of arguments in docstring to match order in signature
adrinjalali pushed a commit that referenced this pull request Mar 7, 2023
We merged #298 and #310 shortly after each other but they contained an incompatibility that broke the fairlearn tests (the code itself was fine). This PR fixes this incompatibility.

To be clear, the only change needed to fix the tests is the following:

```python
- actual_table = card.select("Metric Frame Table").content.format()
+ actual_table = card.select("Metric Frame Table").format()
```

On top, I added the `description` argument to `add_fairlearn_metric_frame`, to be consistent with all the other methods (also changed in #310), and I also added as a test for it. Since we now have 2 tests, I moved the `metric_frame` variable to a fixture.

Finally, 2 small fixes:

- Added type annotation to transpose argument
- Changed order of arguments in docstring to match order in signature
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants