ENH Refactor Model Card #37

merveenoyan · 2022-07-18T16:54:33Z

Partially solves #29

merveenoyan · 2022-07-18T17:02:50Z

@adrinjalali @BenjaminBossan I cleaned up a little as we talked and updated example so you can see the usage.
Looks like this.

Note that this is a draft PR and I haven't implemented add_inspection and add methods yet.

examples/plot_model_card.py

merveenoyan · 2022-07-19T12:17:57Z

@adrinjalali this might be redundant/trivial but I wanted to confirm we previously talked about how we should enable people to create their own Displays both in #29 and the model card examination PR. I looked at source of Display objects and they're essentially matplotlib plots. I say add_inspection method doesn't limit to Display objects (which are again matplotlib plots) and we take matplotlib plots in general (which I thought of when I was writing the docstring for it). We can put two examples inside the plot_model_card one including a matplotlib plot and other including a Display.

BenjaminBossan · 2022-07-19T13:33:58Z

First of all, thanks for working on this and taking into account the feedback we gave. From my POV, this is going in the right direction.

I don't have a full blown review, but some more high level comments:

In the example, it is still required for the user to create a CardData object from model_cards. My personal preference would be that we don't require that. Instead, could we not have special fields on the Card class like tags, which we would use to populate the CardData when the user exports their model card? That way, they don't need to bother with that.

Probably, there is an argument to be made that this card data constitutes meta data and should thus be treated differently. In that case, we could have a special method add_metadata. WDYT?

The save method may need some cleanup.

A lot of the code is concerned with determining the template path, I think that should be factored out.
The method mutates self.template_sections, we should avoid this kind of side-effect if possible.
Do we want to write it in a more generic way? E.g. have an argument like format='huggingface' and call a specific method when for each format (only HF at the start).

Regarding the code in extract_estimator_config, I would propose that we use tabulate instead of concatenating strings to create tables, which becomes unwieldy very quickly. Of course, we should avoid adding dependencies if possible but tabulate has been very dependable for me in the past and is a very thin library.

merveenoyan · 2022-07-19T13:47:11Z

@BenjaminBossan I know about tabulate but if this is the only place we tabulate things I'm not in favor of adding an additional dependency.
I don't see why template_sections being mutated is a problem. The saving is where we pass the template_sections dictionary so it should be there.
I can refactor template_path related code, sure. Again I'm not nitpicking atm really given it's not open to review yet.
For CardData part, sure we can treat it like any other section and let people set things using add. 🙂
something like below should work:

metadata_keys = ["language", "license", "tags", "datasets", "metrics", "model-index"]
card_data_keys = {}
for key in metadata_keys:
    card_data_keys[key] = self.template_sections.pop(key)
card_data = CardData(**card_data_keys)

Then the rest can be passed to rendering directly.

For this comment
Do we want to write it in a more generic way? E.g. have an argument like format='huggingface' and call a specific method when for each format (only HF at the start). I like the idea.

merveenoyan · 2022-07-19T14:26:06Z

I wouldn't like to limit the user from adding as many plots as they wish, does something like below sound ok to you??

# append plot_name at the end of the template 
template = open(self.template_sections["template_path"], 'a')
for section in self.template_sections:
    if section not in template:
        template.write("\n\n")
        template.write(f"{section}\n")
        template.write("{{ " + f"{section}"+ " }}\n")
template.close()

which simply adds plot_name and a slot for it in template.
@adrinjalali @BenjaminBossan

BenjaminBossan · 2022-07-19T15:32:38Z

I know about tabulate but if this is the only place we tabulate things I'm not in favor of adding an additional dependency.

Okay, let's see if there will be other places we need it. If it's just this single one, I can see your argument.

I don't see why template_sections being mutated is a problem. The saving is where we pass the template_sections dictionary so it should be there.

In general, I would try to avoid (non-obvious) side effects if possible, since they make it harder to reason about the code. E.g. it can easily happen that calling the same method twice leads to different outcomes.

I wouldn't like to limit the user from adding as many plots as they wish

Yes, we shouldn't limit the user. As for the specific implementation, I can't really judge, I would need to better understand the involved code first.

adrinjalali

I think this API looks a lot better. We can move to documentation and tests now IMO.

I left a few comments, but nothing major.

As for tabulate, I'd agree with @BenjaminBossan that it'd be useful, but I don't mind not adding it in this PR. I think we'd need more tabulation when we move to adding confusion matrix as a matrix as opposed to a plot and other tabular data we'd like to add in the future. We can add the dependency and refactor when those things happen.

skops/card/_model_card.py

merveenoyan · 2022-07-20T10:32:15Z

Model card created with this implementation is here @adrinjalali

adrinjalali

Looking pretty good, but my previous review still stands, please have a look at them as well :)

examples/plot_model_card.py

skops/card/_model_card.py

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

merveenoyan · 2022-07-20T11:19:09Z

Model card with the latest version: https://huggingface.co/merve/sklearn-9b93776d-fb95-4b81-b267-fa378476bf15

I'll add tests and documentation now @adrinjalali

merveenoyan · 2022-07-20T14:54:22Z

@adrinjalali updated the examples, updated tests and added more tests. The hub_utils tests aren't passing though.

merveenoyan · 2022-07-20T15:03:01Z

WARNING: /Users/mervenoyan/Desktop/skops/skops/examples/plot_hf_hub.py failed to execute correctly: Traceback (most recent call last):
  File "/Users/mervenoyan/Desktop/skops/skops/examples/plot_hf_hub.py", line 100, in <module>
    token = os.environ["HF_HUB_TOKEN"]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'HF_HUB_TOKEN'

I get above failing while I try to get a local build of docs.
On my local:

skops/card/tests/test_card.py::test_save_model_card PASSED
skops/card/tests/test_card.py::test_hyperparameter_table PASSED
skops/card/tests/test_card.py::test_plot_model PASSED
skops/card/tests/test_card.py::test_add PASSED
skops/card/tests/test_card.py::test_add_inspection PASSED
skops/hub_utils/tests/test_hf_hub.py::test_validate_folder PASSED
skops/hub_utils/tests/test_hf_hub.py::test_create_config PASSED
skops/hub_utils/tests/test_hf_hub.py::test_init PASSED
skops/hub_utils/tests/test_hf_hub.py::test_push_download[True] FAILED
skops/hub_utils/tests/test_hf_hub.py::test_push_download[False] FAILED
skops/hub_utils/tests/test_hf_hub.py::test_get_config PASSED

merveenoyan · 2022-07-20T15:05:52Z

Every time user adds something to template the default template file (if it's used) itself changes, maybe it's better if I add a temporary template when user saves the template with added goods.

adrinjalali · 2022-07-20T15:08:53Z

You can add your token to the local run by something like:

HF_HUB_TOKEN=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 make doc

Every time user adds something to template the default template file (if it's used) itself changes, maybe it's better if I add a temporary template when user saves the template with added goods.

Yeah the default template shouldn't change.

osanseviero · 2022-07-21T10:19:26Z

Is that an actual token? 👀 I would invalidate it from the Hub if it is

adrinjalali · 2022-07-21T10:24:43Z

Is that an actual token? eyes I would invalidate it from the Hub if it is

That token is already hard coded in hub_utils/tests/common.py. Related: #47

merveenoyan · 2022-07-25T17:44:43Z

examples/plot_hf_hub.py

skops/card/_model_card.py

adrinjalali

Some final nits, then I'm happy to merge.

examples/plot_model_card.py

skops/card/_model_card.py

osanseviero · 2022-07-26T14:16:59Z

I see many open discussions. Would be great if you could marked as resolved the ones that you verified are fixed or that don't require any additional action in this PR please 🙏 Great work! 🔥

merveenoyan · 2022-07-26T14:45:05Z

@adrinjalali codecov on windows fails for some reason. it wasn't failing before.

adrinjalali · 2022-07-26T15:00:02Z

@merveenoyan seems like an issue on codecov side, re-running or waiting a few hours seems to be the suggested action lol

adrinjalali

LGTM!

examples/plot_model_card.py

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

adrinjalali · 2022-07-26T15:09:19Z

Since this is quite significant, I rather not merge with one approval. Let's wait for another one.

…model_card_class

BenjaminBossan

We're almost good to go.

The only real issue I had was still with the save method implementation. Please take a look at my suggestions.

BenjaminBossan · 2022-07-27T08:07:51Z

skops/card/tests/test_card.py

+    # read original template
+    with open(Path(root[0]) / "card" / "default_template.md") as f:
+        default_template = f.read()
+        f.seek(0)


Do we need the seek operation? It doesn't seem to do anything. I deleted it and the test still passed.

BenjaminBossan · 2022-07-27T09:04:48Z

skops/card/_model_card.py

+            self._figure_paths[plot_name] = plot_path
+        return self
+
+    def save(self, path):


I have a few issues with this function, please let me know if you agree:

It mutates self.template_sections

As a user, I would be surprised to find that the save method actually mutates the object itself. At the moment, that might not cause a bug, but this type of hidden mutation has the tendency to come back to bite you. Therefore, I would avoid it except if there is no good reason.

To achieve this, simply declare a copy at the beginning of the method, such as template_sections = self.template_sections.copy(). After that, each time you use self.template_sections in this method, use template_sections instead.

It would be even safer to use copy.deepcopy but here we're creating a whole duplicate, I'm not sure if that can be too expensive in some circumstances (doesn't seem so but I'm not sure).

If you agree with this change, here is a unit test that would fail without the change but passes with the change:

import copy ... def test_template_sections_not_mutated_by_save(destination_path, model_card): # TODO template_sections_before = copy.deepcopy(model_card.template_sections) model_card.save(Path(destination_path) / "README.md") template_sections_after = copy.deepcopy(model_card.template_sections) assert template_sections_before == template_sections_after

Remove if self._figure_paths: branch

I saw that there is some code duplication that we can avoid. It is not strictly necessary to check if self._figure_paths and then do an else. The reason it can be avoided is because the for plot in self._figure_paths already is a no-op if self._figure_paths is empty, so the outcome is the same. Thus this whole code:

if self._figure_paths: with tempfile.TemporaryDirectory() as tmpdirname: shutil.copyfile( self.template_sections["template_path"], f"{tmpdirname}/temporary_template.md", ) # create a temporary template with the additional plots self.template_sections[ "template_path" ] = f"{tmpdirname}/temporary_template.md" # add plots at the end of the template with open(self.template_sections["template_path"], "a") as template: for plot in self._figure_paths: template.write( f"\n\n{plot}\n" + f"![{plot}]({self._figure_paths[plot]})\n\n" ) card = ModelCard.from_template( card_data=card_data, hyperparameter_table=self.hyperparameter_table, model_plot=self.model_plot, **self.template_sections, ) else: card = ModelCard.from_template( card_data=card_data, hyperparameter_table=self.hyperparameter_table, model_plot=self.model_plot, **self.template_sections, )

can be simplified to:

with tempfile.TemporaryDirectory() as tmpdirname: shutil.copyfile( template_sections["template_path"], f"{tmpdirname}/temporary_template.md", ) # create a temporary template with the additional plots template_sections["template_path"] = f"{tmpdirname}/temporary_template.md" # add plots at the end of the template with open(template_sections["template_path"], "a") as template: for plot in self._figure_paths: template.write( f"\n\n{plot}\n" + f"![{plot}]({self._figure_paths[plot]})\n\n" ) card = ModelCard.from_template( card_data=card_data, hyperparameter_table=self.hyperparameter_table, model_plot=self.model_plot, **template_sections, )

There is, however, a small disadvantage to the proposed change: It will result in the temporary copy of the template file to be made even if there is no plot. IMHO this is a small price to pay for the code simplification but feel free to disagree.

Breaking down the method

This should probably be done in another PR but I wanted to write it down or else I'll forget:

The save method does too many different things at the same time and should be broken down. E.g. there could be one sub-method to get the card_data and another one to get the card.

merveenoyan · 2022-07-27T11:33:53Z

@BenjaminBossan addressed your comments, used deepcopy.

skops/card/_model_card.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

BenjaminBossan

Thanks, a big change, happy to see it ready.

merveenoyan added 2 commits July 18, 2022 18:37

refactored card class

b0049bd

updated example

1d34c6f

adrinjalali reviewed Jul 19, 2022

View reviewed changes

examples/plot_model_card.py Outdated Show resolved Hide resolved

add function and removed adding sections in save

e8d27e4

adrinjalali reviewed Jul 20, 2022

View reviewed changes

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

added plots & more

c9414b4

merveenoyan requested a review from adrinjalali July 20, 2022 10:20

merveenoyan marked this pull request as ready for review July 20, 2022 10:21

adrinjalali reviewed Jul 20, 2022

View reviewed changes

examples/plot_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

adrinjalali added this to the 0.1 milestone Jul 20, 2022

merveenoyan and others added 2 commits July 20, 2022 13:15

addressed comments

6000294

Update skops/card/_model_card.py

c9331ae

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

added docs and fixed tests

4901ee0

merveenoyan requested a review from adrinjalali July 20, 2022 14:54

fix test

1ba069a

added temporary plot

7bd29b1

putting ellipsis inside object

e0691ff

swapped namedtemporaryfile with tempdir for windows tests

9e338b5

adrinjalali reviewed Jul 26, 2022

View reviewed changes

examples/plot_hf_hub.py Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Show resolved Hide resolved

skops/card/_model_card.py Outdated Show resolved Hide resolved

merveenoyan added 5 commits July 26, 2022 12:21

removed ellipsis at every line

f15b7d4

brought back ellipsis

68fd0e9

revert whitespaces for ellipsis

2f2b05f

added model card generation to fixtures and added tests to metadata

cb6379a

added model card generation to fixtures and added tests to metadata

57ab801

adrinjalali reviewed Jul 26, 2022

View reviewed changes

examples/plot_model_card.py Outdated Show resolved Hide resolved

examples/plot_model_card.py Outdated Show resolved Hide resolved

skops/card/_model_card.py Show resolved Hide resolved

skops/card/_model_card.py Show resolved Hide resolved

misc nits

bfc9f4d

BenjaminBossan mentioned this pull request Jul 26, 2022

Implement __repr__ and __str__ for Card object #59

Closed

adrinjalali approved these changes Jul 26, 2022

View reviewed changes

examples/plot_model_card.py Outdated Show resolved Hide resolved

Update examples/plot_model_card.py

adbb9e3

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

merveenoyan added 2 commits July 26, 2022 17:11

removed extra parantheses

e01143d

Merge branch 'model_card_class' of github.com:merveenoyan/skops into …

3407014

…model_card_class

adrinjalali mentioned this pull request Jul 26, 2022

ENH add examples and dtypes to CardData and config.json #45

Merged

BenjaminBossan reviewed Jul 27, 2022

View reviewed changes

addressed comments

8eaf98b

BenjaminBossan reviewed Jul 27, 2022

View reviewed changes

skops/card/_model_card.py Outdated Show resolved Hide resolved

BenjaminBossan reviewed Jul 27, 2022

View reviewed changes

skops/card/_model_card.py Outdated Show resolved Hide resolved

merveenoyan and others added 2 commits July 27, 2022 13:40

Update skops/card/_model_card.py

37858c1

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

Update skops/card/_model_card.py

ce57f6d

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

BenjaminBossan approved these changes Jul 27, 2022

View reviewed changes

BenjaminBossan merged commit d8f2afe into skops-dev:main Jul 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH Refactor Model Card #37

ENH Refactor Model Card #37

merveenoyan commented Jul 18, 2022

merveenoyan commented Jul 18, 2022 •

edited

Loading

merveenoyan commented Jul 19, 2022 •

edited

Loading

BenjaminBossan commented Jul 19, 2022

merveenoyan commented Jul 19, 2022 •

edited

Loading

merveenoyan commented Jul 19, 2022 •

edited

Loading

BenjaminBossan commented Jul 19, 2022

adrinjalali left a comment

merveenoyan commented Jul 20, 2022

adrinjalali left a comment

merveenoyan commented Jul 20, 2022

merveenoyan commented Jul 20, 2022 •

edited

Loading

merveenoyan commented Jul 20, 2022 •

edited

Loading

merveenoyan commented Jul 20, 2022 •

edited

Loading

adrinjalali commented Jul 20, 2022

osanseviero commented Jul 21, 2022

adrinjalali commented Jul 21, 2022

merveenoyan commented Jul 25, 2022

adrinjalali left a comment

osanseviero commented Jul 26, 2022 •

edited

Loading

merveenoyan commented Jul 26, 2022 •

edited

Loading

adrinjalali commented Jul 26, 2022

adrinjalali left a comment

adrinjalali commented Jul 26, 2022 •

edited

Loading

BenjaminBossan left a comment

BenjaminBossan Jul 27, 2022

BenjaminBossan Jul 27, 2022

merveenoyan commented Jul 27, 2022

BenjaminBossan left a comment

ENH Refactor Model Card #37

ENH Refactor Model Card #37

Conversation

merveenoyan commented Jul 18, 2022

merveenoyan commented Jul 18, 2022 • edited Loading

merveenoyan commented Jul 19, 2022 • edited Loading

BenjaminBossan commented Jul 19, 2022

merveenoyan commented Jul 19, 2022 • edited Loading

merveenoyan commented Jul 19, 2022 • edited Loading

BenjaminBossan commented Jul 19, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

merveenoyan commented Jul 20, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

merveenoyan commented Jul 20, 2022

merveenoyan commented Jul 20, 2022 • edited Loading

merveenoyan commented Jul 20, 2022 • edited Loading

merveenoyan commented Jul 20, 2022 • edited Loading

adrinjalali commented Jul 20, 2022

osanseviero commented Jul 21, 2022

adrinjalali commented Jul 21, 2022

merveenoyan commented Jul 25, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

osanseviero commented Jul 26, 2022 • edited Loading

merveenoyan commented Jul 26, 2022 • edited Loading

adrinjalali commented Jul 26, 2022

adrinjalali left a comment

Choose a reason for hiding this comment

adrinjalali commented Jul 26, 2022 • edited Loading

BenjaminBossan left a comment

Choose a reason for hiding this comment

BenjaminBossan Jul 27, 2022

Choose a reason for hiding this comment

BenjaminBossan Jul 27, 2022

Choose a reason for hiding this comment

merveenoyan commented Jul 27, 2022

BenjaminBossan left a comment

Choose a reason for hiding this comment

merveenoyan commented Jul 18, 2022 •

edited

Loading

merveenoyan commented Jul 19, 2022 •

edited

Loading

merveenoyan commented Jul 19, 2022 •

edited

Loading

merveenoyan commented Jul 19, 2022 •

edited

Loading

merveenoyan commented Jul 20, 2022 •

edited

Loading

merveenoyan commented Jul 20, 2022 •

edited

Loading

merveenoyan commented Jul 20, 2022 •

edited

Loading

osanseviero commented Jul 26, 2022 •

edited

Loading

merveenoyan commented Jul 26, 2022 •

edited

Loading

adrinjalali commented Jul 26, 2022 •

edited

Loading