-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Refactor Model Card #37
Conversation
@adrinjalali @BenjaminBossan I cleaned up a little as we talked and updated example so you can see the usage. Note that this is a draft PR and I haven't implemented |
@adrinjalali this might be redundant/trivial but I wanted to confirm we previously talked about how we should enable people to create their own Displays both in #29 and the model card examination PR. I looked at source of Display objects and they're essentially matplotlib plots. I say |
First of all, thanks for working on this and taking into account the feedback we gave. From my POV, this is going in the right direction. I don't have a full blown review, but some more high level comments: In the example, it is still required for the user to create a Probably, there is an argument to be made that this card data constitutes meta data and should thus be treated differently. In that case, we could have a special method The
Regarding the code in |
@BenjaminBossan I know about
Then the rest can be passed to rendering directly. For this comment |
I wouldn't like to limit the user from adding as many plots as they wish, does something like below sound ok to you??
which simply adds |
Okay, let's see if there will be other places we need it. If it's just this single one, I can see your argument.
In general, I would try to avoid (non-obvious) side effects if possible, since they make it harder to reason about the code. E.g. it can easily happen that calling the same method twice leads to different outcomes.
Yes, we shouldn't limit the user. As for the specific implementation, I can't really judge, I would need to better understand the involved code first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this API looks a lot better. We can move to documentation and tests now IMO.
I left a few comments, but nothing major.
As for tabulate
, I'd agree with @BenjaminBossan that it'd be useful, but I don't mind not adding it in this PR. I think we'd need more tabulation when we move to adding confusion matrix as a matrix as opposed to a plot and other tabular data we'd like to add in the future. We can add the dependency and refactor when those things happen.
Model card created with this implementation is here @adrinjalali |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good, but my previous review still stands, please have a look at them as well :)
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Model card with the latest version: https://huggingface.co/merve/sklearn-9b93776d-fb95-4b81-b267-fa378476bf15 I'll add tests and documentation now @adrinjalali |
@adrinjalali updated the examples, updated tests and added more tests. The hub_utils tests aren't passing though. |
I get above failing while I try to get a local build of docs.
|
Every time user adds something to template the default template file (if it's used) itself changes, maybe it's better if I add a temporary template when user saves the template with added goods. |
You can add your token to the local run by something like:
Yeah the default template shouldn't change. |
Is that an actual token? 👀 I would invalidate it from the Hub if it is |
That token is already hard coded in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some final nits, then I'm happy to merge.
I see many open discussions. Would be great if you could marked as resolved the ones that you verified are fixed or that don't require any additional action in this PR please 🙏 Great work! 🔥 |
@adrinjalali codecov on windows fails for some reason. it wasn't failing before. |
@merveenoyan seems like an issue on codecov side, re-running or waiting a few hours seems to be the suggested action lol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Since this is quite significant, I rather not merge with one approval. Let's wait for another one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're almost good to go.
The only real issue I had was still with the save
method implementation. Please take a look at my suggestions.
skops/card/tests/test_card.py
Outdated
# read original template | ||
with open(Path(root[0]) / "card" / "default_template.md") as f: | ||
default_template = f.read() | ||
f.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the seek operation? It doesn't seem to do anything. I deleted it and the test still passed.
self._figure_paths[plot_name] = plot_path | ||
return self | ||
|
||
def save(self, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few issues with this function, please let me know if you agree:
- It mutates
self.template_sections
As a user, I would be surprised to find that the save
method actually mutates the object itself. At the moment, that might not cause a bug, but this type of hidden mutation has the tendency to come back to bite you. Therefore, I would avoid it except if there is no good reason.
To achieve this, simply declare a copy at the beginning of the method, such as template_sections = self.template_sections.copy()
. After that, each time you use self.template_sections
in this method, use template_sections
instead.
It would be even safer to use copy.deepcopy
but here we're creating a whole duplicate, I'm not sure if that can be too expensive in some circumstances (doesn't seem so but I'm not sure).
If you agree with this change, here is a unit test that would fail without the change but passes with the change:
import copy
...
def test_template_sections_not_mutated_by_save(destination_path, model_card):
# TODO
template_sections_before = copy.deepcopy(model_card.template_sections)
model_card.save(Path(destination_path) / "README.md")
template_sections_after = copy.deepcopy(model_card.template_sections)
assert template_sections_before == template_sections_after
- Remove
if self._figure_paths:
branch
I saw that there is some code duplication that we can avoid. It is not strictly necessary to check if self._figure_paths
and then do an else
. The reason it can be avoided is because the for plot in self._figure_paths
already is a no-op if self._figure_paths
is empty, so the outcome is the same. Thus this whole code:
if self._figure_paths:
with tempfile.TemporaryDirectory() as tmpdirname:
shutil.copyfile(
self.template_sections["template_path"],
f"{tmpdirname}/temporary_template.md",
)
# create a temporary template with the additional plots
self.template_sections[
"template_path"
] = f"{tmpdirname}/temporary_template.md"
# add plots at the end of the template
with open(self.template_sections["template_path"], "a") as template:
for plot in self._figure_paths:
template.write(
f"\n\n{plot}\n"
+ f"![{plot}]({self._figure_paths[plot]})\n\n"
)
card = ModelCard.from_template(
card_data=card_data,
hyperparameter_table=self.hyperparameter_table,
model_plot=self.model_plot,
**self.template_sections,
)
else:
card = ModelCard.from_template(
card_data=card_data,
hyperparameter_table=self.hyperparameter_table,
model_plot=self.model_plot,
**self.template_sections,
)
can be simplified to:
with tempfile.TemporaryDirectory() as tmpdirname:
shutil.copyfile(
template_sections["template_path"],
f"{tmpdirname}/temporary_template.md",
)
# create a temporary template with the additional plots
template_sections["template_path"] = f"{tmpdirname}/temporary_template.md"
# add plots at the end of the template
with open(template_sections["template_path"], "a") as template:
for plot in self._figure_paths:
template.write(
f"\n\n{plot}\n"
+ f"![{plot}]({self._figure_paths[plot]})\n\n"
)
card = ModelCard.from_template(
card_data=card_data,
hyperparameter_table=self.hyperparameter_table,
model_plot=self.model_plot,
**template_sections,
)
There is, however, a small disadvantage to the proposed change: It will result in the temporary copy of the template file to be made even if there is no plot. IMHO this is a small price to pay for the code simplification but feel free to disagree.
- Breaking down the method
This should probably be done in another PR but I wanted to write it down or else I'll forget:
The save
method does too many different things at the same time and should be broken down. E.g. there could be one sub-method to get the card_data
and another one to get the card
.
@BenjaminBossan addressed your comments, used deepcopy. |
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a big change, happy to see it ready.
Partially solves #29