Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC california housing example #308

Merged

Conversation

BenjaminBossan
Copy link
Collaborator

To quote:

The goal of this exercise is to go through a semi-realistic data science
and machine learning task and develop a practical solution for it. We
will learn about the following topics:

  • Perform exploratory data analysis
  • Do some non-trivial feature engineering
  • Explain how the feature engineering informs the choice of machine
    learning model
    and vice versa
  • Show how to make use of a couple of advanced scikit-learn features
    and explain why we use them - Create a model card that provides
    useful information about the model
  • Share the model by uploading it to the Hugging Face Hub

To quote:

The goal of this exercise is to go through a semi-realistic data science
and machine learning task and develop a practical solution for it. We
will learn about the following topics:

- Perform *exploratory data analysis*
- Do some non-trivial *feature engineering*
- Explain how the feature engineering informs the *choice of machine
  learning model* and vice versa
- Show how to make use of a couple of *advanced scikit-learn* features
  and explain why we use them - Create a *model card* that provides
  useful information about the model
- Share the model by uploading it to the *Hugging Face Hub*
@BenjaminBossan
Copy link
Collaborator Author

@skops-dev/maintainers ready for review

It was unfortunately quite some work to convert this from ipynb to this percent-py format. Hopefully, I didn't miss anything.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to ship this, it's pretty nice!

@@ -158,7 +158,7 @@
model_card.add_permutation_importances(
importances,
X_test.columns,
plot_file=Path(local_repo) / "importance.png",
plot_file=str(Path(local_repo) / "importance.png"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_permutation_importances expects plot_file to be str.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we be fixing that instead?

examples/plot_california_housing.py Show resolved Hide resolved
examples/plot_california_housing.py Show resolved Hide resolved
examples/plot_california_housing.py Outdated Show resolved Hide resolved
@adrinjalali adrinjalali added this to the v0.6 milestone Mar 21, 2023
Copy link
Collaborator Author

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied to your comments, please review again.

@@ -158,7 +158,7 @@
model_card.add_permutation_importances(
importances,
X_test.columns,
plot_file=Path(local_repo) / "importance.png",
plot_file=str(Path(local_repo) / "importance.png"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_permutation_importances expects plot_file to be str.

examples/plot_california_housing.py Show resolved Hide resolved
examples/plot_california_housing.py Show resolved Hide resolved
examples/plot_california_housing.py Outdated Show resolved Hide resolved
Previously, it was annotated as only taking str.
@adrinjalali
Copy link
Member

RTD failing?

@BenjaminBossan
Copy link
Collaborator Author

For posterity, a quick summary of what caused the bug:

When sphinx-gallery runs the examples, it seems to use a single process for all. Because of that, sklearnex's patch_sklearn() was active when running the new example. This resulted in the new example using the patched KNeighborRegressor, even though it was not supposed to.

KNeighborRegressor from sklearnex has a bug, namely that it has a classes_ attribute, even though it is a regressor. Consequently, DecisionBoundaryDisplay fails because it makes a check hasattr(estimator, "classes_") to determine if the estimator is a classifier. This will result in the observed failure down the line.

When running the new example in isolation, the bug does not occur because patching is not happening.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the debugging @BenjaminBossan , this was nasty to deal with.

cc @ahuber21, @napetrov, your example is causing very hard to debug issues. We're removing relevant pieces from the example.

Comment on lines 247 to 248
# in faster inference times, since loading a persisted model will always load
# the objects exactly as they were saved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment section should also be removed, I think we should remove all mentions of patch_sklearn() since it should never been used.

@ahuber21
Copy link
Contributor

As an intermediate check, you could also call unpatch_sklearn() to restore the original state in the sphinx-gallery process. The part of the example using patch_sklearn() was just left there to clarify the behavior of loading unpatched models, since it caused some discussion before. Sorry for the trouble!

@adrinjalali adrinjalali changed the title Doc california housing example DOC california housing example Mar 23, 2023
@adrinjalali adrinjalali merged commit 3490842 into skops-dev:main Mar 23, 2023
@BenjaminBossan BenjaminBossan deleted the DOC-california-housing-example branch March 23, 2023 14:07
@napetrov
Copy link

@BenjaminBossan - thanks for reporting - fixing this here - intel/scikit-learn-intelex#1224

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants