ModelVisualizer + VisualPipeline #1200

Open
falcaopetri opened this issue Oct 17, 2021 · 1 comment
Labels
type: technical debt (work to optimize or generalize code)

Comments

@falcaopetri
Contributor

Is your feature request related to a problem? Please describe.
I was trying to combine the KElbowVisualizer with VisualPipeline, but got the unexpected error below:

AttributeError: 'KMeans' object has no attribute 'axes'

Full pipeline snippet
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_blobs

from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.pipeline import VisualPipeline

X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)
pipe = VisualPipeline([
    ('scale', StandardScaler()),
    ('elbow', KElbowVisualizer(KMeans()))
])
pipe.fit_transform_show(X)

The important part is that I was trying to run pipe.fit_transform_show(X). Of course, pipe.fit(X); pipe.show() worked fine, and was similar to the usage in KElbowVisualizer's example.

The issue

The issue is that VisualPipeline will try to call KElbowVisualizer.fit_transform, which does not exist. Due to Wrapper, KMeans.fit_transform will be executed instead, which means that KElbowVisualizer.fit is never called.
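The delegation described above can be reproduced without Yellowbrick or scikit-learn. The sketch below is only an illustration under assumptions: `FakeKMeans`, `Wrapper`, and `FakeElbowVisualizer` are hypothetical stand-ins, not the library's actual classes, and the only behavior assumed is that the wrapper forwards unknown attributes to the wrapped estimator.

```python
class FakeKMeans:
    """Hypothetical stand-in for an estimator that is also a transformer."""

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def transform(self, X):
        # Pretend to return one distance per sample.
        return [[0.0] for _ in X]

    def fit_transform(self, X, y=None):
        # Calls the estimator's own fit, not the visualizer's.
        return self.fit(X, y).transform(X)


class Wrapper:
    """Hypothetical wrapper: forwards attributes it lacks to the wrapped object."""

    def __init__(self, obj):
        self._wrapped = obj

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup on the wrapper fails.
        return getattr(self._wrapped, attr)


class FakeElbowVisualizer(Wrapper):
    """Hypothetical visualizer that defines fit but not fit_transform."""

    def __init__(self, estimator):
        super().__init__(estimator)
        self.fit_called = False

    def fit(self, X, y=None):
        self.fit_called = True  # where the visualizer would collect its scores
        self._wrapped.fit(X, y)
        return self


viz = FakeElbowVisualizer(FakeKMeans())
out = viz.fit_transform([[1.0], [2.0]])  # resolved on FakeKMeans via __getattr__

print(viz.fit_called)  # False: the visualizer's own fit was bypassed
```

Because `fit_transform` is found only on the wrapped object, the bound method runs entirely on the estimator, so nothing the visualizer sets up in its own `fit` ever exists when `show` is called later.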

Feature request

The main problem was my rush to get it all in one line with fit_transform_show(). I still think it's an honest usage that other people might try, though. Some ideas came to mind to improve it:

  1. Better documentation and more examples about VisualPipeline. It's a nice feature, but I was only able to find brief references to it, such as in Classification Visualizers.
  2. Have VisualPipeline implement a fit_show(X) method. This would probably cause more confusion, but it has the nice property of not returning the undesired transformed output (compared to fit_transform_show).
  3. FeatureVisualizer currently implements sklearn's TransformerMixin, but ModelVisualizer does not. In contrast, KMeans, although an estimator, also implements the TransformerMixin. Having ModelVisualizer (or simply Visualizer) implement the TransformerMixin would force a call to KElbowVisualizer().fit().transform().
    3.1. A complementary approach would be for ModelVisualizer to implement an empty transform method. This would allow us to call KElbowVisualizer.fit_transform_show() without getting back an array of distances (from KMeans.transform).
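Ideas 3 and 3.1 could look roughly like the sketch below. It is not Yellowbrick code: `FitTransformMixin` is a hypothetical stand-in that mirrors the default `fit_transform` behavior of sklearn's TransformerMixin (fit, then transform), and `FakeKMeans`, `Wrapper`, and `FakeModelVisualizer` are made-up names used only for illustration. Returning `X` unchanged is one reading of an "empty" transform; that choice is an assumption.

```python
class FitTransformMixin:
    """Hypothetical mixin mirroring TransformerMixin's default: fit, then transform."""

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class FakeKMeans:
    """Hypothetical estimator that also exposes transform."""

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def transform(self, X):
        return [[0.0] for _ in X]


class Wrapper:
    """Hypothetical wrapper: forwards attributes it lacks to the wrapped object."""

    def __init__(self, obj):
        self._wrapped = obj

    def __getattr__(self, attr):
        return getattr(self._wrapped, attr)


class FakeModelVisualizer(FitTransformMixin, Wrapper):
    """Visualizer with its own fit_transform (idea 3) and a pass-through transform (idea 3.1)."""

    def __init__(self, estimator):
        super().__init__(estimator)
        self.fit_called = False

    def fit(self, X, y=None):
        self.fit_called = True  # visualizer state would be computed here
        self._wrapped.fit(X, y)
        return self

    def transform(self, X):
        # Pass-through: hand the data on instead of the estimator's distances.
        return X


X = [[1.0], [2.0]]
viz = FakeModelVisualizer(FakeKMeans())
out = viz.fit_transform(X)

print(viz.fit_called)  # True: fit_transform now goes through the visualizer's fit
print(out == X)        # True: no array of distances is returned
```

Since `fit_transform` is now found on the visualizer itself, attribute forwarding never kicks in for it, which is the effect the third suggestion is after. Whether a pass-through `transform` is appropriate for every ModelVisualizer is an open question this sketch does not answer.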

I don't know the impact on other ModelVisualizers or on other VisualPipeline usages, but I'd be glad to help implement these or other ideas to improve VisualPipeline.

@bbengfort
Member

@falcaopetri thank you for contributing to Yellowbrick and for reporting the issue that you're having with the VisualPipeline. I know that the VisualPipeline needs a lot of love, it's more or less a prototype and has not really risen to the level of core functionality in Yellowbrick quite yet. All of your suggestions are excellent though! If you're interested in a second PR after we get through #1202 - we'd be happy to review your work!

@bbengfort bbengfort added the type: technical debt work to optimize or generalize code label Nov 10, 2021