Dimension issue in dtreeviz_sklearn_pipeline_visualisations.ipynb #231

Closed
mepland opened this issue Jan 1, 2023 · 3 comments

@mepland
Collaborator

mepland commented Jan 1, 2023

@parrt @tlapusan Looks like there is a bug with extract_params_from_pipeline(). Try running dtreeviz_sklearn_pipeline_visualisations.ipynb in the dev branch.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 tree_classifier, X_train, features_model = extract_params_from_pipeline(
      2     pipeline=model,
      3     X_train=dataset[features],
      4     feature_names=features)

File ~/dtreeviz/dtreeviz/utils.py:192, in extract_params_from_pipeline(pipeline, X_train, feature_names)
    186 tree_model = pipeline.steps[-1][1]
    188 feature_names = _extract_final_feature_names(
    189     pipeline=pipeline,
    190     features=feature_names
    191 )
--> 192 X_train = pd.DataFrame(
    193     data=pipeline[:-1].transform(X_train),
    194     columns=feature_names
    195 )
    196 return tree_model, X_train, feature_names

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/frame.py:721, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    711         mgr = dict_to_mgr(
    712             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    713             # attribute "name"
   (...)
    718             typ=manager,
    719         )
    720     else:
--> 721         mgr = ndarray_to_mgr(
    722             data,
    723             index,
    724             columns,
    725             dtype=dtype,
    726             copy=copy,
    727             typ=manager,
    728         )
    730 # For data is list-like, or Iterable (will consume into list)
    731 elif is_list_like(data):

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/internals/construction.py:349, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    344 # _prep_ndarraylike ensures that values.ndim == 2 at this point
    345 index, columns = _get_axes(
    346     values.shape[0], values.shape[1], index=index, columns=columns
    347 )
--> 349 _check_values_indices_shape_match(values, index, columns)
    351 if typ == "array":
    353     if issubclass(values.dtype.type, str):

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/internals/construction.py:420, in _check_values_indices_shape_match(values, index, columns)
    418 passed = values.shape
    419 implied = (len(index), len(columns))
--> 420 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (891, 16), indices imply (891, 5)

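For readers without the notebook at hand, the final frame of the traceback can be reproduced in isolation. This is a minimal sketch, not the notebook's code: the zero-filled array and the `feature_{i}` names are hypothetical stand-ins that only match the shapes reported above (16 transformed columns, 5 feature names).

```python
import numpy as np
import pandas as pd

# Stand-in for pipeline[:-1].transform(X_train): 891 rows, 16 columns.
values = np.zeros((891, 16))

# Stand-in for the stale feature names: only 5 entries (hypothetical names).
feature_names = [f"feature_{i}" for i in range(5)]

try:
    pd.DataFrame(data=values, columns=feature_names)
    error_message = None
except ValueError as exc:
    # pandas rejects a column list whose length differs from the data width.
    error_message = str(exc)

print(error_message)
```

If this behaves as expected, it suggests the problem is upstream of pandas: the list of feature names was not updated to match the columns produced by the transformation steps.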
@tlapusan
Collaborator

tlapusan commented Jan 2, 2023

@mepland I ran the notebook on both master and dev and it worked, but I was running sklearn version 1.1.3, and I assume you have the latest version, 1.2.0.

In version 1.1.3 there was a deprecated method that is no longer supported in 1.2.0, and this causes the problem.

@windisch can you please take a look as well?

From my first round of debugging, I replaced
hasattr(component, 'get_feature_names'):
with
hasattr(component, 'get_feature_names_out'):

but we still get an error. This is because now (in version 1.2.0) component[0] also has the 'get_feature_names_out' attribute. I fixed it by adding an 'elif':
for component in pipeline[:-1]:
    if hasattr(component, 'get_support'):
        features = [f for f, s in zip(features, component.get_support()) if s]
    elif hasattr(component, 'get_feature_names_out'):
        features = component.get_feature_names_out(features)
It works, but @windisch, you know pipelines in more depth and can judge better whether this is a good fix and whether it will apply to other pipelines. Thanks.
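The dispatch order described above can be tried without sklearn installed. The following is a rough sketch under stated assumptions: `extract_final_feature_names`, `FakeSelector`, and `FakeExpander` are hypothetical names invented for illustration, and the fake classes only mimic the two methods the loop checks for. It is not the code that was merged in #233.

```python
def extract_final_feature_names(steps, features):
    """Propagate feature names through the non-final pipeline steps."""
    for component in steps:
        if hasattr(component, "get_support"):
            # Selector-style step: keep only the features flagged True.
            features = [f for f, s in zip(features, component.get_support()) if s]
        elif hasattr(component, "get_feature_names_out"):
            # Transformer-style step: ask it for its output names.
            features = list(component.get_feature_names_out(features))
    return features


class FakeSelector:
    """Hypothetical stand-in for a selector exposing both methods."""

    def __init__(self, mask):
        self.mask = mask

    def get_support(self):
        return self.mask

    def get_feature_names_out(self, input_features=None):
        return [f for f, s in zip(input_features, self.mask) if s]


class FakeExpander:
    """Hypothetical stand-in for a transformer that adds columns."""

    def get_feature_names_out(self, input_features=None):
        return [f"{name}_{suffix}" for name in input_features for suffix in ("a", "b")]


steps = [FakeSelector([True, False, True]), FakeExpander()]
final_names = extract_final_feature_names(steps, ["age", "fare", "sex"])
print(final_names)
```

Checking `get_support` first means a step that exposes both methods is handled by the selector branch only, which is the point of the `elif`. Whether this ordering suits every kind of pipeline step is the open question raised above.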

@mepland
Collaborator Author

mepland commented Jan 2, 2023

Yes, I was using sklearn 1.2.0.

@parrt
Owner

parrt commented Jan 5, 2023

Fixed by #233

@parrt closed this as completed Jan 5, 2023