Dimension issue in dtreeviz_sklearn_pipeline_visualisations.ipynb #231

mepland · 2023-01-01T21:17:06Z

@parrt @tlapusan Looks like there is a bug with extract_params_from_pipeline(). Try running dtreeviz_sklearn_pipeline_visualisations.ipynb in the dev branch.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 tree_classifier, X_train, features_model = extract_params_from_pipeline(
      2     pipeline=model,
      3     X_train=dataset[features],
      4     feature_names=features)

File ~/dtreeviz/dtreeviz/utils.py:192, in extract_params_from_pipeline(pipeline, X_train, feature_names)
    186 tree_model = pipeline.steps[-1][1]
    188 feature_names = _extract_final_feature_names(
    189     pipeline=pipeline,
    190     features=feature_names
    191 )
--> 192 X_train = pd.DataFrame(
    193     data=pipeline[:-1].transform(X_train),
    194     columns=feature_names
    195 )
    196 return tree_model, X_train, feature_names

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/frame.py:721, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    711         mgr = dict_to_mgr(
    712             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    713             # attribute "name"
   (...)
    718             typ=manager,
    719         )
    720     else:
--> 721         mgr = ndarray_to_mgr(
    722             data,
    723             index,
    724             columns,
    725             dtype=dtype,
    726             copy=copy,
    727             typ=manager,
    728         )
    730 # For data is list-like, or Iterable (will consume into list)
    731 elif is_list_like(data):

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/internals/construction.py:349, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    344 # _prep_ndarraylike ensures that values.ndim == 2 at this point
    345 index, columns = _get_axes(
    346     values.shape[0], values.shape[1], index=index, columns=columns
    347 )
--> 349 _check_values_indices_shape_match(values, index, columns)
    351 if typ == "array":
    353     if issubclass(values.dtype.type, str):

File ~/.venvs/dtreeviz/lib64/python3.11/site-packages/pandas/core/internals/construction.py:420, in _check_values_indices_shape_match(values, index, columns)
    418 passed = values.shape
    419 implied = (len(index), len(columns))
--> 420 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (891, 16), indices imply (891, 5)


tlapusan · 2023-01-02T09:14:15Z

@mepland I ran the notebook on both master and dev and it worked, but I was running sklearn version 1.1.3, and I assume you have the latest version, 1.2.0.

Version 1.1.3 still had a deprecated method that is no longer supported in 1.2.0, and this causes the problem.

@windisch can you please take a look as well?

From my first round of debugging, I replaced

    hasattr(component, 'get_feature_names'):

with

    hasattr(component, 'get_feature_names_out'):

but we still get an error. This is because now (version 1.2.0) component[0] also has the 'get_feature_names_out' attribute. I fixed it by adding an 'elif':

    for component in pipeline[:-1]:
        if hasattr(component, 'get_support'):
            features = [f for f, s in zip(features, component.get_support()) if s]
        elif hasattr(component, 'get_feature_names_out'):
            features = component.get_feature_names_out(features)

It works, but @windisch, you know pipelines in more depth and can judge better whether this is a good fix and whether it will apply to other pipelines. Thanks.
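
For anyone following along, here is a minimal, self-contained sketch of the branching described in this comment. It is not the dtreeviz code: `MaskSelector`, `PairExpander` and `resolve_feature_names` are hypothetical stand-ins invented for illustration, written so the snippet runs without sklearn. The selector deliberately exposes both `get_support` and `get_feature_names_out`, mirroring the 1.2.0 behaviour mentioned above.

```python
class MaskSelector:
    """Hypothetical stand-in for a selector step (not an sklearn class)."""

    def __init__(self, mask):
        self._mask = list(mask)

    def get_support(self):
        # Boolean mask: True means the feature is kept.
        return self._mask

    def get_feature_names_out(self, input_features=None):
        # Present on purpose, to mimic a step that has both attributes.
        return [f for f, keep in zip(input_features, self._mask) if keep]


class PairExpander:
    """Hypothetical stand-in for a step that generates extra columns."""

    def get_feature_names_out(self, input_features=None):
        names = list(input_features)
        pairs = [f"{a} {b}" for i, a in enumerate(names) for b in names[i + 1:]]
        return names + pairs


def resolve_feature_names(steps, features):
    """Walk the preprocessing steps and track the feature names they emit."""
    features = list(features)
    for component in steps:
        if hasattr(component, "get_support"):
            # Selector-like step: filter the names with its mask.
            features = [f for f, s in zip(features, component.get_support()) if s]
        elif hasattr(component, "get_feature_names_out"):
            # Any other step that can report its own output names.
            features = list(component.get_feature_names_out(features))
    return features


steps = [MaskSelector([True, False, True]), PairExpander()]
names = resolve_feature_names(steps, ["Pclass", "Age", "Fare"])
print(names)
```

Because of the `elif`, the selector is handled only by the `get_support` branch and is not processed a second time through `get_feature_names_out`, even though it has that attribute too.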

mepland · 2023-01-02T17:58:28Z

Yes, I was using sklearn 1.2.0.

parrt · 2023-01-05T01:44:02Z

Fixed by #233

windisch mentioned this issue Jan 3, 2023

Adapt utility function for sklearn>=1.2.0 #233

Merged

parrt added the compatibility label Jan 5, 2023

parrt added this to the 2.1 milestone Jan 5, 2023

parrt closed this as completed Jan 5, 2023
