Feature request: sklearn.pipeline.Pipeline model type #11
Comments
may need to use these
Oh, interesting, I will have to have a look at how that would work. The problem is that I need to pass the final model and the final input features to the shap explainer. What could work is to put the data through the whole pipeline except the final model and store the result as the input data, then take out the final model and store that, and then get the shap values. Could work!
So I could build something like this into the Explainer to support pipelines: take all steps except the last and use them to transform the input X, then take the final step of the pipeline and extract the model. However, this makes the strong assumption that the columns of the transformed X are the same as the input X.columns. That is not true in general (e.g. a OneHotEncoder adds columns). sklearn transformers also generally output numpy arrays instead of dataframes, which makes it a bit tricky to assign column names... Any idea on how best to handle this in order to support Pipelines?

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from explainerdashboard import ClassifierExplainer

X, y = load_iris(return_X_y=True, as_frame=True)
pipe = Pipeline(steps=[
    ('standardscale', StandardScaler()),
    ('model', RandomForestClassifier())]).fit(X, y)

def split_pipeline(pipeline, X):
    # Run X through every step except the final model.
    # Assumes the transformers keep the columns of X unchanged.
    X_transformed = pd.DataFrame(Pipeline(pipeline.steps[:-1]).transform(X), columns=X.columns)
    model = pipeline.steps[-1][1]
    return X_transformed, model

Xt, model = split_pipeline(pipe, X)
model.predict(Xt)
explainer = ClassifierExplainer(model, Xt, y)
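To make the column assumption above explicit, here is a minimal sketch of a guarded variant. The helper name `split_pipeline_checked` and the toy data are hypothetical, not part of explainerdashboard. It only compares the number of columns, so it catches added or removed columns but cannot detect a reordering.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def split_pipeline_checked(pipeline, X):
    """Hypothetical helper: split a fitted pipeline into (transformed X, model).

    Refuses to reuse X.columns when the transformers changed the column count.
    """
    Xt = pipeline[:-1].transform(X)  # every step except the final model
    if hasattr(Xt, "toarray"):       # densify sparse transformer output
        Xt = Xt.toarray()
    if Xt.shape[1] != X.shape[1]:
        raise ValueError(
            f"pipeline changed the column count from {X.shape[1]} to {Xt.shape[1]}"
        )
    return pd.DataFrame(Xt, columns=X.columns, index=X.index), pipeline[-1]


# Toy data, made up for illustration only.
X = pd.DataFrame({"age": [20.0, 35.0, 50.0, 65.0],
                  "color": ["red", "blue", "red", "green"]})
y = [0, 1, 0, 1]

# A scaler keeps the columns as they are, so the split succeeds.
scale_only = Pipeline([
    ("scale", StandardScaler()),
    ("model", DecisionTreeClassifier(random_state=0)),
]).fit(X[["age"]], y)
Xt, model = split_pipeline_checked(scale_only, X[["age"]])

# One-hot encoding adds columns, so the split is rejected.
onehot = Pipeline([
    ("prep", ColumnTransformer([("cat", OneHotEncoder(), ["color"])],
                               remainder="passthrough")),
    ("model", DecisionTreeClassifier(random_state=0)),
]).fit(X, y)
rejected = False
try:
    split_pipeline_checked(onehot, X)
except ValueError:
    rejected = True
```

Failing loudly here seems safer than silently attaching the wrong column names to the shap values.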
So getting the feature names of the transformed dataframe seems to be an as-yet unresolved issue in sklearn (although multiple SLEPs have been proposed to deal with it). For now I have added support for Pipelines as long as they do not add, remove, or reorder any columns in the input dataframe (coming in the next release). When sklearn.Pipeline will support a proper
Whenever possible I try and use sklearn's pipeline to log my transformations.
It would be great if explainerdashboard could work with these https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html