How to explain a prediction for data with only a few of the training dataset's features? #731

Open
williamty opened this issue Oct 22, 2023 · 2 comments

@williamty

I have trained LightGBM models for prediction. I can explain predictions using all features by filling the missing fields of the user's input data with NAs. Is there any way to explain a prediction for the original user input, which contains only a few of the features, without filling it?
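For reference, the NA-filling workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `pad_to_full_features` and the feature names are hypothetical (not part of LightGBM or LIME), and it assumes the model was trained on a fixed, ordered list of features and tolerates NaN inputs.

```python
import numpy as np

def pad_to_full_features(user_input, feature_names):
    """Return a 1-D array ordered like feature_names, with NaN for absent features."""
    unknown = set(user_input) - set(feature_names)
    if unknown:
        raise KeyError(f"features not seen in training: {sorted(unknown)}")
    return np.array([user_input.get(name, np.nan) for name in feature_names], dtype=float)

# Hypothetical training features; the user supplies only two of them.
feature_names = ["age", "income", "tenure", "score"]
row = pad_to_full_features({"income": 52000.0, "score": 0.7}, feature_names)
```

The padded `row` has the same width as the training data, so it can be passed to the model, but the explanation will then also cover the features that were filled in.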

@apoplexi24

apoplexi24 commented Oct 25, 2023

You can also run the LIME explainer on a selected subset of columns/features.
If your model is used for regression, change the explainer's mode from "classification" to "regression".

from lime.lime_tabular import LimeTabularExplainer

# Select the features you want to run the LIME explainer on, for example:
selected_data = data[['col1', 'col2']]

# Create a LimeTabularExplainer on the selected columns only
explainer = LimeTabularExplainer(selected_data.values, feature_names=selected_data.columns.values, mode="classification")

# Select the instance to explain: the first row, as a 1-D NumPy array
data_row = selected_data.iloc[0].values
# num_features caps how many features appear in the explanation (here, all selected ones).
# In classification mode the prediction function must return class probabilities.
explanation = explainer.explain_instance(data_row, lgbm_model.predict, num_features=len(selected_data.columns))

# Display the explanation
explanation.show_in_notebook()

The method above works for explaining predictions when you only want selected features. To generate predictions, however, your test data (x_test) must have the same features as the training data (x_train) used to fit the model; otherwise the model throws an error about the features not matching.
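One way to catch that mismatch before calling the model is to compare the column names up front. The helper below is a hypothetical sketch (`check_feature_alignment` is not a LightGBM or LIME function) and assumes you still have the list of training feature names available.

```python
def check_feature_alignment(train_features, input_features):
    """Return (missing, unexpected) feature names, preserving their original order."""
    train_set, input_set = set(train_features), set(input_features)
    missing = [f for f in train_features if f not in input_set]
    unexpected = [f for f in input_features if f not in train_set]
    return missing, unexpected

# Hypothetical x_train columns versus the columns of a partial input.
train_features = ["col1", "col2", "col3"]
missing, unexpected = check_feature_alignment(train_features, ["col1", "col2"])
if missing or unexpected:
    print(f"missing: {missing}, unexpected: {unexpected}")
```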

@williamty
Author

williamty commented Oct 26, 2023

@apoplexi24 Thank you for your kind reply! It worked! By the way, I also changed the prediction function, setting the `predict_disable_shape_check` parameter to true:

```python
import numpy as np

def predict_fn(x):
    if len(np.array(x).shape) == 1:
        # Reshape a single data point to 2D
        return ldl.predict(np.array(x).reshape(1, -1), predict_disable_shape_check=True)
    else:
        # Predict for the entire dataset
        return ldl.predict(x, predict_disable_shape_check=True)

def predict_fn_binary(x):
    # Stack P(class 0) and P(class 1) into two columns
    p1 = predict_fn(x)
    return np.column_stack((1 - p1, p1))
```
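For anyone adapting this: LIME's classification mode expects the prediction function to return one probability per class for each row, which is why the 1-D output is stacked into two columns. The sketch below checks that shape with a stand-in model. `StubModel` is hypothetical and only mimics a binary model whose `predict()` returns P(class 1); it is not LightGBM, and it simply ignores the extra keyword argument.

```python
import numpy as np

class StubModel:
    """Stand-in for a binary model whose predict() returns P(class 1) as a 1-D array."""
    def predict(self, x, **kwargs):
        x = np.asarray(x, dtype=float)
        return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))

model = StubModel()

def predict_fn(x):
    # Promote a single row to shape (1, n_features); 2-D input is left unchanged
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return model.predict(x, predict_disable_shape_check=True)

def predict_fn_binary(x):
    p1 = predict_fn(x)
    return np.column_stack((1.0 - p1, p1))  # columns: P(class 0), P(class 1)

probs = predict_fn_binary([[0.0, 0.0], [1.0, 2.0]])
single = predict_fn_binary([0.0, 0.0])
```

Each row of `probs` sums to 1, and a single 1-D data point still yields a 2-D result with one row.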
