Inverse scaling/normalization to get actual unscaled values in explanation: old issue, but I found a workaround #750

Open
ShahrinNakkhatra-optimizely opened this issue Jul 7, 2024 · 0 comments

Comments

@ShahrinNakkhatra-optimizely

I'm not sure this is the right way to do it; I searched a lot and didn't find an answer. This is what I'm doing:

  • Keep a copy of your dataset as it was before scaling/normalization.
  • Write a function to use as predict_fn that does more than predict: it also runs the necessary preprocessing steps, including the scaling and everything that comes after it.
  • Pass that function as predict_fn when you call explain_instance on your LimeTabularExplainer instance.

This gives you explanations in terms of your unscaled values, because the explainer only ever sees the unscaled data and the scaling happens inside predict_fn. I have removed some lines from my code for privacy, but you get the idea.
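For anyone who wants something self-contained to try, the steps above can be sketched roughly as below. This is only an illustration and not the code from my project: the toy data, the StandardScaler, the LogisticRegression model, and the helper names make_predict_fn and explain_unscaled_row are all assumptions made up for the example. The lime import sits inside the helper so the rest of the sketch runs without it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def make_predict_fn(scaler, model):
    """Hypothetical helper: build a predict_fn that takes UNSCALED rows."""
    def predict_fn(raw_rows):
        raw_rows = np.asarray(raw_rows, dtype=float)
        if raw_rows.ndim == 1:
            # Accept a single row as well as a batch.
            raw_rows = raw_rows.reshape(1, -1)
        # Scale inside the function, so the caller never sees scaled values.
        return model.predict_proba(scaler.transform(raw_rows))
    return predict_fn


def explain_unscaled_row(X_train_raw, row, predict_fn, feature_names, class_names):
    """Hypothetical helper: explain one unscaled row with LIME."""
    from lime import lime_tabular  # imported here so the sketch runs without lime

    explainer = lime_tabular.LimeTabularExplainer(
        training_data=np.asarray(X_train_raw),  # unscaled training data
        feature_names=list(feature_names),
        class_names=list(class_names),
        mode="classification",
    )
    return explainer.explain_instance(data_row=np.asarray(row), predict_fn=predict_fn)


# Toy unscaled data: two features on very different scales (made up).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(200, 2))
y = (X_raw[:, 0] > 50.0).astype(int)

# The model is trained on scaled data, as in the usual workflow.
scaler = StandardScaler().fit(X_raw)
model = LogisticRegression().fit(scaler.transform(X_raw), y)

# The wrapper takes raw rows and returns one probability column per class.
predict_fn = make_predict_fn(scaler, model)
probs = predict_fn(X_raw[:5])
```

With lime installed, calling explain_unscaled_row(X_raw, X_raw[0], predict_fn, ["f0", "f1"], ["neg", "pos"]).as_list() should then report its feature conditions in the original units, since LIME perturbs the unscaled data and the wrapper scales every perturbed sample before the model sees it.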

Can someone from the dev team please respond to this so that I can be sure that this is the correct approach? @marcotcr

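    # Imports used by the two methods below (excerpt from a larger class).
    import logging
    import os

    import numpy as np
    import pandas as pd
    from lime import lime_tabular

    def explain_pipe(self, selected_df=None, cols=None):
        # Used as LIME's predict_fn: takes unscaled rows, repeats the
        # preprocessing, and returns the class probabilities.
        if selected_df is None:
            selected_df = self.selected_df

        if cols is None:
            cols = self.cols

        # LIME passes a plain NumPy array, so rebuild a DataFrame with column names.
        temp_df = pd.DataFrame(selected_df, columns=cols)
        selected_df_ = temp_df.copy()
        dp = DataProcessingPrediction(selected_df_, self.local_directory, self.product)

        # Scale using the saved scaler objects and scaled column names.
        selected_df_ = dp.scale_df(
            scaler_path=os.path.join(self.local_directory, "scaler_objects.pkl"),
            col_names_path=os.path.join(self.local_directory, "scaled_col_names.pkl"),
        )

        selected_df_ = dp.clean_column_names()

        # Put the columns in the order saved in column_order.pkl.
        selected_df_ = dp.load_and_reorder(os.path.join(self.local_directory, "column_order.pkl"))

        # Return every class column, not just [:, 1]: in classification mode
        # LIME expects an array of shape (n_samples, n_classes).
        output = self.model.predict_proba(selected_df_)
        return output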

    def explain_row(self, X_train, X_pred, row_number: int):
        # X_train and X_pred are the unscaled DataFrames; explain_pipe does the scaling.
        lime_explainer = lime_tabular.LimeTabularExplainer(
            training_data=np.array(X_train),
            training_labels=self.training_labels,
            feature_names=list(X_train.columns),
            class_names=[.....],
            mode="classification",
        )

        # explain_instance documents data_row as a 1-D NumPy array, so convert the Series.
        instance = X_pred.iloc[row_number].to_numpy()
        lime_exp = lime_explainer.explain_instance(data_row=instance, predict_fn=self.explain_pipe)
        logging.info(lime_exp.as_list())
        return lime_exp