
model_diagnostics fails due to no rows to aggregate #109

Closed
MJimitater opened this issue Jul 21, 2022 · 10 comments
Labels
invalid ❕ This doesn't seem right

Comments

@MJimitater

MJimitater commented Jul 21, 2022

Hi, I'm trying to get modelStudio() (v3.1.0) working in RStudio. It worked once, but after some changes to the xgboost model it is hard for me to see why it no longer works.
Creating the DALEX::explain() object works fine, and explain$predict_function(trained_model, test[1, , drop = FALSE]) returns a valid score.
But running modelStudio(trained_model) now always gets stuck at the same spot:

[screenshot: modelStudio progress log ending in the "no rows to aggregate" error]

I tried to find out at what point the aggregation fails, but it's hard for me to pin down. Does anyone have an idea, or has anyone run into this issue before?
Sorry that I cannot provide a reproducible example; please let me know if further code snippets are needed. I'm happy to provide them and grateful for any help!

@hbaniecki
Member

Hi @MJimitater, without code and/or any reproducible example, e.g. on a different data/model, I might not be able to help you.

@MJimitater
Author

Hi @hbaniecki, this is the code that I used

# sparse.model.matrix() comes from the Matrix package
explain <- DALEX::explain(
  trained_model,
  data = train[1:train_size, ],
  y = as.numeric(train[1:train_size, ]$PRUEFERGEBNIS) - 1,
  type = "classification",
  label = "xgboost",
  predict_function = function(trained_model, obs) {
    # build the sparse matrix with NAs passed through
    previous_na_action <- options("na.action")
    options(na.action = "na.pass")
    sparse_matrix_test <- sparse.model.matrix(PRUEFERGEBNIS ~ ., data = obs)
    options(na.action = previous_na_action$na.action)

    results_test <- predict(trained_model, sparse_matrix_test, type = "response")
    round(results_test, 3)
  },
  predict_function_target_column = 0
)

new_observation <- test[ind_obs, , drop = FALSE]

xai_dashboard <- modelStudio(explain, new_observation = new_observation)

I hope this helps somewhat.
If not, we can work out how to provide the model and data. Thanks

@hbaniecki
Member

Hi @MJimitater, of what class is the object passed to data? Does the following example help you by any chance https://modelstudio.drwhy.ai/articles/ms-r-python-examples.html#xgboost-dashboard?

Googling the error message, I'm unsure whether this issue is related to DALEX and modelStudio at all.

I can try to debug it if you provide the data/code.

@MJimitater
Author

MJimitater commented Jul 26, 2022

Hi @hbaniecki, the object passed to data is of class "data.table" "data.frame", and train[1:train_size,] is its first train_size rows.

Yes, I'm also unsure whether this issue is related to modelStudio. Is there a way to provide the data/code confidentially?
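A hedged suggestion, since the data.table class came up above: data.table's `[` semantics differ from base data.frame subsetting, so one cheap thing to rule out is handing DALEX a plain data.frame (`train` and `train_size` are the objects from the earlier snippet):

```r
# Sketch: rule out data.table subsetting quirks by converting to a plain
# data.frame before passing it to DALEX::explain().
# `train` and `train_size` are the objects from the snippet above.
train_df <- as.data.frame(train[1:train_size, ])
class(train_df)  # should now be just "data.frame"
```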

@hbaniecki
Member

Can you serialize both in R, so I can load them into my environment, and share them with me at hbaniecki@gmail.com? The data can be small (a few rows). You can also create a simpler xgboost model where the error still occurs; perhaps a small subset of features suffices.

Anyhow, I don't have your email to reach out.
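For reference, the serialization asked for here could look roughly like this (a minimal sketch; object and file names are placeholders):

```r
# Minimal sketch: serialize the model and a small slice of the data as .rds.
# `trained_model` and `train` are placeholders for whatever is in the session.
saveRDS(trained_model, "trained_model.rds")
saveRDS(train[1:100, ], "train_sample.rds")

# Receiving side:
trained_model <- readRDS("trained_model.rds")
train_sample  <- readRDS("train_sample.rds")
```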

@MJimitater
Author

@hbaniecki Thanks so much for your help! I just sent you an email with the model and data in .rds format. I hope you can reproduce it; keep me posted ;) Cheers

hbaniecki added the invalid ❕ This doesn't seem right label on Jul 26, 2022
@MJimitater
Author

MJimitater commented Aug 12, 2022

Hi @hbaniecki, sorry for my late reply, and thanks again for your excellent help in debugging!
I have thoroughly examined the data for NA values: I removed or imputed them in the numeric features and recoded them to "(Missing)" in the factor features (via fct_explicit_na()), hoping that this solves the problem with the variable_splits. I hope this is a valid approach to generating error-free model and predict profile explanations.
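The cleanup described above could look roughly like this (a sketch; `df` stands for the training data, and fct_explicit_na() is from the forcats package):

```r
library(forcats)

# Sketch of the NA cleanup described above; `df` stands for the training data.
num_cols <- vapply(df, is.numeric, logical(1))
fac_cols <- vapply(df, is.factor, logical(1))

# Impute numeric NAs with the column median.
df[num_cols] <- lapply(df[num_cols], function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})

# Turn factor NAs into an explicit "(Missing)" level.
df[fac_cols] <- lapply(df[fac_cols], fct_explicit_na, na_level = "(Missing)")
```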
However, running modelStudio(explain, new_observation, B = 3) with these updates runs into the following error:

 Calculating ingredients::ceteris_paribus (1)
  Calculating ...
    Elapsed time: 00:10:37 ETA:10s
Warning messages:
1: In value[[3L]](cond) :
Error occurred in ingredients::describe.ceteris_paribus function: missing value where TRUE/FALSE needed
2: In value[[3L]](cond) :
Error occurred in ingredients::describe.ceteris_paribus function: missing value where TRUE/FALSE needed

Now I'm trying to find where the bug occurs, but so far everything I test works:

# doesn't work:
ms <- modelStudio(explain, new_observation, B = 3)

# works:
mp <- model_performance(explain)

# works:
mr <- model_diagnostics(explain)

# works:
fi <- model_parts(explain, B = 2)
plot(fi)

# works:
pdp <- model_profile(explain)

# works:
pd <- ingredients::partial_dependence(explain)

# works:
pp <- predict_profile(explain, new_observation)

# both work:
cp <- ingredients::ceteris_paribus(explain, new_observation)
pdp <- ingredients::aggregate_profiles(cp, type = "partial")

Now I'm wondering why cp <- ingredients::ceteris_paribus(explain, new_observation) works on its own, but modelStudio still throws this error. What else can I try? What other functions can I unit-test in order to debug this? Do you think this is still a problem with the "(Missing)" values in the categorical features? Thank you so much!

@hbaniecki
Member

Hi, I presume your modelStudio is working just fine and you are only worried about the warning?

As the message suggests, something went wrong in the ingredients::describe(<ceteris paribus object>) function call, which produces a textual description of the ceteris paribus explanation. This is an experimental feature that displays text when you hover over the D button next to a plot.
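To confirm in isolation that only this description step fails, one could call it directly on a ceteris paribus object (a sketch reusing the objects built earlier in this thread):

```r
# `explain` and `new_observation` are the objects built earlier in this thread.
cp <- ingredients::ceteris_paribus(explain, new_observation)

# modelStudio wraps this call; if the warning originates here,
# the dashboard itself is unaffected.
ingredients::describe(cp)
```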

For me to debug it further, it would be best to obtain the updated model/data.

@MJimitater
Author

Thanks for your fast reply, I'll get back to you next week! ;)

@MJimitater
Author

Actually you are right, modelStudio works just fine! The explanation of single test cases finally works without errors!

BTW, it would be nice to have a faster way of explaining and visualizing individual test cases than waiting a long time for modelStudio to finish calculating, but I guess that's the way it is ;) Great software!

Thanks again for the excellent support, closing for now
