plot.model_performance_explainer outliers' labels depend on the order of model input #49

Open
12tafran opened this issue Oct 7, 2018 · 2 comments · May be fixed by #567

@12tafran
Contributor

12tafran commented Oct 7, 2018

Hi,

Following the example at https://pbiecek.github.io/DALEX/reference/plot.model_performance_explainer.html , if you change the order of the arguments from plot(mp_rf, mp_glm, mp_lm, geom = "boxplot", show_outliers = 1) to plot(mp_glm, mp_lm, mp_rf, geom = "boxplot", show_outliers = 1), you get a plot in which the outlier labels don't match their models.

It seems the models have to be passed in order from best to worst, in terms of the root mean square of residuals, for the outlier labels to match the models.
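
To check the ordering hypothesis above, it may help to rank the models by the root mean square of their residuals before choosing the argument order. The following is a minimal sketch in base R on made-up residual vectors; the helper `rmse` and the names `model_a` and `model_b` are hypothetical and not part of DALEX.

```r
# Hypothetical helper (not part of DALEX): root mean square of a residual vector.
rmse <- function(residuals) sqrt(mean(residuals^2))

# Toy residuals standing in for two models; the values are made up.
toy_residuals <- list(
  model_a = c(-1, 2, -3, 4),
  model_b = c(-10, 20, -30, 40)
)

# RMSE per model, then model names sorted from lowest to highest RMSE.
rmse_by_model <- sapply(toy_residuals, rmse)
best_to_worst <- names(sort(rmse_by_model))

print(rmse_by_model)
print(best_to_worst)
```

With real `model_performance()` results, the residual vectors would presumably come from something like `mp$residuals$diff`; that column name is an assumption here and should be checked against the installed DALEX version.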

@pbiecek pbiecek closed this as completed Feb 18, 2020
@pbiecek
Member

pbiecek commented Feb 18, 2020

Closing due to lack of human resources.

@AngelFelizR

We still have the same problem in R

library("DALEX")
#> Welcome to DALEX (version: 2.4.3).
#> Find examples and detailed introduction at: http://ema.drwhy.ai/
library("randomForest")
#> randomForest 4.7-1.1
#> Type rfNews() to see new features/changes/bug fixes.

model_apart_lm <- archivist::aread("pbiecek/models/55f19")
explain_apart_lm <- DALEX::explain(model = model_apart_lm, 
                                   data    = apartments_test[,-1], 
                                   y       = apartments_test$m2.price, 
                                   label   = "Linear Regression")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Linear Regression 
#>   -> data              :  9000  rows  5  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.lm  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package stats , ver. 4.2.3 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1792.597 , mean =  3506.836 , max =  6241.447  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -257.2555 , mean =  4.687686 , max =  472.356  
#>   A new explainer has been created!

model_apart_rf <- archivist::aread("pbiecek/models/fe7a5")
explain_apart_rf <- DALEX::explain(model = model_apart_rf, 
                                   data    = apartments_test[,-1], 
                                   y       = apartments_test$m2.price, 
                                   label   = "Random Forest")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Random Forest 
#>   -> data              :  9000  rows  5  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package randomForest , ver. 4.7.1.1 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1985.837 , mean =  3506.107 , max =  5788.052  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -762.3422 , mean =  5.416971 , max =  1318.093  
#>   A new explainer has been created!

mr_lm <- DALEX::model_performance(explain_apart_lm)
mr_rf <- DALEX::model_performance(explain_apart_rf)

# Works as expected
plot(mr_rf, mr_lm, 
     geom = "boxplot",
     show_outliers = 1)

# Doesn't assign the outliers correctly
plot(mr_lm, mr_rf, 
     geom = "boxplot",
     show_outliers = 1)

Created on 2024-06-08 with reprex v2.0.2
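
To tell which of the two plots above carries the right labels, one option is to work out independently which observation has the largest absolute residual for each model and compare it with the plotted label. The sketch below does this in base R on made-up data. The helper `top_residual_per_model` and the `obs` column are hypothetical, and it assumes that `show_outliers = 1` labels the observation with the largest absolute residual in each model.

```r
# Hypothetical helper (not part of DALEX): for each model label, return the
# row with the largest absolute residual.
top_residual_per_model <- function(res) {
  # res: data frame with columns `label` (model name), `obs` (row id), `diff` (residual)
  pieces <- split(res, res$label)
  top <- lapply(pieces, function(d) d[which.max(abs(d$diff)), , drop = FALSE])
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

# Toy residuals for two models; the values are made up.
toy <- data.frame(
  label = rep(c("Linear Regression", "Random Forest"), each = 3),
  obs   = rep(1:3, times = 2),
  diff  = c(5, -40, 12, -300, 25, 90),
  stringsAsFactors = FALSE
)

result <- top_residual_per_model(toy)
print(result)
```

For the reprex objects, the input could presumably be built from `mr_lm$residuals` and `mr_rf$residuals` after adding a row identifier, assuming those data frames expose `diff` and `label` columns.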

@maksymiuks maksymiuks reopened this Jun 9, 2024
@maksymiuks maksymiuks self-assigned this Jun 9, 2024
@maksymiuks maksymiuks linked a pull request Jun 9, 2024 that will close this issue