12 Jan 12:41

mayer79

8e4c726

CRAN release 0.9.3

`sv_dependence()`: Control over automatic color feature selection

How is the color feature selected, anyway?

If no SHAP interaction values are available, the color feature v' is selected by default via the heuristic potential_interactions(), which works as follows:

If the feature v (the one on the x-axis) is numeric, it is binned into nbins bins.
Per bin, the SHAP values of v are regressed onto v' and the R-squared is calculated. Rows with missing v' are discarded.
The R-squared values are averaged over bins, weighted by the number of non-missing v' values.

This measures how much variability in the SHAP values of v is explained by v', after accounting for v.

We have introduced four parameters to control the heuristic. Their defaults are in line with the old behaviour.

nbins = NULL: Into how many quantile bins should a numeric v be binned? The default NULL equals the smaller of $n/20$ and $\sqrt n$ (rounded up), where $n$ is the sample size.
color_num = TRUE: Should color features be converted to numeric, even if they are factors/characters? The default is TRUE.
scale = FALSE: Should R-squared be multiplied by the sample variance of within-bin SHAP values? If TRUE, bins with stronger vertical scatter get higher weight. The default is FALSE.
adjusted = FALSE: Should adjusted R-squared be calculated?

If SHAP interaction values are available, these parameters have no effect. In sv_dependence(), they are called ih_nbins etc.
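The steps above can be sketched in base R as follows. This is only an illustration, not the code used inside {shapviz}: the function name interaction_strength(), its arguments s, v, and vp, and the treatment of tiny or constant bins are assumptions made for this example.

```r
# Illustrative sketch only, not the package's implementation.
# s:  SHAP values of feature v (numeric vector)
# v:  values of the feature shown on the x-axis
# vp: values of a candidate color feature v'
interaction_strength <- function(s, v, vp, nbins = NULL, color_num = TRUE,
                                 scale = FALSE, adjusted = FALSE) {
  n <- length(s)
  if (is.null(nbins)) {
    nbins <- ceiling(min(n / 20, sqrt(n)))  # default as stated in the notes
  }
  # Assumption: numeric v with many distinct values gets quantile bins,
  # any other v gets one bin per distinct value.
  if (is.numeric(v) && length(unique(v)) > nbins) {
    probs <- seq(0, 1, length.out = nbins + 1)
    breaks <- unique(quantile(v, probs = probs, na.rm = TRUE))
    bin <- cut(v, breaks = breaks, include.lowest = TRUE)
  } else {
    bin <- factor(v)
  }
  if (color_num && !is.numeric(vp)) {
    vp <- as.numeric(as.factor(vp))  # naive integer encoding
  }
  per_bin <- vapply(split(seq_len(n), bin), function(idx) {
    ok <- idx[!is.na(vp[idx])]  # discard rows with missing v'
    if (length(ok) < 3 || length(unique(vp[ok])) < 2 || var(s[ok]) == 0) {
      return(c(r2 = 0, w = length(ok)))  # assumption: such bins contribute 0
    }
    fit <- summary(lm(y ~ z, data = data.frame(y = s[ok], z = vp[ok])))
    r2 <- if (adjusted) fit$adj.r.squared else fit$r.squared
    if (!is.finite(r2)) r2 <- 0
    if (scale) r2 <- r2 * var(s[ok])
    c(r2 = r2, w = length(ok))
  }, FUN.VALUE = c(r2 = 0, w = 0))
  # Average over bins, weighted by the number of non-missing v' values
  weighted.mean(per_bin["r2", ], per_bin["w", ])
}

# Simulated data: the SHAP values of v depend on z, but not on u
set.seed(1)
n <- 500
v <- runif(n, 1, 2)
z <- rnorm(n)
u <- rnorm(n)
s <- v * z + rnorm(n, sd = 0.1)
res <- c(z = interaction_strength(s, v, z), u = interaction_strength(s, v, u))
res
```

In this simulation, z should receive a much higher value than u, so z would be the natural choice for the color scale.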

This partly implements the ideas of Roel Verbelen in #119. Thanks a lot for your patient explanations!

Further plans?

We will continue to experiment with the defaults, which might change in the future. A good alternative to the current (naive) defaults could be:

nbins = 7: Smaller than now to not overfit too strongly with factor/character color features.
color_num = FALSE: To not naively integer encode factors/characters.
scale = TRUE: To account for non-equal spread in bins.
adjusted = TRUE: To not put too much weight on factors with many categories.
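Until the defaults change, these alternative settings can be passed explicitly. The following sketch assumes that {shapviz} 0.9.3 or later is installed. The SHAP matrix S is made up for illustration and does not come from a fitted model.

```r
library(shapviz)

set.seed(1)
n <- 400
X <- data.frame(
  x1 = runif(n, 1, 2),
  x2 = rnorm(n),
  x3 = factor(sample(letters[1:4], n, replace = TRUE))
)
# Made-up "SHAP values": the effect of x1 depends on x2
S <- cbind(x1 = X$x1 * X$x2, x2 = X$x2, x3 = rnorm(n, sd = 0.1))
x <- shapviz(S, X = X, baseline = 0)

# Heuristic interaction strengths for v = "x1" under the alternative settings
strength <- potential_interactions(
  x, v = "x1", nbins = 7, color_num = FALSE, scale = TRUE, adjusted = TRUE
)
strength

# In sv_dependence(), the same settings carry the prefix "ih_"
p <- sv_dependence(
  x, v = "x1",
  ih_nbins = 7, ih_color_num = FALSE, ih_scale = TRUE, ih_adjusted = TRUE
)
p
```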

Other user-visible changes

sv_dependence(): If color_var = "auto" (default) and no color feature seems to be relevant (SHAP interaction is NULL, or the heuristic returns no positive value), there won't be any color scale. Furthermore, in some edge cases, a different color feature might be selected.
mshapviz() objects can now be row-bound via rbind() or +. Implemented by @jmaspons in #110.
mshapviz() is more strict when combining multiple "shapviz" objects. These now need to have identical column names, see #114.

Small changes

The README is shorter and easier.
Updated vignettes.
print.shapviz() now shows top two rows of SHAP matrix.
Re-activate all unit tests.
Setting nthread = 1 in all calls to xgb.DMatrix() as suggested by @jmaspons in #109.
Added "How to contribute" to README.
permshap() connector is now part of {kernelshap} #122.

Bug fixes

sv_dependence2D(): In case add_vars are passed, x and/or y are removed from it in order to not use any variable twice. #116.
split.shapviz() now drops empty levels. These used to raise an error because empty "shapviz" objects are currently not supported. #117, #118

14 Oct 17:29

mayer79

0.9.2

8980218

CRAN release 0.9.2

User-visible changes

sv_importance() of a "mshapviz" object now returns a dodged barplot instead of separate barplots via {patchwork}. Use the new argument bar_type to switch to a stacked barplot (bar_type = "stack"), to "facets" (via {ggplot2}), or "separate" for the old behaviour.
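A minimal sketch of the bar_type options, assuming {shapviz} 0.9.2 or later. The two SHAP matrices are made up and merely stand in for two models.

```r
library(shapviz)

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Made-up SHAP matrices standing in for two models
s1 <- shapviz(cbind(x1 = X$x1, x2 = 0.5 * X$x2), X = X)
s2 <- shapviz(cbind(x1 = 0.2 * X$x1, x2 = X$x2), X = X)
mx <- c(Mod_1 = s1, Mod_2 = s2)  # "mshapviz" object

sv_importance(mx)                         # dodged bars (new default)
sv_importance(mx, bar_type = "stack")     # stacked bars
sv_importance(mx, bar_type = "facets")    # facets via {ggplot2}
sv_importance(mx, bar_type = "separate")  # old behaviour via {patchwork}
```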

New features

Added connector to permshap, a package calculating permutation SHAP values for regression and (probabilistic) classification.

Other changes

Revised vignette on "mshapviz".
Commented out most unit tests, as they would not pass the timings measured on Debian.

18 Jul 19:25

mayer79

0.9.1

fd1a01f

CRAN release 0.9.1

New features

dimnames.shapviz() has received a replacement method. You can thus change the column names of SHAP matrix and feature data (as well as SHAP interactions) by colnames(x) <- ..., see #98
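For example, renaming could look like this. It is a sketch with made-up SHAP values, and the new names "income" and "age" are hypothetical.

```r
library(shapviz)

set.seed(1)
X <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
S <- cbind(x1 = X$x1, x2 = -X$x2)  # made-up SHAP values
x <- shapviz(S, X = X)

colnames(x)                        # current feature names
colnames(x) <- c("income", "age")  # hypothetical new names

# Both the SHAP matrix and the feature data carry the new names
colnames(get_shap_values(x))
colnames(get_feature_values(x))
```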

Maintenance

Fix for #100 (package_version() applied to numeric value will be deprecated in the future)

09 Jun 15:16

mayer79

0.9.0

d88af20

CRAN release 0.9.0

New features

New plot function sv_dependence2D(): x and y coordinates are two features, while their summed SHAP values are shown on the color scale. If interactions = TRUE, SHAP interaction values are shown on the color scale instead. The function is vectorized in x and/or y. This visualization is especially useful for models with geographic components.
split(x, f) splits a "shapviz" object x into a "mshapviz" object.
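Both features can be tried on made-up data as sketched below, assuming {shapviz} 0.9.0 or later. The feature names lon and lat, the grouping vector region, and the SHAP values are invented for illustration.

```r
library(shapviz)

set.seed(1)
n <- 300
X <- data.frame(lon = runif(n), lat = runif(n))
S <- cbind(lon = X$lon - 0.5, lat = 2 * (X$lat - 0.5))  # made-up SHAP values
sv <- shapviz(S, X = X)

# Two features on the axes, their summed SHAP values on the color scale
sv_dependence2D(sv, x = "lon", y = "lat")

# Split by a grouping vector into a "mshapviz" object
region <- factor(ifelse(X$lat > 0.5, "north", "south"))
mx <- split(sv, f = region)
names(mx)
```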

Documentation

Slight improvements to the help files and documentation.
New vignette on models with geographic components.
Added a fantastic house price dataset with about 14,000 houses sold in Miami-Dade County. Thanks to Steven C. Bourassa.

API improvements

"mshapviz" object created from multioutput "kernelshap" object retains names.

10 May 05:03

mayer79

0.8.0

3dc5f29

CRAN release 0.8.0

API improvement

For (upcoming) {fastshap} version >0.0.7, fastshap::explain() offers the option shap_only. To conveniently construct the "shapviz" object, use shapviz(fastshap::explain(..., shap_only = FALSE)). This not only passes the SHAP matrix but also the feature data and the baseline. Thanks, Brandon @bgreenwell !

Documentation

Better help files
Switched from "import ggplot2" to "ggplot2::function" code style
Vignette "Multiple 'shapviz' objects": Fixed mistake in Random Forest + Kernel SHAP example

Contributors

bgreenwell

11 Apr 05:08

mayer79

0.7.0

e561dd1

CRAN release 0.7.0

Milestone: Working with multiple 'shapviz' objects

Sometimes, you will find it necessary to work with several "shapviz" objects at the same time:

To visualize SHAP values of a multiclass or multi-output model.
To compare SHAP plots of different models.
To compare SHAP plots between subgroups.

To simplify the workflow, {shapviz} introduces the "mshapviz" object ("m" like "multi"). You can create it in different ways:

Use shapviz() on multiclass XGBoost or LightGBM models.
Use shapviz() on "kernelshap" objects created from multiclass/multioutput models.
Use c(Mod_1 = s1, Mod_2 = s2, ...) on "shapviz" objects s1, s2, ...
Or mshapviz(list(Mod_1 = s1, Mod_2 = s2, ...))

The sv_*() functions use the {patchwork} package to glue the individual plots together.

See the new vignette for more info and specific examples.

Other new features

sv_dependence() now allows multiple v and/or color_var to be plotted (glued via {patchwork}).
{DALEX}: Support for "predict_parts" objects from {DALEX}, thanks to Adrian Stando.
Aggregated SHAP values: The argument row_id of sv_waterfall() and sv_force() now also allows a vector of integers or a logical vector. If more than one row is selected, SHAP values and predictions are averaged before plotting (aggregated SHAP values in {DALEX}).
Row bind: "shapviz" objects x1, x2 can now be concatenated in rowwise manner using x1 + x2 or rbind(x1, x2), again thanks to Adrian.
colnames(): "shapviz" objects x have received a dimnames() function, so you can now, e.g., use colnames(x) to see the feature names.
Subsetting: "shapviz" x can now be subsetted using x[cond, features].
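The object-handling features listed above can be combined as in the following sketch. It again uses made-up SHAP values rather than output of a real model.

```r
library(shapviz)

set.seed(1)
n <- 100
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
S <- cbind(x1 = X$x1, x2 = 0.5 * X$x2, x3 = 0.1 * X$x3)  # made-up SHAP values
x <- shapviz(S, X = X, baseline = 1)

colnames(x)                            # feature names
pos <- x[X$x1 > 0, c("x1", "x2")]      # subset rows and features
first <- x[1:10, ]
second <- x[11:20, ]
both <- rbind(first, second)           # same as first + second
nrow(get_shap_values(both))

# Aggregated SHAP values: rows 1 to 10 are averaged before plotting
sv_waterfall(x, row_id = 1:10)
```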

Maintenance

We have a new contributor: Adrian Stando - welcome on the SHAP board.
To be close to my sister package {kernelshap}, I have moved to https://github.com/ModelOriented/shapviz
Webpage created with "pkgdown"
New dependency: {patchwork}

Other changes

Color guides are closer to the plot area. This affects sv_dependence(), sv_importance(kind="bee"), and sv_interaction().
The lengthy y axis title "SHAP interaction value" in sv_dependence() has been shortened to "SHAP interaction".
As announced, the argument show_other of sv_importance() has been removed.
Slightly less picky checks on S_inter.
print.shapviz() is much more compact, use summary.shapviz() for more info.

Bug fixes

sv_waterfall(): Using order_fun() would not work as expected with max_display. This has been fixed.
sv_dependence(): Passing viridis_args = NULL would hide the color guide title. This has been fixed. But please pass viridis_args = list() instead.

05 Mar 17:04

mayer79

0.6.0

c86edc9

CRAN release 0.6.0

Change in defaults

sv_dependence() now uses color_var = "auto" instead of color_var = NULL.
sv_dependence() now uses "SHAP value" as y label (instead of the more verbose "SHAP value of [feature]").

03 Feb 11:30

mayer79

0.5.0

bd7b25f

CRAN release 0.5.0

shapviz 0.5.0

Major improvement: SHAP interaction values

Introduced API for SHAP interaction values S_inter (3D array):
- Matrix method: shapviz(object, ..., S_inter = NULL)
- XGBoost method: shapviz(object, ..., interactions = TRUE)
- treeshap method: shapviz(object, ...)
sv_interaction(x) shows matrix of beeswarm plots.
sv_dependence(x, v = "x1", color_var = "x2", interactions = TRUE) plots SHAP interaction values.
sv_dependence(x, v = "x1", interactions = TRUE) plots pure main effects of "x1".
If SHAP interaction values are available, sv_dependence(..., color_var = "auto") uses those to determine the most interacting color variable.
collapse_shap() also works for SHAP interaction arrays.
SHAP interaction values can be extracted by get_shap_interactions().
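As a sketch of the matrix method, the following example builds a small SHAP interaction array by hand. The numbers are made up. The SHAP matrix is derived from the array so that both are consistent, which is an assumption about what a sensible input looks like rather than a documented requirement.

```r
library(shapviz)

set.seed(1)
n <- 100
nms <- c("x1", "x2")
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Made-up interaction array (n x p x p): main effects on the diagonal,
# the interaction effect split equally between the two off-diagonal cells
S_inter <- array(0, dim = c(n, 2, 2), dimnames = list(NULL, nms, nms))
S_inter[, "x1", "x1"] <- X$x1
S_inter[, "x2", "x2"] <- 0.5 * X$x2
S_inter[, "x1", "x2"] <- S_inter[, "x2", "x1"] <- 0.25 * X$x1 * X$x2
S <- apply(S_inter, 1:2, sum)  # SHAP values as sums over the last dimension

x <- shapviz(S, X = X, baseline = 0, S_inter = S_inter)
dim(get_shap_interactions(x))

sv_interaction(x)                                                  # matrix of beeswarms
sv_dependence(x, v = "x1", color_var = "x2", interactions = TRUE)  # interaction values
sv_dependence(x, v = "x1", interactions = TRUE)                    # main effect of x1
```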

User visible changes

sv_importance(): In case of too many features, sv_importance() used to collapse the remaining features into an additional bar/beeswarm. This logic has been removed, and the show_other argument has been deprecated.
By default, sv_dependence() automatically adds horizontal jitter for discrete v. This now also works if v is numeric with at most seven unique values, not only for logicals, factors, and character v.

Compatibility with "ggplot2"

"ggplot2" 3.4 has replaced the "size" aesthetic in line-based geoms by "linewidth". This has been adapted. "shapviz" now depends on ggplot2 >= 3.4.

Technical changes

sv_importance() does not use a flipped coordinate system anymore.

11 Jan 18:18

mayer79

0.4.1

1b1cf72

CRAN release 0.4.1

New functionality

Hide "other": sv_importance() has received a new argument show_other = TRUE. Set to FALSE to hide the "other" bar/beeswarm.

09 Dec 12:07

mayer79

0.4.0

7516ac4

CRAN release 0.4.0

shapviz 0.4.0

Removed dependencies

The following dependencies have been removed:

"ggbeeswarm"
"vipor"
"beeswarm"

Changes in `sv_importance()`

New argument bee_width: Relative width of the beeswarms. The default is 0.4. It replaces the width argument passed via ....
New argument bee_adjust: Relative adjustment factor of the bandwidth used in estimating the density of the beeswarms. Default is 0.5.
In case a beeswarm is shown: the ... arguments are now passed to geom_point().

Improvement with Plotly

plotly::ggplotly() now works for most functionalities of sv_importance(), including beeswarms.

Releases: ModelOriented/shapviz