## Overview
The package contains two workhorses to calculate SHAP values for any model:

- `kernelshap()`: Kernel SHAP algorithm of [1] and [2]. By default, exact Kernel SHAP is used for up to $p=8$ features, and an almost exact hybrid algorithm otherwise.
- `permshap()`: Exact permutation SHAP (currently available for up to $p=14$ features).

### Kernel SHAP or permutation SHAP?

- Exact Kernel SHAP and exact permutation SHAP values agree for additive models, and differ for models with interactions.
- If the number of features is sufficiently small, we recommend `permshap()` over `kernelshap()`.
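This agreement can be checked directly. A minimal sketch, assuming an additive `lm()` model on the built-in `iris` data and that the fitted object's `$S` component holds the SHAP value matrix:

```r
library(kernelshap)

# Additive model: no interaction terms
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

X <- iris[1:100, c("Sepal.Width", "Petal.Length")]  # rows to explain
bg_X <- iris[, c("Sepal.Width", "Petal.Length")]    # background data

shap_k <- kernelshap(fit, X, bg_X = bg_X)
shap_p <- permshap(fit, X, bg_X = bg_X)

# For an additive model, both algorithms agree up to numerical precision
all.equal(shap_k$S, shap_p$S)
```

For a model with interactions (e.g. a tree ensemble), the two matrices would generally differ.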
### Typical workflow to explain any model
1. **Sample rows to explain:** Sample 500 to 2000 rows `X` to be explained. If the training dataset is small, simply use the full training data for this purpose. `X` should only contain feature columns.
2. **Select background data:** Both algorithms require a representative background dataset `bg_X` to calculate marginal means. For this purpose, set aside 50 to 500 rows from the training data.
   If the training data is small, use the full training data. In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value.
3. **Crunch:** Use `kernelshap(object, X, bg_X, ...)` or `permshap(object, X, bg_X, ...)` to calculate SHAP values. Runtime is proportional to `nrow(X)`, while memory consumption scales linearly in `nrow(bg_X)`.
4. **Analyze:** Use {shapviz} to visualize the results.
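The four steps above can be sketched end to end. This example uses a plain `lm()` on the built-in `iris` data as a stand-in for any model; the feature choice and sample sizes are illustrative:

```r
library(kernelshap)
library(shapviz)

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
features <- c("Sepal.Width", "Petal.Length", "Species")

# 1. Rows to explain: feature columns only
X <- iris[features]

# 2. Background data: a subsample of the training data
bg_X <- iris[sample(nrow(iris), 100), features]

# 3. Crunch
shap <- kernelshap(fit, X, bg_X = bg_X)

# 4. Analyze
sv <- shapviz(shap)
sv_importance(sv)
sv_dependence(sv, "Petal.Length")
```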
**Remarks**
- Multivariate predictions are handled at no additional computational cost.
- Factor-valued predictions are automatically turned into one-hot-encoded columns.
- Case weights are supported via the argument `bg_w`.
- By changing the defaults in `kernelshap()`, the iterative pure sampling approach in [2] can be enforced.
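The case-weight remark above can be sketched as follows; the weight vector `w` is purely illustrative (e.g. survey or frequency weights, one per background row):

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
X <- iris[1:20, c("Sepal.Width", "Petal.Length")]
bg_X <- iris[101:150, c("Sepal.Width", "Petal.Length")]

# Hypothetical case weights for the background rows
w <- rep(c(1, 2), length.out = nrow(bg_X))

shap <- kernelshap(fit, X, bg_X = bg_X, bg_w = w)
```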
## Installation
sv_lm <- shapviz(shap_lm)
sv_importance(sv_lm)
sv_dependence(sv_lm, "log_carat", color_var = NULL)
# Since the model is additive, permutation SHAP gives the same results: