Releases: ModelOriented/kernelshap
CRAN release 0.9.0
Bug fix
With input from Mario Wuethrich and Ian Covert and his repo, we have fixed a bug in how kernelshap() calculates Kernel weights.
- The differences caused by this are typically very small.
- Models with interactions of order up to two have been unaffected.
- Exact Kernel SHAP now provides identical results to exact permutation SHAP.
Fixed in #168, which also added unit tests against Python's "shap".
API
- The argument feature_names can now also be used with matrix input (#166).
- kernelshap() and permshap() have received a seed = NULL argument (#170).
- Parallel mode: If missing packages or globals have to be specified, this now has to be done through parallel_args = list(packages = ..., globals = ...) instead of parallel_args = list(.packages = ..., .globals = ...), see section on parallelism below. The list is passed to foreach::foreach(.options.future = ...).
Speed and memory improvements
- permshap() and kernelshap() require about 10% less memory (#166).
- permshap() and kernelshap() are faster for data.frame input, and slightly slower for matrix input (#166).
- Additionally, permshap(, exact = TRUE) is faster by pre-calculating more elements used across rows (#165).
Internal changes
- Matrices holding on-off vectors are now consistently of type logical (#167).
- kernelshap() solver: The Moore-Penrose pseudo-inverse has been replaced by two direct solves, a trick by Ian Covert, ported to R in #171.
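As an illustration of what "two direct solves" can look like, here is a minimal Python sketch of exact Kernel SHAP. It is not the package's R code. It assumes the usual constrained weighted least-squares formulation of Kernel SHAP and solves it with one linear solve against the right-hand side and one against a vector of ones. The helper names (value, exact_kernel_shap, exact_permutation_shap) and the toy model are hypothetical. The sketch also checks the statement above that exact Kernel SHAP agrees with exact permutation SHAP.

```python
import itertools
import math
import numpy as np

def value(f, x, bg, on):
    """Marginal mean: columns flagged in `on` come from x, the others from the background."""
    data = bg.copy()
    data[:, on] = x[on]
    return f(data).mean()

def exact_kernel_shap(f, x, bg):
    p = len(x)
    v0 = value(f, x, bg, np.zeros(p, dtype=bool))
    v1 = value(f, x, bg, np.ones(p, dtype=bool))
    # All on-off vectors except "all off" and "all on"
    Z = np.array([z for z in itertools.product([False, True], repeat=p)
                  if 0 < sum(z) < p])
    s = Z.sum(axis=1)
    # Shapley kernel weights (only their relative size matters)
    w = np.array([(p - 1) / (math.comb(p, int(k)) * k * (p - k)) for k in s])
    v = np.array([value(f, x, bg, z) for z in Z])
    Zf = Z.astype(float)
    A = Zf.T @ (w[:, None] * Zf)
    b = Zf.T @ (w * (v - v0))
    # Two direct solves instead of a pseudo-inverse
    A_inv_b = np.linalg.solve(A, b)
    A_inv_1 = np.linalg.solve(A, np.ones(p))
    # Enforce the constraint sum(beta) = v1 - v0
    return A_inv_b - A_inv_1 * (A_inv_b.sum() - (v1 - v0)) / A_inv_1.sum()

def exact_permutation_shap(f, x, bg):
    p = len(x)
    phi = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        on = np.zeros(p, dtype=bool)
        prev = value(f, x, bg, on)
        for j in perm:  # turn features on one by one
            on[j] = True
            cur = value(f, x, bg, on)
            phi[j] += cur - prev
            prev = cur
    return phi / len(perms)

# Toy model with a three-way interaction (hypothetical example)
def model(X):
    return X[:, 0] * X[:, 1] * X[:, 2] + np.sin(X[:, 3])

rng = np.random.default_rng(0)
bg = rng.normal(size=(20, 4))
x = rng.normal(size=4)

phi_kernel = exact_kernel_shap(model, x, bg)
phi_perm = exact_permutation_shap(model, x, bg)
print(np.allclose(phi_kernel, phi_perm))
```

The constraint term makes the SHAP values sum to the difference between the prediction for x and the average background prediction, which is why no intercept needs to be estimated.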
Changes in parallelism
We have switched from %dopar% to doFuture (#170) with the following impact:
- No need for calling registerDoFuture() anymore.
- Random seeding is properly handled, and respects seed, thanks #163 for reporting.
- If missing packages or globals have to be specified, this now has to be done through parallel_args = list(packages = ..., globals = ...) instead of parallel_args = list(.packages = ..., .globals = ...). The list is passed to foreach::foreach(.options.future = ...).
Dependencies
- {MASS}: Dropped from imports
- {doFuture}: suggests -> imports
CRAN release 0.8.0
Major improvement
permshap() has received a sampling version, which is useful if the number of features p is larger than 8.
The algorithm iterates until the resulting values are sufficiently precise.
Additionally, standard errors are provided (#152).
During each iteration, the algorithm cycles twice through a random permutation:
It starts with all feature components "turned on" (i.e., taking them
from the observation to be explained), then gradually turns off components
according to the permutation (i.e., marginalizing them over the background data).
When all components are turned off, the algorithm turns the components
back on, one by one, until all components are turned on again. This antithetic scheme allows
evaluating Shapley's formula 2p times with each permutation, using a total of
2p + 1 evaluations of marginal means.
For models with interactions up to order two, one can show that
even a single iteration provides exact SHAP values (with respect to the
given background dataset).
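The antithetic scheme can be sketched in a few lines of Python. This is an illustration only, not the package's R implementation: the helper names and the toy model are hypothetical, and the sketch assumes that components are turned back on in the same order in which they were turned off. It counts the evaluations of marginal means and compares a single iteration with exact SHAP values for a model whose interactions are of order two.

```python
import itertools
import numpy as np

def value(f, x, bg, on):
    """Marginal mean: columns flagged in `on` come from x, the others from the background."""
    data = bg.copy()
    data[:, on] = x[on]
    return f(data).mean()

def antithetic_iteration(f, x, bg, perm):
    """One iteration: turn all components off along `perm`, then on again."""
    p = len(x)
    phi = np.zeros(p)
    on = np.ones(p, dtype=bool)
    prev = value(f, x, bg, on)  # everything turned on
    n_eval = 1
    for j in perm:  # first pass: turn components off
        on[j] = False
        cur = value(f, x, bg, on)
        n_eval += 1
        phi[j] += prev - cur
        prev = cur
    for j in perm:  # second pass: turn components back on
        on[j] = True
        cur = value(f, x, bg, on)
        n_eval += 1
        phi[j] += cur - prev
        prev = cur
    return phi / 2, n_eval  # each feature received two contributions

def exact_permutation_shap(f, x, bg):
    p = len(x)
    phi = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        on = np.zeros(p, dtype=bool)
        prev = value(f, x, bg, on)
        for j in perm:
            on[j] = True
            cur = value(f, x, bg, on)
            phi[j] += cur - prev
            prev = cur
    return phi / len(perms)

# Toy model with interactions up to order two (hypothetical example)
def model(X):
    return X[:, 0] + 2 * X[:, 1] * X[:, 2] - X[:, 0] * X[:, 3] + X[:, 3] ** 2

rng = np.random.default_rng(0)
p = 4
bg = rng.normal(size=(30, p))
x = rng.normal(size=p)

phi_one, n_eval = antithetic_iteration(model, x, bg, rng.permutation(p))
phi_exact = exact_permutation_shap(model, x, bg)
print(n_eval == 2 * p + 1, np.allclose(phi_one, phi_exact))
```

With a model containing higher-order interactions, a single iteration would generally only approximate the exact values, which is where iterating and standard errors come in.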
The Python implementation "shap" uses a similar approach, but without
providing standard errors, and without early stopping. To mimic its behavior,
we would need to set max_iter = p in R, and max_eval = (2*p+1)*p in Python.
For faster convergence, we use balanced permutations in the sense that
p subsequent permutations each start with a different feature.
Furthermore, the 2p on-off vectors with sum <=1 or >=p-1 are evaluated only once,
similar to the degree 1 hybrid in kernelshap() (but covering less weight).
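One simple way to obtain balanced permutations in this sense is sketched below in Python. The construction is an assumption made for illustration (the release notes do not describe how the package draws them), and balanced_permutations is a hypothetical helper. The last lines reproduce the evaluation budget mentioned above: p permutations with 2p + 1 evaluations each.

```python
import numpy as np

def balanced_permutations(p, rng):
    """Return p permutations of 0..p-1, each starting with a different feature."""
    starts = rng.permutation(p)  # random order of the first elements
    perms = []
    for first in starts:
        # Shuffle the remaining features behind the fixed first element
        rest = rng.permutation(np.delete(np.arange(p), first))
        perms.append(np.concatenate(([first], rest)))
    return np.array(perms)

rng = np.random.default_rng(1)
p = 5
perms = balanced_permutations(p, rng)
print(perms)

# Budget when running p permutations with 2p + 1 evaluations each
budget = (2 * p + 1) * p
print(budget)
```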
User visible changes
- In exact mode, kernelshap() does not return the following elements anymore: m (= 0), converged (all TRUE), n_iter (all 1), and SE (all values 0) (#153).
- In sampling mode of kernelshap(), above elements have been moved to the end of the output list (#153).
- Removed unpaired sampling in kernelshap() (#154).
- The stopping criterion in sampling mode of kernelshap() used a slightly too strict convergence rule. This has been relaxed in #156.
Documentation
- New DESCRIPTION file.
- Adapted docstrings to reflect above changes (#155)
Maintenance
- Improve code coverage (#156).
Bug fixes
- kernelshap() with max_iter = 1 will now work (#160).
CRAN release 0.7.0
This release is intended to be the last before stable version 1.0.0.
Major change
Passing a background dataset bg_X is now optional.
If the explanation data X is sufficiently large (>= 50 rows), bg_X is derived as a random sample of bg_n = 200 rows from X. If X has fewer than bg_n rows, then simply bg_X = X. If X has too few rows (< 50), you will have to pass an explicit bg_X.
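The selection rule can be written down as a small Python sketch. It only mirrors the logic described above and is not the package's code: default_background and its seed argument are hypothetical names, and X is assumed to be a NumPy array.

```python
import numpy as np

def default_background(X, bg_X=None, bg_n=200, seed=None):
    """Pick the background data following the rule described in the release notes."""
    if bg_X is not None:
        return bg_X  # an explicit background dataset always wins
    n = X.shape[0]
    if n < 50:
        raise ValueError("X has fewer than 50 rows; pass an explicit bg_X.")
    if n < bg_n:
        return X  # not enough rows to subsample: use X itself
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=bg_n, replace=False)  # sample rows without replacement
    return X[idx]

X = np.random.default_rng(0).normal(size=(1000, 3))
print(default_background(X, seed=1).shape)
```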
Minor changes
- ranger() survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument survival = "chf" in kernelshap() and permshap() to distinguish cumulative hazards (default) and survival probabilities per time point.
- The resulting object of kernelshap() and permshap() now contains bg_X and bg_w used to calculate the SHAP values.
CRAN release 0.6.0
This release is intended to be the last before stable version 1.0.0.
Major changes
- Factor-valued predictions are not supported anymore.
Maintenance
- Fix CRAN note about unavailable link to gam::gam().
- Added dependency to {MASS} for calculating Moore-Penrose generalized matrix inverse.
CRAN release 0.5.0
New features
New additive explainer additive_shap() that works for models fitted via lm(), glm(), mgcv::gam(), mgcv::bam(), gam::gam(), survival::coxph(), survival::survreg().
The explainer uses predict(..., type = "terms"), a beautiful trick used in fastshap::explain.lm(). The results will be identical to those returned by kernelshap() and permshap() but exponentially faster. Thanks David Watson for the great idea discussed in #130.
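Why this works can be illustrated with a short Python sketch. It does not call R's predict(..., type = "terms"); instead it assumes a hand-written additive model whose term functions are known, so that the SHAP value of a feature is simply its centered term. All names (terms, model, additive_shap) are hypothetical. The result is compared with exact permutation SHAP, which needs p! permutations instead of one pass over the terms.

```python
import itertools
import numpy as np

# Term functions of a hypothetical additive model: f(x) = 1.5 + sum_j g_j(x_j)
terms = [lambda t: 2.0 * t, np.sin, lambda t: t ** 2]

def model(X):
    return 1.5 + sum(g(X[:, j]) for j, g in enumerate(terms))

def additive_shap(x, bg):
    """SHAP value of feature j = its term at x minus the term's background mean."""
    return np.array([g(x[j]) - g(bg[:, j]).mean() for j, g in enumerate(terms)])

def value(f, x, bg, on):
    """Marginal mean: columns flagged in `on` come from x, the others from the background."""
    data = bg.copy()
    data[:, on] = x[on]
    return f(data).mean()

def exact_permutation_shap(f, x, bg):
    p = len(x)
    phi = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        on = np.zeros(p, dtype=bool)
        prev = value(f, x, bg, on)
        for j in perm:
            on[j] = True
            cur = value(f, x, bg, on)
            phi[j] += cur - prev
            prev = cur
    return phi / len(perms)

rng = np.random.default_rng(0)
bg = rng.normal(size=(25, 3))
x = rng.normal(size=3)

phi_additive = additive_shap(x, bg)
phi_exact = exact_permutation_shap(model, x, bg)
print(np.allclose(phi_additive, phi_exact))
```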
User visible changes
- permshap() now returns an object of class "kernelshap" to reduce the number of redundant methods.
- To distinguish which algorithm has generated the "kernelshap" object, the outputs of kernelshap(), permshap() (and additive_shap()) got an element "algorithm".
- is.permshap() has been removed.
CRAN release 0.4.1
Performance improvements
- Significant speed-up for pure data.frames, i.e., no data.tables or tibbles.
- Some small performance improvements, e.g., for factor predictions and univariate predictions.
- Slight speed-up of permshap() by caching calculations for the two special permutations of all 0 and all 1. Consequently, the m_exact component in the output is reduced by 2.
Documentation
- Rewrote many examples in the README.
- Added reference to Erik Strumbelj and Igor Kononenko (2014).
CRAN release 0.4.0
Major changes
- Added permshap() to calculate exact permutation SHAP values. The function currently works for up to 14 features.
- Factor-valued predictions are now supported. Each level is represented by its dummy variable.
Other changes
- Slight speed-up.
- Integer valued case weights are now turned into doubles to avoid integer overflow.
CRAN release 0.3.8
API improvements
- Multi-output case: column names of predictions are now used as list names of the resulting S and SE lists.
Bug fixes
- {mlr3} probabilistic classification would not work out-of-the-box. This has been fixed (with corresponding example in the README) in #100.
- The progress bar was initialized at 1 instead of 0. This is fixed.
Maintenance
- Added explanation of sampling Kernel SHAP to help file.
- In internal calculations, use explicit feature_names as dimnames (#96).
CRAN release 0.3.7
Maintenance
- Fixed problem in LaTeX math for macOS.
CRAN release 0.3.6
Maintenance
- Improved help files and README