Releases: ModelOriented/kernelshap

CRAN release 0.9.0

25 Jul 13:10
d86ab1e

Bug fix

With input from Mario Wuethrich and Ian Covert (including Ian's repo),
we have fixed a bug in how kernelshap() calculates Kernel weights.

  • The differences caused by this are typically very small.
  • Models with interactions of order up to two have been unaffected.
  • Exact Kernel SHAP now provides identical results to exact permutation SHAP.

Fixed in #168, which also adds
unit tests against Python's "shap".
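
For orientation, the weights in question are the Shapley kernel weights of Kernel SHAP. The sketch below uses the textbook formula from Lundberg and Lee (2017) and is not taken from the package's source; the function names `kernel_weight` and `size_distribution` are made up for illustration.

```python
from math import comb

def kernel_weight(p, s):
    """Shapley kernel weight of one on-off vector with s of p features on."""
    if not 0 < s < p:
        raise ValueError("the weight is only finite for 0 < s < p")
    return (p - 1) / (comb(p, s) * s * (p - s))

def size_distribution(p):
    """Probability that an on-off vector drawn by kernel weight has s features on."""
    # There are comb(p, s) vectors of size s, each with the same weight.
    mass = {s: comb(p, s) * kernel_weight(p, s) for s in range(1, p)}
    total = sum(mass.values())
    return {s: m / total for s, m in mass.items()}

# Illustrative call: very small and very large subsets carry the most weight.
probs = size_distribution(6)
```

The distribution is symmetric in s and p - s, which is why paired sampling (z together with 1 - z) fits Kernel SHAP naturally.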

API

  • The argument feature_names can now also be used with matrix input (#166).
  • kernelshap() and permshap() have received a seed = NULL argument (#170).
  • Parallel mode: If missing packages or globals have to be specified, this now has to be done through parallel_args = list(packages = ..., globals = ...)
    instead of parallel_args = list(.packages = ..., .globals = ...); see the section on parallelism below.
    The list is passed to foreach::foreach(.options.future = ...).

Speed and memory improvements

  • permshap() and kernelshap() require about 10% less memory (#166).
  • permshap() and kernelshap() are faster for data.frame input,
    and slightly slower for matrix input (#166).
  • Additionally, permshap(, exact = TRUE) is faster by pre-calculating more
    elements used across rows (#165).

Internal changes

  • Matrices holding on-off vectors are now consistently of type logical (#167).
  • kernelshap() solver: The Moore-Penrose pseudo-inverse has been replaced by two direct solves,
    a trick by Ian Covert that was ported to R in #171.
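
To illustrate the idea of "two direct solves", here is a minimal pure-Python sketch of exact Kernel SHAP as a constrained weighted least-squares problem. It is an assumption-laden toy, not the package's R code: the helper names (`solve`, `exact_kernel_shap`, `exact_permutation_shap`) and the value function `toy_value` are hypothetical. The constraint sum(beta) = v(all on) - v(all off) is enforced via a Lagrange multiplier, which requires solving A x = b and A u = 1 instead of forming a pseudo-inverse.

```python
from itertools import combinations, permutations
from math import comb

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def exact_kernel_shap(v, p):
    """Weighted least squares over all on-off vectors with 0 < sum < p."""
    v0, v1 = v(frozenset()), v(frozenset(range(p)))
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for s in range(1, p):
        w = (p - 1) / (comb(p, s) * s * (p - s))  # Shapley kernel weight
        for S in combinations(range(p), s):
            y = v(frozenset(S)) - v0
            for i in S:
                b[i] += w * y
                for j in S:
                    A[i][j] += w
    x = solve(A, b)          # first solve:  A x = b
    u = solve(A, [1.0] * p)  # second solve: A u = 1
    shift = (sum(x) - (v1 - v0)) / sum(u)  # makes the result sum to v1 - v0
    return [xi - shift * ui for xi, ui in zip(x, u)]

def exact_permutation_shap(v, p):
    """Average marginal contributions over all p! permutations."""
    phi = [0.0] * p
    perms = list(permutations(range(p)))
    for perm in perms:
        on, prev = set(), v(frozenset())
        for j in perm:
            on.add(j)
            cur = v(frozenset(on))
            phi[j] += cur - prev
            prev = cur
    return [f / len(perms) for f in phi]

def toy_value(S):
    """Hypothetical value function with a three-way interaction."""
    out = 1.0 * (0 in S) + 2.0 * (1 in S) - 1.5 * (2 in S) + 0.5 * (3 in S)
    out += 3.0 * (0 in S and 1 in S)
    out += 4.0 * (0 in S and 2 in S and 3 in S)
    return out

ks = exact_kernel_shap(toy_value, 4)
ps = exact_permutation_shap(toy_value, 4)
```

In this toy setting, `ks` and `ps` agree up to floating-point error even though the value function contains an interaction of order three.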

Changes in parallelism

We have switched from %dopar% to doFuture (#170) with the following impact:

  • No need for calling registerDoFuture() anymore.
  • Random seeding is properly handled and respects seed. Thanks to #163 for reporting.
  • If missing packages or globals have to be specified, this now has to be done through parallel_args = list(packages = ..., globals = ...)
    instead of parallel_args = list(.packages = ..., .globals = ...). The list is passed to foreach::foreach(.options.future = ...).

Dependencies

  • {MASS}: Dropped from imports
  • {doFuture}: suggests -> imports

CRAN release 0.8.0

09 Jul 05:43
af75928

kernelshap 0.8.0

Major improvement

permshap() has received a sampling version, which is useful if the number of features p is larger than 8.
The algorithm iterates until the resulting values are sufficiently precise.
Additionally, standard errors are provided (#152).

During each iteration, the algorithm cycles twice through a random permutation:
It starts with all feature components "turned on" (i.e., taking them
from the observation to be explained), then gradually turns off components
according to the permutation (i.e., marginalizing them over the background data).
When all components are turned off, the algorithm turns the components back on,
one by one, until all components are turned on again. This antithetic scheme makes it
possible to evaluate Shapley's formula 2p times with each permutation, using a total of
2p + 1 evaluations of marginal means.

For models with interactions up to order two, one can show that
even a single iteration provides exact SHAP values (with respect to the
given background dataset).
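
The following sketch illustrates one such iteration on a toy value function. It is not the package's implementation: `antithetic_pass` and `toy_value` are hypothetical names, and the sketch assumes that components are turned back on in the same order in which they were turned off. Here, `v(S)` stands for the marginal mean with the features in `S` turned on.

```python
import random

def antithetic_pass(v, perm):
    """One iteration: turn features off along perm, then back on along perm."""
    on = set(perm)
    phi = {j: 0.0 for j in perm}
    prev = v(frozenset(on))      # all features on
    n_eval = 1
    for j in perm:               # first cycle: turn off one by one
        on.remove(j)
        cur = v(frozenset(on))
        n_eval += 1
        phi[j] += prev - cur     # contribution of j given the later features
        prev = cur
    for j in perm:               # second cycle: turn on one by one
        on.add(j)
        cur = v(frozenset(on))
        n_eval += 1
        phi[j] += cur - prev     # contribution of j given the earlier features
        prev = cur
    return {j: val / 2 for j, val in phi.items()}, n_eval

# Hypothetical value function with main effects and pairwise interactions only.
MAIN = {0: 1.0, 1: 2.0, 2: -1.0, 3: 0.5}
PAIR = {(0, 1): 4.0, (2, 3): -2.0, (0, 3): 1.0}

def toy_value(S):
    out = sum(MAIN[j] for j in S)
    out += sum(b for (i, j), b in PAIR.items() if i in S and j in S)
    return out

perm = random.Random(1).sample(range(4), 4)
phi, n_eval = antithetic_pass(toy_value, perm)
```

With p = 4, the pass uses 2p + 1 = 9 evaluations. Because `toy_value` has no interactions beyond order two, the single pass already returns the exact Shapley values (main effect plus half of each pairwise interaction the feature is part of), whichever permutation is drawn.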

The Python implementation "shap" uses a similar approach, but without
providing standard errors, and without early stopping. To mimic its behavior,
we would need to set max_iter = p in R, and max_evals = (2*p+1)*p in Python.

For faster convergence, we use balanced permutations in the sense that
p subsequent permutations each start with a different feature.
Furthermore, the 2p on-off vectors with sum <=1 or >=p-1 are evaluated only once,
similar to the degree 1 hybrid in kernelshap() (but covering less weight).
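
One simple way to obtain balanced permutations in this sense is sketched below. The construction (shuffle the starting features, then shuffle the remainder) is an assumption made for illustration and may differ from what the package does; `balanced_permutations` is a hypothetical name.

```python
import random

def balanced_permutations(p, rng):
    """Return p permutations of range(p), each starting with a different feature."""
    starts = list(range(p))
    rng.shuffle(starts)
    perms = []
    for first in starts:
        rest = [j for j in range(p) if j != first]
        rng.shuffle(rest)
        perms.append([first] + rest)
    return perms

perms = balanced_permutations(5, random.Random(42))
```

Each feature is guaranteed to appear in the first position exactly once per batch of p permutations, so no feature's first-position contribution is left to chance.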

User visible changes

  • In exact mode, kernelshap() does not return the following elements anymore:
    m (= 0), converged (all TRUE), n_iter (all 1), and SE (all values 0) (#153).
  • In sampling mode of kernelshap(), the above elements have been moved to the end of the output list (#153).
  • Removed unpaired sampling in kernelshap() (#154).
  • The stopping criterion in sampling mode of kernelshap() used a slightly too strict convergence rule.
    This has been relaxed in #156.

Documentation

  • New DESCRIPTION file.
  • Adapted docstrings to reflect above changes (#155).

Maintenance

  • Improve code coverage (#156).

Bug fixes

  • kernelshap() with max_iter = 1 will now work (#160).

CRAN release 0.7.0

17 Aug 16:38
29f0de1

This release is intended to be the last before stable version 1.0.0.

Major change

Passing a background dataset bg_X is now optional.

If the explanation data X is sufficiently large (>= 50 rows), bg_X is derived from X: if X has more than bg_n = 200 rows, bg_X is a random sample of bg_n rows from X; otherwise, simply bg_X = X.
If X has too few rows (< 50), you will have to pass an explicit bg_X.
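
The selection rule can be paraphrased as follows. This is only a sketch of the logic described above, not the package's code: the function name `default_background`, the `min_rows` parameter, and sampling without replacement are assumptions.

```python
import random

def default_background(X, bg_n=200, min_rows=50, seed=None):
    """Pick background rows from X when no explicit background data is passed."""
    n = len(X)
    if n < min_rows:
        raise ValueError("X is too small; pass an explicit background dataset")
    if n <= bg_n:
        return list(X)  # use all of X as background
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(n), bg_n))  # assumed: without replacement
    return [X[i] for i in keep]

# Illustrative data: rows are just integers here.
large = default_background(list(range(1000)), seed=1)
small = default_background(list(range(120)))
```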

Minor changes

  • ranger() survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument survival = "chf" in kernelshap() and permshap() to distinguish cumulative hazards (default) and survival probabilities per time point.
  • The resulting object of kernelshap() and permshap() now contains bg_X and bg_w used to calculate the SHAP values.

CRAN release 0.6.0

13 Jul 07:27
435fa43

This release is intended to be the last before stable version 1.0.0.

Major changes

  • Factor-valued predictions are not supported anymore.

Maintenance

  • Fix CRAN note about unavailable link to gam::gam().
  • Added dependency to {MASS} for calculating Moore-Penrose generalized matrix inverse.

CRAN release 0.5.0

29 May 21:19
9808196

New features

New additive explainer additive_shap() that works for models fitted via

  • lm(),
  • glm(),
  • mgcv::gam(),
  • mgcv::bam(),
  • gam::gam(),
  • survival::coxph(),
  • survival::survreg().

The explainer uses predict(..., type = "terms"), a beautiful trick
used in fastshap::explain.lm(). The results will be identical to those returned by kernelshap() and permshap(), but are obtained exponentially faster. Thanks to David Watson for the great idea discussed in #130.
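
Why this works can be checked on a toy additive model: for a model of the form f(x) = c + f_1(x_1) + ... + f_p(x_p), the SHAP value of feature j is the term f_j(x_j) minus its mean over the background data. The sketch below is illustrative only; the names `term_values` and `permutation_shap`, the model, and the data are hypothetical, and the background data is assumed to be the same data used for centering the terms.

```python
from itertools import permutations

def term_values(terms, x, background):
    """Centered term contributions: f_j(x_j) minus the background mean of f_j."""
    n = len(background)
    return [f(x[j]) - sum(f(row[j]) for row in background) / n
            for j, f in enumerate(terms)]

def permutation_shap(predict, x, background):
    """Exact permutation SHAP with marginal means over the background data."""
    p = len(x)

    def v(on):
        rows = [[x[j] if j in on else row[j] for j in range(p)] for row in background]
        return sum(predict(r) for r in rows) / len(rows)

    phi = [0.0] * p
    perms = list(permutations(range(p)))
    for perm in perms:
        on, prev = set(), v(set())
        for j in perm:
            on.add(j)
            cur = v(on)
            phi[j] += cur - prev
            prev = cur
    return [f / len(perms) for f in phi]

# Hypothetical additive model with one nonlinear term.
terms = [lambda a: 2.0 * a, lambda b: b ** 2, lambda c: -0.5 * c]
predict = lambda row: 1.0 + sum(f(val) for f, val in zip(terms, row))
background = [[0.0, 1.0, 2.0], [1.0, -1.0, 0.0], [3.0, 2.0, 4.0], [2.0, 0.0, 1.0]]
x = [1.5, 3.0, -2.0]

fast = term_values(terms, x, background)            # p term evaluations per row
slow = permutation_shap(predict, x, background)     # p! permutations
```

Both approaches give the same values here, but the first one needs no enumeration of permutations or subsets at all.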

User visible changes

  • permshap() now returns an object of class "kernelshap" to reduce the number of redundant methods.
  • To distinguish which algorithm has generated the "kernelshap" object, the outputs of kernelshap(), permshap() (and additive_shap()) got an element "algorithm".
  • is.permshap() has been removed.

CRAN release 0.4.1

03 Dec 15:42
70e0c68

Performance improvements

  • Significant speed-up for pure data.frames, i.e., no data.tables or tibbles.
  • Some small performance improvements, e.g., for factor predictions and univariate predictions.
  • Slight speed-up of permshap() by caching calculations for the two special permutations of all 0 and all 1. Consequently, the m_exact component in the output is reduced by 2.

Documentation

  • Rewrote many examples in the README.
  • Added reference to Erik Štrumbelj and Igor Kononenko (2014).

CRAN release 0.4.0

10 Nov 20:30
a9456f4

Major changes

  • Added permshap() to calculate exact permutation SHAP values. The function currently works for up to 14 features.
  • Factor-valued predictions are now supported. Each level is represented by its dummy variable.

Other changes

  • Slight speed-up.
  • Integer valued case weights are now turned into doubles to avoid integer overflow.

CRAN release 0.3.8

24 Sep 14:59
2a6fd0c

API improvements

  • Multi-output case: column names of predictions are now used as list names of the resulting S and SE lists.

Bug fixes

  • {mlr3} probabilistic classification did not work out-of-the-box. This has been fixed (with a corresponding example in the README) in #100.
  • The progress bar was initialized at 1 instead of 0. This is fixed.

Maintenance

  • Added explanation of sampling Kernel SHAP to help file.
  • In internal calculations, use explicit feature_names as dimnames (#96).

CRAN release 0.3.7

17 May 07:53
349db09

Maintenance

  • Fixed a problem with LaTeX math on macOS.

CRAN release 0.3.6

03 May 18:56
1f3158e

Maintenance

  • Improved help files and README.