ShapPack

ShapPack is a Python package for interpretable machine learning based on Shapley values.

ShapPack is currently in beta and under active development!

Installation

$ pip install shappack

Usage

ShapPack's API closely follows that of slundberg/shap.

import shappack
import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
SEED = 123
np.random.seed(SEED)

# Prepare dataset
# Note: load_boston was removed in scikit-learn 1.2, so this example needs scikit-learn < 1.2
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston["data"], boston["target"], test_size=0.2, random_state=SEED)
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Prepare model
model = SVR(kernel="rbf")
model.fit(X_train_std, y_train)

# Compute SHAP values for a single test instance
i = 2
explainer = shappack.KernelExplainer(model.predict, X_train_std[:100])
shap_value = explainer.shap_values(X_test_std[i], n_workers=-1)

ShapPack does not yet have its own visualization mechanism, so slundberg/shap is needed for visualization.

import shap
shap.initjs()
shap.force_plot(explainer.base_val[0], shap_value, X_test[i], boston.feature_names)

[Figure: SHAP force plot for the Boston housing example]
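A force plot stacks the per-feature contributions on top of the base value until they reach the model output, which reflects the additivity property of Shapley values. The following is a minimal brute-force sketch of that property on a toy game; `exact_shapley`, `toy_value`, and the payoff numbers are illustrative assumptions and not part of ShapPack, which approximates these values with Kernel SHAP rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n_features):
    """Brute-force Shapley values for a characteristic function value(frozenset)."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            # Shapley weight for coalitions of this size that exclude feature j
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature j when it joins coalition s
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Toy game (made-up numbers): additive payoffs plus an interaction between features 0 and 1
payoffs = [1.0, 2.0, 4.0]


def toy_value(subset):
    total = sum(payoffs[k] for k in subset)
    if {0, 1} <= subset:
        total += 6.0
    return total


phi = exact_shapley(toy_value, 3)
base_value = toy_value(frozenset())
full_value = toy_value(frozenset({0, 1, 2}))
# Additivity: base value plus all contributions recovers the full-coalition value
reconstructed = base_value + sum(phi)
```

In this toy game the interaction bonus is split evenly between features 0 and 1, and the contributions sum to the gap between the empty and the full coalition.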

An important difference from slundberg/shap is that the shap_values function in ShapPack has three new arguments: n_workers, skip_features, and characteristic_func, which contribute to faster computation and scalability.

The usage of each of these arguments is described below.

n_workers: Multiprocessing

n_workers specifies the number of processes used to calculate SHAP values. Setting n_workers=-1 uses all available processors.

If the program is running on a multi-core server, we can expect a reduction in computation time.

shap_value = explainer.shap_values(X_test_std[i], n_workers=-1)
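As a rough picture of what a worker count implies, the sketch below resolves a requested count against the machine's CPU count and divides a list of work items into near-equal chunks. Both `resolve_n_workers` and `split_into_chunks` are hypothetical helpers written for illustration; how ShapPack actually interprets n_workers and distributes work internally is not shown in this README, so treat this as an assumption.

```python
import os


def resolve_n_workers(n_workers):
    """Hypothetical helper: map a requested worker count to a usable process count."""
    available = os.cpu_count() or 1
    if n_workers == -1:
        return available  # -1 is taken to mean "all processors"
    if n_workers < 1:
        raise ValueError("n_workers must be -1 or a positive integer")
    return min(n_workers, available)


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        # The first `extra` chunks take one additional item
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# Ten illustrative work items (for example, feature subsets to evaluate)
work_items = list(range(10))
chunks = split_into_chunks(work_items, 3)
```

Each chunk could then be handed to a separate process; the speed-up depends on how expensive each model evaluation is relative to the process start-up and data transfer costs.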

skip_features: Skip the calculation of SHAP values

We can skip the calculation of SHAP values for the features listed in skip_features. The features to skip can be specified by feature name or by index.

Note that we need to pass a list of feature names to KernelExplainer's feature_names argument when specifying skip_features by feature names.

explainer = shappack.KernelExplainer(model.predict, X_train_std[:100], feature_names=boston.feature_names)
skip_features=["PTRATIO", "TAX"]
shap_value = explainer.shap_values(X_test_std[i], skip_features=skip_features, n_workers=-1)
feature_names = np.delete(boston.feature_names, explainer.skip_idx)
x_test = np.delete(X_test[i], explainer.skip_idx)
shap.force_plot(explainer.base_val[0], shap_value, x_test, feature_names)
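The np.delete calls above keep the feature names and the instance values aligned with the shortened SHAP vector. A small standalone sketch of that bookkeeping follows; the four feature names are taken from the Boston dataset, while the row values and the name-to-index lookup are made up for illustration and are not ShapPack's internal logic.

```python
import numpy as np

# Illustrative data: four feature names and one made-up row of values
feature_names = np.array(["CRIM", "TAX", "PTRATIO", "LSTAT"])
row = np.array([0.1, 300.0, 15.0, 5.0])

skip_features = ["PTRATIO", "TAX"]

# Resolve each skipped name to its column index (assumed lookup, for illustration)
skip_idx = [int(np.where(feature_names == name)[0][0]) for name in skip_features]

# Drop the skipped columns from both the names and the values
kept_names = np.delete(feature_names, skip_idx)
kept_row = np.delete(row, skip_idx)
```

After the deletion, `kept_names` and `kept_row` have the same length as a SHAP vector computed without the skipped features, which is what the plotting call expects.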

characteristic_func: Incorporate your own characteristic function

We can pass our own characteristic function to shap_values through the characteristic_func argument.

The example below is a function that replaces the expected value calculation in the original Kernel SHAP's characteristic function with a minimum value calculation.

def my_characteristic_func(instance, subsets, model, data):
    n_subsets = subsets.shape[0]
    n_data = data.shape[0]
    # Stack one copy of the background data per subset
    synth_data = np.tile(data, (n_subsets, 1))
    for i, subset in enumerate(subsets):
        offset = i * n_data
        # Features present in the subset take the value of the explained instance
        features_idx = np.where(subset == 1.0)[0]
        synth_data[offset : offset + n_data, features_idx] = instance[:, features_idx][0]
    model_preds = model(synth_data)
    ey = np.zeros(n_subsets)
    for i in range(n_subsets):
        # Take the minimum prediction per subset instead of the mean
        ey[i] = np.min(model_preds[i * n_data : i * n_data + n_data])
    return ey

shap_value = explainer.shap_values(X_test_std[i], characteristic_func=my_characteristic_func)
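To see what such a function returns, the sketch below runs the same masking logic on a tiny hand-checkable input and compares the mean reduction (the expected value used by the original Kernel SHAP) with the minimum reduction. The names `characteristic_values` and `toy_model` and all array values are illustrative assumptions; only the argument order (instance, subsets, model, data) follows the example above.

```python
import numpy as np


def characteristic_values(instance, subsets, model, data, reduce=np.mean):
    """Evaluate one value per subset by masking background data with the instance."""
    n_subsets = subsets.shape[0]
    n_data = data.shape[0]
    # One copy of the background data per subset
    synth_data = np.tile(data, (n_subsets, 1))
    for row, subset in enumerate(subsets):
        offset = row * n_data
        features_idx = np.where(subset == 1.0)[0]
        # Features in the subset are fixed to the explained instance's values
        synth_data[offset : offset + n_data, features_idx] = instance[0, features_idx]
    # Reshape so each row holds the predictions belonging to one subset
    preds = model(synth_data).reshape(n_subsets, n_data)
    return reduce(preds, axis=1)


def toy_model(X):
    # Toy model: the prediction is the sum of the features
    return X.sum(axis=1)


data = np.array([[0.0, 0.0], [2.0, 4.0]])       # two background rows
instance = np.array([[10.0, 20.0]])             # instance to explain, shape (1, 2)
subsets = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

mean_values = characteristic_values(instance, subsets, toy_model, data, reduce=np.mean)
min_values = characteristic_values(instance, subsets, toy_model, data, reduce=np.min)
```

The two reductions agree when every feature is fixed to the instance, because all synthetic rows are then identical; they differ for partial subsets, where the background data still varies.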

License

This project is licensed under the terms of the MIT license; see LICENSE.
