shap.GradientExplainer

class shap.GradientExplainer(model, data, session=None, batch_size=50, local_smoothing=0)

Explains a model using expected gradients (an extension of integrated gradients).

Expected gradients is an extension of the integrated gradients method (Sundararajan et al. 2017), a feature attribution method designed for differentiable models based on an extension of Shapley values to infinite player games (Aumann-Shapley values). Integrated gradients values differ from SHAP values in that they require a single reference value to integrate from. To make them approximate SHAP values, expected gradients reformulates the integral as an expectation and combines that expectation with sampling reference values from the background dataset. This leads to a single combined expectation of gradients that converges to attributions that sum to the difference between the expected model output and the current output.
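To make the estimator concrete, here is a minimal pure-NumPy sketch of the expected gradients idea (an illustration only, not shap's implementation): for a toy model f with a known gradient, each attribution is an average of (x - x') * grad f, evaluated at random interpolation points between the sample x and background references x' drawn from the background dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

W = np.array([1.0, 2.0, 3.0])

def f(x):
    """A toy differentiable 'model': weighted sum of squares."""
    return (W * x**2).sum(axis=-1)

def grad_f(x):
    return 2 * W * x

background = rng.normal(size=(100, 3))   # background dataset
x = np.array([0.5, -1.0, 2.0])           # the sample to explain

# Monte Carlo estimate of expected gradients:
# phi_i = E[(x_i - x'_i) * df/dx_i at x' + alpha * (x - x')],
# with x' drawn from the background and alpha uniform on [0, 1].
nsamples = 50000
refs = background[rng.integers(len(background), size=nsamples)]
alphas = rng.uniform(size=(nsamples, 1))
points = refs + alphas * (x - refs)
phi = ((x - refs) * grad_f(points)).mean(axis=0)

# The attributions sum (approximately) to f(x) minus the expected model output.
print(phi.sum(), f(x) - f(background).mean())
```

The final check is the SHAP-style completeness property described above: the attributions sum to the difference between the model output at x and the expected output over the background.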

Examples

See Gradient Explainer Examples

__init__(model, data, session=None, batch_size=50, local_smoothing=0)

An explainer object for a differentiable model using a given background dataset.

Parameters
model : tf.keras.Model, (input : [tf.Tensor], output : tf.Tensor), torch.nn.Module, or a tuple (model, layer), where both are torch.nn.Module objects

For TensorFlow this can be a model object, or a pair of TensorFlow tensors (or a list and a tensor) that specifies the input and output of the model to be explained. Note that for TensorFlow 2 you must pass a TensorFlow function, not a tuple of input/output tensors.

For PyTorch this can be an nn.Module object (model), or a tuple (model, layer), where both are nn.Module objects. The model is an nn.Module object which takes as input a tensor (or list of tensors) of the same shape as data, and returns a single dimensional output. If a tuple is passed, the returned SHAP values will be for the input of the layer argument. layer must be a layer in the model, e.g. model.conv2.

data : [numpy.array] or [pandas.DataFrame] or [torch.tensor]

The background dataset to use for integrating out features. GradientExplainer integrates over these samples. The data passed here must match the input tensors given in the first argument. Single element lists can be passed unwrapped.

Methods

__init__(model, data[, session, batch_size, …])

An explainer object for a differentiable model using a given background dataset.

explain_row(*row_args, max_evals, …)

Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes, main_effects).

shap_values(X[, nsamples, ranked_outputs, …])

Return the SHAP values for the model applied to X.

supports_model(model)

Determines if this explainer can handle the given model.

explain_row(*row_args, max_evals, main_effects, error_bounds, outputs, silent, **kwargs)

Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes, main_effects).

This is an abstract method meant to be implemented by each subclass.

Returns
tuple

A tuple of (row_values, row_expected_values, row_mask_shapes, main_effects), where row_values is an array of the attribution values for each sample, row_expected_values is an array (or single value) representing the expected value of the model for each sample (which is the same for all samples unless there are fixed inputs present, like labels when explaining the loss), and row_mask_shapes is a list of all the input shapes (since row_values is always flattened).

shap_values(X, nsamples=200, ranked_outputs=None, output_rank_order='max', rseed=None, return_variances=False)

Return the SHAP values for the model applied to X.

Parameters
X : list, numpy.array, pandas.DataFrame, or torch.tensor

A tensor (or list of tensors) of samples (where X.shape[0] == # samples) on which to explain the model's output. For the TensorFlow framework this is a numpy.array or pandas.DataFrame; for PyTorch it is a torch.tensor.

ranked_outputs : None or int

If ranked_outputs is None then we explain all the outputs in a multi-output model. If ranked_outputs is a positive integer then we only explain that many of the top model outputs (where “top” is determined by output_rank_order). Note that this causes a pair of values to be returned (shap_values, indexes), where shap_values is a list of numpy arrays for each of the output ranks, and indexes is a matrix that tells for each sample which output indexes were chosen as “top”.

output_rank_order : "max", "min", "max_abs", or "custom"

How to order the model outputs when using ranked_outputs, either by maximum, minimum, or maximum absolute value. If "custom", then ranked_outputs contains a list of output nodes.
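The ranking semantics can be illustrated with plain NumPy (a sketch of the ordering rules only, not shap's internal code):

```python
import numpy as np

model_output = np.array([[0.1, -2.0, 0.5]])  # one sample, three model outputs

# Index order of the outputs under each ranking rule.
order = {
    "max":     np.argsort(-model_output, axis=1),
    "min":     np.argsort(model_output, axis=1),
    "max_abs": np.argsort(-np.abs(model_output), axis=1),
}

print(order["max"][0])      # [2 0 1]: largest output first
print(order["min"][0])      # [1 0 2]: smallest output first
print(order["max_abs"][0])  # [1 2 0]: largest magnitude first
```

With ranked_outputs=2 and output_rank_order="max", for example, only the first two output indexes in this order would be explained for each sample.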

rseed : None or int

Seeding the randomness in shap value computation (background example choice, interpolation between current and background example, smoothing).

Returns
array or list

For a model with a single output this returns a tensor of SHAP values with the same shape as X. For a model with multiple outputs this returns a list of SHAP value tensors, each of which is the same shape as X. If ranked_outputs is None then this list of tensors matches the number of model outputs. If ranked_outputs is a positive integer a pair is returned (shap_values, indexes), where shap_values is a list of tensors with a length of ranked_outputs, and indexes is a matrix that tells for each sample which output indexes were chosen as "top".

static supports_model(model)

Determines if this explainer can handle the given model.

This is an abstract static method meant to be implemented by each subclass.