Explainable Artificial Intelligence (XAI) is an important subfield of AI that focuses on making the decision-making processes of machine learning models understandable to humans. This is especially important for black-box models, such as ensemble or deep learning algorithms, which are too complex even for experts to understand (see also my article on Explainable Artificial Intelligence). A better understanding of such black-box models leads, among other things, to more trust among users and gives developers hints on how to improve a model.
A popular method of Explainable Artificial Intelligence is SHAP (SHapley Additive exPlanations). SHAP can be applied to any machine learning model and explains how individual features affect a model’s predictions. This series of articles will give you an easy-to-understand introduction to SHAP and the game-theoretic concept of Shapley values that is central to it. You will also learn how to explain the predictions of black-box models using the Python library SHAP with a practical programming example. The series consists of three articles:
- How to better understand black box models with Shapley values?
- The ultimate guide that makes mastering SHAP easy
- A comprehensive Python tutorial for quickly diving into SHAP
This is the second article in the series. In this article, you will learn how the SHAP explanation method can be used to explain machine learning models. You will also learn about the mathematical properties of SHAP, as well as the advantages and disadvantages of SHAP. By the end of this article, you will have a solid understanding of SHAP and how it can help you better understand black-box models.
Overview
This blog post is divided into the following sections:
- What is SHAP?
- What is the mathematical definition of SHAP?
- What are the advantages of SHAP?
- What are the disadvantages of SHAP?
- In which libraries is SHAP implemented?
What is SHAP?
SHAP (SHapley Additive exPlanations) is an explanation method that describes how different features of the input data influence the individual predictions of a machine learning model. Strictly speaking, SHAP is a collective term for several methods for estimating classical Shapley values from cooperative game theory. There are model-specific SHAP methods, which are tailored to specific model architectures, and model-agnostic SHAP methods, which can be applied to any machine learning method.
Since SHAP is a post hoc explanation method, it is applied after a model has been trained. SHAP can be used not only for regression models, but also for classification models when the output of the classification model is a probability. In addition, SHAP is suitable for various types of data, including tabular data, image data, and text data. The various SHAP methods were first proposed by Lundberg and Lee in 2017.
To explain the prediction of a machine learning model for an instance of a dataset, SHAP estimates a Shapley value for each feature of the instance, see Figure 1. In doing so, SHAP decomposes the prediction for that instance into the sum of the contributions of each feature, called feature attributions. The Shapley value of a feature is the contribution of the feature to the prediction of the model for that particular instance.

Properties of Shapley values
A Shapley value can be negative or positive, depending on whether the feature value decreases or increases the model prediction relative to a base value. The base value is the expected value of all model predictions. In practice, this expected value can be estimated by calculating the average of the model predictions for all instances in the data set. The sum of the base value and the Shapley values of each feature of an instance is always exactly equal to the model prediction for that instance.
Shapley values of a feature vary for different instances in a data set, depending on the value of that feature and the values of the other features. The magnitude of the absolute Shapley value of a feature can also be used to infer the importance of the feature for a prediction relative to the other features. Thus, SHAP can be used to determine how different features contribute to the prediction of a machine learning model.
On their own, the Shapley values estimated with SHAP provide only local explanations, i.e., they explain individual predictions of a machine learning model. However, in addition to local explanations, SHAP also provides the ability to generate global explanations. Global explanations describe the overall behavior of a model.
Global explanations can be generated by aggregating the Shapley values for groups of instances (e.g., the Shapley values of all instances in a data set). Various statistical measures can be used for aggregation, such as the mean or the maximum function. These aggregated explanations provide information about the global importance of the features and the global relationships between the different feature values and the model predictions.
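As a quick illustration (not part of the original article), the following sketch shows how local Shapley values, the base value, and a simple global aggregation might be computed with the Python SHAP library; the random forest and the synthetic data are assumptions made purely for this example:

```python
# Minimal sketch (illustrative only): local and global explanations with SHAP,
# assuming a random forest regressor trained on synthetic data.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Local explanation: one Shapley value per feature for each instance
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # shape: (n_samples, n_features)
print("Shapley values of instance 0:", shap_values[0])

# The base value is the expected model prediction, roughly the average
# prediction over the data set
print("Base value:", explainer.expected_value)
print("Mean prediction:", model.predict(X).mean())

# Global explanation: aggregate the local Shapley values, e.g. with the
# mean absolute Shapley value per feature
print("Global feature importance:", np.abs(shap_values).mean(axis=0))
```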
Can SHAP be used for causal analysis?
It should be noted that SHAP analyses generally do not allow conclusions to be drawn about causal relationships between the features and the target variable. Therefore, caution should be taken when inferring statements such as “If we increase feature X, this will have a positive effect on the target variable” from a SHAP analysis. SHAP only describes the behavior of a machine learning model in the context of the data on which it was trained. Therefore, SHAP only provides information about the relationships between features and the model’s predictions for the target variable. These relationships do not necessarily correspond to the actual relationships between the features and the target variable in the real world, as described in this article.
Types of SHAP methods
There are several kinds of SHAP methods. The four most important ones you should know about are Tree SHAP, Deep SHAP, Linear SHAP, and Kernel SHAP. The Kernel SHAP method is model agnostic, which means that it works with all types of machine learning algorithms. Therefore, SHAP can be used to estimate Shapley values for any machine learning model. In addition, there are a number of specialized SHAP methods for different types of machine learning models. These SHAP methods are tailored to the particular architecture of those models and can therefore approximate Shapley values in a very efficient and fast way. The most important SHAP methods are summarized in the table below:
| SHAP method | Suitable for | Special features |
|---|---|---|
| Tree SHAP | Tree-based machine learning models (decision trees, random forests, XGBoost, LightGBM, etc.) | Fast and accurate |
| Deep SHAP | Deep learning models | Fast, but only an approximation |
| Linear SHAP | Linear models | Considers correlations between features |
| Kernel SHAP | Any kind of machine learning model | Model agnostic, but slower than the model-specific SHAP methods |
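As a rough illustration (not part of the original article), the following sketch contrasts a model-specific explainer (Tree SHAP) with the model-agnostic Kernel SHAP on the same model; the explainer classes are those of the Python SHAP library, while the gradient boosting model and the data are made up for the example:

```python
# Illustrative sketch: the same model explained with Tree SHAP and Kernel SHAP.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Tree SHAP: exploits the tree structure, fast and exact for tree ensembles
tree_explainer = shap.TreeExplainer(model)
tree_shap_values = tree_explainer.shap_values(X[:5])

# Kernel SHAP: model agnostic, only needs a prediction function and a
# background data set, but is noticeably slower
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
kernel_shap_values = kernel_explainer.shap_values(X[:5])
```

For deep learning and linear models, shap.DeepExplainer and shap.LinearExplainer play the analogous role.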
SHAP in a nutshell
Let’s recap the most important details of SHAP:
- SHAP estimates a Shapley value for each feature of an instance.
- The magnitude of the absolute Shapley value provides information about the importance of the feature relative to the other features.
- A feature’s Shapley value can be positive or negative, depending on whether the feature’s value increases or decreases the model’s prediction relative to a base value.
- The base value is the expected value of all model predictions.
- The model prediction for an instance is exactly the sum of the base value and the Shapley values of all features for that instance.
What is the mathematical definition of SHAP?
Lundberg and Lee have shown that Shapley values can be represented as a so-called additive feature attribution method. Additive feature attribution methods are linear explanation models that locally approximate the original model. Additive feature attribution methods specify the explanation for a prediction in the following form:
\[g \left( z ^\prime \right) = \phi _0 + \sum\limits_{i=1}^n \phi _i z ^\prime _i
\]
The explanation model g explains the local behavior of the original model through its parameters φ0 and φi. The combination vector z’ ∈ {0,1}ⁿ describes which features are included in a feature combination (value 1) and which are not (value 0). Here, n is the total number of features. Explanation methods whose explanation models take this additive form assign a contribution value φi to each feature. Such an interpretable explanation model has the advantage of being much easier to understand than the complex machine learning models it is designed to explain.
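To make this concrete, here is a tiny numerical sketch (with made-up attribution values) of how the explanation model g is evaluated for a given combination vector z’:

```python
# Toy example (made-up numbers): evaluating g(z') = phi_0 + sum_i phi_i * z'_i
import numpy as np

phi_0 = 2.0                        # base value
phi = np.array([0.5, -1.2, 0.3])   # feature attributions phi_1 ... phi_n
z = np.array([1, 0, 1])            # features 1 and 3 present, feature 2 absent

g = phi_0 + np.dot(phi, z)         # = 2.0 + 0.5 + 0.3 = 2.8
print(g)
```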
SHAP as an additive feature attribution method
If all the features are present, the vector z’ consists only of ones. In this case, the formula is simplified:
\[g \left( z ^\prime \right) = \phi _0 + \sum\limits_{i=1}^n \phi _i
\]
This means that the prediction for an instance can be represented as the sum of the contributions of the different features to that prediction and the constant φ0.
You are already familiar with this concept from Shapley values. In Shapley values, φi corresponds to the Shapley value for feature i. This Shapley value is computed as the weighted average of all marginal contributions of feature i to all possible feature combinations containing feature i. All Shapley values are relative to a base value. The base value is the constant φ0 in the formula above. For Shapley values, φ0 is the expected value of all model predictions. In practice, this expected value can be estimated by taking the mean of the model predictions for all instances in the data set.
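For reference, this weighted average can be written out explicitly. In the notation of Lundberg and Lee, where F is the set of all features and the model is evaluated on feature subsets S, the Shapley value of feature i is:
\[\phi _i = \sum\limits_{S \subseteq F \setminus \{i\}} \frac{|S|! \, \left( |F| - |S| - 1 \right)!}{|F|!} \left[ f_{S \cup \{i\}} \left( x _{S \cup \{i\}} \right) - f_S \left( x _S \right) \right]
\]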
Thus, in the case of Shapley values, the prediction of a machine learning model is represented by the sum of the base value and the computed Shapley values of each feature. In pseudo-mathematical terms, this means:
\[\text{model prediction} = \text{base value} + \sum \text{(Shapley values of the features)}.
\]
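The following sketch (again assuming a random forest on synthetic data, not taken from the article) checks this additive decomposition numerically with the Python SHAP library:

```python
# Illustrative check: model prediction == base value + sum of Shapley values
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, model.predict(X)))   # True (up to numerical error)
```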
Mathematical properties of SHAP
SHAP is unique in that it has been shown to be the only additive feature attribution method that satisfies all three of the following desirable mathematical properties:
- Local accuracy: For the instance x whose prediction is being explained, the prediction of the explanation model must exactly match the prediction of the original model (the match is only required locally, at x itself).
- Missingness: A missing feature is assigned a Shapley value of zero (i.e., φi = 0). In practice, however, this property is only relevant for constant features.
- Consistency: If a model is modified so that the marginal contribution of a feature value increases or remains the same (independent of other features), then the Shapley value also increases or remains the same.
As Lundberg and Lee have shown, LIME, DeepLIFT, and Layer-wise Relevance Propagation (LRP), among others, also belong to the class of additive feature attribution methods. Thus, they use the same explanation model as SHAP. However, none of these methods satisfies all three mathematical properties simultaneously (they violate local accuracy and/or consistency). Therefore, SHAP unifies these three methods and also provides better mathematical properties. In particular, SHAP provides more mathematical guarantees for the accuracy of the explanations than these three explanation methods.
What are the advantages of SHAP?
The main advantages of the SHAP explanation method are as follows:
- Versatile applicability:
- SHAP can be applied to any machine learning algorithm.
- SHAP can be used for both classification and regression tasks.
- SHAP is suitable for different types of data (tabular data, image data, text data, etc.).
- SHAP provides both local and global explanations, where global explanations are consistent with local explanations.
- SHAP also allows for contrastive explanations: instead of comparing a prediction to the expected value of all model predictions, it can be compared to a subset of the data set or even to a single instance (see the sketch after this list).
- Mathematical foundation: SHAP is based on Shapley values, which have a solid theoretical foundation in game theory.
- Fast computation: By using computationally efficient approximation methods, SHAP can be used in cases where the exact computation of Shapley values would be too slow (in particular, very fast SHAP methods exist for tree-based machine learning models and deep learning algorithms).
- Full explanations: SHAP computes the effect of each feature on a prediction so that the effects are fairly distributed across the feature values of that instance and sum to the difference between the actual model prediction for the instance and the base value.
- Intuitive unit: Shapley values have the same unit as the target variable the model is designed to predict, making them easier to interpret.
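To illustrate the contrastive explanations mentioned in the list above, here is a hedged sketch of how the reference can be changed by passing different background data to Kernel SHAP; the model and data are again assumptions made for the example:

```python
# Illustrative sketch: contrastive explanations via the choice of background data.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Reference = the whole data set (summarized): "compared to the average prediction"
explainer_all = shap.KernelExplainer(model.predict, shap.kmeans(X, 10))
shap_values_all = explainer_all.shap_values(X[0])

# Reference = a single instance: "compared to the prediction for this one instance"
explainer_single = shap.KernelExplainer(model.predict, X[:1])
shap_values_single = explainer_single.shap_values(X[0])
```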
What are the disadvantages of SHAP?
The SHAP explanation method has the following drawbacks:
- Uncertainty: The reliability of explanations by SHAP cannot be guaranteed.
- SHAP is an approximation method that may contain approximation errors.
- SHAP can be used to create intentionally misleading explanations, as this study has shown.
- Misinterpretation: Shapley values can be easily misinterpreted.
- The Shapley value of a feature is not the effect on the prediction when the feature is removed from the model training. Instead, the Shapley value indicates the contribution of a feature to the difference between the actual prediction and the average prediction in the context of the other features.
- SHAP does not necessarily describe the actual relationships between the features and the target variable in the real world. SHAP only describes the behavior of the model in the context of the data on which it was trained.
- Data access: To compute Shapley values for new instances, access to the data is usually required.
- Long computation time: The exact calculation of Shapley values is computationally expensive. Although SHAP significantly speeds up the computation of Shapley values, some approximation algorithms (e.g., Kernel SHAP) are still slow and therefore unsuitable for large datasets.
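As a sketch of how the long computation time of Kernel SHAP is often mitigated in practice (under the same illustrative assumptions as above), the background data can be summarized and only a sample of instances explained:

```python
# Illustrative sketch: keeping Kernel SHAP tractable on a larger data set.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=5000, n_features=8, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

background = shap.kmeans(X, 20)      # summarize the background data with k-means
explainer = shap.KernelExplainer(model.predict, background)

X_explain = shap.sample(X, 50)       # explain only a sample of instances
shap_values = explainer.shap_values(X_explain)
```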
In which libraries is SHAP implemented?
The following programming libraries implement the SHAP explanation method in the Python, R, or Julia programming languages:
| Library | Programming language |
|---|---|
| SHAP | Python (for models in scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, etc.) |
| Captum | Python (for PyTorch models only) |
| shapper | R |
| fastshap | R |
| shapr | R |
| ShapML | Julia |
In the next article in this series, you will learn how to easily use SHAP in Python.
Further reading
In this section you will find additional literature to help you learn more about SHAP.
Scientific publications
- Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems (2017).
- Lundberg, Scott M., et al. “From local explanations to global understanding with explainable AI for trees.” Nature Machine Intelligence 2.1 (2020): 56-67.
Summary
In this blog post, you learned how to use the SHAP explanation method to explain the inner workings of machine learning models.
Specifically, you learned:
- SHAP is a collective term for several methods for estimating Shapley values.
- Shapley values describe the contributions of different features to the prediction of a machine learning model relative to a base value.
- SHAP unifies the LIME, DeepLIFT, and Layer-Wise Relevance Propagation explanation methods, and also has better mathematical properties than these methods.
Do you have any questions?
Feel free to leave them in the comments below and I will do my best to answer them.
P.S.: Of course I also appreciate constructive feedback on this blog post 😊

Hi, my name is René Heinrich. I am a data scientist doing my PhD in the area of trustworthy artificial intelligence. On this blog I share my experiences and everything I have learned on my own knowledge journey.