Explaining Amazon SageMaker Autopilot models with SHAP

Machine learning (ML) models have long been considered black boxes because their predictions are hard to interpret. Recently, however, several frameworks for explaining ML models have been proposed. Model interpretation can be divided into local and global explanations. A local explanation considers a single sample and answers questions like “Why does the model predict that Customer A will stop using the product?” or “Why did the ML system refuse John Doe a loan?” Another interesting question is “What should John Doe change in order to get the loan approved?” In contrast, global explanations aim at explaining the model itself and answer questions like “Which features are important for prediction?” You can derive global explanations from local ones by averaging the local explanations over many samples. For further reading on interpretable ML, see the excellent book Interpretable Machine Learning by Christoph Molnar.

In this post, we demonstrate using the popular model interpretation framework SHAP for both local and global interpretation.

SHAP

SHAP is a game-theoretic framework inspired by Shapley values that provides local explanations for any model. SHAP has gained popularity in recent years, probably due to its strong theoretical basis. The SHAP package contains several algorithms that, when given a sample and a model, derive the SHAP value for each of the model’s input features. The SHAP value of a feature represents its contribution to the model’s prediction.
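To make the notion of a feature's contribution concrete, the following is a minimal sketch that computes exact Shapley values by enumerating every feature subset. The toy model, the sample, and the all-zeros baseline are hypothetical and chosen only for illustration; this is not how the SHAP package computes its values internally, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function defined over feature indices."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model and sample (assumptions for illustration only)
x = (1.0, 1.0, 1.0)
baseline = (0.0, 0.0, 0.0)


def model(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


def value_fn(present):
    # Features outside the coalition are "missing": use the baseline value
    v = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(v)


phi = shapley_values(value_fn, 3)
print(phi)
```

The contributions sum to the difference between the model output for the sample and the output for the baseline, which is the additivity property that SHAP relies on.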

To explain models built by Amazon SageMaker Autopilot, we use SHAP’s KernelExplainer, which is a black box explainer. KernelExplainer is robust and can explain any model, so it can handle the complex feature processing of Amazon SageMaker Autopilot. KernelExplainer only requires that the model support an inference functionality that, when given a sample, returns the model’s prediction for that sample. The prediction is the predicted value for regression and the class probability for classification.

SHAP includes several other explainers, such as TreeExplainer and DeepExplainer, which are specific to tree-based models (such as decision forests) and neural networks, respectively. These are not black box explainers and require knowledge of the model structure and trained parameters. TreeExplainer and DeepExplainer are limited and, as of this writing, can’t support any feature processing.

Creating a notebook instance

You can run the example code provided in this post. We recommend running the code on an Amazon SageMaker instance of type ml.m5.xlarge or larger to reduce running time. To launch the notebook with the example code using Amazon SageMaker Studio, complete the following steps:

Launch an Amazon SageMaker Studio instance.
Open a terminal and clone the GitHub repo: git clone https://github.com/awslabs/amazon-sagemaker-examples.git
Open the notebook autopilot/model-explainability/explaining_customer_churn_model.ipynb.
Use kernel Python 3 (Data Science).

Setting up the required packages

In this post, we start with a model built by Amazon SageMaker Autopilot, which was already trained on a binary classification task. See the following code:

import boto3
import pandas as pd
import sagemaker
from sagemaker import AutoML
from datetime import datetime
import numpy as np
region = boto3.Session().region_name
session = sagemaker.Session()

For instructions on creating and training an Amazon SageMaker Autopilot model, see Customer Churn Prediction with Amazon SageMaker Autopilot.

Install SHAP with the following code:

!conda install -c conda-forge -y shap
import shap
from shap import KernelExplainer
from shap import sample
from scipy.special import expit

Initialize the plugin to make the plots interactive.
shap.initjs()

Creating an inference endpoint

Create an inference endpoint for the trained model built by Amazon SageMaker Autopilot. See the following code:

autopilot_job_name = '<your_automl_job_name_here>'
autopilot_job = AutoML.attach(autopilot_job_name, sagemaker_session=session)
ep_name = 'sagemaker-automl-' + datetime.now().strftime('%Y-%m-%d-%H-%M-%S')

For the classification response to work with SHAP, we need the probability scores. You can get them by providing a list of keys for the response content. The order of the keys dictates the order of the content in the response. This parameter is not needed for regression.

inference_response_keys = ['predicted_label', 'probability']

Create the inference endpoint:

autopilot_job.deploy(initial_instance_count=1, instance_type='ml.m5.2xlarge', inference_response_keys=inference_response_keys, endpoint_name=ep_name)

You can skip this step if an endpoint with the argument inference_response_keys set as ['predicted_label', 'probability'] was already created.

Wrapping the Amazon SageMaker Autopilot endpoint with an estimator class

For ease of use, we wrap the inference endpoint with a custom estimator class. Two inference functions are provided: predict, which returns the numeric prediction value to be used for regression, and predict_proba, which returns the class probabilities to be used for classification. See the following code:

from sagemaker.predictor import RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_CSV

class AutomlEstimator:
    def __init__(self, endpoint, sagemaker_session):
        self.predictor = RealTimePredictor(
            endpoint=endpoint,
            sagemaker_session=sagemaker_session,
            content_type=CONTENT_TYPE_CSV,
            accept=CONTENT_TYPE_CSV
        )
    
    def get_automl_response(self, x):
        if x.__class__.__name__ == 'ndarray':
            payload = ""
            for row in x:
                payload = payload + ','.join(map(str, row)) + '\n'
        else:
            payload = x.to_csv(sep=',', header=False, index=False)
        return self.predictor.predict(payload).decode('utf-8')

    # Prediction function for regression
    def predict(self, x):
        response = self.get_automl_response(x)
        # Return the first column from the response array containing the numeric prediction value (or label in case of classification)
        response = np.array([x.split(',')[0] for x in response.split('\n')[:-1]])
        return response

    # Prediction function for classification
    def predict_proba(self, x):
        # Return the probability score from AutoPilot’s endpoint response
        response = self.get_automl_response(x)
        response = np.array([x.split(',')[1] for x in response.split('\n')[:-1]])
        return response.astype(float)

Create an instance of AutomlEstimator:

automl_estimator = AutomlEstimator(endpoint=ep_name, sagemaker_session=session)
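The request and response handling in the wrapper can be checked without a live endpoint. The following is a minimal sketch, assuming the endpoint returns one CSV line per input row in the order given by inference_response_keys (label first, probability second). The helper names to_csv_payload and parse_probabilities and the fake response string are hypothetical and not part of the SageMaker SDK.

```python
import numpy as np


def to_csv_payload(x):
    """Serialize a NumPy array (or a DataFrame-like object) to headerless CSV."""
    if isinstance(x, np.ndarray):
        # One line per row, each terminated by a newline
        return ''.join(','.join(map(str, row)) + '\n' for row in x)
    return x.to_csv(sep=',', header=False, index=False)


def parse_probabilities(response):
    """Extract the second CSV column (the probability) from each response line."""
    # The response ends with a newline, so the last split element is empty
    lines = response.split('\n')[:-1]
    return np.array([line.split(',')[1] for line in lines]).astype(float)


rows = np.array([[1.0, 2.5], [3.0, 4.0]])
payload = to_csv_payload(rows)

# Hypothetical endpoint response: predicted label, then probability
fake_response = 'True.,0.91\nFalse.,0.12\n'
probs = parse_probabilities(fake_response)
print(repr(payload), probs)
```

Note that both directions depend on the newline character as the row separator, so the payload and the parser must agree on it.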

Data

In this notebook, we use the same dataset as used in the Customer Churn Prediction with Amazon SageMaker Autopilot GitHub repo. Follow the notebook in the GitHub repo to download the dataset if it was not previously downloaded.

Background data

KernelExplainer requires a sample of the data to be used as background data. KernelExplainer uses this data to simulate a feature being missing by replacing the feature value with a random value from the background. We use shap.sample to sample 50 rows from the dataset to be used as background data. Using more samples as background data produces more accurate results, but runtime increases. SHAP also provides clustering algorithms for summarizing the background data, but they only support numeric data. Alternatively, you can use a vector of zeros as background data to produce reasonable results.

Choosing background data is challenging. For more information, see AI Explanations Whitepaper and Runtime considerations.

churn_data = pd.read_csv('../Data sets/churn.txt')
data_without_target = churn_data.drop(columns=['Churn?'])
background_data = sample(data_without_target, 50)
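The idea of simulating a missing feature with background data can be sketched as follows. This is a simplified illustration, not KernelExplainer's actual implementation: the mask, the toy model, and the random background values are all hypothetical. For one coalition of "present" features, each background row supplies replacement values for the "missing" ones, and the model output is averaged over the resulting synthetic rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric background data: 50 rows, 4 features
background = rng.normal(size=(50, 4))

# The sample being explained
x = np.array([1.0, 2.0, 3.0, 4.0])

# True keeps the sample's value; False treats the feature as missing
mask = np.array([True, False, True, False])

# One synthetic row per background row: masked-out features take background values
synthetic = np.where(mask, x, background)


def toy_model(rows):
    # Stand-in for the endpoint's prediction function
    return rows.sum(axis=1)


# Average prediction with features 1 and 3 "missing"
estimate = toy_model(synthetic).mean()
print(synthetic.shape, estimate)
```

This also shows why the size of the background data multiplies the number of rows sent to the model: every evaluated coalition is expanded into one synthetic row per background row.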

Setting up KernelExplainer

Next, we create the KernelExplainer. Because it’s a black box explainer, KernelExplainer only requires a handle to the predict (or predict_proba) function and doesn’t require any other information about the model. For classification, it’s recommended to derive feature importance scores in the log-odds space because additivity is a more natural assumption there, so we use Logit. For regression, you should use Identity. See the following code:

problem_type = autopilot_job.describe_auto_ml_job(job_name=autopilot_job_name)['ResolvedAttributes']['ProblemType']
link = "identity" if problem_type == 'Regression' else "logit"

The handle to predict_proba is passed to KernelExplainer since KernelSHAP requires the class probability:

explainer = KernelExplainer(automl_estimator.predict_proba, background_data, link=link)

By analyzing the background data, KernelExplainer provides us with explainer.expected_value, which is the model prediction with all features missing. Considering a customer for which we have no data at all (all features are missing), this should theoretically be the model prediction. See the following code:

Because expected_value is given in the log-odds space, we convert it back to a probability using expit, which is the inverse function of logit:

print('expected value =', expit(explainer.expected_value))
expected value = 0.21051377184689046

Local explanation with KernelExplainer

We use KernelExplainer to explain the prediction of a single sample, the first sample in the dataset. See the following code:

# Get the first sample
x = data_without_target.iloc[0:1]

ManagedEndpoint automatically deletes the endpoint after the SHAP values are calculated. To disable automatic deletion, use ManagedEndpoint(ep_name, auto_delete=False).

from managed_endpoint import ManagedEndpoint
with ManagedEndpoint(ep_name) as mep:
    shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='aic')
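The auto-delete behavior follows the usual Python context-manager pattern. The following is a minimal sketch of that pattern only; ManagedResource and its delete_fn callback are hypothetical stand-ins, and the actual ManagedEndpoint helper in the example repo may be implemented differently.

```python
class ManagedResource:
    """Hypothetical stand-in that cleans up a named resource on exit."""

    def __init__(self, name, delete_fn, auto_delete=True):
        self.name = name
        self.delete_fn = delete_fn  # callable that deletes the resource by name
        self.auto_delete = auto_delete

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, so the resource is not left running
        if self.auto_delete:
            self.delete_fn(self.name)
        return False  # never suppress exceptions from the body


deleted = []
with ManagedResource('demo-endpoint', deleted.append):
    pass  # long-running work would go here

kept = []
with ManagedResource('demo-endpoint', kept.append, auto_delete=False):
    pass

print(deleted, kept)
```

Wrapping the SHAP computation this way means the cleanup runs even when the computation fails partway through.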

The SHAP package includes many visualization tools. The following force_plot code provides a visualization for the SHAP values of a single sample. Because shap_values are provided in the log-odds space, we convert them back to the probability space by passing the logit link to force_plot:

shap.force_plot(explainer.expected_value, shap_values, x, link=link)

The following visualization is the result.

From this plot, we learn that the most influential feature is VMail Message, which pushes the probability down by about 7%. VMail Message = 25 makes the probability 7% lower in comparison to the notion of that feature being missing. SHAP values don’t provide the information of how increasing or decreasing VMail Message affects prediction.
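The relationship between the expected value, the SHAP values, and the predicted probability can be sketched with a few lines of arithmetic. The numbers below are hypothetical and are not taken from the churn model: in the log-odds space, the SHAP values add to the expected value, and expit maps the total back to a probability.

```python
import numpy as np
from scipy.special import expit, logit

# Hypothetical expected value, expressed as a probability and mapped to log-odds
base_log_odds = logit(0.21)

# Hypothetical SHAP values for three features, in the log-odds space
contributions = np.array([-0.45, 0.30, -0.10])

# Additivity: the prediction is the base value plus all contributions
prediction_log_odds = base_log_odds + contributions.sum()

# Map back to the probability space
predicted_probability = expit(prediction_log_odds)
print(predicted_probability)
```

Because expit is nonlinear, the same log-odds contribution corresponds to a different change in probability depending on where the other contributions have already moved the prediction.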

In many use cases, we’re interested only in the most influential features. By setting l1_reg='num_features(5)', SHAP provides non-zero scores for only the most influential five features:

with ManagedEndpoint(ep_name) as mep:
    shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='num_features(5)')
shap.force_plot(explainer.expected_value, shap_values, x, link=link)

The following visualization is the result.

KernelExplainer computation cost

KernelExplainer computation cost is dominated by the inference calls. To estimate SHAP values for a single sample, KernelExplainer calls the inference function twice: first with the sample unaugmented, and then with many randomly augmented instances of the sample. The number of augmented instances in our use case is 50 (number of samples in the background data) * 2088 (nsamples = 'auto') = 104,400. So, for this use case, the cost of running KernelExplainer for a single sample is roughly the cost of running inference on 104,400 rows.
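The arithmetic above can be reproduced with a short sketch. It assumes that nsamples='auto' resolves to 2 * M + 2048 for M features, which is how we understand the SHAP source at the time of writing, and that the churn data has 20 features once the target column is dropped. Both are assumptions worth verifying against your SHAP version and dataset.

```python
# Assumption: number of input features after dropping the target column
n_features = 20

# Number of rows in the background data, as used in this post
n_background = 50

# Assumption: how nsamples='auto' is resolved by KernelExplainer
nsamples_auto = 2 * n_features + 2**11

# Each sampled coalition is expanded into one row per background row
total_rows = n_background * nsamples_auto
print(nsamples_auto, total_rows)
```

Under these assumptions, halving the background data roughly halves the inference cost, at the price of a noisier estimate.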

Global explanation with KernelExplainer

Next, we use KernelExplainer to provide insight about the model as a whole. We do this by running KernelExplainer locally on 50 samples and aggregating the results:

X = sample(data_without_target, 50)
with ManagedEndpoint(ep_name) as mep:
    shap_values = explainer.shap_values(X, nsamples='auto', l1_reg='aic')

You can use force_plot to visualize SHAP values for many samples simultaneously. In that case, force_plot rotates the plot of each sample by 90 degrees and stacks the plots horizontally. See the following code:

shap.force_plot(explainer.expected_value, shap_values, X, link=link)

The resulting plot is interactive (in the notebook) and can be manually analyzed.

summary_plot is another visualization tool displaying the mean absolute value of the SHAP values for each feature using a bar plot. Currently, summary_plot doesn’t support link functions, so the SHAP values are presented in the log-odds space (and not the probability space). See the following code:

shap.summary_plot(shap_values, X, plot_type="bar")

The following graph shows the results.
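The quantity shown in the bar plot can also be computed directly, which is useful when you want the ranking as data rather than as a figure. The following is a minimal sketch with hypothetical SHAP values for three features and three samples; with real results, you would use the shap_values array and the column names of X instead.

```python
import numpy as np

# Hypothetical feature names and SHAP values (rows: samples, columns: features)
feature_names = ['Day Mins', 'VMail Message', 'CustServ Calls']
shap_matrix = np.array([
    [0.5, -0.7, 0.1],
    [-0.3, -0.2, 0.4],
    [0.1, 0.6, -0.2],
])

# Global importance: mean absolute SHAP value per feature (log-odds space)
mean_abs = np.abs(shap_matrix).mean(axis=0)

# Rank features from most to least important
ranking = sorted(zip(feature_names, mean_abs), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f'{name}: {score:.3f}')
```

Taking the absolute value before averaging matters: positive and negative contributions would otherwise cancel out and hide features that strongly affect individual predictions.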

Conclusion

In this post, we demonstrated how to use KernelSHAP to explain models created by Amazon SageMaker Autopilot, both locally and globally. KernelExplainer is a robust black box explainer that requires only that the model support an inference functionality that, when given a sample, returns the model’s prediction for that sample. This inference functionality was provided by wrapping the Amazon SageMaker Autopilot inference endpoint with a custom estimator class.

For more information about Amazon SageMaker Autopilot, see Amazon SageMaker Autopilot.


About the Authors

Yotam Elor is a Senior Applied Scientist at AWS SageMaker. He works on SageMaker Autopilot, AWS’s AutoML solution.

Somnath Sarkar is a Software Engineer on the AWS SageMaker Autopilot team. He enjoys machine learning in general, with a focus on scalable and distributed systems.
