Materials towards Homework 4: CP and PDP with XGBoost

Part of the eXplainable Machine Learning course for Machine Learning (MSc) studies at the University of Warsaw. @pbiecek @hbaniecki

v0.1.0: 2022-10-26

https://github.com/mim-uw/eXplainableMachineLearning-2023/tree/main/Homeworks/HW4

0. Import packages

In [2]:
import dalex as dx
import xgboost
import alibi

import sklearn

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

import platform
print(f'Python {platform.python_version()}')

{package.__name__: package.__version__ for package in [dx, xgboost, alibi, sklearn, pd, np]}
Python 3.9.2
Out[2]:
{'dalex': '1.5.0',
 'xgboost': '1.6.2',
 'alibi': '0.8.0',
 'sklearn': '1.1.3',
 'pandas': '1.3.5',
 'numpy': '1.19.5'}

We use the same XGBoost classifier trained on the Titanic dataset as in the previous materials towards Homework 2 & Homework 3.

Unfortunately, at this moment, we can't compare dalex to alibi v0.8.0 because the latter seems not to support categorical variables (tested using code from this notebook).

1. Load and preprocess data

In [3]:
df = dx.datasets.load_titanic()

df.loc[:, df.dtypes == 'object'] =\
    df.select_dtypes(['object'])\
    .apply(lambda x: x.astype('category'))

X = df.drop(columns='survived')
# convert gender to binary only because the `max_cat_to_onehot` parameter in XGBoost does not work properly yet
X = pd.get_dummies(X, columns=["gender"], drop_first=True)
y = df.survived

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.33, random_state=42)

2. Model

In [4]:
model = xgboost.XGBClassifier(
    n_estimators=50, 
    max_depth=2, 
    use_label_encoder=False, 
    eval_metric="logloss",
    
    enable_categorical=True,
    tree_method="hist"
)

model.fit(X_train, y_train)
Out[4]:
XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
              colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
              early_stopping_rounds=None, enable_categorical=True,
              eval_metric='logloss', gamma=0, gpu_id=-1,
              grow_policy='depthwise', importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=2,
              max_leaves=0, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=50, n_jobs=0,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, ...)
In [5]:
def pf_xgboost_classifier_categorical(model, df):
    # XGBoost with enable_categorical=True expects 'category' dtypes, while the
    # data passed by the explainer may arrive with 'object' columns, so convert them back
    df.loc[:, df.dtypes == 'object'] =\
        df.select_dtypes(['object'])\
        .apply(lambda x: x.astype('category'))
    return model.predict_proba(df)[:, 1]

explainer = dx.Explainer(model, X_test, y_test, predict_function=pf_xgboost_classifier_categorical)
Preparation of a new explainer is initiated

  -> data              : 729 rows 7 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 729 values
  -> model_class       : xgboost.sklearn.XGBClassifier (default)
  -> label             : Not specified, model's class short name will be used. (default)
  -> predict function  : <function pf_xgboost_classifier_categorical at 0x000001EC83778280> will be used
  -> predict function  : Accepts only pandas.DataFrame, numpy.ndarray causes problems.
  -> predicted values  : min = 0.0258, mean = 0.333, max = 0.99
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.987, mean = -0.00781, max = 0.936
  -> model_info        : package xgboost

A new explainer has been created!

3. Evaluate!

In [6]:
explainer.model_performance()
Out[6]:
recall precision f1 accuracy auc
XGBClassifier 0.582278 0.730159 0.647887 0.794239 0.809509

Which variables are important?

In [7]:
explainer.model_parts().result
Out[7]:
variable dropout_loss label
0 parch 0.189779 XGBClassifier
1 _full_model_ 0.190491 XGBClassifier
2 embarked 0.194837 XGBClassifier
3 sibsp 0.196293 XGBClassifier
4 fare 0.197369 XGBClassifier
5 age 0.243815 XGBClassifier
6 class 0.274981 XGBClassifier
7 gender_male 0.378696 XGBClassifier
8 _baseline_ 0.495097 XGBClassifier

Let's analyze variable effects with both PDP and ALE.

Ceteris Paribus

See the API documentation for all available parameters and their possible values:

In [8]:
cp = explainer.predict_profile(new_observation=X.iloc[[400]])
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 89.74it/s]

We can visualize the what-if analysis for a single observation of interest...

In [9]:
cp.plot(variables=["age", "sibsp"])

...and for many observations at the same time.

In [10]:
cp_10 = explainer.predict_profile(new_observation=X.iloc[400:410])
cp_10.plot(variables=["age", "sibsp"])
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 82.35it/s]

An average of many single profiles (local explanations) estimates the partial dependence (global explanation).
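
To see this directly, we can average many ceteris paribus profiles by hand and compare the curve with the PDP computed below. A minimal sketch, assuming the first 100 observations, the age variable, and that the profiled grid values are stored in the age column of the ceteris paribus result:

# Sketch: the mean of many ceteris paribus profiles approximates the PDP.
cp_many = explainer.predict_profile(new_observation=X.iloc[:100], verbose=False)
age_profiles = cp_many.result[cp_many.result["_vname_"] == "age"]
# for each grid value of age, average the predictions over the 100 observations
manual_pdp_age = age_profiles.groupby("age")["_yhat_"].mean()
manual_pdp_age.head()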

Partial Dependence Plots

See the API documentation for all available parameters and their possible values:

In [11]:
pdp = explainer.model_profile() # defaults: type="pdp", N=300 
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 42.68it/s]
In [12]:
pdp.result
Out[12]:
_vname_ _label_ _x_ _yhat_ _ids_
0 age XGBClassifier 0.166667 0.787982 0
1 age XGBClassifier 0.905000 0.787982 0
2 age XGBClassifier 1.643333 0.787982 0
3 age XGBClassifier 2.381667 0.787982 0
4 age XGBClassifier 3.120000 0.787982 0
... ... ... ... ... ...
500 gender_male XGBClassifier 0.960000 0.770857 0
501 gender_male XGBClassifier 0.970000 0.770857 0
502 gender_male XGBClassifier 0.980000 0.770857 0
503 gender_male XGBClassifier 0.990000 0.770857 0
504 gender_male XGBClassifier 1.000000 0.227313 0

505 rows × 5 columns

In [13]:
pdp.plot(variables=["age", "fare"])

The explanations seem to indicate that age is an important variable.

In [14]:
pdp.plot(variables=["age", "fare"], geom="profiles", title="Partial Dependence Plot with individual profiles")

How about categorical variables?

In [15]:
pdp.plot(variables=["class"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 pdp.plot(variables=["class"])

File c:\users\hbani\appdata\local\programs\python\python39\lib\site-packages\dalex\model_explanations\_aggregated_profiles\object.py:239, in AggregatedProfiles.plot(self, objects, geom, variables, center, size, alpha, color, facet_ncol, facet_scales, title, y_title, horizontal_spacing, vertical_spacing, show)
    237     all_variables = _global_utils.intersect_unsorted(variables, all_variables)
    238     if len(all_variables) == 0:
--> 239         raise TypeError("variables do not overlap with " + ''.join(variables))
    241     _result_df = _result_df.loc[_result_df['_vname_'].isin(all_variables), :]
    243 #  calculate y axis range to allow for fixedrange True

TypeError: variables do not overlap with class

model_profile uses variable_type="numerical" by default. Let's fix this.

In [16]:
pdp_categorical = explainer.model_profile(variable_type="categorical")
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 44.30it/s]
In [17]:
pdp_categorical.plot(variables="class", title="PDP")

For example, these numerical plots might not look as intended.

In [18]:
pdp.plot(variables=["gender_male", "sibsp"])

We can fix this by computing the variable effects at the points of interest.

In [19]:
explainer.model_profile(
    variable_splits={'gender_male': X.gender_male.unique(), 'sibsp': X.sibsp.unique()}, 
    verbose=False)\
.plot(geom="bars", title="Partial Dependence Plot")

We can also combine numerical and categorical variables by grouping the profiles with respect to a categorical variable.

In [20]:
pdp_grouped = explainer.model_profile(groups="gender_male")
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 47.62it/s]
In [21]:
pdp_grouped.plot(variables=["age", "fare"], title="PDP")

Note that the plots depend on the grid of split points. By default, the grid is uniformly distributed; we can change it to quantiles.
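
To see how the two grids differ, here is a small sketch for age; it assumes 101 grid points, which we believe is the dalex default:

# Sketch: uniform splits are evenly spaced over the range of the variable,
# while quantile splits follow the density of the data.
probs = np.linspace(0, 1, 101)
uniform_splits = np.linspace(X_test.age.min(), X_test.age.max(), 101)
quantile_splits = np.quantile(X_test.age.dropna(), probs)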

In [22]:
# default: variable_splits_type="uniform"
pdp_splits = explainer.model_profile(groups="gender_male", variable_splits_type="quantile")
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:00<00:00, 68.63it/s]
In [23]:
pdp_splits.plot(variables=["age", "fare"])

Accumulated Local Effects

Now, let's change the algorithm for estimating variable effects by setting type="ale".

In [24]:
ale_num = explainer.model_profile(type="ale", variables=["age", "fare"], center=False)
Calculating ceteris paribus: 100%|███████████████████| 2/2 [00:00<00:00, 35.09it/s]
Calculating accumulated dependency: 100%|████████████| 2/2 [00:00<00:00, 12.66it/s]
In [25]:
ale_num.plot(title="Accumulated Local Effects")

4. Set proper labels to compare the two explanations!

In [26]:
ale_num = explainer.model_profile(type="ale", N=None, variables=["age", "fare"], label="ALE", verbose=False)
pdp_num = explainer.model_profile(type="pdp", N=None, variables=["age", "fare"], label="PDP", verbose=False)
In [27]:
ale_num.plot(pdp_num)

Note that ALE has an arbitrary vertical baseline (mean of the model's prediction by default).

It seems that there is not much difference in the shape of the two profiles in the case of this model.
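
To compare the shapes directly, one can remove the vertical offset from both results before plotting. A minimal sketch operating on the result data frames, whose column layout was shown earlier:

# Sketch: center both profiles at zero mean so that only the shapes are compared.
ale_centered = ale_num.result.copy()
pdp_centered = pdp_num.result.copy()
for res in (ale_centered, pdp_centered):
    res["_yhat_"] -= res.groupby("_vname_")["_yhat_"].transform("mean")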

5. Compare models

Let's create a larger XGBoost classifier.

In [28]:
model_large = xgboost.XGBClassifier(
    n_estimators=1000, 
    max_depth=30, 
    use_label_encoder=False, 
    eval_metric="logloss",
    
    enable_categorical=True,
    tree_method="hist"
)

model_large.fit(X_train, y_train)
Out[28]:
XGBClassifier(base_score=0.5, booster='gbtree', callbacks=None,
              colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
              early_stopping_rounds=None, enable_categorical=True,
              eval_metric='logloss', gamma=0, gpu_id=-1,
              grow_policy='depthwise', importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=30,
              max_leaves=0, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=1000, n_jobs=0,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, ...)

Remember to change the default label value.

In [29]:
explainer_large = dx.Explainer(
    model_large, 
    X_test, 
    y_test, 
    predict_function=pf_xgboost_classifier_categorical,
    label="Larger XGBoost",
    verbose=False
)

Compare the performance.

In [30]:
pd.concat([explainer.model_performance().result, explainer_large.model_performance().result]) 
Out[30]:
recall precision f1 accuracy auc
XGBClassifier 0.582278 0.730159 0.647887 0.794239 0.809509
Larger XGBoost 0.548523 0.646766 0.593607 0.755830 0.783592

The larger model seems overfitted. Can we observe that in the explanations?
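
Before looking at the profiles, a quick train/test comparison supports the overfitting suspicion. A sketch reusing the splits and models defined above:

from sklearn.metrics import accuracy_score

# Sketch: a large gap between train and test accuracy hints at overfitting.
for name, m in [("XGBClassifier", model), ("Larger XGBoost", model_large)]:
    train_acc = accuracy_score(y_train, m.predict(X_train))
    test_acc = accuracy_score(y_test, m.predict(X_test))
    print(f"{name}: train accuracy = {train_acc:.3f}, test accuracy = {test_acc:.3f}")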

In [31]:
pdp_large = explainer_large.model_profile()
Calculating ceteris paribus: 100%|███████████████████| 7/7 [00:02<00:00,  3.46it/s]
In [32]:
pdp_large.plot(pdp, variables=["age", "fare"], title="PDP")

Note that the simpler model has smoother variable effects. Would you trust the larger model based on these explanations?


For a theoretical introduction and more examples, see: https://ema.drwhy.ai