Explaining Black-Box Models through Counterfactuals

07/29/2022, 5:00 PM — 5:30 PM UTC

Red

Abstract:

We present CounterfactualExplanations.jl: a package for explaining black-box models through counterfactuals. Counterfactual explanations are based on the simple idea of strategically perturbing model inputs to change model predictions. Our package is novel, easy to use and extensible. It can be used to explain custom predictive models, including those developed and trained in other programming languages.

Description:

The Need for Explainability ⬛

Machine learning models like deep neural networks have become so complex, opaque and underspecified in the data that they are generally considered black boxes. Nonetheless, they often form the basis for data-driven decision-making systems. This creates the following problem: human operators in charge of such systems have to rely on them blindly, while the individuals subject to them generally have no way of challenging an undesirable outcome:

“You cannot appeal to (algorithms). They do not listen. Nor do they bend.” — Cathy O'Neil in Weapons of Math Destruction, 2016

Enter: Counterfactual Explanations 🔮

Counterfactual Explanations can help human stakeholders make sense of the systems they develop, use or endure: they explain how inputs into a system need to change for it to produce different decisions. Explainability benefits internal as well as external quality assurance. Explanations that involve realistic and actionable changes can be used for the purpose of algorithmic recourse (AR): they offer human stakeholders a way to not only understand the system's behaviour, but also strategically react to it. Counterfactual Explanations have certain advantages over related tools for explainable artificial intelligence (XAI) like surrogate explainers (LIME and SHAP). These include:

Full fidelity to the black-box model, since no proxy is involved.
Connection to Probabilistic Machine Learning and Causal Inference.
No need for (reasonably) interpretable features.
Less susceptible to adversarial attacks than LIME and SHAP.
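
To make the idea of perturbing inputs until the prediction changes more concrete, here is a minimal Python sketch for a toy logistic-regression classifier. It is not the interface of CounterfactualExplanations.jl: the function names (`predict_proba`, `counterfactual`), the weights, the penalty strength `lam` and the step size `lr` are all assumptions made for this illustration. The objective used here (cross-entropy towards the target class plus a squared-distance penalty that keeps the counterfactual close to the original input) is one common formulation, shown only as a sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def counterfactual(x, w, b, target=1.0, lam=0.1, lr=0.5,
                   max_iter=1000, threshold=0.5):
    """Gradient-descent search for an input close to x that is
    classified as `target` (hypothetical helper, not a package API)."""
    x_cf = list(x)
    for _ in range(max_iter):
        p = predict_proba(x_cf, w, b)
        reached = p > threshold if target == 1.0 else p < threshold
        if reached:
            break  # the prediction has flipped to the target class
        # Gradient of the cross-entropy loss w.r.t. the input: (p - target) * w
        # Gradient of lam * ||x_cf - x||^2:                     2 * lam * (x_cf - x)
        grad = [(p - target) * wi + 2.0 * lam * (xc - xo)
                for wi, xc, xo in zip(w, x_cf, x)]
        x_cf = [xc - lr * g for xc, g in zip(x_cf, grad)]
    return x_cf


# Assumed toy model and a factual input that is classified as class 0.
w, b = [2.0, -1.0], -1.0
x = [0.0, 1.0]
x_cf = counterfactual(x, w, b, target=1.0)

print("factual:       ", x, predict_proba(x, w, b))
print("counterfactual:", x_cf, predict_proba(x_cf, w, b))
```

Because the search queries the classifier itself rather than a surrogate, the resulting counterfactual is faithful to the model by construction. The sketch relies on the model being differentiable with respect to its inputs.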

Problem: Limited Availability in Julia Ecosystem 😔

Software development in the space of XAI has largely focused on various global methods and surrogate explainers with implementations available for both Python and R. In the Julia space we have only been able to identify one package that falls into the broader scope of XAI, namely ShapML.jl. Support for Counterfactual Explanations has so far not been implemented in Julia.

Solution: `CounterfactualExplanations.jl` 🎉

Through this project we aim to close that gap and thereby contribute to broader community efforts towards explainable AI. Highlights of our new package include:

Simple and intuitive interface to generate counterfactual explanations for differentiable classification models trained in Julia, Python and R.
Detailed documentation involving illustrative example datasets and various counterfactual generators for binary and multi-class prediction tasks.
Interoperability with other popular programming languages as demonstrated through examples involving deep learning models trained in Python and R.
Seamless extensibility through custom models and counterfactual generators.

Ambitions for the Package 🎯

Our goal is to provide a go-to place for counterfactual explanations in Julia. To this end, the following is a non-exhaustive list of exciting future developments we envision: