Explainable Natural Language Inference - Exploring the Faithfulness of Model Explanations

We are currently at a point where researchers have started to dig deep into why a model gives a certain output, rather than only pushing a model's accuracy a few strides further. In the NLP community, interpretability is a very hot topic today, but people often fail to differentiate between its distinct aspects, such as readability, plausibility, and faithfulness. In this post, let's explore the faithfulness of a model's explanations in particular.
We should note that faithfulness and plausibility are two distinct concepts, generally understood in the community as follows:

Faithful interpretation is generally defined as one that accurately represents the reasoning process behind the model’s prediction.

Plausibility refers to how convincing the interpretation is to humans.

Let’s take a detour to understand the distinction between faithfulness and plausibility. One use case with a prominent evaluation literature is Intelligent User Interfaces (IUI), via Human-Computer Interaction (HCI), where automatic models assist human decision-makers. The goal of the explanation here is to increase the degree of trust between the user and the system, giving the user more nuance about whether the system’s decision is likely correct or not. Note that increased performance in this setting is not indicative of faithfulness; rather, it is indicative of a correlation between the plausibility of the explanations and the model’s performance.

Let’s take another example: suppose that when the model’s output is correct, a particular word or phrase highlighted by the explanation is present in the input sentence, and when the model’s output is incorrect, those highlighted words or phrases are absent from the input. The explanation is then more likely to appear plausible when the model is correct, while at the same time not necessarily reflecting the model’s true decision process (faithfulness). I hope the difference between plausibility and faithfulness is somewhat clearer now.
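To make this concrete, here is a toy sketch (my own construction, not from any paper) of a plausible-but-unfaithful explanation. The model below secretly classifies by sentence length, while the "explainer" highlights sentiment words that humans find convincing; removing the highlighted word does not change the prediction, revealing the mismatch:

```python
# Toy illustration: a plausible explanation that is unfaithful.
SENTIMENT_LEXICON = {"great", "awful", "good", "bad", "terrible", "excellent"}

def model_predict(tokens):
    # The model's *actual* reasoning: long sentences are "positive".
    return "positive" if len(tokens) > 5 else "negative"

def explainer(tokens):
    # The explanation humans see: highlighted sentiment words. Convincing
    # (plausible), but unrelated to the model's real decision process.
    return [t for t in tokens if t in SENTIMENT_LEXICON]

tokens = "the acting was great and the plot kept me hooked".split()
print(model_predict(tokens))   # 'positive'
print(explainer(tokens))       # ['great'] -- looks convincing

# Unfaithfulness check: remove the highlighted word. If the explanation
# reflected the true reasoning, the prediction should change; it does not.
ablated = [t for t in tokens if t != "great"]
print(model_predict(ablated))  # still 'positive'
```

The explanation passes the human plausibility test while failing even a simple ablation test of faithfulness.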

We discussed earlier that “Faithful interpretation is generally defined as one that accurately represents the reasoning process behind the model’s prediction.” So, what exactly is this reasoning process, and how can it be compared across two separate interpretations? There are many articles and research papers that try to pin down this reasoning process, but they all follow some hidden assumptions. Jacovi and Goldberg uncovered three assumptions that underlie common definitions of faithfulness, described as follows:

Assumption 1: (The model assumption) Two models will make the same predictions if and only if they use the same reasoning process.

Corollary 1.1: An interpretation system is unfaithful if it results in different interpretations of models that make the same decisions.

Corollary 1.2: An interpretation is unfaithful if it results in different decisions than the model it interprets.

Assumption 2: (The prediction assumption) On similar inputs, the model makes similar decisions if and only if its reasoning is similar.

Corollary 2: An interpretation system is unfaithful if it provides different interpretations for similar inputs and outputs.

Assumption 3: (The linearity assumption) Certain parts of the input are more important to the model’s reasoning than others. Moreover, the contributions of different parts of the input are independent of each other.

Corollary 3: Under certain circumstances, heat map interpretation can be faithful.
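One circumstance where the linearity assumption provably holds is a purely linear model: each feature's contribution is independent of the others and the contributions sum exactly to the model's score, so the resulting heat map is faithful by construction. A minimal sketch with toy numbers of my own:

```python
# For a linear model, the per-feature contribution w_i * x_i is a
# faithful heat map: contributions are independent and sum to the score.
weights = [0.8, -0.5, 0.1]   # weights of a toy linear classifier
x = [1.0, 2.0, 3.0]          # one input example

contributions = [w * xi for w, xi in zip(weights, x)]  # the "heat map"
score = sum(contributions)                             # the model's output

print(contributions)
print(round(score, 2))  # 0.1 -- contributions account for the full score
```

For non-linear models (i.e., almost every modern NLP model), this decomposition no longer holds exactly, which is why the corollary is hedged with "under certain circumstances."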

The main issue nowadays is that people evaluate faithfulness in a binary manner, i.e., whether an interpretation is faithful or not. The recent trend is for researchers to come up with counter-examples showing that an interpretation method is not globally faithful.

Jacovi and Goldberg advocate that instead of evaluating faithfulness in a binary manner, we should start evaluating when a method is sufficiently faithful to be useful in practice.
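One way to read this call for graded evaluation (the metric below is my own toy sketch, not one proposed in the paper) is to measure *how often* an explainer picks out decision-relevant parts of the input, e.g., the fraction of examples where ablating the top-attributed feature flips the model's prediction:

```python
# A graded (non-binary) faithfulness score: the fraction of examples where
# zeroing the explainer's top-attributed feature flips the model's decision.
WEIGHTS = [2.0, -1.0, 0.5]  # toy linear classifier

def model_predict(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) > 0

def top_feature(x):
    # explainer under test: attribute to the largest absolute contribution
    contribs = [abs(w * xi) for w, xi in zip(WEIGHTS, x)]
    return contribs.index(max(contribs))

def flip_rate(dataset):
    flips = 0
    for x in dataset:
        ablated = list(x)
        ablated[top_feature(x)] = 0.0
        flips += model_predict(x) != model_predict(ablated)
    return flips / len(dataset)

data = [[1.0, 0.5, 0.0], [0.2, 1.0, 0.1], [1.0, 0.1, 1.0]]
print(flip_rate(data))  # 2 of 3 ablations flip the prediction
```

A score like this lets us say an explainer is faithful *to a degree* on a given model and dataset, rather than declaring it globally faithful or unfaithful.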

I will edit/modify this post once I am more acquainted with the subject. Thanks for reading the article. I hope you liked it.

References:

  1. Alon Jacovi and Yoav Goldberg. “Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?” https://arxiv.org/pdf/2004.03685.pdf

NLP Enthusiast currently working towards Explainable NLP