Improve faithfulness, robustness and localisation of DNNs’ explanations — with NoiseGrad and…

As an XAI researcher, I am well aware that explanations for DNNs tend to be very fragile. This has been shown many times already [1, 2, 3, 4, 5, 6].

Together with my team at the Understandable Machine Intelligence Lab, I therefore started to brainstorm ideas that could enhance existing explanation methods. One simple method, SmoothGrad, has already established itself among both researchers and practitioners. In a nutshell, it creates multiple noisy versions of an input (typically with Gaussian noise) and averages the explanations of these noisy inputs into one single explanation. Although it is very simple, it has been reported to make gradient-based attribution methods less visually diffuse and more robust against adversarial attacks. Given this, we came to ask:

In the same fashion that SmoothGrad enhances explanations by exploring the neighbourhood of a given input, can we further improve our explanations by exploring the neighbourhood of a given model?
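For reference, the SmoothGrad procedure described above can be sketched in a few lines. This is a minimal illustration and not the original SmoothGrad code: the toy model f(x) = tanh(w · x), the function names `saliency` and `smoothgrad`, and all parameter values are assumptions made for the example.

```python
import math
import random


def saliency(x, w):
    """Gradient of the toy model f(x) = tanh(w . x) with respect to x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    slope = 1.0 - math.tanh(z) ** 2
    return [slope * wi for wi in w]


def smoothgrad(x, w, explain=saliency, m=50, sigma=0.1, seed=0):
    """Average the explanations of m noisy copies of the input x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(m):
        # Additive Gaussian noise on the input; the model stays fixed.
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        for k, a in enumerate(explain(noisy_x, w)):
            total[k] += a
    return [t / m for t in total]


# Hypothetical weights and input, chosen only for illustration.
w = [2.0, -1.0, 0.5]
x = [0.3, 0.1, -0.2]
baseline = saliency(x, w)   # one input, one model, one explanation
smoothed = smoothgrad(x, w)  # averaged over noisy inputs
```

With `sigma=0.0` the sketch reduces to the baseline explanation, which is a quick sanity check for any implementation of this kind.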

Similar to how ensemble learning works to improve the generalisation performance of machine learning models (by averaging predictions from multiple classifiers), we wanted to create a similar “wisdom of the crowd” scenario where our explanations are based on the collective (expert opinion) of several models, rather than relying on a single model (one expert opinion). This further motivated us to explore another way of using stochasticity: while forming an explanation — instead of adding noise to the input data, we add noise to the weights of a neural network.

Illustration of our proposed methods: the functionality of the individual methods, i.e., Baseline, SmoothGrad, NoiseGrad and NoiseGrad++, is visualised schematically from left to right, each partitioned into input space, model space and the resulting explanation, from bottom to top. The baseline explanations are computed in a deterministic fashion: one input (dog), one model (black square), one explanation. SmoothGrad enhances the explanation by explaining multiple noisy versions of the input. In contrast, our proposed method, NoiseGrad, works by explaining multiple versions of the model. NoiseGrad++ combines SmoothGrad and NoiseGrad by incorporating stochasticity in both the input space and the model space.

Methods. In light of this, we came up with two methods: NoiseGrad and NoiseGrad++. These are stochastic, method-agnostic explanation-enhancing methods that add multiplicative Gaussian noise to the weights instead of (only) to the input data.
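Written out with the symbols used in this post, the two estimators plausibly take the following form. This is a reconstruction from the symbol definitions in the text, not a copy of the authors' equation; in particular, the additive form x + ξ_j for the input noise and the estimator names are assumptions.

```latex
\hat{E}_{\mathrm{NG}}(x) = \frac{1}{N} \sum_{i=1}^{N} E\bigl(x, f(\cdot, W_i)\bigr),
\qquad
\hat{E}_{\mathrm{NG++}}(x) = \frac{1}{N M} \sum_{i=1}^{N} \sum_{j=1}^{M} E\bigl(x + \xi_j, f(\cdot, W_i)\bigr)
```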

where E is an explanation function that attributes relevance to the features of the input x with respect to the neural network f(·, W_i), ξ_j represents noise added to the input x, and N and M are the numbers of noisy models and noisy input samples, respectively. Here, W_i represents one specific network sample drawn from the Bayesian posterior.

Approximating the posterior distribution of a neural network is computationally expensive (most methods require full retraining of the network). In the spirit of MC dropout, we therefore mainly focus on approximating the posterior with multiplicative Gaussian noise. Theoretically, NoiseGrad can be seen as performing a quite crude Laplace approximation. Yet, despite the simplicity of adding the same amount of multiplicative noise to all weights in the network, we observed in our experiments that it still yields an approximation accurate enough to give insight into the uncertainty of the model and to enhance explanations.
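To make the weight-noise idea concrete, here is a minimal sketch of NoiseGrad and NoiseGrad++ on a toy model. It is not the implementation from our repository: the toy model f(x) = tanh(w · x), the names `perturb_weights` and `noisegrad_pp`, the default values, and the choice of noise factors drawn from a Gaussian with mean 1 are all assumptions made for illustration.

```python
import math
import random


def saliency(x, w):
    """Gradient of the toy model f(x) = tanh(w . x) with respect to x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    slope = 1.0 - math.tanh(z) ** 2
    return [slope * wi for wi in w]


def perturb_weights(w, sigma, rng):
    """Multiplicative Gaussian noise: each weight is scaled by a factor
    drawn from N(1, sigma^2), using the same sigma for all weights."""
    return [wk * rng.gauss(1.0, sigma) for wk in w]


def noisegrad_pp(x, w, explain=saliency, n=20, m=1,
                 sigma_w=0.2, sigma_x=0.0, seed=0):
    """Average explanations over n noisy models and m noisy inputs.

    m=1 with sigma_x=0 corresponds to plain NoiseGrad (weight noise only);
    m>1 with sigma_x>0 adds SmoothGrad-style input noise on top.
    """
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n):
        noisy_w = perturb_weights(w, sigma_w, rng)
        for _ in range(m):
            noisy_x = [xi + rng.gauss(0.0, sigma_x) for xi in x]
            for k, a in enumerate(explain(noisy_x, noisy_w)):
                total[k] += a
    return [t / (n * m) for t in total]


# Hypothetical weights and input, chosen only for illustration.
w = [2.0, -1.0, 0.5]
x = [0.3, 0.1, -0.2]
baseline = saliency(x, w)
ng = noisegrad_pp(x, w)                        # NoiseGrad
ngpp = noisegrad_pp(x, w, m=10, sigma_x=0.1)   # NoiseGrad++
```

Because `explain` is passed in as a function, the same averaging loop works for any base explanation method, which reflects the method-agnostic design described above.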

Results. Early on, we could observe that adding noise to a neural network’s parameters had a positive effect on explanations:

This visualisation shows an Integrated Gradients explanation of a llama. Clearly, our proposed methods (NoiseGrad and its extension NoiseGrad++) offer qualitative improvements. This example can be reproduced with a notebook in our GitHub repo.

Although the evaluation of XAI is still a largely unsolved problem (!), in our experiments we could observe that explanations become more faithful, robust and localised on the object of interest when our methods are used.

In comparison to the Baseline method (vanilla Saliency explanations where no noise is added to either the input or the model weights), a significant boost in attribution quality is achieved when explaining with SmoothGrad, NoiseGrad and NoiseGrad++. The combination of SmoothGrad and NoiseGrad, i.e., NoiseGrad++, is significantly better than either method alone. For each of the examined quality criteria, the values lie in [0, 1]. Higher values are better.

Hyperparameter tuning. While applying our proposed explanation-enhancing methods, a question may pop up: exactly how much noise should be added to the model weights? Do we need to adjust the noise level depending on the given model architecture or the dataset?

In our paper, to help the user set an advantageous level of noise (more noise is not always better!), we put forward a simple hypothesis: since we need signals from the models whose decision boundary is close to the test samples, we might choose the noise level σ such that we observe a certain accuracy drop. As such, we developed a simple heuristic for tuning the noise level of NG and NG++ that works for various DNN architectures: add noise to the model weights until you observe an accuracy drop of 5% compared to the original test accuracy.

A visual interpretation of the proposed “5 percent accuracy drop” noise heuristic. Each line represents a different model architecture used in the CMNIST experiment and every dot reflects the average result over 200 randomly chosen test samples. We can observe that, in general, increasing the noise level until the accuracy drops by 0.05 (black horizontal line) boosts attribution quality (AUC localisation value).

From our experimental results, we found that setting the relative accuracy drop AD(σ) = 1 − (ACC(σ) − ACC(∞)) / (ACC(0) − ACC(∞)) to around 5% works for the tested DNN architectures, including LeNet, VGGs and ResNets. Here, ACC(σ) denotes the classification accuracy at the noise level σ. Note that ACC(0) and ACC(∞) correspond to the original accuracy and the chance level, respectively.
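The relative accuracy drop above is easy to compute once accuracies have been measured at a few noise levels. The sketch below shows one way the heuristic could be applied; the function names and the accuracy values are hypothetical and are not results from our paper.

```python
def relative_accuracy_drop(acc_sigma, acc_clean, acc_chance):
    """AD(sigma) = 1 - (ACC(sigma) - ACC(inf)) / (ACC(0) - ACC(inf))."""
    return 1.0 - (acc_sigma - acc_chance) / (acc_clean - acc_chance)


def pick_sigma(acc_by_sigma, acc_clean, acc_chance, target=0.05):
    """Return the noise level whose relative accuracy drop is closest
    to the target (5% by default)."""
    return min(
        acc_by_sigma,
        key=lambda s: abs(
            relative_accuracy_drop(acc_by_sigma[s], acc_clean, acc_chance)
            - target
        ),
    )


# Made-up measurements for a 10-class problem (chance level 0.10).
acc_clean = 0.98
acc_chance = 0.10
acc_by_sigma = {0.05: 0.979, 0.1: 0.965, 0.2: 0.936, 0.4: 0.80}

best_sigma = pick_sigma(acc_by_sigma, acc_clean, acc_chance)
```

In this made-up example, σ = 0.2 gives a relative drop of 5% (accuracy falls from 0.98 to 0.936), so it is the level the heuristic selects.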

Final thoughts. It is important to remember that any explanation will only be as good (or robust) as the model it is trying to explain. An XAI method can never be disentangled from its model, nor from the model's deficiencies. Before you decide to put trust in an XAI method for decision-making, proper model testing is crucial.

That being said, it is fascinating how such simple techniques, i.e., NoiseGrad and NoiseGrad++, can actually bring about more faithful, robust and localised explanations. Of course, it remains to further investigate the performance of our proposed methods on tasks other than image classification, such as time-series prediction or NLP. In the future, we are also interested in quantitatively validating to what extent localisation is useful as an attribution quality criterion on natural datasets (we used semi-natural datasets in our paper to control the placement of attributional evidence for “ground truth”). With these final words, one last illustration:

Illustration of the qualitative performance of Baseline, SG, NG and NG++ for two base explanation methods, Integrated Gradients (IG) and GradientSHAP (GradSHAP), for a randomly chosen image from the PASCAL VOC 2012 dataset with overlaid segmentation.

Code and examples

Read our arXiv pre-print


@anna_hedstroem @TUBerlin_UMI

