
Evaluation of Feature Attribution Methods in Interpretable Machine Learning

Resource type
Thesis
Date created
2022-08-05
Authors/Contributors
Author (aut): Rattenberry, Paige
Abstract
The next generation of artificial intelligence systems, including deep neural networks (DNNs), is emerging rapidly. However, because of their underlying “black-box” nature, DNNs currently cannot explain their predictions to humans in an interpretable, transparent, and trustworthy way, and they are also susceptible to adversarial attacks. Interpretable machine learning is therefore essential to enable end-users to understand, appropriately trust, and effectively manage these DNNs.
If explainability algorithms can be shown to be sensitive to malicious adversarial attacks while also generating robust, reproducible, and replicable feature attribution maps that correctly describe a network’s predictions, they will be critical to enabling end-users to trust that DNNs deployed in real-world, mission-critical applications and high-reliability systems will consistently make successful, safe, and unbiased predictions.
Explainability algorithms such as feature attribution methods are themselves sets of mathematical operations built on particular assumptions and therefore, unfortunately, add an extra layer of abstraction to the evaluation of their explanations of a network’s predictions.
Class activation maps are a popular category of feature attribution methods in interpretable machine learning. They compute the importance of each input feature to the model’s prediction and generate heatmaps to visualize this relationship. However, feature attribution methods have received minimal systematic evaluation because ground-truth attributions are lacking.
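As a concrete illustration of how one such method produces a heatmap, the following minimal Python sketch computes a Grad-CAM map with PyTorch hooks. It is not the thesis code; the choice of torchvision’s ResNet-18 and of its last convolutional block as the target layer are assumptions made only for this example.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Assumed model and target layer for illustration only.
model = resnet18(weights=None).eval()
target_layer = model.layer4

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    # Cache the target layer's feature maps on the forward pass.
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    # Cache the gradient of the class score w.r.t. those feature maps.
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(x, class_idx=None):
    """Return an (H, W) heatmap of evidence for one class on one input image."""
    scores = model(x)                                  # (1, num_classes)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()
    acts = activations["value"]                        # (1, C, h, w)
    grads = gradients["value"]                         # (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0]

heatmap = grad_cam(torch.randn(1, 3, 224, 224))        # dummy input for the sketch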
This thesis investigated how successfully class activation map methods (CAMs), specifically Grad-CAM, Grad-CAM++, Layer-CAM, Eigen-CAM, and Full-Grad, attribute predictions to the ground truth when explaining the decisions of near-perfect deep learning networks trained on medical images. This examination was accomplished through a database modification procedure that imposed ground truth: it ensured that an accurate and precise DNN could only make the correct classifications by relying solely on the introduced input feature perturbations, which a successful class activation map method should therefore highlight exclusively. The CAMs were analyzed both qualitatively and quantitatively through intersection-over-union (IoU) computation. Results demonstrated that Full-Grad appeared to be the most robust, precise, and accurate method at localizing discriminatory image features and detecting input perturbations, and that its performance could be further optimized by thresholding its output at key values to remove dispersion.
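A minimal sketch of the kind of quantitative comparison described above, assuming a heatmap normalized to [0, 1] and a binary ground-truth mask marking the introduced perturbation; the threshold sweep is a hypothetical illustration of thresholding to remove dispersion, not the thesis procedure.

import numpy as np

def iou_at_threshold(cam, gt_mask, threshold=0.5):
    """IoU between a thresholded heatmap and the ground-truth perturbation mask.

    cam       : float array in [0, 1], shape (H, W)
    gt_mask   : boolean array, shape (H, W), True where the perturbation was placed
    threshold : heatmap values >= threshold count as attributed pixels
    """
    pred = cam >= threshold
    intersection = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return intersection / union if union > 0 else 0.0

def best_threshold(cam, gt_mask, thresholds=np.linspace(0.1, 0.9, 9)):
    # Sweep candidate thresholds and keep the one giving the highest IoU,
    # i.e. the binarization that best suppresses dispersed attribution.
    scores = [iou_at_threshold(cam, gt_mask, t) for t in thresholds]
    return thresholds[int(np.argmax(scores))], max(scores)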
This evaluation will hopefully be an important step in developing and optimizing successful and robust interpretability algorithms, which is essential to gaining user trust and confidence in deep neural networks used in real-world, mission-critical applications and high-reliability systems, where incorrect DNN predictions could lead to catastrophic outcomes but correct ones have the potential to enable revolutionary breakthroughs.
Document
Copyright statement
Copyright is held by the author(s).
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor (ths): Sjoerdsma, Michael
Language
English
Download file
SFU_BASc_THESIS_Paige_Rattenberry1.pdf (8.65 MB)
