Lessons Learned on Injecting Post-Hoc Explanations in Training Machine Learning Models

LOMUSCIO, FRANCESCO
2020/2021

Abstract

Black-box models are widely used because they achieve high accuracy on a variety of Machine Learning tasks. However, such accuracy may be driven by complex decisions that are difficult to interpret. High complexity and low interpretability may decrease the trust users place in black-box models, and can also hinder the development and testing process itself. This thesis introduces three new methodologies for injecting post-hoc explanations into the training process of black-box models. The goal of these methods is to find a reasonable trade-off between accuracy and interpretability by using the explanations that SHAP computes on a trained black-box model to guide the training of a new black-box model with the same architecture. The first method, Iterative Dataset Weighting, weights each feature of each data point by its importance. The second, Similarity Sample Weighting, uses the similarity between the local explanations and the global explanation as sample weights during training. The third, Targeted Replacement Value, employs the explanations to find values that replace all but the top-K features in each data point. The results show that using explanations to support the training of black-box models helps reduce explanation complexity at the expense of some accuracy. This balance is strongly influenced by the methodology employed, the dataset the models are trained on, and the architecture of the model.
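The abstract only outlines the three strategies, so the sketch below is an illustrative reconstruction rather than the thesis's actual procedure. It assumes a scikit-learn GradientBoostingClassifier on the breast-cancer dataset, SHAP's TreeExplainer for the local explanations, mean absolute SHAP values as the global explanation, cosine similarity as the sample weight, the per-feature mean as the replacement value, K = 5, and a single weighting step for the iterative method; all of these choices are assumptions made for the example.

```python
# Illustrative sketch of the three injection strategies described above.
# Every concrete choice here (model, dataset, similarity metric, global
# explanation, replacement value, K, single iteration) is an assumption
# made for the example, not a detail taken from the thesis.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier().fit(X, y)

# Local explanations: one SHAP vector per data point (shape: samples x features).
local = shap.TreeExplainer(model).shap_values(X)
# Global explanation: mean absolute SHAP value per feature.
global_imp = np.abs(local).mean(axis=0)

# 1) Iterative Dataset Weighting (one step shown): scale each feature of each
#    data point by its normalized importance, then retrain the same architecture.
X_idw = X * np.abs(local) / (np.abs(local).max(axis=0) + 1e-12)
model_idw = GradientBoostingClassifier().fit(X_idw, y)

# 2) Similarity Sample Weighting: cosine similarity between each local
#    explanation and the global one, used as a per-sample training weight.
sim = (np.abs(local) @ global_imp) / (
    np.linalg.norm(np.abs(local), axis=1) * np.linalg.norm(global_imp) + 1e-12
)
model_ssw = GradientBoostingClassifier().fit(X, y, sample_weight=sim)

# 3) Targeted Replacement Value: keep each point's top-K most important
#    features and replace the rest with a neutral value (here, the feature mean).
K = 5
top_k = np.argsort(-np.abs(local), axis=1)[:, :K]
X_trv = np.tile(X.mean(axis=0), (X.shape[0], 1))
rows = np.arange(X.shape[0])[:, None]
X_trv[rows, top_k] = X[rows, top_k]
model_trv = GradientBoostingClassifier().fit(X_trv, y)
```

Each retrained model can then be compared against the original on both accuracy and explanation complexity, which is the trade-off the abstract describes.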
Files in this item:
937986_tesifrancescolomuscio.pdf (Adobe PDF, 5.76 MB, not available)
Type: other attached material

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/66477