Lessons Learned on Injecting Post-Hoc Explanations in Training Machine Learning Models

LOMUSCIO, FRANCESCO
2020/2021

Abstract

Black-box models are widely used because they achieve high accuracy on a variety of Machine Learning tasks. However, such accuracy may be driven by complex decisions that are difficult to interpret. High complexity and low interpretability may decrease the trust users place in black-box models, and can also hinder the development and testing process itself. This thesis introduces three new methodologies for injecting post-hoc explanations into the training process of black-box models. The goal of these methods is to find a reasonable trade-off between accuracy and interpretability by using the explanations that SHAP computes on a trained black-box model to guide the training of a new black-box model with the same architecture. The first method, Iterative Dataset Weighting, weights each feature of each data point by its importance. The second, Similarity Sample Weighting, uses the similarity between the local explanations and the global explanation as sample weights during training. The third, Targeted Replacement Value, employs the explanations to find values that replace all but the top-K features in each data point. The results show that using explanations to support the training of black-box models helps reduce explanation complexity at the expense of some accuracy. This balance is strongly influenced by the methodology employed, the dataset the models are trained on, and the architecture of the model.
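The abstract only outlines the three strategies, so the sketch below is an illustrative reconstruction rather than the thesis's actual procedure. It assumes a scikit-learn GradientBoostingClassifier on the breast-cancer dataset, SHAP's TreeExplainer for the local explanations, mean absolute SHAP values as the global explanation, cosine similarity as the sample weight, the per-feature mean as the replacement value, K = 5, and a single weighting step for the iterative method; all of these choices are assumptions made for the example.

```python
# Illustrative sketch of the three injection strategies described above.
# Every concrete choice here (model, dataset, similarity metric, global
# explanation, replacement value, K, single iteration) is an assumption
# made for the example, not a detail taken from the thesis.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier().fit(X, y)

# Local explanations: one SHAP vector per data point (shape: samples x features).
local = shap.TreeExplainer(model).shap_values(X)
# Global explanation: mean absolute SHAP value per feature.
global_imp = np.abs(local).mean(axis=0)

# 1) Iterative Dataset Weighting (one step shown): scale each feature of each
#    data point by its normalized importance, then retrain the same architecture.
X_idw = X * np.abs(local) / (np.abs(local).max(axis=0) + 1e-12)
model_idw = GradientBoostingClassifier().fit(X_idw, y)

# 2) Similarity Sample Weighting: cosine similarity between each local
#    explanation and the global one, used as a per-sample training weight.
sim = (np.abs(local) @ global_imp) / (
    np.linalg.norm(np.abs(local), axis=1) * np.linalg.norm(global_imp) + 1e-12
)
model_ssw = GradientBoostingClassifier().fit(X, y, sample_weight=sim)

# 3) Targeted Replacement Value: keep each point's top-K most important
#    features and replace the rest with a neutral value (here, the feature mean).
K = 5
top_k = np.argsort(-np.abs(local), axis=1)[:, :K]
X_trv = np.tile(X.mean(axis=0), (X.shape[0], 1))
rows = np.arange(X.shape[0])[:, None]
X_trv[rows, top_k] = X[rows, top_k]
model_trv = GradientBoostingClassifier().fit(X_trv, y)
```

Each retrained model can then be compared against the original on both accuracy and explanation complexity, which is the trade-off the abstract describes.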
Files in this item:
937986_tesifrancescolomuscio.pdf (Adobe PDF, 5.76 MB, not available)
Type: other attached material

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/66477