Explainable convolutional autoencoder for unsupervised patient phenomapping with Healthcare Administrative Data

Precision medicine aims to deliver medical interventions, both preventive and therapeutic, to the right individual at the right time, based on the individual's medical history, environment, and lifestyle. To reach this goal, one should understand that individuals occupy different places along a continuum defined by diseases, disease risk factors, and other factors such as age. It is therefore important to map each individual to the right space at the right time, that is, to identify subgroups of individuals based on similarity or patterns in their data (phenomapping). Such subgroups should also be explainable: one should always be able to understand why a given individual was assigned to a given subgroup. Machine learning and, more recently, deep learning approaches have proved very effective in building useful subgroups, which are, however, not interpretable due to the black-box nature of such approaches. Here, we developed an unsupervised framework to provide a meaningful and explainable phenomapping of patients. Our framework starts with a deep learning approach, a convolutional autoencoder, that summarizes high-dimensional medical data into a latent representation (embeddings) that still contains all the relevant features present in the original data. Although the embeddings are a useful summary, their dimensionality is still too large to be handled computationally; we therefore implemented a further dimensionality-reduction step prior to the final density-based clustering used for subgroup identification. Finally, we applied the Local Interpretable Model-Agnostic Explanations (LIME) algorithm to interpret which medical events characterize the patients included in each subgroup. Using data from 161,767 patients accessing one of the local health services in the Piedmont region of Italy (ASL-CN2), which provided healthcare administrative data from 2019-01-01 to 2024-02-29, the implemented framework identified 180 subgroups.
Among those, we were able to distinguish subgroups that include pregnant people, individuals with cardiac and neurological problems, and individuals in reasonably good health who experienced fractures in the past. These subgroups also correlated with different frequencies of inappropriate access to emergency services, highlighting further uses of our approach, such as supporting the planning of resources and health interventions across the territory.
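
The middle of the pipeline described in the abstract (reducing the embeddings, then clustering them by density) can be illustrated with a minimal sketch. The abstract does not name the specific reduction or clustering algorithms, so PCA and DBSCAN are used here purely as assumed stand-ins; the embeddings are synthetic random data rather than autoencoder output, and the function names and parameter values (`pca_reduce`, `dbscan`, `eps`, `min_samples`) are hypothetical. The autoencoder and LIME steps are omitted.

```python
import numpy as np


def pca_reduce(embeddings, n_components):
    """Project embeddings onto their top principal components (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one label per point, with -1 meaning noise."""
    n = len(points)
    # Full pairwise distance matrix; fine for a toy example, too costly at scale.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from an unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


# Synthetic stand-in for autoencoder embeddings: two well-separated groups
# of 30 "patients" each, in a 16-dimensional latent space (assumed sizes).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(30, 16))
group_b = rng.normal(loc=3.0, scale=0.1, size=(30, 16))
embeddings = np.vstack([group_a, group_b])

reduced = pca_reduce(embeddings, n_components=2)
labels = dbscan(reduced, eps=1.0, min_samples=5)
print(sorted(set(labels.tolist())))
```

In this toy setting each synthetic group ends up in its own cluster. With real embeddings, the choice of reduction method, `eps`, and `min_samples` would strongly affect how many subgroups are found, and points in sparse regions would be labeled as noise.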

RONDINONE, FRANCESCA

2023/2024

Brief record

	Faculty/Department
	
				MATEMATICA "GIUSEPPE PEANO"
			
	Degree programme
	
				STOCHASTICS AND DATA SCIENCE
			
	Supervisor
	
				BECCUTI, MARCO
VISCONTI, ALESSIA
			
	Co-examiner
	
				BERCHIALLA, PAOLA
CONTALDO, SANDRO GEPIRO
			
	Thesis access
	
				I authorize external consultation of the thesis
			
	Appears in types:
	
				Master's degree programme

Files in this item:

File	Size	Format
Rondinone_thesis.pdf (not available)	3.31 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/166465