Explainable convolutional autoencoder for unsupervised patient phenomapping with Healthcare Administrative Data

RONDINONE, FRANCESCA
2023/2024

Abstract

Precision medicine aims to deliver medical interventions, both preventive and therapeutic, to the right individual at the right time, based on that individual's medical history, environment, and lifestyle. Reaching this goal requires recognizing that individuals occupy different positions along a continuum defined by diseases, disease risk factors, and other factors such as age. It is therefore important to map each individual to the right place at the right time, that is, to identify subgroups of individuals based on similarities or patterns in their data (phenomapping). Such subgroups should also be explainable: one should always be able to understand why a given individual was assigned to a given subgroup. Machine learning and, more recently, deep learning approaches have proved very effective at building useful subgroups, which are, however, not interpretable because of the black-box nature of these approaches. Here, we developed an unsupervised framework that provides a meaningful and explainable phenomapping of patients. The framework starts with a deep learning model, a convolutional autoencoder, that summarizes high-dimensional medical data into a latent representation (embeddings) that still contains all the relevant features present in the original data. Although the embeddings are a useful summary, their dimensionality is still too large to be handled computationally; we therefore implemented a further dimensionality-reduction step before the final density-based clustering used for subgroup identification. Finally, we applied the Local Interpretable Model-Agnostic Explanations (LIME) algorithm to interpret which medical events characterize the patients in each subgroup. Applied to data from 161,767 patients who accessed one of the local health services in the Piedmont region of Italy (ASL-CN2), which provided healthcare administrative data from 2019-01-01 to 2024-02-29, the framework identified 180 subgroups. Among those, we were able to distinguish subgroups that include pregnant people, individuals with cardiac and neurological problems, and individuals in reasonably good health who had experienced fractures in the past. These subgroups also correlated with different frequencies of inappropriate access to emergency services, highlighting further uses of our approach, such as supporting the planning of resources and health interventions across the territory.
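
The density-based clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the specific algorithm, so the example below assumes plain DBSCAN as a stand-in; the function name `dbscan`, the parameters `eps` and `min_samples`, and the toy two-dimensional "embeddings" are all hypothetical and are not taken from the thesis. The sketch only shows how dense regions of a reduced embedding space become subgroups while isolated points are left unassigned.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Illustrative DBSCAN only; the thesis may use a different
    density-based algorithm and different parameters.
    """
    n = len(points)

    def neighbours(i):
        # All points within distance eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels


# Hypothetical 2-D embeddings: two dense groups and one isolated patient.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(embeddings, eps=0.5, min_samples=3)
print(labels)  # prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the two tight groups become subgroups 0 and 1, and the isolated point is labelled as noise. The number of subgroups is not fixed in advance, which is one reason a density-based method suits an exploratory phenomapping setting.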
I authorize external consultation of the thesis
Files in this item:
Rondinone_thesis.pdf (not available), 3.31 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/166465