Machine learning methods for lung cancer detection using serum proteins

SALVETTO, CAROLA
2020/2021

Abstract

In this Master's thesis, a statistical analysis is performed on a sample of 84 subjects; each individual is known either to be a high-risk subject or to have lung cancer, so the response is a binary class. The analysed data set consists of the measured intensities of 933 serum proteins for each subject. The purpose is to select a statistical model for the classification of new individuals, based on one algorithm for feature selection over the very large feature space and one algorithm for prediction using the selected features. Results are evaluated mainly on the accuracy of the classification, estimated through stratified k-fold cross-validation. Before the model is developed, a preliminary unsupervised cluster analysis and a preliminary supervised one are performed. Finally, ideas for model optimization are proposed (e.g. reducing the number of features on which the prediction is based, to speed up the clinical collection process). The results obtained in this study are intended for use in the healthcare domain; hence the explainability of the model, the time it takes to collect the measurements on which the prediction would be based, and the expected sensitivity of the prediction are taken into account as well.
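The evaluation scheme described in the abstract (feature selection followed by prediction, scored by stratified k-fold cross-validation) can be illustrated with a minimal sketch. The abstract does not name the algorithms used in the thesis, so everything below is an assumption made for illustration: the feature ranking (absolute difference of class means), the nearest-centroid classifier, the balanced synthetic data, and all function names are hypothetical stand-ins, not the thesis's method. The one point the sketch is meant to show is that feature selection is repeated inside each training fold, so that the held-out subjects never influence which features are chosen.

```python
import random
from statistics import mean

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, preserving class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds

def select_features(X, y, m):
    """Stand-in selector: keep the m features with the largest
    absolute difference between the two class means."""
    def score(j):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:m]

def fit_centroids(X, y, feats):
    """Stand-in classifier: one centroid per class on the selected features."""
    cents = {}
    for cls in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == cls]
        cents[cls] = [mean(r[j] for r in rows) for j in feats]
    return cents

def predict(cents, row, feats):
    """Assign the class whose centroid is nearest (squared Euclidean)."""
    def dist(cls):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[cls]))
    return min(cents, key=dist)

def cross_validate(X, y, k=5, m=10, seed=0):
    """Return (accuracy, sensitivity) pooled over the k held-out folds."""
    tp = tn = fp = fn = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = select_features(X_tr, y_tr, m)  # training fold only
        cents = fit_centroids(X_tr, y_tr, feats)
        for i in test_idx:
            pred = predict(cents, X[i], feats)
            if y[i] == 1:
                tp, fn = tp + (pred == 1), fn + (pred == 0)
            else:
                tn, fp = tn + (pred == 0), fp + (pred == 1)
    accuracy = (tp + tn) / len(y)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity

def make_synthetic(n=84, p=933, informative=5, shift=2.0, seed=1):
    """Synthetic stand-in data (NOT the thesis data): the first
    `informative` features are shifted for class 1; classes are balanced."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) + (shift if lab == 1 and j < informative else 0.0)
          for j in range(p)] for lab in y]
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    acc, sens = cross_validate(X, y, k=5, m=10)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")
```

Reporting sensitivity alongside accuracy mirrors the abstract's remark that the expected sensitivity of the prediction matters in a healthcare setting; lowering `m` is a simple way to explore the trade-off between the number of measured proteins and predictive performance.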
Files in this item:
833704_tesi_salvetto.pdf (not available)
Type: Other attached material
Size: 3.94 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/70182