A Hopfield-type algorithm for cell-type classification
HILFIKER, MATHIAS
2021/2022
Abstract
The advent of single-cell sequencing technologies has provided biologists with fundamental information for studying tissue heterogeneity, embryonic development, and precision medicine. These data have also enabled the in-depth study of cell types. Recognizing the type of a cell from its gene expression profile alone is important, for example, for the rapid and automatic identification of cancer cells. Many software tools exist for classifying cell types, and they differ in their statistical assumptions and architectures. The purpose of this thesis is to approach the classification problem with a type of architecture that, to the best of our knowledge, has not yet been applied to single-cell data, namely one based on the Hopfield model. The Hopfield model, proposed in 1982 to model the memory function of the brain, is a milestone in artificial intelligence. It considers a population of interacting neurons and, at each update, assigns each neuron a binary value. By defining an energy function and an update rule, it is possible to classify corrupted or distorted inputs in terms of memories previously stored in the model. These memories correspond to energy minima, and at each update the classifier moves along the energy landscape toward the minimum corresponding to the memory most similar to the input.

In our work, we apply this model to scRNA-seq data. Our work follows that of Hope4Genes (2018), which uses transcriptomic data to classify breast cancer subtypes. Hope4Genes was developed for bulk data, and the aim of this thesis is to apply it to single-cell data. Single-cell data pose additional difficulties compared with bulk data, owing to greater variability, greater overlap between clusters, and the presence of dropouts. We considered four annotated datasets from the Mouse Cell Atlas (liver, lung, stomach, and kidney).
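The retrieval mechanism described above can be illustrated with a minimal sketch of the classic Hopfield formulation. It assumes ±1 neurons, Hebbian couplings J_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ with J_ii = 0, the energy E(s) = -½ Σ_ij J_ij s_i s_j, and asynchronous sign updates. These are textbook choices, not necessarily the exact variant used by Hope4Genes or in this thesis, and all function names are hypothetical.

```python
import random


def hebbian_weights(patterns):
    """Classic Hebbian couplings J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu, with J_ii = 0."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, s):
    """Hopfield energy E(s) = -1/2 * sum_ij J_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, s, max_sweeps=100, seed=0):
    """Asynchronous dynamics s_i <- sign(sum_j J_ij s_j), visiting neurons in random order.

    With symmetric J and zero diagonal, every flip strictly lowers the energy,
    so the state settles into a local minimum (a fixed point).
    """
    rng = random.Random(seed)
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            h = sum(w[i][j] * s[j] for j in range(n))  # local field on neuron i
            new = 1 if h > 0 else -1 if h < 0 else s[i]  # keep the state when h == 0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # a full sweep without flips: fixed point reached
            break
    return s


def classify(patterns, labels, s):
    """Run the dynamics from input s; return the label of the stored memory
    with the largest overlap with the final state, together with that state."""
    w = hebbian_weights(patterns)
    final = recall(w, s)
    overlaps = [sum(a * b for a, b in zip(p, final)) for p in patterns]
    best = max(range(len(patterns)), key=lambda k: overlaps[k])
    return labels[best], final


patterns = [[1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, 1, -1, 1, -1, 1, -1]]
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first memory with its first neuron flipped
label, final = classify(patterns, ["A", "B"], noisy)
print(label, final)  # prints: A [1, 1, 1, 1, -1, -1, -1, -1]
```

In this toy case the two stored patterns are orthogonal, so the corrupted input (energy -1.5) falls back into the first memory (energy -3.0) whatever the update order.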
Each dataset was split into one part used to build the memories and another used to evaluate the classification. To create the memories, we selected marker genes following a Scanpy tutorial. We then compared the performance of Hope4Genes with that of scSorter and TreeArches. In all cases, the original version of Hope4Genes performed worse than the other classifiers, whereas the version proposed here proved competitive. Simulations showed that Hope4Genes may be preferable to scSorter in extreme cases, such as strong class imbalance and a large number of dropouts.
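The workflow summarized above (memory/evaluation split, one memory per cell type, evaluation, dropout simulations) can be sketched as follows. Every detail here is a hypothetical illustration rather than the thesis's or Hope4Genes' actual procedure: marker genes are assumed to be already selected (for instance with Scanpy's `sc.tl.rank_genes_groups`), cells are binarized against the per-gene median of the memory cells, each memory is a gene-wise majority vote over the binarized memory cells of one type, and dropouts are simulated by zeroing entries at random. For brevity, classification picks the memory with the largest overlap with the binarized cell (the "most similar memory" that successful Hopfield retrieval should reach) instead of running the full dynamics.

```python
import random
from statistics import median


def split_cells(labels, memory_fraction=0.5, seed=0):
    """Stratified split: per cell type, a fraction of cells builds memories, the rest is held out."""
    rng = random.Random(seed)
    by_type = {}
    for idx, lab in enumerate(labels):
        by_type.setdefault(lab, []).append(idx)
    memory_idx, eval_idx = [], []
    for idxs in by_type.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        k = max(1, round(memory_fraction * len(idxs)))
        memory_idx += idxs[:k]
        eval_idx += idxs[k:]
    return sorted(memory_idx), sorted(eval_idx)


def gene_thresholds(X, idx):
    """Assumed binarization threshold: median of each marker gene over the memory cells."""
    return [median([X[i][g] for i in idx]) for g in range(len(X[0]))]


def binarize(cell, thr):
    """+1 if the gene is expressed above its threshold, -1 otherwise."""
    return [1 if x > t else -1 for x, t in zip(cell, thr)]


def build_memories(X, labels, idx, thr):
    """One +/-1 memory per cell type: gene-wise majority vote of its binarized cells (ties -> +1)."""
    sums = {}
    for i in idx:
        b = binarize(X[i], thr)
        acc = sums.setdefault(labels[i], [0] * len(b))
        for g, v in enumerate(b):
            acc[g] += v
    return {lab: [1 if v >= 0 else -1 for v in acc] for lab, acc in sums.items()}


def nearest_memory(memories, state):
    """Label of the memory with the largest overlap with the state (ties: first memory)."""
    return max(memories, key=lambda lab: sum(a * b for a, b in zip(memories[lab], state)))


def add_dropouts(cells, rate, seed=0):
    """Simulated dropouts: each expression value is set to zero with probability `rate`."""
    rng = random.Random(seed)
    return [[0.0 if rng.random() < rate else x for x in cell] for cell in cells]


def evaluate(X, labels, dropout_rate=0.0, seed=0):
    """Accuracy on the held-out cells, optionally after simulating dropouts on them."""
    mem_idx, eval_idx = split_cells(labels, seed=seed)
    thr = gene_thresholds(X, mem_idx)
    memories = build_memories(X, labels, mem_idx, thr)
    cells = add_dropouts([X[i] for i in eval_idx], dropout_rate, seed)
    hits = sum(nearest_memory(memories, binarize(c, thr)) == labels[i]
               for c, i in zip(cells, eval_idx))
    return hits / len(eval_idx)


# Toy data: three cell types, each expressing only its own (hypothetical) marker gene.
X = [[m if g == t else 0.0 for g in range(3)] for t in range(3) for m in (5.0, 4.0, 6.0, 5.5)]
labels = [lab for lab in "ABC" for _ in range(4)]
print(evaluate(X, labels))                    # 1.0
print(evaluate(X, labels, dropout_rate=1.0))  # all markers lost: every cell gets the first memory
```

With no dropouts the held-out toy cells are all assigned correctly; when every value is dropped, the binarized cells carry no marker signal and accuracy falls to chance level (1/3 here).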
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14240/86564