Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis
RATTO, MARIA LUISA
2021/2022
Abstract
Given the importance of single-cell techniques in modern molecular biology, especially for investigating cellular heterogeneity, there is a constant need for new methods for efficient data mining. Because single-cell technologies produce large numbers of high-dimensional measurements, their data are well suited to Deep Learning and Machine Learning approaches. The goal of this work is to uncover hidden features and the biological meaning of cell clusters using a tool based on a Sparsely-Connected Autoencoder (SCA). An autoencoder is composed of an encoder and a decoder sub-model, and it is a powerful tool for data compression and noise removal. Compared with other autoencoder-based methods, SCA has the advantage of providing a controlled association between the input layer and the decoder module, built from experimentally validated relationships between genes and biological features such as transcription factors and miRNAs. In this architecture the decoder is not a "black box" whose content cannot be deciphered; instead, it can be used to extract information that is usually hidden in single-cell data. In this way, cells can be clustered on meta-features such as transcription factor expression, which is usually difficult to detect because of its low expression level, or miRNA expression, which is not technically measurable in single-cell RNA-seq data. Additionally, two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), allow the evaluation of SCA performance in reconstructing cell clusters obtained with other methods or by other SCA runs, respectively.
File: RATTO_Thesis.pdf (not available), 27.46 MB, Adobe PDF
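The central idea of SCA, a connection pattern dictated by prior biological knowledge, can be sketched as a weight matrix multiplied element-wise by a binary gene-to-feature mask. This is a minimal forward-pass illustration under stated assumptions, not the thesis implementation. The following are all hypothetical choices made here for illustration: the toy gene and transcription factor names, the `TARGETS` table, the tanh activation, the dense decoder, and placing the mask on the gene-to-hidden weights. Because masked-out weights are zero, each hidden unit depends only on its annotated genes and can be read as a per-cell activity score for that feature.

```python
import numpy as np

# Hypothetical toy annotation: which genes are known targets of which feature
# (e.g. transcription factors). In a real analysis these pairs would come from
# a curated database of experimentally validated relationships.
GENES = ["g1", "g2", "g3", "g4", "g5"]
FEATURES = ["TF_A", "TF_B"]
TARGETS = {"TF_A": ["g1", "g2", "g3"], "TF_B": ["g3", "g4", "g5"]}


def build_mask(genes, features, targets):
    """Binary (n_genes x n_features) mask: 1 where gene i is a known target of feature j."""
    mask = np.zeros((len(genes), len(features)))
    index = {g: i for i, g in enumerate(genes)}
    for j, f in enumerate(features):
        for g in targets.get(f, []):
            mask[index[g], j] = 1.0
    return mask


class SparseAutoencoder:
    """Forward pass of a sparsely connected autoencoder (no training loop).

    Assumption: the mask is applied to the gene-to-hidden weights, and the
    hidden-to-gene decoder is dense, to keep the sketch short.
    """

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        n_genes, n_features = mask.shape
        self.mask = mask
        self.w_enc = rng.normal(scale=0.1, size=(n_genes, n_features))
        self.w_dec = rng.normal(scale=0.1, size=(n_features, n_genes))

    def encode(self, x):
        # Connections absent from the annotation are forced to zero by the mask,
        # so each hidden unit only sees the genes annotated to its feature.
        return np.tanh(x @ (self.w_enc * self.mask))

    def decode(self, h):
        # Map per-cell feature activities back to gene space.
        return h @ self.w_dec

    def reconstruct(self, x):
        return self.decode(self.encode(x))


if __name__ == "__main__":
    mask = build_mask(GENES, FEATURES, TARGETS)
    model = SparseAutoencoder(mask)
    # Simulated counts for 4 cells x 5 genes (illustrative data only).
    x = np.random.default_rng(1).poisson(2.0, size=(4, len(GENES))).astype(float)
    latent = model.encode(x)  # shape (4, 2): one column per feature
    print(latent.shape, model.reconstruct(x).shape)
```

Because the mask multiplies the weights inside the forward pass, masked connections receive zero gradient during training, so the knowledge-driven connectivity is preserved. The per-cell hidden activations are the kind of meta-features on which cells could then be clustered.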
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/6117