Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis

RATTO, MARIA LUISA
2021/2022

Abstract

Given the importance of single-cell techniques in modern molecular biology, especially for investigating cellular heterogeneity, there is a constant need for new and efficient data-mining methods. Because single-cell technologies yield large numbers of high-dimensional measurements, the resulting datasets are ideally suited to Deep Learning and Machine Learning approaches. The goal of this work is to uncover hidden features and the biological meaning of cell clusters with a tool based on a Sparsely-Connected Autoencoder (SCA). An autoencoder consists of two sub-models, an encoder and a decoder, and is a powerful tool for data compression and noise removal. Compared with other autoencoder-based methods, SCA has the advantage of a controlled association between the input layer and the decoder module, built from experimentally validated relationships between genes and biological features such as transcription factors and miRNAs. In this architecture the decoder is not an indecipherable "black box": it can instead be used to extract information that is usually hidden in single-cell data. Cells can thus be clustered on meta-features such as transcription factor expression, which is usually difficult to detect because transcription factors are expressed at low levels, or miRNA expression, which cannot be technically measured in single-cell RNA-seq data. Finally, two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), evaluate how well SCA reconstructs cell clusters obtained with other methods or with other SCA runs, respectively.
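The central architectural idea, connections restricted to known gene-to-feature relationships, can be illustrated with a minimal NumPy sketch. It assumes that the knowledge-based sparse connections link the input genes to a latent layer with one unit per transcription factor or miRNA. These connections are implemented here as a dense weight matrix multiplied by a fixed binary mask, and the decoder is an ordinary dense layer. All names (`GENES`, `REGULONS`, `build_mask`, `SparseAutoencoder`), the toy regulons, and the random data are hypothetical and are not taken from the thesis or its software.

```python
import numpy as np

# Hypothetical toy knowledge base (NOT from the thesis): each biological
# feature (TF or miRNA) maps to the genes it is assumed to regulate.
GENES = ["G1", "G2", "G3", "G4", "G5"]
REGULONS = {
    "TF_A": ["G1", "G2"],
    "TF_B": ["G2", "G3", "G4"],
    "miR_X": ["G5"],
}


def build_mask(genes, regulons):
    """Binary mask M with M[g, f] = 1 iff gene g is a listed target of feature f."""
    gene_idx = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(regulons)))
    for j, targets in enumerate(regulons.values()):
        for g in targets:
            mask[gene_idx[g], j] = 1.0
    return mask


class SparseAutoencoder:
    """Autoencoder whose gene -> latent weights are restricted by a fixed mask.

    Each latent unit corresponds to one feature (one mask column), so its
    activation can be read as a per-cell meta-feature score.
    """

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        n_genes, n_feats = mask.shape
        self.mask = mask
        # Masked entries start at zero and, because their gradients are also
        # masked, stay at zero throughout training.
        self.W_enc = rng.normal(0.0, 0.1, (n_genes, n_feats)) * mask
        self.b_enc = np.zeros(n_feats)
        self.W_dec = rng.normal(0.0, 0.1, (n_feats, n_genes))
        self.b_dec = np.zeros(n_genes)

    def encode(self, X):
        # Cells x genes -> cells x features; each unit sees only its target genes.
        return np.tanh(X @ (self.W_enc * self.mask) + self.b_enc)

    def decode(self, H):
        # Dense linear reconstruction of the gene expression profile.
        return H @ self.W_dec + self.b_dec

    def train_step(self, X, lr=0.05):
        """One full-batch gradient-descent step; returns the loss before the update."""
        n = X.shape[0]
        H = self.encode(X)
        X_hat = self.decode(H)
        # Loss: 0.5 * mean over cells of the summed squared reconstruction error.
        loss = 0.5 * np.sum((X_hat - X) ** 2) / n
        E = (X_hat - X) / n                        # dL/dX_hat
        dW_dec = H.T @ E
        db_dec = E.sum(axis=0)
        dZ = (E @ self.W_dec.T) * (1.0 - H ** 2)   # back through tanh
        dW_enc = (X.T @ dZ) * self.mask            # no gradient for absent edges
        db_enc = dZ.sum(axis=0)
        self.W_dec -= lr * dW_dec
        self.b_dec -= lr * db_dec
        self.W_enc -= lr * dW_enc
        self.b_enc -= lr * db_enc
        return loss


if __name__ == "__main__":
    mask = build_mask(GENES, REGULONS)
    model = SparseAutoencoder(mask)
    # Toy log-normalized "counts" for 20 cells (random, for illustration only).
    X = np.log1p(np.random.default_rng(1).poisson(2.0, (20, len(GENES))).astype(float))
    for _ in range(300):
        model.train_step(X)
    H = model.encode(X)  # cells x meta-features (TF_A, TF_B, miR_X)
    print(H.shape)
```

Because each column of `H` corresponds to one named feature, the latent matrix could be passed to any standard clustering algorithm to group cells by their inferred transcription factor or miRNA profiles. The tanh activation, squared-error loss, and learning rate are choices made for this sketch only.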
POLI, VALERIA
File: RATTO_Thesis.pdf (Adobe PDF, 27.46 MB), not available.

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/6117