Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis

Given the importance of single-cell techniques in modern molecular biology, especially for investigating cellular heterogeneity, there is a constant need for new methods for efficient data mining. Because single-cell technologies produce large numbers of high-dimensional measurements, their datasets are well suited to Deep Learning and Machine Learning approaches. In particular, the goal of this work is to discover hidden features and the biological meaning of cell clusters using a tool based on the Sparsely Connected Autoencoder (SCA). An autoencoder is composed of an encoder and a decoder sub-model and is a powerful tool for data compression and noise removal. Compared with other autoencoder-based methods, SCA has the advantage of providing a controlled association between the input layer and the decoder module, consisting of experimentally validated relationships between genes and biological features such as transcription factors and miRNAs. In this architecture the decoder is not a "black box" whose content cannot be deciphered; instead, it can be used to extract information that is usually hidden in single-cell data. In this way, cells can be clustered on the basis of meta-features such as transcription factor expression, which is usually difficult to detect because of its low level, or miRNA expression, which cannot technically be measured in single-cell RNA-seq data. Additionally, two new metrics, QCC (Quality Control of Cluster) and QCM (Quality Control of Model), allow the evaluation of SCA performance in reconstructing cell clusters obtained with other methods or by other SCA runs, respectively.
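
The idea of a controlled, knowledge-driven connection can be illustrated with a small sketch. It is not the thesis implementation: the gene and transcription factor names, the `targets` dictionary, and the helper functions `build_mask` and `masked_forward` are all hypothetical, and a binary weight mask is only one plausible way to realize sparse connectivity. Each latent node stands for a biological feature and receives input only from the genes linked to it.

```python
import random

# Hypothetical prior knowledge: latent node (e.g. a transcription factor)
# mapped to its target genes. Placeholder names, not data from the thesis.
targets = {
    "TF_A": ["gene1", "gene2"],
    "TF_B": ["gene2", "gene3", "gene4"],
}
genes = ["gene1", "gene2", "gene3", "gene4"]
nodes = list(targets)


def build_mask(genes, nodes, targets):
    """Return a len(nodes) x len(genes) 0/1 matrix; 1 marks an allowed connection."""
    return [[1 if g in targets[n] else 0 for g in genes] for n in nodes]


def masked_forward(x, weights, mask):
    """Compute linear latent activations using only the connections the mask allows."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


mask = build_mask(genes, nodes, targets)

# Random weights stand in for trained ones; the seed keeps the example repeatable.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in genes] for _ in nodes]

cell = [5.0, 0.0, 2.0, 1.0]  # toy expression values for one cell
latent = masked_forward(cell, weights, mask)
```

Because masked weights contribute nothing, changing the expression of `gene3` leaves the `TF_A` activation untouched, which is what makes each latent node attributable to a named biological feature.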

RATTO, MARIA LUISA

2021/2022

Brief record

	Faculty/Department
	
				BIOTECNOLOGIE MOLECOLARI E SCIENZE PER LA SALUTE
			
	Degree programme
	
				MOLECULAR BIOTECHNOLOGY - BIOTECNOLOGIE MOLECOLARI
			
	Supervisor
	
				CALOGERO, RAFFAELE ADOLFO
			
	Other contributors (e.g., tutor)
	
				POLI, VALERIA
			
	Thesis access mode
	
				THESES IMPORTED ONLY VIA ESSE3 SINCE 2018
			
	Appears in types:
	
				Master's degree programme

Files in this item:

File	Size	Format
RATTO_Thesis.pdf (not available)	27.46 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/6117