Alternative splicing is one of the most complex biological mechanisms: it affects more than 95% of human genes and regulates heterogeneous physiological and pathophysiological processes. However, its importance in development, differentiation and disease has been revealed only slowly and relatively recently. This mechanism can be viewed as the fine-tuning of the “bulkier” gene expression layer, yet perturbation analyses have shown that alternative splicing has a great impact on the final cell or tissue phenotype. Thanks to the advent and improvement of next-generation RNA sequencing techniques, it is now possible to investigate the transcriptome in depth and to perform differential splicing analysis between samples. Although there are now established tools and solid pipelines for the description of alternative splicing, a computational workflow for the functional prediction of the downstream consequences of differential splicing is currently missing. In this thesis, I propose a novel computational analysis pipeline that provides a network representation of the downstream effects of alternative splicing in terms of differential protein-protein interactions between two experimental conditions. Starting from raw RNA-Seq gene expression data, differentially expressed genes, isoforms and differentially represented splicing events on protein-coding genes are embedded within a protein-protein interaction network. The network was built from both protein-protein (BioGRID) and domain-domain (3did) interaction data. The pipeline maps each protein domain to its corresponding exon and annotates every isoform with its encoded domains. To quantitatively describe interactions that change due to alternative splicing between two experimental conditions, the pipeline integrates the network with the results of differential splicing and isoform usage analyses, computing a score for each pairwise protein interaction.
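The domain-to-exon mapping, isoform annotation and interaction scoring described above can be illustrated with a minimal sketch. The abstract does not give the actual score used in the thesis, so the function below is a hypothetical stand-in: it sums the isoform usage of every isoform pair that carries at least one interacting domain pair. All names, coordinates and usage values are invented for illustration, and domains and exons are assumed to share one coordinate system already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named feature (exon or domain) with inclusive coordinates.

    Assumption: exons and domains share the same coordinate system.
    """
    name: str
    start: int
    end: int


def domains_per_exon(exons, domains):
    """Map each exon name to the set of domain names overlapping it."""
    return {
        e.name: {d.name for d in domains if d.start <= e.end and e.start <= d.end}
        for e in exons
    }


def isoform_domains(exon_names, exon_to_domains):
    """Collect the domains encoded by the exons included in one isoform."""
    found = set()
    for exon_name in exon_names:
        found |= exon_to_domains.get(exon_name, set())
    return found


def interaction_score(usage_a, usage_b, domains_a, domains_b, ddi):
    """Hypothetical score: total usage of isoform pairs able to interact.

    usage_*   : isoform name -> usage fraction within its gene
    domains_* : isoform name -> set of encoded domain names
    ddi       : set of (domain, domain) pairs known to interact
    """
    score = 0.0
    for iso_a, frac_a in usage_a.items():
        for iso_b, frac_b in usage_b.items():
            compatible = any(
                (da, db) in ddi or (db, da) in ddi
                for da in domains_a[iso_a]
                for db in domains_b[iso_b]
            )
            if compatible:
                score += frac_a * frac_b
    return score


# Toy gene A: exon E2 carries domain D1; isoform A2 skips E2.
exons_a = [Interval("E1", 1, 100), Interval("E2", 101, 200)]
exon_to_domains = domains_per_exon(exons_a, [Interval("D1", 120, 180)])
domains_a = {
    "A1": isoform_domains(["E1", "E2"], exon_to_domains),
    "A2": isoform_domains(["E1"], exon_to_domains),
}
# Toy gene B: a single isoform carrying domain D2.
domains_b = {"B1": {"D2"}}
usage_b = {"B1": 1.0}
ddi = {("D1", "D2")}  # invented domain-domain interaction

control = interaction_score({"A1": 0.75, "A2": 0.25}, usage_b, domains_a, domains_b, ddi)
treated = interaction_score({"A1": 0.25, "A2": 0.75}, usage_b, domains_a, domains_b, ddi)
delta = treated - control  # negative: the A-B interaction is predicted to weaken
```

In this toy case, the treated condition favours the isoform that skips the exon encoding D1, so the score for the A-B edge drops. A real analysis would also need to convert protein-level domain coordinates to exon coordinates and assess the significance of each change.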
This pipeline was tested on data from MCF-7 breast cancer cell lines treated with siRNA targeting two well-known splicing regulators, ESRP1 and ESRP2, or with control siRNA. Given the pivotal role of these proteins in the regulation of the cell phenotype, this experiment served as a solid model to validate the consistency, accuracy and potential of the pipeline. The analysis revealed 754 differentially expressed genes (422 downregulated and 332 upregulated), 788 alternatively spliced genes and a total of 814 significant isoform switching events involving 567 genes. Consistent with the experimental design, the gene sets obtained from these differential analyses partially overlapped and were enriched in cell adhesion regulation, extracellular matrix remodelling and tissue morphogenesis. Significantly changing interactions were observed among well-known actors in the epithelial-mesenchymal transition (EMT) pathway, such as adhesion molecules, ephrin and growth factor receptors, and other proteins previously reported to be involved in migration and metastasis. Small subnetworks were isolated from the complete interaction network to visualize and highlight the well-known and lesser-known players most affected by alternative splicing. The results showed that the network interactions are consistent with the current literature. Furthermore, the pipeline highlighted less obvious, potentially novel actors involved in the downstream effects of ESRP1/2 silencing.
A novel computational pipeline for the functional characterization of alternative splicing
FRANCHITTI, LORENZO
2020/2021
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/66366