Evaluation and comparison of current methods for estimating Copy Number Variation from SNP array data
DAMIANO, LIBERA
2013/2014
Abstract
Copy number variations (CNVs) are gains or losses of genetic material. They account for about 13% of the human genome and play an important role in human population diversity and susceptibility to disease. Recent advances in high-throughput technologies have made it possible to analyze CNVs across the whole genome and in thousands of individuals, leading to the identification of CNVs associated with diseases such as autism, schizophrenia, intellectual disability and several cancers. However, estimating CNVs from SNP-array data remains an open question in biostatistics. Several algorithms have been developed in recent years, but they give different results, and there is no consensus on which method performs best. The most widely used CNV detection algorithms are based on Hidden Markov Models (HMM) and Circular Binary Segmentation (CBS). In this work we compared the performance of the most up-to-date HMM-based and CBS-based algorithms, each combined with two normalization procedures. As a second aim, we performed two genome-wide association studies to identify CNVs associated with malignant pleural mesothelioma (MPM) and myocardial infarction (MI). To compare the performance of the different combinations of normalization and CNV calling, we used the NSR index. It is based on the idea that a SNP falling in a CNV region in only one of the samples analyzed is more likely to be a false positive than other SNPs; therefore, the lower the index, the better the method. Once the most reliable set of CNVs had been identified, we tested the association between CNVs and disease status using logistic regression, adjusting for possible confounders. In conclusion, we provided a useful pipeline for comparing different CNV calling algorithms and showed that, on Illumina arrays, HMM-based methods perform well. Moreover, several CNVs associated with MPM were found.
If validated and replicated, our results could be useful in understanding the etiology of MPM and support the complementary role of genetic background in asbestos-related carcinogenesis of the pleura, indicating that genetic risk factors should be taken into account to better define the MPM risk profile of people with high exposure to asbestos.

File | Availability | Type | Size | Format
---|---|---|---|---
771490_tesimagistrale.pdf | not available | Other attached material | 1.95 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/157884