Mechanistic insights on the genetic determinants of complex traits from single-cell chromatin accessibility assays
MARINELLI, CAMILLA
2023/2024
Abstract
Introduction: Identifying the relationship between trait-associated single nucleotide polymorphisms (SNPs) and regulatory genomic regions is crucial for understanding genetic influences on brain function. This project takes an integrative approach, combining genomic overlap analysis with gapped k-mer support vector machine (gkm-SVM, https://www.beerlab.org/gkmsvm/) models to assess whether these SNPs significantly affect chromatin accessibility in various brain cell types (comparative atlas of single-cell chromatin accessibility in the human brain, https://doi.org/10.1126/science.adf7044).

Methods: The first analysis groups SNPs by chromosome and builds genomic ranges (GRanges) for each chromosome and cell type, then identifies the overlaps between SNPs and open chromatin regions. To evaluate statistical significance, 100 sets of randomized genomic regions are generated, maintaining chromosomal structure, and their overlaps with the open chromatin regions are counted. For each cell type, a histogram shows the frequency distribution of overlaps from the randomized regions, with the actual overlap count highlighted, and an empirical p-value is computed by comparing the actual overlap count with the randomized distribution. Overlap counts, frequency distributions, and p-values are saved for further analysis and interpretation. Subsequently, a set of credible SNPs likely to be causal for both macroscopic phenotypes (BMI, MRI) and gene expression, obtained through a colocalization analysis, is used to evaluate the impact of each SNP on regulatory regions through the gkm-SVM model.
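The overlap enrichment test described above (observed SNP overlaps compared against 100 randomized sets) can be illustrated with a minimal Python sketch. It is not the thesis code, which works with GRanges objects. The function names, the toy coordinates, the choice to redraw each SNP uniformly on its own chromosome, and the `(k + 1) / (n + 1)` form of the empirical p-value are all assumptions made for illustration.

```python
import bisect
import random


def count_overlaps(snps, peaks):
    """Count SNPs that fall inside an open-chromatin interval.

    snps:  dict chrom -> list of 1-based positions
    peaks: dict chrom -> sorted, non-overlapping (start, end) pairs, inclusive
    """
    total = 0
    for chrom, positions in snps.items():
        intervals = peaks.get(chrom, [])
        starts = [start for start, _ in intervals]
        for pos in positions:
            # Last interval whose start is <= pos, if any.
            i = bisect.bisect_right(starts, pos) - 1
            if i >= 0 and intervals[i][1] >= pos:
                total += 1
    return total


def randomize_positions(snps, chrom_sizes, rng):
    """Redraw every SNP uniformly on its own chromosome.

    Assumption: "maintaining chromosomal structure" is read here as keeping
    the number of SNPs per chromosome fixed.
    """
    return {
        chrom: [rng.randint(1, chrom_sizes[chrom]) for _ in positions]
        for chrom, positions in snps.items()
    }


def empirical_p_value(snps, peaks, chrom_sizes, n_sets=100, seed=0):
    """Return (observed count, null counts, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(snps, peaks)
    null = [
        count_overlaps(randomize_positions(snps, chrom_sizes, rng), peaks)
        for _ in range(n_sets)
    ]
    # Add-one correction so the p-value is never exactly zero (an assumption).
    n_at_least = sum(count >= observed for count in null)
    return observed, null, (n_at_least + 1) / (n_sets + 1)


# Hypothetical toy data, not taken from the thesis.
chrom_sizes = {"chr1": 1_000_000, "chr2": 500_000}
peaks = {"chr1": [(1000, 1500), (20000, 20500)], "chr2": [(300, 800)]}
snps = {"chr1": [1200, 1499, 20010, 700000], "chr2": [500]}

observed, null_counts, p_value = empirical_p_value(snps, peaks, chrom_sizes)
print(observed, max(null_counts), p_value)
```

The list of null counts is what a per-cell-type histogram would be drawn from, with the observed count marked on it. With 100 randomized sets and the add-one correction assumed here, the smallest attainable p-value is 1/101.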
For the gkm-SVM analysis, the reference and alternative sequences of the 19-bp region surrounding each SNP are extracted from the reference genome and provided as input to the gkm-SVM pipeline, which involves generating kernel matrices, training SVM models with cross-validation, and computing delta scores.

Results: In the first analysis, overlaps were computed considering only the suggestive SNPs associated with BMI by GWAS (p-value <= 1e-5). The cell types whose regulatory regions showed statistically significant enrichment (empirical p-value <= 0.01) are several subtypes of GABAergic neurons (cell types FOXP2_2, FOXP2_4, ICGA_2, PKJ_1, and SNCG_1) and vascular smooth muscle cells (SMC). The second analysis highlighted SNP rs7187776 in particular as a candidate pleiotropic causal variant for BMI and caudate volume, acting by regulating the expression of TUFM in several brain tissues. This SNP is located within the 5’ UTR of TUFM, in a region classified as a promoter by ENCODE, and is predicted to affect chromatin accessibility specifically in glial cells (MGC_1).

Conclusions: This approach provides insights into the regulatory roles of trait-associated SNPs in brain cell types, highlighting potential mechanisms by which genetic variation influences brain function. The methodological framework presented here can be applied to other cell types and traits, facilitating a broader understanding of genotype-phenotype relationships.
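The delta-score step from the Methods (scoring the reference and alternative 19-bp sequences around a SNP and taking the difference) can be sketched as follows. This is a simplified stand-in rather than the gkm-SVM package itself: a plain dictionary of k-mer weights replaces the trained model, `k = 10` is an assumption (chosen so that every k-mer in a 19-bp window contains the central SNP), and the toy genome, weights, and function names are hypothetical.

```python
def snp_windows(genome, chrom, pos, ref, alt, flank=9):
    """Return (ref_seq, alt_seq), each 2 * flank + 1 bp long, centred on a SNP.

    genome: dict chrom -> sequence string; pos is 1-based.
    """
    seq = genome[chrom]
    start = pos - 1 - flank          # 0-based start of the window
    end = pos + flank                # exclusive end of the window
    if start < 0 or end > len(seq):
        raise ValueError("window extends beyond the chromosome")
    ref_seq = seq[start:end].upper()
    if ref_seq[flank] != ref.upper():
        raise ValueError("reference allele does not match the genome")
    # Substitute the alternative allele at the central position.
    alt_seq = ref_seq[:flank] + alt.upper() + ref_seq[flank + 1:]
    return ref_seq, alt_seq


def kmer_score(seq, weights, k=10):
    """Sum the weights of all k-mers in seq (unknown k-mers score 0)."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def delta_score(ref_seq, alt_seq, weights, k=10):
    """Score of the alternative sequence minus score of the reference."""
    return kmer_score(alt_seq, weights, k) - kmer_score(ref_seq, weights, k)


# Hypothetical toy inputs, not taken from the thesis.
genome = {"chrT": "ACGT" * 7}
weights = {"ACGTACGTAC": 0.5, "ACGTACGTAT": -0.2}

ref_seq, alt_seq = snp_windows(genome, "chrT", pos=14, ref="C", alt="T")
print(ref_seq, alt_seq, delta_score(ref_seq, alt_seq, weights))
```

In this sketch a negative delta score means the alternative allele scores lower than the reference under the toy weights. In the actual analysis the scores would come from a model trained per cell type, so the sign and magnitude are interpreted relative to that cell type's open chromatin.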
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14240/8981