Structural Variants Are a Major Source of Gene Expression Differences in Humans and Often Affect Multiple Nearby Genes
Date of Award
Doctor of Philosophy (PhD)
Structural variants (SVs), including copy number variants (CNVs), balanced rearrangements, and mobile element insertions (MEIs), are an important source of diversity in the human genome, but their functional effects are not well understood. SVs are technically difficult to detect and genotype [1], and mapping depends on deep whole-genome sequencing (WGS), which was, until recently, unaffordable for large cohorts. For these reasons, SVs are not included in most genome-wide studies of functional variants, even though SVs are known causal agents in multiple clinical disorders [2-16]. However, recent advances in high-throughput sequencing technologies that allow for widespread use of WGS, combined with improvements in scaling SV detection algorithms, mean that comprehensive studies of all forms of genetic variation are now possible for large human cohorts. The Genotype-Tissue Expression (GTEx) project has collected gene expression data across multiple tissue types along with deep WGS data from hundreds of human donors, providing a rich resource for evaluating relationships between genotype and gene expression [17]. Here, we performed a preliminary analysis of the functional effects of SVs in 147 individuals from the GTEx cohort. We comprehensively mapped 23,602 SVs in these individuals and performed expression quantitative trait loci (eQTL) analysis in 13 available tissue types. We found that an SV is a lead marker at 3.5% of eQTLs and observed a notable abundance of rare SVs associated with aberrant expression of nearby genes. These findings emphasize the important role SVs play in human gene expression differences and prompted us to perform a more robust analysis using GTEx samples. Hoping that a larger sample size would improve our power to detect expression-altering SVs, we comprehensively mapped 61,668 SVs in 613 individuals from the GTEx project and measured their effects on gene expression.
We estimate that common SVs are causal at 2.66% of eQTLs, which is a 10.5-fold enrichment relative to their abundance in the genome and consistent with our preliminary study. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was surprisingly small (0.12% of eQTLs, 1.9-fold enriched relative to their abundance in the genome). Multi-tissue analysis of expression effects revealed that gene-altering SVs show significantly more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with known eQTL activity compared to 23.08% of coding eQTLs caused by single nucleotide variants (SNVs) and short insertion/deletion variants (indels). Noncoding SVs, SNVs, and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that show strong effects on gene expression yet modest enrichment at known regulatory elements, demonstrating that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current genomic annotations. Remarkably, both common and rare noncoding SVs often show strong regional effects on the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes compared to 1.09 genes affected by SNV- and indel-eQTLs, and 21.34% of rare expression-altering SVs show strong effects on 2-9 different genes. We also observed significant effects on rare gene expression changes extending 1 Mb from the SV. These regional effects provide a mechanism by which individual noncoding SVs may have strong and/or pleiotropic effects on phenotypic variation and disease, and they emphasize the importance of including SV detection in future genomic and disease studies.
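The enrichment figures quoted above can be read as simple ratios. The sketch below is illustrative only: it assumes that "fold enrichment" means the share of eQTLs attributed to a variant class divided by that class's share of all variants, a definition the abstract does not spell out. The function names are hypothetical, and the implied abundances are back-calculated from the quoted numbers rather than taken from the thesis.

```python
def fold_enrichment(share_of_eqtls: float, share_of_variants: float) -> float:
    """Assumed definition: share of eQTLs divided by share of variants."""
    return share_of_eqtls / share_of_variants


def implied_variant_share(share_of_eqtls: float, fold: float) -> float:
    """Invert the assumed definition to recover the implied variant share."""
    return share_of_eqtls / fold


# Figures quoted in the abstract (as fractions, not percentages).
sv_eqtl_share, sv_fold = 0.0266, 10.5    # common SVs
mei_eqtl_share, mei_fold = 0.0012, 1.9   # mobile element insertions

# Back-calculated abundances under the assumed definition.
sv_variant_share = implied_variant_share(sv_eqtl_share, sv_fold)     # about 0.25%
mei_variant_share = implied_variant_share(mei_eqtl_share, mei_fold)  # about 0.06%

# Ratio of the average number of genes affected per eQTL (1.82 vs 1.09).
genes_per_eqtl_ratio = 1.82 / 1.09

print(f"Implied SV share of variants:  {sv_variant_share:.4%}")
print(f"Implied MEI share of variants: {mei_variant_share:.4%}")
print(f"SV-eQTLs affect {genes_per_eqtl_ratio:.2f}x as many genes on average")
```

Under this reading, SVs would make up roughly a quarter of a percent of the variants considered while accounting for 2.66% of eQTLs, which is where a 10.5-fold figure would come from.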
Chair and Committee
Ira M. Hall
Nancy L. Saccone
Scott, Alexandra Jane, "Structural Variants Are a Major Source of Gene Expression Differences in Humans and Often Affect Multiple Nearby Genes" (2021). Arts & Sciences Electronic Theses and Dissertations. 2577.