Date of Award

Summer 8-15-2021

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Human & Statistical Genetics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Cardiovascular diseases (CVDs) are known to be associated with a variety of quantitative risk factors such as cholesterol, metabolites, and insulin. Understanding the genetic basis of these quantitative traits can shed light on the etiology, prevention, diagnosis, and treatment of disease. However, most prior trait-mapping studies have focused on single nucleotide variants (SNVs) and indels, and the contribution of structural variation (SV) remains unknown. In this thesis, we present the results of a study examining genetic associations between SVs and cardiometabolic traits in the Finnish population. In the first chapter, we used sensitive methods to identify and genotype 129,166 high-confidence SVs from deep whole-genome sequencing (WGS) data from 4,848 individuals. We tested the 64,572 common and low-frequency SVs for association with 116 quantitative traits, and tested candidate associations using exome sequencing and array genotype data from an additional 15,205 individuals. We discovered 31 genome-wide significant associations at 15 loci, including two novel loci at which SVs have strong phenotypic effects: (1) a deletion of the ALB gene promoter that is greatly enriched in the Finnish population, causes decreased serum albumin levels in carriers (p = 1.47 × 10⁻⁵⁴), and is also associated with increased levels of total cholesterol (p = 1.22 × 10⁻²⁸) and 14 additional cholesterol-related traits; and (2) a multiallelic copy number variant (CNV) at PDPR that is strongly associated with pyruvate (p = 6.14 × 10⁻¹² for alanine; p = 4.81 × 10⁻²¹ for pyruvate) and alanine levels and resides within a structurally complex genomic region that has accumulated many rearrangements over evolutionary time. We also confirmed six previously reported associations, including five led by stronger signals in SNVs and one linking a recurrent HP gene deletion to cholesterol levels (p = 6.24 × 10⁻¹⁰); this deletion was also found to be strongly associated with increased glycoprotein levels (p = 3.53 × 10⁻³⁵).
The results of this chapter confirm that integrating SVs into trait-mapping studies will expand our knowledge of the genetic factors underlying disease risk. Chapters 2 and 3 present two side projects derived from Chapter 1. Chapter 2 focused on an insulin-associated chromosome 1 CNV that turned out to be an indirect measurement of mitochondrial DNA copy number; the direct measurement showed stronger associations with multiple metabolic traits. In Chapter 3, we presented a pilot study applying machine learning to genetics problems that traditional methods cannot solve. We built multi-layer neural network models to impute the highly polymorphic AMY1 CNVs and showed improved performance compared with baseline regression models as well as the best practice employed in a previous publication. Both chapters proposed solutions to new questions arising from the main SV project and provided preliminary data for other ongoing or upcoming projects in our group.


English (en)

Chair and Committee

Ira M. Hall, Nathan Stitziel

Committee Members

Timothy Peterson, Nancy Saccone, Adam Locke, John Rice