ORCID

https://orcid.org/0000-0001-6393-2276

Date of Award

12-9-2024

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Molecular Genetics & Genomics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Despite the monumental success of genome-wide association studies (GWASs), many important disease risk loci are yet to be discovered. A key methodological challenge is how to leverage haplotype diversity and allelic heterogeneity to improve trait association power, including in noncoding regions where it is difficult to define functional units for variant aggregation. Genealogy-based association methods have the potential to bridge this gap by testing combinations of common and rare haplotypes based purely on their ancestral relationships, without the need to define functional elements or predict variant impacts. In Chapter 2 of this thesis, I will describe the development of LOCATER, a genealogy-based association pipeline. Together with Dr. Ryan Christ, I improved the LOCATER pipeline from its initial form, a purely quadratic form test with limited statistical power, to its final form, which has boosted power when allelic heterogeneity is present. In its current form, LOCATER is a whole-genome screening pipeline that uses local genealogies inferred with an optimized implementation of the Li-Stephens model. In tests on both simulated sequencing data and array data, LOCATER showed greater statistical power than the single marker test (SMT) across multiple genetic architectures when allelic heterogeneity is present. In Chapter 3, I will present a full genome-wide analysis pipeline centered on LOCATER and the first application of LOCATER to a real-world dataset, a genome sequencing-based study of 6,795 Finnish individuals with 101 cardiometabolic traits and 18.9 million autosomal variants. We identified a total of 351 significant trait associations at 47 genomic loci, and found that LOCATER increased association power over the single marker test at 5 of these loci (30 trait associations) by combining independent association signals from distinct alleles.
LOCATER successfully recovered known quantitative trait loci that were not found by the single marker test (SMT), including LIPG, and suggested a potentially novel association with 'triglycerides in medium VLDL'. LOCATER also recovered known allelic heterogeneity at the APOE/C1/C4/C2 gene cluster by incorporating local genetic relatedness matrices. Notably, we find that confounders have a more pronounced effect on genealogy-based methods than on SMT, and we propose a new randomization approach and a general method for genomic control that successfully eliminate their effects. This study demonstrates that genealogy-based association methods such as LOCATER are highly effective when multiple causal variants are present. This suggests that applications of these methods to larger and more diverse cohorts will be productive for defining novel genes and risk alleles, especially for traits under negative selection whose genetic architecture is dominated by rare causal variants.

Language

English (en)

Chair and Committee

Ira Hall

Committee Members

Carlos Cruchaga; John Rice; Nancy Saccone; Nathan Stitziel

Available for download on Tuesday, December 18, 2029

Included in

Genetics Commons
