ORCID
https://orcid.org/0000-0001-6393-2276
Date of Award
12-9-2024
Degree Name
Doctor of Philosophy (PhD)
Degree Type
Dissertation
Abstract
Despite the monumental success of genome-wide association studies (GWASs), many important disease risk loci are yet to be discovered. A key methodological challenge is how to leverage haplotype diversity and allelic heterogeneity to improve trait association power, including in noncoding regions where it is difficult to define functional units for variant aggregation. Genealogy-based association methods have the potential to bridge this gap by testing combinations of common and rare haplotypes based purely on their ancestral relationships, without the need to define functional elements or predict variant impacts. In Chapter 2 of this thesis, I will describe the development of LOCATER, a genealogy-based association pipeline. Together with Dr. Ryan Christ, I improved the LOCATER pipeline from its initial form, a purely quadratic form test with limited statistical power, to its final form, which has greater power when allelic heterogeneity is present. To date, LOCATER is a whole-genome screening pipeline that uses local genealogies inferred with an optimized implementation of the Li-Stephens model. In tests on both simulated sequencing data and array data, LOCATER showed greater statistical power than the single-marker test (SMT) across multiple genetic architectures when allelic heterogeneity is present. In Chapter 3, I will present a full genome-wide analysis pipeline centered on LOCATER and the first application of LOCATER to a real-world dataset, a genome sequencing-based study of 6,795 Finnish individuals with 101 cardiometabolic traits and 18.9 million autosomal variants. We identified a total of 351 significant trait associations at 47 genomic loci, and found that LOCATER boosted association power beyond that of the single-marker test at 5 of these loci (30 trait associations) by combining independent association signals from distinct alleles.
LOCATER successfully recovered known quantitative trait loci that were not found by the single-marker test (SMT), including LIPG, and suggested a potentially novel association with 'triglycerides in medium VLDL'. LOCATER also recovered known allelic heterogeneity at the APOE/C1/C4/C2 gene cluster by incorporating local genetic relatedness matrices. Notably, we found that confounders have a more pronounced effect on genealogy-based methods than on SMT, and we propose a new randomization approach and a general method for genomic control that together eliminate their effects. This study demonstrates that genealogy-based association methods such as LOCATER are highly effective when multiple causal variants are present. It suggests that applying these methods to larger and more diverse cohorts will be productive for defining novel genes and risk alleles, especially for traits under negative selection whose genetic architecture is dominated by rare causal variants.
Language
English (en)
Chair and Committee
Ira Hall
Committee Members
Carlos Cruchaga; John Rice; Nancy Saccone; Nathan Stitziel
Recommended Citation
Wang, Xinxin, "Development and Application of a Computational Pipeline for Genealogy-Based Trait Association" (2024). Arts & Sciences Electronic Theses and Dissertations. 3354.
https://openscholarship.wustl.edu/art_sci_etds/3354