Abstract

Genome-wide association studies (GWAS) have been key to expanding our understanding of the genetic contributions to common diseases. However, these genetic associations frequently fail to clarify causal disease mechanisms, as they often fall in non-coding regions and are difficult to interpret. One solution is to perform a GWAS for the levels of a cellular trait, known as quantitative trait locus (QTL) mapping. Through methods such as colocalization, Mendelian randomization, and transcriptome/proteome-wide association studies, the QTL variants can be compared to disease GWAS, identifying shared variation between cellular and disease traits. We can then prioritize cellular traits as potentially causative, targetable factors in disease risk. As proteins actively contribute to cellular processes, understanding their regulation offers an important path toward disease target identification. Large-scale protein QTL (pQTL) analyses have been limited to plasma, which may not accurately capture certain tissue contexts. Cerebrospinal fluid (CSF) interacts with the central nervous system, potentially making it a better proxy for neurological traits. However, no well-powered pQTL analyses of CSF have been performed to date. Here, I present the largest proteogenomic analysis of CSF to date. We performed pQTL mapping in 3,506 individuals using the aptamer-based SOMAscan 7k platform, identifying 2,477 pQTLs that were split evenly between gene-proximal pQTLs (cis-pQTLs) and those located distally (trans-pQTLs). We prioritized three highly pleiotropic pQTL hotspots near OSTN, HLA, and APOE that reveal novel disease mechanisms. We also generated a pQTL atlas using an orthogonal antibody-based protein measurement approach and identified platform differences in pQTL detection that necessitate careful interpretation of pQTL associations. Next, we determined the novelty of our CSF pQTL atlas compared to other biological contexts.
Over 70% of our cis-pQTLs were not identified at the RNA level, demonstrating unique regulation underlying protein levels. A comparison to a plasma pQTL atlas confirmed robust CSF-specific regulation. Using in-house plasma pQTLs, we found extensive fluid-stratified pQTLs, highlighting the necessity of multi-context analyses. We identified fluid-specific pQTL hotspots that reflect biological regulation, including a CSF-specific region linked to lysosomal protein trafficking. Lastly, we integrated our CSF and plasma pQTL resources with disease GWAS to identify potential protein drug targets. We connected almost 200 proteins in each fluid to neurological traits, only 57 of which were consistently associated in both fluids. Disease overlap was observed between Alzheimer's disease (AD) and dementia with Lewy bodies (DLB) only in CSF. We identified fluid-specific dysregulated pathways for three traits that highlight the importance of analyzing multiple contexts for drug target identification. Focusing on AD, we uncovered 38 CSF proteins that were enriched in immune and lysosomal processes, suggesting potential drug-targetable mechanisms. We also demonstrated that these proteins robustly predict disease status. Finally, we assessed the proteomic profile of an AD-causing mutation carrier with resilience to symptom development, finding a signature consistent with heat exposure that may have contributed to their delayed onset. This work represents a substantial expansion of our understanding of proteogenomic regulation. We identified thousands of CSF pQTLs that were mainly specific to CSF and to proteins. We further emphasized the importance of studying non-plasma tissues to discover pQTL regulation by connecting our associations with a range of diseases and traits to identify trait-relevant biology. This resource will be useful for future studies that investigate more diseases, as substantial work is still needed to completely understand the genetic contributions to common disease.

Committee Chair

Carlos Cruchaga

Committee Members

Celeste Karch; Gabriel Haller; Nancy Saccone; Young Ah Goo

Degree

Doctor of Philosophy (PhD)

Author's Department

Biology & Biomedical Sciences (Human & Statistical Genetics)

Author's School

Graduate School of Arts and Sciences

Document Type

Dissertation

Date of Award

5-5-2025

Language

English (en)

Author's ORCID

https://orcid.org/0000-0003-4701-7355

Available for download on Saturday, May 02, 2026

Included in

Biology Commons
