Abstract
The elucidation of cell-type–specific signaling networks is central to understanding pancreatic ductal adenocarcinoma (PDAC) and to nominating mechanistically grounded therapeutic targets. We present a Text-to-Target framework that integrates large language models (LLMs) with single-cell omics to couple literature-derived hypotheses to cell-type–resolved expression evidence. Using publicly available datasets, we construct malignant ductal epithelial and lineage-matched acinar meta-cell cohorts from PDAC and perform differential expression analysis to obtain a robust catalogue of differentially expressed genes (DEGs) that captures disease-associated transcriptional changes. In parallel, an ensemble of LLMs is prompted in a schema-constrained manner to retrieve cell-type–specific targets, pathways, and mechanistic annotations from the biomedical literature. After normalization and quality control, LLM outputs are intersected with the PDAC meta-cell DEGs to define an LLM-supported DEG set, which serves as the interface between text priors and omic evidence. We then perform pathway-level integration using over-representation analysis augmented by three LLM-aware scores that quantify pathway recall, expression-weighted activation, and directional concordance, yielding an overall ranking of signaling axes. This integrated analysis recapitulates canonical PDAC modules such as KRAS–MAPK and PI3K–AKT–mTOR, highlights angiogenic and immune checkpoint programs, and elevates replication stress and DNA damage response pathways, including ATR- and PARP-associated circuits, as high-confidence candidates. More broadly, the study demonstrates how this framework can standardize heterogeneous omics and textual knowledge into a unified computational pipeline, enabling reproducible, mechanism-oriented target and pathway discovery in PDAC and, in principle, other complex diseases.
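As a rough illustration of the intersection and over-representation steps summarized above, the following Python sketch intersects a list of LLM-nominated genes with a DEG table and computes a standard upper-tail hypergeometric p-value for one pathway. It is not the thesis implementation: the function names, the toy gene lists, the log2 fold-change values, the pathway membership, and the universe size are all hypothetical, and the three LLM-aware scores are not reproduced because the abstract does not define them.

```python
from math import comb


def intersect_llm_with_degs(llm_genes, deg_log2fc):
    """Keep DEGs that were also nominated by the LLMs (hypothetical helper).

    Gene symbols are normalized by trimming whitespace and upper-casing,
    a stand-in for the normalization step mentioned in the abstract.
    """
    normalized = {g.strip().upper() for g in llm_genes}
    return {g: fc for g, fc in deg_log2fc.items() if g.upper() in normalized}


def ora_p_value(hits, pathway_size, selected, universe):
    """Upper-tail hypergeometric probability P(X >= hits).

    universe     : number of genes in the background
    pathway_size : number of background genes annotated to the pathway
    selected     : size of the gene set being tested
    hits         : number of selected genes that fall in the pathway
    """
    max_hits = min(pathway_size, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy inputs, invented for illustration only (not results from the thesis).
deg_log2fc = {"KRAS": 2.1, "ATR": 1.4, "PARP1": 1.2, "PRSS1": -3.0, "CPA1": -2.5}
llm_genes = ["kras", " ATR", "PARP1", "TP53"]
toy_pathway = {"ATR", "PARP1", "CHEK1", "BRCA1"}  # hypothetical membership
universe_size = 20  # hypothetical background size

supported = intersect_llm_with_degs(llm_genes, deg_log2fc)
hits = len(set(supported) & toy_pathway)
p_value = ora_p_value(hits, len(toy_pathway), len(supported), universe_size)
print(sorted(supported), hits, p_value)
```

In practice the background would be the set of genes tested for differential expression, and p-values across pathways would need multiple-testing correction.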
Committee Chair
Fuhai Li
Committee Members
Zachary Abrams, Dan Moran
Degree
Master of Science (MS)
Author's Department
Biomedical Engineering
Document Type
Thesis
Date of Award
Winter 12-17-2025
Language
English (en)
Recommended Citation
Xu, Zixi, "Integrating Large Language Models and Single-Cell Omics Analysis for Target Discovery in Pancreatic Ductal Adenocarcinoma" (2025). McKelvey School of Engineering Theses & Dissertations. 1304.
https://openscholarship.wustl.edu/eng_etds/1304