ORCID
https://orcid.org/0000-0001-7286-0024
Date of Award
8-16-2023
Degree Name
Doctor of Philosophy (PhD)
Degree Type
Dissertation
Abstract
The growth of untargeted metabolomic profiling technology has prompted the launch of many large cohort studies. Metabolomics serves as an informative complement to other, more established ’omics (e.g., genomics, transcriptomics, proteomics). Because metabolomics is a younger discipline, however, its experimental and computational workflows are not standardized and are at times inadequate for large-scale studies. Two primary limitations in the analysis of untargeted metabolomics data are 1) the poor performance and scalability of the algorithms applied to detect metabolite signals in the raw LC/MS data and 2) inefficient and arduous metabolite identification, the process of determining the biochemical structures that correspond to the detected LC/MS peaks. Identification of all metabolites is critical for enabling systems-level analyses and inferences into metabolic mechanisms. These challenges have limited large studies to targeted analyses, which have considerably lower computational overhead but restrict the biological insights that can be gleaned. Further, because metabolite identification is difficult, the traditional bioinformatic workflow for untargeted metabolomics attempts to identify only those signals whose statistical significance exceeds a chosen cutoff, thereby reducing the number of metabolites that need to be identified. However, systems-biology approaches like pathway mapping and multi-omics integration require structural identities of both statistically significant and non-significant metabolites for meaningful results. Here I describe an alternative workflow that leverages pooled-reference samples and computational tools to automate the processing of large-scale metabolomics data, including the steps of metabolite detection, curation, and identification. Thorough analysis of a pooled-reference sample allows the metabolite signals relevant to a particular study to be determined, curated, and identified. 
These metabolite signals can then be extracted from the raw data at a much lower computational cost, and the metabolite abundances can be normalized to remove batch effects and other sources of technical variability. Together, these advances enable an improved metabolomics workflow that scales to arbitrary sample numbers. We applied such a workflow to a study of COVID-19 severity and discovered lipid metabolites that provide prognostic value for predicting disease course. Beyond this application, this dissertation details a computational approach to another advanced metabolomics workflow that involves stable isotopes and spatially resolved metabolite measurements. These examples demonstrate the importance of computation to the processing and interpretation of metabolomics data and highlight the biological insights into human health and disease that metabolomics offers.
Language
English (en)
Chair and Committee
Gary Patti
Recommended Citation
Stancliffe, Ethan, "Automated Processing of Metabolomics Data for Population-scale Studies" (2023). Arts & Sciences Electronic Theses and Dissertations. 3128.
https://openscholarship.wustl.edu/art_sci_etds/3128