Abstract
Bioproduction in microbial hosts faces two main challenges: (1) the metabolism of each species is unique and difficult to generalize, so it must be precisely quantified before metabolic engineering; and (2) past experimental data are fragmented across studies, making it difficult to combine these non-standardized data for future experimental design. These challenges lead to high costs and risks in scaling up biomanufacturing processes, where R&D requires substantial budgets and iterative experiments are expensive. This thesis addresses both challenges through two research pillars. Part I uses 13C metabolic flux analysis (MFA) and genome-scale modeling (GSM) to investigate metabolic limitations in bioproduction hosts. In Escherichia coli producing silk fibroin, a toxic positive feedback loop was identified in which acetate overflow inhibits protein synthesis and reduces TCA cycle flux. Supplementing key amino acids can help meet precursor demand and alleviate thermodynamic constraints. In the oleaginous yeast Yarrowia lipolytica producing polyhydroxybutyrate (PHB) from volatile fatty acids (VFAs), 13C MFA and GSM identified disadvantages of acetate metabolism, including high carbon loss (>50% as CO₂), high enzyme usage, and NADPH limitation. Co-utilization with glucose can mitigate these problems by providing reducing power and alleviating thermodynamic constraints. In Lipomyces tetrasporus, 13C MFA and dynamic labeling showed robust TCA cycle activity and NADH production during acetate metabolism. This strain was engineered to produce malate from sustainable alternative feedstocks such as VFAs in a microbe-friendly CO₂-fixation electrolyte and corn stover hydrolysate. These studies provide a mechanistic foundation for rationally engineering microbial hosts. Part II introduces artificial intelligence (AI) and large language model (LLM) tools for knowledge mining and bioprocess optimization.
The generative AI model GPT-4 was used to extract structured datasets from 176 synthetic biology publications, enabling machine learning (ML) models to predict fermentation titers in Y. lipolytica with high accuracy (R² = 0.86). Transfer learning extended the ML models to nonconventional yeasts such as Rhodosporidium toruloides. To expand AI/LLM tools to general bioproduction topics, the NEKO (Network for Knowledge Organization) workflow was developed for knowledge mining; it generates knowledge graphs and actionable summaries. NEKO streamlines tasks such as literature review, hypothesis generation, and experimental design. Using the open-source LLM Qwen augmented with PubMed search, NEKO outperforms proprietary LLMs such as GPT-4 in zero-shot Q&A. By automating data standardization and hypothesis generation, AI/LLM tools reduce the risks associated with fragmented datasets and accelerate R&D cycles for biomanufacturing. Together, this work establishes a framework for studying microbial metabolism. By combining mechanistic and data-driven approaches, this thesis advances next-generation bioproduction.
Committee Chair
Yinjie Tang
Committee Members
Fuzhong Zhang; Joshua Yuan; Shulin Chen; Yixin Chen
Degree
Doctor of Philosophy (PhD)
Author's Department
Energy, Environmental & Chemical Engineering
Document Type
Dissertation
Date of Award
8-18-2025
Language
English (en)
DOI
https://doi.org/10.7936/we86-9r90
Recommended Citation
Xiao, Zhengyang, "Harnessing Metabolic Modeling and Artificial Intelligence for Next-Generation Bioproduction" (2025). McKelvey School of Engineering Theses & Dissertations. 1285.
The definitive version is available at https://doi.org/10.7936/we86-9r90