Abstract

Bioproduction in microbial hosts faces two main challenges: (1) the metabolism of each species is unique and difficult to generalize, so it must be precisely quantified before metabolic engineering; and (2) past experimental data are fragmented across studies, making it difficult to combine these non-standardized data for future experimental design. These challenges raise the cost and risk of scaling up biomanufacturing processes, where R&D requires substantial budgets and iterative experiments are expensive. This thesis addresses both challenges through two research pillars. Part I uses 13C metabolic flux analysis (MFA) and genome-scale modeling (GSM) to investigate metabolic limitations in bioproduction hosts. In Escherichia coli producing silk fibroin, a toxic positive feedback loop was identified in which acetate overflow inhibits protein synthesis and reduces TCA cycle flux. Supplementing key amino acids can help meet precursor demand and alleviate thermodynamic constraints. In the oleaginous yeast Yarrowia lipolytica producing polyhydroxybutyrate (PHB) from volatile fatty acids (VFAs), 13C MFA and GSM identified disadvantages of acetate metabolism, including high carbon loss (>50% as CO₂), high enzyme usage, and NADPH limitation. Co-utilization with glucose can mitigate these problems by providing reducing power and alleviating thermodynamic constraints. In Lipomyces tetrasporus, 13C MFA and dynamic labeling showed robust TCA cycle activity and NADH production during acetate metabolism. This strain was engineered to produce malate from sustainable alternative feedstocks such as a microbe-friendly, VFA-containing CO₂-fixation electrolyte and corn stover hydrolysate. These studies provide a mechanistic foundation for rationally engineering microbial hosts. Part II introduces artificial intelligence (AI) and large language model (LLM) tools for knowledge mining and bioprocess optimization. The generative AI model GPT-4 was used to extract structured datasets from 176 synthetic biology publications, enabling machine learning (ML) models to predict fermentation titers in Y. lipolytica with high accuracy (R² = 0.86). Transfer learning extended the ML approach to nonconventional yeasts such as Rhodosporidium toruloides. To expand AI/LLM tools to general bioproduction topics, the NEKO (Network for Knowledge Organization) workflow was developed for knowledge mining; it generates knowledge graphs and actionable summaries. NEKO streamlines tasks such as literature review, hypothesis generation, and experimental design. Using the open-source LLM Qwen augmented with PubMed search, NEKO outperforms proprietary LLMs such as GPT-4 in zero-shot question answering. By automating data standardization and hypothesis generation, AI/LLM tools reduce the risks associated with fragmented datasets and accelerate R&D cycles for biomanufacturing. Together, this work establishes a framework for researching microbial metabolism. By combining mechanistic and data-driven approaches, this thesis advances next-generation bioproduction.

Committee Chair

Yinjie Tang

Committee Members

Fuzhong Zhang; Joshua Yuan; Shulin Chen; Yixin Chen

Degree

Doctor of Philosophy (PhD)

Author's Department

Energy, Environmental & Chemical Engineering

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

8-18-2025

Language

English (en)

Available for download on Saturday, August 15, 2026
