ORCID
https://orcid.org/0009-0007-1288-3700
Date of Award
Spring 5-2025
Degree Name
Master of Science (MS)
Degree Type
Thesis
Abstract
A fundamental issue in mapping regulatory networks between transcription factors (TFs) and their target genes is the poor overlap between the set of genes bound by a given TF and the set of genes that are differentially expressed after knocking out or overexpressing the same TF. We began with the hypothesis that, to predict whether a gene will respond to perturbation of a TF, it is important to consider not only whether that TF is bound at the gene’s promoter, but also whether other TFs bind at the same promoter.
In this work, we propose a novel modeling procedure to better predict gene expression changes in Saccharomyces cerevisiae following TF overexpression by considering the binding data of the perturbed TF and additional pairwise TF-TF interactions.
Using binding data from Calling Cards experiments and perturbation response data from the McIsaac ZEV overexpression dataset, we created 101 models for predicting which genes will respond to perturbations of 101 different TFs. The input features for these models, which are linear in their parameters, include the binding profile of the perturbed TF and 119 interaction terms between the perturbed TF and each other TF. A three-step pipeline employing bootstrapping and nested cross-validated LASSO modeling was used to identify high-confidence predictors that affect the perturbation response in a consistent direction.
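The kind of model described above can be illustrated with a minimal sketch. It is not the thesis's three-step pipeline: it assumes a genes-by-TFs binding matrix, uses a small hand-written coordinate-descent LASSO with a fixed penalty `alpha` in place of nested cross-validation, and keeps features whose coefficient has the same sign in most bootstrap resamples. The function names, the 0.95 threshold, and the default settings are all hypothetical choices made for illustration.

```python
import numpy as np

def build_features(binding, perturbed):
    """Design matrix: binding of the perturbed TF plus its product with each other TF.

    binding   : (n_genes, n_tfs) array of binding signals (assumed layout)
    perturbed : column index of the perturbed TF
    """
    main = binding[:, [perturbed]]                  # shape (n_genes, 1)
    others = np.delete(binding, perturbed, axis=1)  # shape (n_genes, n_tfs - 1)
    return np.hstack([main, main * others])         # main effect, then interactions

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha * ||b||_1.

    Assumes X and y are already centered, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]              # remove feature j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]              # add the updated contribution back
    return beta

def stable_features(X, y, alpha, n_boot=100, threshold=0.95, seed=0):
    """Indices of features whose LASSO coefficient keeps one sign across bootstraps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pos = np.zeros(p)
    neg = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample genes with replacement
        Xb = X[idx] - X[idx].mean(axis=0)
        yb = y[idx] - y[idx].mean()
        beta = lasso_cd(Xb, yb, alpha)
        pos += beta > 0
        neg += beta < 0
    consistent = np.maximum(pos, neg) / n_boot      # fraction sharing the majority sign
    return np.flatnonzero(consistent >= threshold)
```

In this sketch, column 0 of the design matrix is the perturbed TF's own binding and the remaining columns are its interaction terms, so a returned index of 1 or higher points to an interaction with another TF. In practice the penalty would be tuned by cross-validation rather than fixed.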
Our findings suggest that these TF interactions often provide greater explanatory power than individual binding signals alone. Additionally, several recovered interaction terms align with known biological interactions, such as GCR2:TYE7 and FKH1:FKH2, supporting the validity of our approach in identifying both known and novel regulatory relationships. These results provide additional support for proposed regulatory mechanisms and offer directions for future exploration. Thus, our work introduces a robust procedure for identifying biologically meaningful TF-TF interactions and improving the predictability of gene expression from TF binding data.
Language
English (en)
Chair
Michael Brent
Committee Members
Tao Ju, Roman Garnett