Doctor of Philosophy (PhD)
A central goal in systems biology is to accurately map the transcription factor (TF) network of a cell. Such a network map is a key component for many downstream applications, from developmental biology to transcriptome engineering, and from disease modeling to drug discovery. Building a reliable network map requires a wide range of data sources, including TF binding locations and gene expression data collected after direct TF perturbations. However, two roadblocks remain. First, rich resources are available only for a few well-studied systems and cannot be easily replicated for new organisms or cell types. Second, when TF binding and TF-perturbation response data are available, they rarely converge on a common set of direct and functional targets for a TF. This dissertation explores and validates the best combination of experimental and analytic techniques to map TF networks. First, we introduce an unsupervised inference algorithm that maps TF networks by exploiting only gene expression and genome sequence data. We show that our “data light” method is more accurate at identifying direct targets of TFs than other similar methods. Second, we develop an optimization method to search for a convergent set of target genes that are independently identified by the binding locations and the perturbation responses of each TF. Combining this method with network inference greatly expanded the high-confidence network maps, especially when applied to datasets obtained with recently developed experimental methods. Third, we describe a framework for predicting each gene’s responsiveness to a TF perturbation from genomic features. Using this framework, we found that the major determinants of TF-perturbation responsiveness are properties of each gene that are independent of the perturbed TF. This finding may lead to improvements in network mapping algorithms that exploit TF perturbation responses.
Overall, this dissertation provides a scalable framework for mapping high-quality TF networks for a variety of organisms and cell types.
Jeremy Buhler, Roman Garnett, Scott McIsaac, Robi Mitra,