Author's School

Graduate School of Arts & Sciences

Author's Department/Program

Biology and Biomedical Sciences: Computational and Systems Biology


English (en)

Date of Award

Winter 1-1-2012

Degree Type


Degree Name

Doctor of Philosophy (PhD)

Chair and Committee

Barak A Cohen


Regulation of gene expression is a fundamental process in biology. Accurate mathematical models of the relationship between regulatory sequence and observed expression would advance our understanding of biology.

I developed ReLoS, a regulatory logic simulator, to explore mathematical frameworks for describing the relationship between regulatory sequence and observed expression and to explore methods of learning combinatorial regulatory rules from expression data. ReLoS is a flexible simulator allowing a variety of formalisms to be applied. ReLoS was used to explore the question of how complex rules of combinatorial transcriptional regulation must be to explain the complexity of transcriptional regulation observed in biology. A previously published dataset was analyzed for regulatory elements that explained the behavior of regulatory modules for 254 genes in 255 conditions. I found that ReLoS was able to recapitulate a reasonable fraction of the variation (mean gene-wise correlation of 0.7) with only 12 combinatorial rules comprising 13 cis-regulatory elements. This result suggested that learning the combinatorial rules of transcriptional regulation should be possible.
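For readers unfamiliar with the summary statistic quoted above, the following is a minimal sketch of how a mean gene-wise correlation could be computed. It assumes the Pearson correlation between each gene's observed and predicted expression profile across conditions, averaged over genes; the abstract does not specify the correlation measure, and the function names and data layout here are hypothetical, not taken from ReLoS.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_genewise_correlation(observed, predicted):
    """Average, over genes, of the correlation between observed and
    predicted expression across conditions.

    observed, predicted: dict mapping gene name -> list of expression
    values, one per condition (hypothetical layout for illustration).
    """
    correlations = [pearson(observed[g], predicted[g]) for g in observed]
    return sum(correlations) / len(correlations)
```

Each gene contributes one correlation regardless of its expression magnitude, so the mean reflects how well the shape of each profile is captured rather than its absolute level.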

State ensemble statistical thermodynamic models are a class of models used to describe combinatorial transcriptional regulation. One way to parameterize these models is measuring the expression of a reporter gene driven by many similar promoters. Models parameterized in this fashion do better at explaining the sequence-to-expression relationship, but fail to distinguish between multiple biological mechanisms that give rise to equivalent expression results in the synthetic promoters, thus limiting the generalizability of the models. I developed a ChIP-based strategy for quantitatively measuring the relative occupancy of transcription factors on synthetic promoters. These data complement existing methods for obtaining expression data from the same promoters. Comparison of models parameterized with only expression, only occupancy, or expression and occupancy reveals specific biological details that are missed when considering only expression data. In particular, the occupancy data suggest that differential regulatory effects of Cbf1 in glucose versus amino acid are a function of how it interacts with polymerase rather than changes in concentration or binding affinity. Additionally, the occupancy data suggest that Gcn4 binds in a cooperative manner and that Gcn4 occupancy is adversely affected by the presence of a nearby Nrg1 site. Finally, the occupancy data and expression data taken together suggest that Gcn4 binds in competition with another transcription factor.
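As an illustration of the general model class, the following is a minimal sketch of a state ensemble thermodynamic model. It is not the dissertation's implementation. It assumes each binding site is either bound or unbound, polymerase is either bound or unbound, every state receives a Boltzmann-style weight, expression is taken as proportional to the probability that polymerase is bound, and interactions among transcription factors are omitted. All names and parameters (`sites`, `q_tf`, `q_pol`, `w_pol`) are hypothetical.

```python
from itertools import product

def ensemble(sites, q_tf, q_pol, w_pol):
    """Enumerate promoter states and return the probability that
    polymerase is bound together with per-site occupancies.

    sites: list of TF names, one per binding site (hypothetical layout)
    q_tf:  dict TF -> dimensionless weight (concentration x binding constant)
    q_pol: weight of polymerase bound to the promoter on its own
    w_pol: dict TF -> interaction factor with polymerase
           (>1 favors polymerase binding, <1 disfavors it)
    """
    z_total = 0.0                     # partition function
    z_pol = 0.0                       # summed weight of polymerase-bound states
    z_site = [0.0] * len(sites)       # summed weight of states with site i bound

    # Each state is a tuple of 0/1 flags: one per site, then polymerase.
    for state in product([0, 1], repeat=len(sites) + 1):
        *bound, pol = state
        weight = q_pol if pol else 1.0
        for tf, is_bound in zip(sites, bound):
            if is_bound:
                weight *= q_tf[tf]
                if pol:
                    weight *= w_pol[tf]
        z_total += weight
        if pol:
            z_pol += weight
        for i, is_bound in enumerate(bound):
            if is_bound:
                z_site[i] += weight

    return z_pol / z_total, [z / z_total for z in z_site]
```

Expression and occupancy are both derived from the same partition function but weight the states differently. In a model of this kind, different parameter combinations can yield similar polymerase-bound probabilities while predicting different site occupancies, which is one way to see why occupancy measurements could add constraints that expression measurements alone do not.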

Synthesizing disparate sources of information improved understanding of the mechanics of transcriptional regulation of the synthetic promoters and was largely successful in decoupling the DNA binding energies from the TF interactions with polymerase. However, the results suggest that more sophisticated models of the relationship between occupancy and expression may be required in at least some cases. Incorporating different sources of data into models of regulation will continue to be important for learning the biological specifics that drive expression changes.


This work is not available online per the author’s request. For access information, please contact or visit

Permanent URL: