Date of Award

Summer 8-15-2015

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Abstract

Transcription is regulated through interactions between regulatory proteins, such as transcription factors (TFs), and DNA sequence. TFs are known to act combinatorially in some cases to regulate transcription, but in which situations and to what degree remains unclear.

I first studied the contribution of TF binding sites to expression in mouse embryonic stem (ES) cells by using synthetic cis-regulatory elements (CREs). The synthetic CREs were composed of combinations of binding sites for the pluripotency TFs Oct4, Sox2, Klf4, and Esrrb. A statistical thermodynamic model explained 72% of the variation in expression driven by these CREs. The high predictive power of this model depended on five TF interaction parameters, including favorable heterotypic interactions between Oct4 and Sox2, Klf4 and Sox2, and Klf4 and Esrrb. The model also included two unfavorable homotypic interaction parameters. These homotypic parameters help explain why synthetic CREs with mixtures of binding sites for various TFs drive much higher expression than CREs with multiple binding sites for the same TF. I then found that the expression driven by these synthetic CREs changes substantially as ES cells differentiate down the neural lineage. However, CREs with no repeated binding sites drove similar levels of expression in both conditions, suggesting that heterotypic interactions may be similar in the two conditions.

In a separate set of experiments, I interrogated the determinants of expression driven by genomic sequences previously segmented into classes based on chromatin features. A set of these sequences was assayed in K562 cells. As expected, we found that Enhancers and Weak Enhancers drove expression over background, while Repressed elements and Enhancers from another cell type did not. Unexpectedly, we found that Weak Enhancers drove higher expression than Enhancers, possibly because of their lower levels of H3K36me3 and H3K27ac, marks that we found to be weakly associated with lower expression. Using a logistic regression model, we showed that matches to TF binding motifs were best able to predict active sequences, but chromatin features contributed significantly as well.

These results demonstrate that interactions between certain combinations of pluripotency TFs, but not all combinations, are important for transcriptional regulation. Furthermore, chromatin modifications can still contribute to predictions of expression even after accounting for binding site motifs. A better understanding of cis-regulation will allow us to predict which sequences can drive expression and how perturbations affect that expression.


Language

English (en)

Chair and Committee

Barak A Cohen

Committee Members

Gary Stormo, Justin Fay, Joe Dougherty, Andrew Yoo


Permanent URL:

Included in

Biology Commons