Date of Award

Spring 5-15-2018

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Gene regulation involves the integration of different sources of information at multiple levels. The action of transcription factors is integrated at cis-regulatory sequences (CRSs). Information from many CRSs is combined to drive spatiotemporally regulated gene expression. Prediction of CRS activity from DNA sequence is challenging because most occurrences of transcription factor binding sites (TFBS) are not functional. I assayed the activity of thousands of genomic sequences with Activator Protein 1 (AP-1) binding sites in K562 cells to identify features in flanking sequences that distinguish functional from non-functional TFBS. I find that sequence features directly adjacent to the AP-1 core motif, within 10 bp, distinguish high-activity from low-activity AP-1 sites. Some nearby features are motifs for other TFs that genetically interact with the AP-1 site. The features with the most predictive power are extensions of the AP-1 core motif, which likely represent matches to multiple AP-1 family members. Computational models trained on these data with DNA sequence features can distinguish between sequences with high- and low-activity AP-1 sites, and also predict the impact of mutations in AP-1 core sites and their flanks. I also developed a new method, patchMPRA (parallel targeting of chromosome positions by MPRA), to study how CRS activity is integrated with regional effects at different genomic locations. PatchMPRA measures the activity of hundreds of different elements, all integrated at many specific genomic locations one at a time. We find that while the activity of an integrated reporter is largely determined by genomic position, the same sequences drive very high or very low activity at all chromosomal locations tested. A model of regional effect and intrinsic activity of cis-regulatory elements can explain most of our data, without any interaction terms.
Our results suggest a modular organization of the genome, where local cis-regulatory elements and surrounding chromatin interact non-specifically. This work demonstrates the utility of high-throughput reporter assays and computational modeling in decoding gene regulation at multiple levels.
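
The "no interaction terms" model mentioned in the abstract can be illustrated with a two-way additive fit. The sketch below is only an illustration under stated assumptions, not the dissertation's actual analysis: it assumes activity is measured on a log scale for every element at every location (a balanced design), and it uses synthetic data with hypothetical sizes (4 locations, 200 elements) and hypothetical names such as fit_additive.

```python
import numpy as np

def fit_additive(log_activity):
    """Fit log_activity[i, j] ~ mu + position[i] + element[j], with no interaction term.

    For a complete (balanced) locations x elements matrix, the least-squares
    estimates are the row and column means centered on the grand mean.
    """
    mu = log_activity.mean()
    position = log_activity.mean(axis=1) - mu  # one regional effect per genomic location
    element = log_activity.mean(axis=0) - mu   # one intrinsic effect per element
    predicted = mu + position[:, None] + element[None, :]
    ss_res = ((log_activity - predicted) ** 2).sum()
    ss_tot = ((log_activity - mu) ** 2).sum()
    return position, element, 1.0 - ss_res / ss_tot

# Synthetic data only (an assumption for illustration, not measured values).
rng = np.random.default_rng(0)
true_position = rng.normal(0.0, 1.0, size=4)
true_element = rng.normal(0.0, 1.0, size=200)
noise = rng.normal(0.0, 0.2, size=(4, 200))
data = true_position[:, None] + true_element[None, :] + noise

position, element, r2 = fit_additive(data)
print(f"fraction of variance explained without interaction terms: {r2:.3f}")
```

If element and location interacted specifically, the residuals of this fit would carry structure and the explained fraction would fall; a high value is what the abstract's description of non-specific interaction would predict.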


Language

English (en)

Chair and Committee

Barak A. Cohen

Committee Members

James J. Havranek, Robi D. Mitra, Zachary Pincus, Gary D. Stormo


Permanent URL: