Date of Award
Doctor of Philosophy (PhD)
Understanding how genotype leads to phenotype is key to understanding both the development and dysfunction of complex organisms. In the context of regulating the gene expression patterns that contribute to cell identity and function, the goal of my thesis research is to understand how changes in genome sequence may impact gene expression by determining how sequence features contribute to regulatory potential. To accomplish this goal, I first leveraged the key regulatory role of pluripotency transcription factors (TFs) in mouse embryonic stem cells (mESCs) and tested synthetically generated and genomically identified combinations of binding sites for four TFs: OCT4, SOX2, KLF4, and ESRRB. I found that although the position of binding sites explained 87% of the variation in expression observed for synthetic elements, it did not explain the expression of tested genomic sequences, despite roughly similar binding site composition. Instead, for genomic sequences I found that the quality and spacing of the binding sites contribute more to distinguishing active sequences, suggesting that the arrangements of binding sites are less important for controlling expression in mESCs.
In a separate set of experiments, I used GM12878, a commonly used human immune progenitor cell culture model, to test regions of the human genome that were assigned a regulatory function based on chromatin features and predicted to have high to low probabilities of being under selection. Although only a quarter of the library was assigned as ‘Repressive’ according to chromatin marks, 45% of tested sequences showed repressive activity. Sequences predicted to have high probabilities of being under selection have a small but significantly higher average level of activation, but not a higher likelihood of either repression or activation. By making single substitutions found at those loci in human populations for a subset of sequences, I tested the predictive power of two independent programs that aim to integrate both functional annotations and evolutionary signals. I found that neither set of predictions enriched for variants that impacted regulatory activity. This suggests that although we can survey human genotypes for impacts on regulation, it may be difficult to separate organismal-level selection from other processes that contribute to the proper control of gene expression.
These results demonstrate that in mESCs, the fixed affinity and fixed spacing found in synthetic combinations of binding sites are unlikely to predict the activity of genomic sequences. Furthermore, testing sequences from the human genome in GM12878 shows that repression may be more prevalent than estimated by chromatin features alone and that predictions of selection do not enrich for human variants that impact regulatory activity. Together, these experiments demonstrate that the relationship between genotype and proper regulatory function is complex and that understanding this relationship is important for understanding both subtle and severe impacts on phenotype.
Chair and Committee
Barak A. Cohen
Joseph Dougherty, Kristen M. Naegle, Tim Schedl, Cristina de Guzman Strong
King, Dana Michele, "Grammar and Variation: Understanding How cis-Regulatory Information is Encoded in Mammalian Genomes" (2018). Arts & Sciences Electronic Theses and Dissertations. 1701.