Date of Award

Spring 5-15-2017

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Gene expression is driven by specific combinations of transcription factors binding to regulatory sequences to define cell type expression profiles. Changes in DNA sequence alter transcription factor binding affinities and gene expression, and DNA methylation is an additional source of variation that is maintained throughout cellular division. Numerous genomic studies are underway to determine which genes are abnormally regulated by DNA methylation in disease. However, we have a poor understanding of how disease-specific methylation variation affects expression. Global DNA demethylation agents have been clinically approved for use in cancer, which has spurred interest in identifying the genes that would be most susceptible to targeted demethylation therapies. In this work, I developed multiple tools to increase our knowledge of the relationship between methylation and gene expression in both tissue specificity and disease. I first developed a computational strategy to identify amplifications and deletions from restriction enzyme-based methylation datasets. In a model of endocrine therapy-resistant breast cancer, I identified ESR1 as the most amplified genomic region in response to estrogen deprivation. I developed a qPCR-based assay to probe the amplification in cell lines, formalin-fixed, paraffin-embedded samples, patient tumors, and xenograft samples. These data are consistent with the hypothesis that, in a subset of patients, the ESR1 amplification results in increased levels of ER. These increased ER levels arise in response to estrogen deprivation and sensitize breast cancer cells to the low quantities of estrogen available for cellular growth. Next, to explain specific variation in methylation that associates with expression change in both disease and tissue specificity, I developed an integrative analysis tool, Methylation-based Gene Expression Classification (ME-Class). This model captures the complexity of methylation changes around a gene promoter.
Using whole-genome bisulfite sequencing and RNA-seq datasets from different tissue samples, ME-Class significantly outperforms published methods that use methylation to predict differential gene expression. To demonstrate its utility, I used ME-Class to analyze different hematopoietic cell types and found that expression-associated methylation changes occurred predominantly when comparing cells from distantly related lineages, implying that changes in the cell’s transcriptional program precede associated methylation changes. Training ME-Class on normal-tumor pairs indicated that cancer-specific expression-associated methylation changes differ from tissue-specific changes. I further show that ME-Class can detect functionally relevant, cancer-specific, expression-associated methylation changes that are reversed upon the removal of methylation in a model of colon cancer. Lastly, I extended ME-Class to incorporate 5-hydroxymethylcytosine (5hmC) and uncovered gene regulatory logic involving 5hmC and 5-methylcytosine (5mC) in mammalian development and disease. As more large-scale, genome-wide, differential DNA methylation studies become available, tools such as ME-Class will prove invaluable for understanding how specific methylation changes affect transcription. Our results show that this toolset can identify genes that are dysregulated by methylation in disease and could be used to facilitate the identification of patients who may benefit from clinically approved demethylating therapeutics.


English (en)

Chair and Committee

John R. Edwards

Committee Members

Gary D. Stormo, Tao Ju, Eugene M. Oltz, Christopher A. Maher


Permanent URL: https://doi.org/10.7936/K7H41PWK