Date of Award

Spring 5-15-2020

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



A single genome can give rise to phenotypically distinct cell types through epigenetic modifications that instruct specific gene expression patterns. Histone modifications, DNA methylation, and DNA hydroxymethylation are the most common epigenetic modifications. To understand how these modifications regulate gene expression, one often needs to map them genome-wide with profiling methods. First, for histone modifications, the Roadmap Epigenomics Consortium generated the Human Reference Epigenome Map, which contains thousands of genome-wide histone modification datasets describing the epigenomes of a variety of human tissues and cell types. This map has given investigators a much deeper and more comprehensive view of the regulatory genome, e.g., by defining regulatory elements, including all promoters and enhancers, for a given tissue or cell type. An outstanding task is to combine and compare different epigenomes in order to identify regions with epigenomic features specific to certain tissues or cell types, e.g., lineage-specific regulatory elements. Currently available tools do not directly address this question. This need motivated us to develop a tool that allows investigators to easily identify regions with epigenetic features unique to the epigenomes they choose, making detection of common and/or cell type-specific regulatory elements an interactive and dynamic experience. We developed an online tool, EpiCompare, to assist investigators in exploring the specificity of epigenomic features across selected tissues and cell types. Investigators can design their test by choosing different combinations of epigenomes and different classification algorithms provided by the tool. EpiCompare then identifies regions with the specified epigenomic features and provides a quality assessment of the predictions.
Investigators can interact with EpiCompare by exploring Roadmap Epigenomics data or by uploading their own data for comparison. We demonstrated that specific combinations of epigenomes can be used to detect developmental lineage-specific enhancers. Second, for DNA methylation and hydroxymethylation, generating high-resolution methylomes and hydroxymethylomes is a significant barrier for individual laboratories; as a result, only a few cell types so far have deeply sequenced hydroxymethylomes at single-base resolution. This cost barrier created a need for cost-effective but high-resolution 5hmC mapping technology. Current enrichment-based technologies are cheap but provide only low-resolution, relative measures of 5hmC enrichment, while single-base-resolution methods can be prohibitively expensive to scale up to large experiments. To address this problem, we developed a deep learning-based method, “DeepH&M”, which integrates enrichment and restriction enzyme sequencing methods to simultaneously estimate absolute hydroxymethylation and methylation levels at single-CpG resolution. Using 7-week-old mouse cerebellum data to train the DeepH&M model, we demonstrated that the 5hmC and 5mC levels predicted by DeepH&M were in high concordance with those from whole-genome bisulfite-based approaches. The DeepH&M model could also be applied to 7-week-old frontal cortex and 79-week-old cerebellum, demonstrating the robust generalizability of this method to other tissues and biological time points.


English (en)

Chair and Committee

Ting Wang

Committee Members

Jeremy Buhler, Michael Brent, Harrison Gabel, Nan Lin

Included in

Genetics Commons