Date of Award

Winter 1-15-2021

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Molecular Genetics & Genomics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Human cancer is a complex and dynamic disease with mutational, spatial, and temporal heterogeneity. There has been a concerted effort to address this heterogeneity by profiling large-scale multi-omic datasets, with an emphasis on abnormalities such as DNA mutations in coding regions and structural variants. However, a systematic analysis of alterations in the “dark matter” is lacking, including DNA slippage events at microsatellite sequences, gene fusions arising from genomic rearrangements, and aberrant DNA methylation at promoter regions. This dissertation focuses on developing data-driven computational analysis pipelines to study the patterns of microsatellite instability, gene fusions, and aberrant DNA methylation, and their functional impacts within and across cancer types, using an integrative systems-biology approach that combines multi-dimensional genomic, epigenomic, proteomic, and clinical data. First, we estimated the mutational load and microsatellite instability (MSI) status of 10,980 tumors across 33 cancer types from The Cancer Genome Atlas (TCGA). Beyond the well-characterized canonical MSI-prone tumor types, we identified additional MSI-high tumors in non-canonical MSI tumor types and validated them in an independent data set from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). A survey of the 993 CPTAC tumors across 7 cancer types revealed that tumors with high MSI were associated with a high number of predicted microsatellite-derived neoantigens, suggesting that the aberrant expansion and deletion of microsatellite sequences is immunogenic even in non-canonical MSI-high tumors. Next, we focused on investigating gene fusions in 9,624 TCGA tumor samples across 33 cancer types, predicted by multiple RNA-sequencing-based fusion-calling tools and validated by an orthogonal whole-genome-sequencing-based approach.
We demonstrated that gene fusions are mutually exclusive with other driver mutations in most cancer types and function as the sole driver in more than 1% of cancer cases. Lastly, we leveraged the complementary nature of RNA-seq and proteomic data to identify aberrant DNA methylation leading to both transcriptional and translational changes. The integrated multi-omic profiling cataloged epigenomic aberrations in 506 CPTAC tumors across five cancer types, highlighting key changes to driver genes that affect cancer hallmark pathways in a coordinated manner. Overall, our systematic pan-cancer studies uncover determinants and consequences of genetic and epigenetic variation beyond conventional mutational profiling, revealing potential new disease mechanisms and therapeutic opportunities.


Language

English (en)

Chair and Committee

Li Ding

Committee Members

Christopher Maher, Nancy Saccone, Tim Schedl, Jieya Shao,