This item is under embargo and not available online per the author's request. For access information, please visit


Date of Award

Winter 12-15-2018

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Molecular Genetics & Genomics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Accurate interpretation of cancer mutations in individual tumors is a prerequisite for precision medicine. Large-scale sequencing studies, such as The Cancer Genome Atlas (TCGA) project, have worked to address the functional consequences of genomic mutations, with the larger goal of determining the underlying mechanisms of cancer initiation and progression. Many studies have focused on characterizing non-synonymous somatic mutations that alter amino acid sequence, as well as splice-disrupting mutations at splice donors and acceptors. Current annotation methods typically classify a mutation as splice-disrupting if it falls on the consensus intronic dinucleotide of the splice donor (GT) or of the splice acceptor (AG). Splice site mutations as a group have been presumed to be invariably deleterious because they disrupt the conserved sequences used to identify exon-intron boundaries. While this classification method has been useful, increasing evidence suggests that splice site mutations can lead to transcriptional changes beyond simple disruption, and that many exonic mutations that act primarily through alternative splicing are still being overlooked in cancer genomics. My thesis work focuses on developing tools that systematically classify and functionally validate splice site and splice-creating mutations by integrating DNA and RNA-Sequencing data, so that the consequences of mutations on alternative splicing can be understood more accurately.
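The position-based rule described above can be illustrated with a short sketch. It assumes a hypothetical representation in which a transcript's exons are given as 1-based, inclusive genomic `(start, end)` pairs listed in transcript order (5' to 3'), and it checks only whether a mutation falls on the intronic +1/+2 donor or -2/-1 acceptor positions. It does not inspect the actual sequence, and it is not the implementation of any particular annotation tool.

```python
from typing import List, Optional, Tuple

# Hypothetical exon representation: (start, end), 1-based inclusive genomic
# coordinates, listed in transcript order (5' to 3').
Exon = Tuple[int, int]


def canonical_splice_site(pos: int, exons: List[Exon], strand: str) -> Optional[str]:
    """Return 'donor' or 'acceptor' if pos hits a canonical intronic
    dinucleotide position (GT donor / AG acceptor), otherwise None."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    n = len(exons)
    for i, (start, end) in enumerate(exons):
        if strand == "+":
            donor = {end + 1, end + 2}         # intron +1/+2 just after the exon
            acceptor = {start - 2, start - 1}  # intron -2/-1 just before the exon
        else:
            # On the minus strand the exon's 3' end is its lowest genomic coordinate.
            donor = {start - 1, start - 2}
            acceptor = {end + 1, end + 2}
        if i < n - 1 and pos in donor:  # the last exon has no downstream donor
            return "donor"
        if i > 0 and pos in acceptor:   # the first exon has no upstream acceptor
            return "acceptor"
    return None


if __name__ == "__main__":
    plus_exons = [(100, 200), (300, 400), (500, 600)]
    print(canonical_splice_site(201, plus_exons, "+"))  # donor
    print(canonical_splice_site(299, plus_exons, "+"))  # acceptor
    print(canonical_splice_site(203, plus_exons, "+"))  # None
```

A rule of this kind flags every hit as "disrupting" by position alone. That is exactly the assumption the thesis questions, since RNA-Seq can show a range of outcomes for the same class of mutation.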

First, we developed SpliceInator, a semi-automated tool to systematically detect splicing phenotypes using mutation and gene expression data. We interrogated 1,146 conserved splice site mutations across 19 cancer types, revealing a wide range of complex splicing phenotypes and emphasizing the importance of analyzing patient-specific RNA-Sequencing data. We then looked beyond the splice site by interrogating all mutations in a splicing context with MiSplice, performing the first large-scale discovery of splice-creating mutations (SCMs) across 8,656 TCGA tumors. We reported 1,964 originally mis-annotated mutations with clear evidence of creating novel splice junctions. Mutations in a subset of genes, including PARP1, BRCA1, and BAP1, were experimentally validated for splice-creating function using a mini-gene splicing assay. Notably, we found that neoantigens induced by SCMs are likely several-fold more immunogenic than those derived from missense mutations, as exemplified by the recurrent GATA3 SCM. Our work highlights the importance of integrating DNA and RNA data for understanding the functional and clinical implications of mutations in human diseases. Finally, to capture the full landscape of SCMs, we used MiSplice to explore both somatic and germline mutations for splice-site-creating function. Altogether, we have gathered a set of 2,888 SCMs, enabling us to compare the landscapes of rare germline and somatic SCMs. This compendium has also begun to elucidate novel genomic properties of mutations located at donor and acceptor splice sites and of SCM-containing exons, including an overall decrease in the size of the novel exon after mutation. This pattern mimics a natural evolutionary selective pressure that the cancer genome appears to exploit to maintain proper alternative splicing. To date, this is the first analysis comparing rare germline SCMs with somatic SCMs, and it reveals that both dysregulate the splicing code in cancer to a comparable degree.
Together, my thesis work revealed that splice-site-creating mutations play a much larger role in cancer than previously appreciated, and it further expands our understanding of the genetic basis by which mutations can alter the mRNA landscape by dysregulating alternative splicing. More broadly, my work calls for a deeper analysis of seemingly “silent” mutations in any disease, as such mutations may alter gene function via alternative splicing, and integrating RNA- and DNA-Seq data can enable accurate evaluation of mutations in a splicing context.


English (en)

Chair and Committee

Li Ding

Committee Members

Sergej Djuranovic, John Edwards, Christopher Maher, James Skeath


Permanent URL:

Available for download on Monday, January 04, 2021