Date of Award

Winter 12-15-2022

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Molecular Cell Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Next-generation sequencing of DNA and RNA continues to be integrated into pre-clinical and clinical research. However, challenges remain that impede the translation of findings into an improved understanding of human disease or into clinically actionable alterations. The projects described in this dissertation begin by compiling the efforts of experts who have identified druggable genes within the human genome, and continue with in-depth analyses that characterize the pan-cancer landscape of splice-associated mutations and the noncoding, regulatory mutations found across multiple subtypes of breast cancer.
Chapters 2 and 3 describe substantial efforts to update the content and user experience of the Drug-Gene Interaction Database (DGIdb, dgidb.org). Chapter 2 describes the substantially expanded catalog of druggable genes and anti-neoplastic drug-gene interactions included in DGIdb. Along with these content updates, the DGIdb codebase underwent major overhauls, including an updated user interface, preset interaction search filters, consolidation of interaction information into interaction groups, greatly improved search response times, and upgrades to the underlying web application framework. In addition, we expanded the API with new endpoints that allow users to extract more detailed information about queried drugs, genes, and drug-gene interactions, including listings of PubMed IDs, interaction types, and other interaction metadata. The updates described in Chapter 3 focus on the integration of DGIdb with crowdsourced efforts, leveraging the Drug Target Commons for community-contributed interaction data, Wikidata to facilitate term normalization, and export to NDEx for drug-gene interaction network representations. Since the previous major version release, seven new sources were added and 15 of the previously aggregated sources were updated. This update also improved the process of drug normalization and the grouping of imported sources. Other notable updates included the introduction of a more sophisticated Query Score for interaction search results, an updated Interaction Score, the inclusion of interaction directionality, and several additional improvements to search features, data releases, licensing documentation, and the application framework.
In Chapters 4 and 5, we discuss how comprehensive sequencing approaches were used to discover noncoding, regulatory mutations within 458 breast cancer samples. After extensive filtering, our analysis revealed significant mutation clustering within the noncoding space of RMRP and WDR74, as noted in previous studies, as well as in ~130 other genes not previously reported. Additionally, noncoding splice-associated mutations were discovered using RegTools.
In Chapter 6, we assessed the landscape of splice-associated mutations within patient tumor cohorts from The Cancer Genome Atlas (TCGA) and Washington University clinical cohorts. We developed and employed RegTools to identify significant splice-associated mutations and discovered 235,778 events in which a variant significantly increased the splicing of a particular junction, spanning 158,200 unique variants and 131,212 unique junctions. To characterize these somatic variants and their associated splice isoforms, we annotated them with the Variant Effect Predictor (VEP), SpliceAI, and Genotype-Tissue Expression (GTEx) junction counts, and compared our results to those of other tools that integrate genomic and transcriptomic data. We identified novel splice-associated variants and previously unreported patterns of splicing disruption in known cancer drivers, such as TP53, CDKN2A, and B2M, as well as in genes not previously considered cancer-relevant, such as RNF145.
This dissertation describes studies that address three challenges: the accessibility of information to researchers, the discovery of noncoding regulatory mutations, and the identification of splice-associated mutations with an open-source tool. Together, these studies aim to advance the dissemination of knowledge within the bioinformatics and cancer genomics communities, elucidate novel mechanisms of tumor biology, and identify potential therapeutic targets in cancer.


Language

English (en)

Chair and Committee

Malachi Griffith

Committee Members

Obi L Griffith


Included in

Genetics Commons