Date of Award

4-3-2024

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Colorectal cancer (CRC) is the most common gastrointestinal malignancy and a leading cause of cancer deaths in the United States. More than half of CRC patients develop metastatic disease (mCRC), which has a 13% average 5-year survival rate. Despite advances in our understanding of primary CRC oncogenesis and biology, the mechanisms of tumor metastasis remain poorly characterized. To date, CRC research has primarily focused on the dysregulation of protein-coding genes to identify oncogenes and tumor suppressors as potential diagnostic and therapeutic targets, thereby under-representing the critical role of many classes of non-coding RNAs (ncRNAs). One emerging class of ncRNAs, circular RNAs (circRNAs), has been reported to be associated with crucial hallmarks of tumorigenesis. CircRNAs are single-stranded, covalently closed RNA molecules produced from pre-messenger RNAs (pre-mRNAs) through backsplicing. Because it has no open ends, their circular structure is resistant to exonucleolytic decay. The resulting high stability makes circRNAs clinically significant as potential non-invasive biomarkers for cancer diagnosis and prognosis. However, their role in mCRC progression remains poorly characterized. Existing studies are limited by the available circRNA detection methods, by access to patient samples, and by the existing software for robust analyses. Therefore, this thesis research adopted a multi-omics approach to better understand how circRNAs are dysregulated in mCRC, while optimizing the bioinformatic toolkits for a broader circRNA research community. First, this research aimed to address the knowledge gap about the cell-type specificity of circRNAs to elucidate their functions in the tumor microenvironment (TME). To achieve this, we performed total RNA sequencing (RNA-seq) on 30 matched normal, primary, and metastatic samples from 14 mCRC patients. Additionally, five CRC cell lines were sequenced to construct a circRNA catalog in CRC.
We detected 47,869 circRNAs, with 51% previously unannotated in CRC, and 14% novel candidates when compared to existing circRNA databases. We identified 362 circRNAs differentially expressed in primary and/or metastatic tissues, termed Circular RNAs Associated with Metastasis (CRAMS). We performed cell-type deconvolution using published single-cell RNA-seq (scRNA-seq) datasets and applied a non-negative least squares (NNLS) statistical model to estimate cell-type specific circRNA expression. This predicted 667 circRNAs as exclusively expressed in a single cell type. Collectively, the cell-type specificity and differential expression status of all circRNAs are published in TMECircDB (Tumor MicroEnvironment Specific CircRNA DataBase), to aid in functional characterization of circRNAs in mCRC, specifically in the TME. Furthermore, a growing number of reports show that select circRNAs contain circular open reading frames (cORFs) and impact tumorigenesis through encoded small peptides. However, current circRNA detection approaches are biased toward short-read RNA-seq, which detects circRNA backsplice junctions (BSJs) without reliably reconstructing complete circRNA sequences and thereby hinders accurate cORF prediction. To address these challenges, we performed long-read sequencing to enrich for full-length circRNAs that could serve as a guide for subsequent short-read alignment to “rescue” circRNAs that elude existing tools focused on circRNA detection from short reads. We also developed an open-source bioinformatics workflow, referred to as CHRIS (CHaracterizing CircRNAs by Integrative Sequencing), that characterizes and rescues novel circRNAs missed by existing tools by integrating short- and long-read RNA-seq. We applied our approach to CRC cell lines and patient samples to reveal 18,032 novel, non-canonical isoforms of known circRNAs, of which 69 were altered during cancer metastasis. As proof of concept, we validated five high-confidence circRNAs rescued by CHRIS in CRC cell lines.
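As an illustration of the NNLS step mentioned above, the following is a minimal sketch and not the dissertation's actual pipeline. It assumes that deconvolution has already produced a matrix of cell-type proportions per bulk sample, and that the bulk expression of one circRNA is modeled as the proportion-weighted sum of non-negative per-cell-type expression values. The function name, the toy proportions, and the `1e-6` threshold are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_cell_type_expression(proportions, bulk_expression):
    """Solve min ||proportions @ x - bulk_expression|| subject to x >= 0.

    proportions:     (n_samples, n_cell_types) cell-type fractions per sample
    bulk_expression: (n_samples,) bulk expression of a single circRNA
    Returns the per-cell-type expression estimates and the residual norm.
    """
    coefficients, residual_norm = nnls(proportions, bulk_expression)
    return coefficients, residual_norm


# Toy data (hypothetical): 5 bulk samples, 3 cell types, rows sum to 1.
proportions = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.3, 0.3, 0.4],
])

# Simulate a circRNA expressed only in the second cell type.
true_expression = np.array([0.0, 12.0, 0.0])
bulk = proportions @ true_expression

estimated, residual = estimate_cell_type_expression(proportions, bulk)

# Cell types with a non-negligible coefficient (threshold is an assumption).
expressing = [i for i, value in enumerate(estimated) if value > 1e-6]
```

In this noise-free toy case a single non-zero coefficient marks the circRNA as specific to one cell type. With real data, a call of "exclusive" expression would need a noise-aware threshold or a statistical test rather than a fixed cutoff.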
Next, we performed a proteogenomic integration using 78,913 circRNAs detected by CHRIS and mass spectrometry data from 261 CRC patients from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). As published in the online database PepCircDB (Peptide Encoding CircRNA DataBase), we found 8,004 novel peptides encoded by circRNAs, including 1,960 that are only detectable with long-read integration. In summary, this dissertation work pioneered a systematic, integrative analysis of circRNAs associated with metastasis via various potential mechanisms. We hope that the novel biological findings and bioinformatic toolkits in this thesis research will significantly advance our understanding of how circRNAs promote mCRC and will provide valuable computational resources for future research on circRNAs in tumor biology.
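To illustrate why full-length circRNA sequences matter for cORF prediction, the sketch below scans a circular sequence for open reading frames that may cross the backsplice junction. It is a simplified, hypothetical example and not the method used by CHRIS or PepCircDB. It considers only the forward strand, requires an ATG start and an in-frame stop codon, and treats the end of the given sequence as the BSJ. The function names and the toy sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(seq, pos):
    """Return the 3-nt codon starting at pos, wrapping around the circle."""
    n = len(seq)
    return "".join(seq[(pos + i) % n] for i in range(3))


def find_corfs(circ_seq, min_codons=3, max_laps=2):
    """Find ATG-initiated ORFs on a circular sequence (forward strand only).

    The BSJ is assumed to lie between the last and first base of circ_seq.
    ORFs with no in-frame stop within max_laps laps are not reported.
    """
    seq = circ_seq.upper()
    n = len(seq)
    max_codons = (max_laps * n) // 3
    corfs = []
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        for k in range(1, max_codons + 1):
            pos = start + 3 * k  # unwrapped position of the next codon
            if codon_at(seq, pos) in STOP_CODONS:
                if k >= min_codons:
                    corfs.append({
                        "start": start,
                        "codons": k,  # coding codons, stop excluded
                        "sequence": "".join(
                            seq[(start + i) % n] for i in range(3 * k)
                        ),
                        # True if the ORF (including stop) passes the BSJ
                        "spans_bsj": pos + 3 > n,
                    })
                break
    return corfs


# Toy circRNA (hypothetical): ATG at index 7, stop codon reached after wrapping.
result = find_corfs("CTAAGGCATGGC", min_codons=2)
```

In the toy sequence the start codon lies near the end of the linear representation and the stop codon lies near its beginning, so the ORF exists only on the circle. Calling an ORF like this requires knowing the complete circRNA sequence, not just the junction.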

Language

English (en)

Chair and Committee

Christopher Maher

Available for download on Thursday, April 02, 2026
