Date of Award

8-6-2024

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Human & Statistical Genetics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Long noncoding ribonucleic acids (lncRNAs) are a heterogeneous class of RNAs longer than 200 nucleotides, traditionally classified as noncoding, with largely underexplored mechanisms of action. LncRNAs account for an estimated 19,000 genes in the human genome, but only a few thousand have been characterized. LncRNAs have diverse roles in development, gene and protein regulation, and cellular organization, as well as diverse mechanisms of action. LncRNAs can function through the RNA itself, through DNA elements within the RNA locus, through the act of transcription of the RNA, and, as more recently described, through previously overlooked encoded proteins. Despite the discovery of proteins translated from lncRNAs, detecting and characterizing lncRNAs with coding potential remains difficult. The primary high-throughput method of identifying lncRNAs with coding potential is ribosome profiling. These data can be misleading, as ribosomes can initiate translation without producing a viable peptide, or the peptide can be nonfunctional or rapidly degraded. Proteomic studies can provide peptide support for lncRNAs, but many do not detect small peptides (< 100 amino acids) or noncanonical start codons, and other methods, such as conservation- or nucleotide composition-based scoring, may not accurately reflect coding potential. Databases of lncRNAs with coding potential are often hand-curated without systematic de novo evaluation of coding potential, or they rely on limited data sources or limited detection methods. Because experimental investigations are time-consuming, there is a pressing need for a consistently accurate predictor of lncRNA coding potential. This thesis sought to address the need for an integrative computational prediction approach and to test promising lncRNA candidates with coding potential. Together, this work created comprehensive new resources to guide future lncRNA studies.
To overcome the limitations of current methods, we developed an integrative proteogenomic analysis pipeline to detect protein-encoding lncRNAs from mass spectrometry-based proteomic data, transcriptomic data, and other coding prediction tools. Because many developmental and regulatory lncRNAs become dysregulated in cancer, and because cancer data repositories are widely available, this pipeline was applied to proteomic data from nine cancer types. We systematically detected open reading frames (ORFs) in lncRNAs across multiple cohorts, supported by proteogenomic modalities including conservation scores, protein prediction, and mass spectrometry, resulting in a catalog of potential peptides more refined than those of previous studies that considered limited amounts of evidence. Seeking to validate promising protein candidates from well-characterized lncRNAs, we investigated HOTAIR, PCAT19, and BCAR4 for coding potential and protein product function. We employed experimental methods to determine whether a peptide mechanism, an RNA mechanism, or a combination of both was the main contributor to lncRNA function. We found that HOTAIR produced a small protein that was not functional, PCAT19 did not produce a protein, and BCAR4 produced a functional protein related to human epidermal growth factor receptor 2 (HER2) signaling. In addition, four new subject-specific databases were created to aid future studies of lncRNA coding potential. This research expands the understanding of lncRNA coding potential and regulatory mechanisms in the field of RNA biology, refines the landscape of validated lncRNAs with coding potential, and guides future mechanistic studies of lncRNAs with coding potential.

Language

English (en)

Chair and Committee

Christopher Maher

Committee Members

Jacqueline Payton; Jieya Shao; John Edwards; Malachi Griffith

Available for download on Wednesday, November 19, 2025
