Abstract

PubMed hosts nearly 30 million abstracts spanning clinical and molecular research in a wide variety of contexts. Increased access to -omic technologies has accelerated scientific discovery and led to more publications each year. One consequence of this growth has been an unequal distribution of research attention. For example, a large fraction of the publications and grants recorded in PubMed has discussed only a subset of human genes, leaving the majority of the genome in the dark. Moreover, before the work in this dissertation, there had been little focused effort on understudied human genes. We therefore provide a framework to level the playing field, giving understudied human genes a fair chance of appearing in PubMed.

We first applied text-mining analysis to extract information about human genes from PubMed abstracts. We found that as many as 50% of human genes can be considered understudied, and we refer to these as orphan genes. In contrast, a very small subset of already well-studied genes (e.g., the top-cited genes) accounts for most of the increase in publications in recent years. We also observed that recently de-orphanized genes share a common set of features that explain their pattern of discovery, such as an association with top-cited genes and occurrence in Mendelian or GWAS studies. Based on these features, we developed the Molecular ORphan PHEnOtype MatchEr (MORPHEOME), a computational and molecular framework to help researchers map orphan genes to existing genes and biology.

To generate genome-wide data for MORPHEOME, we carried out multiple CRISPRi and CRISPRa screens against bisphosphonates, SSRIs, biguanides, and other commonly prescribed blockbuster drugs. We also combined existing CRISPR knockout datasets, protein-protein interaction datasets, and TCGA survival and mouse knockout phenotypes to map orphan genes to their most relevant top-cited genes.
We found that gene pairs already described together in published work share a high overlap in their protein-protein interaction and genetic fitness relationships. Similarly, we validated a set of novel protein-protein interactions between orphan and top-cited genes, selected on the basis of the strength of their predicted interactions. Notably, we also followed up on two orphan genes, ATRAID and SLC37A3, that were recently identified as novel targets of bisphosphonates. MORPHEOME predicted a novel link between ATRAID/SLC37A3 and mitochondrially localized genes at the intersection with the mevalonate pathway. Our follow-up work has begun to define the roles of these two genes in the mevalonate pathway and its connection to mitochondria. These findings show how an unbiased approach combining PubMed abstract mining and genetic screens revealed previously unknown biology, and how it can serve as a guide for studying other orphan genes.
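The abstract does not state how overlap between gene pairs was scored. As one illustration only, the hypothetical sketch below uses the Jaccard index over the interaction partners of two genes; the function names, the toy network, and the choice of the Jaccard index are assumptions, not the dissertation's method. The same function could be applied to sets of genes with correlated CRISPR fitness profiles.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def partner_overlap(gene_a: str, gene_b: str, network: Dict[str, Set[str]]) -> float:
    """Hypothetical overlap score: Jaccard index of the two genes' partners,
    excluding each gene from the other's partner set so the pair itself is not counted."""
    partners_a = network.get(gene_a, set()) - {gene_b}
    partners_b = network.get(gene_b, set()) - {gene_a}
    return jaccard(partners_a, partners_b)


# Toy interaction network with placeholder names (not real data).
toy_network: Dict[str, Set[str]] = {
    "GENE_A": {"P1", "P2", "P3", "GENE_B"},
    "GENE_B": {"P2", "P3", "P4", "GENE_A"},
}

if __name__ == "__main__":
    # Shared partners {P2, P3} out of {P1, P2, P3, P4}
    print(partner_overlap("GENE_A", "GENE_B", toy_network))  # 0.5
```

Under this assumption, a published pair whose partner sets overlap strongly would score closer to 1.0, while unrelated genes would score near 0.0.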

Committee Chair

Timothy R. Peterson

Committee Members

Harrison Gabel, Ira Hall, Jason Held, Ting Wang

Comments

Permanent URL: https://doi.org/10.7936/srrs-e956

Degree

Doctor of Philosophy (PhD)

Author's Department

Biology & Biomedical Sciences (Molecular Cell Biology)

Author's School

Graduate School of Arts and Sciences

Document Type

Dissertation

Date of Award

Summer 8-15-2019

Language

English (en)

DOI

https://doi.org/10.7936/srrs-e956

Recommended Citation

Park, Ji Woong, "A Molecular and Computational Framework to Investigate Understudied Human Genes" (2019). Arts & Sciences Theses and Dissertations. 1936.

The definitive version is available at https://doi.org/10.7936/srrs-e956

Download

Available for download on Saturday, August 29, 2026

Included in

Biology Commons


Arts & Sciences Theses and Dissertations

A Molecular and Computational Framework to Investigate Understudied Human Genes
