Date of Award

12-22-2023

Author's School

McKelvey School of Engineering

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

A central question in regulatory genomics modeling is which representation of DNA sequences yields the most useful information. Traditional choices include k-mers, position weight matrices, and parametric statistical models such as hidden Markov models; more recently, deep neural networks have been used. In this work, we introduce sparse representations as a principled framework for problems in regulatory genomics and show that this framework supports techniques for answering challenging inferential questions. Leveraging sparse representations, we reveal that gapped and long motifs are prevalent in in vivo datasets such as ChIP-Seq, often corresponding to cooperative binding and transposable elements, respectively. Motif discovery through sparse representation outperforms conventional methods by handling both gapped and ungapped motifs. We also demonstrate that sparse representation models diverse yet conserved regulatory elements better than k-mer-based approaches, allowing us to identify motifs longer than 30 bp. Moreover, a sparse representation can be structured as a hierarchical distributed representation. Specifically, we can speed up its optimization through a deep learning technique known as deep unfolding. This approach yields a neural network that can be extended to address various downstream problems, including DNA classification and regression. What sets this neural network apart from conventional constructions is its full interpretability: it does not rely on model-agnostic explanation techniques, because the patterns used to represent regulatory elements are transparent within the sparse representation.

Language

English (en)

Chair

Gary Stormo

Committee Members

Jeremy Buhler
