This item is under embargo and not available online per the author's request. For access information, please visit


Date of Award

Spring 5-15-2021

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Small molecules are key tools in biology and medicine. In biology, small molecules are used to probe biological systems and gain insight into their structure and function. In medicine, this role is further refined to reverse the biological conditions that contribute to human disease. Developing new small molecules into biological probes or drugs can be a daunting scientific task. These projects often begin with many thousands of potential candidates, which are progressively screened and eliminated from consideration by high-throughput experimental assays. Those molecules that emerge from this process as candidate drugs are also subject to extensive in vitro, in vivo, or clinical testing for efficacy and toxicity. Unfortunately, the majority of compounds never become useful drugs or biological probes. These candidates fail due to unforeseen, deleterious bioactivity, such as off-target or non-specific effects that render them ineffective as probes, or serious toxic effects that render them unsafe as drugs. In the case of drugs, these frequent, late-stage failures often occur during clinical trials, after millions of dollars have already been spent. These failures contribute to the high cost of drug development and raise drug prices for patients. Pediatric patients are at even higher risk of unforeseen drug toxicities than adults due to a lack of drug safety data in children, combined with complex, age-dependent changes in the expression and activity of drug-metabolizing enzymes (ontogeny). Importantly, these metabolic changes affect exposure not only to a drug, but also to potentially toxic metabolites of that drug. To address these problems, new in silico tools are needed to extrapolate from experimental knowledge of deleterious small-molecule bioactivity to predict the specificity, efficacy, and safety of biological probes and drugs.

Recently, a collection of techniques known as deep learning has been used to build state-of-the-art solutions to problems in several industrial, engineering, and scientific fields. Chemists have been developing these techniques to predict the chemical properties of small molecules, hoping to avoid expending resources on molecules that eventually exhibit deleterious bioactivity. These deep learning techniques have already shown promising results in quantum chemistry, drug metabolism, and toxicology. In this dissertation we contribute to the ongoing scientific work in this field in three ways. First, we demonstrate that existing deep learning techniques fail to represent basic chemical concepts like aromatic and conjugated systems, which depend on relationships between distant atoms in a molecule. To rectify this deficiency, we introduce Wave, a breadth-first graph recurrent neural network architecture, which is designed to infer long-distance relationships in graph-structured data, including molecules. Wave more efficiently represents these basic chemical concepts. Furthermore, we demonstrate that Wave networks can more accurately compute quantum chemical properties of small molecules, which are correlated with drug metabolism and toxicity. Second, we demonstrate that deep learning can be used to improve existing approaches to identify molecules with deleterious bioactivity in high-throughput assays. Such molecules may appear as promising candidate drugs or biological probes in early experiments, only to be shown to have non-specific or intractable mechanisms for their apparent bioactivity that render them useless. We develop mechanistic models of these deleterious mechanisms of bioactivity and use them to identify molecules that display non-specific activity in a large, public repository of high-throughput assay data. Finally, we develop a new deep learning architecture for modeling time-dependent data and use it to model the age-dependent development of hepatic drug-metabolizing enzymes. We combine this model with existing deep learning models of small-molecule reactivity to predict age-dependent exposure to toxic metabolites.


English (en)

Chair and Committee

Joshua Swamidass

Committee Members

Jeremy Buhler, Greg Bowman, Philip Payne, Roman Garnett

Available for download on Sunday, May 15, 2022