On the challenges and rewards of analyzing molecular dynamics at the terabyte and millisecond scale
Doctor of Philosophy (PhD)
Molecular dynamics (MD) and Markov state models (MSMs) are powerful tools for estimating and concisely representing the conformational ensemble accessible to biological macromolecules, particularly proteins. Conformational ensembles are of special importance to biological function, both in health and disease, because biology derives from a molecule's entire conformational distribution rather than from any single structure. Consequently, MD is poised to become a powerful tool for personalized medicine and, more generally, for the study of molecular sequence-function relationships. However, because of their hyperdimensionality and size, simply generating MD datasets and MSMs that represent biologically relevant molecules is a substantial technical challenge. Even once these models are generated, it is not immediately obvious how the conformational ensemble represented by an MSM encodes function. In this thesis, I first present enspara, a Python library that makes it possible to build and analyze MSMs at an unprecedented scale. Then, I present "exposons," an unsupervised machine learning method for discovering substructure in these colossal datasets by searching for cooperative changes in a protein's surface. I apply this method to several small systems of biological interest. Finally, I demonstrate the power of these technologies by analyzing the kinetic diversity of the motor protein myosin, the longest-studied protein in all of biochemistry, and in so doing, address a longstanding mystery in the field of myosin biochemistry. The applicability of these technologies is almost certainly not limited to the handful of systems I study here. Therefore, this work likely has broad implications for the future of biochemistry, personalized medicine, and the study of biology.
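To illustrate what it means for an MSM to "concisely represent" a conformational ensemble, here is a minimal sketch in plain NumPy. It is not enspara's API. The function name `estimate_msm`, the `lag` parameter, and the choice to symmetrize transition counts are illustrative assumptions and not necessarily the estimator used in this thesis. The sketch takes a trajectory that has already been assigned to discrete conformational states, counts transitions between states, row-normalizes the counts into a transition probability matrix, and reads off equilibrium state populations from that matrix.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Hypothetical sketch: estimate an MSM from one discretized trajectory.

    Assumes every state in range(n_states) is visited at least once,
    so that no row of the count matrix is empty.
    """
    counts = np.zeros((n_states, n_states))
    # Count transitions i -> j between frames separated by `lag` steps.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1

    # Assumption: symmetrize counts, a simple way to enforce detailed balance.
    counts = counts + counts.T

    # Row-normalize counts into a row-stochastic transition probability matrix.
    tprob = counts / counts.sum(axis=1, keepdims=True)

    # Equilibrium populations: left eigenvector of tprob with eigenvalue 1.
    evals, evecs = np.linalg.eig(tprob.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()  # normalization also fixes the eigenvector's arbitrary sign
    return tprob, pi


# Toy trajectory over three conformational states.
tprob, pi = estimate_msm([0, 0, 1, 1, 0, 1, 2, 2, 1, 0], n_states=3)
print(np.round(pi, 3))  # [0.333 0.444 0.222]
```

With symmetrized counts, each state's equilibrium population is simply proportional to its total count. The eigenvector step is included because it is the general route when counts are not symmetrized.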
Chair and Committee
Gregory R. Bowman
Porter, Justin Roy, "On the challenges and rewards of analyzing molecular dynamics at the terabyte and millisecond scale" (2021). Arts & Sciences Electronic Theses and Dissertations. 2618.