Abstract
Intrinsically disordered proteins and regions (IDRs) lack stable three-dimensional structure under physiological conditions. Instead, IDRs are better described by a conformational equilibrium in which they rapidly interconvert between many distinct structural states. Although they lack a well-defined reference fold, IDRs are ubiquitous across the Tree of Life and play essential roles in virtually every biological process, including gene regulation, molecular recognition, and signal transduction. The absence of such a fold, however, makes IDRs difficult to interpret and challenging to engineer. Traditional approaches that rely on tertiary structure or evolutionary conservation are poorly suited to handle the complexity of IDRs. My work advances two complementary paradigms for understanding and designing IDRs. In the first paradigm, sequence → ensemble, I interpret IDR sequences through a molecular biophysical lens: observables derived from the statistics of disordered conformational ensembles are used to generate mechanistic hypotheses about IDR function and to guide design. To do this at scale, I develop high-throughput sequence-to-ensemble predictors that allow us to navigate disordered conformational landscapes directly from sequence. These models are implemented in robust, user-friendly software, making quantitative ensemble-based analysis accessible across large protein sets, not just individual case studies. In the second paradigm, sequence → function, I develop disorder-specific deep learning models to infer functional sequence constraints directly from the amino acid sequence. Instead of relying on biophysical models, this approach leverages generative modeling and learned representations to design disordered regions.
Building on recent advances in natural language processing, I introduce a diffusion-based protein language model tailored to intrinsically disordered regions that learns IDR-specific sequence representations and can condition on adjacent folded domains when present. This allows the model to capture how local sequence context constrains disordered regions, enabling the context-aware design of disordered protein sequences. A defining feature throughout this body of work is its high-throughput, software-first implementation. I design and implement robust, scalable tools that make these models easy to deploy, integrate, and extend within diverse protein bioinformatics and protein design workflows for both computational and experimental researchers. Collectively, these methods and tools are intended to enable a broad community of researchers to systematically probe, predict, and engineer intrinsically disordered proteins and protein regions.
Committee Chair
Alex Holehouse
Committee Members
Andrea Soranno; Eric Galbert; Joshua Rackers; Michael Brent; Roman Garnett
Degree
Doctor of Philosophy (PhD)
Author's Department
Biology & Biomedical Sciences (Computational & Systems Biology)
Document Type
Dissertation
Date of Award
4-28-2026
Language
English (en)
DOI
https://doi.org/10.7936/daz2-1a56
Author's ORCID
https://orcid.org/0000-0002-5022-7006
Recommended Citation
Lotthammer, Jeffrey, "Learning Sequence Constraints Governing Conformational Ensembles and Function in Intrinsically Disordered Proteins" (2026). Arts & Sciences Graduate Student Theses and Dissertations. 3741.
The definitive version is available at https://doi.org/10.7936/daz2-1a56