McKelvey School of Engineering Theses & Dissertations

A Control Theoretic Approach to the Stochastic Multi-armed Bandit Problem With Applications in Hyperparameter Optimization

Jonathan Gornet, Washington University – McKelvey School of Engineering

Abstract

Decision-making under uncertainty is a fundamental problem in many real-world applications. It has been rigorously formulated as the Stochastic Multi-Armed Bandit (SMAB) problem, in which a learner interacts with an environment. In each interaction, the learner selects an action and then receives a reward from the environment based on that action. The learner's objective is to maximize the reward accumulated over a fixed number of rounds. This thesis addresses the SMAB problem using tools from Control Theory and dynamical systems. We focus specifically on a SMAB environment in which the rewards are the output of a Linear Gaussian Dynamical System (LGDS). The core contribution of this thesis is to demonstrate how Control Theory can enhance our current understanding of decision-making under uncertainty through the SMAB problem. We address two directions relevant to the SMAB problem using a control theoretic approach. The first is how to explore the environment's action space efficiently. We find that an LGDS property called observability, which measures the difficulty of estimating the LGDS's state variable, can be used to increase the amount of information gained during exploration. The second is how to use the structure of the environment to predict each action's reward more effectively. We show that a representation of the Kalman filter, which is the optimal one-step predictor of the LGDS's output in the mean-squared error sense, can be extracted for predicting each action's reward. Using the theoretical results developed for these two problems, we propose an online hyperparameter optimizer for Reinforcement Learning (RL) called Hyperparameter Controller (HyperController), which aims to improve the efficiency and performance of training RL neural networks. Our theoretical results demonstrate that HyperController accelerates the training phase while also consistently improving the neural network's performance.
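To make the setting in the abstract concrete, the following is a minimal sketch of a bandit whose rewards come from a scalar LGDS, with a Kalman filter used as the one-step reward predictor. It is an illustration under stated assumptions, not the thesis's algorithm: the scalar state, the parameter values (a, q, r), the per-action gains, the greedy selection rule, and the names kalman_step and run are all hypothetical choices made for this example.

```python
import random


def kalman_step(z_hat, p, c, y, a=0.9, q=0.1, r=0.5):
    """One scalar Kalman update followed by a one-step prediction.

    z_hat, p: predicted state mean and variance before observing y
    c: observation gain of the chosen action; y: observed reward
    a, q, r: state gain, process-noise variance, reward-noise variance
    (all values here are illustrative assumptions)
    """
    k = p * c / (c * c * p + r)            # Kalman gain
    z_filt = z_hat + k * (y - c * z_hat)   # measurement update of the mean
    p_filt = (1.0 - k * c) * p             # measurement update of the variance
    return a * z_filt, a * a * p_filt + q  # time update (one-step prediction)


def run(rounds=200, gains=(0.2, 1.0, -0.5), a=0.9, q=0.1, r=0.5, seed=0):
    """Simulate a greedy learner on a scalar LGDS bandit (hypothetical setup)."""
    rng = random.Random(seed)
    z, z_hat, p = 1.0, 0.0, 1.0  # true state, predicted mean, predicted variance
    total = 0.0
    for _ in range(rounds):
        # Greedy rule (an assumption): pick the action with the largest
        # predicted reward c * z_hat.
        c = max(gains, key=lambda g: g * z_hat)
        y = c * z + rng.gauss(0.0, r ** 0.5)  # reward is the noisy LGDS output
        total += y
        z_hat, p = kalman_step(z_hat, p, c, y, a, q, r)
        z = a * z + rng.gauss(0.0, q ** 0.5)  # state evolves independently of the action
    return total, p
```

In this toy model, an action with gain c = 0 carries no information about the state, so the measurement update leaves the variance unchanged, while a larger |c| shrinks it more. That is a scalar analogue of the role the abstract assigns to observability during exploration.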

Committee Chair

Bruno Sinopoli

Committee Members

Andrew Clark; Fabio Pasqualetti; Ioannis Kantaros; Neal Patwari

Degree

Doctor of Philosophy (PhD)

Author's Department

Electrical & Systems Engineering

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

8-18-2025

Language

English (en)

DOI

https://doi.org/10.7936/vfbq-mc43

Recommended Citation

Gornet, Jonathan, "A Control Theoretic Approach to the Stochastic Multi-armed Bandit Problem With Applications in Hyperparameter Optimization" (2025). McKelvey School of Engineering Theses & Dissertations. 1273.

The definitive version is available at https://doi.org/10.7936/vfbq-mc43

Download

Available for download on Saturday, August 15, 2026

Included in

Systems Science Commons
