Abstract
Decision-making under uncertainty is a fundamental problem that arises in many real-world applications. It has been rigorously formulated as the Stochastic Multi-Armed Bandit (SMAB) problem, in which a learner interacts with an environment. In each interaction, the learner selects an action and receives a reward from the environment based on that action. The learner's objective is to maximize the accumulated reward over a fixed number of rounds. This thesis addresses the SMAB problem by drawing on Control Theory and dynamical systems. We focus on an SMAB environment in which the rewards are the output of a Linear Gaussian Dynamical System (LGDS). The core contribution of this thesis is to demonstrate how Control Theory can enhance our understanding of decision-making under uncertainty through the SMAB problem. We address two directions relevant to the SMAB problem using a control-theoretic approach. The first is how to explore the environment's action space efficiently. We find that an LGDS property called observability, which measures the difficulty of estimating the LGDS's state variable, can be used to increase the amount of information gained during exploration. The second is how to use the environment's structure to predict each action's reward more effectively. The Kalman filter is the optimal one-step predictor of the LGDS's output in the mean-squared error sense, and we show that a representation of it can be extracted for predicting each action's reward. Using the theoretical results developed for these two problems, we propose an online hyperparameter optimizer for Reinforcement Learning (RL), called Hyperparameter Controller (HyperController), to improve the efficiency and performance of training RL neural networks.
Our empirical results demonstrate that HyperController accelerates the training phase while also consistently improving the neural network's performance.
Committee Chair
Bruno Sinopoli
Committee Members
Andrew Clark; Fabio Pasqualetti; Ioannis Kantaros; Neal Patwari
Degree
Doctor of Philosophy (PhD)
Author's Department
Electrical & Systems Engineering
Document Type
Dissertation
Date of Award
8-18-2025
Language
English (en)
DOI
https://doi.org/10.7936/vfbq-mc43
Recommended Citation
Gornet, Jonathan, "A Control Theoretic Approach to the Stochastic Multi-armed Bandit Problem With Applications in Hyperparameter Optimization" (2025). McKelvey School of Engineering Theses & Dissertations. 1273.
The definitive version is available at https://doi.org/10.7936/vfbq-mc43