Abstract
Reinforcement learning agents often struggle in tasks with sparse or delayed rewards, since they receive little guidance about which actions to pursue. This thesis investigates how adding intrinsic rewards can help address that issue. We focus on three main methods: Count-Based bonuses, where states are hashed and infrequent states receive higher rewards; Random Network Distillation (RND), where a predictor network learns to match the output of a fixed random target; and the Intrinsic Curiosity Module (ICM), which uses an inverse and a forward model to highlight transitions the agent cannot yet predict.
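To make the three bonuses concrete, the sketch below shows one plausible way each could be computed; it is an illustration under assumed settings (network sizes, the rounding-based hash, and the decay schedule are placeholders), not the thesis's exact implementation, and the RND predictor and ICM forward model would additionally be trained on the agent's experience (the ICM inverse model is omitted here for brevity).

```python
# Illustrative sketch of the three intrinsic bonuses. Network sizes, the
# hashing scheme, and scaling constants are assumptions for demonstration,
# not the configuration used in the thesis.
import numpy as np
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 4, 2  # e.g. CartPole-sized observation/action (assumption)

# --- Count-based bonus: hash a discretized state, reward rarely seen states ---
counts = {}
def count_bonus(obs: np.ndarray, precision: int = 1) -> float:
    key = tuple(np.round(obs, precision))   # crude discretization + hashing
    counts[key] = counts.get(key, 0) + 1
    return 1.0 / np.sqrt(counts[key])       # bonus decays as visit count grows

# --- RND: a predictor tries to match a fixed, randomly initialized target ---
target = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, 32))
for p in target.parameters():
    p.requires_grad_(False)                 # the target network stays frozen

def rnd_bonus(obs: torch.Tensor) -> torch.Tensor:
    # Prediction error is high for novel states the predictor has not fit yet
    return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

# --- ICM (forward-model part): error predicting the next state's features ---
encoder = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU())
forward_model = nn.Sequential(nn.Linear(32 + ACT_DIM, 32), nn.ReLU(),
                              nn.Linear(32, 32))

def icm_bonus(obs, action_onehot, next_obs):
    phi, phi_next = encoder(obs), encoder(next_obs)
    pred_next = forward_model(torch.cat([phi, action_onehot], dim=-1))
    return 0.5 * (pred_next - phi_next).pow(2).mean(dim=-1)
```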
We implement these approaches under a single Proximal Policy Optimization (PPO) framework and evaluate them on three environments: CartPole, MiniGrid, and Breakout. The results show that each method excels under certain conditions. Count-Based exploration performs well when the state space can be discretized effectively, while RND and ICM scale better to complex or pixel-based domains. However, none of the methods is universally dominant. The thesis concludes that factors such as dimensionality, reward structure, and the ease of hashing or modeling states should guide the choice of exploration bonus in sparse-reward reinforcement learning.
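One common way such bonuses are plugged into a shared PPO framework, and a plausible reading of the setup described above, is to scale the intrinsic bonus and add it to the environment reward before computing advantages; the coefficient in the snippet below is an assumed placeholder, not a value taken from the thesis (some implementations instead keep separate value heads for the intrinsic and extrinsic returns).

```python
# Illustrative reward shaping for PPO: add a scaled intrinsic bonus to the
# environment reward before advantage estimation. BETA is an assumption.
BETA = 0.01

def shaped_reward(extrinsic_reward: float, intrinsic_bonus: float) -> float:
    return extrinsic_reward + BETA * intrinsic_bonus
```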
Committee Chair
Bruno Sinopoli
Committee Members
Yiannis Kantaros
Vladimir Kurenok
Degree
Master of Science (MS)
Author's Department
Electrical & Systems Engineering
Document Type
Thesis
Date of Award
Spring 5-7-2025
Language
English (en)
DOI
https://doi.org/10.7936/n8dd-rk98
Recommended Citation
Li, Chengyu, "Comparative Analysis of Intrinsic Reward-Based Reinforcement Learning Algorithms" (2025). McKelvey School of Engineering Theses & Dissertations. 1219.
The definitive version is available at https://doi.org/10.7936/n8dd-rk98