Abstract

Reinforcement learning agents often struggle in tasks with sparse or delayed rewards, since they receive little guidance about which actions to pursue. This thesis investigates how adding intrinsic rewards can help address that issue. We focus on three main methods: count-based bonuses, where states are hashed and infrequently visited states receive larger bonuses; Random Network Distillation (RND), where a predictor network learns to match the output of a fixed, randomly initialized target network; and the Intrinsic Curiosity Module (ICM), which uses an inverse and a forward model to highlight transitions the agent cannot yet predict.
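To make the RND mechanism concrete, the following is a minimal PyTorch sketch of an RND-style bonus, assuming a flat observation vector; the network sizes, class and function names, and learning rate are illustrative assumptions rather than the configuration used in the thesis.

```python
# Minimal sketch of an RND-style intrinsic reward (illustrative, not the
# thesis implementation). A fixed random target network is distilled by a
# trained predictor; prediction error serves as a novelty bonus.
import torch
import torch.nn as nn


def make_mlp(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


class RNDBonus:
    def __init__(self, obs_dim: int, embed_dim: int = 64, lr: float = 1e-4):
        self.target = make_mlp(obs_dim, embed_dim)     # fixed random target network
        self.predictor = make_mlp(obs_dim, embed_dim)  # trained to match the target
        for p in self.target.parameters():             # target is never updated
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Error is large for states the predictor has rarely seen.
        with torch.no_grad():
            target_feat = self.target(obs)
        error = (self.predictor(obs) - target_feat).pow(2).mean(dim=-1)
        return error.detach()

    def update(self, obs: torch.Tensor) -> None:
        # Train the predictor on visited states, so novelty decays with revisits.
        loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

Because the target is never trained, the prediction error shrinks only for states the predictor has actually been fitted to, which is what lets the error act as a visitation-novelty signal.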

We implement these approaches under a single Proximal Policy Optimization (PPO) framework and evaluate them on three environments: CartPole, MiniGrid, and Breakout. The results show that each method excels under certain conditions. Count-based exploration performs well when the state space can be discretized effectively, while RND and ICM scale better to complex or pixel-based domains. However, none of the methods is universally dominant. The thesis concludes that factors such as dimensionality, reward structure, and the ease of hashing or modeling states should guide the choice of exploration bonus in sparse-reward reinforcement learning.
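As a rough illustration of how an exploration bonus plugs into such a framework, the sketch below mixes a count-based bonus with the environment reward inside a rollout loop on Gymnasium's CartPole-v1; the random action choice, the rounding-based state hash, and the weight beta are stand-in assumptions, not the thesis's PPO implementation or hyperparameters.

```python
# Hedged sketch: count-based bonus added to the extrinsic reward during a
# rollout. The discretization grain and beta are illustrative assumptions.
from collections import defaultdict

import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")
counts = defaultdict(int)   # visit counts per hashed state
beta = 0.1                  # assumed weight on the intrinsic term


def state_hash(obs: np.ndarray) -> tuple:
    # Coarse rounding so that nearby states share a hash bucket.
    return tuple(np.round(obs, 1).tolist())


obs, _ = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # stand-in for the PPO policy
    next_obs, r_ext, terminated, truncated, _ = env.step(action)
    key = state_hash(next_obs)
    counts[key] += 1
    r_int = 1.0 / np.sqrt(counts[key])  # rarely visited states get larger bonuses
    r_total = r_ext + beta * r_int      # reward a PPO update would consume
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()
```

In a full PPO setup, r_total would be stored in the rollout buffer and used for advantage estimation in place of the extrinsic reward alone.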

Committee Chair

Bruno Sinopoli

Committee Members

Yiannis Kantaros
Vladimir Kurenok

Degree

Master of Science (MS)

Author's Department

Electrical & Systems Engineering

Author's School

McKelvey School of Engineering

Document Type

Thesis

Date of Award

Spring 5-7-2025

Language

English (en)

Included in

Engineering Commons
