Navigating Aperiodicity: Challenges of Reinforcement Learning
Issue Date
2024-08-05
Language
en
Abstract
This thesis investigates the application of reinforcement learning (RL) in environments
with uncertain action spaces, focusing specifically on the Penrose P3 tiling environment.
Traditional RL approaches fail in such settings because the variable action space violates
the Markov property. To address this, I employ proximal policy optimisation (PPO) and
explore various neural network architectures as the acting policy to allow for the
integration of contextual information. Specifically, I use several recurrent network
architectures and a Transformer encoder to study the influence of incorporating the
agent's past trajectory into the policy optimisation process. Additionally, I evaluate
the impact of attention mechanisms and positional embeddings on convergence rates and
attention scores. Through extensive experiments, I analyse the performance of these
architectures in navigating the aperiodic and non-Markovian Penrose P3 environment. The
findings reveal that all model architectures use the context to enhance their
decision-making, and the attention scores show that local context matters most.
Augmenting the agent's context with positional embeddings helps, particularly with the
Transformer's convergence speed, but also reveals interesting artefacts from the P3
environment in the other architectures' attention scores. This research contributes to
the understanding of how context can help RL algorithms navigate aperiodicity and
provides insights into designing RL agents for complex real-world applications with
dynamic action spaces.
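The core idea of the abstract — a policy that attends over the agent's past trajectory, augments it with positional embeddings, and masks out actions that are invalid in the current variable action set — can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the trajectory encoding, dimensions, and valid-action mask are all hypothetical, and plain NumPy stands in for a deep-learning framework.

```python
import numpy as np

def positional_embeddings(length, dim):
    # Sinusoidal positional embeddings: even indices get sin, odd get cos.
    pos = np.arange(length)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(x):
    # Single-head scaled dot-product self-attention over the trajectory.
    # Returns attended outputs and the (row-stochastic) attention scores.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    scores /= scores.sum(axis=-1, keepdims=True)
    return scores @ x, scores

rng = np.random.default_rng(0)
T, d, n_actions = 8, 16, 4          # hypothetical trajectory length, feature dim, max actions
traj = rng.normal(size=(T, d))      # stand-in for encoded past (state, action) pairs
ctx = traj + positional_embeddings(T, d)   # augment context with positional info
attended, scores = self_attention(ctx)

# A linear policy head maps the summary of the latest step to action logits;
# actions absent from the current (variable) action set are masked to -inf.
W = rng.normal(size=(d, n_actions))
logits = attended[-1] @ W
mask = np.array([True, True, False, True])  # hypothetical valid-action mask
logits = np.where(mask, logits, -np.inf)
probs = np.exp(logits - logits[mask].max())
probs /= probs.sum()                # masked action gets probability exactly 0
```

In a PPO setup these probabilities would parameterise the categorical action distribution; the `scores` matrix is the kind of quantity the abstract refers to when discussing how strongly the policy attends to local versus distant context.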
Faculty
Faculteit der Sociale Wetenschappen
