Navigating Aperiodicity: Challenges of Reinforcement Learning

Issue Date

2024-08-05

Language

en

Abstract

This thesis investigates the application of reinforcement learning (RL) in environments with uncertain action spaces, focusing on the Penrose P3 tiling environment. Traditional RL approaches fail in such settings because the variable action space violates the Markov property. To address this, I employ proximal policy optimisation (PPO) and explore various neural network architectures as the acting policy to allow for the integration of contextual information. Specifically, I use several recurrent network architectures and a Transformer encoder to study the effect of incorporating the agent's past trajectory into the policy optimisation process. Additionally, I evaluate the impact of attention mechanisms and positional embeddings on convergence rates and attention scores. Through extensive experiments, I analyse the performance of these architectures in navigating the aperiodic and non-Markovian Penrose P3 environment. The findings reveal that all model architectures use the context to enhance their decision-making, and the attention scores show that local context matters in particular. Augmenting the agent's context with positional embeddings helps, particularly with the Transformer's convergence speed, but also reveals interesting artefacts of the P3 environment in the other architectures' attention scores. This research contributes to the understanding of how context can help RL algorithms navigate aperiodicity and provides insights into designing RL agents for complex real-world applications with dynamic action spaces.
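The abstract does not specify how the policy handles an action space that changes from state to state. A common technique in this situation, sketched below under that assumption, is to mask invalid actions before sampling: the policy produces logits over a fixed global action set, and actions unavailable in the current state are assigned zero probability. All names here (`masked_softmax`, the toy logits and mask) are hypothetical illustrations, not the thesis's actual implementation.

```python
import math

def masked_softmax(logits, valid):
    """Softmax over only the currently valid actions.

    logits: raw policy scores over the fixed global action set.
    valid:  booleans marking which actions exist in the current state.
    Invalid actions receive probability exactly 0, so the sampled
    action is always legal even though the action space varies.
    """
    # Subtract the max valid logit for numerical stability.
    m = max(l for l, v in zip(logits, valid) if v)
    exps = [math.exp(l - m) if v else 0.0 for l, v in zip(logits, valid)]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: 4 global actions, but only actions 0 and 2 are
# available in this state of the tiling environment.
probs = masked_softmax([1.0, 2.0, 0.5, -1.0], [True, False, True, False])
```

The masked distribution renormalises over the valid subset, so gradient updates (e.g. in PPO) only ever flow through actions the agent could actually take in that state.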

Faculty

Faculteit der Sociale Wetenschappen