Predictive Processing in Proximal Policy Optimization
Date
2021-04-26
Language
en
Abstract
Advances in reinforcement learning have led to drastically more complex agents that require
unrealistic amounts of compute resources. The human brain often achieves comparable results with a
fraction of the energy requirements of these models. We therefore turn to insights from neuroscience
on predictive processing and efficient coding to design an efficient agent. We present the Predictive
Processing Proximal Policy Optimization (P4O) agent, an actor-critic reinforcement learning agent
that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world
model in its hidden state. The prediction error that results from subtracting the encoded observed
state from the world model's prediction is used as the primary signal in our model. We demonstrate
that with this approach, predictive processing with a world model can be incorporated while reducing
a model's biologically analogous energy footprint, thus supporting the efficient coding hypothesis.
When we use the encoded state information only to inhibit the recurrent connections, rather than
providing the prediction error separately as input, the number of neurons in the model can be
drastically reduced. Moreover, this approach encourages activations in the model to remain centered
around the zero point, analogous to a lower spiking rate in a biological system and reduced energy
usage. Furthermore, the P4O agent far outperforms the original PPO algorithm on the Seaquest
environment while retaining its efficiency, and can be run on a single GPU. It also outperforms other
model-based and model-free state-of-the-art single-GPU agents on Seaquest given the same wall-clock
time, and exceeds human gamer performance in an initial performance comparison. Future research
could extend the agent with additional uses of its world model, improve its performance through
tuning, inspect its neural coding of competing goals on different timescales, or investigate the use of
our approach in modeling brain function in various scenarios. Altogether, our work underlines the
synergistic benefits of the convergence of insights from the fields of neuroscience, artificial intelligence
and cognitive science.
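To make the core mechanism concrete, the following is a minimal sketch (not the authors' published implementation) of how a prediction-error-driven recurrent cell with inhibitory encoded-state connections might look in PyTorch. All module names, dimensions, and the exact update rule are assumptions for illustration only.

import torch
import torch.nn as nn

class PredictiveRecurrentCell(nn.Module):
    """Illustrative sketch of a predictive-processing recurrent cell.

    The hidden state doubles as a world model: at each step it predicts
    the encoded next observation, and the prediction error (prediction
    minus encoding) drives the recurrent update. The encoded observation
    enters only through this subtraction, i.e. as an inhibitory signal,
    so no separate input pathway (and its extra neurons) is needed.
    This is an assumed formulation, not the published P4O architecture.
    """

    def __init__(self, obs_dim: int, hidden_dim: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)       # observation -> code
        self.predictor = nn.Linear(hidden_dim, hidden_dim)  # hidden state -> predicted code
        self.recurrent = nn.Linear(hidden_dim, hidden_dim)  # prediction error -> hidden update

    def forward(self, obs: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        encoded = self.encoder(obs)      # encoded observed state
        predicted = self.predictor(h)    # world-model prediction
        error = predicted - encoded      # encoding inhibits the prediction
        # Error-driven update: with an accurate world model, `error` stays
        # near zero, keeping activations centered around the zero point,
        # analogous to low spiking rates and reduced energy usage.
        h = torch.tanh(h + self.recurrent(error))
        return h

In an actor-critic setup such as PPO, the resulting hidden state would then feed the policy and value heads; the abstract's efficiency claim rests on the error term remaining small once the world model predicts well.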
Faculty
Faculteit der Sociale Wetenschappen