Predictive Processing in Proximal Policy Optimization

Issue Date
2021-04-26
Language
en
Abstract
Advances in reinforcement learning have led to drastically more complex agents that require unrealistic amounts of compute resources. The human brain often achieves comparable results with a fraction of the energy requirements of these models. Therefore, we turn to insights from neuroscience on predictive processing and efficient coding to design an efficient agent. We present the Predictive Processing Proximal Policy Optimization (P4O) agent, an actor-critic reinforcement learning agent that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden state. The prediction error that results from subtracting the encoded observed state from the world model prediction is used as the primary signal in our model. We demonstrate that with this approach, predictive processing with a world model can be incorporated while reducing a model's biologically analogous energy footprint, thus supporting the efficient coding hypothesis. When we use the encoded state information only to inhibit the recurrent connections, rather than providing the prediction error separately as input, the number of neurons in the model can be drastically reduced. Moreover, this approach encourages activations in the model to remain centered around the zero point, analogous to a lower spiking rate in a biological system and reduced energy usage. Furthermore, the P4O agent far outperforms the original PPO algorithm on the Seaquest environment while retaining its efficiency and can be run on a single GPU. It also outperforms other model-based and model-free state-of-the-art single-GPU agents on Seaquest given the same wall-clock time and exceeds human gamer performance in an initial performance comparison.
Future research could extend the agent with additional uses of its world model, improve its performance through tuning, inspect its neural coding of competing goals on different timescales or investigate the use of our approach in modeling brain function in various scenarios. Altogether, our work underlines the synergistic benefits of the convergence of insights from the fields of neuroscience, artificial intelligence and cognitive science.
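The prediction-error mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the P4O implementation: the function names, the use of the recurrent drive `w_rec @ hidden` as the world model prediction, and the `tanh` nonlinearity are all assumptions made for illustration. The only part taken from the abstract is that the encoded observed state is subtracted from the world model prediction, so that the encoding acts as inhibition on the recurrent connections.

```python
import numpy as np


def prediction_error(prediction, encoded_obs):
    """Encoded observed state subtracted from the world model prediction."""
    return prediction - encoded_obs


def recurrent_step(hidden, encoded_obs, w_rec):
    """One hypothetical recurrent update driven only by the prediction error.

    Assumption: the recurrent drive w_rec @ hidden plays the role of the
    world model prediction, and the encoded observation inhibits it.
    """
    prediction = w_rec @ hidden
    error = prediction_error(prediction, encoded_obs)
    # Assumed nonlinearity; a zero error gives a zero activation.
    new_hidden = np.tanh(error)
    return new_hidden, error


# Toy usage: an identity recurrent matrix and a perfectly predicted input.
w_rec = np.eye(2)
hidden = np.array([0.5, -0.2])
new_hidden, error = recurrent_step(hidden, np.array([0.5, -0.2]), w_rec)
```

In this toy setup a correct prediction leaves the error, and therefore the new activation, at zero, which mirrors the abstract's point that activations stay centered around zero when the input is well predicted.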
Faculty
Faculteit der Sociale Wetenschappen