Guided belief updates in Deep Bayesian Meta-Reinforcement Learning

Issue Date
2022-09-01
Language
en
Abstract
Balancing exploration and exploitation is a key challenge in reinforcement learning. The Bayes-adaptive policy achieves the optimal balance by conditioning on a posterior belief over the reward and transition functions. The current state-of-the-art approach, VariBAD, attempts to meta-train a recurrent neural network to perform approximate Bayesian inference of this posterior belief. Inspecting the posterior variance it produces, however, reveals behavior dissimilar to that of exact posterior updates, which suggests that learning the desired behavior entirely a posteriori from data is problematic. This work therefore provides the belief inference model of a Bayesian RL agent with Bayesian inference mechanics a priori and investigates how this influences performance.
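
To make the notion of an exact posterior update concrete, the following minimal sketch shows one simple case: a Gaussian belief over an unknown mean reward with known observation noise. This toy setting, the function names and the parameter values are assumptions chosen for illustration only; it is not the model used in this work or in VariBAD. It shows the property against which a learned belief can be compared: under exact inference in this setting, the posterior variance shrinks with every observation.

```python
import random


def gaussian_posterior_update(prior_mean, prior_var, observation, noise_var):
    """Exact conjugate update of a Gaussian belief over an unknown mean.

    Assumes Gaussian observations with known variance `noise_var`
    (an illustrative toy setting, not the thesis's model).
    """
    # Precisions (inverse variances) add under exact Gaussian inference.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + observation / noise_var)
    return post_mean, post_var


def run_belief_updates(observations, prior_mean=0.0, prior_var=1.0, noise_var=0.5):
    """Apply the exact update sequentially and record the variance trajectory."""
    mean, var = prior_mean, prior_var
    variances = [var]
    for r in observations:
        mean, var = gaussian_posterior_update(mean, var, r, noise_var)
        variances.append(var)
    return mean, variances


# Hypothetical task: rewards drawn around an unknown mean of 1.5.
rng = random.Random(0)
rewards = [rng.gauss(1.5, 0.5 ** 0.5) for _ in range(20)]
final_mean, variances = run_belief_updates(rewards)
```

In this conjugate case the variance trajectory depends only on the number of observations, not on their values, and it decreases strictly at every step. A belief model whose variance grows or fluctuates on comparable data is therefore not behaving like the exact update.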
Faculty
Faculteit der Sociale Wetenschappen