Guided belief updates in Deep Bayesian Meta-Reinforcement Learning
Issue Date
2022-09-01
Language
en
Abstract
Balancing exploration and exploitation is a key challenge in reinforcement learning. The
Bayes-adaptive policy achieves the optimal balance by conditioning on a posterior belief over
the reward and transition functions. The current state-of-the-art approach, VariBAD,
meta-trains a recurrent neural network to perform approximate Bayesian inference of this
posterior belief. Inspecting the posterior variance it produces, however, reveals behavior
dissimilar to exact posterior updates, which suggests that learning the desired behavior
entirely a posteriori from data is problematic. This work therefore equips the belief
inference model of a Bayesian RL agent with Bayesian inference mechanics a priori and
investigates how this influences performance.
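
For intuition about what an "exact posterior update" looks like, the following minimal sketch uses a conjugate Gaussian model of an unknown mean reward. It is an illustrative assumption, not the model used in this work or in VariBAD: the function name, the prior, the noise variance, and the reward values are all hypothetical. It shows the property the abstract alludes to, namely that under exact inference the posterior variance shrinks with every observation.

```python
def gaussian_posterior_update(prior_mean, prior_var, observation, noise_var):
    """One exact conjugate update of a Gaussian belief over an unknown mean.

    Hypothetical helper for illustration only: assumes a Gaussian prior and
    Gaussian observation noise with known variance.
    """
    # Precisions (inverse variances) add under exact Gaussian inference.
    posterior_precision = 1.0 / prior_var + 1.0 / noise_var
    posterior_var = 1.0 / posterior_precision
    # The posterior mean is the precision-weighted average of prior and data.
    posterior_mean = posterior_var * (
        prior_mean / prior_var + observation / noise_var
    )
    return posterior_mean, posterior_var


# Made-up rewards and hyperparameters, chosen only to illustrate the update.
mean, var = 0.0, 1.0
variances = [var]
for reward in [0.9, 1.2, 0.7, 1.1]:
    mean, var = gaussian_posterior_update(mean, var, reward, noise_var=0.5)
    variances.append(var)

print(variances)
```

In this toy model the variance sequence is strictly decreasing and does not depend on the observed values, which gives a simple reference against which the variance of a learned belief model could be compared.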
Faculty
Faculteit der Sociale Wetenschappen