Guided belief updates in Deep Bayesian Meta-Reinforcement Learning

Issue Date

2022-09-01

Language

en

Abstract

Balancing exploration and exploitation is a key challenge in reinforcement learning. The Bayes-adaptive policy achieves the optimal balance by conditioning on a posterior belief over the reward and transition functions. The current state-of-the-art approach, VariBAD, attempts to meta-train a recurrent neural network to perform approximate Bayesian inference over this posterior belief. Inspecting the posterior variance, however, reveals behavior that differs from exact posterior updates. It therefore appears that learning the desired behavior entirely a posteriori from data is problematic. Hence, this work provides the belief inference model of a Bayesian RL agent with Bayesian inference mechanics a priori and investigates how this influences performance.
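
As a point of reference for what an exact posterior update looks like, the following minimal sketch tracks a Gaussian belief over a single unknown mean reward with known observation noise. This conjugate model, the function names, and the default parameters are assumptions chosen for illustration; the abstract does not state which model or update rule the thesis uses. In this setting the exact posterior variance shrinks with every observation regardless of the observed values, which is the kind of behavior a learned belief model can be compared against.

```python
def gaussian_posterior_update(mean, var, obs, noise_var):
    """One exact conjugate update of a Gaussian belief over an unknown mean.

    Assumes a Gaussian likelihood with known noise variance (illustrative
    assumption, not taken from the thesis).
    """
    # Precisions (inverse variances) add under a Gaussian likelihood.
    precision = 1.0 / var + 1.0 / noise_var
    new_var = 1.0 / precision
    # The new mean is the precision-weighted average of prior mean and observation.
    new_mean = new_var * (mean / var + obs / noise_var)
    return new_mean, new_var


def belief_trajectory(observations, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Return the list of (mean, variance) beliefs, starting with the prior."""
    mean, var = prior_mean, prior_var
    history = [(mean, var)]
    for obs in observations:
        mean, var = gaussian_posterior_update(mean, var, obs, noise_var)
        history.append((mean, var))
    return history


if __name__ == "__main__":
    for step, (m, v) in enumerate(belief_trajectory([1.0, 1.0, 1.0])):
        print(f"step {step}: mean={m:.3f}, variance={v:.3f}")
```

Because the variance update in this model depends only on the number of observations, any increase in variance produced by a learned belief model on the same data would indicate a departure from exact inference.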

Faculty

Faculteit der Sociale Wetenschappen