Matching ánd Maximizing? A neurally plausible model of stochastic reinforcement learning
Keywords
Issue Date
2009-06-09
Language
en
Abstract
An influential model of how reinforcement learning occurs in the human brain is the one pioneered by Suri and Schultz [6]. The model was originally designed and tested for learning tasks in deterministic environments. This paper investigates whether and how the model can be extended to apply to learning in stochastic environments as well. It is known that when rewards are probabilistically coupled to actions, humans tend to display a suboptimal type of behavior, called matching, in which the probability of selecting a given response equals the probability of a reward for that response. Animal experiments suggest that humans are unique in this respect: non-human animals display an optimal type of behavior, called maximizing, in which the response with the maximum probability of a reward is consistently selected. We first show that the model in its original form becomes inert when confronted with a stochastic environment. We then consider two natural adjustments to the model and observe that one leads to matching behavior and the other to maximizing behavior. These results yield deeper insight into the workings of the model and may provide a basis for a better understanding of the learning mechanisms implemented by human and animal brains.
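The contrast between matching and maximizing described in the abstract can be illustrated with a minimal simulation. The sketch below is a hypothetical two-choice reward task, not the Suri–Schultz model or either of the paper's adjustments: the `matching` policy selects each response in proportion to its reward probability, while the `maximizing` policy always selects the response with the highest reward probability. The reward probabilities `(0.7, 0.3)` are arbitrary example values.

```python
import random

def simulate(policy, p_reward=(0.7, 0.3), trials=10000, seed=0):
    """Average reward rate of a policy on a two-choice probabilistic task.

    policy: "matching" picks response i with probability proportional to
    p_reward[i]; "maximizing" always picks the response with the highest
    reward probability. Purely illustrative.
    """
    rng = random.Random(seed)
    best = max(range(len(p_reward)), key=lambda i: p_reward[i])
    total = 0
    for _ in range(trials):
        if policy == "matching":
            # sample a response in proportion to its reward probability
            r = rng.random() * sum(p_reward)
            choice = 0 if r < p_reward[0] else 1
        else:  # maximizing
            choice = best
        # reward is delivered stochastically given the chosen response
        total += rng.random() < p_reward[choice]
    return total / trials
```

With these example probabilities, matching earns an expected reward rate of (0.7² + 0.3²) / (0.7 + 0.3) = 0.58 per trial, whereas maximizing earns 0.7, which is why matching is described as suboptimal.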
Faculty
Faculteit der Sociale Wetenschappen