Matching and Maximizing? A neurally plausible model of stochastic reinforcement learning

An influential model of how reinforcement learning occurs in human brains is the one pioneered by Suri and Schultz [6]. The model was originally designed and tested for learning tasks in deterministic environments. This paper investigates if and how the model can be extended to also apply to learning in stochastic environments. It is known that when rewards are probabilistically coupled to actions, humans tend to display a suboptimal type of behavior, called matching, in which the probability of selecting a given response equals the probability of a reward for that response. Animal experiments suggest that humans are unique in this respect: non-human animals display an optimal type of behavior, called maximizing, in which the response with the maximum probability of a reward is consistently selected. We first show that the model in its original form becomes inert when confronted with a stochastic environment. We then consider two natural adjustments to the model and observe that one of them leads to matching behavior and the other leads to maximizing behavior. The results yield a deeper insight into the workings of the model and may provide a basis for a better understanding of the learning mechanisms implemented by human and animal brains.
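The gap between matching and maximizing can be made concrete with a small worked example. The sketch below is illustrative only (it is not the Suri and Schultz model, and the reward probabilities are hypothetical example values): it computes the expected per-trial reward of a matching policy versus a maximizing policy in a two-response stochastic task.

```python
# Illustrative sketch of matching vs maximizing behavior.
# The reward probabilities are hypothetical example values,
# not parameters from the paper.

def expected_reward(choice_probs, reward_probs):
    """Expected per-trial reward for a fixed stochastic choice policy."""
    return sum(c * r for c, r in zip(choice_probs, reward_probs))

reward_probs = [0.7, 0.3]  # assumed reward probability per response

# Matching: choose each response with its relative reward probability.
total = sum(reward_probs)
matching_policy = [p / total for p in reward_probs]

# Maximizing: always choose the response with the highest reward probability.
best = max(range(len(reward_probs)), key=lambda i: reward_probs[i])
maximizing_policy = [1.0 if i == best else 0.0 for i in range(len(reward_probs))]

print(round(expected_reward(matching_policy, reward_probs), 2))    # 0.58
print(round(expected_reward(maximizing_policy, reward_probs), 2))  # 0.7
```

With these example probabilities, matching earns 0.58 reward per trial while maximizing earns 0.7, which is why matching is described as suboptimal.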
Faculteit der Sociale Wetenschappen