Neural Network Models of Reversal Learning in Nonhuman Primates
Keywords
Authors
Issue Date
2020-11-01
Language
en
Document type
Journal Title
Journal ISSN
Volume Title
Publisher
Title
ISSN
Volume
Issue
Startpage
Endpage
DOI
Abstract
Primate decision making involves the activity of multiple brain regions, whose distinct roles can
be dissected using appropriate reversal-learning tasks in combination with electrophysiological
recordings in the regions of interest. In a reversal-learning task, the subject has to pick, from
two presented objects, the one that contains the current target feature, which is switched at a ran-
dom trial to a new feature. We constructed a recurrent network based on Long Short-Term Memory
(LSTM) neurons, trained with reinforcement learning algorithms to perform several variations of
the reversal-learning task, including both probabilistic and deterministic reward schedules, and
evaluated the model's choices and the emerging stimulus/target representation.
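The abstract does not specify the implementation, but the core setup can be illustrated with a minimal sketch: an LSTM network with policy and value heads, trained on a two-alternative choice task with a simple advantage actor-critic update. The architecture, layer sizes, the update rule, and the task environment interface (env.reset, env.step) below are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of an LSTM agent for a two-alternative reversal-learning task.
# Sizes, the actor-critic update, and the environment interface are assumed.
import torch
import torch.nn as nn

class LSTMAgent(nn.Module):
    def __init__(self, n_inputs, n_hidden=64, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTMCell(n_inputs, n_hidden)    # recurrent core
        self.policy = nn.Linear(n_hidden, n_actions)   # action logits
        self.value = nn.Linear(n_hidden, 1)             # state-value estimate

    def forward(self, obs, state):
        h, c = self.lstm(obs, state)
        return self.policy(h), self.value(h), (h, c)

def run_block(agent, env, optimizer, n_trials=50, gamma=0.9):
    """Roll out one block of trials and apply a simple actor-critic update."""
    state = None
    log_probs, values, rewards = [], [], []
    obs = env.reset()                                   # hypothetical task environment
    for _ in range(n_trials):
        logits, value, state = agent(obs, state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward = env.step(action.item())           # reward depends on the current target
        log_probs.append(dist.log_prob(action))
        values.append(value.squeeze(-1))
        rewards.append(reward)
    # discounted returns and advantage-weighted policy/value losses
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    advantages = returns - values.detach()
    loss = -(torch.stack(log_probs) * advantages).mean() + (returns - values).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```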
The models produce meta-learning curves that are similar to those obtained in animal experiments, with
the time to reach criterion performance increasing with the number of feature dimensions present in the
objects and with task difficulty. We further found that the training of the network proceeded via a sequence
of rapid increases in performance, each increase reflecting the learning of a new feature dimension as a
possible target, followed by plateaus with slow changes in performance. The model's learning perfor-
mance within a block was reflected in the current-target discriminability of neural population activity,
which, when processed by dimension-reduction procedures followed by fitting support vector machine
(SVM) classifiers to decode the target, had a similar time course.
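The decoding analysis described here can be sketched with standard tools: dimension reduction of the hidden-unit activity followed by a cross-validated linear SVM predicting the current target. The data layout, the number of components, and the classifier settings below are illustrative assumptions, not the reported analysis pipeline.

```python
# Sketch of decoding the current target from population (hidden-unit) activity,
# assuming a (trials x units) activity array and one target label per trial.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def target_discriminability(activity, target_labels, n_components=10):
    """Cross-validated accuracy of a linear SVM decoding the target identity."""
    decoder = make_pipeline(PCA(n_components=n_components), SVC(kernel="linear"))
    scores = cross_val_score(decoder, activity, target_labels, cv=5)
    return scores.mean()

# Placeholder data standing in for recorded hidden states and target labels.
rng = np.random.default_rng(0)
activity = rng.normal(size=(200, 64))
labels = rng.integers(0, 2, size=200)
print(target_discriminability(activity, labels))
```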
The model's meta-learning behavior could be fitted with Rescorla-Wagner (RW) type models that weigh
positive prediction errors differently from negative ones. For tasks that included a target rever-
sal, the weight on positive prediction errors decreased and the weight on negative prediction errors
increased compared with tasks without a reversal. This explains the resulting model's higher sensitivity
to unrewarded trials and slower learning of the current target. The probabilistic-reward version of the
task was fit with even lower values of these weights than the task with deterministic rewards,
consistent with expectation.
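The asymmetric RW-type update referred to here can be written as a one-line rule in which the prediction error is scaled by a different weight depending on its sign. The sketch below uses placeholder names (w_pos, w_neg) for these weights; the exact parameterization used in the fits is not given in the abstract.

```python
# Sketch of a Rescorla-Wagner update with separate weights for positive and
# negative prediction errors; w_pos and w_neg are placeholder names.
def rw_update(value, reward, w_pos, w_neg):
    """Return the updated value estimate for the chosen stimulus."""
    delta = reward - value                      # prediction error
    weight = w_pos if delta >= 0 else w_neg     # asymmetric weighting
    return value + weight * delta

# Example: a smaller w_pos and larger w_neg make the model more sensitive to
# unrewarded trials, consistent with the fits for tasks that include a reversal.
v = 0.5
v = rw_update(v, reward=0.0, w_pos=0.2, w_neg=0.6)   # unrewarded trial
```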
We evaluated to what extent models generalized by testing them on tasks on which they were not trained.
When the model was trained on tasks with either probabilistic rewards or with a reversal, it performed
well on a task with deterministic reward or without a reversal, respectively, but not in the reverse case,
when the training and test tasks were exchanged. We also tested the robustness
of the model by either providing or omitting multiple rewards in a row irrespective of the choice. The
model trained on deterministic rewards and no reversal was robust against this, as it ceased learning and
would persevere with choosing the same target; the other models were less robust.
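The robustness probe amounts to overriding the reward signal for a stretch of consecutive trials regardless of which object is chosen. A minimal sketch of such an override is shown below; the function and its arguments are hypothetical and only illustrate the manipulation, not the authors' procedure.

```python
# Sketch of the robustness manipulation: for some trials the reward is forced
# on (or off) irrespective of whether the chosen object carries the target.
def rewarded(choice, correct_choice, override=None):
    """Return the reward for one trial, optionally overriding the outcome."""
    if override is not None:          # True = always rewarded, False = never rewarded
        return float(override)
    return float(choice == correct_choice)

# Example: five consecutive forced rewards, independent of the choice made.
forced = [rewarded(choice=0, correct_choice=1, override=True) for _ in range(5)]
```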
Taken together, we developed a model that can learn a reversal-learning task with probabilistic reward
and that behaves similarly to human and non-human primates in experiments, in that it can be described by
an RW model. It makes a prediction for the population activity during learning, which can be compared to
neural activity recorded in subjects performing this task and is invaluable in guiding future theoretical
and experimental studies.
Description
Citation
Faculty
Faculteit der Sociale Wetenschappen