Neural Network Models of Reversal Learning in Nonhuman Primates

Primate decision making involves the activity of multiple brain regions, whose distinct roles can be dissected using appropriate reversal-learning tasks in combination with electrophysiological recordings in the regions of interest. In a reversal-learning task, the subject has to pick, from two presented objects, the one that contains the current target feature, which is then switched on a random trial to a new feature. We constructed a recurrent network based on Long Short-Term Memory (LSTM) neurons, trained with reinforcement learning algorithms to perform several variations of the reversal-learning task, including both probabilistic and deterministic reward schedules, and evaluated the model's choices and the emerging stimulus/target representation. The models produce meta-learning curves similar to those obtained in animal experiments, with the time to reach criterion performance increasing with the number of feature dimensions present in the objects and with task difficulty. We further found that training of the network proceeded via a sequence of rapid increases in performance, each increase reflecting the learning of a new feature dimension as a possible target, followed by plateaus with slow changes in performance. The model's learning performance within a block was reflected in the current-target discriminability of neural population activity, which, when processed by dimensionality-reduction procedures followed by fitting support vector machine (SVM) classifiers to decode the target, had a similar time course. The model's meta-learning behavior could be fitted with Rescorla-Wagner (RW) type models that weigh positive prediction errors (with learning rate α+) differently from negative ones (α−). For tasks that had a target reversal, α+ decreased and α− increased compared to tasks without reversal. This explains the resulting model's higher sensitivity to unrewarded trials and slower learning of the current target.
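The asymmetric RW-type update described above can be illustrated with a minimal sketch; the function name and parameter values here are illustrative assumptions, not the actual fitting code used in the thesis.

```python
def rw_update(value, reward, alpha_pos, alpha_neg):
    """Rescorla-Wagner update with asymmetric learning rates.

    Positive prediction errors are weighted by alpha_pos,
    negative ones by alpha_neg.
    """
    delta = reward - value  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return value + alpha * delta

# With alpha_neg > alpha_pos, as fitted for tasks with a reversal,
# the value estimate drops quickly after unrewarded trials but
# recovers slowly after rewarded ones.
v = 0.0
v = rw_update(v, reward=1.0, alpha_pos=0.3, alpha_neg=0.5)  # rewarded trial
v = rw_update(v, reward=0.0, alpha_pos=0.3, alpha_neg=0.5)  # unrewarded trial
```

Fitting α+ and α− separately per task condition is what allows the comparison of reversal versus no-reversal and probabilistic versus deterministic schedules reported here.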
The probabilistic-reward version of the task was fit with even lower values of these weights than the task with deterministic rewards, consistent with expectation. We evaluated to what extent the models generalized by testing them on tasks on which they were not trained. When the model was trained on tasks with either probabilistic rewards or with a reversal, it performed well on a task with deterministic rewards or without a reversal, respectively; but in the reverse case, when the training task was exchanged with the test task, this was not so. We also tested the robustness of the model by either providing or omitting multiple rewards in a row irrespective of the choice. The model trained on deterministic rewards and no reversal was robust against this, as it ceased learning and would persevere in choosing the same target; the other models were less robust. Taken together, we developed a model that can learn a reversal-learning task with probabilistic reward and behaves similarly to experiments with human and nonhuman primates, as it can be described by an RW model. It makes a prediction for the population activity during learning that can be compared to neural activity recorded in subjects performing this task, which is invaluable in guiding future theoretical and experimental studies.
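The population-decoding analysis described in the abstract (dimensionality reduction followed by an SVM classifier decoding the current target) could be sketched as follows; the simulated activity, number of components, and kernel choice are illustrative assumptions rather than the thesis's actual analysis parameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Simulated population activity: 200 trials x 50 neurons, with the
# identity of the current target (0 or 1) weakly encoded in the
# first five neurons.
targets = rng.integers(0, 2, size=200)
activity = rng.normal(size=(200, 50))
activity[:, :5] += 1.5 * targets[:, None]

# Reduce dimensionality, then fit a linear SVM to decode the target
# and score it on held-out trials.
z = PCA(n_components=10).fit_transform(activity)
clf = SVC(kernel="linear").fit(z[:150], targets[:150])
accuracy = clf.score(z[150:], targets[150:])
```

Tracking such a decoding accuracy across the trials of a block is one way to obtain the current-target discriminability time course that the model predicts for recorded neural populations.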
Faculteit der Sociale Wetenschappen