Deep Reinforcement Learning of Active Sensing Strategies with POMDPs
This research aims to investigate the development and performance of artificial neural networks that are capable of extracting task-related information from an image using an active sensing strategy, similar to the way humans deploy an active sensing strategy for the planning of eye movements. Research suggests that humans can learn such a strategy by strengthening eye movements that have led to the extraction of more valuable information; in that sense, the goal of active sensing is to maximize the overall value of the extracted information. By framing active sensing as a Partially Observable Markov Decision Process (POMDP), it can be transformed into an optimization problem suited for Reinforcement Learning. The goal in this setting is to maximize task performance with only limited information. Because the true state of the environment cannot be observed, any task-related action must be based on beliefs about the actual state. The accuracy of these belief states can be increased by extracting more valuable information. Finding an exact solution for this POMDP is intractable, so the solution must be approximated. To find good approximations, several recurrent neural networks with different architectures and configurations are trained using Policy Gradient methods. These networks are trained and validated on challenging tasks designed to cover different important aspects of scene understanding. Since active sensing is a combination of planning focus locations and task performance, these two components will be further investigated. Finally, the relevance of these results for understanding human or animal eye movements will be discussed.
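To make the POMDP framing concrete, the sketch below shows a minimal version of the setup described above: a target is hidden in one of a few cells, a recurrent policy chooses where to "glimpse" at each step, accumulates observations in its hidden state (a learned stand-in for the belief state), and finally guesses the target location, receiving a reward only if the guess is correct. The policy heads are trained with REINFORCE, a basic Policy Gradient method. This is an illustrative toy, not the thesis's actual model: all parameter names, sizes, and the environment are assumptions, and for brevity the recurrent weights are left fixed (no backpropagation through time).

```python
import numpy as np

rng = np.random.default_rng(0)

N, T, H, LR = 4, 3, 16, 0.05   # cells, glimpses per episode, hidden size, learning rate

# Tiny recurrent policy; all parameter names/sizes are illustrative, not from the thesis.
Wx = rng.normal(0.0, 0.5, (H, N + 1))  # input weights: one-hot glimpse location + "hit" bit
Wh = rng.normal(0.0, 0.5, (H, H))      # recurrent weights (kept fixed here for brevity)
Wg = rng.normal(0.0, 0.1, (N, H))      # glimpse-location policy head
Wo = rng.normal(0.0, 0.1, (N, H))      # final-guess policy head

def softmax(z):
    z = z - z.max()                    # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def episode():
    """One POMDP episode: hidden target, T glimpses, final guess, reward 1 if correct."""
    global Wg, Wo
    target = rng.integers(N)           # true state: hidden, never shown to the agent
    h = np.zeros(H)                    # recurrent hidden state ~ belief state
    glimpse_grads = []                 # (grad of log-prob wrt logits, hidden) per glimpse
    for _ in range(T):
        p = softmax(Wg @ h)
        a = rng.choice(N, p=p)         # sensing action: where to look
        obs = np.zeros(N + 1)
        obs[a] = 1.0                   # partial observation: only the glimpsed cell
        obs[N] = 1.0 if a == target else 0.0
        dlog = -p
        dlog[a] += 1.0                 # gradient of log softmax at the sampled action
        glimpse_grads.append((dlog, h.copy()))
        h = np.tanh(Wh @ h + Wx @ obs) # integrate the new observation into the belief
    p_out = softmax(Wo @ h)
    guess = rng.choice(N, p=p_out)     # task action, based only on the belief state
    r = 1.0 if guess == target else 0.0
    adv = r - 1.0 / N                  # REINFORCE with chance level as a simple baseline
    dlog_o = -p_out
    dlog_o[guess] += 1.0
    Wo += LR * adv * np.outer(dlog_o, h)
    for dlog, h_t in glimpse_grads:    # strengthen glimpses that preceded a correct guess
        Wg += LR * adv * np.outer(dlog, h_t)
    return r

returns = [episode() for _ in range(3000)]
print("mean return over the last 500 episodes:", np.mean(returns[-500:]))
```

The reward-weighted update to `Wg` mirrors the idea from the abstract that eye movements leading to valuable information are strengthened: glimpse actions are reinforced only insofar as they preceded a successful task action.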
Faculteit der Sociale Wetenschappen