Interactive Reinforcement Learning; Two successful solutions for handling an abundance of positive feedback

Waa, J.S. van der

Files

Waa van der, J._MA_Thesis_2015.pdf (1.03 MB)

Waa van der, J._MA_Thesis_Annex_2015.pdf (1.94 MB)

Authors

Waa, J.S. van der

Issue Date

2015-07-13

Language

en

URI

http://theses.ubn.ru.nl/handle/123456789/230

Abstract

The field of interactive reinforcement learning aims to create learning methods in which users teach an agent how to solve a task by giving intuitive feedback on its behavior. Such an agent tries to find behavior that maximizes the positive feedback it receives. Because users give positive feedback for almost every step toward the task's goal, the agent learns that some set of actions yields more positive feedback than others. This set becomes a positive circuit: a set of actions and situations for which the agent has learned to expect a relatively large amount of positive feedback. The problem is that the agent will exploit a positive circuit until the user corrects it, even though the circuit does not necessarily solve the task. In this study we propose two novel solutions to this positive circuits problem. Both are new in that they force the agent to explore more actions and situations instead of simply exploiting a positive circuit it has found. The first solution generalizes the feedback given for an action in one situation to similar situations. If this feedback is positive, it motivates the agent to perform the same action again, even in situations it has not yet encountered. The second solution combines a method that detects repetitive behavior with a method that detects high-risk situations likely to elicit such undesired behavior. If either method triggers, the agent is forced to perform the most recent, best-assessed action. Each solution was tested individually by comparing it to a baseline agent that implemented neither solution, and the interaction between the two was tested by combining them in a single agent. All tests were performed in a grid environment with a simple navigation task. The results showed that both solutions caused the agent to solve the task more often and faster than the baseline agent.
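To make the two mechanisms more concrete, here is a minimal sketch under stated assumptions. It is not the thesis's implementation. It assumes a tabular grid world whose states are (x, y) cells, and it measures similarity between cells with a Gaussian kernel. The names `GeneralizingFeedbackModel`, `is_repetitive`, `sigma`, `lr` and `max_visits` are hypothetical. The first part spreads feedback given for an action in one cell to the same action in similar cells, which is the generalization idea. The second part flags repetitive behavior, which could indicate a positive circuit. The high-risk-situation detector and the forced action are not modeled here.

```python
import math
from collections import defaultdict

# Hypothetical sketch, not the thesis's code: a grid world where a state is an
# (x, y) cell and similarity between cells is an assumed Gaussian kernel.
ACTIONS = ("up", "down", "left", "right")


class GeneralizingFeedbackModel:
    """Expected feedback per (state, action), with feedback spread to similar states."""

    def __init__(self, width, height, sigma=1.0, lr=0.5):
        self.width, self.height = width, height
        self.sigma = sigma  # assumed kernel width: how far feedback generalizes
        self.lr = lr        # assumed step size toward the observed feedback
        self.h = defaultdict(float)  # (state, action) -> expected feedback

    def kernel(self, s1, s2):
        # Gaussian similarity between two cells; equals 1.0 for the same cell.
        d2 = (s1[0] - s2[0]) ** 2 + (s1[1] - s2[1]) ** 2
        return math.exp(-d2 / (2 * self.sigma ** 2))

    def update(self, state, action, feedback):
        # Move the estimate for `action` in every cell toward the feedback,
        # weighted by that cell's similarity to the cell that was taught.
        for x in range(self.width):
            for y in range(self.height):
                key = ((x, y), action)
                w = self.kernel(state, (x, y))
                self.h[key] += self.lr * w * (feedback - self.h[key])

    def greedy_action(self, state):
        # Action with the highest expected feedback (ties go to the first action).
        return max(ACTIONS, key=lambda a: self.h[(state, a)])


def is_repetitive(recent_states, max_visits=3):
    # Hypothetical repetition detector: flag behavior as repetitive when any
    # cell occurs at least `max_visits` times in the recent window.
    counts = defaultdict(int)
    for s in recent_states:
        counts[s] += 1
    return any(c >= max_visits for c in counts.values())


if __name__ == "__main__":
    model = GeneralizingFeedbackModel(5, 5)
    model.update((2, 2), "right", 1.0)          # positive feedback in one cell
    print(model.greedy_action((3, 2)))          # neighboring cell now prefers "right"
    print(is_repetitive([(0, 0), (0, 1), (0, 0), (0, 1), (0, 0)]))
```

In this sketch, one piece of positive feedback at (2, 2) also raises the estimate for "right" in cells the agent has never been taught about, and the raise gets smaller with distance. This shows how generalization can pull the agent out of a known circuit. In practice the window passed to `is_repetitive` would come from something like a bounded `collections.deque` holding the agent's most recent states.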
The first solution also allowed the agent to learn an optimal and effective task solution, regardless of where it started the navigation task. Finally, the results showed that the forced exploration action of the second solution helps the first solution's generalization find such optimal task solutions. This study shows that improving the exploratory behavior of an interactive reinforcement learning agent is a valid approach to solving the positive circuits problem.

Keywords: Interactive reinforcement learning, Positive circuits problem, Human teachers, Machine learning, Function approximation, Feedback-based exploration

Supervisor

Kaptein, M.C.

Truong, K.P.

Faculty

Faculty of Social Sciences (Faculteit der Sociale Wetenschappen)