Interactive Reinforcement Learning: Two successful solutions for handling an abundance of positive feedback
Issue Date
2015-07-13
Language
en
Abstract
The field of interactive reinforcement learning focuses on creating learning methods in which users can teach
an agent how to solve a task by providing feedback on the agent's behavior in an intuitive way. The goal of
such an agent is to find a behavior that maximizes the positive feedback it receives. Because users provide positive
feedback for almost every step towards the task's goal, the agent learns that some sets of actions result in more
positive feedback than others. Such a set becomes a positive circuit: a set of actions and situations for which the
agent has learned to expect a relatively large amount of positive feedback. The problem is that the agent will exploit a
positive circuit until corrected by the user, even though the circuit may not actually solve the task. In this
study we propose two novel solutions to this positive circuits problem. Both solutions are new in that they
force the agent to explore more actions and situations instead of simply exploiting a positive circuit it has found.
The first solution generalizes the feedback given for an action in one situation to similar
situations. If this feedback is positive, it motivates the agent to perform the action again, even
in unknown situations. The second solution combines a method that detects repetitive behavior with a method that
detects high-risk situations likely to elicit such undesired behavior. If either method triggers, the agent is
forced to perform the most recent, best assessed action. Both solutions were tested individually by comparing
each to a baseline agent with neither solution implemented; interaction between the two solutions was
tested by combining them in a single agent. Tests were performed in a grid environment with a simple navigation
task. The results showed that both solutions caused the agent to solve the task more often and faster than the
baseline agent. The first solution also allowed the agent to learn a task solution that was optimal and effective,
independent of where it started the navigation task. Finally, the results showed that the forced exploration action
of the second solution aids the first solution's generalization in finding such optimal task solutions. This study
demonstrates that improving exploratory behavior in an interactive reinforcement learning agent is a valid approach
to solving the positive circuits problem.
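The two solutions can be illustrated with a brief sketch. The snippet below is a minimal illustration only, not the agent used in the study: it assumes a grid navigation task with hypothetical state features, approximates the human feedback signal with a simple linear model so that feedback for one situation generalizes to similar situations (solution 1), and forces the most recent, best assessed action whenever a window-based detector flags repetitive behavior (solution 2). All class names, parameters, and thresholds are placeholders.

```python
import numpy as np
from collections import deque

# Hypothetical linear model of the human feedback signal: feedback given for
# an action in one situation generalizes to similar situations through
# shared state features (solution 1 in the abstract).
class FeedbackModel:
    def __init__(self, n_features, n_actions, lr=0.1):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def predict(self, features, action):
        return float(self.w[action] @ features)

    def update(self, features, action, feedback):
        # Move the prediction toward the observed feedback (e.g. +1 or -1).
        error = feedback - self.predict(features, action)
        self.w[action] += self.lr * error * features

# Simple window-based stand-in for the study's detection methods: flags when
# one state-action pair dominates recent history, which suggests the agent
# is stuck exploiting a positive circuit.
class RepetitionDetector:
    def __init__(self, window=20, threshold=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def record(self, state, action):
        self.history.append((state, action))

    def is_repetitive(self):
        if not self.history:
            return False
        most_common = max(set(self.history), key=self.history.count)
        return self.history.count(most_common) >= self.threshold

def select_action(model, detector, features, n_actions, recent_best_action):
    # Solution 2: on detected repetition, force the most recent, best assessed
    # action to break out of the circuit and explore.
    if recent_best_action is not None and detector.is_repetitive():
        return recent_best_action
    # Otherwise act greedily on the generalized feedback estimates (solution 1).
    return int(np.argmax([model.predict(features, a) for a in range(n_actions)]))
```

In a full interaction loop one would record each executed state-action pair in the detector, update the feedback model whenever the user gives feedback, and keep track of which recently executed action received the best assessment.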
Keywords: interactive reinforcement learning, positive circuits problem, human teachers, machine learning,
function approximation, feedback-based exploration
Faculty
Faculteit der Sociale Wetenschappen