Interactive Reinforcement Learning: Two successful solutions for handling an abundance of positive feedback

dc.contributor.advisor: Kaptein, M.C.
dc.contributor.advisor: Truong, K.P.
dc.contributor.author: Waa, J.S. van der
dc.date.issued: 2015-07-13
dc.description.abstract: The field of interactive reinforcement learning focuses on creating a learning method where users can teach an agent how to solve a task by providing feedback on the agent's behavior in an intuitive way. The goal of such an agent is to find a behavior that maximizes the positive feedback it receives. As users provide positive feedback for almost every step towards the task's goal, the agent learns that a certain set of actions results in more positive feedback than others. This set becomes a positive circuit: a set of actions and situations for which the agent has learned to expect relatively high positive feedback. The problem is that the agent will exploit a positive circuit until corrected by the user, even though the circuit may not necessarily solve the task. In this study we propose two novel solutions to this positive circuits problem. Both solutions are new in that they focus on forcing the agent to explore more actions and situations instead of simply exploiting a positive circuit it has found. The first solution generalizes the feedback given for an action in one situation to similar situations. If this feedback is positive, it motivates the agent to perform this action again, even in unknown situations. The second solution uses a method to detect repetitive behavior and a method to detect high-risk situations likely to elicit such undesired behavior. If either method triggers, the agent is forced to perform the most recent, best-assessed action. Both solutions were tested individually by comparing each to a baseline agent with neither solution implemented. Interaction between the two solutions was tested by combining them in one agent. Tests were performed in a grid environment with a simple navigation task. The results showed that both solutions caused the agent to solve the task more often and faster than the baseline agent. The first solution also allowed the agent to learn a task solution that was optimal and effective, independent of where it started the navigation task. Finally, results showed that the forced exploration action of the second solution aids the first solution's generalization in finding such optimal task solutions. This study demonstrated that improving exploratory behavior in an interactive reinforcement learning agent is a valid approach to solving the positive circuits problem.
Keywords: interactive reinforcement learning, positive circuits problem, human teachers, machine learning, function approximation, feedback-based exploration
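To make the two mechanisms described in the abstract more concrete, the sketch below shows a minimal, hypothetical interactive RL agent for a small grid navigation task. This is not the thesis's implementation: the FeedbackAgent class, the similarity function, the 5x5 grid, the learning rate of 0.5, and the loop detector are all illustrative assumptions. Only the overall ideas, generalizing human feedback to similar situations and forcing a best-assessed action when repetitive behavior is detected, come from the abstract.

```python
# Minimal sketch (not the author's implementation) of an interactive RL loop in
# which human feedback replaces the environment reward. It illustrates:
# (1) generalizing feedback to similar states, and (2) detecting repetitive
# behavior (a "positive circuit") and then forcing a best-assessed action.
# All names, parameters, and constants below are illustrative assumptions.
import random
from collections import defaultdict, deque

ACTIONS = ["up", "down", "left", "right"]                     # grid navigation actions
GRID_STATES = [(x, y) for x in range(5) for y in range(5)]    # assumed 5x5 grid

def similarity(s1, s2):
    """Toy state similarity: 1 for identical cells, decaying with Manhattan distance."""
    return 1.0 / (1.0 + abs(s1[0] - s2[0]) + abs(s1[1] - s2[1]))

class FeedbackAgent:
    def __init__(self, generalize=True, window=8):
        self.values = defaultdict(float)       # (state, action) -> estimated human feedback
        self.generalize = generalize
        self.history = deque(maxlen=window)    # recent (state, action) pairs

    def update(self, state, action, feedback):
        """Solution 1 (sketch): spread feedback for one situation to similar situations."""
        if self.generalize:
            for s in GRID_STATES:
                self.values[(s, action)] += 0.5 * similarity(state, s) * feedback
        else:
            self.values[(state, action)] += 0.5 * feedback

    def looping(self):
        """Solution 2 (sketch): crude detector that the recent window repeats itself."""
        h = list(self.history)
        half = len(h) // 2
        return len(h) == self.history.maxlen and h[:half] == h[half:]

    def act(self, state, epsilon=0.1):
        if self.looping():
            # Rough stand-in for the thesis's "most recent, best-assessed action":
            # of the actions taken in the recent window, replay the one with the
            # highest estimated feedback in the current state.
            recent = {a for _, a in self.history}
            action = max(recent, key=lambda a: self.values[(state, a)])
        elif random.random() < epsilon:
            action = random.choice(ACTIONS)    # ordinary epsilon-greedy exploration
        else:
            action = max(ACTIONS, key=lambda a: self.values[(state, a)])
        self.history.append((state, action))
        return action

# Usage sketch: a human trainer would supply feedback in {-1, +1} after each step.
agent = FeedbackAgent()
a = agent.act((0, 0))
agent.update((0, 0), a, feedback=+1.0)
```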
dc.identifier.uri: http://theses.ubn.ru.nl/handle/123456789/230
dc.language.iso: en
dc.thesis.faculty: Faculteit der Sociale Wetenschappen
dc.thesis.specialisation: Master Artificial Intelligence
dc.thesis.studyprogramme: Artificial Intelligence
dc.thesis.type: Master
dc.title: Interactive Reinforcement Learning: Two successful solutions for handling an abundance of positive feedback
Files
Original bundle (2 files)

Name: Waa van der, J._MA_Thesis_2015.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
Description: Thesis text

Name: Waa van der, J._MA_Thesis_Annex_2015.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format
Description: Thesis text annex