Optimizing the exploration-exploitation trade-off of Lock-in Feedback
Date
2016-02-19
Language
en
Abstract
Over the years, several strategies for solving bandit problems have been
proposed and examined. Strategies for continuum-armed bandit problems, a
variant of the bandit problem, are studied less frequently. This study
optimizes Lock-in Feedback (LiF), a strategy for continuum-armed bandit
problems. The main advantage of LiF over other continuum-armed bandit
strategies is its ability to deal with concept drift. However, the
oscillation needed to detect concept drift makes LiF less efficient. The
aim of this study is to adapt LiF so that it can still detect concept
drift while using fewer oscillations. The main research question is
therefore: could we adapt the policy of LiF such that it needs fewer
oscillations to detect concept drift, in order to reduce its linear
regret? Two simulation studies were conducted to answer this research
question. In the second study, different "stabilization policies" were
tested, with the aim of detecting concept drift using fewer oscillations.
The results show that both stabilization policies are able to detect
concept drift, but more research is needed to increase their accuracy.
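The abstract does not state LiF's update rule, so the following is only a minimal sketch of the idea it describes, under stated assumptions: the probe value oscillates around a centre x0, rewards are multiplied by the same cosine, and x0 is moved once per full oscillation. The function name `lif_run`, its parameters, and the quadratic reward function are hypothetical illustrations, not the implementation or the stabilization policies used in the study.

```python
import math
import random


def lif_run(optimum_at, steps, x0=0.0, amplitude=1.0, gamma=0.5,
            period=20, noise_sd=0.0, seed=0):
    """Track the maximum of the assumed reward f(x) = -(x - optimum_at(t))**2.

    Returns the final centre x0 and the cumulative regret relative to
    always playing the (possibly drifting) optimum.
    """
    rng = random.Random(seed)
    omega = 2.0 * math.pi / period
    acc = 0.0      # running sum of y_t * cos(omega * t) over one oscillation
    regret = 0.0   # best achievable expected reward is 0, so regret = (x - c)^2
    for t in range(steps):
        c = optimum_at(t)
        x = x0 + amplitude * math.cos(omega * t)   # oscillate around x0
        y = -(x - c) ** 2 + rng.gauss(0.0, noise_sd)
        regret += (x - c) ** 2
        acc += y * math.cos(omega * t)             # lock-in multiplication
        if (t + 1) % period == 0:                  # one full oscillation done
            x0 += gamma * acc / period             # step toward higher reward
            acc = 0.0
    return x0, regret


# Assumed drift scenario: the optimum jumps from 2.0 to 5.0 at t = 1000.
x0, regret = lif_run(lambda t: 2.0 if t < 1000 else 5.0, steps=2000)
print(x0, regret)
```

In this sketch the centre follows the optimum after the jump, but regret keeps growing even once x0 sits on the optimum, by about amplitude**2 / 2 per step. That is the linear regret caused by the oscillation, and it is the cost that using fewer oscillations would reduce.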
Faculty
Faculteit der Sociale Wetenschappen