Optimizing the exploration-exploitation trade-off of Lock-in Feedback

Over the years, several strategies for solving bandit problems have been proposed and examined. Strategies for continuum-armed bandit problems, a variant of the bandit problem, have been studied less frequently. This study optimizes Lock-in Feedback (LiF), a strategy for continuum-armed bandit problems. The main advantage of LiF over other continuum-armed bandit strategies is its ability to deal with concept drift. However, the oscillation needed to detect concept drift makes LiF less efficient. The aim of this study is to adapt LiF so that it can still detect concept drift while using fewer oscillations. The main research question is therefore: can the policy of LiF be adapted such that it needs fewer oscillations to detect concept drift, in order to reduce its linear regret? Two simulation studies were conducted to answer this question. In the second study, different "stabilization policies" were tested, whose aim was to detect concept drift using fewer oscillations. The results show that both stabilization policies are able to detect concept drift, but more research is needed to increase their accuracy.
