Keeping up with Fraud: An Active Learning Approach for Imbalanced Non-Stationary Data Streams

Keywords
Authors
Issue Date
2017-07-13
Language
en
Abstract
Fraud detection is a difficult task in which multiple problems co-occur. The data often comes from a non-stationary stream. Moreover, correct labels are available for only a small part of the data, and fraudulent cases are much rarer than non-fraudulent cases. A promising technique for addressing this combination of problems is active learning, in which instances are selected for labeling such that the classifier can learn the most from them. Previously, the critical sampling strategy was proposed, which selects instances close to the decision boundary and oversamples fraudulent cases. The current project proposes an extension to this strategy that also explores the full input space. These strategies were compared to state-of-the-art active learning strategies, using a new data stream sampled from the KDD'99 dataset, implemented in Massive Online Analysis (MOA). It was found that the original critical sampling algorithm does not perform better than random sampling, as has been found previously. One explanation could be that critical sampling induces a sampling bias, particularly if the minority data comes from multiple dense and sparse areas of the input space. In further research, this sampling bias could be overcome by combining critical sampling with a clustering-based or diversity-based approach.
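To make the selection idea in the abstract concrete, the following is a minimal sketch in Python. It is not the thesis's implementation and not MOA code. The function name, the thresholds, and the reading of "oversampling" as always querying instances the classifier considers likely fraud are all assumptions made for illustration. It shows three query reasons: closeness to the decision boundary, likely fraud, and occasional random exploration of the rest of the input space.

```python
import random


def select_for_labeling(p_fraud, rng, margin=0.1, fraud_threshold=0.5,
                        explore_rate=0.05):
    """Return why an instance is queried for a label, or None if skipped.

    p_fraud is the classifier's estimated fraud probability for one
    stream instance. All parameter values are hypothetical defaults.
    """
    # Uncertainty: the instance lies close to the decision boundary (0.5).
    if abs(p_fraud - 0.5) <= margin:
        return "boundary"
    # Minority emphasis (assumed reading of "oversampling fraud"):
    # always query instances the classifier believes are fraudulent.
    if p_fraud >= fraud_threshold:
        return "likely_fraud"
    # Exploration: occasionally query anywhere else in the input space.
    if rng.random() < explore_rate:
        return "explore"
    return None


# Usage on a handful of hypothetical classifier outputs.
rng = random.Random(0)
stream = [0.02, 0.45, 0.93, 0.10, 0.58]
decisions = [select_for_labeling(p, rng) for p in stream]
```

Without the exploration step, confident non-fraud predictions would never be queried, which is one way the sampling bias mentioned in the abstract could arise.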
Faculty
Faculteit der Sociale Wetenschappen