Keeping up with Fraud: An Active Learning Approach for Imbalanced Non-Stationary Data Streams
Issue Date
2017-07-13
Language
en
Abstract
Fraud detection is a difficult task in which multiple problems co-occur.
The data often comes from a non-stationary stream. Moreover, correct
labels are available for only a small part of the data and fraudulent cases
are much rarer than non-fraudulent cases. A promising technique for
solving this combination of problems is active learning, where instances are
selected for labeling such that the classifier can learn the most. Previously,
the critical sampling strategy has been proposed, which selects instances
close to the decision boundary and oversamples fraudulent cases. The
current project suggested an extension to this strategy that also explores
the full input space. These strategies were compared to state-of-the-art active
learning strategies, using a new data stream sampled from the KDD'99
dataset, implemented in Massive Online Analysis (MOA). It was found
that the original critical sampling algorithm does not perform better than
random sampling, as has been found previously. An explanation could
be that critical sampling induces a sampling bias, specifically if minority
data comes from multiple dense and sparse areas in input space. In further
research, this sampling bias could be overcome by combining critical
sampling with a clustering- or diversity-based approach.
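The selection idea described above can be illustrated with a minimal sketch. It is not the algorithm from the thesis: the function name should_query, the parameters margin, fraud_boost and explore_rate, and their default values are all hypothetical, and the classifier is assumed to output a fraud probability for each incoming instance. The sketch queries a label when the prediction lies near the decision boundary, queries a share of the confidently predicted fraud cases to oversample the minority class, and occasionally queries at random so that the rest of the input space is also explored.

```python
import random


def should_query(p_fraud, rng, margin=0.2, fraud_boost=0.5, explore_rate=0.05):
    """Decide whether to request a label for one stream instance.

    p_fraud: the classifier's estimated probability that the instance is fraud.
    margin, fraud_boost and explore_rate are hypothetical illustration values,
    not settings taken from the thesis.
    """
    # Exploration: occasionally query at random, anywhere in input space.
    if rng.random() < explore_rate:
        return True
    # Boundary sampling: query when the prediction is uncertain.
    if abs(p_fraud - 0.5) <= margin:
        return True
    # Oversampling: query a share of the confidently predicted fraud cases.
    if p_fraud > 0.5 and rng.random() < fraud_boost:
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical classifier outputs for five consecutive stream instances.
    for p in [0.02, 0.45, 0.91, 0.60, 0.10]:
        print(p, should_query(p, rng))
```

Setting explore_rate to 0 in this sketch removes the exploration step and leaves only boundary sampling with minority oversampling, which is where the sampling bias discussed above would be expected to arise.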
Faculty
Faculteit der Sociale Wetenschappen