Keeping up with Fraud: An Active Learning Approach for Imbalanced Non-Stationary Data Streams

Issue Date

2017-07-13

Language

English

Abstract

Fraud detection is a difficult task in which multiple problems co-occur. The data often arrive as a non-stationary stream. Moreover, correct labels are available for only a small part of the data, and fraudulent cases are much rarer than non-fraudulent ones. A promising technique for this combination of problems is active learning, in which instances are selected for labeling such that the classifier can learn the most from them. Previously, the critical sampling strategy has been proposed, which selects instances close to the decision boundary and oversamples fraudulent cases. The current project suggested an extension to this strategy that also explores the full input space. These strategies were compared with state-of-the-art active learning strategies on a new data stream sampled from the KDD'99 dataset, implemented in Massive Online Analysis (MOA). The original critical sampling algorithm was found not to perform better than random sampling, as has been found previously. An explanation could be that critical sampling induces a sampling bias, specifically if the minority data come from multiple dense and sparse areas of the input space. In further research, this sampling bias could be overcome by combining critical sampling with a clustering- or diversity-based approach.
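
The abstract describes critical sampling only at a high level (query instances close to the decision boundary and oversample fraudulent cases) and the extension only as also exploring the full input space. The Python sketch below illustrates that general idea on a single pass over a stream. It is not the project's implementation and does not use MOA's API. Assumptions made purely for illustration: an online logistic regression as the classifier, a fixed uncertainty margin around p = 0.5, random exploration with probability `explore_rate` as a stand-in for the exploration extension, a label budget expressed as a fraction of instances seen, and oversampling done by repeating each fraud update `fraud_copies` times. All names (`OnlineLogReg`, `should_query`, `run_stream`, `make_stream`) are hypothetical, and the synthetic stream is stationary, so concept drift is not modeled.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class OnlineLogReg:
    """Tiny online logistic regression, updated one labeled instance at a time."""
    n_features: int
    lr: float = 0.1
    w: list = field(default_factory=list)
    b: float = 0.0

    def __post_init__(self):
        if not self.w:
            self.w = [0.0] * self.n_features

    def predict_proba(self, x):
        """Estimated probability that x is fraudulent."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log loss; y is 1 (fraud) or 0 (legitimate)."""
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def should_query(p_fraud, margin, explore_rate, rng):
    """Hypothetical rule: ask for a label near the boundary, or at random to explore."""
    near_boundary = abs(p_fraud - 0.5) < margin
    explore = rng.random() < explore_rate
    return near_boundary or explore


def run_stream(stream, n_features, budget, margin=0.2, explore_rate=0.05,
               fraud_copies=5, seed=0):
    """Process the stream once; budget is the maximum fraction of labels requested."""
    rng = random.Random(seed)
    model = OnlineLogReg(n_features)
    seen = queried = 0
    for x, y in stream:  # y stands in for the oracle; it is used only when queried
        seen += 1
        p = model.predict_proba(x)
        if queried < budget * seen and should_query(p, margin, explore_rate, rng):
            queried += 1
            # Oversample the minority class by repeating fraud updates.
            for _ in range(fraud_copies if y == 1 else 1):
                model.update(x, y)
    return model, queried


def make_stream(n, seed=1, fraud_threshold=1.5):
    """Synthetic stationary, imbalanced stream: roughly 7% of instances are 'fraud'."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        yield x, int(x[0] > fraud_threshold)


model, queried = run_stream(make_stream(5000), n_features=2, budget=0.2)
print(queried, round(model.predict_proba([3.0, 0.0]), 3))
```

Setting `explore_rate=0` in this toy setup leaves a purely boundary-focused query rule, so nearly all labels are requested from the region around the current boundary. That concentration is one simple way to picture the kind of sampling bias the abstract suggests critical sampling may suffer from.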

Faculty

Faculty of Social Sciences (Faculteit der Sociale Wetenschappen)