Keeping up with Fraud: An Active Learning Approach for Imbalanced Non-Stationary Data Streams

dc.contributor.advisor: Kachergis, G.E.
dc.contributor.advisor: Krempl, G.
dc.contributor.advisor: Wit, M. de
dc.contributor.author: Kemper, D.
dc.date.issued: 2017-07-13
dc.description.abstract: Fraud detection is a difficult task in which multiple problems co-occur. The data often comes from a non-stationary stream. Moreover, correct labels are available for only a small part of the data, and fraudulent cases are much rarer than non-fraudulent cases. A promising technique for solving this combination of problems is active learning, in which instances are selected for labeling such that the classifier can learn the most from them. Previously, the critical sampling strategy was proposed, which selects instances close to the decision boundary and oversamples fraudulent cases. The current project proposes an extension to this strategy that also explores the full input space. These strategies were compared with state-of-the-art active learning strategies on a new data stream sampled from the KDD'99 dataset, implemented in Massive Online Analysis (MOA). It was found that the original critical sampling algorithm does not perform better than random sampling, as has been found previously. An explanation could be that critical sampling induces a sampling bias, specifically if minority data comes from multiple dense and sparse areas in input space. In further research, this sampling bias could be overcome by combining critical sampling with a clustering- or diversity-based approach. [en_US]
dc.identifier.uri: http://theses.ubn.ru.nl/handle/123456789/5237
dc.language.iso: en [en_US]
dc.thesis.faculty: Faculteit der Sociale Wetenschappen [en_US]
dc.thesis.specialisation: Master Artificial Intelligence [en_US]
dc.thesis.studyprogramme: Artificial Intelligence [en_US]
dc.thesis.type: Master [en_US]
dc.title: Keeping up with Fraud: An Active Learning Approach for Imbalanced Non-Stationary Data Streams [en_US]
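The stream query rule described in the abstract (label instances near the decision boundary, oversample fraud, and, in the proposed extension, also explore the rest of input space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation, which was built in MOA: the function name query_label and the parameters margin, fraud_boost and explore_rate are hypothetical; the decision boundary is assumed to lie at a predicted fraud probability of 0.5; because true labels are unknown before querying, oversampling is assumed to act on the classifier's predicted class; and exploration is modelled simply as querying a random fraction of the stream.

```python
import random
from typing import Optional


def query_label(p_fraud: float,
                margin: float = 0.1,
                fraud_boost: float = 0.5,
                explore_rate: float = 0.0,
                rng: Optional[random.Random] = None) -> bool:
    """Decide whether to request the true label for one streamed instance.

    Hypothetical sketch: p_fraud is the current classifier's estimated
    probability that the instance is fraudulent, and the decision boundary
    is assumed to lie at 0.5.
    """
    rng = rng if rng is not None else random.Random()
    # Exploration (assumed form of the extension): query a random fraction
    # of instances anywhere in input space, not only near the boundary.
    if rng.random() < explore_rate:
        return True
    # Critical region: instances close to the decision boundary.
    if abs(p_fraud - 0.5) <= margin:
        return True
    # Minority oversampling (assumed): true labels are unknown before
    # querying, so instances *predicted* fraudulent get an extra chance.
    if p_fraud > 0.5 and rng.random() < fraud_boost:
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores standing in for classifier outputs (mostly non-fraud).
    scores = [rng.betavariate(1, 8) for _ in range(10_000)]
    for explore in (0.0, 0.05):
        queried = sum(query_label(s, explore_rate=explore, rng=rng) for s in scores)
        print(f"explore_rate={explore}: queried {queried} of {len(scores)}")
```

With explore_rate set to 0 the rule only ever queries the boundary region and predicted fraud, which illustrates how such a strategy can miss minority instances lying in other, sparser areas of input space; a positive explore_rate is one simple way to counter that bias, whereas the clustering- or diversity-based combinations suggested in the abstract would replace the uniform random draw with a more structured choice.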
Files
Original bundle
Name: Kemper, D._MSc_Thesis_2017.pdf
Size: 4.58 MB
Format: Adobe Portable Document Format
Description: Thesis text