Named Entity Recognition and the Police Registration Application Summ-IT

Keywords

Issue Date

2022-07-01

Language

en

Abstract

This research investigated the use of Named Entity Recognition as a tool to assist the Dutch national police in identifying entities in documents in the police registration software Summ-IT, with the aim of both lowering the workload and increasing data quality. Through several means, a better understanding of the requirements and expectations of the police organisation with respect to this problem was achieved. These insights led to the choice of the Natural Language Processing toolkit SpaCy as the dedicated platform. As SpaCy offers a broad range of functionalities, different approaches were adopted and compared in this research, not only to investigate its viability within the Dutch police department but also to analyse the need for domain-specific annotated data. As such data can be a costly resource to acquire, it is valuable to evaluate the potential performance increase. Five entity types are considered in this research: ‘PERSON’ (person), ‘GPE’ (location), ‘ORG’ (organisation), ‘COM’ (means of communication) and ‘VEH’ (vehicles). The first three also appear regularly in other, more general domains, while the last two are considered domain-specific to the Dutch national police. As no labelled data was available, the extracted entities had to be located back in the text documents automatically. This process resulted in F1-scores of around 0.9 or higher, with the exception of the ‘ORG’ entity, which achieved a considerably lower F1-score of 0.6667. Three different approaches were implemented, one of which did not require any domain-specific annotated data. The final results show that the approach that uses a domain-specific model in combination with a pre-trained language model outperforms the other approaches overall. This indicates that domain-specific data is of added value and further increases performance.
None of the implemented approaches meets the needs and expectations expressed by employees of the Dutch national police. This, however, does not imply that this research has failed: it still provides valuable insights into the problem space of Named Entity Recognition and potentially paves the way for an acceptable future solution.
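
The step of locating already-extracted entities back in the text, mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything here is an assumption: the function name `locate_entities`, the whole-word, case-insensitive matching, the longest-match-first rule for overlaps, and the example sentence and entity values, which are invented for illustration only.

```python
import re
from typing import List, Tuple

# A character-offset annotation: (start, end, label).
Span = Tuple[int, int, str]


def locate_entities(text: str, entities: List[Tuple[str, str]]) -> List[Span]:
    """Find known (value, label) pairs in text and return non-overlapping spans.

    Hypothetical helper: longer values are matched first, so a full name
    takes precedence over a shorter value contained in it.
    """
    spans: List[Span] = []
    for value, label in sorted(entities, key=lambda e: len(e[0]), reverse=True):
        # Match the value only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = match.span()
            # Keep the match only if it does not overlap an accepted span.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return sorted(spans)


# Invented example document and registered entities (not from the thesis).
text = "Jan de Vries reed in een Volkswagen Golf door Utrecht."
known = [
    ("Jan de Vries", "PERSON"),
    ("Volkswagen Golf", "VEH"),
    ("Utrecht", "GPE"),
]
spans = locate_entities(text, known)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

Character-offset spans of this kind are the usual form in which annotated examples are supplied when training a named entity recogniser. Exact string matching misses spelling variants and abbreviations, which is one plausible reason why automatically located labels can be less reliable for some entity types than for others.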

Faculty

Faculteit der Sociale Wetenschappen