Named Entity Recognition and the Police Registration Application Summ-IT
Issue Date
2022-07-01
Language
en
Abstract
This research investigated the use of Named Entity Recognition as a tool to assist the Dutch
national police in identifying entities in documents in the police registration software
Summ-IT. The aim was both to lower the workload and to increase data quality.
Through several means, a better understanding of the requirements and expectations of
the police organisation with respect to this problem was achieved. These insights led to
the choice of the Natural Language Processing toolkit spaCy as the dedicated platform. As
spaCy offers a broad range of functionality, different approaches were adopted and compared
in this research, not only to investigate its viability within the Dutch police department but
also to further analyse the need for domain-specific annotated data. As such data can be
costly to acquire, it is valuable to evaluate the potential performance increase. Five entities
are considered in this research: ‘PERSON’ (person), ‘GPE’ (location), ‘ORG’ (organisation),
‘COM’ (means of communication) and ‘VEH’ (vehicles). The first three entities also appear
regularly in other (more general) domains, while the last two are considered domain-specific
to the Dutch national police. As no labelled data was available, the extracted
entities had to be located back in the text documents automatically. This process resulted in
F1-scores of around 0.9 or higher, with the exception of the ‘ORG’ entity, which achieved a
considerably lower F1-score of 0.6667. Three different approaches were implemented, of which
one did not require any domain-specific annotated data. The final results show that the
approach that uses a domain-specific model in combination with a pre-trained language model
overall outperforms the other approaches. This indicates that domain-specific data is of added
value and further increases the performance. None of the implemented approaches matches the
needs and expectations expressed by employees of the Dutch national police. This,
however, does not imply that this research has failed, as it still provides valuable insights
into the problem space that is Named Entity Recognition and potentially paves the way for
an acceptable future solution.
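
The abstract mentions locating already extracted entities back in the text and scoring the result with F1. The sketch below illustrates one way such a step could look; it is not taken from the thesis. The function names, the matching strategy (case-insensitive whole-token matching, longest entity first) and the example sentence are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def locate_entities(text: str, entities: List[Tuple[str, str]]) -> List[Span]:
    """Find every occurrence of each known entity string in the text.

    Hypothetical helper: longer entity strings are matched first, and a
    match that overlaps an already accepted span is skipped.
    """
    spans: List[Span] = []
    for value, label in sorted(entities, key=lambda e: len(e[0]), reverse=True):
        if not value:
            continue
        # Whole-token match: no word character directly before or after.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(value) + r"(?!\w)", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            start, end = match.span()
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, label))
    return sorted(spans)


def f1_score(predicted: List[Span], gold: List[Span]) -> float:
    """Exact-match F1 between two collections of labelled spans."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence and entity list, for illustration only.
text = "Jan de Vries reed in een Volkswagen Golf naar Nijmegen."
known = [("Jan de Vries", "PERSON"), ("Volkswagen Golf", "VEH"), ("Nijmegen", "GPE")]
spans = locate_entities(text, known)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

The resulting (start, end, label) triples are character offsets, which is the general shape in which span annotations are commonly supplied to NER training pipelines. Exact-match F1 as defined here counts a predicted span as correct only if both its boundaries and its label agree with the reference.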
Faculty
Faculteit der Sociale Wetenschappen