Off-the-shelf benchmarking of Dutch ASR systems for use in child health clinics

Issue Date

2025-11-28

Language

en

Abstract

To assess the suitability of off-the-shelf automatic speech recognition (ASR) for Dutch child health clinics, we compared nine models (AWS Transcribe, seven Whisper variants, and two Wav2Vec 2.0 variants) across four metrics: robustness, hallucination tendency, medical terminology, and processing time. Word error rates were analyzed by demographic, accent, and medical terms, hallucinations were tracked under silence or background noise, and processing times were measured across conditions. AWS Transcribe and the larger Whisper models achieved the highest accuracy, while smaller Whisper models were prone to hallucinations and repetition, and Wav2Vec 2.0 models produced phonetically plausible but inaccurate outputs. All models struggled more with children and non-native speakers, and a clear trade-off emerged between speed and accuracy: Whisper models slowed as accuracy increased, Wav2Vec 2.0 was consistently fast, and AWS Transcribe, though most accurate, had high and variable processing times. No single model was universally optimal, indicating deployment should balance accuracy, speed, and robustness to meet clinical needs.

Keywords: automatic speech recognition, Dutch, bias, hallucination, healthcare.
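The abstract does not say how word error rate (WER) was computed, so the following is only a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, pooled per speaker group. The function names, the lowercasing and whitespace tokenization, and the example group labels are all assumptions made for illustration, not the study's actual pipeline.

```python
from collections import defaultdict


def word_edit_distance(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (edit distance in words, number of reference words).

    Assumption: text is lowercased and split on whitespace; a real
    evaluation would likely apply fuller normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance."""
    errors, n_words = word_edit_distance(reference, hypothesis)
    if n_words == 0:
        raise ValueError("reference must contain at least one word")
    return errors / n_words


def wer_by_group(samples):
    """Pool errors and reference words per group (hypothetical labels).

    samples: iterable of (group, reference, hypothesis) tuples.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_edit_distance(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}
```

Pooling errors and word counts per group, instead of averaging per-utterance WERs, keeps short utterances from dominating a group's score; which of the two the study used is not stated in the abstract.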

Faculty

Faculteit der Sociale Wetenschappen