Off-the-shelf benchmarking of Dutch ASR systems for use in child health clinics
Issue Date
2025-11-28
Language
en
Abstract
To assess the suitability of off-the-shelf automatic speech recognition (ASR)
for Dutch child health clinics, we compared nine models (AWS Transcribe,
seven Whisper variants, and two Wav2Vec 2.0 variants) across four metrics: robustness, hallucination tendency, medical terminology, and processing time. Word error rates were analyzed by demographic, accent, and medical terms, hallucinations were tracked under silence or background noise, and
processing times were measured across conditions. AWS Transcribe and the
larger Whisper models achieved the highest accuracy, while smaller Whisper models were prone to hallucinations and repetition, and Wav2Vec 2.0
models produced phonetically plausible but inaccurate outputs. All models
struggled more with children and non-native speakers, and a clear trade-off
emerged between speed and accuracy: Whisper models slowed as accuracy
increased, Wav2Vec 2.0 was consistently fast, and AWS Transcribe, though
most accurate, had high and variable processing times. No single model was
universally optimal, indicating deployment should balance accuracy, speed,
and robustness to meet clinical needs.
Keywords: automatic speech recognition, Dutch, bias, hallucination,
healthcare.
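
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate (WER) broken down by speaker group could be computed. It is not the authors' evaluation code: the function names, the `(group, reference, hypothesis)` sample layout, the lowercase-and-split normalization, and the example sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def word_edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Pooled WER per group: total word errors / total reference words.

    `samples` is an iterable of (group, reference, hypothesis) tuples;
    this layout is a hypothetical one chosen for the sketch.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        # Assumed normalization: lowercase and split on whitespace only.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors[group] += word_edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Illustrative, made-up samples (not data from the study).
samples = [
    ("adult", "het kind heeft koorts", "het kind heeft koorts"),
    ("child", "ik heb buikpijn", "ik heb pijn"),
]
rates = wer_by_group(samples)
```

Pooling errors over all reference words in a group, rather than averaging per-utterance rates, keeps short utterances from dominating the result; either choice is defensible and the source does not say which was used.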
Faculty
Faculteit der Sociale Wetenschappen
