Distributions of Cognates in Europe Based on the Levenshtein Distance
Issue Date
2008-12-19
Language
en
Abstract
We applied the Levenshtein distance to a professional translation database (extracted
from Euroglot professional 5.0) in order to identify distributions of cognates in six
European languages. Using the Rosetta schemes of Grootjen (2008) for database
interaction, we classified translation pairs as cognates if a score for orthographic
overlap based on the Levenshtein distance was above a motivated threshold. Semantic
overlap was determined using the conceptual structure of the database. Differences
between cognate distributions across languages were found to be consistent with the
results of validation studies on language similarity ordering. In addition, numbers of translations,
proportions of form-identical to form-similar cognates, and proportions of form-identical
false friends to form-identical cognates were compared between languages.
We show that these new techniques from artificial intelligence can facilitate the
selection of stimulus materials for psycholinguistic cognate and false friend research,
and can assess language similarity ordering between the analyzed languages: English,
German, French, Spanish, Italian, and Dutch.
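
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the study. The normalization (one minus the distance divided by the longer word length), the threshold value of 0.5, the label strings, and the function names are all assumptions made for illustration. The same_concept flag stands in for the semantic overlap that the study derives from the conceptual structure of the database.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def orthographic_overlap(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical forms, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical value; the abstract only says the threshold was "motivated".
HYPOTHETICAL_THRESHOLD = 0.5


def classify_pair(word_a: str, word_b: str, same_concept: bool,
                  threshold: float = HYPOTHETICAL_THRESHOLD) -> str:
    """Label a word pair from two languages using form and meaning overlap."""
    a, b = word_a.lower(), word_b.lower()
    identical = a == b
    if same_concept:
        if identical:
            return "form-identical cognate"
        if orthographic_overlap(a, b) > threshold:
            return "form-similar cognate"
        return "non-cognate translation"
    if identical:
        return "form-identical false friend"
    return "unrelated pair"
```

Under these assumptions, the English-Dutch pair "tomato" and "tomaat" is labelled a form-similar cognate, while English "room" and Dutch "room" (cream), which share a form but not a concept, are labelled a form-identical false friend.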
Faculty
Faculteit der Sociale Wetenschappen