Distributions of Cognates in Europe Based on the Levenshtein Distance
We applied the Levenshtein distance to a professional translation database (extracted from Euroglot Professional 5.0) in order to identify distributions of cognates in six European languages. Using the Rosetta schemes of Grootjen (2008) for database interaction, we classified translation pairs as cognates if their score for orthographic overlap, based on the Levenshtein distance, was above a motivated threshold. Semantic overlap was determined using the conceptual structure of the database. Differences in cognate distributions across languages were consistent with the results of validation studies on language similarity ordering. In addition, we compared between languages the numbers of translations, the proportions of form-identical to form-similar cognates, and the proportions of form-identical false friends to form-identical cognates. We show that these new techniques from artificial intelligence can facilitate the selection of stimulus materials for psycholinguistic research on cognates and false friends, and can be used to assess language similarity ordering among the analyzed languages: English, German, French, Spanish, Italian, and Dutch.
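The classification step described above can be illustrated with a minimal sketch. It is not the implementation used in the study. The normalization (one minus the distance divided by the length of the longer word), the threshold value of 0.5, and all function names are assumptions made for illustration. The sketch also assumes that the two words are already known to be translations of each other, so semantic overlap is taken as given.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def overlap_score(a: str, b: str) -> float:
    """Orthographic overlap in [0, 1]; 1.0 means identical spelling.
    Assumed normalization: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def classify_pair(a: str, b: str, threshold: float = 0.5) -> str:
    """Label a translation pair by orthographic overlap.
    The threshold of 0.5 is a placeholder, not the paper's value."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "form-identical cognate"
    if overlap_score(a, b) > threshold:
        return "form-similar cognate"
    return "non-cognate"


if __name__ == "__main__":
    for pair in [("hotel", "hotel"), ("tomato", "tomaat"), ("dog", "hond")]:
        print(pair, classify_pair(*pair))
```

Identifying form-identical false friends would additionally require checking that two identically spelled words are not linked to the same concept in the database, which this sketch does not model.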
Faculteit der Sociale Wetenschappen