Distributions of Cognates in Europe Based on the Levenshtein Distance

Issue Date

2008-12-19

Language

en

Abstract

We applied the Levenshtein distance to a professional translation database (extracted from Euroglot professional 5.0) in order to identify distributions of cognates in six European languages. Using the Rosetta schemes of Grootjen (2008) for database interaction, we classified translation pairs as cognates if their score for orthographic overlap, based on the Levenshtein distance, was above a motivated threshold. Semantic overlap was determined using the conceptual structure of the database. Differences between cognate distributions across languages were found to be similar to those reported in validation studies on language similarity ordering. In addition, numbers of translations, proportions of form-identical to form-similar cognates, and proportions of form-identical false friends to form-identical cognates were compared between languages. We show that these new techniques from artificial intelligence can facilitate the selection of stimulus materials for psycholinguistic cognate and false friend research, and can assess language similarity ordering between the analyzed languages: English, German, French, Spanish, Italian, and Dutch.
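
The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the orthographic overlap score is normalized or what threshold was used, so the length normalization, the `threshold=0.5` default, the lowercasing, and the function names below are all assumptions made for illustration. The sketch also assumes the two words are already known to be translations of each other, which stands in for the semantic overlap that the study derives from the database structure.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def overlap_score(a: str, b: str) -> float:
    """Orthographic overlap in [0, 1]; 1.0 means identical spelling.
    Normalizing by the longer word is an assumption, not the paper's formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def classify_translation_pair(a: str, b: str, threshold: float = 0.5) -> str:
    """Label a translation pair by form overlap.
    The threshold value is hypothetical; the study uses its own motivated one."""
    score = overlap_score(a.lower(), b.lower())
    if score == 1.0:
        return "form-identical cognate"
    if score >= threshold:
        return "form-similar cognate"
    return "non-cognate"
```

Under these assumptions, a pair such as English "hotel" and Dutch "hotel" would be labeled form-identical, "tomato" and "tomaat" form-similar, and "dog" and "hond" non-cognate. A form-identical pair that is not a translation pair in the database would be a candidate false friend, but detecting that requires the database's conceptual structure and is outside this sketch.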

Faculty

Faculteit der Sociale Wetenschappen