Distributions of Cognates in Europe Based on the Levenshtein Distance

dc.contributor.advisor: Dijkstra, A.F.J.
dc.contributor.advisor: Grootjen, F.A.
dc.contributor.author: Schepens, J.J.
dc.description.abstract: We applied the Levenshtein distance to a professional translation database (extracted from Euroglot Professional 5.0) in order to identify distributions of cognates in six European languages. Using the Rosetta schemes of Grootjen (2008) for database interaction, we classified translation pairs as cognates if a score for orthographic overlap based on the Levenshtein distance was above a motivated threshold. Semantic overlap was determined using the conceptual structure of the database. Differences between cognate distributions across languages were found to be similar to those reported in validation studies on language similarity ordering. In addition, numbers of translations, proportions of form-identical to form-similar cognates, and proportions of form-identical false friends to form-identical cognates were compared between languages. We show that these new techniques from artificial intelligence can facilitate the selection of stimulus materials for psycholinguistic cognate and false friend research, and can assess language similarity ordering between the analyzed languages: English, German, French, Spanish, Italian, and Dutch.
dc.thesis.faculty: Faculteit der Sociale Wetenschappen
dc.thesis.specialisation: Bachelor Artificial Intelligence
dc.thesis.studyprogramme: Artificial Intelligence
dc.title: Distributions of Cognates in Europe Based on the Levenshtein Distance
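
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the normalization (one minus the distance divided by the length of the longer word), the threshold value of 0.5, and the names orthographic_overlap, classify and HYPOTHETICAL_THRESHOLD are all assumptions made for illustration. Semantic overlap is assumed to be given, since the input pairs are taken to be translations of each other.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def orthographic_overlap(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical forms, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical value; the thesis motivates its own threshold.
HYPOTHETICAL_THRESHOLD = 0.5


def classify(a: str, b: str, threshold: float = HYPOTHETICAL_THRESHOLD) -> str:
    """Label a translation pair by orthographic overlap alone."""
    if a == b:
        return "form-identical cognate"
    if orthographic_overlap(a, b) > threshold:
        return "form-similar cognate"
    return "non-cognate"


if __name__ == "__main__":
    for pair in [("hotel", "hotel"), ("tomato", "tomaat"), ("dog", "hond")]:
        print(pair, classify(*pair))
```

Under these assumptions, a form-identical false friend would be a pair with identical spelling that is not a translation pair in the database, which this sketch does not model.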
Original bundle
Schepens, J. BaThesis.pdf
535.97 KB
Adobe Portable Document Format