Methods For Automatically Generating a Legal Thesaurus
Automatic thesaurus generation is desirable because a thesaurus is a useful tool in natural language processing (NLP), but building one manually is expensive and time-consuming. In this thesis, thesaurus generation is divided into two parts: term extraction and relation extraction. Term extraction is the process of automatically finding candidate terms for a legal thesaurus; relation extraction is the process of determining which pairs of terms stand in a hypernym relation. For term extraction, three termhood measures are used: log-likelihood, Kullback-Leibler divergence, and the score assigned by the TExSIS tool. For relation extraction, several classifiers are trained to decide whether two terms have a hypernym relation. The conclusion of this thesis is that no system could be built that constructs a thesaurus autonomously, and that in the short term it is better to aim for a system that assists humans in building one.
Faculteit der Letteren