Detecting Private Information: Using and comparing an Artificial Immune System to a rule-based algorithm
The present master's thesis seeks to develop a better way of separating private information from non-private information in official documents. The aim is to automate this process. To this end, two algorithms were created. The first is a rule-based algorithm that uses a fixed set of words to determine whether something is private or not. This algorithm serves as a baseline for comparison. The second is an Artificial Immune System, which tries to detect ‘outside’ information, or in this case, private information. The two algorithms were compared in terms of Sensitivity and Specificity during initial tests, and a Wilcoxon test was used in the final test. The hypothesis was that the Artificial Immune System would perform better because it learns the patterns itself, while the rule-based algorithm would have difficulty generalizing. The results showed a statistically significant difference in performance between the two algorithms (p < 0.05), with indications that the Artificial Immune System performs better. However, neither the Artificial Immune System nor the rule-based algorithm could reliably detect private information (33% found and 22% found, respectively). Other methods will be necessary to solve this problem.
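As an illustration of the evaluation described above, the following is a minimal sketch of a keyword-based detector and of how Sensitivity and Specificity can be computed from its predictions. It is not the implementation used in the thesis: the word list `PRIVATE_WORDS`, the function names, and the sentence-level granularity are all assumptions made for this example.

```python
# Hypothetical keyword list; the actual word set of the thesis is not given here.
PRIVATE_WORDS = {"address", "birthdate", "phone", "passport"}


def rule_based_is_private(sentence: str) -> bool:
    """Flag a sentence as private if it contains any word from the fixed list."""
    tokens = {t.strip(".,;:!?()").lower() for t in sentence.split()}
    return not tokens.isdisjoint(PRIVATE_WORDS)


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length lists of booleans.

    Sensitivity = TP / (TP + FN): share of private items that were found.
    Specificity = TN / (TN + FP): share of non-private items left unflagged.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data, invented for this example only.
sentences = [
    "His phone number is listed.",
    "The meeting starts at noon.",
    "She lives near the station.",
    "Send the passport copy.",
]
actual = [True, False, True, True]
predicted = [rule_based_is_private(s) for s in sentences]
sens, spec = sensitivity_specificity(predicted, actual)
```

The third toy sentence is private but contains no listed word, so the detector misses it. This is the kind of generalization problem a fixed word list is expected to have.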
Faculteit der Sociale Wetenschappen