Exploring Discretization Techniques for Breast Cancer Detection with Bayesian Networks
Computer-aided detection (CAD) systems assist radiologists in deciding whether a detected anomaly is malignant. Current CAD systems detect most cancers, but false positives remain their biggest problem. In a collaboration with the Radboud University Nijmegen Medical Centre (UMCN) and the computer science department of the Radboud University Nijmegen, a multi-stage CAD system has been developed. The final stage is a Bayesian network that reasons over features of a suspicious region, such as 'contrast' or 'size'. In this stage, two views of the same region are considered simultaneously. In this thesis we improve this causal model by discretizing its variables, both to capture the underlying probability distribution and to aid usability of the CAD system, since radiologists would typically annotate a region in a categorical fashion (e.g. 'high' or 'very low' contrast). Classification performance is assessed using ROC curves. A few discretization algorithms perform better than the continuous baseline. The best was the entropy-based method of Fayyad and Irani, but simpler algorithms can also outperform the continuous baseline. Two simpler methods with only 3 bins per variable gave results similar to the continuous baseline. This indicates that usability can be improved without a decline in performance.
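As a rough illustration of what discretizing a feature into 3 bins can look like, the sketch below applies equal-frequency binning to a continuous 'contrast' value. The abstract does not name the two simpler methods, so the choice of equal-frequency binning, the function names, the labels, and the sample values are all assumptions made for illustration and are not taken from the thesis.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins=3):
    """Return n_bins - 1 cut points that split the sorted values into
    roughly equally populated bins (assumed method, not the thesis's)."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut point k is the value found k/n_bins of the way through the data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize(value, cut_points, labels=("low", "medium", "high")):
    """Map a continuous value to the label of the bin it falls into.
    A value equal to a cut point is placed in the upper bin."""
    return labels[bisect_right(cut_points, value)]


# Hypothetical contrast measurements, for illustration only.
contrast = [0.12, 0.35, 0.41, 0.58, 0.63, 0.77, 0.81, 0.90, 0.95]
cuts = equal_frequency_cut_points(contrast)
binned = [discretize(v, cuts) for v in contrast]
```

With categorical values like these, each node in a Bayesian network can be described by a conditional probability table over a handful of states, which is closer to how a radiologist would annotate a region.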
Faculteit der Sociale Wetenschappen