Graph Similarity classification on HPO graphs of patients with a genetic disorder

Issue Date
2022-01-25
Language
en
Abstract
Intellectual disability affects 1-3% of the general population. Many of these patients cannot be diagnosed with a gene deficit, and in 10-20% of them only a variant of unknown significance (VUS) is found. A VUS is a genetic variant that has been identified but whose significance for the function or health of the person is unknown, so it does not allow a diagnosis. This study therefore aims to find a method for diagnosing patients with a VUS. We predict the gene deficit using graph similarity classification on human phenotype ontology (HPO) graphs of patients with a genetic disorder. The human phenotype ontology is a standardized vocabulary of phenotypic abnormalities encountered in human disease [9]. To achieve this aim, we ask the following research question: can we accurately classify which genetic disorder a patient has using HPO graph similarity classification?
We have implemented two graph similarity classification methods to classify patients with a VUS. The first is most common subgraph (MCS) classification, based on one of the following metrics of the most common subgraph: the number of nodes, the number of edges, the sum of weights, or the number of nodes with a penalty. The second applies graph kernels from GraKeL [10] in combination with support vector machines.
The most important results of these methods are as follows. The best-performing kernel for kernel-based graph classification is the vertex histogram kernel, with an accuracy of 80% and an F1-score of 0.75. The best MCS model is the one based on the nodes-with-penalty metric, with an accuracy of 74% and an F1-score of 0.68. Three conclusions can be drawn from this study. First, we can accurately classify which genetic disorder a patient has using HPO graph similarity classification.
Moreover, the simple vertex histogram kernel performs on par with more sophisticated graph kernels such as the Weisfeiler-Lehman optimal assignment kernel, which is likely due to the structure of the HPO graphs. In addition, this model could serve as a tool for clinical geneticists to predict the gene deficit, provided that the suspected gene deficit is present in the data. Despite these promising results, three limitations could be addressed in future work. First, the data set is limited, so we suggest adding graphs of more patients with different gene deficits. Second, we suggest adding more informative node and/or edge labels to improve classification with the more sophisticated kernels. Finally, informative feature vectors could be added to the nodes to enable classification with graph neural networks.
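To illustrate what MCS classification with the four metrics could look like, the following minimal Python sketch assumes that every node in an HPO graph is uniquely labelled by its HPO term ID, so the common subgraph of two patient graphs reduces to their shared nodes and shared edges. A query graph is scored against each training graph and receives the gene label of the most similar one (a 1-nearest-neighbour rule, which is an assumption). The per-term weights, the exact penalty formula, and the helper names `make_graph`, `mcs_score`, and `classify` are hypothetical and not taken from the thesis.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# A graph is (node set, edge set); nodes are HPO term IDs such as "HP:0001249".
Edge = FrozenSet[str]
Graph = Tuple[Set[str], Set[Edge]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph from pairs of linked HPO terms."""
    edge_set = {frozenset(e) for e in edges}
    nodes = {n for e in edge_set for n in e}
    return nodes, edge_set


def mcs_score(a: Graph, b: Graph, metric: str,
              weights: Optional[Dict[str, float]] = None,
              penalty: float = 0.5) -> float:
    """Score the common subgraph of a and b with one of four metrics."""
    # Assumption: node labels are unique, so the common subgraph is simply
    # the intersection of the node sets and of the edge sets.
    shared_nodes = a[0] & b[0]
    shared_edges = a[1] & b[1]
    if metric == "nodes":
        return float(len(shared_nodes))
    if metric == "edges":
        return float(len(shared_edges))
    if metric == "weights":
        # Hypothetical per-term weights (e.g. information content); default 1.0.
        w = weights or {}
        return float(sum(w.get(n, 1.0) for n in shared_nodes))
    if metric == "nodes_penalty":
        # Hypothetical penalty: subtract `penalty` for every node found in only one graph.
        return len(shared_nodes) - penalty * len(a[0] ^ b[0])
    raise ValueError(f"unknown metric: {metric}")


def classify(query: Graph, train: List[Tuple[Graph, str]],
             metric: str = "nodes", **kwargs) -> str:
    """1-nearest neighbour: return the gene label of the most similar training graph."""
    _, label = max(train, key=lambda item: mcs_score(query, item[0], metric, **kwargs))
    return label
```

For example, a patient graph that shares HP:0000118 and HP:0000707 with a training graph labelled with the placeholder gene `GENE_A`, but only HP:0000118 with one labelled `GENE_B`, is assigned `GENE_A` under the node-count metric. Ties go to the earlier training graph, because Python's `max` returns the first maximal item.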
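The vertex histogram kernel compares two graphs only through the counts of their node labels and ignores the edges entirely. The standard-library sketch below computes this kernel and an optionally cosine-normalised kernel matrix, assuming each patient graph is given as the list of HPO term IDs on its nodes; the function names are hypothetical. If every term occurs at most once per graph, the unnormalised kernel value is simply the number of HPO terms two patients share.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, List


def vertex_histogram(node_labels: Iterable[str]) -> Counter:
    """Count how often each node label (here: an HPO term ID) occurs in one graph."""
    return Counter(node_labels)


def vh_kernel(h1: Counter, h2: Counter) -> float:
    """Vertex histogram kernel: inner product of two label-count vectors."""
    # Counter returns 0 for labels it has not seen, so missing terms contribute nothing.
    return float(sum(count * h2[label] for label, count in h1.items()))


def kernel_matrix(rows: List[List[str]], cols: List[List[str]],
                  normalize: bool = True) -> List[List[float]]:
    """K[i][j] = k(rows[i], cols[j]); pass rows == cols for the training Gram matrix."""
    row_h = [vertex_histogram(g) for g in rows]
    col_h = [vertex_histogram(g) for g in cols]
    K = [[vh_kernel(a, b) for b in col_h] for a in row_h]
    if normalize:
        # Cosine normalisation: k(x, y) / sqrt(k(x, x) * k(y, y)).
        row_d = [vh_kernel(a, a) for a in row_h]
        col_d = [vh_kernel(b, b) for b in col_h]
        K = [[K[i][j] / sqrt(row_d[i] * col_d[j]) if row_d[i] and col_d[j] else 0.0
              for j in range(len(col_h))] for i in range(len(row_h))]
    return K
```

GraKeL's `VertexHistogram` class provides an implementation of this kernel. A square training matrix like the one above can be passed to scikit-learn's `SVC(kernel="precomputed")`; at prediction time the classifier then expects the kernel values between the test graphs (rows) and the training graphs (columns).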
Faculty
Faculteit der Sociale Wetenschappen