Graph Similarity classification on HPO graphs of patients with a genetic disorder
Issue Date
2022-01-25
Language
en
Abstract
Intellectual disability affects 1-3% of the general population. Many of
these patients cannot be diagnosed with a gene deficit, and in 10-20% of
them only a variant of unknown significance (VUS) is found. A VUS is a
genetic variant that has been identified, but whose significance to the
function or health of the person is unknown. A VUS makes diagnosis
impossible.
Therefore, this study aims to find a method for diagnosing patients with
a VUS. We predict the gene deficit by using graph similarity classification
on human phenotype ontology (HPO) graphs of patients with a genetic
disorder. The human phenotype ontology is a standardized vocabulary of
phenotypic abnormalities encountered in human disease [9]. To achieve our
aim, we ask the following research question: Can we accurately classify which
genetic disorder a patient has based on HPO graph similarity classification?
We have implemented two graph similarity classification methods to classify
patients with a VUS. The first is most common subgraph classification based
on one of the following metrics: number of nodes, number of edges, sum of
weights, and number of nodes with penalty in the most common subgraph.
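
The common-subgraph metrics named above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that every HPO term occurs at most once per patient graph (so the common subgraph reduces to intersecting node and edge sets), and that a query patient is assigned the label of the most similar labelled graph. The term IDs ("T1", ...), the gene labels ("GENE_A", ...) and all function names are hypothetical, and the weight-based and penalty-based metrics are left out because their definitions are not given here.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]
# A graph is a pair: (set of term IDs, set of directed edges between them).
Graph = Tuple[FrozenSet[str], FrozenSet[Edge]]


def make_graph(edges: List[Edge]) -> Graph:
    """Build a graph from an edge list; nodes are the edge endpoints."""
    nodes = frozenset(n for edge in edges for n in edge)
    return nodes, frozenset(edges)


def common_subgraph(g1: Graph, g2: Graph) -> Graph:
    """Assumption: node labels are unique per graph, so the common
    subgraph is the intersection of the node sets and the edge sets."""
    return g1[0] & g2[0], g1[1] & g2[1]


def similarity(g1: Graph, g2: Graph, metric: str = "nodes") -> int:
    """Score two graphs by the size of their common subgraph."""
    nodes, edges = common_subgraph(g1, g2)
    if metric == "nodes":
        return len(nodes)
    if metric == "edges":
        return len(edges)
    raise ValueError(f"unknown metric: {metric}")


def classify(query: Graph, labelled: List[Tuple[Graph, str]],
             metric: str = "nodes") -> str:
    """Assumed decision rule: return the label of the most similar graph."""
    best_graph, best_label = max(
        labelled, key=lambda item: similarity(query, item[0], metric))
    return best_label


# Toy data with made-up term IDs and gene labels.
patient_a = make_graph([("root", "T1"), ("T1", "T2"), ("root", "T3")])
patient_b = make_graph([("root", "T4"), ("T4", "T5")])
query = make_graph([("root", "T1"), ("T1", "T2")])

labelled = [(patient_a, "GENE_A"), (patient_b, "GENE_B")]
prediction = classify(query, labelled, metric="nodes")
```

In this toy example the query shares three nodes and two edges with `patient_a` and only the root node with `patient_b`, so either metric selects `"GENE_A"`.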
Secondly, we applied graph kernels from GraKeL [10] in combination
with support vector machines for classification.
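
To make the kernel-based approach concrete, the following sketch computes a vertex histogram kernel in plain Python, independently of GraKeL. The kernel value for two graphs is the dot product of their node-label count vectors. The label lists and function names are hypothetical, and the sketch is not the study's implementation.

```python
from collections import Counter
from typing import List, Sequence


def vertex_histogram_kernel(labels_a: Sequence[str],
                            labels_b: Sequence[str]) -> int:
    """Dot product of the node-label histograms of two graphs."""
    hist_a, hist_b = Counter(labels_a), Counter(labels_b)
    # Labels absent from hist_b count as 0 and contribute nothing.
    return sum(count * hist_b[label] for label, count in hist_a.items())


def gram_matrix(graphs: Sequence[Sequence[str]]) -> List[List[int]]:
    """Pairwise kernel values for a collection of graphs."""
    return [[vertex_histogram_kernel(a, b) for b in graphs] for a in graphs]


# Each toy graph is reduced to its list of node labels (made-up term IDs),
# since the vertex histogram kernel ignores edges entirely.
graphs = [
    ["T1", "T2", "T3"],
    ["T1", "T2"],
    ["T4", "T5"],
]
gram = gram_matrix(graphs)
```

A Gram matrix of this form is what a support vector machine with a precomputed kernel consumes, for example scikit-learn's `SVC(kernel="precomputed")`. Because the kernel looks only at node labels, it cannot distinguish graphs that contain the same terms connected differently.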
The most important results from the graph similarity classification methods
are as follows. The best-performing graph kernel for kernel-based graph
classification is the vertex histogram kernel, with an accuracy of 80% and an
F1-score of 0.75. The best most common subgraph (MCS) model is the one
based on the nodes-with-penalty metric, with an accuracy of 74% and an
F1-score of 0.68.
Three conclusions can be drawn from this study. First, we can accurately
classify which genetic disorder a patient has based on HPO graph similarity
classification. Second, the simple vertex histogram kernel performs on par
with more sophisticated graph kernels such as the Weisfeiler-Lehman optimal
assignment kernel; this is likely due to the structure of the HPO graphs.
Third, this model could be used as a tool for clinical geneticists to predict
the gene deficit, provided that the suspected gene deficit is in the data.
Despite these promising results, there are three limitations that can be
addressed in future work. First, the data is limited; we therefore suggest
adding more graphs of patients with different gene deficits to the data.
Second, we suggest adding more useful information to the node and/or edge
labels to improve classification with sophisticated kernels. Finally, one can
add informative feature vectors to the nodes to make classification with
graph neural networks possible.
Faculty
Faculteit der Sociale Wetenschappen
