Graph-based diffusion methods for node classification and link prediction
Date
2020-02-13Author
Timilsina, Mohan
Metadata
Show full item recordUsage
This item's downloads: 187 (view details)
Abstract
Graphs are common representation tools to organize information from heterogeneous
sources. They have been applied in various domains such as the scientific, engineering,
political, and business sectors. The large volume of graph data is now becoming a
surging research challenge for building highly accurate predictive models. One of the
main motivations behind the interest in graphs is due to their structure, since it describes
how information spreads through the nodes. In turn, this natural property of graphs
makes them ideal for designing prediction models based on spreading functions, which
can activate changes in the connections and its nodes.
Link Prediction (LP) is one of the fundamental problems in graphs. It is the problem
of predicting associations between a pair of nodes. Many LP algorithms addressed
in the scientific literature evaluate them as a classification task or a ranking problem.
Only a few studies have considered the impacts of diffusion in graphs, and none of
the works on diffusion-based methods has investigated link prediction in graphs with
heterogeneous nodes. Here, we proposed a 2-layered graph framework to combine two
different graphs and analyzed diffusion algorithms such as PageRank, Katz, and heat
diffusion model. We further proposed a novel diffusion method in a 2-layered graph
framework by integrating matrix factorization with heat diffusion. Our results show
the effectiveness of this integration by computing weighted (i.e., ranked) predictions of
initially unknown links between two disjoint nodes.
A prominent problem in the study of diffusion in graphs is that not all kinds of diffusion are
the same. Some network structures support long-range diffusion, whereas others support
short-range diffusion. Long-range diffusion has been largely explored in applications
such as viral marketing, influence maximization, political campaigning, or ordinary
label propagation, especially through graph-based semi-supervised machine learning.
Conversely, short-range diffusion is less explored. If we consider the case of a real-world
network where nodes are shared across different layers i.e., multiplex network, long-range
diffusion might not work and result in node miss-classification. We proposed a novel
boundary-based heat diffusion (BHD) method, which has the flexibility to diffuse the
information step by step due to its time-dependent property and guarantees a closed-form
solution.
We evaluated BHD models on different types of real-world networks for a node classification
task. We took the same inspiration from a semi-supervised machine learning approach by
using a small amount of labeled data and a large number of unlabeled data to classify the
nodes. For this task, we applied BHD in (i) multiplex network; (ii) homogeneous network
with homophilic labels; (iii) homogeneous network with heterophilic labels; and (iv)
homogeneous network with mixed labels. Furthermore, we extended our BHD approach
in complex data for semi-supervised graph regression problems to predict the real values
of the node labels using fewer labeled data. Experiments from business, biomedical,
physical, and social domain data show that the boundary-based heat diffusion method
can effectively outperform the top state of the art methods.