Doctor of Philosophy (PhD)
Structured data modeled as graphs arise in many application domains, such as computer vision, bioinformatics, and sociology. In this dissertation, we focus on three important topics in graph-structured data analysis: graph comparison, graph embeddings, and graph matching. For all three, we propose effective algorithms that make use of kernel functions and their corresponding reproducing kernel Hilbert spaces (RKHSs).

For the first topic, we develop effective graph kernels, named "RetGK," for quantitatively measuring the similarity between graphs. Graph kernels, which are positive definite functions on graphs, are powerful similarity measures: they make various kernel-based learning algorithms, for example, clustering, classification, and regression, applicable to structured data. Our graph kernels are obtained through a two-step embedding. In the first step, we represent each graph node as a numerical vector in a Euclidean space. To do this, we revisit random walks and introduce a new node structural role descriptor, the return probability feature. In the second step, we represent the whole graph as an element of an RKHS, from which our graph kernels follow naturally. The advantages of the proposed kernels are that they can effectively exploit various node attributes while scaling to large graphs. We conduct extensive graph classification experiments to evaluate our graph kernels. The experimental results show that our graph kernels significantly outperform state-of-the-art approaches in both accuracy and computational efficiency.

For the second topic, we develop scalable attributed graph embeddings, named "SAGE." Graph embeddings are Euclidean vector representations of graphs that encode both attribute and topological information.
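Both RetGK and SAGE start from the return probability feature described above. The following is a minimal Python/NumPy sketch of a two-step construction in that spirit. It makes several assumptions:

- The graph is unweighted, undirected, and has no isolated nodes.
- The random walk is the simple one, with transition matrix P = D^{-1}A.
- Exact return probabilities are computed from an eigendecomposition.
- In the second step, the graph-level kernel is the inner product of empirical kernel mean embeddings under a Gaussian base kernel.

The function names and the parameters `max_steps` and `gamma` are hypothetical, and node attributes are ignored. The dissertation's actual RetGK kernel, including any approximations it uses for large graphs, may differ.

```python
import numpy as np


def return_probability_features(adj: np.ndarray, max_steps: int) -> np.ndarray:
    """Row v holds [p_1(v), ..., p_S(v)], where p_t(v) is the probability that a
    simple random walk (P = D^{-1} A) started at node v is back at v after t steps.
    Hypothetical helper; assumes an undirected graph with no isolated nodes."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every node needs at least one neighbor")
    # S = D^{-1/2} A D^{-1/2} is symmetric and similar to P, and this similarity
    # transform leaves the diagonal of every power unchanged: diag(P^t) = diag(S^t).
    inv_sqrt = 1.0 / np.sqrt(deg)
    sym = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)
    steps = np.arange(1, max_steps + 1)
    # (S^t)_{vv} = sum_k U[v, k]^2 * lambda_k^t, computed for all nodes and steps at once.
    return (eigvecs ** 2) @ (eigvals[:, None] ** steps[None, :])


def mean_embedding_kernel(feats_a: np.ndarray, feats_b: np.ndarray, gamma: float = 1.0) -> float:
    """Inner product of the empirical kernel mean embeddings of two node-feature sets
    under a Gaussian RBF base kernel: (1 / (n_a * n_b)) * sum_{x, y} exp(-gamma * ||x - y||^2).
    Hypothetical helper; one simple choice of graph-level kernel, assumed here."""
    sq_dists = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-gamma * sq_dists).mean())
```

For example, every node of a triangle has return probabilities [0, 0.5, 0.25] for t = 1, 2, 3. In the three-node path graph, all odd-step return probabilities are zero because the graph is bipartite. Reading the diagonal off the eigendecomposition of S avoids forming explicit matrix powers. However, a full eigendecomposition is cubic in the number of nodes, so very large graphs would need a cheaper approximation.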
With graph embeddings, we can apply standard machine learning algorithms, such as neural networks, classification/regression trees, and generalized linear models, to graph-structured data. Note that SAGE considers both edge attributes and node attributes, whereas RetGK considers only node attributes. SAGE extends RetGK in the sense that it is still based on the return probabilities of random walks and is derived from graph kernels. However, SAGE uses a completely different strategy, the "distance to kernel and embeddings" algorithm, to represent graphs. To incorporate edge attributes, we introduce the adjoint graph, which converts edge attributes into node attributes. We conduct classification experiments on graphs with both node and edge attributes, in which SAGE outperforms all the previous methods compared.

For the third topic, we develop a new graph matching algorithm, named "KerGM." Graph matching problems are typically formulated as one of two quadratic assignment problems (QAPs): Koopmans-Beckmann's QAP or Lawler's QAP. In our work, we provide a unifying view of these two problems by introducing new rules for array operations in Hilbert spaces. Consequently, Lawler's QAP can be viewed as a Koopmans-Beckmann alignment between two arrays in reproducing kernel Hilbert spaces. This makes it possible to solve the problem efficiently without computing a huge affinity matrix. We also develop an entropy-regularized Frank-Wolfe algorithm for optimizing QAPs. It has the same convergence rate as the original Frank-Wolfe algorithm while dramatically reducing the computational cost of each outer iteration. Finally, we conduct extensive experiments to evaluate our approach, which show that our algorithm achieves superior matching accuracy and scalability.
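To make the optimization idea concrete, here is a minimal sketch of an entropy-regularized Frank-Wolfe loop for a relaxed Koopmans-Beckmann QAP. It is not KerGM itself. It assumes two equal-sized, symmetric, real-valued adjacency matrices and drops any linear (node-affinity) term. It also does not implement the RKHS array formulation of Lawler's QAP.

The sketch works as follows:

- Each linear subproblem over doubly stochastic matrices is replaced by its entropy-regularized version.
- That regularized subproblem is solved with log-domain Sinkhorn scaling.
- The final soft assignment is rounded to a permutation with SciPy's `linear_sum_assignment`.

The function names, the regularization strength `eps`, the iteration counts, and the 2/(k+2) step size are hypothetical, illustrative choices rather than values taken from the dissertation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import logsumexp


def sinkhorn_argmax(scores: np.ndarray, eps: float, n_iter: int = 100) -> np.ndarray:
    """Approximate argmax over doubly stochastic Y of <scores, Y> + eps * H(Y),
    using log-domain Sinkhorn scaling of exp(scores / eps)."""
    log_k = scores / eps
    n = scores.shape[0]
    log_u = np.zeros(n)
    log_v = np.zeros(n)
    for _ in range(n_iter):
        log_u = -logsumexp(log_k + log_v[None, :], axis=1)  # make rows sum to 1
        log_v = -logsumexp(log_k + log_u[:, None], axis=0)  # make columns sum to 1
    return np.exp(log_u[:, None] + log_k + log_v[None, :])


def entropic_frank_wolfe_kb(a1: np.ndarray, a2: np.ndarray, eps: float = 0.5, n_outer: int = 30):
    """Maximize the relaxed Koopmans-Beckmann objective tr(A1 X A2 X^T) over doubly
    stochastic X (symmetric A1, A2 of equal size), then round X to a permutation.
    Hypothetical helper; a sketch, not the KerGM algorithm."""
    n = a1.shape[0]
    x = np.full((n, n), 1.0 / n)  # start at the barycenter of the Birkhoff polytope
    for k in range(n_outer):
        grad = 2.0 * a1 @ x @ a2  # gradient of tr(A1 X A2 X^T) for symmetric A1, A2
        y = sinkhorn_argmax(grad, eps)  # entropy-regularized linear subproblem
        step = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        x = x + step * (y - x)
    _, perm = linear_sum_assignment(x, maximize=True)  # perm[i] = node in graph 2 matched to node i
    return perm, x
```

For instance, matching a four-node star centered at node 0 to a four-node star centered at node 2 is expected to map node 0 to node 2. Because each Sinkhorn output has unit column sums and the loop starts from the uniform matrix, every iterate keeps unit column sums. Row sums are only approximately one after a finite number of Sinkhorn passes, which the final rounding step tolerates.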
Ulugbek Kamilov, Neal Patwari, Lingfei Wu, Xuan Zhang