This item is under embargo and not available online per the author's request. For access information, please visit http://libanswers.wustl.edu/faq/5640.
Date of Award
Doctor of Philosophy (PhD)
Network science plays a central role in understanding and modeling complex systems in many disciplines, including physics, sociology, biology, computer science, economics, politics, and neuroscience. By studying networks, we can gain a deep understanding of the behavior of the systems they represent. Many networks exhibit community structure, i.e., they have clusters of nodes that are locally densely interconnected. These communities manifest the hierarchical organization of the objects in systems, and detecting communities greatly facilitates the study of the organization and structure of complex systems.
Most existing community-detection methods consider low-order connection patterns, at the level of individual links. But high-order connection patterns, at the level of small sub-networks, are generally not considered. This work starts by developing a novel community-detection method based on cliques, i.e., locally complete subnetworks. The proposed method overcomes the deficiencies of previous similar community-detection methods by considering the mathematical properties of cliques. We apply the proposed method to computer-generated graphs and real-world network datasets. When applied to networks with known community structure (but not known to our method), the proposed method detects the structure with high fidelity and sensitivity. Moreover, our method yields insightful results revealing the organization of real-world complex networks, which we have no a~priori information regarding their community structure. We also show that the proposed method guarantees near-optimal performance in identifying clusters in the bipartition case.
Detecting communities in a whole network presents inherent problems because massive graphs incur a prohibitively large computational load. Moreover, in many cases, we are interested only in a local cluster near a given node. We next present a local graph clustering method that uses heat kernel PageRank to efficiently find a local cluster in massive graphs. The heat kernel PageRank provides a quantitative ranking of nodes, and a local cluster is then found by performing a sweep over the heat kernel PageRank vector. But computing an exact heat kernel PageRank vector may be expensive, so approximate algorithms are often used instead. Most approximate algorithms compute the heat kernel PageRank vector on the whole graph, and thus are dependent on global structures. We develop an algorithm for approximating the heat kernel PageRank on a local subgraph. Moreover, we show that the number of computations required by the proposed algorithm is sublinear in terms of the expected size of the local cluster of interest, and that it provides a good approximation of the heat kernel PageRank, with approximation errors bounded by a probabilistic guarantee. Numerical experiments verify that the local clustering algorithm using our approximate heat kernel PageRank achieves state-of-the-art performance.
Local clustering is particularly important in the study of spreading phenomena in networks, notably in studying the spread of disease, a topic of considerable interest in the network science research community. In the third part of the dissertation, we show that the outbreak of an epidemic can be effectively contained and suppressed in a small subnetwork by a combination of two measures: antidote distribution, providing antidotes to nodes in the subnetwork, and partial quarantine, which is equivalent to removing part of the boundary edges of the subnetwork. Our containment strategy improves over existing antidote distribution schemes based on personalized PageRank in two ways. First, we replace the constraint on the topology of this subnetwork described by Chung et al., that a large fraction of the value of the personalized PageRank vector must be contained in the local cluster, with a partial quarantine scheme. We obtain this result by using the properties of personalized PageRank to devise the partial quarantine scheme, which optimizes the tradeoff between successful containment and interference with the network topology. Second, we derive a new lower bound on the amount of antidote required to achieve successful containment. We prove that, under our antidote distribution scheme, the probability of the infection spreading to the whole network is bounded, and that the infection inside the subnetwork will disappear after a period that is proportional to the logarithm of the number of initially infected nodes. Thus, we find that an epidemic can be effectively contained and suppressed in the small subnetwork in which the outbreak was detected, under weaker conditions than those proposed in earlier research, after our containment strategy is implemented. We demonstrate the effectiveness of our strategy with numerical simulations of epidemics on benchmark networks. We also test our strategy on two examples of epidemics in real-world networks, one in a network of YouTube users, and another in a network of autonomous systems in the Internet. Our strategy is dependent only on the rate of infection, the rate of recovery, and the topology around the initially infected nodes, and is independent of the rest of the network.
Ulugbek Kamilov, Matthew Lew, Johan Wahlstrom, Lan Yang,
Available for download on Saturday, July 09, 2022