Document Type

Technical Report

Department

Computer Science and Engineering

Publication Date

2006-01-01

Filename

wucse-2006-20.pdf

DOI:

10.7936/K7KP80CC

Technical Report Number

WUCSE-2006-20

Abstract

Identifying intrinsic structures in large networks is a fundamental problem in many fields, such as biology, engineering and social sciences. Motivated by biology applications, in this paper we are concerned with identifying community structures, which are densely connected sub-graphs, in large biological networks. We address several critical issues for finding community structures. First, biological networks directly constructed from experimental data often contain spurious edges and may also miss genuine connections. As a result, community structures in biological networks are often weak. We introduce simple operations to capture local neighborhood structures for identifying weak communities. Second, we consider the issue of automatically determining the most appropriate number of communities, a crucial problem for all clustering methods. This requires to properly evaluate the quality of community structures. We extend an existing work of a modularity function for evaluating community structures to weighted graphs. Third, we propose a spectral clustering algorithm to optimize the modularity function, and a greedy partitioning method to approximate the first algorithm with much reduced running time. We evaluate our methods on many networks of known structures, and apply them to three real-world networks that have different types of network communities: a yeast protein-protein interaction network, a co-expression network of yeast cell-cycle genes, and a collaboration network of bioinformaticians. The results show that our methods can find superb community structures and the correct numbers of communities. Our results reveal several interesting network structures that have not been reported previously.

Comments

Permanent URL: http://dx.doi.org/10.7936/K7KP80CC

Share

COinS