Identifying groups of functionally related genes from high-throughput gene expression data is an important step towards elucidating gene functions at a global scale. Most existing approaches treat gene expression profiles as points in a metric space and apply conventional clustering algorithms to find sets of genes that lie close to one another in that space. However, they usually ignore the topology of the underlying biological networks. In this paper, we propose a network-based clustering method that is biologically more realistic. Given a gene expression dataset, we apply a rank-based transformation to obtain a sparse co-expression network, and then use a novel spectral clustering algorithm to identify natural community structures in the network, which correspond to gene functional modules. We have tested the method on two large-scale gene expression datasets, one from yeast and one from Arabidopsis. The results show that the clusters identified by our method on these datasets are functionally richer and more coherent than those produced by the standard k-means clustering algorithm.
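To make the first step concrete, the following is a minimal sketch of one way a rank-based transformation could turn an expression matrix into a sparse co-expression network. The abstract does not specify the details, so everything here is an assumption for illustration: the function name `rank_based_network`, the use of Pearson correlation as the similarity measure, the parameter `d` (how many top-ranked neighbors each gene keeps), and the toy data. The spectral clustering step is not sketched, since the abstract only names it.

```python
import numpy as np


def rank_based_network(expr, d=3):
    """Build a sparse, undirected co-expression network (illustrative only).

    expr: array of shape (genes, conditions).
    d:    hypothetical parameter, number of top-ranked neighbors kept per gene.
    Returns a symmetric boolean adjacency matrix with an empty diagonal.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]

    # Assumed similarity measure: Pearson correlation between gene profiles.
    sim = np.corrcoef(expr)
    np.fill_diagonal(sim, -np.inf)  # a gene never ranks itself

    # For each gene, keep only the d genes ranked most similar to it.
    top = np.argsort(-sim, axis=1)[:, :d]

    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), d)
    adj[rows, top.ravel()] = True

    # Symmetrize: an edge exists if either gene ranks the other in its top d.
    adj |= adj.T
    return adj


# Toy data (made up): two groups of four genes, each following its own pattern.
rng = np.random.default_rng(0)
pattern_a = np.arange(1.0, 9.0)
pattern_b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
expr = np.vstack([
    pattern_a + 0.01 * rng.standard_normal((4, 8)),
    pattern_b + 0.01 * rng.standard_normal((4, 8)),
])
adj = rank_based_network(expr, d=2)
```

Because edges depend on ranks rather than on absolute similarity values, every gene retains at least `d` neighbors however weak its correlations are, which keeps the network sparse without a global correlation threshold.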
Ruan, Jianhua and Zhang, Weixiong, "Discovering Functional Modules by Clustering Gene Co-expression Networks," Report Number: WUCSE-2006-31 (2006). All Computer Science and Engineering Research.