Document Type

Technical Report

Department

Computer Science and Engineering

Publication Date

2006-01-01

Filename

wucse-2006-31.pdf

DOI

10.7936/K7FX77P3

Technical Report Number

WUCSE-2006-31

Abstract

Identifying groups of functionally related genes from high-throughput gene expression data is an important step towards elucidating gene functions at a global scale. Most existing approaches treat gene expression data as points in a metric space and apply conventional clustering algorithms to identify sets of genes that are close to each other in that space. However, they usually ignore the topology of the underlying biological networks. In this paper, we propose a network-based clustering method that is biologically more realistic. Given a gene expression data set, we apply a rank-based transformation to obtain a sparse co-expression network, and use a novel spectral clustering algorithm to identify natural community structures in the network, which correspond to gene functional modules. We have tested the method on two large-scale gene expression data sets, one from yeast and one from Arabidopsis. The results show that the clusters identified by our method on these data sets are functionally richer and more coherent than the clusters from the standard k-means clustering algorithm.
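
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the rank-based transformation or the novel spectral algorithm, so everything below is an assumption: the network links each gene to its k most correlated genes (Pearson correlation), and the clustering step is ordinary Laplacian spectral bisection, swapped in as a stand-in for the report's own algorithm. The function names, the parameters k and eps, and the synthetic data are all hypothetical.

```python
import numpy as np

def rank_based_network(expr, k=3):
    """Assumed rank-based transformation: connect each gene (row of expr)
    to its k most correlated genes, then symmetrize the result."""
    corr = np.corrcoef(expr)             # gene-by-gene Pearson correlation
    np.fill_diagonal(corr, -np.inf)      # a gene is never its own neighbor
    n = corr.shape[0]
    top = np.argsort(-corr, axis=1)[:, :k]   # k best-ranked partners per gene
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), top.ravel()] = 1.0
    return np.maximum(adj, adj.T)        # undirected, sparse 0/1 network

def spectral_bisection(adj, eps=1e-3):
    """Stand-in clustering step (not the report's algorithm): split the
    network in two by the sign of the Fiedler vector of its Laplacian."""
    n = adj.shape[0]
    # A tiny uniform coupling keeps the graph connected, so the second
    # eigenvector is well defined even if the network has separate pieces.
    a = adj + eps * (1.0 - np.eye(n))
    lap = np.diag(a.sum(axis=1)) - a     # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(lap)        # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Synthetic example: 6 genes follow a sine profile, 6 follow a cosine profile.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
patterns = np.vstack([np.sin(t)] * 6 + [np.cos(t)] * 6)
expr = patterns + 0.1 * rng.standard_normal(patterns.shape)

adj = rank_based_network(expr, k=3)
labels = spectral_bisection(adj)
print(labels)
```

In this toy setting the first six genes and the last six genes should fall into different groups; which group receives label 0 depends on the arbitrary sign of the eigenvector. Recovering more than two modules would require recursive bisection or a multi-way method, which this sketch does not attempt.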

Comments

Permanent URL: http://dx.doi.org/10.7936/K7FX77P3
