Technical Report Number
Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangementclustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many ﬁelds over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The ﬁrst pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It oﬀers an extra beneﬁt as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identiﬁes better clusters than those found by previous approaches in all ﬁve cases. Overall, our results demonstrate the beneﬁts of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites.
Climer, Sharlee and Zhang, Weixiong, "Rearrangement Clustering: Pitfalls, Remedies, and Applications" Report Number: WUCSE-2005-2 (2005). All Computer Science and Engineering Research.