Document Type

Technical Report


Computer Science and Engineering

Publication Date






Technical Report Number



Background: Microarray data preprocessing is performed prior to higher-level statistical analysis, such as the selection of differentially expressed (DE) genes, in order to account for technical variability. Preprocessing for the Affymetrix GeneChip includes background correction, normalisation and summarisation. Numerous preprocessing methods have been proposed, with little consensus as to which is the most suitable. Furthermore, because results from cross-platform analyses show poor concordance, protocols are being developed to enable cross-platform reproducibility. However, the effect of data analysis on a single platform is still unknown. The objective of our study is two-fold: first, to determine whether the results obtained from a single platform are consistent; and second, to investigate the effect of preprocessing on DE gene selection, analysed on four datasets.

Results: The results indicate that microarray analysis is subjective. The lists of DE genes vary with the preprocessing method used. Furthermore, the characteristics of the dataset and the type of DE gene identification method used greatly affect the outcome. Even on a single platform, the results vary considerably.

Conclusions: This is the first comprehensive analysis that uses multiple datasets generated from a single platform and many DE gene selection methods to assess the effect of data preprocessing on downstream analysis. The results indicate that preprocessing methods affect downstream analysis, and that the outcome also depends on the kind of data and the statistical analysis tools used. Our study reveals inconsistencies in results obtained from a single platform. These issues have been overlooked in past reports.
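The variability in DE gene lists described above can be made concrete with a simple overlap measure. The following is a minimal sketch, not the procedure used in the report: it assumes each preprocessing method has already produced a list of DE gene identifiers, and it compares those lists pairwise using the Jaccard index (an illustrative choice of metric). The method names and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene lists: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty lists are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(de_lists):
    """Return the Jaccard index for every pair of preprocessing methods.

    de_lists maps a method name to the DE genes selected after that
    method was applied to the same dataset.
    """
    return {
        (m1, m2): jaccard(de_lists[m1], de_lists[m2])
        for m1, m2 in combinations(sorted(de_lists), 2)
    }


# Hypothetical DE gene lists; names and genes are illustrative only.
de_lists = {
    "method_A": ["g1", "g2", "g3", "g4"],
    "method_B": ["g2", "g3", "g5"],
    "method_C": ["g1", "g2", "g3", "g4"],
}

overlaps = pairwise_overlap(de_lists)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

A value near 1 means two preprocessing methods lead to nearly the same DE gene list on a given dataset, while a value near 0 means the lists barely agree.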


Permanent URL: