Microarray Data Analysis

Overview

Microarray data analysis transforms raw fluorescence intensities from hybridized microarrays into gene expression measurements. Although RNA-seq has largely supplanted microarrays for discovery-based studies, microarrays remain widely used in clinical diagnostics, plant breeding, and large population studies because of their low cost, standardized protocols, and well-established analysis pipelines. A single microarray can measure the expression of tens of thousands of transcripts simultaneously by exploiting complementary base pairing between labeled sample nucleic acid (cDNA or cRNA) and immobilized probes. The analysis workflow accounts for the technical characteristics of microarray data through three core steps: background correction, normalization, and probe-level summarization.

Methods

Microarray analysis begins with image processing to extract probe-level intensities. Background correction removes nonspecific signal, for example with the background model built into robust multi-array average (RMA) or the GC-content-based adjustment used by GCRMA. Normalization makes arrays comparable: quantile normalization, which forces every array to share the same intensity distribution, is the most common approach for one-color arrays, while loess normalization is applied to two-color designs. Probe summarization (for Affymetrix arrays, using RMA's median polish or PLIER) combines the multiple probes in each probe set into a single expression value. Quality assessment uses pseudo-images, normalized unscaled standard error (NUSE) plots, and relative log expression (RLE) plots to identify problematic arrays. Differential expression is tested with limma, which fits a linear model for each gene and uses empirical Bayes moderation to stabilize variance estimates by borrowing information across genes. Batch effects are detected with principal component analysis and corrected using ComBat or, mainly for visualization and clustering, limma’s removeBatchEffect; for differential expression testing, batch is usually included as a covariate in the linear model instead.
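To make the normalization and quality-assessment steps concrete, here is a minimal sketch in plain Python of quantile normalization followed by the relative log expression (RLE) values that an RLE plot would display. It is an illustration under simplifying assumptions, not the implementation used by RMA or limma: the function names quantile_normalize and relative_log_expression are hypothetical, the intensities are made-up log2 values, and ties are broken by position rather than averaged as production tools typically do.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of log2 intensities per array).

    Simplification: ties are broken by original position; production tools
    usually average the reference values over tied ranks.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array independently, then average across arrays at each rank
    # to build the shared reference distribution.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(col) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # order[r] is the index of the probe holding rank r in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            out[probe_idx] = reference[rank]
        normalized.append(out)
    return normalized


def relative_log_expression(arrays):
    """Return RLE values: each probe's log2 value minus that probe's
    median across all arrays."""
    probe_medians = [median(col) for col in zip(*arrays)]
    return [[v - m for v, m in zip(a, probe_medians)] for a in arrays]


# Toy log2 intensities (made-up): three arrays, four probes each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]

normalized = quantile_normalize(arrays)
rle = relative_log_expression(normalized)

for i, (norm, dev) in enumerate(zip(normalized, rle)):
    print(f"array {i}: normalized={norm} median RLE={median(dev):.3f}")
```

After normalization every array contains the same set of values, only assigned to different probes according to each array's original ranking. In a real dataset, an array whose RLE values are centered far from zero or spread much more widely than the others would be flagged for closer inspection.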

Applications

Despite the rise of sequencing, microarrays continue to deliver value. The FDA-cleared MammaPrint test uses a microarray-based 70-gene signature for breast cancer prognosis; Oncotype DX, often mentioned alongside it, is based on RT-PCR rather than microarrays. Clinical DNA microarrays and gene expression panels guide treatment decisions in oncology and support rare disease diagnosis. Microarray results are commonly cross-validated against qPCR through correlation studies, and microarrays complement DNA sequencing by profiling expression at scale. In agricultural genomics, microarrays enable cost-effective trait mapping and marker-assisted selection across large breeding populations.