DNA Microarrays and Gene Expression Analysis

DNA microarrays allow simultaneous measurement of the expression levels of thousands of genes, providing a genome-wide view of cellular transcriptional activity. They revolutionized functional genomics by enabling researchers to compare gene expression between different conditions, tissues, or disease states.

Microarray Principles

A DNA microarray consists of thousands of microscopic spots attached to a solid surface such as a glass slide or silicon chip. Each spot contains many copies of a probe with a specific DNA sequence, designed to hybridize to labeled cDNA (or cRNA) derived from one target transcript. Together, the probes represent the set of genes being analyzed.

The basic experiment involves extracting RNA from the samples of interest, converting it to complementary DNA (cDNA) with reverse transcriptase, labeling the cDNA with fluorescent dyes, and hybridizing the labeled cDNA to the microarray. After non-specifically bound material is washed away, the fluorescence intensity at each spot is measured. This intensity reflects the relative abundance of the corresponding mRNA in the original sample.

Two-Color Arrays

In two-color microarray experiments, RNA from two conditions is labeled with different fluorescent dyes, typically Cy3 and Cy5. The labeled cDNAs are mixed and hybridized to a single array. The ratio of Cy5 to Cy3 fluorescence at each spot, usually analyzed on a log2 scale, reflects the relative expression level of each gene between the two conditions. Because both samples hybridize to the same spot, two-color designs control for spot-to-spot variation. They also introduce dye bias, which is commonly corrected with dye-swap experiments in which the dye assignments are reversed on a second array.
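
The ratio and dye-swap logic can be illustrated with a minimal Python sketch. The function names are hypothetical, the intensities are assumed to be background-corrected and positive, and the dye bias is assumed to be additive on the log2 scale for a given gene. This is an illustration of the idea, not a full analysis pipeline.

```python
import math


def log2_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3) for one spot.

    Assumes both intensities are background-corrected and positive.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


def dye_swap_average(forward, swapped):
    """Estimate log2(A / B) for one gene from a dye-swap pair.

    forward: (cy5, cy3) with condition A in Cy5 and condition B in Cy3
    swapped: (cy5, cy3) with condition B in Cy5 and condition A in Cy3

    Under the additive-bias assumption:
        forward log ratio = log2(A / B) + bias
        swapped log ratio = log2(B / A) + bias
    so half the difference cancels the bias term.
    """
    m_forward = log2_ratio(*forward)
    m_swapped = log2_ratio(*swapped)
    return (m_forward - m_swapped) / 2


if __name__ == "__main__":
    # Hypothetical spot: a true 4-fold change plus a 2-fold Cy5 dye bias.
    print(dye_swap_average((1600, 200), (400, 800)))
```

In this made-up example the forward array alone would suggest an 8-fold change, while the dye-swap average recovers a log2 ratio of 2, that is, a 4-fold change.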

Single-Channel Arrays

Single-channel arrays such as Affymetrix GeneChips use a single fluorescent label. Each sample is hybridized to a separate array. Gene expression is measured as absolute intensity, and comparisons are made across arrays after normalization. Single-channel arrays offer higher throughput when many samples are compared, but they require robust normalization methods to reduce between-array variation. Affymetrix arrays represent each gene with a set of short oligonucleotide probes. Each perfect-match probe is paired with a mismatch probe that differs at its central base, which helps distinguish specific hybridization from background.

Data Normalization

Microarray data require extensive preprocessing. Background correction removes signal from non-specific hybridization. Normalization adjusts for technical variation between arrays. Quantile normalization makes the distribution of probe intensities identical across arrays, on the assumption that most genes are not differentially expressed. Robust Multi-array Average (RMA) combines background correction, quantile normalization, and summarization of the multiple probes per gene into a single expression value.
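
Quantile normalization is easiest to follow as code. The sketch below is a minimal pure-Python version. The function name `quantile_normalize` is hypothetical, each array is assumed to be a list of intensities for the same probes in the same order, and ties are broken by position rather than averaged as production implementations usually do.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays.

    arrays: list of equal-length lists, one list of probe intensities
            per array (sample).
    Returns a new list of lists in which every array has the same
    distribution of values.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each value with the reference value at its rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two hypothetical arrays measuring the same three probes.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization each array contains exactly the same set of values, and only the assignment of values to probes differs between arrays.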

Differential Expression Analysis

After normalization, statistical tests identify genes with significant expression changes between conditions. The moderated t-test, implemented in the limma package for R/Bioconductor, borrows information across genes to stabilize variance estimates, which is especially helpful when there are few replicates. Because thousands of genes are tested at once, multiple testing correction with the Benjamini-Hochberg method is used to control the false discovery rate. Results are typically reported as log2 fold changes with adjusted p-values. Genes whose absolute fold change exceeds a chosen threshold and whose adjusted p-value is below 0.05 are considered differentially expressed.
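
The Benjamini-Hochberg adjustment and the final filtering step can be sketched in a few lines of Python. The function names are hypothetical, and the raw p-values are assumed to have been computed already (for example by a moderated t-test, which is not implemented here). The log2 fold change cutoff of 1 is an illustrative assumption; the 0.05 level comes from the text above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, p_values[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


def call_differential(log2_fold_changes, p_values, lfc_cutoff=1.0, alpha=0.05):
    """Flag genes that pass both the fold-change and the FDR cutoff.

    lfc_cutoff is an assumed, illustrative threshold (1.0 = twofold).
    """
    adjusted = benjamini_hochberg(p_values)
    return [
        abs(lfc) >= lfc_cutoff and q < alpha
        for lfc, q in zip(log2_fold_changes, adjusted)
    ]


if __name__ == "__main__":
    # Four hypothetical genes.
    lfcs = [2.0, 0.3, -1.5, 1.2]
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(pvals))
    print(call_differential(lfcs, pvals))
```

The second gene in this made-up example has a significant adjusted p-value but is not called, because its fold change falls below the cutoff.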

Clustering and Classification

Unsupervised clustering groups genes or samples based on expression similarity without prior knowledge. Hierarchical clustering produces dendrograms where genes with similar expression profiles are grouped together. K-means clustering partitions genes into a specified number of clusters. These approaches can reveal co-regulated gene groups and novel sample subtypes.
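
As an illustration of partitioning genes by expression profile, here is a minimal k-means sketch in pure Python. The function names are hypothetical, and the centroids are initialized from the first k profiles only to keep the example deterministic. Real implementations normally use random or k-means++ initialization with several restarts.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, max_iter=100):
    """Partition expression profiles into k clusters.

    profiles: list of equal-length lists, one per gene.
    Returns (assignments, centroids).
    """
    # Naive, deterministic initialization (an assumption of this sketch).
    centroids = [list(p) for p in profiles[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_assignments == assignments:
            break  # converged: no gene changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignments) if a == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Four hypothetical genes measured in three samples.
    genes = [
        [1.0, 1.1, 0.9],
        [5.0, 5.2, 4.8],
        [0.9, 1.0, 1.1],
        [5.1, 4.9, 5.0],
    ]
    labels, centers = k_means(genes, k=2)
    print(labels)
```

The number of clusters has to be chosen in advance, which is the main practical difference from hierarchical clustering, where the dendrogram can be cut at any level.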

Supervised classification uses known sample labels to build predictors that can classify unknown samples. Support vector machines, random forests, and nearest-neighbor classifiers are commonly applied. Gene expression signatures can classify cancer subtypes, predict prognosis, and guide treatment selection. For example, the 50-gene PAM50 signature classifies breast cancer into intrinsic molecular subtypes (luminal A, luminal B, HER2-enriched, basal-like, and normal-like) that differ in prognosis.
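
A nearest-neighbor classifier is the simplest of these methods to sketch. The example below assigns an unknown sample the label of the training sample whose expression profile correlates with it most strongly. The function names, the use of Pearson correlation as the similarity measure, and the subtype labels are all assumptions made for illustration. It is not the PAM50 algorithm or any published predictor.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def classify_nearest_neighbor(training, sample):
    """Return the label of the most correlated training profile.

    training: list of (expression_profile, label) pairs
    sample:   expression profile measured on the same genes
    """
    _, best_label = max(training, key=lambda item: pearson(item[0], sample))
    return best_label


if __name__ == "__main__":
    # Hypothetical training profiles over four signature genes.
    training = [
        ([1.0, 2.0, 3.0, 4.0], "subtype_A"),
        ([4.0, 3.0, 2.0, 1.0], "subtype_B"),
    ]
    print(classify_nearest_neighbor(training, [2.0, 3.0, 5.0, 9.0]))
```

In practice the predictor would be built on a selected gene signature and evaluated on samples held out from training, since expression classifiers overfit easily when genes far outnumber samples.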

Applications

Microarrays have been applied to virtually every area of biology. Cancer research uses microarrays to classify tumors, identify prognostic signatures, and discover drug targets. Gene expression signatures have also reached the clinic in breast cancer assays that predict recurrence risk: MammaPrint is microarray-based, while Oncotype DX measures a smaller gene panel by RT-PCR. Developmental biology uses microarrays to study the transcriptional programs that drive differentiation. Toxicology uses them for toxicogenomic profiling. Microarrays also detect copy number variations when used for comparative genomic hybridization.

Limitations and the Transition to RNA-Seq

Microarrays have several limitations: they rely on pre-defined probe sequences, have a limited dynamic range, and cannot detect transcripts or splice variants that are not represented on the array. RNA sequencing (RNA-seq), a key application of next-generation sequencing, has largely replaced microarrays for gene expression analysis. RNA-seq provides digital count data with higher sensitivity and dynamic range, detects novel transcripts and isoforms, and requires no pre-defined probes. However, microarrays remain useful for well-characterized organisms and for clinical applications where standardized platforms are advantageous.