Overview
Gene expression databases provide centralized repositories for transcriptomics data generated by microarray and high-throughput sequencing experiments. They ensure that experimental data remain accessible after publication, enabling independent verification, meta-analysis, and novel discoveries through data re-use. The two largest repositories are the Gene Expression Omnibus (GEO) at NCBI and ArrayExpress at EMBL-EBI. Both accept data from expression profiling, chromatin immunoprecipitation, and other functional genomics assays, and enforce community-adopted reporting standards such as MIAME (Minimum Information About a Microarray Experiment).
Key Concepts
GEO organizes data into four record types: Series (a complete experiment), Samples (individual hybridizations or sequenced libraries), Platforms (the array or sequencing instrument used), and DataSets (curated collections assembled by GEO staff for analysis). GEO2R is a web-based tool for differential expression analysis that requires no programming. ArrayExpress is the European counterpart and has exchanged data with GEO. The two repositories use different metadata formats: ArrayExpress uses the tab-delimited MAGE-TAB format, while GEO distributes records in its plain-text SOFT format and the XML-based MINiML format. MINSEQE (Minimum Information about a high-throughput Nucleotide SEQuencing Experiment) applies MIAME-style reporting requirements to sequencing-based experiments.
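Each GEO record type has its own accession prefix (GSE for Series, GSM for Samples, GPL for Platforms, GDS for DataSets). As a minimal sketch, the mapping can be expressed in Python as follows; the function name classify_geo_accession is hypothetical and chosen only for illustration.

```python
import re

# Accession prefixes GEO uses for its four record types.
GEO_RECORD_TYPES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "DataSet",
}

# A GEO accession is one of the prefixes above followed by digits.
_ACCESSION_RE = re.compile(r"^(GSE|GSM|GPL|GDS)(\d+)$")


def classify_geo_accession(accession):
    """Return the GEO record type for an accession, or None if unrecognized.

    Hypothetical helper for illustration; it checks only the textual form
    and does not confirm that the record exists in GEO.
    """
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    return GEO_RECORD_TYPES[match.group(1)]
```

A helper like this is useful mainly for routing: a Series accession points to a whole experiment, whereas a Sample accession points to a single measurement within one.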
Applications
Public expression databases accelerate discovery across many fields. Researchers use GEO to retrieve datasets for meta-analysis of microarray and RNA sequencing studies, to validate their own findings against published experiments, and to identify expression biomarkers. qPCR assay design also benefits, because archived expression data can confirm that a target transcript is expressed in the tissue or condition of interest. Many journals require authors of RNA sequencing studies to deposit raw reads and processed expression matrices in GEO or ArrayExpress as a condition of publication.
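Retrieving datasets for re-use usually starts with a search. The sketch below builds, but does not send, a query URL for the NCBI E-utilities esearch endpoint against the GEO DataSets database (db=gds). The function name build_geo_search_url and the choice to restrict results to Series records with gse[ETYP] are illustrative assumptions; check the current NCBI documentation before relying on the exact field syntax.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keywords, max_results=20):
    """Build an esearch URL for GEO Series records matching the keywords.

    Hypothetical helper for illustration: it only constructs the URL and
    performs no network request.
    """
    # Restrict hits to Series entries (assumed field tag: ETYP, entry type).
    term = f"({keywords}) AND gse[ETYP]"
    params = {
        "db": "gds",             # GEO DataSets Entrez database
        "term": term,
        "retmax": max_results,   # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Keeping URL construction separate from the request makes the query easy to inspect and log before any traffic is sent to NCBI, which also helps when respecting its usage limits.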