Overview
Pathway enrichment analysis determines whether predefined sets of genes — representing biological pathways, functional categories, or cellular processes — are statistically overrepresented in a list of experimentally derived genes. This approach transforms a long list of differentially expressed genes into interpretable biological themes, helping researchers move from “which genes changed?” to “which processes are affected?” The core principle is that if a pathway is relevant to the condition under study, more of its member genes will appear in the user’s gene list than expected by chance.
Methods
The two most widely used approaches are over-representation analysis and Gene Set Enrichment Analysis (GSEA). Over-representation analysis applies Fisher’s exact test (in its one-sided form, equivalent to the hypergeometric test) to evaluate the overlap between the gene list and a pathway gene set against a defined background. GSEA avoids arbitrary significance thresholds by ranking all genes by a metric of differential expression and testing whether pathway members cluster at the top or bottom of the ranked list. Multiple-testing correction, typically the Benjamini-Hochberg false discovery rate procedure, is essential because hundreds of pathways are evaluated simultaneously. Curated resources such as KEGG, Reactome, and Gene Ontology provide the reference gene sets.
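The overlap test and the multiple-testing correction can be illustrated with a minimal sketch in standard-library Python. It assumes a one-sided hypergeometric test for each pathway followed by Benjamini-Hochberg adjustment. The function names (`hypergeom_pvalue`, `benjamini_hochberg`, `enrich`) and the toy gene identifiers are hypothetical and chosen only for illustration; they do not come from any particular enrichment tool.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: pathway genes in the background,
    n: genes in the study list, k: study genes that fall in the pathway.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(gene_list, pathways, background):
    """Test every pathway for over-representation in gene_list.

    Returns {pathway_name: (raw_p, adjusted_p)}. Genes outside the
    background are ignored on both sides of the comparison.
    """
    background = set(background)
    study = set(gene_list) & background
    names = list(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name]) & background
        overlap = len(study & members)
        pvals.append(
            hypergeom_pvalue(overlap, len(study), len(members), len(background))
        )
    qvals = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, qvals)}


# Toy data (hypothetical identifiers): 20 background genes, two pathways.
background = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_A": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],
}
study_genes = ["g0", "g1", "g2", "g3"]

results = enrich(study_genes, pathways, background)
for name, (p, q) in results.items():
    print(f"{name}: p={p:.4g}, adjusted p={q:.4g}")
```

In this toy input all four study genes fall in `pathway_A` and none in `pathway_B`, so the first pathway receives a small p-value and the second a p-value of 1. The choice of background matters in practice: restricting it to genes that were actually measured in the experiment changes `N` and `K` and therefore the result.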
Applications
Pathway enrichment is a routine step in transcriptomics, proteomics, and metabolomics studies. It links differential expression results from DNA microarrays and other gene expression experiments to functional biology. Cancer researchers use it to identify dysregulated pathways such as glycolysis or the citric acid cycle, and it can reveal how changes in gene regulation and epigenetics rewire cellular programs in disease. Enrichment analysis also guides biomarker discovery by highlighting pathways, including metabolic pathways, that are perturbed in patient samples.