Overview
Codon usage analysis examines how often each synonymous codon (one of several codons encoding the same amino acid) appears in a genome or transcriptome. The standard genetic code is degenerate: 61 sense codons specify 20 amino acids, so every amino acid except methionine and tryptophan is encoded by two to six synonymous codons. Synonymous codons are not used equally; most organisms favor a subset of them, and the strength and direction of this bias differ between species and between genes. The bias reflects a balance between mutational pressure and translational selection: in highly expressed genes, optimal codons tend to correspond to the most abundant tRNA species, which enables faster and more accurate translation.
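The degeneracy of the code and the raw codon counts that every later statistic builds on can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the standard genetic code (NCBI translation table 1), an in-frame coding sequence starting at position 0, and the helper names count_codons and family_sizes are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(seq):
    """Count in-frame codons of a coding sequence (frame assumed to start at 0)."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Codons containing ambiguous bases (e.g. N) are skipped.
    return Counter(c for c in codons if c in CODON_TABLE)


def family_sizes():
    """Number of synonymous codons per amino acid, stop codons excluded."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


sizes = family_sizes()
print(sum(sizes.values()), len(sizes))  # 61 20
print(count_codons("ATGGCTGCTGCATAA"))
```

In the toy sequence, alanine is encoded three times, twice by GCT and once by GCA, which is the kind of unequal usage the analysis sets out to measure.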
Key Concepts
The codon adaptation index (CAI) measures how closely a gene’s codon usage matches that of a reference set of highly expressed genes. It ranges from 0 to 1, with values near 1 indicating strong bias toward optimal codons. The effective number of codons (ENC) quantifies overall codon bias without requiring a reference set and is designed to be largely independent of gene length and amino acid composition; a value of 20 indicates extreme bias (one codon per amino acid) and 61 indicates no bias. The relative synonymous codon usage (RSCU) value of a codon equals its observed frequency divided by the frequency expected if all synonymous codons for that amino acid were used equally, so a value of 1 indicates no bias. Codon usage tables are available for thousands of organisms through the Codon Usage Database maintained by the Kazusa DNA Research Institute. GC content at the third codon position (GC3) correlates strongly with genome-wide nucleotide composition.
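The RSCU, CAI, and GC3 definitions above can be illustrated with a short Python sketch. It rests on several assumptions that are not part of the text: the standard genetic code, CAI computed as the geometric mean of per-codon relative adaptiveness weights, single-codon amino acids (Met, Trp) and stop codons left out of the CAI, and a pseudocount of 0.5 for codons absent from the reference set. All function names are hypothetical, and ENC is omitted for brevity.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons into synonymous families, one family per amino acid.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)


def count_codons(seq):
    """Count in-frame codons; the reading frame is assumed to start at 0."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3)
                   if seq[i:i + 3] in CODON_TABLE)


def rscu(counts):
    """Observed count divided by the count expected under equal synonymous usage."""
    values = {}
    for family in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for this family
        expected = total / len(family)
        for c in family:
            values[c] = counts.get(c, 0) / expected
    return values


def relative_adaptiveness(ref_counts, pseudocount=0.5):
    """Weight of each codon relative to the most used synonym in the reference set."""
    weights = {}
    for family in FAMILIES.values():
        if len(family) == 1:
            continue  # Met and Trp carry no information about bias
        # Assumption: unseen codons get a pseudocount so that log(w) is defined.
        adjusted = {c: ref_counts.get(c, 0) or pseudocount for c in family}
        best = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of all scorable codons in seq."""
    scored = {c: n for c, n in count_codons(seq).items() if c in weights}
    length = sum(scored.values())
    if length == 0:
        raise ValueError("sequence contains no scorable codons")
    log_sum = sum(n * math.log(weights[c]) for c, n in scored.items())
    return math.exp(log_sum / length)


def gc3(seq):
    """Fraction of counted codons (stop codons included) ending in G or C."""
    counts = count_codons(seq)
    total = sum(counts.values())
    return sum(n for c, n in counts.items() if c[2] in "GC") / total if total else 0.0


# Toy reference "gene set": alanine encoded three times by GCT, once by GCA.
reference = count_codons("GCTGCTGCTGCA")
weights = relative_adaptiveness(reference)
print(rscu(reference)["GCT"])   # 3.0
print(cai("GCTGCT", weights))   # 1.0
print(gc3("GCTGCC"))            # 0.5
```

In practice the reference counts would come from a curated set of highly expressed genes of the organism under study rather than from a toy string; the RSCU values within each family always sum to the number of synonymous codons in that family.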
Applications
Codon usage analysis guides heterologous gene expression: genes from one species are often codon-optimized for the production host (e.g., E. coli or yeast) to improve yield. It also reveals translational selection in highly expressed genes, which tend to use optimal codons recognized by abundant tRNAs. In amino acid and protein biochemistry, codon bias matters because it can affect protein folding kinetics. Research on gene regulation and epigenetics explores correlations between codon usage, mRNA stability, and translation efficiency as a layer of post-transcriptional control.
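The simplest form of codon optimization for a production host can be sketched as follows. This is a deliberately naive strategy (replace every codon with the host's most frequent synonym), shown only to make the idea concrete; it is not how any particular tool works. The function names are hypothetical, the standard genetic code is assumed, and the host counts below are made-up toy numbers, not real E. coli or yeast data.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(dna):
    """Split an in-frame DNA coding sequence into codons (frame starts at 0)."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate with the standard code; raises KeyError on ambiguous bases."""
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def preferred_codons(host_counts):
    """Most frequent codon per amino acid; ties fall to the first codon in table order."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        n = host_counts.get(codon, 0)
        if aa not in best or n > best[aa][1]:
            best[aa] = (codon, n)
    return {aa: codon for aa, (codon, _) in best.items()}


def optimize(dna, host_counts):
    """Swap each sense codon for the host's preferred synonym; keep stop codons."""
    preferred = preferred_codons(host_counts)
    out = []
    for codon in split_codons(dna):
        aa = CODON_TABLE[codon]
        out.append(codon if aa == "*" else preferred[aa])
    return "".join(out)


# Toy host usage counts (illustrative only, NOT measured data).
toy_host = {"GCG": 10, "GCT": 2, "AAA": 8, "AAG": 3}

original = "ATGGCTAAGTAA"
optimized = optimize(original, toy_host)
print(optimized)                                    # ATGGCGAAATAA
print(translate(original) == translate(optimized))  # True
```

Because only synonymous codons are exchanged, the encoded protein is unchanged. Always picking the single most frequent codon ignores the effects on mRNA stability and folding kinetics mentioned above, which is one reason this sketch should be read as an illustration rather than a recommended procedure.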