Overview
Variation databases systematically catalog the genetic differences that distinguish individuals and contribute to phenotypic diversity, including disease susceptibility. As sequencing technologies have matured, the number of known human genetic variants has grown rapidly, reaching hundreds of millions of entries. Variation databases serve two critical functions: they define the baseline spectrum of variation in the general population, against which disease-associated variants can be identified, and they aggregate clinical interpretations that guide diagnostic and therapeutic decisions.
Key Concepts
dbSNP (Database of Single Nucleotide Polymorphisms) catalogs single nucleotide variants, small insertions and deletions, and microsatellite repeats. Each variant receives an rs (reference SNP) identifier, written as "rs" followed by a number, which serves as a stable accession across studies. dbSNP records include allele frequency data from large population studies such as the 1000 Genomes Project and gnomAD. ClinVar focuses on the relationship between genetic variants and human health, providing clinical significance classifications (pathogenic, likely pathogenic, uncertain significance, likely benign, benign) together with the supporting evidence from submitters. COSMIC (Catalogue of Somatic Mutations in Cancer) catalogs somatic mutations found in human cancers, including point mutations, copy number alterations, and gene fusions.
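The variant classes listed above can be told apart from the reference and alternate alleles alone. The following is a minimal sketch, not dbSNP's own data model: the `Variant` class, its field names, and the `variant_class` function are hypothetical and exist only for illustration, and the "rsEXAMPLE" identifier is a placeholder rather than a real accession.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not an official schema)."""
    chrom: str
    pos: int                      # 1-based position on the reference
    ref: str                      # reference allele
    alt: str                      # alternate allele
    rs_id: Optional[str] = None   # e.g. a dbSNP rs identifier, if one exists


def variant_class(v: Variant) -> str:
    """Classify a variant by comparing allele lengths.

    Assumes VCF-style alleles, where insertions and deletions
    carry one shared anchor base.
    """
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) < len(v.alt):
        return "insertion"
    if len(v.ref) > len(v.alt):
        return "deletion"
    return "MNV"  # same length, more than one base substituted


# Illustrative records with made-up coordinates.
snv = Variant("1", 100, "A", "G", rs_id="rsEXAMPLE1")
ins = Variant("1", 200, "A", "ATT")
dele = Variant("1", 300, "ACG", "A")

print(variant_class(snv), variant_class(ins), variant_class(dele))
# prints: SNV insertion deletion
```

Length comparison is deliberately crude. It does not detect microsatellite repeats, which would require looking at the surrounding reference sequence.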
Applications
Variation databases are essential tools in genomic medicine. DNA sequencing projects use dbSNP allele frequencies to filter common polymorphisms out of lists of candidate pathogenic variants. Next-generation sequencing pipelines annotate variants against ClinVar to prioritize clinically actionable findings. Cancer research relies on COSMIC to identify driver mutations, while studies of DNA repair mechanisms connect mutational signatures with deficiencies in specific repair pathways.
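The filter-then-annotate workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not a real pipeline: the function `prioritize`, the in-memory dictionaries standing in for dbSNP/gnomAD frequencies and ClinVar classifications, the "rsEXAMPLE" identifiers, and the 1% frequency cutoff are all hypothetical. Production pipelines typically read VCF files and use dedicated annotation tools.

```python
from typing import Dict, List, Optional, Tuple


def prioritize(
    candidates: List[str],
    population_af: Dict[str, float],
    clinical_significance: Dict[str, str],
    max_af: float = 0.01,  # assumed cutoff; the right value depends on the disease model
) -> List[Tuple[str, Optional[float], str]]:
    """Drop common polymorphisms, then attach a clinical label to the rest."""
    kept = []
    for variant_id in candidates:
        af = population_af.get(variant_id)
        # Variants with a known frequency above the cutoff are treated as common.
        if af is not None and af > max_af:
            continue
        # Variants absent from the frequency table are kept: they may be rare or novel.
        label = clinical_significance.get(variant_id, "not reported")
        kept.append((variant_id, af, label))
    return kept


# Made-up lookup tables standing in for database content.
population_af = {"rsEXAMPLE1": 0.32, "rsEXAMPLE2": 0.0004}
clinical_significance = {"rsEXAMPLE2": "pathogenic"}
candidates = ["rsEXAMPLE1", "rsEXAMPLE2", "rsEXAMPLE3"]

result = prioritize(candidates, population_af, clinical_significance)
for row in result:
    print(row)
# prints:
# ('rsEXAMPLE2', 0.0004, 'pathogenic')
# ('rsEXAMPLE3', None, 'not reported')
```

Keeping variants that have no recorded frequency is a design choice in this sketch. Absence from a population database often means a variant is rare, which is exactly the class of variant a disease study wants to retain.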