Variant Calling: Detecting Genetic Variation

Overview

Variant calling is the computational process of identifying differences between an individual’s genome and a reference genome. These differences range from single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) to large structural variants such as copy number changes and chromosomal rearrangements. Accurate variant detection is a cornerstone of human genetics, precision medicine, and evolutionary biology. It requires careful statistical modeling to distinguish genuine biological variation from sequencing errors and alignment artifacts.

Key Concepts

Most variant callers follow a common workflow: sequencing reads are first aligned to a reference genome using tools such as BWA or Bowtie2, and the aligned reads are then examined to identify positions where the individual’s sequence differs from the reference. The Genome Analysis Toolkit (GATK) Best Practices pipeline is widely adopted; it computes genotype likelihoods from the aligned reads and uses a Bayesian approach to assign genotypes. Key considerations include coverage depth (more reads covering a site give more confidence in the call), base quality scores, and mapping quality. Variant filtering, using either hard thresholds or machine learning methods such as GATK’s Variant Quality Score Recalibration (VQSR), removes false positives. Structural variant detection requires specialized tools such as DELLY or Manta that analyze discordant read pairs and split reads.
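The Bayesian genotyping idea can be illustrated with a minimal sketch for a single biallelic site in a diploid sample. This is a simplified textbook model, not GATK's actual implementation: it assumes independent reads, treats each base quality as a Phred-scaled error probability, ignores mapping quality, and uses illustrative prior values chosen only for the example. The function names are hypothetical.

```python
import math

def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)

def genotype_log_likelihoods(pileup, ref, alt):
    """Return log10 likelihoods for genotypes 0/0, 0/1 and 1/1.

    pileup: list of (base, phred_quality) tuples observed at one site.
    Assumes reads are independent and errors are spread evenly over
    the three other bases (a simplification).
    """
    # Number of alt alleles carried by each diploid genotype
    genotypes = {"0/0": 0, "0/1": 1, "1/1": 2}
    loglik = {g: 0.0 for g in genotypes}
    for base, qual in pileup:
        e = phred_to_error(qual)
        # Probability of the observed base given the true allele
        p_given_ref = 1 - e if base == ref else e / 3
        p_given_alt = 1 - e if base == alt else e / 3
        for g, n_alt in genotypes.items():
            frac_alt = n_alt / 2  # chance a read samples the alt allele
            p = (1 - frac_alt) * p_given_ref + frac_alt * p_given_alt
            loglik[g] += math.log10(p)
    return loglik

def genotype_posteriors(loglik, priors):
    """Combine likelihoods with priors via Bayes' rule, in log space."""
    log_post = {g: loglik[g] + math.log10(priors[g]) for g in loglik}
    m = max(log_post.values())  # subtract the max to avoid underflow
    unnorm = {g: 10 ** (v - m) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

# Eleven reads at Q30: six support the reference base, five the alternate
pileup = [("A", 30)] * 6 + [("G", 30)] * 5
loglik = genotype_log_likelihoods(pileup, ref="A", alt="G")

# Illustrative priors only (assumed values, not taken from any tool)
priors = {"0/0": 0.998, "0/1": 0.0015, "1/1": 0.0005}
post = genotype_posteriors(loglik, priors)
best = max(post, key=post.get)
print(best, post)
```

With a roughly even split of reference and alternate reads, the heterozygous genotype has by far the highest likelihood, so it is called even though the prior strongly favors the homozygous reference. With only one or two low-quality alternate reads, the prior would dominate instead, which is how this kind of model separates real variants from sequencing errors.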

Applications

Variant calling drives both clinical and research genomics. It identifies mutations underlying rare genetic disorders and informs cancer genomics by revealing somatic mutations in tumor-normal comparisons. Population-scale projects such as the 1000 Genomes Project have catalogued tens of millions of variants to map human diversity. In infectious disease, variant calling tracks pathogen evolution and drug resistance. In all of these settings, the accuracy of variant calls depends fundamentally on the quality of the underlying next-generation sequencing data and on the error characteristics of the sequencing platform used.