Sequence Alignment: Pairwise Comparison of Biological Sequences

Overview

Sequence alignment is a core operation in bioinformatics: it places two or more biological sequences side by side to identify regions of similarity that may reflect functional, structural, or evolutionary relationships. Pairwise alignment compares exactly two sequences and forms the basis for database searching, primer design, and phylogenetic inference. For a given scoring scheme, the optimal pairwise alignment can be found by dynamic programming. The Needleman-Wunsch algorithm (global alignment) and the Smith-Waterman algorithm (local alignment) each fill a matrix of cumulative scores built from match rewards, mismatch penalties, and gap penalties, then trace back along the highest-scoring path.
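The matrix fill and traceback described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name needleman_wunsch, the default scores (match +1, mismatch -1, gap -2), and the use of a single linear gap penalty are all assumptions chosen for brevity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores).

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Both time and memory grow with the product of the two sequence lengths, which is why this exact approach is practical for pairs of genes or proteins but not for searching whole databases.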

Key Concepts

Alignments are classified as global or local. Global alignment aligns both sequences end to end and is most appropriate for closely related sequences of similar length. Local alignment finds the highest-scoring pair of subsequences and ignores the rest, which makes it well suited to detecting shared domains between otherwise divergent sequences. Substitution matrices such as BLOSUM62 and PAM250 provide log-odds scores for every possible amino acid replacement, while DNA alignments typically use simple match/mismatch scores. Gap penalties, often split into a gap-opening penalty and a smaller gap-extension penalty, discourage excessive insertions or deletions. Heuristic tools like BLAST trade guaranteed optimality for speed: they seed alignments with short word matches (exact words for nucleotide searches, high-scoring neighborhood words for protein searches) and extend only those seeds.
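The difference between global and local alignment is easiest to see in code. The sketch below is a simplified Smith-Waterman under assumed parameters: the function name smith_waterman, the scores (match +2, mismatch -1), and the single linear gap penalty of -2 are illustrative choices. Protein alignments would normally look up substitution scores in a matrix such as BLOSUM62 and use separate gap-opening and gap-extension penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scores).

    Returns (best_score, aligned_a, aligned_b) for one best local alignment.
    """
    n, m = len(a), len(b)
    # H[i][j] is the best score of any local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment here
                H[i - 1][j - 1] + sub,  # extend with a match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTTACGTACGTTTTT", "GGGACGTACGGGG"))
```

The only changes from a global alignment are the floor of zero in each cell, which lets an alignment restart wherever the running score would go negative, and a traceback that begins at the highest-scoring cell rather than the bottom-right corner.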

Applications

Pairwise alignment is a routine tool in molecular biology. It underpins polymerase chain reaction primer design by checking primer-template complementarity, validates DNA sequencing results by aligning reads to reference genomes, and identifies conserved residues in protein structure prediction. Comparative genomics relies on alignment to detect horizontal gene transfer between bacterial genomes, and restriction site mapping uses alignment to predict restriction enzyme digestion patterns.
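As a small illustration of the primer-template check mentioned above, the following sketch scans a template strand for positions where a primer could anneal. It is deliberately simplified: the names reverse_complement and primer_sites and the max_mismatches parameter are hypothetical, the comparison is ungapped (a mismatch count rather than a full alignment), and real primer design also weighs melting temperature and 3'-end stability.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_sites(template, primer, max_mismatches=0):
    """Return 0-based start positions on the template strand where the
    primer could anneal, allowing up to max_mismatches mismatched bases.

    A primer anneals where the template reads as the primer's reverse
    complement, so that is the pattern searched for.
    """
    template = template.upper()
    target = reverse_complement(primer)
    k = len(target)
    sites = []
    for start in range(len(template) - k + 1):
        window = template[start:start + k]
        mismatches = sum(1 for x, y in zip(window, target) if x != y)
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


if __name__ == "__main__":
    print(primer_sites("ATGCCGTAAGCTT", "GCTTAC"))
```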