Sequence Database Search with BLAST

Overview

BLAST (Basic Local Alignment Search Tool) is the most widely used algorithm for comparing a query sequence against a database of known sequences. It rapidly identifies statistically significant local alignments, which supports functional annotation of novel genes, detection of homology across distant species, and inference of evolutionary relationships. BLAST trades the guaranteed optimality of full dynamic programming (as in Smith-Waterman) for a heuristic that is fast enough to search databases containing billions of residues. The statistical significance of each hit is reported as an E-value: the number of alignments scoring at least as high that would be expected by chance in a database of that size. Lower E-values therefore indicate more significant hits.
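
To make the E-value concrete, the sketch below applies the Karlin-Altschul formula E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length, and n the database length. The function names are illustrative, and the λ and K values are assumptions chosen to resemble those commonly quoted for gapped BLOSUM62 searches. Real BLAST also applies effective-length corrections that are omitted here.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Assumed, illustrative statistical parameters (not read from any BLAST run).
LAMBDA = 0.267
K = 0.041

if __name__ == "__main__":
    for score in (50, 100, 150):
        e = e_value(score, query_len=300, db_len=1_000_000_000, lam=LAMBDA, k=K)
        bits = bit_score(score, LAMBDA, K)
        print(f"raw={score:4d}  bits={bits:6.1f}  E={e:.2e}")
```

Two properties follow directly from the formula: the E-value falls exponentially as the score rises, and it grows linearly with database size, so the same alignment is less significant in a larger database.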

Key Concepts

BLAST works by first breaking the query into short overlapping words (typically 3 residues for proteins and 11 bases for nucleotides) and scanning the database for matches to these words. Nucleotide searches look for exact word matches, while protein searches also accept similar words that score above a threshold under the substitution matrix. Promising matches are then extended in both directions to build longer alignments. Variants address specific use cases: BLASTP compares protein queries against protein databases, BLASTN compares nucleotide queries against nucleotide databases, BLASTX translates a nucleotide query in all six reading frames for protein-level comparison, and PSI-BLAST iteratively builds a position-specific scoring matrix (PSSM) to detect distant homologs. MegaBLAST is optimized for highly similar sequences, while discontiguous MegaBLAST tolerates more mismatches and is better suited to cross-species comparisons.
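
The seed-and-extend idea described above can be sketched in a few lines. This is a toy illustration, not BLAST's actual implementation: it handles only exact nucleotide word matches and ungapped extension, and the function names, the scoring values (match +1, mismatch −2), and the drop-off threshold are all assumptions chosen for the example.

```python
from collections import defaultdict

def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index

def extend(query, subject, q, s, step, match, mismatch, x_drop):
    """Ungapped extension from (q, s) in direction step (+1 or -1).

    Stops when the running score falls more than x_drop below the best
    score seen. Returns (best score gain, number of positions kept).
    """
    score = best = best_len = n = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break
        q += step
        s += step
    return best, best_len

def seed_and_extend(query, subject, w=4, match=1, mismatch=-2, x_drop=3):
    """Return (score, query_start, subject_start, length) tuples, best first."""
    index = index_words(subject, w)
    hits = set()  # seeds on the same diagonal often extend to the same hit
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], ()):
            right, r_len = extend(query, subject, q + w, s + w, +1,
                                  match, mismatch, x_drop)
            left, l_len = extend(query, subject, q - 1, s - 1, -1,
                                 match, mismatch, x_drop)
            score = w * match + left + right
            hits.add((score, q - l_len, s - l_len, w + l_len + r_len))
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    for hit in seed_and_extend("ACGTACGTTT", "GGACGTACGTCC"):
        print(hit)
```

Indexing words first means each query word is looked up in constant time rather than compared against every database position, which is where most of the speed-up over full dynamic programming comes from. The drop-off cutoff keeps extensions from wandering through long stretches of mismatches.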

Applications

BLAST is usually the first step in annotating uncharacterized sequences from DNA sequencing projects. It assigns putative functions to novel proteins by detecting homology to characterized proteins. In bacterial genetics, BLAST is used to identify virulence factors and antibiotic resistance genes. In recombinant DNA work, it is used to verify construct integrity by aligning sequencing reads against the expected vector sequence.
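
For annotation tasks like those above, a common follow-up is to keep only the most significant hit per query. The sketch below assumes results were saved in BLAST+ tabular format (`-outfmt 6`) with its default 12 columns. The helper name `best_hits`, the E-value cutoff, and the sample rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, bit score, E-value)} for the top-scoring
    hit per query among rows that pass the E-value cutoff."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue, bits = float(row["evalue"]), float(row["bitscore"])
        if evalue > max_evalue:
            continue  # not significant enough to annotate from
        query = row["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (row["sseqid"], bits, evalue)
    return best

# Fabricated example rows, not output from a real search.
SAMPLE = (
    "orf1\tsubjA\t92.5\t200\t15\t0\t1\t200\t5\t204\t1e-100\t350\n"
    "orf1\tsubjB\t40.0\t150\t90\t2\t10\t159\t3\t150\t2e-08\t55.1\n"
    "orf2\tsubjC\t30.0\t60\t42\t1\t5\t64\t9\t68\t0.5\t28.0\n"
)

if __name__ == "__main__":
    print(best_hits(io.StringIO(SAMPLE)))
```

Queries with no hit below the cutoff, such as `orf2` in the sample, are simply absent from the result, which makes it easy to list sequences that remain unannotated.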