Overview
Phylogenetic tree construction is the process of inferring evolutionary relationships among biological entities (species, genes, or populations) from molecular sequence data. The resulting tree consists of branches (lineages) connected at nodes. Tips represent the sampled taxa, and internal nodes represent inferred common ancestors. Branch lengths often represent the amount of evolutionary change, for example expected substitutions per site. The fundamental assumption is that similarity among homologous sequences reflects shared ancestry. Methods range from simple distance-based approaches, which convert pairwise sequence differences into evolutionary distances, to more complex character-based methods that evaluate each aligned nucleotide or amino acid position individually.
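As a minimal sketch of how a distance-based approach turns sequence differences into distances, the Python below computes the uncorrected p-distance (the fraction of differing sites) between aligned sequences. It then applies the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - (4/3)p), which accounts for multiple substitutions at the same site. Using JC69 is an assumption made for illustration, since real analyses usually choose a substitution model suited to the data. The function names are hypothetical, sequences are assumed to be already aligned, and gapped columns are skipped pairwise.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of comparable aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    # Compare column by column, skipping any column where either sequence has a gap
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    mismatches = sum(1 for x, y in pairs if x != y)
    return mismatches / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance (expected substitutions per site) from a p-distance."""
    if p >= 0.75:
        # At or beyond saturation the log argument is <= 0, so the distance is undefined
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric JC69 distance for every pair of taxa in an aligned set."""
    names = list(alignment)
    dist = {}
    for i, a in enumerate(names):
        dist[(a, a)] = 0.0
        for b in names[i + 1:]:
            d = jukes_cantor(p_distance(alignment[a], alignment[b]))
            dist[(a, b)] = dist[(b, a)] = d
    return dist


if __name__ == "__main__":
    # Hypothetical toy alignment of three taxa
    aln = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACTTTC"}
    for (a, b), d in sorted(distance_matrix(aln).items()):
        if a < b:
            print(a, b, round(d, 4))
```

In this toy alignment, A and B differ at one of ten sites (p = 0.1), which the correction stretches to a JC69 distance of about 0.107. The gap correction grows as p approaches 0.75, where the model treats the sequences as saturated.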
Key Concepts
A critical first step is multiple sequence alignment, in which homologous positions are matched across taxa so that each alignment column compares equivalent sites. Distance-based methods such as neighbor-joining (NJ) build a tree from a matrix of pairwise distances and are computationally fast. Character-based methods include maximum parsimony, which favors the tree requiring the fewest evolutionary changes, and model-based statistical approaches such as maximum likelihood and Bayesian inference, which evaluate trees under an explicit model of sequence evolution. Bootstrapping estimates support for individual branches by resampling alignment columns with replacement, rebuilding the tree from each resampled alignment many times, and recording how often each grouping (clade) recurs. Common file formats include FASTA (sequences and alignments) and Newick (tree topology, optionally with branch lengths).
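To make the neighbor-joining step concrete, here is a compact sketch of the standard NJ procedure. At each iteration it joins the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix. It then computes branch lengths from the new internal node to i and j and replaces the pair with that node. The result is written as an unrooted Newick string. The function name, the list-of-lists matrix input, and the 4-significant-figure formatting are illustrative assumptions. Taxon labels are inserted verbatim, so names containing spaces, colons, or parentheses would need Newick quoting that this sketch omits.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted NJ tree from a symmetric distance matrix; return Newick."""
    if len(names) < 3 or len(dist) != len(names):
        raise ValueError("need at least three taxa and a matching square matrix")
    nodes = list(names)               # each entry is a taxon name or a Newick subtree
    dm = [list(row) for row in dist]  # working copy, shrinks by one per iteration
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in dm]  # total distance from each node to all others
        # Pick the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * dm[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * dm[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i][j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        # Distance from u to every remaining node k (computed from the old matrix)
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (dm[i][k] + dm[j][k] - dm[i][j]) for k in keep]
        # Drop rows/columns i and j, then append u as the last row/column
        dm = [[dm[a][b] for b in keep] for a in keep]
        for row, value in zip(dm, new_row):
            row.append(value)
        dm.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: attach them to one central node (an unrooted trifurcation)
    a, b, c = nodes
    la = 0.5 * (dm[0][1] + dm[0][2] - dm[1][2])
    lb = 0.5 * (dm[0][1] + dm[1][2] - dm[0][2])
    lc = 0.5 * (dm[0][2] + dm[1][2] - dm[0][1])
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    D = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The sketch rescans every pair on each iteration, which gives the usual O(n^3) cost of basic NJ. On data whose distances are not tree-like, NJ can produce slightly negative branch lengths, and this sketch reports them unchanged rather than clamping them to zero.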
Applications
Phylogenetic trees are indispensable across biology. They underpin taxonomic classification, trace the origin and spread of pathogens, and guide drug discovery by revealing which drug targets are evolutionarily conserved. In comparative genomics, tree topology helps distinguish orthologs (genes separated by a speciation event) from paralogs (genes separated by a duplication event). These analyses build directly on DNA sequencing data and complement studies in bacterial genetics by mapping relationships among strains. Phylogenetic methods also clarify the evolutionary history of viruses, aiding their classification.