Bayesian Phylogenetics: Probabilistic Tree Inference

Overview

Bayesian phylogenetics provides a probabilistic framework for inferring evolutionary trees by combining prior knowledge with observed sequence data through Bayes’ theorem. Rather than returning a single best tree, it produces a posterior distribution over tree topologies, branch lengths, and substitution model parameters. This distribution quantifies the uncertainty inherent in phylogenetic inference, allowing researchers to assign posterior probabilities to individual clades. The posterior cannot be computed analytically, because its normalizing constant requires summing over every possible topology and integrating over all branch lengths and model parameters. It is therefore approximated by Markov chain Monte Carlo (MCMC) sampling, which typically visits thousands to millions of trees and parameter values.
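As a worked statement of the theorem in this setting, the posterior can be written as below. The symbols are notation introduced here for illustration, not taken from the text: D is the sequence alignment, \tau a tree topology, v its branch lengths, and \theta the substitution model parameters.

```latex
P(\tau, v, \theta \mid D)
  = \frac{P(D \mid \tau, v, \theta)\, P(\tau, v, \theta)}{P(D)},
\qquad
P(D) = \sum_{\tau} \int\!\!\int
       P(D \mid \tau, v, \theta)\, P(\tau, v, \theta)\, dv\, d\theta
```

The first factor in the numerator is the likelihood and the second is the joint prior. The denominator P(D) is the marginal likelihood. MCMC avoids evaluating it because it cancels in the ratio of posterior densities at two points.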

Key Concepts

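MCMC algorithms, typically Metropolis-Hastings or Gibbs sampling, generate a chain of correlated samples from the posterior distribution. Samples from an initial burn-in period, drawn before the chain reaches stationarity, are discarded. Convergence and mixing are then assessed with diagnostics such as the effective sample size (ESS), which estimates how many independent samples the autocorrelated chain is worth, and the potential scale reduction factor (PSRF), which compares the variance within and between independent chains and approaches 1 as they converge. The posterior probability of a clade is estimated as the proportion of post-burn-in sampled trees that contain it. Unlike a bootstrap support value, it can be read directly as the probability that the clade is correct, conditional on the data, the model, and the priors. The prior distribution can incorporate external information, such as fossil calibrations for molecular dating.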

Applications

Bayesian phylogenetics is particularly valued when uncertainty assessment is critical, such as in conservation genetics, epidemiological forecasting, and species delimitation. It excels at integrating multiple data types, including morphological characters and stratigraphic ranges, alongside molecular sequences. In practice, Bayesian analyses are routinely applied to DNA sequence data, including data from next-generation sequencing, although MCMC run times grow quickly with the number of taxa and sites, so very large datasets can be computationally demanding. They are frequently applied to resolve relationships in bacterial genetics, where horizontal gene transfer and recombination create conflicting phylogenetic signals that suitably parameterized Bayesian models can accommodate.