Protein Structure Prediction: From Sequence to Structure

Overview

Protein structure prediction aims to determine the three-dimensional conformation of a protein from its amino acid sequence alone. Because experimental structure determination — via X-ray crystallography, NMR spectroscopy, or cryo-EM — remains time-consuming and expensive, computational methods offer a scalable alternative. The field has been revolutionized by deep learning, most notably by AlphaFold2, which achieves near-experimental accuracy for a large fraction of globular proteins. Structure prediction bridges the gap between the ever-growing sequence databases and the comparatively sparse structural coverage of the protein universe.

Methods

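Three main approaches exist. Homology (comparative) modeling builds a target structure from one or more experimentally determined template structures that share significant sequence similarity with the target. Threading (fold recognition) aligns the target sequence onto a library of known folds and scores how compatible each fold is, which lets it find a suitable fold even when sequence identity is low. Deep learning methods, particularly those built on transformer (attention-based) architectures, exploit co-evolutionary information from multiple sequence alignments together with geometric and physical constraints. Earlier networks predicted inter-residue distances and angles that were then assembled into a structure, whereas end-to-end differentiable models such as AlphaFold2 output atomic coordinates directly. Model quality is assessed with metrics such as pLDDT (predicted local distance difference test), a per-residue confidence score on a 0 to 100 scale that the model assigns to its own prediction, and TM-score, which measures the global similarity between a model and a reference structure.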
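As a rough illustration of how TM-score is computed, the Python sketch below scores a model against a reference using TM = (1/L) Σ 1/(1 + (d_i/d0)²), where L is the reference length, d_i is the distance between the i-th pair of corresponding Cα atoms, and d0 = 1.24·(L - 15)^(1/3) - 1.8 Å. It assumes the two structures already share a one-to-one residue correspondence and have already been superposed. The reference TM-score program additionally searches over superpositions to maximize the score, so for a given alignment this sketch returns a lower bound. The function names and the 0.5 Å floor on d0 (used here to keep very short chains well defined) are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in angstroms


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) from the TM-score formula."""
    # For chains of 15 residues or fewer the formula is non-positive, so this
    # sketch clamps d0 to a 0.5 A floor (an assumption, not part of the formula).
    d0 = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score_superposed(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Hypothetical helper: TM-score for an already-superposed model, assuming
    residue i in the model corresponds to residue i in the reference."""
    if not reference or len(model) != len(reference):
        raise ValueError("model and reference must be non-empty and equal length")
    length = len(reference)
    d0 = tm_d0(length)
    total = 0.0
    for m_atom, r_atom in zip(model, reference):
        d = math.dist(m_atom, r_atom)          # distance between paired C-alpha atoms
        total += 1.0 / (1.0 + (d / d0) ** 2)   # close pairs add ~1, distant pairs ~0
    return total / length                      # normalize by the reference length


if __name__ == "__main__":
    # Toy reference: 50 C-alpha atoms spaced 3.8 A apart along the x axis.
    reference = [(3.8 * i, 0.0, 0.0) for i in range(50)]
    # Shifting every atom by exactly d0 makes each term 1 / (1 + 1) = 0.5.
    shift = tm_d0(len(reference))
    model = [(x + shift, y, z) for x, y, z in reference]
    print(round(tm_score_superposed(reference, reference), 3))  # 1.0
    print(round(tm_score_superposed(model, reference), 3))      # 0.5
```

Identical coordinates score 1.0, and the score falls toward 0 as paired atoms drift apart; values above roughly 0.5 are commonly taken to indicate that two structures share the same fold. Normalizing by the reference length rather than the number of aligned pairs is what penalizes models that cover only part of the target.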

Applications

Accurate structure prediction accelerates drug discovery by enabling structure-based virtual screening, guides the design of mutagenesis experiments, and facilitates functional annotation of uncharacterized proteins. Predicted structures are routinely used to interpret structure-function relationships, to assess how individual amino acid substitutions might affect stability or binding, and to support studies of protein folding and chaperone interactions, although these methods predict the final folded state rather than the folding pathway itself. Predicted models also complement experimental data from NMR spectroscopy by providing starting models for structure refinement.
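Before a predicted model is used for any of these purposes, low-confidence regions are often set aside. AlphaFold2 writes its per-residue pLDDT into the B-factor column of the PDB files it produces, so a minimal Python sketch can read those values from the Cα records and keep residues at or above a cutoff. The helper names and the default cutoff of 70 (the boundary the AlphaFold Protein Structure Database draws between "confident" and "low" confidence) are assumptions for illustration; other prediction tools may store confidence differently.

```python
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, int]  # (chain ID, residue sequence number)


def read_plddt(pdb_text: str) -> Dict[ResidueKey, float]:
    """Hypothetical helper: read per-residue confidence from the B-factor column
    of C-alpha ATOM records, assuming an AlphaFold-style PDB file in which that
    field holds pLDDT. Uses the fixed-column PDB layout."""
    scores: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, TER, headers, etc.
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        chain = line[21]                  # chain ID, column 22
        res_num = int(line[22:26])        # residue sequence number, columns 23-26
        scores[(chain, res_num)] = float(line[60:66])  # B-factor, columns 61-66
    return scores


def confident_residues(scores: Dict[ResidueKey, float], cutoff: float = 70.0) -> List[ResidueKey]:
    """Hypothetical helper: residues whose confidence meets the cutoff, sorted by chain then number."""
    return sorted(key for key, value in scores.items() if value >= cutoff)
```

For example, `confident_residues(read_plddt(open("model.pdb").read()))` (with `model.pdb` a hypothetical file name) returns the (chain, residue) pairs whose pLDDT is at least 70. Reading only Cα atoms keeps one value per residue, since AlphaFold2 assigns the same pLDDT to every atom of a residue.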