RNA Structure Prediction: From Sequence to Shape

Overview

RNA structure prediction aims to infer the two-dimensional (secondary) and three-dimensional (tertiary) conformation of RNA molecules from their nucleotide sequences. As the known functional repertoire of RNA continues to expand, spanning catalysis, gene regulation, and scaffolding, knowledge of RNA structure has become essential for understanding mechanism. RNA folding is largely hierarchical: the secondary structure (canonical Watson-Crick base pairs and G-U wobble pairs) generally forms first, driven by thermodynamic stability, and tertiary contacts subsequently consolidate the three-dimensional fold. Computational methods exploit this hierarchy, predicting secondary structure first and then using it to constrain three-dimensional modeling.

Methods

Secondary structure prediction typically uses dynamic programming algorithms based on the nearest-neighbor thermodynamic model, in which free energy contributions from base pair stacks and loops are summed and the structure with the minimum free energy (MFE) is reported. The Zuker algorithm, implemented in tools such as Mfold and RNAfold, is widely used. Partition function approaches (e.g., RNAfold -p) calculate base-pairing probabilities over the ensemble of structures at thermodynamic equilibrium. Comparative sequence analysis improves prediction accuracy by identifying covarying positions, where compensatory mutations preserve base pairing, across homologous sequences. For tertiary structure, fragment assembly (e.g., FARFAR2), coarse-grained Monte Carlo simulation (e.g., SimRNA), and deep learning models (e.g., DRfold) build three-dimensional models, often guided by secondary structure constraints.
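
The dynamic programming idea behind secondary structure prediction can be illustrated with a deliberately simplified sketch. The code below is not the Zuker algorithm and does not use nearest-neighbor energies; it implements the simpler Nussinov-style recursion, which maximizes the number of nested base pairs. The function name nussinov, the constant MIN_LOOP (a minimum hairpin loop of 3 unpaired bases), and the scoring of one point per pair are assumptions made for illustration only.

```python
from typing import Tuple

# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Assumption for this sketch: a hairpin loop needs at least 3 unpaired bases.
MIN_LOOP = 3


def nussinov(seq: str) -> Tuple[int, str]:
    """Return (max number of nested pairs, one optimal dot-bracket structure).

    Illustrative base-pair maximization only; real tools minimize free energy.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""

    # dp[i][j] = maximum number of pairs achievable within seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - MIN_LOOP):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n-1].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    count, db = nussinov("GGGAAAUCC")
    print(count, db)
```

Energy-based methods keep the same interval-splitting recursion but replace the pair count with loop and stacking free energies, which is why they distinguish structures that this sketch would score equally.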

Applications

RNA structure prediction underpins the design of small interfering RNAs, antisense oligonucleotides, and CRISPR guide RNAs. It enables the annotation of non-coding RNAs and the study of riboswitch mechanisms, in which ligand-induced conformational changes regulate gene expression. Predicted structures are interpreted alongside knowledge of RNA structure and types to classify newly discovered transcripts, and in the context of transcription and RNA processing to understand co-transcriptional folding. The approach also contributes to the study of gene regulation and epigenetics by modeling how structural elements in untranslated regions control translation efficiency.