
Metabolite Identification: From Spectra to Structures

Overview

Metabolite identification is the process of determining the chemical structure of a metabolite from its analytical data, primarily mass spectra and NMR spectra. It is widely regarded as the main bottleneck in untargeted metabolomics: modern instruments can detect thousands of metabolite features in a biological sample, but only a fraction of them can be confidently identified. Identification proceeds through increasing levels of confidence, from a molecular formula assigned by accurate mass, through putative annotation, to definitive structural confirmation. Each level requires different types of evidence and carries a different degree of certainty.

Key Concepts

The Metabolomics Standards Initiative (MSI) defines four confidence levels. Level 1 is an identified compound: it matches an authentic chemical standard, analyzed under identical conditions, in at least two orthogonal properties (for example, retention time and MS/MS spectrum). Level 2 is a putatively annotated compound, matched to a spectral library without a reference standard. Level 3 is a putatively characterized compound class. Level 4 is an unknown compound that is detected but not annotated.

Several types of evidence move a feature up these levels. Accurate mass measurement (typically better than 5 ppm) narrows the candidates to a limited set of molecular formulas. Isotopic pattern analysis, which compares the relative intensities of the M+1 and M+2 peaks with those predicted for each candidate formula, further constrains the elemental composition. Fragmentation spectra (MS/MS or MSn) provide structural information by revealing substructures, characteristic neutral losses, and connectivity. Retention time prediction models add an orthogonal dimension for filtering candidates. Searching spectral databases such as HMDB, METLIN, MassBank, and GNPS is the primary route to Level 2 annotation.
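The MSI hierarchy can be read as a simple decision procedure. The Python sketch below assumes a hypothetical Evidence record; its field names are illustrative and are not part of any MSI schema. It only encodes the ordering of the four levels described above.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence gathered for one metabolite feature."""
    # Orthogonal properties (e.g. retention time, MS/MS spectrum) matched to an
    # authentic standard analyzed under identical conditions.
    standard_matches: int = 0
    # Spectral similarity to a library entry, with no standard measured.
    library_match: bool = False
    # Class-level evidence only, such as a class-diagnostic fragment.
    class_evidence: bool = False


def msi_level(ev: Evidence) -> int:
    """Return the MSI confidence level (1 = highest) supported by the evidence."""
    if ev.standard_matches >= 2:
        return 1
    if ev.library_match:
        return 2
    if ev.class_evidence:
        return 3
    return 4
```

For example, `msi_level(Evidence(library_match=True))` returns 2: a library hit without a standard cannot reach Level 1, however good the spectral match.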
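To make the accurate-mass step concrete, the sketch below computes the signed ppm error and brute-forces CHNO formulas whose protonated ion [M+H]+ falls within a tolerance of an observed m/z. It assumes positive-mode [M+H]+ ions, only the elements C, H, N and O, and element limits chosen for illustration. Real formula generators handle more elements and adducts and apply further heuristic filters. The integer ring-plus-double-bond-equivalent (RDBE) check keeps only even-electron neutral formulas.

```python
from itertools import product

# Monoisotopic masses (Da) of the lightest stable isotopes
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # mass added on protonation, [M+H]+


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def _fmt(symbol: str, count: int) -> str:
    """Element token for a formula string: omit zero counts and the digit 1."""
    return "" if count == 0 else symbol if count == 1 else f"{symbol}{count}"


def candidate_formulas(mz: float, tol_ppm: float = 5.0, max_n: int = 4, max_o: int = 10):
    """CHNO formulas whose [M+H]+ ion lies within tol_ppm of the observed mz."""
    max_c = int((mz - PROTON) // MASS["C"])
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        for h in range(2 * c + n + 3):  # RDBE >= 0 caps H at 2C + N + 2
            ion = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"] + PROTON
            err = ppm_error(mz, ion)
            if abs(err) > tol_ppm:
                continue
            rdbe = c - h / 2 + n / 2 + 1
            if rdbe != int(rdbe):  # half-integer RDBE means an odd-electron formula
                continue
            formula = _fmt("C", c) + _fmt("H", h) + _fmt("N", n) + _fmt("O", o)
            hits.append((formula, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

Calling `candidate_formulas(181.0707)` returns a short list that includes C6H12O6 (glucose, [M+H]+ at about 181.07066) at roughly 0.2 ppm. Several other formulas also pass, which is why accurate mass alone rarely settles the identity.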
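Isotope-pattern filtering can be sketched by convolving per-element isotope distributions to predict the relative M+1 and M+2 intensities of a candidate formula. The abundances below are approximate representative natural abundances. The calculation works at nominal-mass resolution, so it ignores the isotopic fine structure that very high-resolution instruments can resolve. The function names are illustrative.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset
# from the lightest isotope (offset 0 = lightest).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
    "Cl": [0.7576, 0.0, 0.2424],
}


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Discrete convolution: distribution of the summed mass offsets."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isotope_pattern(composition: dict[str, int], n_peaks: int = 3) -> list[float]:
    """Relative intensities of M, M+1, M+2, ... scaled so that M = 100."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            # Truncation is exact: offset k depends only on offsets <= k.
            dist = convolve(dist, ISOTOPES[element])[:n_peaks]
    return [100 * p / dist[0] for p in dist]
```

For glucose, `isotope_pattern({"C": 6, "H": 12, "O": 6})` predicts an M+1 of about 6.9% and an M+2 of about 1.4%, while a single chlorine atom (chlorobenzene, `{"C": 6, "H": 5, "Cl": 1}`) raises M+2 to about 32%. Comparing such predictions with the observed peaks is what lets the isotope pattern discriminate between formulas of nearly equal mass.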
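Neutral losses can be read directly from an MS/MS spectrum as the difference between the precursor m/z and each fragment m/z. The sketch below matches those differences against a small, illustrative table of common losses, with masses computed from monoisotopic element masses. The table, the function name and the 0.005 Da tolerance are assumptions, not a curated reference list.

```python
# Monoisotopic masses (Da) of a few common neutral losses (illustrative, not exhaustive)
NEUTRAL_LOSSES = {
    "H2O": 18.010565,
    "NH3": 17.026549,
    "CO": 27.994915,
    "CO2": 43.989829,
    "HCOOH": 46.005479,
    "hexose (C6H10O5)": 162.052823,
}


def annotate_losses(precursor_mz: float, fragments: list[float], tol: float = 0.005):
    """Label each fragment whose distance from the precursor matches a known neutral loss."""
    annotations = []
    for frag in fragments:
        loss = precursor_mz - frag
        for name, mass in NEUTRAL_LOSSES.items():
            if abs(loss - mass) <= tol:
                annotations.append((frag, name, round(loss, 4)))
    return annotations
```

With a glucose [M+H]+ precursor at 181.0707, a fragment at 163.0601 is labelled as a water loss. A fragment at 145.0495 (two consecutive water losses) stays unlabelled, because only single losses from the precursor are checked in this simple version.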
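Level 2 library matching typically scores a query MS/MS spectrum against each library spectrum with a cosine-type similarity. The sketch below implements a plain cosine with greedy one-to-one peak matching within an m/z tolerance. The tolerance, the data layout and the function names are assumptions. Search tools such as GNPS use refinements, for example a modified cosine that also matches peaks shifted by the precursor mass difference, and they often transform intensities before scoring.

```python
import math

Spectrum = list[tuple[float, float]]  # (m/z, intensity) peaks


def cosine_score(query: Spectrum, reference: Spectrum, tol: float = 0.01) -> float:
    """Cosine similarity after greedy one-to-one peak matching within tol (Da)."""
    # All peak pairs within tolerance, largest intensity product first
    pairs = sorted(
        ((qi * ri, i, j)
         for i, (qmz, qi) in enumerate(query)
         for j, (rmz, ri) in enumerate(reference)
         if abs(qmz - rmz) <= tol),
        reverse=True,
    )
    used_q, used_r, dot = set(), set(), 0.0
    for prod, i, j in pairs:
        if i in used_q or j in used_r:
            continue  # each peak may be matched only once
        used_q.add(i)
        used_r.add(j)
        dot += prod
    norm = math.sqrt(sum(x * x for _, x in query)) * math.sqrt(sum(x * x for _, x in reference))
    return dot / norm if norm else 0.0


def best_match(query: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float]:
    """Return the library entry with the highest cosine score against the query."""
    return max(((name, cosine_score(query, spec)) for name, spec in library.items()),
               key=lambda hit: hit[1])
```

A spectrum scored against itself gives 1.0, and spectra with no peaks in common give 0.0. The greedy matching keeps one intense peak from being counted against several neighbours within the tolerance window.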

Applications

Metabolite identification is essential to every untargeted metabolomics experiment. It enables the discovery of novel metabolites, the characterization of unknown biomarkers, and the assignment of biochemical meaning to features highlighted by statistical analysis. Mass spectrometry provides the sensitivity needed to detect and annotate low-abundance metabolites, while NMR spectroscopy remains the method of choice for de novo structure elucidation. Infrared spectroscopy can contribute complementary functional-group information. Together, these techniques turn raw spectral features into biologically interpretable metabolites.