Overview
Protein identification by mass spectrometry is a cornerstone of proteomics. The standard workflow digests proteins into peptides with a protease such as trypsin, measures the mass-to-charge ratios (m/z) of the intact peptide ions (MS1), and then fragments selected peptides to generate tandem mass spectra (MS/MS) that reveal their amino acid sequences. The observed spectra are matched against theoretical spectra predicted for peptides derived from a protein sequence database, and the matched peptides are then used to infer which proteins were present. This bottom-up, or shotgun, approach is highly scalable and can identify thousands of proteins from a complex mixture in a single experiment.
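The database side of this workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a search engine: the helper names (`tryptic_digest`, `peptide_mass`, `mh_plus`) are hypothetical, trypsin is modeled by the simplified rule "cleave after K or R unless the next residue is P" with no missed cleavages, and masses are monoisotopic values for unmodified residues, reported for singly protonated ions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O, added once per peptide
PROTON = 1.007276   # mass of a proton, added once per positive charge


def tryptic_digest(sequence: str) -> list[str]:
    """Hypothetical helper: split a protein into fully cleaved tryptic peptides.

    Simplified rule (an assumption): cut after K or R unless the next
    residue is P. Missed cleavages and modifications are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mh_plus(peptide: str) -> float:
    """m/z of the singly protonated peptide ion, [M+H]+."""
    return peptide_mass(peptide) + PROTON


if __name__ == "__main__":
    # Hypothetical toy sequence; the K followed by P is not cleaved.
    for pep in tryptic_digest("AKPLLRGEK"):
        print(pep, round(mh_plus(pep), 4))
```

Running it lists `AKPLLR` and `GEK` with their [M+H]+ values: the K at position 2 stays uncleaved because it is followed by a proline. A real pipeline would also allow missed cleavages, fixed and variable modifications, and multiply charged precursors.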
Methods
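Peptide mass fingerprinting (PMF) identifies a protein by comparing the list of experimentally measured peptide masses with the theoretical peptide masses calculated from in silico digests of the sequences in a database; proteins whose predicted peptides match many of the observed masses within a set mass tolerance are scored as likely identifications. PMF works best for purified proteins or simple mixtures, such as single protein bands separated by SDS-PAGE, because in a complex sample the peptide masses from many proteins overlap and become difficult to assign to any one protein.
Tandem MS (MS/MS) identification fragments individual peptides to produce sequence-informative spectra. Search engines compare the observed fragment ion series, primarily b-ions (N-terminal fragments) and y-ions (C-terminal fragments), against the series predicted for candidate peptides from the database.
De novo sequencing infers the peptide sequence directly from the spectrum without a database, which makes it useful when no database match is found. Consecutive ions in a b- or y-ion series differ by the mass of one amino acid residue, so the mass differences between them can be read off as the sequence.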
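The arithmetic behind these methods can be illustrated with a second Python sketch. It assumes idealized data: monoisotopic masses, singly charged ions, no modifications, no noise, and a complete ion ladder. All function names are hypothetical. `pmf_matches` is a deliberately naive PMF score (a count of observed masses within a tolerance of any theoretical mass), not the probability-based scoring that real search engines use; `b_ions` and `y_ions` compute fragment ion series; and `read_ladder` performs the de novo step by matching each gap between consecutive ions to a residue mass. Leucine and isoleucine have identical masses, so the gap lookup reports them as `I/L`.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton (charge carrier for 1+ ions)


def pmf_matches(observed: list[float], theoretical: list[float], tol: float = 0.05) -> int:
    """Naive PMF score: number of observed masses within tol (Da) of any theoretical mass."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1): cumulative N-terminal residue masses plus a proton."""
    ions, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1): cumulative C-terminal residue masses plus water and a proton."""
    ions, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


def read_ladder(ladder: list[float], tol: float = 0.02) -> list[str]:
    """De novo step: map each gap between consecutive ions to residue(s) of matching mass."""
    residues = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        hits = sorted({aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol})
        residues.append("/".join(hits) if hits else "?")  # "?" = no residue fits the gap
    return residues


if __name__ == "__main__":
    print(read_ladder(b_ions("PEPTIDE")))  # ['E', 'P', 'T', 'I/L', 'D']
    print(read_ladder(y_ions("PEPTIDE")))  # ['D', 'I/L', 'T', 'P', 'E']
```

On the ideal ladder for the toy peptide `PEPTIDE`, the b-ions yield the internal residues in N-to-C order and the y-ions yield them in reverse; the terminal residues are not read because only gaps between ions are used. Real spectra have missing ions, noise peaks, and interleaved b/y series, which practical de novo tools must handle with more elaborate search strategies.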
Applications
Protein identification is applied throughout molecular biology and clinical research. It confirms the identity of proteins isolated by extraction and purification, characterizes the protein content of samples in proteomics experiments, and identifies the members of protein complexes that co-purify with a bait protein. In microbiology, mass spectrometry-based proteotyping identifies bacterial species from their protein profiles, complementing genomic approaches.