BMC Bioinformatics. 2007 Jun 23;8:216.
doi: 10.1186/1471-2105-8-216.

ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites

Jan Hummel et al. BMC Bioinformatics.

Abstract

Background: In the last decade, techniques were established for the large-scale, genome-wide analysis of proteins, RNA, and metabolites, and database solutions have been developed to manage the resulting data sets. The Golm Metabolome Database for metabolite data (GMD) represents one such effort to make these data broadly available and to interconnect the different molecular levels of a biological system [1]. As interpreting new data in the light of existing data becomes increasingly important, these initiatives are an essential part of current and future systems biology.

Results: A mass spectral library consisting of experimentally derived tryptic peptide product ion spectra was generated using liquid chromatography coupled to ion trap mass spectrometry (LC-IT-MS). Protein samples derived from Arabidopsis thaliana, Chlamydomonas reinhardtii, Medicago truncatula, and Sinorhizobium meliloti were analysed. With currently 4,557 manually validated spectra associated with 4,226 unique peptides from 1,367 proteins, the database serves as a continuously growing reference data set and can be used for protein identification and quantification in uncharacterized biological samples. For peptide identification, several algorithms were implemented based on a recently published study of peptide mass fingerprinting [2] and tested for false positive and false negative rates. An algorithm that considers the intensity distribution when computing match correlation scores was found to yield the best results. As a proof of concept, an LC-IT-MS analysis of a tryptic leaf protein digest was converted to mzData format and searched against the mass spectral library. The utility of the mass spectral library was also tested for the identification of phosphorylated tryptic peptides. We included in vivo phosphorylation sites of Arabidopsis thaliana proteins, and the identification performance was found to be improved compared to genome-based search algorithms. Protein identification by ProMEX is linked to other levels of biological organization such as metabolite, pathway, and transcript data. The database is further connected to annotation and classification services via BioMoby.
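The library search described above can be illustrated with a minimal sketch of intensity-aware spectral matching. It assumes that spectra are lists of (m/z, intensity) pairs, bins them at a hypothetical width of 1.0 m/z, and scores a query against each library entry with a normalized dot product. The function names, the bin width, and the data layout are assumptions made for illustration; they are not taken from the ProMEX implementation.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def dot_product_score(query, library, bin_width=1.0):
    """Normalized dot product of two spectra; 0.0 if either one is empty.

    For non-negative intensities the score lies between 0 (no shared bins)
    and 1 (identical relative intensity pattern).
    """
    q = bin_spectrum(query, bin_width)
    ref = bin_spectrum(library, bin_width)
    numerator = sum(q[b] * ref[b] for b in set(q) & set(ref))
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in ref.values())
    )
    return numerator / norm if norm > 0 else 0.0


def best_match(query, library_spectra, bin_width=1.0):
    """Return (peptide, score) for the best-scoring entry of a non-empty library.

    library_spectra maps a peptide label to its list of (mz, intensity) peaks.
    """
    scored = (
        (peptide, dot_product_score(query, peaks, bin_width))
        for peptide, peaks in library_spectra.items()
    )
    return max(scored, key=lambda item: item[1])
```

Because the score is normalized, it depends on the relative intensity pattern rather than on absolute signal, which is one plausible way to let peak intensities contribute to the match.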

Conclusion: The ProMEX protein/peptide database represents a mass spectral reference library with the capability of matching unknown samples for protein identification. The database allows text searches based on metadata such as experimental information about the samples, mass spectrometric instrument parameters, or unique protein identifiers such as AGI codes. ProMEX integrates proteomics data with other levels of molecular organization, including metabolite, pathway, and transcript information, and may thus become a useful resource for plant systems biology studies. The ProMEX mass spectral library is available at http://promex.mpimp-golm.mpg.de/.

Figures

Figure 1
Peak intensity distribution from product ion spectra. 70% of all fragment peaks have a relative intensity less than or equal to 0.02. The peak on the right-hand side originates from the prior scaling of all spectra to a maximum peak intensity of 1.
Figure 2
Assessment of the effect of noise removal on the identification performance, illustrated by the receiver operating characteristic (ROC). The identification rate is robust against different levels of peak noise filters up to 10%. We decided to use a conservative noise removal level of 2%. Recall (true positives) reflects the number of correctly identified peptides based on the SEQUEST search, whereas 1 − specificity corresponds to all ProMEX peptide identifications that did not match the SEQUEST identifications. For comparison, results obtained with the Euclidean distance are also included; these proved inferior to the dot product.
Figure 3
Screenshot of the ProMEX user interface. Input can be inserted into the textbox or uploaded as a file using the file selection dialog.
Figure 4
Screenshot of the result page showing four identified peptides associated with the candidate hit protein At1g13440 which was tagged in the sample by 8 submitted fragment spectra. For visual inspection of the spectra match, a north-south-plot of the library versus query spectra is shown below.
Figure 5
Comparison of peptide identification rate as a function of match score. True positive identifications are all matches found by both SEQUEST and ProMEX. False positives are defined as the remaining peptide identifications by ProMEX that did not match the SEQUEST hits. The default threshold value for ProMEX was set to 0.5 according to this diagram.
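The preprocessing and evaluation steps mentioned in the figure captions (scaling each spectrum to a maximum intensity of 1, removing peaks below a 2% noise level, and applying a match score threshold of 0.5) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and data layout are hypothetical, and treating peaks strictly below the cutoff as noise is an assumption, since the captions do not say how a peak exactly at the cutoff is handled.

```python
def scale_to_unit_max(peaks):
    """Scale (mz, intensity) peaks so that the most intense peak equals 1."""
    max_intensity = max((intensity for _, intensity in peaks), default=0.0)
    if max_intensity <= 0:
        return []
    return [(mz, intensity / max_intensity) for mz, intensity in peaks]


def remove_noise(peaks, min_relative_intensity=0.02):
    """Scale a spectrum, then drop peaks below the relative intensity cutoff.

    Assumption: peaks exactly at the cutoff are kept.
    """
    return [
        (mz, intensity)
        for mz, intensity in scale_to_unit_max(peaks)
        if intensity >= min_relative_intensity
    ]


def count_hits(matches, reference_ids, threshold=0.5):
    """Count true and false positives among matches at or above the threshold.

    matches maps a spectrum id to (peptide, score) from the library search.
    reference_ids maps a spectrum id to the peptide reported by the reference
    search (SEQUEST in the captions above).
    """
    true_positives = false_positives = 0
    for spectrum_id, (peptide, score) in matches.items():
        if score < threshold:
            continue
        if reference_ids.get(spectrum_id) == peptide:
            true_positives += 1
        else:
            false_positives += 1
    return true_positives, false_positives
```

Repeating `count_hits` over a range of thresholds would give the kind of curve from which a default cutoff can be read.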

References

    1. Kopka J, Schauer N, Krueger S, Birkemeyer C, Usadel B, Bergmuller E, Dormann P, Weckwerth W, Gibon Y, Stitt M, Willmitzer L, Fernie AR, Steinhauser D. GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics. 2005;21:1635–1638. doi: 10.1093/bioinformatics/bti236.
    2. Wolski W, Lalowski M, Martus P, Herwig R, Giavalisco P, Gobom J, Sickmann A, Lehrach H, Reinert K. Transformation and other factors of the peptide mass spectrometry pairwise peak-list comparison process. BMC Bioinformatics. 2005;6:285. doi: 10.1186/1471-2105-6-285.
    3. Washburn MP, Wolters D, Yates JR. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol. 2001;19:242–247. doi: 10.1038/85686.
    4. Wienkoop S, Larrainzar E, Niemann M, Gonzalez E, Lehmann U, Weckwerth W. Stable isotope-free quantitative shotgun proteomics combined with sample pattern recognition for rapid diagnostics - a case study in Medicago truncatula nodules. Journal of Separation Science. 2006;29:2793–2801. doi: 10.1002/jssc.200600290.
    5. Wienkoop S, Glinski M, Tanaka N, Tolstikov V, Fiehn O, Weckwerth W. Linking protein fractionation with multidimensional monolithic RP peptide chromatography/mass spectrometry enhances protein identification from complex mixtures even in the presence of abundant proteins. Rapid Communications in Mass Spectrometry. 2004;18:643–650. doi: 10.1002/rcm.1376.