Proteomes. 2013 May 31;1(1):3-24.
doi: 10.3390/proteomes1010003.

Protein-Protein Interactions: Gene Acronym Redundancies and Current Limitations Precluding Automated Data Integration

Juan Casado-Vela et al. Proteomes.

Abstract

Understanding protein interaction networks and their dynamic changes is a major challenge in modern biology. Several experimental and in silico approaches currently allow protein interactors to be screened on a large scale. As a result, the volume of protein interaction information deposited in databases and in the peer-reviewed literature is constantly growing. Multiple databases, accessible through user-friendly web tools, have recently emerged to facilitate protein interaction data retrieval and integration. Nevertheless, as we show in this report, despite current efforts towards data integration, the protein interaction information retrieved by in silico approaches is frequently incomplete and may even list false interactions. Here we point to some obstacles that preclude confident data integration, with special emphasis on protein interactions; these include gene acronym redundancies and protein synonyms. Three human proteins (choline kinase, PPIase and uromodulin) and three web-based search engines focused on protein interaction data retrieval (PSICQUIC, DASMI and BIPS) were used to illustrate undesired errors that researchers in the field should consider. We demonstrate that, despite recent initiatives towards data standardization, manual curation of protein interaction networks based on literature searches is still required to remove potential false positives. A three-step workflow consisting of (i) data retrieval from multiple databases, (ii) peer-reviewed literature searches, and (iii) data curation and integration is proposed as the best strategy to gather up-to-date information on protein interactions. Finally, this strategy was applied to compile bona fide information on the human DREAM protein interactome, which constitutes a reliable training dataset that can be used to improve computational predictions.

Keywords: DREAM; HGNC; HUGO; KChIP3; bioinformatics; calsenilin; choline kinase; data integration; gene acronym; gene redundancy; human interactome; protein accession; protein interactions; protein-protein prediction; uromodulin.
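Parts of the three-step workflow described in the abstract can be illustrated in code. The snippet below is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each database returns results in PSI-MI TAB 2.5 (MITAB) format, the tab-separated format served by PSICQUIC services, keys every interaction by its UniProtKB accession pair rather than by gene acronym, and flags pairs lacking any PubMed identifier for the manual literature search. The helper names (`parse_mitab_line`, `merge_interactions`, `needs_curation`, `make_row`), the placeholder identifiers `X00001`/`X00002`, the PubMed placeholder and the database labels `db1`/`db2` are all hypothetical.

```python
from collections import defaultdict

def parse_mitab_line(line):
    """Split one PSI-MI TAB 2.5 row into (accession A, accession B, source db, PubMed IDs)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError("expected at least 15 tab-separated MITAB columns")

    def uniprot(field):
        # Identifier fields look like 'uniprotkb:P35790'; several may be joined with '|'.
        for item in field.split("|"):
            if item.startswith("uniprotkb:"):
                return item.split(":", 1)[1]
        return None  # no UniProtKB accession: cannot be keyed safely

    pubs = {p for p in cols[8].split("|") if p.startswith("pubmed:")}
    return uniprot(cols[0]), uniprot(cols[1]), cols[12], pubs

def merge_interactions(sources):
    """Pool rows from several sources, keyed by accession pair instead of gene symbol."""
    merged = defaultdict(lambda: {"sources": set(), "pubs": set()})
    skipped = 0
    for lines in sources:
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # blank or header line
            a, b, db, pubs = parse_mitab_line(line)
            if a is None or b is None:
                skipped += 1  # left for manual identifier mapping
                continue
            key = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
            merged[key]["sources"].add(db)
            merged[key]["pubs"] |= pubs
    return dict(merged), skipped

def needs_curation(merged):
    """Pairs with no PubMed support: candidates for manual literature review."""
    return sorted(pair for pair, info in merged.items() if not info["pubs"])

def make_row(a, b, db, pub="-"):
    """Build a minimal 15-column MITAB row; '-' marks empty columns (illustration only)."""
    cols = ["-"] * 15
    cols[0], cols[1] = f"uniprotkb:{a}", f"uniprotkb:{b}"
    cols[8], cols[12] = pub, db
    return "\t".join(cols)

# Hypothetical rows: only P35790 (CHKA) is a real accession.
source_1 = [make_row("P35790", "X00001", "db1", "pubmed:0000001")]
source_2 = [make_row("X00001", "P35790", "db2"), make_row("P35790", "X00002", "db2")]
merged, skipped = merge_interactions([source_1, source_2])
print(needs_curation(merged))  # [('P35790', 'X00002')]
```

Keying by accession avoids the acronym ambiguity, but the literature search and the final curation decision remain manual steps; the code only narrows down which pairs need that attention.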

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Protein interaction network of human choline kinase (CHKA) obtained with two versions of STRING [34], v9.0 and v9.05. STRING may be accessed through its web interface [35] or by selecting the corresponding option in PSICQUIC View [36]. Searches were triggered using the Swiss-Prot accession number P35790 [32], which uniquely identifies CHKA. The query protein (CHKA, depicted as a red sphere) appears connected to surrounding candidate interacting proteins. Left panel: searches using STRING v9.0 retrieved false positive nodes A (RCC1, regulator of chromosome condensation) and B (casein kinase proteins, CSNKs: 1G2, 1D, 1A1, 1E, 1A1L and 1G1). Right panel: a more recent version of the software (STRING v9.05) removed these false positives and improved the quality of the CHKA interactions. STRING also indicates the source of each mapped interaction with colored lines (databases, text mining and experimental evidence). The default score filtering criteria were used in all cases.
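Spotting nodes that appear or disappear between database releases, as in the two panels of Figure 1, reduces to a set comparison of the interactors returned for the same query. A minimal sketch follows; the function name `compare_versions` is hypothetical, the node lists are assumed to have been exported by hand from each STRING version, and only RCC1, CSNK1G2 and CSNK1D are taken from the caption, while `SHARED_1` is a placeholder for interactors common to both versions.

```python
def compare_versions(old_nodes, new_nodes, query):
    """Compare the interactor sets returned for the same query by two database versions."""
    old = set(old_nodes) - {query}  # ignore the query node itself
    new = set(new_nodes) - {query}
    return {
        "removed": sorted(old - new),  # present only in the older version
        "added": sorted(new - old),    # present only in the newer version
        "kept": sorted(old & new),     # stable across versions
    }

# Illustrative node lists; SHARED_1 is a hypothetical placeholder.
v9_0 = ["CHKA", "RCC1", "CSNK1G2", "CSNK1D", "SHARED_1"]
v9_05 = ["CHKA", "SHARED_1"]
diff = compare_versions(v9_0, v9_05, query="CHKA")
print(diff["removed"])  # ['CSNK1D', 'CSNK1G2', 'RCC1']
```

A node listed under `removed` is only a candidate for review: deciding whether it was a false positive still requires checking the peer-reviewed literature.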
Figure 2
Bar plot showing the redundancy of human gene acronyms and their synonyms.
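Counts of the kind summarized in Figure 2 can be derived from a table of approved gene symbols and their synonyms, such as one downloaded from HGNC. The sketch below assumes such a table has already been loaded as (approved symbol, synonyms) pairs; the function names (`acronym_index`, `redundancy_histogram`, `ambiguous`) and the records are hypothetical, not real HGNC data, and case-insensitive matching is an assumption.

```python
from collections import Counter, defaultdict

def acronym_index(records):
    """Map each approved symbol or synonym (upper-cased) to the set of approved genes using it."""
    index = defaultdict(set)
    for approved, synonyms in records:
        for name in [approved, *synonyms]:
            index[name.upper()].add(approved)
    return index

def redundancy_histogram(index):
    """Count how many acronyms map to 1, 2, 3, ... distinct genes (data for a bar plot)."""
    return Counter(len(genes) for genes in index.values())

def ambiguous(index):
    """Acronyms shared by more than one gene; querying by these names risks false interactions."""
    return {name: sorted(genes) for name, genes in index.items() if len(genes) > 1}

# Hypothetical (approved symbol, synonyms) records.
records = [
    ("GENEA", ["ABC1", "XYZ"]),
    ("GENEB", ["XYZ"]),
    ("GENEC", ["ABC1", "XYZ"]),
]
index = acronym_index(records)
print(redundancy_histogram(index))  # Counter({1: 3, 2: 1, 3: 1})
print(ambiguous(index))
```

Any acronym returned by `ambiguous` is one that a symbol-based database query could resolve to the wrong protein, which is why accession numbers are the safer search key.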

References

    1. Schuler G.D., Boguski M.S., Stewart E.A., Stein L.D., Gyapay G., Rice K., White R.E., Rodriguez-Tome P., Aggarwal A., Bajorek E., et al. A gene map of the human genome. Science. 1996;274:540–546. doi: 10.1126/science.274.5287.540.
    2. Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
    3. Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001;291:1304–1351. doi: 10.1126/science.1058040.
    4. Gray K.A., Daugherty L.C., Gordon S.M., Seal R.L., Wright M.W., Bruford E.A. Genenames.org: the HGNC resources in 2013. Nucleic Acids Res. 2013;41:D545–D552.
    5. Ramani A.K., Bunescu R.C., Mooney R.J., Marcotte E.M. Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol. 2005;6:R40. doi: 10.1186/gb-2005-6-5-r40.

LinkOut - more resources