Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices
- PMID: 17212819
- PMCID: PMC1781468
- DOI: 10.1186/1471-2105-8-6
Abstract
Background: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes.
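As an illustration of the correlation-based approach mentioned above, the following minimal sketch scores a protein pair by the Pearson correlation between the upper triangles of their two distance matrices. It assumes both matrices are indexed by the same reference genomes in the same order. The function names and the toy matrices are hypothetical and do not come from the paper.

```python
import math

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coevolution_score(dist_a, dist_b):
    """Correlate two distance matrices built over the same genomes."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Toy data (assumption): ortholog distances for proteins A and B
# across the same three reference genomes; dist_b is dist_a scaled by 2.
dist_a = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 3.0],
          [2.0, 3.0, 0.0]]
dist_b = [[0.0, 2.0, 4.0],
          [2.0, 0.0, 6.0],
          [4.0, 6.0, 0.0]]

score = coevolution_score(dist_a, dist_b)
```

A score near 1 suggests that the two proteins have evolved at proportional rates across the reference genomes, which this family of methods treats as evidence of interaction.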
Results: We propose a novel, simple method that accounts for some of the intra-matrix correlations in order to improve prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins with known interaction status, each pair represented as a super phylogenetic vector. Measured by ROC score in cross-validation experiments, our method (ROC score 0.8446) shows a significant improvement over the use of Pearson correlations (0.6587).
Conclusion: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, and particularly so in the latter case, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions.
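The construction of intermediate distance matrices described in the Results can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the guide tree is given as a bottom-up list of cluster pairs to merge, distances to each merged cluster follow the standard Neighbor Joining update d(u,k) = (d(a,k) + d(b,k) - d(a,b)) / 2, and the "transformation" step, which the abstract does not specify, is reduced to flattening each matrix's upper triangle. All function names and the toy data are hypothetical.

```python
def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def nj_merge(dist, labels, a, b):
    """Merge clusters a and b into one node using the NJ distance update."""
    i, j = labels.index(a), labels.index(b)
    keep = [k for k in range(len(labels)) if k not in (i, j)]
    new_labels = [labels[k] for k in keep] + [f"({a},{b})"]
    # Distances among the surviving clusters are unchanged.
    new_dist = [[dist[r][c] for c in keep] for r in keep]
    # Distance from the merged node to each surviving cluster.
    to_new = [(dist[i][k] + dist[j][k] - dist[i][j]) / 2 for k in keep]
    for row, d in zip(new_dist, to_new):
        row.append(d)
    new_dist.append(to_new + [0.0])
    return new_dist, new_labels

def phylogenetic_vector(dist, labels, merge_order):
    """Concatenate the original and all intermediate distance matrices."""
    vector = upper_triangle(dist)
    for a, b in merge_order:
        dist, labels = nj_merge(dist, labels, a, b)
        vector.extend(upper_triangle(dist))
    return vector

# Toy data (assumption): four reference genomes with species tree
# ((A,B),(C,D)); merges are listed bottom-up and stop before the root.
species = ["A", "B", "C", "D"]
merge_order = [("A", "B"), ("C", "D")]
dist_x = [[0.0, 2.0, 6.0, 6.0],
          [2.0, 0.0, 6.0, 6.0],
          [6.0, 6.0, 0.0, 2.0],
          [6.0, 6.0, 2.0, 0.0]]

vector_x = phylogenetic_vector(dist_x, species, merge_order)
# For a protein pair (X, Y), one plausible feature layout (assumption) is
# the concatenation of both proteins' vectors; X is reused for Y here.
super_vector = vector_x + phylogenetic_vector(dist_x, species, merge_order)
```

In this sketch the 4x4 matrix contributes 6 distances, the 3x3 intermediate matrix contributes 3, and the 2x2 matrix contributes 1, so each protein yields a 10-element vector. Vectors of this kind, labeled by known interaction status, would then serve as input to a support vector machine.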