Protein J. 2008 Jan;27(1):59-70.
doi: 10.1007/s10930-007-9108-x.

Characterization of protein-protein interfaces

Changhui Yan et al. Protein J. 2008 Jan.

Abstract

We analyze the characteristics of protein-protein interfaces using the largest datasets available from the Protein Data Bank (PDB). We start with a comparison of interfaces with protein cores and non-interface surfaces. The results show that interfaces differ from protein cores and non-interface surfaces in residue composition, sequence entropy, and secondary structure. Since interfaces, protein cores, and non-interface surfaces have different solvent accessibilities, it is important to investigate whether the observed differences are due to the differences in solvent accessibility or differences in functionality. We separate out the effect of solvent accessibility by comparing interfaces with a set of residues having the same solvent accessibility as the interfaces. This strategy reveals residue distribution propensities that are not observable by comparing interfaces with protein cores and non-interface surfaces. Our conclusions are that there are larger numbers of hydrophobic residues, particularly aromatic residues, in interfaces, and the interactions apparently favored in interfaces include opposite-charge pairs and hydrophobic pairs. Surprisingly, Pro-Trp pairs are overrepresented in interfaces, presumably because of favorable geometries. The analysis is repeated using three datasets having different constraints on sequence similarity and structure quality. Consistent results are obtained across these datasets. We have also investigated separately the characteristics of heteromeric interfaces and homomeric interfaces.
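The accessibility-matched comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the sampling procedure or the exact propensity formula, so everything below is an assumption made for illustration: the function names, the 0.1-wide bins of relative solvent accessibility (rASA), sampling with replacement, and the use of a log2 frequency ratio as the propensity.

```python
import math
import random
from collections import Counter, defaultdict


def sample_matched_set(interface, pool, bin_width=0.1, seed=0):
    """Draw one pool residue per interface residue from the same rASA bin.

    Each residue is a (residue_type, rASA) tuple with rASA in [0, 1].
    Binning and sampling with replacement are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for res, rasa in pool:
        by_bin[int(rasa / bin_width)].append((res, rasa))
    matched = []
    for _, rasa in interface:
        candidates = by_bin.get(int(rasa / bin_width))
        if candidates:  # skip interface residues with no match in the pool
            matched.append(rng.choice(candidates))
    return matched


def log_propensity(interface, reference):
    """log2 of (frequency in interface) / (frequency in reference set).

    Hypothetical definition; residue types absent from the reference
    set are omitted to avoid division by zero.
    """
    f_int = Counter(res for res, _ in interface)
    f_ref = Counter(res for res, _ in reference)
    n_int = sum(f_int.values())
    n_ref = sum(f_ref.values())
    return {
        res: math.log2((f_int[res] / n_int) / (f_ref[res] / n_ref))
        for res in f_int
        if f_ref[res] > 0
    }
```

Under these assumptions, a positive value means a residue type is more frequent in interfaces than among other residues with comparable solvent exposure, which is the kind of comparison the abstract uses to separate accessibility effects from functional ones.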

Figures

Fig. 1
Residue composition and residue propensities for different locations. (A) Residue compositions of protein cores, interfaces, and non-interface surfaces. (B) Residue propensities for protein cores, interfaces, and non-interface surfaces. Residues are ordered by their increasing hydrophobicity based on the Kyte and Doolittle hydropathy index [25]. The results are shown for Dataset100. The figures show that hydrophobic residues are more frequent in protein cores and less common on non-interface surfaces. The opposite trend is observed for hydrophilic residues. Residue propensities for interfaces are intermediate between those for protein cores and non-interface surfaces, with His, Tyr, and Gly being notable exceptions
Fig. 2
Sequence entropies in protein cores, interfaces, and non-interface surfaces. Sequence entropy values for residues are extracted from the HSSP database (http://www.cmbi.kun.nl/gv/hssp/). The sequence entropy shows the conservation at each residue position in a multiple alignment. The values have been normalized over the range of 0–100, with the lowest sequence entropy values corresponding to the most conserved positions. The results are for Dataset100. The figure shows that among the three groups, protein cores have the highest fraction of highly conserved residues (lower entropy values), non-interface surfaces have the smallest fraction, and interfaces are intermediate.
Fig. 3
Secondary structure compositions of protein cores, interfaces, and non-interface surfaces. Secondary structures of proteins are defined using the DSSP program [27]: 3₁₀-helix (G), alpha helix (H), pi helix (I), helix-turn (T), extended beta sheet (E), beta bridge (B), bend (S), and other/loop (_). Each protein is divided into interface, protein core, and non-interface surface based on solvent accessibility and whether a residue is in the interface, as described in Sect. 2. The results are obtained using Dataset100.
Fig. 4
Residue contact preferences for interfaces. (A) Raw contact frequencies given by $C_{ij} / \sum_{m,n} C_{mn}$, where $C_{ij}$ is the number of contacts between residue types i and j. (B) Contact preferences given by $\log_2\left( \left( C_{ij} / \sum_{m,n} C_{mn} \right) / (w_i \times w_j) \right)$, where $w_i$ is the frequency of residue type i in the interfaces. The results are given for Dataset100. Residues are placed in order of increasing hydrophobicity based on the Kyte and Doolittle hydropathy index [25]. Panel B shows that Cys–Cys contacts, contacts between residues with opposite charges, contacts between different aromatic residues, and contacts between hydrophobic residues are preferred in interfaces. These contacts are shown in red in panel B. Comparison between A and B shows that normalizing raw contact frequencies by the frequencies of individual residue types makes the preferences for these contacts stand out more clearly.
Fig. 5
Interface size distribution. Interface size is calculated separately for each side of an interface. The results are obtained for Dataset100. The distribution has a peak at 600–800 Å². About 25% of the interfaces have a (one-sided) size in the range of 800 (±200) Å².
Fig. 6
Comparison of residue compositions of the SetrASA and of interfaces. Five SetrASAs are extracted from Dataset100. Mean values for the SetrASAs are displayed. The standard deviations are below 0.05 (they are shown as bars in the figure but are too small to be visible). The residue types are placed in order of increasing hydrophobicity.
Fig. 7
Normalized interface propensities (NIP) of residues. The propensities are calculated by comparing interfaces with the sets (SetrASA) of residues that have the same relative solvent accessibility as the interfaces. Five SetrASAs were extracted, and mean values are displayed. The standard deviations are below 0.02 (They are shown as bars in the figure, but most of them are too small to be visible). The results show the clear trend that hydrophobic residues are preferred in interfaces and hydrophilic residues are not. Aromatic residues also have high NIP. The results are obtained using Dataset100
Fig. 8
Comparison of normalized interface propensities (NIP) and raw interface propensities (RIP). NIP are calculated by comparing interfaces with the set of residues (SetrASA) that has the same relative solvent accessibility as the interfaces. Five SetrASAs are extracted, and their mean values are displayed. The standard deviations are below 0.02 (they are shown as bars in the figure but can barely be seen). RIP are calculated by comparing interfaces with all residues. While NIP reveals the trend that hydrophobic residues are preferred in interfaces and hydrophilic residues are disfavored, this trend is not seen in the RIP. Many residues have opposite signs in RIP and NIP. The results are obtained for Dataset100.
Fig. 9
Comparison of the entropies of interfaces with the SetrASA. Sequence entropy values for residues are extracted from the HSSP database (http://www.cmbi.kun.nl/gv/hssp/). The sequence entropy shows the conservation at each residue position in a multiple alignment. The values are normalized over the range of 0–100, with the lowest sequence entropy values corresponding to the most conserved positions. Five SetrASAs are extracted, and the mean values are displayed. The standard deviations are below 0.05 (They are shown as bars in the figure but too small to be visible). The results are shown for Dataset100
Fig. 10
Secondary structure composition of interfaces and the SetrASA. Five SetrASAs are extracted. Mean values for the SetrASAs are displayed. The standard deviations are less than 0.01 (they are shown as bars in the figure but are too small to be visible). The results are obtained using Dataset100.
Fig. 11
The results obtained for three different datasets are consistent. (A–C) Residue composition. (D–F) Sequence entropy distribution. (G–I) Secondary structure composition. (J–L) Interface sizes. (M–O) Raw contact frequencies given by $C_{ij} / \sum_{m,n} C_{mn}$, where $C_{ij}$ is the number of contacts between residue types i and j. (P–R) Contact preferences given by $\log_2\left( \left( C_{ij} / \sum_{m,n} C_{mn} \right) / (w_i \times w_j) \right)$, where $w_i$ is the frequency of residue type i in the interfaces. A, D, G, J, M, and P are the results on Dataset100, which consists of 6,545 interfaces. B, E, H, K, N, and Q are the results on Dataset30, which consists of 2,557 interfaces. The mutual similarities among the interfaces are below 30%. C, F, I, L, O, and R are the results for Dataset30_3, which consists of 2,310 interfaces from structures having resolution better than 3.0 Å. The mutual similarities among the interfaces are below 30%.
Fig. 12
Comparisons between homomeric interfaces and heteromeric interfaces. (A) Normalized interface propensities. (B) Sequence entropies. (C) Secondary structures. (D) Interface sizes. (E–F) Raw contact frequencies given by $C_{ij} / \sum_{m,n} C_{mn}$, where $C_{ij}$ is the number of contacts between residue types i and j. (G–H) Contact preferences given by $\log_2\left( \left( C_{ij} / \sum_{m,n} C_{mn} \right) / (w_i \times w_j) \right)$. The results are obtained from Dataset100. Heteromeric interfaces and homomeric interfaces have been extracted from Dataset100 based on the sequence similarities between the interacting protein chains. An interface is a homomeric interface if the two interacting chains have a sequence identity greater than 95%. Otherwise, it is considered a heteromeric interface.
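
The contact preference used in the Fig. 4, 11, and 12 captions (the log2 of the raw contact frequency divided by the product of the two residue-type frequencies) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name and input layout are hypothetical, each contact is counted once as an unordered pair, and the caption formula is applied literally without any symmetry correction for pairs of different residue types.

```python
import math
from collections import Counter


def contact_preferences(contacts, residue_freq):
    """Apply log2((C_ij / sum_mn C_mn) / (w_i * w_j)) to observed contacts.

    contacts: iterable of (residue_type, residue_type) pairs, one per contact.
    residue_freq: dict mapping residue type to its frequency w in interfaces.
    Pairs are treated as unordered (an assumption of this sketch).
    """
    counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(counts.values())
    prefs = {}
    for (a, b), c in counts.items():
        raw_frequency = c / total  # C_ij / sum over all pairs of C_mn
        prefs[(a, b)] = math.log2(raw_frequency / (residue_freq[a] * residue_freq[b]))
    return prefs


# Toy input, invented purely for illustration.
toy_contacts = [("LYS", "ASP"), ("ASP", "LYS"), ("ASP", "LYS"), ("LEU", "LEU")]
toy_freq = {"ASP": 0.25, "LYS": 0.25, "LEU": 0.5}
toy_prefs = contact_preferences(toy_contacts, toy_freq)
```

Dividing by the product of the individual residue frequencies is what lets a pairing between rare residue types stand out even when its raw contact count is small, which matches the contrast the Fig. 4 caption draws between panels A and B.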

References

    1. Chothia C, Janin J. Nature. 1975;256:705–708.
    2. Wodak SJ, Janin J. Adv Protein Chem. 2002;61:9–73.
    3. Deremble C, Lavery R. Curr Opin Struct Biol. 2005;15:171–175.
    4. Ponstingl H, Kabir T, Gorse D, Thornton JM. Prog Biophys Mol Biol. 2005;89:9–35.
    5. Reichmann D, Rahat O, Cohen M, Neuvirth H, Schreiber G. Curr Opin Struct Biol. 2007;17:67–76.
