PropertyValue
is nif:broaderContext of
nif:broaderContext
is schema:hasPart of
schema:isPartOf
nif:isString
  • Dataset. We compiled a dataset of 330 protein domains involving 370 domain–domain interactions from the database provided by Itzhaki et al. [4] This database was obtained by mapping structurally derived domain–domain interactions onto the cellular protein–protein interaction network of five different organisms (Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens). Our initial dataset contained all single-chain domains with representative structures of domain–domain interactions in the iPfam database [41]. Using multiple sequence alignments provided by iPfam, we mapped all the binding sites of each domain onto its representative structure (Table S1). We selected only binding sites containing at least 80% of their residues within the representative domain structures. All structure images were created using DS ViewerPro 6.0 [42]. Network analysis of domain structures. Residues i and j were considered to be in contact if at least one atom corresponding to residue i was at a distance of less than or equal to 5 Å from an atom from residue j. This value approximates the upper limit for attractive London–van der Waals forces [43]. We modeled the PDB structures of the representative domains as graphs, with residues corresponding to vertices, and their contacts to edges. These networks were subsequently decomposed into modules using the edge-betweenness clustering algorithm proposed by Girvan and Newman [21,28], based on the iterative removal of edges with the highest number of paths running through it (see also Figure S3). We used the parallel implementation PEBC (parallel edge-betweenness clustering) [44] of the algorithm. We used the previously introduced expression for the modularity of each module m [20]: where L is the number of edges in the network, lm is the number of edges between nodes in module m, and dm is the sum of the degrees of nodes in module m. Modules with higher Qm contain many within-module edges, whereas random partitions of the network have an expected value of Qm = 0. Binding site analysis. Binding site clustering. In order to detect whether a domain is interacting with different domains using non-overlapping binding sites, we clustered the list of binding sites corresponding to each domain in the dataset. First, we defined a distance matrix for all pairs of binding sites as: where ni and nj are the number of residues in binding sites i and j that have contacts with the other binding sites j and i, respectively. Ni and Nj are the total number of residues belonging to each binding site. Two binding sites i and j were considered as non-overlapping if C(i, j) < 0.7. Our clustering protocol was based on the hierarchical agglomerative clustering algorithm (see also Figure S4), defined as follows: (1) find the closest pair of binding sites in the distance matrix; (2) merge these two binding sites into a new single binding site if the distance between them is C(i, j) < 0.7; and (3) compute the distance matrix for the new reduced list of binding sites. The clustering process terminates when the distances between all pairs of binding sites are above the threshold, obtaining a set of mutually non-overlapping binding sites in the domain. Relative interface between binding sites. We defined the relative interface between two binding sites as in Equation 2. This parameter represents the averaged proportion of binding site contacting residues, and is a measure of closeness between these binding sites. C(i, j) varies from 0 to 1. Values close to 0 imply a small relative interface, indicating a clear structural separation between both binding sites, whereas values close to 1 appear when almost all residues in both binding sites are on the interface, illustrating their proximity. Similarity of binding site modular compositions. We defined for each binding site j a vector mj representing its modular composition as follows: where mjk is the number of residues of binding site j in module k; and M is the total number of modules in which the domain has been decomposed. The modular composition similarity between two binding sites i and j is defined as the uncentered Pearson correlation coefficient between their respective vectors of modular composition: where , and are the Euclidean norms of vector i and j, respectively. M(i, j) varies from 0 to 1. Values close to 0 show significant differences in the modular compositions of each binding site, whereas values close to 1 correspond to binding sites with almost identical modular compositions. Evaluation of performance. Random generation of binding sites. To test the statistical significance of our studies, we generated a list of random binding sites for each domain, keeping the same number and size of the original binding sites. The random binding sites were generated in the following way: (1) we randomly selected one of the residues in each binding site as the seed residue for the new binding site; and (2) we iteratively added more random neighbors to the new binding site until the number of residues on it equaled the size of the original binding site. In the case of domains with more than one binding site, we checked that all pairs of binding sites in the corresponding list verified C(i, j) < 0.7; otherwise, the random generation of binding sites for this domain was repeated until such condition was reached. We generated 500 random realizations for each binding site of each domain of our dataset. Accuracy and coverage. The accuracy and coverage for the prediction methods were defined as: where TP, FP, and FN are the number of true positives, false positives, and false negatives, respectively. Conservation analysis. Residue conservation scores were determined for each representative domain structure from the ConSurf-HSSP database [45]. A residue was considered as conserved if its score was greater or equal to 9. Patch analysis. Predictions of surface patches for the representative domain structures were determined from the SHARP2 server [46]. We considered the best three predicted overlapping patches.
rdf:type