PMC2134966 | SoMeSci

Property	Value
is nif:broaderContext of	sms:PMC2134966/sentence0 sms:PMC2134966/sentence1 sms:PMC2134966/sentence10 sms:PMC2134966/sentence11 sms:PMC2134966/sentence12 sms:PMC2134966/sentence13 sms:PMC2134966/sentence14 sms:PMC2134966/sentence15 sms:PMC2134966/sentence16 sms:PMC2134966/sentence17 sms:PMC2134966/sentence18 sms:PMC2134966/sentence19 sms:PMC2134966/sentence2 sms:PMC2134966/sentence20 sms:PMC2134966/sentence21 sms:PMC2134966/sentence22 sms:PMC2134966/sentence23 sms:PMC2134966/sentence24 sms:PMC2134966/sentence25 sms:PMC2134966/sentence26 sms:PMC2134966/sentence27 sms:PMC2134966/sentence28 sms:PMC2134966/sentence29 sms:PMC2134966/sentence3 sms:PMC2134966/sentence30 sms:PMC2134966/sentence31 sms:PMC2134966/sentence32 sms:PMC2134966/sentence33 sms:PMC2134966/sentence34 sms:PMC2134966/sentence35 sms:PMC2134966/sentence36 sms:PMC2134966/sentence37 sms:PMC2134966/sentence38 sms:PMC2134966/sentence39 sms:PMC2134966/sentence4 sms:PMC2134966/sentence40 sms:PMC2134966/sentence41 sms:PMC2134966/sentence42 sms:PMC2134966/sentence43 sms:PMC2134966/sentence44 sms:PMC2134966/sentence45 sms:PMC2134966/sentence5 sms:PMC2134966/sentence6 sms:PMC2134966/sentence7 sms:PMC2134966/sentence8 sms:PMC2134966/sentence9
nif:broaderContext	<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2134966>
is schema:hasPart of	sms:PLoS_Methods
schema:isPartOf	sms:PLoS_Methods
nif:isString	Dataset. We compiled a dataset of 330 protein domains involving 370 domain–domain interactions from the database provided by Itzhaki et al. [4] This database was obtained by mapping structurally derived domain–domain interactions onto the cellular protein–protein interaction network of five different organisms (Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens). Our initial dataset contained all single-chain domains with representative structures of domain–domain interactions in the iPfam database [41]. Using multiple sequence alignments provided by iPfam, we mapped all the binding sites of each domain onto its representative structure (Table S1). We selected only binding sites containing at least 80% of their residues within the representative domain structures. All structure images were created using DS ViewerPro 6.0 [42]. Network analysis of domain structures. Residues i and j were considered to be in contact if at least one atom corresponding to residue i was at a distance of less than or equal to 5 Å from an atom from residue j. This value approximates the upper limit for attractive London–van der Waals forces [43]. We modeled the PDB structures of the representative domains as graphs, with residues corresponding to vertices, and their contacts to edges. These networks were subsequently decomposed into modules using the edge-betweenness clustering algorithm proposed by Girvan and Newman [21,28], based on the iterative removal of edges with the highest number of paths running through it (see also Figure S3). We used the parallel implementation PEBC (parallel edge-betweenness clustering) [44] of the algorithm. We used the previously introduced expression for the modularity of each module m [20]: where L is the number of edges in the network, lm is the number of edges between nodes in module m, and dm is the sum of the degrees of nodes in module m. Modules with higher Qm contain many within-module edges, whereas random partitions of the network have an expected value of Qm = 0. Binding site analysis. Binding site clustering. In order to detect whether a domain is interacting with different domains using non-overlapping binding sites, we clustered the list of binding sites corresponding to each domain in the dataset. First, we defined a distance matrix for all pairs of binding sites as: where ni and nj are the number of residues in binding sites i and j that have contacts with the other binding sites j and i, respectively. Ni and Nj are the total number of residues belonging to each binding site. Two binding sites i and j were considered as non-overlapping if C(i, j) < 0.7. Our clustering protocol was based on the hierarchical agglomerative clustering algorithm (see also Figure S4), defined as follows: (1) find the closest pair of binding sites in the distance matrix; (2) merge these two binding sites into a new single binding site if the distance between them is C(i, j) < 0.7; and (3) compute the distance matrix for the new reduced list of binding sites. The clustering process terminates when the distances between all pairs of binding sites are above the threshold, obtaining a set of mutually non-overlapping binding sites in the domain. Relative interface between binding sites. We defined the relative interface between two binding sites as in Equation 2. This parameter represents the averaged proportion of binding site contacting residues, and is a measure of closeness between these binding sites. C(i, j) varies from 0 to 1. Values close to 0 imply a small relative interface, indicating a clear structural separation between both binding sites, whereas values close to 1 appear when almost all residues in both binding sites are on the interface, illustrating their proximity. Similarity of binding site modular compositions. We defined for each binding site j a vector mj representing its modular composition as follows: where mjk is the number of residues of binding site j in module k; and M is the total number of modules in which the domain has been decomposed. The modular composition similarity between two binding sites i and j is defined as the uncentered Pearson correlation coefficient between their respective vectors of modular composition: where , and are the Euclidean norms of vector i and j, respectively. M(i, j) varies from 0 to 1. Values close to 0 show significant differences in the modular compositions of each binding site, whereas values close to 1 correspond to binding sites with almost identical modular compositions. Evaluation of performance. Random generation of binding sites. To test the statistical significance of our studies, we generated a list of random binding sites for each domain, keeping the same number and size of the original binding sites. The random binding sites were generated in the following way: (1) we randomly selected one of the residues in each binding site as the seed residue for the new binding site; and (2) we iteratively added more random neighbors to the new binding site until the number of residues on it equaled the size of the original binding site. In the case of domains with more than one binding site, we checked that all pairs of binding sites in the corresponding list verified C(i, j) < 0.7; otherwise, the random generation of binding sites for this domain was repeated until such condition was reached. We generated 500 random realizations for each binding site of each domain of our dataset. Accuracy and coverage. The accuracy and coverage for the prediction methods were defined as: where TP, FP, and FN are the number of true positives, false positives, and false negatives, respectively. Conservation analysis. Residue conservation scores were determined for each representative domain structure from the ConSurf-HSSP database [45]. A residue was considered as conserved if its score was greater or equal to 9. Patch analysis. Predictions of surface patches for the representative domain structures were determined from the SHARP2 server [46]. We considered the best three predicted overlapping patches.
rdf:type	nif:Context