The Makah Tribal Council has granted permission to the Wootton lab for access to Tatoosh Island.
A food web composed of $S$ species may be represented by an adjacency matrix $A$, where $A_{ij}$ is 1 if $j$ consumes $i$, and 0 otherwise. Similarly, interaction networks may be represented by a signed adjacency matrix where $A_{ij}$ is 1 if the growth rate of $j$ positively depends on the presence of $i$, -1 if its growth rate negatively depends on $i$, and 0 otherwise. Such a matrix may be thought of as containing the signs of the community matrix (the Jacobian evaluated at equilibrium), as opposed to a matrix of zero-sum energy or nutrient flow throughout the system (sensu [26]). Some interactions, such as competition for carbon or another nutrient, may be considered indirect interactions that are the product of two consumer-resource interactions (in this case, two direct consumer-carbon interactions). In such cases, carbon would be incorporated into the differential equations underlying the community matrix. Since we are interested in how species group within an interconnected network, we require that each complete interaction network be a single weakly connected component (that is, isolated subgraphs were removed).
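As a minimal sketch (our own illustration, not the authors' code), the snippet below stores a signed interaction network as a list-of-lists matrix following the $A_{ij}$ convention above and keeps only the largest weakly connected component. Link direction and sign are ignored when testing connectivity, which is what "weakly connected" means. The function names `largest_weak_component` and `induced_submatrix` are hypothetical.

```python
from collections import deque


def largest_weak_component(adj):
    """Return the sorted indices of the largest weakly connected component.

    adj[i][j] is +1, -1 or 0: the sign of the effect of species i on the
    growth rate of species j. Direction and sign are ignored for connectivity.
    """
    n = len(adj)
    # Undirected neighbour sets: i and j are linked if A_ij or A_ji is nonzero.
    # Self-interactions (i == j) cannot change connectivity, so they are skipped.
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j] != 0:
                neighbours[i].add(j)
                neighbours[j].add(i)
    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects one weak component.
        component = []
        queue = deque([start])
        seen[start] = True
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in neighbours[node]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return sorted(best)


def induced_submatrix(adj, nodes):
    """Restrict the matrix to the rows and columns listed in `nodes`."""
    return [[adj[i][j] for j in nodes] for i in nodes]


# Species 1 consumes species 0 and 2; species 3 is isolated and gets dropped.
web = [[0, 1, 0, 0],
       [-1, 0, -1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
keep = largest_weak_component(web)
print(keep, induced_submatrix(web, keep))
```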
Interaction data for Tatoosh Island were collected from the intertidal middle zone based on observed interactions and natural history information. This middle zone on Tatoosh is dominated by the mussel Mytilus californianus. The mussel-dominated band is bounded from below by the presence of Pisaster ochraceus, which consumes M. californianus [27], and from above by physiological constraints, such as time spent submerged [28]. The signed interaction network contains 110 taxa and 1898 interaction pairs (869 +/-, 5 +/+, 208 +/0, 492 -/0, and 324 -/-). This dataset is available on Dryad (DOI: 10.5061/dryad.39jv1).

For the Doñana Biological Reserve and Norwood Farm networks (data made available in [24] and [25], respectively), we used the largest weakly connected component. The Doñana network contains 391 species in total: 170 plants, 207 mutualists (576 mutualistic interactions), and 14 herbivores (221 feeding interactions). The Norwood network contains 445 species: 91 plants, 251 mutualists (569 mutualistic interactions), 62 herbivores (570 herbivory interactions), and 43 parasitoids (367 parasitic interactions). Two Norwood species were classified in two categories, one interacting both as a mutualist and as an herbivore and another as both a mutualist and a parasitoid, which is why the Norwood category counts sum to 447 rather than 445.

Because taxonomically similar species are generally expected to fill similar roles in a community [29] (but see [30]), taxonomic data provide a potential natural grouping. Tatoosh taxa were classified to kingdom and phylum, and plants in the Doñana and Norwood webs were classified to order. Taxonomic levels were chosen so that the number of taxonomic groups was similar to the number of groups found by the group model for the complete networks. Because of the high phylogenetic diversity of the Tatoosh system, taxonomic groupings below the phylum level contained too many groups to provide useful information about the system.
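The five pair types reported for Tatoosh (+/-, +/+, +/0, -/0, -/-) can be tallied from a signed matrix. The sketch below is a minimal illustration under our own assumption, not stated in the data description, that each unordered pair $\{i, j\}$ is typed by the signs of $A_{ij}$ and $A_{ji}$ and that self-interactions are ignored. The names `pair_types`, `SYMBOL`, and `RANK` are hypothetical.

```python
from collections import Counter

SYMBOL = {1: "+", -1: "-", 0: "0"}
# Canonical ordering so that, e.g., (-1, +1) and (+1, -1) both become "+/-".
RANK = {"+": 0, "-": 1, "0": 2}


def pair_types(adj):
    """Count interacting unordered pairs by their sign combination.

    Assumption: adj[i][j] is the sign of the effect of i on j, and a pair is
    counted once if either direction is nonzero; diagonal entries are ignored.
    """
    counts = Counter()
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            effects = (adj[i][j], adj[j][i])
            if effects == (0, 0):
                continue  # no interaction between i and j
            symbols = sorted((SYMBOL[e] for e in effects), key=RANK.__getitem__)
            counts["/".join(symbols)] += 1
    return dict(counts)


web = [[0, 1, 0, 1],
       [-1, 0, -1, 0],
       [0, -1, 0, 0],
       [0, 0, 0, 0]]
print(pair_types(web))
```

A consumer-resource link (consumer $j$ eating resource $i$, so $A_{ij} = 1$ and $A_{ji} = -1$) is counted as one +/- pair under this convention.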
Taxonomic data for all three networks were gathered from the Integrated Taxonomic Information System (ITIS) database and Encyclopedia of Life (see S1 Text for details).
Group Model for Signed Directed Graphs: Consider an interaction web with $S$ species and $L$ links, $K$ of which are positive and $L - K$ negative. The data can be represented using a signed directed adjacency matrix $N$. What is the probability of obtaining $N$ by chance? A simple model of random signed network structure is similar to an Erdős-Rényi random graph with $S$ species and a fixed probability $c$ of connecting any two nodes, with an additional probability $\pi$ that a link is positive. Then the probability of obtaining exactly $N$ under this model is:

$$P(N(S,L,K) \mid c, \pi) = c^{L} \pi^{K} (1-c)^{Z} (1-\pi)^{L-K} \qquad (1)$$

where $Z = S^2 - L$ is the number of zeros in the matrix. This likelihood is maximized when $\hat{c} = \frac{L}{L+Z}$ and $\hat{\pi} = \frac{K}{L}$.

Now, to see this in the context of the group model, consider $N$ divided into two groups, $X$ and $Y$. If $N$ is a mutualistic web, these groups might correspond to plants and pollinators. The random network process now involves eight probabilities: $c_{xx}$, the probability of a species in group $X$ connecting to another species in group $X$; $\pi_{xx}$, the probability that a link between two species in $X$ is positive; $c_{xy}$, the probability of a species in $X$ connecting to a species in $Y$; and $c_{yx}$, $c_{yy}$, $\pi_{yy}$, $\pi_{xy}$, and $\pi_{yx}$, which are defined similarly. Note that $c_{xy}$ and $c_{yx}$ are not necessarily equal (nor are $\pi_{xy}$ and $\pi_{yx}$), since $N$ need not be symmetric. Writing $L_{ij}$, $K_{ij}$, and $Z_{ij}$ for the numbers of links, positive links, and zeros from species in group $i$ to species in group $j$, the probability of obtaining $N$ given the two groups is:

$$P(N(S,L,K) \mid c_{ij}, \pi_{ij}, i,j \in \{X,Y\}) = \prod_{i \in \{X,Y\}} \prod_{j \in \{X,Y\}} c_{ij}^{L_{ij}} \pi_{ij}^{K_{ij}} (1-c_{ij})^{Z_{ij}} (1-\pi_{ij})^{L_{ij}-K_{ij}} \qquad (2)$$

Analogous to Eq 1, this likelihood is maximized when $\hat{c}_{ij} = \frac{L_{ij}}{L_{ij}+Z_{ij}}$ and $\hat{\pi}_{ij} = \frac{K_{ij}}{L_{ij}}$ for all combinations of groups. This can be generalized to $g$ groups as follows:

$$P(N(S,L,K) \mid c_{ij}, \pi_{ij}, i,j \in 1{:}g) = \prod_{i=1}^{g} \prod_{j=1}^{g} c_{ij}^{L_{ij}} \pi_{ij}^{K_{ij}} (1-c_{ij})^{Z_{ij}} (1-\pi_{ij})^{L_{ij}-K_{ij}} \qquad (3)$$

When $g = 1$, this is equivalent to Eq 1. When $g = S$, each species is in its own group, and the maximized likelihood is 1, because each block then contains a single matrix entry that the fitted parameters reproduce exactly. Such a grouping is not very informative, so we need to perform model selection.
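A minimal sketch of Eqs 1 to 3 (our illustration, not the authors' implementation): given a signed matrix and one integer group label per species, it tallies $L_{ij}$, $K_{ij}$, and $Z_{ij}$ and evaluates the log-likelihood at the maximum-likelihood estimates $\hat{c}_{ij}$ and $\hat{\pi}_{ij}$. Working with logs avoids underflow for large webs. The helper names are hypothetical, and labels are assumed to be contiguous integers $0, \ldots, g-1$.

```python
import math


def block_counts(adj, groups):
    """Return matrices L, K, Z of links, positive links and zeros between groups."""
    g = max(groups) + 1  # assumes labels 0..g-1 are all used
    L = [[0] * g for _ in range(g)]
    K = [[0] * g for _ in range(g)]
    Z = [[0] * g for _ in range(g)]
    for a, row in enumerate(adj):
        for b, value in enumerate(row):
            i, j = groups[a], groups[b]
            if value == 0:
                Z[i][j] += 1
            else:
                L[i][j] += 1
                if value > 0:
                    K[i][j] += 1
    return L, K, Z


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def max_log_likelihood(adj, groups):
    """ln of Eq 3 evaluated at c_ij = L/(L+Z) and pi_ij = K/L."""
    L, K, Z = block_counts(adj, groups)
    total = 0.0
    for i in range(len(L)):
        for j in range(len(L)):
            l, k, z = L[i][j], K[i][j], Z[i][j]
            if l + z == 0:
                continue  # block with no matrix entries (unused label)
            c_hat = l / (l + z)
            total += xlogy(l, c_hat) + xlogy(z, 1 - c_hat)
            if l > 0:  # pi_ij only matters when the block has links
                pi_hat = k / l
                total += xlogy(k, pi_hat) + xlogy(l - k, 1 - pi_hat)
    return total


# Species 1 consumes species 0 and 2.
web = [[0, 1, 0],
       [-1, 0, -1],
       [0, 1, 0]]
print(max_log_likelihood(web, [0, 0, 0]))  # single group (Eq 1)
print(max_log_likelihood(web, [0, 1, 0]))  # 0.0: the split reproduces the matrix exactly
```

Because a finer grouping can never lower the maximized likelihood, the numbers above cannot by themselves choose between groupings, which is the motivation for the marginal-likelihood comparison that follows.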
Using a uniform prior over models (such that the prior probability of each of the two models is 1/2), it is possible to analytically calculate a Bayes factor to compare two groupings. For groupings $G_1$ and $G_2$, the Bayes factor is given by:

$$B = \frac{P(N \mid G_1)}{P(N \mid G_2)} \qquad (4)$$

where $P(N \mid G_i)$ is the marginal likelihood

$$P(N \mid G_i) = \int_0^1 \cdots \int_0^1 P(c_{ij}, \pi_{ij}, i,j \in 1{:}g \mid G_i)\, P(N \mid c_{ij}, \pi_{ij}, i,j \in 1{:}g, G_i)\, dc_{11} \ldots dc_{gg}\, d\pi_{11} \ldots d\pi_{gg} \qquad (5)$$

which, with independent uniform priors on each $c_{ij}$ and $\pi_{ij}$, can be integrated analytically to give:

$$\prod_{i=1}^{g} \prod_{j=1}^{g} \frac{K_{ij}!\, Z_{ij}!\, (L_{ij}-K_{ij})!}{(1+L_{ij})\,(1+L_{ij}+Z_{ij})!} \qquad (6)$$

Because there are many possible groupings to choose from, we compared the marginal likelihoods of the groupings when searching for the best grouping, rather than explicitly calculating $B$ for each pair. We searched for the optimal grouping using Metropolis-Coupled Markov Chain Monte Carlo (MC3) with a Gibbs sampler (see S1 Text). Because it is not feasible to exhaustively search the space of all possible groupings, the best groupings found are not guaranteed to be optimal; for simplicity, however, we refer to them as “best groupings” throughout.
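To make Eqs 4 to 6 concrete, here is a minimal sketch (our illustration, not the code in the authors' repository) that evaluates the log of Eq 6 for a grouping with `math.lgamma`, using $\ln n! = \ln\Gamma(n+1)$, and takes the difference of two such values as $\ln B$. Working on the log scale avoids overflowing the factorials for webs with hundreds of species. The function names are hypothetical, and group labels are assumed to be contiguous integers $0, \ldots, g-1$.

```python
import math


def block_counts(adj, groups):
    """Links L_ij, positive links K_ij and zeros Z_ij from group i to group j."""
    g = max(groups) + 1
    L = [[0] * g for _ in range(g)]
    K = [[0] * g for _ in range(g)]
    Z = [[0] * g for _ in range(g)]
    for a, row in enumerate(adj):
        for b, value in enumerate(row):
            i, j = groups[a], groups[b]
            if value == 0:
                Z[i][j] += 1
            else:
                L[i][j] += 1
                K[i][j] += value > 0
    return L, K, Z


def log_marginal_likelihood(adj, groups):
    """ln of Eq 6: sum over blocks of ln[K! Z! (L-K)! / ((1+L) (1+L+Z)!)]."""
    L, K, Z = block_counts(adj, groups)
    total = 0.0
    for i in range(len(L)):
        for j in range(len(L)):
            l, k, z = L[i][j], K[i][j], Z[i][j]
            total += (math.lgamma(k + 1) + math.lgamma(z + 1)
                      + math.lgamma(l - k + 1)
                      - math.log(1 + l) - math.lgamma(l + z + 2))
    return total


def log_bayes_factor(adj, groups1, groups2):
    """ln B for grouping 1 versus grouping 2 (Eq 4)."""
    return log_marginal_likelihood(adj, groups1) - log_marginal_likelihood(adj, groups2)


# Species 1 consumes species 0 and 2.
web = [[0, 1, 0],
       [-1, 0, -1],
       [0, 1, 0]]
print(log_bayes_factor(web, [0, 1, 0], [0, 0, 0]))  # positive: the two-group split is favored
```

Unlike the maximized likelihood, this marginal likelihood penalizes extra groups automatically, because each additional block integrates over its own $c_{ij}$ and $\pi_{ij}$.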
The entropy of a partition $A$ is an information-theoretic measure of the information content, or uncertainty, of that partition, measured in nats [31]. A partition in which all species are in the same group has zero entropy, because we can be certain which group any given species belongs to. In contrast, a partition with many groups has higher entropy, since it is difficult to make an a priori guess about the group identity of a given species. Entropy is calculated as:

$$H(A) = -\sum_{a \in A} p(a) \ln p(a) \qquad (7)$$

where $p(a)$ is the proportion of species in group $a$. This metric is known as Shannon entropy and is commonly used in ecology to measure the diversity of a community [32]. The joint entropy of two partitions $A$ and $B$ is defined similarly:

$$H(A,B) = -\sum_{a \in A} \sum_{b \in B} p(a,b) \ln p(a,b) \qquad (8)$$

This can be thought of as the union of $H(A)$ and $H(B)$, since it sums over all joint probabilities of the two partitions. Note that for all entropies, $0 \ln 0$ is taken to be 0, so that including values with probability zero does not change the entropy [31]. To measure the similarity between two partitions, we then wish to know how much entropy the partitions share. This is known as the mutual information (MI), which quantifies the reduction in entropy of partition $B$ when partition $A$ is known. It is calculated as

$$MI_{AB} = H(A) + H(B) - H(A,B) \qquad (9)$$

This can be thought of as the intersection of $H(A)$ and $H(B)$. Writing this measure in terms of probabilities, and using $p(a) = \sum_{b \in B} p(a,b)$ and $p(b) = \sum_{a \in A} p(a,b)$ to obtain Eq 11, gives

$$\begin{aligned} MI_{AB} &= -\sum_{a \in A} p(a) \ln p(a) - \sum_{b \in B} p(b) \ln p(b) + \sum_{a \in A} \sum_{b \in B} p(a,b) \ln p(a,b) && (10) \\ &= -\sum_{a \in A} \sum_{b \in B} p(a,b) \ln p(a) - \sum_{a \in A} \sum_{b \in B} p(a,b) \ln p(b) + \sum_{a \in A} \sum_{b \in B} p(a,b) \ln p(a,b) && (11) \\ &= \sum_{a \in A} \sum_{b \in B} p(a,b) \ln \frac{p(a,b)}{p(a)\, p(b)} && (12) \end{aligned}$$

To see how this is calculated for a partition generated by the group model, see Box 1.

Box 1. Calculation of MI for ecological partitions

Consider the following two partitions for a five-species grouping:

Partition A: 1 2 1 2 1
Partition B: α β γ β β

where each column is a species, and numbers and Greek letters correspond to group identity within partitions A and B, respectively.
Using these groupings, we can create a joint count matrix:

$$\begin{array}{c|ccc|c} & \alpha & \beta & \gamma & n_{i\cdot} \\ \hline 1 & 1 & 1 & 1 & 3 \\ 2 & 0 & 2 & 0 & 2 \\ \hline n_{\cdot j} & 1 & 3 & 1 & 5 \end{array}$$

where each table entry $n_{ij}$ is the number of species that are in group $i$ in partition A and in group $j$ in partition B. The row totals $n_{i\cdot}$ and column totals $n_{\cdot j}$ are the marginal counts, i.e., the total number of species in group $i$ in partition A or the total number of species in group $j$ in partition B, respectively. These counts can easily be converted into probabilities by dividing by the total number of species $N$ (in this case, 5). Then $p(a) = n_{a\cdot}/N$, $p(b) = n_{\cdot b}/N$, and $p(a,b) = n_{ab}/N$. With $g_A$ and $g_B$ denoting the numbers of groups in partitions A and B, this gives us

$$\begin{aligned} MI_{AB} &= \sum_{i=1}^{g_A} \sum_{j=1}^{g_B} \frac{n_{ij}}{N} \ln\left(\frac{n_{ij}}{N} \cdot \frac{1}{n_{i\cdot}/N} \cdot \frac{1}{n_{\cdot j}/N}\right) && (13) \\ &= \sum_{i=1}^{g_A} \sum_{j=1}^{g_B} \frac{n_{ij}}{N} \ln\left(\frac{n_{ij}\, N}{n_{i\cdot}\, n_{\cdot j}}\right) && (14) \end{aligned}$$

For our example:

$$MI_{AB} = \frac{1}{5} \ln\left(\frac{1 \cdot 5}{3 \cdot 1}\right) + \cdots + \frac{0}{5} \ln\left(\frac{0 \cdot 5}{2 \cdot 1}\right) \approx 0.291 \qquad (15)$$

Because the MI is the shared entropy between two partitions, it can be represented as a Venn diagram, with circle areas proportional to $H(A)$ and $H(B)$ and the area of overlap between the circles proportional to the mutual information. The corresponding diagram for our example is given in Fig 2, with $H(A) = 0.673$, $H(B) = 0.950$, and $MI_{AB} = 0.291$ (equivalently, $H(A) + H(B) - H(A,B)$ with $H(A,B) = 1.332$).

[Fig 2 (10.1371/journal.pcbi.1004330.g002). Mutual Information Venn diagram for 5-species partitions A and B. Left circle represents H(A), right circle represents H(B), and the intersection represents $MI_{AB}$. All areas are proportional to the values they represent.]
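The Box 1 calculation can be checked with a short script. This is a minimal sketch using only the Python standard library (the function names are hypothetical, not taken from the authors' repository): partitions are lists with one group label per species, entropies follow Eqs 7 and 8, and the MI uses the count form of Eq 14. Cells with $n_{ij} = 0$ never appear in the `Counter`, which matches the convention $0 \ln 0 = 0$. ASCII labels stand in for α, β, γ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in nats of a partition given as one label per species (Eq 7)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def joint_entropy(a, b):
    """Joint entropy H(A, B) of two partitions of the same species (Eq 8)."""
    return entropy(list(zip(a, b)))


def mutual_information(a, b):
    """MI from the joint count matrix n_ij and its margins (Eq 14)."""
    n = len(a)
    joint = Counter(zip(a, b))  # n_ij (zero cells are simply absent)
    row = Counter(a)            # n_i.
    col = Counter(b)            # n_.j
    return sum((nij / n) * math.log(nij * n / (row[i] * col[j]))
               for (i, j), nij in joint.items())


A = [1, 2, 1, 2, 1]
B = ["alpha", "beta", "gamma", "beta", "beta"]
print(round(entropy(A), 3), round(entropy(B), 3), round(mutual_information(A, B), 3))
# prints: 0.673 0.95 0.291
```

Computing the MI both from Eq 14 and as $H(A) + H(B) - H(A,B)$ (Eq 9) is a cheap consistency check on any implementation.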
The significance of MI values was estimated with a randomization test. To estimate how likely it was to obtain an equal or higher MI by chance, each of the two partitions was shuffled, such that the randomized partitions conserved the number of species in each group (and therefore the upper bound on the MI; see S1 Text for details), but not the species' identities. The MI was then calculated for the randomized partitions. This process was repeated one million times, and the p-value was estimated as the proportion of randomizations yielding an MI greater than or equal to the observed MI for the two partitions. Because the probability of obtaining a given MI depends on both the entropies and the groupings, it is possible to get a low p-value for a relatively low MI, or a high p-value for a relatively high MI. Code for calculating partition similarity, obtaining taxonomic data, and running the search algorithm is available on GitHub at https://github.com/esander91/SignedGroupModel.
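The randomization test described above can be sketched as follows. This is our own minimal illustration, not the code in the GitHub repository: shuffling a label list permutes which species carry which label, so group sizes (and hence each partition's entropy) are conserved. The default `n_reps` of 10,000 is an assumption chosen for speed; the study used one million repetitions. The function name `mi_permutation_test` is hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """MI in nats from joint and marginal label counts (Eq 14)."""
    n = len(a)
    joint, row, col = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((nij / n) * math.log(nij * n / (row[i] * col[j]))
               for (i, j), nij in joint.items())


def mi_permutation_test(a, b, n_reps=10_000, seed=0):
    """Return (observed MI, estimated P(MI_random >= MI_observed)).

    Both partitions are shuffled on every repetition, conserving the number
    of species in each group but not which species are in it.
    """
    rng = random.Random(seed)
    observed = mutual_information(a, b)
    a_perm, b_perm = list(a), list(b)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(a_perm)
        rng.shuffle(b_perm)
        # Small tolerance so exact ties are not lost to floating-point rounding.
        if mutual_information(a_perm, b_perm) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_reps


A = [1, 2, 1, 2, 1]
B = ["alpha", "beta", "gamma", "beta", "beta"]
print(mi_permutation_test(A, B))
```

For the five-species partitions in Box 1, every shuffle yields an MI at least as large as the observed value, so the estimated p-value is 1. With so few species and these group sizes, a nonzero MI is entirely expected by chance, which illustrates why the p-value must be read alongside the MI itself.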