nif:isString
|
-
The neuronal network of the adult nematode C. elegans was first described in the publication by White et al. [26] and was recently revised by Chen et al. [27] and Varshney et al. [28]. It expresses the regime of connections between the animal's 282 somatic neurons and classifies them with respect to their type and direction (http://www.wormatlas.org/neuronalwiring.html, accessed 15th June 2013). In our analysis, we consider a subset of this data where 3 disconnected neurons (VC06, CANL and CANR) are excluded from the set and we take all connections to be undirected. Furthermore, while the connections are distinguished in terms of their type (chemical synapses, gap junctions and neuromuscular junctions), we treat all connections as binary, that is, we assign value 1 if some type of connection exist and 0 otherwise. This yields a binary and symmetric adjacency matrix with 2287 edges that defines the C. elegans network. For an external evaluation of the community estimates, we use categorical and quantitative characteristics of the neurons (node-wise features) and quantitative characteristics of the edges (edge-wise features), as summarised in Table 1.
Table data removed from full text. Table identifier and caption: 10.1371/journal.pone.0097584.t001 Prior biological features of the C. elegans connectome. There is a large body of knowledge on the individual neurons, producing node-wise features. For example, we use the classification of neurons into ten anatomically defined ganglia (“Ganglion classification”), the classification of neurons by their circuitry (“Neuron type”) defined by four groups (sensory, motor, interneurons and polymodal neurons), as well as topological and synaptic division of neurons (“Neuron class”) defined by 103 groups [28], [29]. We also consider ventral nerve cord motor neurons involved in locomotion, egg-laying and possibly avoidance (broadly labelled as “Locomotion circuit” in Table 1) which was described by Haspel et al. [30] using connection data from Chalfie and White [31], Von Stetina et al. [32], Altun and Hall [33], and Chen et al. [27]. Explicitly, this circuit is composed of 84 neurons, of which 74 are motor neurons (excluding VC06) that comprise eight neuron classes. Four of these classes are connected to ventral muscles (VA, VD,VB and VC) while the other four classes are connected to dorsal muscles (AS, DA, DD and DB). The remaining 10 neurons are interneurons (AVA, AVD and AVE; AVB and PVC) promoting backward and forward motion. Although the connection data used in our analysis do not include neuromuscular connections, the circuit presented by Haspel et al. [30] provides some invaluable insights that are beneficial to the evaluation and comparison of the results obtained in our analysis. The remaining set of the node-wise features includes “Anatomical Location” (longitudinal and sectional positions) of the cell body (soma) and the “Birth Time” of each neuron (http://www.biological-networks.org/?page_id=25, accessed 15th June 2013) [34]. Edge-wise features include the “Anatomical Distance” (Euclidean distance between each neuron pair), the “Birth Time Difference” (for each neuron pair, we take an absolute difference in their birth times) and the “Lineage Distance” (for each neuron pair, this is the sum of total divisions to the most recent common ancestor cell) [35].
Our analysis consists of two stages. In the first stage, we derive community structures of the C. elegans neural network using 3 different methods, as described next. In the second stage, we estimate how well each network decomposition explains the system's known prior biological properties. The general techniques used for this part of the analysis are summarised in Section “Evaluation Methods”. We first fix our general notation, but emphasise that the terms “network” and “graph” are used interchangeably. A graph is defined as an object formed by a finite set of vertices (nodes) V of size n and a list of unordered pairs of vertices E (edge list) of size m. For a simple graph (i.e., graph without multiple edges or self connected vertices), the adjacency matrix is symmetric and binary, that is, its elements Aij take value 1 if there is an edge between vertices Vi and Vj and 0 otherwise. The degree of each vertex is , the number of edges connected to a vertex, while the set of all degrees is . Additionally, a graph can be characterised by a clustering coefficient that measures the tendency of its edges to form clusters. The clustering coefficient, defined by Newman [24], [36] is(1)the prevalence of fully connected triplets of nodes among the set of triplets that have at least two connections. The Erdős-Rényi Mixture Model (ERMM): The Erdős-Rényi (ER) model for a graph [37], [38] specifies that edges occur independently with a common probability. Real world graphs are rarely so homogeneous, and the ER model is generally not useful. In contrast, the Erdős-Rényi Mixture Model [23], [24], [39]–[41] poses an ER model on subsets of edges within the graph. In the ERMM, the adjacency matrix is treated as a random variable denoted by and the nodes are assumed to be allocated into Q unknown (latent) groups or blocks, indexed by . We record the group assignment of each node Vi with a dimensional random (classification) vector , whose elements Ziq take value 1 if Vi belongs to the q-th group and 0 otherwise; as each node belongs to exactly one group. The set then consists of independent, identically distributed random variables, each following a single trial multinomial distribution(2)where is a dimensional vector whose elements satisfy the constraint . The elements of describe the size or prevalence of each group, or, alternatively, can be interpreted as the probability that a randomly chosen node is contained in the q-th group. Note that different assumptions about the distribution of are also possible (see, e.g., the recent publication of Latouche et al. [42] who proposed an overlapping stochastic blockmodel). The ERMM specifies that, given the group (block) assignments of the vertices, the elements of are conditionally independent Bernoulli random variables with rates given by their corresponding elements in the connectivity matrix . In other words, if a vertex Vi belongs to group q and a vertex Vj belongs to group l, then(3)As is often the case with mixture models, the likelihood is stated as an incomplete data problem which is optimised for different values of Q, that is, . In the ERMM, however, such optimisation is particularly challenging. Nevertheless, the estimating equations of the model's parameters () can still be obtained with an approximate variational method [43], [44]. With an additional parameter (i.e., the variational parameter for Vi), the estimating equations proposed in [24] are(4)(5)where we employ the usual statistical convention of lower Roman variables, xij, to denote the observed version of the random data, Xij. For each node, the largest variational parameter estimate determines the classification vector estimate (6)The estimates just described depend on Q, the total number of partitions. To compare across different Q, the Integrated Classification Likelihood (ICL) criterion is used. For a model with Q groups, the ICL criterion is(7)where is an estimate of z and is the complete data log likelihood,(8)The details of each likelihood term as well as the derivation of the ICL criterion are presented in the Supplementary Text in File SI. Intuitively, the ICL criterion considers the evidence for the clustered data (i.e., ), and, at the same time, it uses the term () to penalise the model's complexity and, therefore, preserve the simplicity and parsimony of the selected model. Hence, it is generally harder to select a model with a larger number of groups. Using a Poisson approximation for a binomial distribution, the ERMM models the degree distribution as a mixture of Poisson distributions,(9)where is the Poisson rate for the q-th group, . Finally, Daudin, Picard and Robin in [24] proposed that the fitted ERMM can be used to estimate the Newman clustering coefficients (see Eq. (1)) as(10)For further mathematical details on the ERMM, see Supplementary Text in File S1.
The Spectral and Fast Louvain Algorithms: In contrast to the ERMM, the Spectral and Fast Louvain algorithms are deterministic methods that assess the goodness of a graph partition with an objective function known as modularity [45]. Central to the definition of modularity is the difference between the observed edge (Aij) and the expected number of edges () in an equivalent graph with m edges and with randomly connected vertices [46]–[48]. Modularity is defined as(11)where ci and cj represent the groups of vertices Vi and Vj, and if Vi and Vj are located in the same module and 0 otherwise.
The Spectral algorithm [15], [49] optimises modularity (Eq. (11)) by utilising the eigenvalues and eigenvectors associated with the modularity matrix with elements defined as(12)The graph is split into two modules by setting an indicator vector such that if the vertex Vi is located in the module and otherwise. Hence, the modularity can be expressed as(13)where is the eigenvalue of corresponding to the eigenvector . Observe that, for a given and a consisting only of 1's or −1's, the inner product vector is maximised by . This creates two groups, of not necessarily equal size, and each group is in turn split with the additional contribution to modularity being defined as(14)where is (for a group g of size ng) whose elements are: . When no more positive eigenvalues are found, the algorithm stops. More details on the Spectral Algorithm can be found in Supplementary Text in File S1.
The Fast Louvain algorithm [14] optimises modularity (Eq. (11)) in two stages that are repeated iteratively. The algorithm is initialised by assigning each vertex to its own module and, hence, the initial number of groups is equal to the number of vertices. In the first stage, for each vertex Vi, the algorithm considers each of its neighbours and computes the gain of modularity that would have been obtained if the vertex Vi was placed in the same module as its neighbour Vj. The vertex Vi is assigned to the module for which this gain is the largest or, in the case of no positive gain, the vertex stays in its initial module. This process is applied sequentially, cycling through every vertex until no individual move can improve the modularity at which point the first stage stops. In the second stage, the algorithm builds a new network whose vertices are identified as the modules found in the first stage. This gives a simplified community structure that is used as the initialisation for the next pass of the first stage. These two stages are repeated until the maximal modularity is attained.
Community estimation methods are notoriously sensitive to the initial starting conditions (see e.g., [50]). Each method begins with some sort of random initialisation that typically will lead to a local optimum of the objective function (i.e., ICL or modularity). Thus, for all three methods considered, we use multiple random restarts of the algorithm and take the solution that provides the greatest value of the objective function.
To measure the similarity between a partition (i.e., complete segmentation of a graph into a set of groups) and some known biological classifications, we use the Adjusted Rand Index (ARI) [51], [52]. This measure is a modification of the Rand Index (RI) [53], that is expressed as the fraction of vertex pairs that are consistent: a vertex pair is consistent between two partitions if either (a) the vertex pair is within the same group in both partitions, or (b) the vertex pair is split between two groups in both partitions. The interpretation of the RI depends on the number of groups [54], whereas the ARI is adjusted for chance agreement and number of groups [52]. It is defined as(15)where the expectation is computed assuming a hypergeometric distribution of the counts of consistent vertex pairs. ARI scores range from 0 to 1, and indicate the proportion of overlap; for example, if two partitions have an ARI score of 0.6, this means that of the nodes are classified in the same groups. To assess the quality of a partition with respect to quantitative biological features, we use the Intra-class Correlation Coefficient (ICC). The ICC measures the variance that a partition explains in a continuous variable. As per best practice, we estimate the ICC with a mixed effects model [55]. For a node-wise measure, if we denote as the measure on the i-th neuron in the q-th group, the mixed effects model is(16)where aq is the random effect of the q-th group, is the random error term and is the population mean. The random terms aq and are mutually independent and each are independently and identically distributed normal random variables: and . The ICC is defined as the proportion of total variance explained by the between group variance,(17)In other words, the ICC tells us how homogeneous the biological feature is within the partitions of a proposed network decomposition. Note that, here, we defined the ICC for node-wise measures (e.g., anatomical location of neuron), but it can be also defined for edge-wise measures (e.g., Euclidean distance between neurons). While edge-wise measures may violate the independence assumption of the mixed effect model, the ICC will still be a useful metric to compare biological validity of different partitions. Ideally, we would conduct a hypothesis test on the difference in fit between different community estimates. However, because the implied models are not nested, a traditional hypothesis test cannot be employed. Nevertheless, we are able to use model selection metrics, such as the Akaike Information Criterion (AIC) [56]. The AIC can be viewed as a measure of distances between a fitted model (i.e., an estimated partition) and the unknown true model (i.e., the true partition). Denoting to be the model under consideration (i.e., one of the ERMM, Spectral or Louvain methods), the AIC score is defined as(18)where is the log likelihood of the corresponding mixed effect model (Eq. (16)) and p is the number of parameters in the model (here, p = 3). The preferred model is the one with the smallest AIC score (). While the AIC is not an absolute measure, the differences in the AIC scores provide a way to compute approximate probabilities. In particular, the relative likelihood of the model compared to the model that minimises the AIC is given as(19)and represents the relative strength of evidence for this model. Equivalently, this tells us how probable it is that the model minimises the distance from the true model. As a general rule of guidance, it has been suggested [57] that, if the likelihood value (or, equivalently ), there is a substantial evidence that this model is equally useful; if the value is contained in the interval (), then, there is considerably less evidence; and, finally, for values that are strictly smaller than (), there is essentially no evidence that this model is useful.
We fitted the ERMM with the R package “Mixer” [24], [39]–[41]. The “mixer” function specifies default values for the maximum number of iterations, and we found improved performance by increasing these (nbiter = 80 up from 10, fpnbiter = 40 up from 5). We found random restarts was sufficient to visit the optimal solution multiple times, but, to be exhaustive, we also considered up to random restarts. The Fast Louvain and Spectral algorithms were carried out using the Matlab “Brain Connectivity Toolbox” (http://www.brain-connectivity-toolbox.net/, accessed 15th June 2013) [49]. For the Fast Louvain algorithm, we used the function “modularity_louvain_und”, using restarts. For the Spectral algorithm, we used the function “modularity_und”. This function is initiated on a randomly permuted adjacency matrix and although, in theory, all permutations of the adjacency matrix should provide the same result, some numerical discrepancies may occur during the spectral decomposition, subsequently leading to slightly different modularity fits. Specifically, the variability in the fits is driven by numerical errors in the estimation of the elements of , which can erroneously change the sign of its element. For example, if the true value of an element of is and the error is , the estimated value would be . Indeed, this has an immediate impact on the vector which will classify the corresponding node in the wrong group. To be exhaustive, we have therefore used restarts. To calculate the ARI scores, we used the function “adjustedRandIndex” in R software [58], [59] and, for the ICC and AIC, we use the R function “lmer” [60] that employs a Restricted Maximum Likelihood procedure [61] to obtain estimates of , and AIC.
|