To describe the probabilistic inference algorithm used for estimating the parameters, we first write the likelihood of the entire observed network adjacency matrix in terms of our model (1); this is equation (3). For a dyadic model, the likelihood factorizes into terms that each involve the parameters associated with only two nodes. Commonly, the parameters and hidden variables of such a model are estimated with maximum likelihood (ML) techniques, either in the form of an expectation-maximization type algorithm or via Monte Carlo sampling [40]. We prefer a Bayesian approach based on maximum a posteriori (MAP) estimates, which does not incur the computational cost of Monte Carlo sampling while being less sensitive to initial conditions and numerically more stable than ML, especially as the parameters that maximize (3) may lie on the borders of the admissible interval. Furthermore, the MAP approach provides a natural Occam's razor: the posterior distributions of the parameter estimates can only shrink in variance as more data are provided, whereas the ML approach assumes point estimates, i.e. delta functions, for the posterior from the start. This is an important feature of the Bayesian approach, as it provides a natural limit on the number of inferred classes and confidence levels for the assignments. Classes cannot be arbitrarily small if the posterior for the inter-class link preference is to be localized. In contrast, under an ML approach the likelihood increases monotonically as more, and hence smaller, classes are used, so that model selection criteria, as in [19], are needed. Finally, Bayesian techniques offer a principled way to incorporate prior domain knowledge when seeking an accurate approximate marginal posterior distribution for each of the model parameters.

A message-passing, or belief propagation, algorithm provides a principled way to calculate these approximate posterior marginal distributions [41], [42]. The starting point for this algorithm is a so-called factor or dependency graph: a graphical representation of the probabilistic dependencies between the variables (model parameters) we wish to infer from the data and the individual factors that constitute the likelihood (3). Figure 5A shows this for the case of a bipartite network, the likelihood (3) and model (1).
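As an illustrative sketch of such a dyad-factorized likelihood (the exact parametrization of model (1) is not reproduced here, so the symbols below are assumptions made for illustration), consider a Bernoulli link model in which actor $i$ carries class label $\sigma_i$ and activity $a_i$, event $j$ carries class label $\tau_j$ and popularity $b_j$, and classes interact through a preference matrix $\pi$:

\[ P(A \mid \sigma, \tau, a, b, \pi) \;=\; \prod_{i,j} p_{ij}^{A_{ij}} \,(1 - p_{ij})^{1 - A_{ij}}, \qquad p_{ij} = a_i\, b_j\, \pi_{\sigma_i \tau_j}. \]

Each factor in the product involves only the parameters of the two nodes forming the dyad $(i,j)$, which is precisely the factorization property exploited by the message-passing scheme described below.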
Figure data removed from full text. Figure identifier and caption: 10.1371/journal.pone.0021282.g005 Factor graphs and an example of an elementary message passing update. Factors of the likelihood function are represented as squares, variables of the generative model as circles. Connections indicate which variables enter the calculation of which factor. a) For a bipartite actor-event network represented by an adjacency matrix, the class label and activity of an actor enter in the calculation of all factors in the corresponding row. Equivalently, the class label and popularity of an event enter in the calculation of all factors in the corresponding column. The variables denoting the preference of actors in one class for events in another enter in every factor. Note that while each factor depends on only a handful of variables, the actor variables enter in the calculation of a full row of factors, the event variables in a full column, and the preference variables in all factors. b) Pictorial representation of the messages involved in calculating the message sent from a factor to a variable according to equation (9). c) For directed networks represented by non-symmetric adjacency matrices, the factors correspond to dyads. In addition to the interclass preference matrix, a symmetric matrix of reciprocities is included in the model. Every node carries a single class label, activity and attractiveness parameter. The variables associated with a node enter in the calculation of factors in both the corresponding row and column.
The algorithm proceeds by exchanging messages (conditional probabilities) between factors and variables connected in the dependency graph until convergence. Using the definitions (4), one can interpret an R-Message as the likelihood of a single observed matrix entry given only the corresponding parameter and the entire data matrix except for that entry. Equally, a Q-Message is interpreted as the posterior probability distribution of a parameter given the entire data matrix except for one entry. For the sake of notational economy, we identify functions by their arguments; messages with different arguments are thus different functions, not the same function evaluated at different points, as should be clear from the definitions (4). Formally, we obtain the R-Message sent from a factor to a variable by integrating out all other parameters from a likelihood function, as in (5). Using the independence of the given data entries, we can readily identify this likelihood with the corresponding factor of (1). Assuming that the joint distribution factorizes with respect to every single variable, one obtains the closed set of equations (6). Although the factorization assumption may seem strong, it merely means that the Q-Messages for any two distinct variables are assumed independent. Given that these distributions are conditioned on the entire data matrix except for a single entry, the error introduced by this assumption is negligible for large systems. The form of the Q-Message in (6) follows directly from Bayes' theorem and is the distribution through which we include prior information. These equations can be iterated until convergence, after which we finally obtain the desired approximate marginal posterior distribution for every single parameter, as given in (7).

To illustrate these ideas, explicit update equations for the inference of the hidden class index of a node are given below; expressions for the other parameters are reported in Material S1. With the abbreviations in (8), we can write the R- and Q-Messages exchanged between factors and class variables as in (9). The dependency graph greatly facilitates setting up these update equations. The rules are that R-Messages are always sent from factors to variables and Q-Messages from variables to factors; in an R-Message we sum or integrate over the incoming Q-Messages, while a Q-Message is proportional to the product of the incoming R-Messages. Following these rules, the update equations can be read off the dependency graph directly. Figure 5B shows a detail of Figure 5A, focusing on a single factor to illustrate the messages involved in calculating the R-Message it sends to a class variable, as in (9). Figure 5C illustrates the update equations in the case of directed uni-partite networks (cf. Material S1).
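To make the update scheme concrete, the following minimal Python sketch iterates the class-label messages for a bipartite network. It is not the code of the paper or of Material S1: it assumes a Bernoulli link model with fixed activities, popularities and interclass preferences, a flat prior over classes, and a damped update for stability, and all function and variable names (e.g. bp_class_labels, q_actor, r_event) are illustrative.

```python
import numpy as np

def bp_class_labels(A, pi, a, b, n_iter=50, damping=0.5, eps=1e-12):
    """Loopy belief propagation for hidden class labels in a bipartite
    actor-event network. Illustrative sketch only: activities a, popularities b
    and the interclass preference matrix pi are held fixed, and a flat prior
    over classes is assumed."""
    N, M = A.shape
    K, L = pi.shape

    # Bernoulli factor f_ij(sigma, tau) for every dyad and class combination.
    link_prob = np.clip(a[:, None, None, None] * b[None, :, None, None]
                        * pi[None, None, :, :], eps, 1 - eps)        # (N, M, K, L)
    f = np.where(A[:, :, None, None] == 1, link_prob, 1 - link_prob)

    # Q-messages from class variables to factors, initialised uniformly.
    q_actor = np.full((N, M, K), 1.0 / K)    # Q_{sigma_i -> f_ij}
    q_event = np.full((N, M, L), 1.0 / L)    # Q_{tau_j  -> f_ij}

    def normalise(log_msg):
        # Normalise a message given in log space, guarding against underflow.
        log_msg = log_msg - log_msg.max(axis=-1, keepdims=True)
        msg = np.exp(log_msg)
        return msg / msg.sum(axis=-1, keepdims=True)

    for _ in range(n_iter):
        # R-messages: integrate the factor against the incoming Q-message.
        r_actor = np.einsum('ijkl,ijl->ijk', f, q_event)   # factor -> sigma_i
        r_event = np.einsum('ijkl,ijk->ijl', f, q_actor)   # factor -> tau_j
        log_r_actor = np.log(np.clip(r_actor, eps, None))
        log_r_event = np.log(np.clip(r_event, eps, None))

        # Q-messages: product of all *other* incoming R-messages (cavity product),
        # done in log space by subtracting the excluded factor's contribution.
        new_q_actor = normalise(log_r_actor.sum(axis=1, keepdims=True) - log_r_actor)
        new_q_event = normalise(log_r_event.sum(axis=0, keepdims=True) - log_r_event)

        # Damped update improves numerical stability on loopy graphs.
        q_actor = damping * q_actor + (1 - damping) * new_q_actor
        q_event = damping * q_event + (1 - damping) * new_q_event

    # Approximate posterior marginals: product of all incoming R-messages.
    log_r_actor = np.log(np.clip(np.einsum('ijkl,ijl->ijk', f, q_event), eps, None))
    log_r_event = np.log(np.clip(np.einsum('ijkl,ijk->ijl', f, q_actor), eps, None))
    marg_actor = normalise(log_r_actor.sum(axis=1))
    marg_event = normalise(log_r_event.sum(axis=0))
    return marg_actor, marg_event
```

On a small synthetic adjacency matrix, calling marg_actor, marg_event = bp_class_labels(A, pi, a, b) returns, for every actor and event, an approximate posterior distribution over class labels, in the spirit of (7); updating the activity, popularity and preference parameters as well would follow the same factor-to-variable and variable-to-factor rules stated above.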