nif:isString
|
-
To identify suitable data, we conducted online searches on the Scopus database and the Google Scholar search engine. Our searches looked for articles with the words “cooperation”, “game”, and “experiment”, together with one of “punishment”, “reputation”, or “network”, in their title and abstract. When we identified a relevant study, we then also investigated the articles cited by it. In general, we restricted our interest to experiments in groups of at least 10 (although we did not exclude the smaller eight-person groups in [31] for our analyses). We thus identified 21 relevant studies and contacted the corresponding authors. In the end, we obtained data for 18 of them.
We measure wealth with the amount of points the player has amassed at the end of the game. We then use the Gini coefficient to measure inequality in wealth. A Gini coefficient of 0 indicates perfect equality where all group members have equal resources, while 1 indicates perfect inequality where a single individual commands all of the resources. The Gini coefficient is the most common measure of inequality but nevertheless has some limitations. In particular, it is sensitive to changes in inequality among those in the middle of the distribution and not so much to those at the lower end of the distribution [51]. We replicated our analyses using another measure, the Theil index, and the results are qualitatively similar. To measure cooperativeness, we calculate the mean number (or amount) of contributions over all periods. To measure the tendency for clustering, we calculate the network assortativity by cooperativeness and wealth in the final period. For continuous measures, which is the case for cooperativeness and wealth here, network assortativity is essentially the correlation in the measure for all connected pairs [52].
To test differences between experimental conditions for inequality, we use the Mann-Whitney U test. This is a non-parametric test that does not assume a normal distribution for the residuals. It essentially checks against the null hypothesis that a randomly selected value from one condition would be equally likely to be less than or greater than a randomly selected value from the other condition. For the experiments that have a within-group design, whereby groups interact in more than one experimental condition, we considered using the Wilcoxon signed rank sum test but unfortunately, the sample sizes were always too small to have reliable results. Consequently, we only report the Mann-Whitney U in Fig 1. This limitation is not severe because our main results are based on the meta-analysis of the direction of all control-treatment differences, rather than the precise test of any individual one. To test the statistical significance of the observed differences between control and treatment across all relevant experiments, we use a meta-analysis technique known as the sign test [46]. This technique allows us to focus on whether the effect exists, rather than to accurately measure its size. We first count the number of positive and negative effects, regardless of whether they are statistically significant. We then conduct the Binomial test, testing against the null hypothesis that there is no effect in reality and thus negative and positive effects are equally likely to occur by chance. This approach is somewhat limited because it does not take into account the amount of evidence: neither the effect magnitudes nor the sample sizes. Yet, it is best suited for the research question and experimental data we have. First, although meta analyses typically aim to estimate the effect size, effect sizes in controlled social experiments are not very meaningful as they are highly sensitive to the experimental design and, in particular, aspects such as the framing of the decision situation, the monetary incentives, the experience of the participant pool, and the experimenter demand effect [53]. In relying on experimental data, our major objective here is to prove a causal relationship and the sign test is perfectly sufficient for that. Second, pooling effect sizes from studies with different designs and research populations is usually problematic in meta analysis but for us, the variation in our data sources is a boon, not a drawback. Experiments are often criticized for their lack of replicability and validity, so being able to confirm the same effect repeatedly in completely different experimental settings only reaffirms the robustness of the result. To test the significance of the observed network assortativity by cooperativeness and wealth, we need a suitable baseline. The reason is that network assortativity depends both on the connectivity patterns and on the distribution of values of interest. For the baseline, we take the actual final structure in the networks, randomly shuffle the nodes’ cooperativeness and wealth values, and estimate the new assortativity. We repeat this 2000 times, which gives us an expected distribution against which we can estimate the z-score for the empirically observed value. The z-score is estimated using Z = (Xobs − μ(Xran))/σ(Xran), where Xobs is the assortativity in the observed network, Xran is the network assortativity in the randomly shuffled network, μ is the mean, and σ is the standard deviation. If the z-score is above 1.96 (below −1.96), it implies that the chance to obtain the high (low) network assortativity we actually observe under the assumption that the network formation processes are random is less than 5%. Regardless of whether the network assortativity by cooperation and wealth is significantly different than what we would expect by chance, we can also test whether it is significantly different between experimental conditions. As with inequality, we use Mann-Whitney test to do this (S1 and S2 Figs). To estimate the correlations between individual wealth and cooperativeness, wealth and the use of punishment, cooperativeness and the use of punishment, as well as group inequality and wealth (S3–S9 Figs), we use ordinary least-square regressions with standard errors accounting for clustering by experimental groups.
|