Animat brains consist of 8 binary elements: 2 sensors, 4 hidden elements, and 2 motors (left, right), all of which can loosely be referred to as neurons. The sensors are directed upwards, separated by one unit, and are activated (set to 1) if a falling block is located directly above them (Fig. 1); otherwise a sensor is set to 0. All elements are updated from time step t to t+1 according to a transition probability matrix (TPM). In general, the TPM could be probabilistic, with transition probabilities between 0 and 1. In the present work, however, the animats' TPMs are purely deterministic, i.e., transition probabilities are either 0 or 1. The brain elements can thus be considered binary Markov variables whose values are specified by deterministic logic gates (just as the Markov brains in [13]). Note that the elements are not limited to classic logic gates, such as ANDs, ORs, or XORs, but can potentially implement any deterministic logic function over their inputs. If only one of the motors is updated to state 1, the animat moves one unit to the right (motor state 01) or left (motor state 10) in the next time step. Since no other movement was required of the animat, motor state 11 (both motors on) was chosen to be redundant with motor state 00, for which the animat does not move.

To evaluate the number of different TPMs and connectivity matrices of animats with perfect fitness in Tasks 1 and 2, the TPMs and connectivity matrices were compared in "normal form", i.e., independently of the labels of their elements, and only potentially causal connections were included in the analysis (hidden elements with only inputs from, or only outputs to, the rest of the system were excluded). To that end, for a given matrix all elements were permuted, the resulting permuted matrices were ordered lexicographically, and the first permuted matrix was chosen as the "normal form".

All animat brains are initialized without connections between their elements. Connectivity evolves indirectly during adaptation to the environment, as outlined below, following a genetic algorithm that selects, mutates, and updates the animats' genomes at each new generation. The animats' genes encode hidden Markov gates (HMGs), which in turn determine the connectivity and transition table of each brain element: each HMG has input elements, output elements, and a logic table that specifies the elements' transition table (see [6], [13] for details). In this study, the ancestral genome (generation 0) of all animats does not encode any HMG. Different from previous publications [6], [7], [13], evolution is thus not "jump-started", which avoids random causal connections in the animats' brains, but requires more generations to reach high levels of fitness.

The animats' genomes consist of at least 1,000 and at most 20,000 loci, where each locus is an integer value ∈ [0, 255]. The beginning of a gene is marked by a start codon (the consecutive loci 42 and 213), followed by two loci that respectively encode the number of inputs and outputs of one HMG. The next eight loci determine where the inputs come from and where the outputs go: because gates are allowed at most 4 incoming and at most 4 outgoing connections, 8 loci are reserved and used according to the 2 preceding loci. The subsequent loci encode the transition table of the HMG, determining the logical relations between its input and output elements.
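The gene layout described above can be made concrete with a short sketch. This is illustrative Python, not the authors' code (their framework is MATLAB-based); the exact mapping of the connection loci and logic-table loci to brain elements is an assumption here and is specified in [6], [13].

```python
# Illustrative sketch (not the authors' code): decoding HMG genes from a genome.
# The mapping of the 8 connection loci and the logic-table loci to elements is
# only schematic; see [6], [13] for the actual encoding scheme.

START_CODON = (42, 213)
N_ELEMENTS = 8          # 2 sensors, 4 hidden elements, 2 motors

def decode_genes(genome):
    """Scan a list of integer loci for start codons and decode one HMG per gene."""
    gates = []
    for i in range(len(genome) - 1):
        if (genome[i], genome[i + 1]) != START_CODON:
            continue
        j = i + 2
        n_in  = genome[j] % 4 + 1        # 1..4 inputs  (assumed mapping)
        n_out = genome[j + 1] % 4 + 1    # 1..4 outputs (assumed mapping)
        conn = genome[j + 2: j + 10]     # 8 reserved loci for connections
        inputs  = [conn[k] % N_ELEMENTS for k in range(n_in)]
        outputs = [conn[4 + k] % N_ELEMENTS for k in range(n_out)]
        # One logic-table row per input state, one entry per output (assumed layout).
        table_len = (2 ** n_in) * n_out
        table = [v % 2 for v in genome[j + 10: j + 10 + table_len]]
        gates.append({"inputs": inputs, "outputs": outputs, "table": table})
    return gates
```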
This encoding is robust in the sense that mutations that change the input-output structure of an HMG only add or remove the corresponding parts of the HMG's logic table, while the rest of the table is left intact. Encoding the connectivity and logic functions of the animats' brain elements with HMGs allows for recurrent connections between hidden elements, including self-connections. Feedback from the hidden elements to the sensors and from the motors to the hidden elements is, however, prohibited: at each time step the sensors are zeroed out before the new sensor input arrives, and the motors are zeroed out after the movement has been performed.

The animat is located at the bottom row of a 16×36 unit world with periodic boundary conditions (Fig. 2B). We chose a height of 36 units to allow the animats enough time to assess the direction and size of the falling blocks from each initial condition. Each animat is tested in 128 trials: all 16 initial block positions, with blocks moving to the right and to the left, and four potentially different block sizes. Note that in Task 1 ("catch size 1, avoid size 3") and Task 2 ("catch size 1, avoid size 2") each of the two block sizes is thus shown 2×32 = 64 times, while in Task 3 ("catch size 1+4, avoid size 2+3") and Task 4 ("catch size 3+6, avoid size 4+5") each block size is shown 32 times. In each trial a block of a certain size falls from top to bottom in 36 time steps, moving 1 unit downwards per time step and sideways always in the same direction (left or right). If at time step 36 at least one of the animat's units overlaps with the block, the block is counted as "caught", otherwise as "avoided". In Task 1, sensor state S1S2 = 11 unambiguously distinguishes size 3 blocks from size 1 blocks. In all other cases, whether a block should be caught or avoided cannot be decided from a momentary sensor state alone.

An animat's fitness F at each generation is simply the fraction of successfully caught and avoided blocks out of all 128 test trials. Starting from a set of 100 ancestral animats without HMGs, and thus without connections between elements, the animats adapt according to a genetic algorithm across 60,000 generations. At each generation, fitness is assessed for all animats in a population of 100 candidates. The most successful candidates are selected probabilistically for differential replication according to an exponential fitness measure S = 1.02^(128·F): for every successfully caught or avoided block the score is thus multiplied by 1.02. The 100 candidate animats are ranked according to S and selected into the next generation with a probability proportional to S, and thus to their fitness (roulette-wheel selection without elite). After this replication step, the new candidate pool is mutated in three ways: (a) point mutations, which occur with a probability of p = 0.5% per locus and replace the locus value by a random integer drawn uniformly from [0, 255]; (b) deletions: with 2% probability, a sequence of between 16 and 512 adjacent loci is deleted; (c) duplications: with 5% probability, a sequence of between 16 and 512 adjacent loci is duplicated and inserted at a random location within the animat's genome. The size of the sequence to be deleted or duplicated is uniformly distributed within the given range. Since duplications (5%) are more likely than deletions (2%), genomes tend to grow in size during evolution. Deletions and duplications are, however, constrained so that the genome remains between 1,000 and 20,000 loci.
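For concreteness, one generation of the selection and mutation scheme can be sketched as follows. This is an illustrative Python/numpy sketch under simplifying assumptions (genomes as numpy integer arrays), not the authors' MATLAB implementation.

```python
# Illustrative sketch of one selection/mutation step; not the authors' MATLAB code.
import numpy as np

rng = np.random.default_rng(0)

def selection_scores(n_correct):
    """Exponential score S = 1.02**(number of correct trials out of 128)."""
    return 1.02 ** np.asarray(n_correct, dtype=float)

def roulette_select(population, n_correct):
    """Fitness-proportionate (roulette-wheel) selection without elitism."""
    s = selection_scores(n_correct)
    probs = s / s.sum()
    idx = rng.choice(len(population), size=len(population), p=probs)
    return [population[i].copy() for i in idx]

def mutate(genome, p_point=0.005, p_del=0.02, p_dup=0.05,
           min_len=1000, max_len=20000):
    g = genome.copy()
    # (a) point mutations: each locus replaced with probability p_point
    hits = rng.random(len(g)) < p_point
    g[hits] = rng.integers(0, 256, size=hits.sum())
    # (b) deletion of 16-512 adjacent loci, respecting the minimum genome size
    if rng.random() < p_del:
        size = rng.integers(16, 513)
        if len(g) - size >= min_len:
            start = rng.integers(0, len(g) - size + 1)
            g = np.delete(g, slice(start, start + size))
    # (c) duplication of 16-512 adjacent loci inserted at a random location,
    #     respecting the maximum genome size
    if rng.random() < p_dup:
        size = rng.integers(16, 513)
        if len(g) + size <= max_len:
            start = rng.integers(0, len(g) - size + 1)
            insert_at = rng.integers(0, len(g) + 1)
            g = np.insert(g, insert_at, g[start:start + size])
    return g
```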
All genes are expressed. Some genes may give rise to redundant HMGs, which, however, will not be robust to mutation. Under fitness selection, the number of genes thus tends to converge to a balanced level (roughly the number of possible elements). Under random selection, only very few, rapidly changing random connections between elements appear, and existing network structures decompose within less than 1,000 generations [7].

For each task, 50 evolutionary runs of 60,000 generations are performed. At the end of each evolutionary run, the line of descent (LOD) [19] of a randomly chosen animat from the final generation is traced back to its initial ancestor at generation 0. For each evolutionary run one LOD is obtained, which captures the run's particular evolutionary history. Since reproduction is asexual, without crossover, a unique LOD can be identified for an animat from the final generation. Moreover, because all animats are part of the same niche, it makes almost no difference which animat is chosen from the final generation: going backwards across generations, the different LODs quickly coalesce into a single line [6]. We performed the full IIT analysis along each LOD every 512 generations, starting from generation 0.
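Since reproduction is asexual, an LOD is simply a chain of parent references. The following minimal sketch assumes each animat object stores a `parent` attribute set at replication, an implementation detail not given in the text.

```python
# Minimal sketch (assumed data layout): trace the line of descent of a
# final-generation animat back to its generation-0 ancestor.
def line_of_descent(final_animat):
    lod = []
    animat = final_animat
    while animat is not None:
        lod.append(animat)
        animat = animat.parent      # assumed attribute set at replication
    lod.reverse()                   # generation 0 first
    return lod

def sample_lod(lod, step=512):
    """Pick every 512th generation along the LOD for the IIT analysis."""
    return [lod[g] for g in range(0, len(lod), step)]
```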
The most recent mathematical formulation of integrated information theory ("IIT 3.0") is presented in detail in [15]. In the following we summarize the main principles and measures relevant to this study, illustrated with simple examples of neuron-like logic-gate mechanisms (Fig. 9).

Figure 9 (figure data removed from full text; identifier 10.1371/journal.pcbi.1003966.g009). Caption: The information, integration, and exclusion postulates applied at the level of mechanisms (A–C) and systems of mechanisms (D–F). (A–F) Each node is a binary logic-gate mechanism that can be in either state '0' (white) or '1' (yellow). The logic gates and their connections are represented as neural circuits rather than electronic circuits: directed connections between the nodes indicate the inputs and outputs of the logic gates. The mechanisms labeled A, B, and C correspond to system ABC = 101 shown in (D). (A) Information: Mechanism C in its current state '1' generates information as it constrains its causes (the past states of its inputs AB) and effects (the future states of its outputs AB) compared to their unconstrained distributions (gray distribution). Past and future nodes whose state is unspecified are shown in gray. (B) Integration: The elements X and Y do not form an integrated higher order mechanism, since XY is reducible to its component mechanisms X and Y (φ = 0). However, the elements AB in state '10' do form a higher order mechanism, since AB specifies both irreducible causes and irreducible effects (the minimum information partition (MIP) on both the cause and the effect side leads to a loss of information). Integrated information φ of AB = 10 is evaluated as the minimum of the cause and effect integrated information, φ = min(φCause, φEffect), here φ = φEffect = 0.25, taking all inputs and outputs of AB into account. The overall MIP of AB over all its inputs and outputs is thus MIPEffect, labeled in red. (C) Exclusion: Of all input-output combinations of mechanism AB, the "concept" of AB = 10 is its maximally irreducible cause repertoire, here over all input elements ABC (φCause = 0.33, same as in (B)), together with its maximally irreducible effect repertoire, here over output element C only (φEffect = 0.5). This means that AB has its maximally irreducible effect repertoire specified on C, not on ABC or any other output combination. The concept's integrated information is φMax = min(φCause, φEffect) = φCause = 0.33; its overall MIP is MIPCause, labeled in red. (D) System information: The system ABC = 101 gives rise to a conceptual structure with 4 concepts. (E) System integration: The system WXYZ is reducible into the subsets WX and YZ; WXYZ cannot exist as a system from the intrinsic perspective. By contrast, system ABC is irreducible. Its minimum information partition (MIP) leaves the concepts of A and B intact, but destroys concepts C and AB. Integrated conceptual information Φ(ABC) is evaluated as the difference between the whole conceptual structure C and the partitioned conceptual structure CMIP (see Text S2 in [15]). (F) System exclusion: Of all sets of elements in this larger system, the set ABC has ΦMax and thus forms the main "complex". ABCD, for example, also specifies integrated conceptual information Φ, but cannot form another complex since it overlaps with ABC and Φ(ABC) > Φ(ABCD) (see Fig. 2).

From the intrinsic perspective of a system, a mechanism has a causal role within the system (a "difference that makes a difference") if its present state constrains the potential past and future states of the system compared to the unconstrained distribution (the distribution of past and future states if all input states to each element are equally likely). This is assessed by perturbing the system into all possible states and observing the effects on the system [25], [46], yielding a transition probability matrix (TPM) that contains the probability of transitioning from each system state to every other system state. As a simple example, Fig. 9A shows mechanism C in state '1'. C is an XOR logic gate that receives inputs from elements A and B and outputs again to A and B. The full system ABC is displayed in Fig. 9D. The fact that C is an XOR gate and that at present it is in state C = 1 inherently constrains the past state of the system (the "cause repertoire" p(AB^p | C^c = 1): only AB = [10, 01] are possible causes) as well as the future state (the "effect repertoire" p(AB^f | C^c = 1): AB = [11] is the only possible effect), as compared to the unconstrained distribution (indicated in gray). The superscripts p, c, and f label "past", "current", and "future" system subsets. The same approach can be used to evaluate the cause-effect repertoire of higher-order mechanisms (combinations of elements), such as AB, which is used in what follows to illustrate the notion of integration. In sum, the cause and effect repertoires of a mechanism are conditional probability distributions over sets of system elements, obtained not from observed distributions (as done for correlational measures), but by considering all system states with equal probability.

From the intrinsic perspective, only integrated information matters: the whole has to specify a cause-effect repertoire that is not reducible to that of its parts (Fig. 9B). Irreducibility is assessed by causally partitioning subsets of elements, introducing noise into the connections between them [14], [15]. The partitioned cause-effect repertoire then corresponds to the product distribution of the cause-effect repertoires specified by the parts. If a mechanism can be partitioned without loss of information, as in the case of XY in Fig. 9B (left), the combined mechanism XY cannot have a causal role above and beyond the causal roles of X and Y separately.
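The perturbation-based cause repertoire of the XOR example can be reproduced with a few lines. This is an illustrative sketch, not the paper's code; only the cause side is shown, because the effect repertoire additionally requires the mechanisms of A and B, which are specified in Fig. 9 but not restated in the text.

```python
# Illustrative sketch: the cause repertoire of a mechanism in a state, obtained by
# perturbing its inputs into all states with equal probability and keeping the
# input states compatible with the current output. Shown for XOR element C = 1
# of Fig. 9A, whose inputs are A and B.
from itertools import product

def cause_repertoire(mechanism_fn, current_state, n_inputs):
    """Uniform prior over past input states, conditioned on the current output."""
    past_states = list(product((0, 1), repeat=n_inputs))
    compatible = [s for s in past_states if mechanism_fn(s) == current_state]
    p = 1.0 / len(compatible) if compatible else 0.0
    return {s: (p if s in compatible else 0.0) for s in past_states}

xor = lambda s: s[0] ^ s[1]
print(cause_repertoire(xor, 1, 2))
# {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
# -> only AB = 10 or AB = 01 are possible causes of C = 1, as stated above.
```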
By contrast, the elements A and B of the example system ABC do form a 2nd order mechanism AB (Fig. 9B, right). This is because AB constrains the past and future of the system ABC more than A and B do separately. The amount of integrated (irreducible) information a mechanism M specifies in its current state s0 is quantified by φ, which measures the distance between the whole and the partitioned cause-effect repertoire. The partition used to evaluate φ is the minimum information partition (MIP), the partition that makes the least difference. φ is determined on both the cause and the effect side,

φCause(M = s0, P) = D(pcause(P | M = s0) ‖ pcause_MIP(P | M = s0)) and
φEffect(M = s0, F) = D(peffect(F | M = s0) ‖ peffect_MIP(F | M = s0)),

where P and F denote a set of system elements in the past and future, respectively, and pcause_MIP and peffect_MIP are the repertoires specified under the MIP.

Differences D between distributions are assessed via the earth-mover's distance (EMD). Generally, the EMD quantifies the minimal cost of transforming one probability distribution into another over a "ground distance" between system states [15], [47]–[49]. Contrary to the commonly used Kullback-Leibler divergence [31], the EMD is a metric: it is symmetric, bounded, and takes the distance between individual system states into account, here measured by their Hamming distance. The state '110', for example, is more distant from '001' (Hamming distance of 3) than from '100' (Hamming distance of 1). Transporting p = 0.25 from state '110' to '001' thus corresponds to an EMD of 0.25×3 = 0.75, while transporting it to '100' corresponds to an EMD of 0.25.

From the intrinsic perspective of the system, the amount of integrated information specified by a mechanism in a state cannot be more than either the cause or the effect integrated information, so the minimum of the two is taken [15]: φ(M = s0) = min(φCause, φEffect). Finally, again from the intrinsic perspective, the mechanism's causal role within the system in its current state can only be a single one, corresponding to the cause-effect repertoire that is maximally irreducible (a mechanism cannot perform multiple input-output functions over an overlapping set of elements [15]). Thus, φ must be calculated for all possible input and output combinations of the mechanism. For the example mechanism AB (Fig. 9C), the cause repertoire of AB = 10 over all past elements ABC is the maximally irreducible cause repertoire, with φCause = 0.33 (transporting p = 0.33 from state '010' to state '000'). On the effect side, the effect repertoire of AB = 10 over future element C is the maximally irreducible one, with φEffect = 0.5 (transporting p = 0.5 from state '1' to state '0'; compare to Fig. 9B: the effect repertoire of AB = 10 over all elements ABC only has φEffect = 0.25). The maximally irreducible cause and effect repertoire, with φMax(AB = 10) = min(φCause, φEffect), defines the mechanism's "concept", the core causal role of the mechanism in its current state from the intrinsic perspective of the system itself. Following a principle of causal exclusion, all other inputs and outputs of the mechanism are treated as unconstrained.
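The Hamming-distance EMD example above (0.25×3 = 0.75) can be reproduced as a small transportation linear program. This is a minimal illustration using scipy's LP solver; the paper itself used the fast EMD code of Pele and Werman [49].

```python
# EMD between two distributions over 3-bit states, with Hamming ground distance.
# Minimal sketch using scipy's LP solver, not the implementation used in the paper.
import numpy as np
from itertools import product
from scipy.optimize import linprog

states = list(product((0, 1), repeat=3))
n = len(states)
hamming = np.array([[sum(a != b for a, b in zip(s, t)) for t in states] for s in states])

def emd(p, q, cost):
    """Minimal cost of transforming distribution p into q (transportation LP)."""
    c = cost.flatten()                        # objective: sum_ij flow_ij * cost_ij
    A_eq, b_eq = [], []
    for i in range(n):                        # mass leaving state i equals p_i
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.flatten()); b_eq.append(p[i])
    for j in range(n):                        # mass arriving at state j equals q_j
        col = np.zeros((n, n)); col[:, j] = 1
        A_eq.append(col.flatten()); b_eq.append(q[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

# Worked example from the text: p puts 0.25 on '110' where q puts it on '001'
# (shared mass of 0.75 on '000'); the EMD is 0.25 * Hamming('110','001') = 0.75.
p = np.zeros(n); q = np.zeros(n)
p[states.index((0, 0, 0))] = 0.75; q[states.index((0, 0, 0))] = 0.75
p[states.index((1, 1, 0))] = 0.25; q[states.index((0, 0, 1))] = 0.25
print(emd(p, q, hamming))   # -> 0.75
```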
System of mechanisms and main complex: At the system level, the set of all concepts specified by a system of mechanisms in its current state constitutes a conceptual structure (Fig. 9D). For example, the system ABC = 101 specifies a conceptual structure comprising 4 concepts: the 3 elementary (1st order) concepts of its elementary mechanisms A, B, and C, and the 2nd order concept AB. As for a mechanism and its causal role, a system of mechanisms forms a "causal entity" from its own intrinsic perspective only if the conceptual structure it specifies cannot be reduced to that specified by its parts. Specifically, each part of the system must have both causes and effects in the other part ("strong integration", Fig. 9E, middle); otherwise some elements could never influence the system or be influenced by it. Irreducibility at the level of systems of mechanisms is quantified by partitioning the system elements unidirectionally, meaning that the inputs or outputs of a subset of elements are rendered causally ineffective by noise. Integrated conceptual information Φ ("big phi") measures the difference between the conceptual structure C of the whole system S in state s0 and the conceptual structure CMIP of the partitioned system: Φ(S = s0) = D(C ‖ CMIP). The difference D between two conceptual structures is evaluated using an extended version of the earth-mover's distance (EMD), which quantifies the minimal cost of transforming the conceptual structure C of the whole into the conceptual structure CMIP of the partitioned set of elements. Instead of probabilities, in the extended EMD it is the φ values of the concepts that are redistributed from conceptual structure C to CMIP. Instead of the Hamming distance, the "ground distance" between the concepts of C and CMIP is given by the EMD between their cause-effect repertoires. Since ΣφMax over all concepts of C is usually higher than that of CMIP, any residual φMax is transported to the "null" concept (the unconstrained distribution). For more details and an explicit example see Text S2 in [15].

Within a system, many sets of elements can potentially give rise to integrated conceptual structures. From the intrinsic perspective of a system, however, there can be only a single conceptual structure over a set of elements, with no overlap with other conceptual structures, thereby barring a multiplication of causes and effects (causal exclusion). Once again, the relevant conceptual structure is the one that is maximally irreducible (ΦMax), and the corresponding set of elements constitutes a "complex", a self-defined causal entity within the system. The complex with maximal Φ in the system is called the "main complex" (MC). Note that, in principle, ΦMax should be evaluated over the spatio-temporal scale at which causal interactions are strongest [50]. Since an animat's MC comprises at most 4 hidden Markov elements, we assume these micro elements to be the relevant spatio-temporal scale.

In the system shown in Fig. 9F, ABC forms the main complex (see also Fig. 2D). Note that, if a subset of a system is analyzed, such as ABC in Fig. 9F, the remaining elements act as background conditions (fixed external constraints). The number of elements, the number of concepts, and the Φ value of a main complex are measures of integration in a system (Fig. 9F: 3 MC elements, 4 MC concepts, and ΦMax = 0.92). Note that a complex always consists of at least 2 elements, since a single element, even if it has memory in the form of a self-loop, cannot be partitioned. Moreover, feed-forward structures cannot give rise to a complex and have Φ = 0. For the same reason the whole system S1S2 ABCD M1M2, as well as every subsystem that includes a sensor or motor element, is not integrated and has Φ = 0. Modular mechanisms (feed-forward chains, self-loops, and mechanisms outside the main complex) can of course also contribute to the evolutionary success (fitness) of an organism. The number of concepts in the whole system, here S1S2 ABCD M1M2, provides a measure of all causal relations in the system, modular and integrated, and the sum of their φMax values is a measure of their combined strength (see Fig. 3B: 6 concepts, A, B, C, D, AB, AC; ΣφMax = 1.08).
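Schematically, the exclusion step at the system level amounts to a search over candidate element sets. In the sketch below, `big_phi` is a hypothetical placeholder for the full Φ computation, which in the paper is performed by custom MATLAB software [51].

```python
# Schematic search for the main complex: evaluate Phi over candidate subsets of the
# hidden elements and keep the maximally irreducible one. big_phi() is a placeholder
# for the full IIT 3.0 computation, not an implementation of it.
from itertools import combinations

def main_complex(hidden_elements, state, big_phi):
    best_subset, best_phi = None, 0.0
    for size in range(2, len(hidden_elements) + 1):       # a complex has >= 2 elements
        for subset in combinations(hidden_elements, size):
            phi = big_phi(subset, state)                   # remaining elements act as
            if phi > best_phi:                             # fixed background conditions
                best_subset, best_phi = subset, phi
    return best_subset, best_phi
```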
Table 1 shows the average (nonparametric) Spearman rank correlation coefficients across all 50 LODs for all evaluated IIT measures in Tasks 1–4. Complementary histograms of the correlation coefficients of all individual LODs are shown in S1 Fig. Correlation coefficients were calculated on ranked variables (i.e., using Spearman's instead of Pearson's correlation coefficient), since the amount by which fitness increases is not expected to depend linearly on any of the causal measures: initial increases in fitness can be large simply because there is more room for large improvements early on than at later generations, when the animat has already reached a high level of fitness. Error margins throughout this article denote the SEM.

Since none of the measured variables was found to be normally distributed for all task conditions (Kolmogorov-Smirnov test for normality) and variances between tasks differed for some of the measures, statistical differences were evaluated using a Kruskal-Wallis test, the nonparametric equivalent of a one-way ANOVA. For all statistical tests across task conditions after adaptation, measures were averaged over the last 3,000 generations (6 data points). Tasks 1–4 were compared (see Fig. 3), first, taking all 50 independent LODs of each task into account, despite the lower average fitness reached in Tasks 3 and 4. In this set, statistical differences were found for the number of concepts, ΣφMax, and ΦMax (p = 0.001/0.002/0.016), but not for the number of MC concepts and MC elements. Second, Tasks 1–4 were compared at the same level of fitness, taking into account only a subset of LODs with high final fitness in Tasks 3 and 4 (the 9 and 7 fittest LODs, respectively). These subsets were selected as the fittest LODs of Tasks 3 and 4 whose average fitness across the last 5,000 generations was closest to that achieved on average in Task 1. Compared at the same level of fitness, all IIT measures showed statistical differences (p = 0.000/0.000/0.003/0.003/0.000 for #concepts/ΣφMax/#MC elements/#MC concepts/ΦMax). Moreover, the standard Task 1 was compared to Task 1 with one sensor only, one motor only, and 1% sensor noise (Figs. 6–8). All measures showed significant differences (p = 0.000) when all 50 LODs of each condition were taken into account, and also when a subset of LODs with high fitness was compared (again, p = 0.000 for all measures). Differences between pairs of task conditions reported in the Results section were assessed by post-hoc Mann-Whitney U tests.

Custom-made MATLAB software was used for all calculations. The program to calculate the complex of a small system of logic gates and its constellation of concepts is available at [51]. EMD calculations within the IIT program were performed using the open-source fast MATLAB code of Pele and Werman [49]. The IBM SPSS software package was used for statistical analysis.
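The statistical analysis was performed in SPSS; purely for illustration, the same nonparametric tests are available in scipy.stats. The data arrays below are synthetic placeholders, not values from the paper.

```python
# Illustrative only: the paper's statistics were computed in SPSS. scipy.stats
# provides the same nonparametric tests; all arrays here are synthetic placeholders.
import numpy as np
from scipy.stats import spearmanr, kruskal, mannwhitneyu

rng = np.random.default_rng(1)
fitness    = rng.random(118)               # e.g., values sampled every 512 generations
n_concepts = fitness + 0.1 * rng.random(118)

rho, p = spearmanr(fitness, n_concepts)    # rank correlation along one LOD
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")

# Kruskal-Wallis test across task conditions (nonparametric one-way ANOVA),
# followed by a post-hoc Mann-Whitney U test for a pair of conditions.
task1, task2, task3, task4 = (rng.random(50) for _ in range(4))
H, p_kw = kruskal(task1, task2, task3, task4)
U, p_mw = mannwhitneyu(task1, task3)
print(f"Kruskal-Wallis p = {p_kw:.3g}; Mann-Whitney U (Task 1 vs 3) p = {p_mw:.3g}")
```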