Wednesday 19 March 2008

Notes on "The Adaptive Advantage of Symbolic Theft over Sensorimotor Toil: Grounding Language in Perceptual Categories"

"The Adaptive Advantage of Symbolic Theft Over Sensorimotor Toil: Grounding Language in Perceptual Categories" is by A. Cangelosi and S. Harnad and it was published in 2002.

The paper is available at Cogprints, CiteSeer and IngentaConnect.

This paper is important because it details how intelligence and language may emerge from evolutionary processes.

There is a journal called "Evolution of Communication."

The following terms may be useful: semiotics, ontogenesis, epigenesis, mimesis and epigenetic. I only have an approximate idea of what they all mean.

1.0 Language Evolution: A Martian Perspective

(Q) What does it mean to be "grounded in toil?"

(A)

(Q) What is toil?

(A)

1.1 The Symbol Grounding Problem

?

1.2 Categorical Perception

?

2.0 The Mushroom World

The following is a description of the experiment world.

Type A mushrooms have black spots on top and agents should eat these. Type B mushrooms have dark stalks and agents should mark these. Type AB mushrooms have black spots and dark stalks. Agents should eat, mark and return to these. Type C, D and E mushrooms have other features. Agents should not eat, mark or return to these.

Agents can move, eat, mark and signal. Agents have two innate signals. Agents use one signal when eating and another signal when marking. In addition to mushroom features, agents also receive signals from other foragers.

I would change this to extend the program. Agents should learn to signal. Agents should also learn to move, eat, mark and listen. Agents should learn to need to eat. These behaviours may be part of initial instinctive behaviours but agents should learn to override behaviours when they are no longer useful.

The world is a 20 by 20 grid.

Mushroom types fall into four categories. The types are 00, A0, 0B and AB. The world has 40 randomly located mushrooms. There are 10 mushrooms of each category.
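
The world described so far can be sketched in a few lines of Python. This is only my reading of the notes, not the authors' code. It assumes at most one mushroom per cell, which the notes do not state, and the names make_world, GRID_SIZE, CATEGORIES and PER_CATEGORY are made up for illustration.

```python
import random

GRID_SIZE = 20                          # the world is a 20 by 20 grid
CATEGORIES = ["00", "A0", "0B", "AB"]   # the four mushroom categories
PER_CATEGORY = 10                       # 10 mushrooms of each category, 40 in total


def make_world(seed=None):
    """Return a dict mapping (x, y) cells to a mushroom category."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(GRID_SIZE) for y in range(GRID_SIZE)]
    # Assumption: at most one mushroom per cell, so sample cells without replacement.
    positions = rng.sample(cells, PER_CATEGORY * len(CATEGORIES))
    labels = [c for c in CATEGORIES for _ in range(PER_CATEGORY)]
    return dict(zip(positions, labels))


world = make_world(seed=1)
```

Passing a seed makes the layout repeatable, which is handy when comparing runs.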

Marking has two functions. Category 0B and AB mushrooms have a toxin that is painful when inhaled. However, if agents mark the mushrooms by digging into the earth after exposure, this blocks negative effects.

Wherever AB mushrooms appear, more will follow so it is useful to return to marked spots.

Cangelosi and Harnad used the direction of the agent to calculate the angles to mushroom positions. The angles are normalised to the interval [0, 1].
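
One way the angle input might be computed is sketched below. The notes do not give the exact formula, so this assumes the heading is an angle in radians measured counter-clockwise from the positive x axis, and that the relative angle is divided by a full turn to land in the interval. The function name normalised_angle is my own.

```python
import math


def normalised_angle(agent_xy, heading_rad, mushroom_xy):
    """Angle from the agent's heading to the mushroom, scaled into [0, 1]."""
    dx = mushroom_xy[0] - agent_xy[0]
    dy = mushroom_xy[1] - agent_xy[1]
    bearing = math.atan2(dy, dx)                         # absolute direction to the mushroom
    relative = (bearing - heading_rad) % (2 * math.pi)   # measured from the agent's heading
    return relative / (2 * math.pi)


# A mushroom straight ahead gives 0.0; one a quarter turn to the left gives 0.25.
ahead = normalised_angle((0, 0), 0.0, (5, 0))
left = normalised_angle((0, 0), 0.0, (0, 5))
```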

Cangelosi and Harnad encoded visual features as binary values. As mentioned, there are five features.

Cangelosi and Harnad also encoded calls as binary values. As mentioned, there are three types of calls.
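
A binary encoding like this could look as follows. This is a guess at the layout: it assumes one bit per feature (A to E) and one bit per call, and that the three calls are "eat", "mark" and "return". The names FEATURES, CALLS and encode are hypothetical.

```python
FEATURES = ["A", "B", "C", "D", "E"]   # five visual features, one bit each (assumed order)
CALLS = ["EAT", "MARK", "RETURN"]      # three calls, one bit each (assumed names)


def encode(present, vocabulary):
    """Return a list of 0/1 values, one per vocabulary entry."""
    return [1 if name in present else 0 for name in vocabulary]


ab_features = encode({"A", "B"}, FEATURES)   # an AB mushroom shows features A and B
eat_call = encode({"EAT"}, CALLS)            # hearing only the "eat" call
```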

3.0 The Neural Network and Genetic Algorithm

3.1 The Network Architecture

Each agent has a feed forward neural network.

(Q) What is a feed forward architecture? What other types of architectures are there?

(A)

The network has eight input units, five hidden units and eight output units.

Five input units are for visual features.

(C) I dislike how it just happens that there are five input units and there are five visual features. There has to be a better way.

Three input units are for calls.

(C) I am assuming these are for calls from other foragers.

One input unit is for the angle to the closest mushroom. This connects straight to the output layer.

(C) I like the diagram of the network architecture because it quickly conveys a lot of information.

Two output units are for the four possible movements.

(C) I dislike how the units tie so closely to the four possible movements. It is so reflexive. There does not seem to be any decision-making, remembering, deciding to remember, etc.

Three output units are for the three possible actions.

(C) Again, I dislike how three output units link to three possible actions. How could these output units control more actions or fewer actions?

Three output units are for the three possible calls.

(C) My complaint is similar again. I like how another paper describes how agents created their own signals. That is more in line with evolutionary emergent intelligence and communication.
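
To make the architecture concrete for myself, here is a rough forward pass in plain Python. It is a sketch under several assumptions that the notes do not confirm: the five feature units and three call units feed the hidden layer, the angle unit feeds every output unit directly, all units are sigmoid, there are no bias terms, and the weights start random. The names make_network and forward are my own.

```python
import math
import random

N_FEATURES, N_CALLS, N_HIDDEN, N_OUTPUT = 5, 3, 5, 8


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_network(seed=0):
    """Random weights for input-to-hidden, hidden-to-output and angle-to-output links."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "in_hidden": rand_matrix(N_HIDDEN, N_FEATURES + N_CALLS),
        "hidden_out": rand_matrix(N_OUTPUT, N_HIDDEN),
        "angle_out": [rng.uniform(-1, 1) for _ in range(N_OUTPUT)],  # assumed: angle reaches all outputs
    }


def forward(net, features, calls, angle):
    """One activation of the network; returns the outputs split by role."""
    inputs = features + calls
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in net["in_hidden"]]
    outputs = [
        sigmoid(sum(w * h for w, h in zip(row, hidden)) + a * angle)
        for row, a in zip(net["hidden_out"], net["angle_out"])
    ]
    return {
        "movement": outputs[0:2],   # two units for the four movements
        "action": outputs[2:5],     # three units for the three actions
        "call": outputs[5:8],       # three units for the three calls
        "hidden": hidden,
    }


net = make_network(seed=0)
out = forward(net, [1, 1, 0, 0, 0], [0, 0, 0], 0.25)
```

Two movement units can cover four movements if each unit is read as a binary digit, which may be how the paper does it, but that is my guess.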

3.2 Network Training

Foragers make 100 actions per epoch for 20 epochs.

(Q) Do all the foragers go at once?

(A)

Each action has two network activations. One is for the action. Another is for the imitation.

(Q) What does that mean?

(A)

After an action, the neural network undergoes a back propagation algorithm.

(Q) Does back propagation occur after the two activations or are the activations, back propagations?

(A)

The agent compares the network's actions and call outputs with what they should have been.
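
As far as I understand it, comparing outputs with targets and adjusting the weights is what back propagation does. The sketch below is a generic textbook version, not the paper's exact procedure. It assumes sigmoid units, no bias terms, a squared error and a learning rate of 0.5, and it trains only six outputs (three action units and three call units). The input and target patterns are made up.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(w_ih, w_ho, inputs, targets, rate=0.5):
    """One backpropagation update in place; returns the squared error before the update."""
    # Forward pass.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_ih]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_ho]

    # Output deltas: (target - output) times the sigmoid derivative.
    d_out = [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]
    # Hidden deltas: output deltas propagated back through the current weights.
    d_hid = [
        h * (1 - h) * sum(d_out[k] * w_ho[k][j] for k in range(len(w_ho)))
        for j, h in enumerate(hidden)
    ]

    # Weight updates.
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] += rate * d_out[k] * hidden[j]
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] += rate * d_hid[j] * inputs[i]

    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


rng = random.Random(0)
w_ih = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]   # 8 inputs -> 5 hidden
w_ho = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(6)]   # 5 hidden -> 6 outputs
inputs = [1, 0, 0, 0, 0, 0, 0, 0]   # hypothetical: feature A present, no calls heard
targets = [1, 0, 0, 1, 0, 0]        # hypothetical: "eat" action and "eat" call
errors = [train_step(w_ih, w_ho, inputs, targets) for _ in range(200)]
```

The list errors should shrink over the 200 steps, which is all "learning" means here: the outputs drift towards the supplied targets.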

(Q) Is that part of the back propagation algorithm?

(A)

(C) I do not like how the algorithm compares outputs with what they should have been. It just does not seem like an appropriate way to scale up. I mean the agent is not actually going through any reasoning process. The agent is just undergoing conditioning for an automatic response.

This is how the forager learns to categorise mushrooms by performing the correct action and call.

(C) This just does not sit right with me. How can I explain why clearly? It still feels like certain mechanisms are lacking so this approach cannot scale up. It is like saying when you see X, do Y but the reason for doing so is not apparent to the agent. This is important because without this, further sophisticated language cannot emerge and the agent cannot build knowledge on top of knowledge.

(Q) How does the agent figure out what is correct?

(A)

(C) Agents need some way of relating things to themselves and to other things. I would prefer if agents relate things in terms of how they are useful to the agent. Again, this is an important part of being able to scale up.

Imitation is in reference to hearing calls from other agents. Somehow, the agents only receive correct calls for a given mushroom as input.

(Q) How do the agents only receive correct calls for a given mushroom?

(A)

3.3 Natural Selection

The population is subject to natural selection.

There are 100 foragers and this number remains constant.

(Q) How does it remain constant if natural selection is happening?

(A)

The fitness formula considers points given to foragers for performing correct actions for a given mushroom.

(C) I dislike when experiments use genetic algorithms like this. It is so limiting to narrow a genetic algorithm like this. I prefer when experiments use genetic algorithms to adapt agents to an ecological niche. I like the idea of ecological niche as put forth by Pfeifer and Bongard, 2007.

At the end of a forager's life cycle, the experiment selects 20 foragers to produce five offspring each.
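
The arithmetic of this selection step can be sketched as below: 20 parents with five offspring each gives 100 foragers again. The sketch assumes offspring are plain copies of the parent genotype; the notes do not describe mutation or crossover, so none is included. The genotypes and fitness scores are toy values, and next_generation is a made-up name.

```python
import random

POPULATION = 100
PARENTS = 20
OFFSPRING_PER_PARENT = 5


def next_generation(population, fitness):
    """population: list of genotypes; fitness: list of scores in the same order."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    parents = [population[i] for i in ranked[:PARENTS]]
    # Each parent contributes five copies, so 20 * 5 = 100 and the size is unchanged.
    return [list(parent) for parent in parents for _ in range(OFFSPRING_PER_PARENT)]


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(POPULATION)]  # toy genotypes
fitness = [rng.randint(0, 50) for _ in range(POPULATION)]                         # made-up points
children = next_generation(population, fitness)
```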

(C) I would prefer a period of nurturing rather than genetic memory. I do not like how the experiment so drastically changes memory each time.

The experiment randomises weights for actions so that there is no Lamarckian inheritance of learned weights or Baldwinian evolution of initial weights to set them closer to the final stage of evolution.

(C) So how is learning actually happening? What exactly is going on here? What is the advantageous inheritance?

4.0 Grounding Eat and Mark Directly Through Toil

In one run, all foragers toil. In this run, the foragers do not learn "return."

In the second stage, one group of foragers learn "return" through toiling. Another group of foragers learn "return" through hearing signals.

(Q) How does this happen? What is the mechanism? Is it something that just happens as part of evolution?

(A)

Generations 1 to 200 only go through the first life stage. This is necessary to get basic behaviour.

(Q) What does the paper mean by basic behaviour?

(A)

Generations 201 to 210 learn the "return" behaviour.

(Q) How do Cangelosi and Harnad turn on learning "return" behaviour?

(A)

In the later runs, the toil group learns to return and to vocalise "return."

(Q) How?

(A)

In the theft condition, the foragers depend on the other foragers' calls to learn "return." These foragers do not get the feature input. They only get the vocalisation input.

(Q) What feature input? How do they not get the feature input? What vocalisation input?

(A)

In the experiment, more Thieves learned to return to AB mushrooms than Toilers did.

The situation grounds the categories EAT and MARK. The category RETURN builds on top of EAT and MARK.

(Q) What does the paper mean by grounding? Does the paper mean that grounding occurs because these categories relate to clusters in the network and that these clusters correspond to actions and situations?

(A)

The paper will later examine the statistical result.

The result is that Thieves return to AB mushrooms more often than Toilers do.

4.1 Analysis

Cangelosi and Harnad used a repeated-measures analysis of variance (ANOVA) to compare the two conditions.

(Q) What is a repeated measure? What is an analysis of variance? What are the two conditions?

(A)

The dependent variable is the number of AB mushrooms collected at generation 210, averaged over the 20 fittest individuals in all 10 generations.

(Q) To which generations does the paper refer? Why does the analysis use the 10 generations?

(A)

The independent variable was Theft versus Toil.

(Q) How can Theft and Toil be variables? Does the paper mean the ratio of Theft to Toil?

(A)

The difference between the two conditions was significant: F(1, 9) = 136.7, p < ...
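
To see where degrees of freedom of 1 and 9 could come from, here is a sketch of a repeated-measures F ratio for two paired conditions. It assumes the "subjects" are 10 simulation runs, each measured under both conditions, which is my guess. The numbers are invented purely to exercise the function and are not the paper's data. The name repeated_measures_f is my own.

```python
from statistics import mean


def repeated_measures_f(cond_a, cond_b):
    """F ratio and degrees of freedom for two paired conditions."""
    n = len(cond_a)
    grand = mean(cond_a + cond_b)
    cond_means = [mean(cond_a), mean(cond_b)]
    subj_means = [(a + b) / 2 for a, b in zip(cond_a, cond_b)]

    # Variation explained by the condition.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Residual variation after removing condition and subject effects.
    ss_error = sum(
        (x - cond_means[j] - subj_means[i] + grand) ** 2
        for j, cond in enumerate((cond_a, cond_b))
        for i, x in enumerate(cond)
    )
    df_cond, df_error = 1, n - 1
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, (df_cond, df_error)


# Invented AB-mushroom counts for 10 runs under each condition.
theft = [62, 58, 65, 60, 59, 63, 61, 64, 57, 66]
toil = [41, 45, 39, 44, 42, 40, 46, 43, 38, 47]
f_ratio, dof = repeated_measures_f(theft, toil)
```

With two conditions and 10 paired runs, dof comes out as (1, 9), and the F ratio equals the square of the paired t statistic.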

(Q) What does the difference mean?

(A)

5.0 Theft vs. Toil: Simulating Direct Competition

Cangelosi and Harnad ran 10 competitive simulations using genotypes from generation 200 from the previous 10 runs.

(Q) What generations did Cangelosi and Harnad use?

(A)

From generation 201 to 210, for each population, Cangelosi and Harnad randomly divided 100 foragers into 50 Thieves and 50 Toilers.
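
The random split is simple to sketch. Foragers are stood in for by the integers 0 to 99, and split_population is a made-up name.

```python
import random


def split_population(foragers, seed=None):
    """Shuffle the foragers and cut the list in half."""
    rng = random.Random(seed)
    shuffled = list(foragers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"Thieves": shuffled[:half], "Toilers": shuffled[half:]}


groups = split_population(range(100), seed=0)
```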

There is no real time on-line simulation because in each run, the experiment tests only one individual in its world.

(C) This is an opportunity to extend the experiment by testing groups of agents in real time.

(Q) If there is only one individual at a time, how do foragers learn from other agents?

(A)

Direct competition occurs at the end of the life cycle, in the selection of the fittest 20 to reproduce.

Other simulations have studied direct competition for variable mushroom resources in shared environments. In the present ecology, Cangelosi and Harnad assumed that mushrooms are abundant and that the only fitness challenge is for foragers to emerge among the top 20 eaters and markers of the generation.

At generation 201, Thieves comprise 50% of the population. In fewer than 10 generations, the whole population consists of Thieves.

6.0 What Changes During Learning - Analysis of Internal Representations

What changes internally during Toil and Theft? What are the differences between the foragers' hidden unit representations?

Cangelosi and Harnad recorded the activations of the five hidden units during a test cycle. They exposed the forager to all the mushrooms in the test cycle.

(Q) How did they expose the forager to all the mushrooms?

(A)

Cangelosi and Harnad presented an analysis of the fittest individual in seed 8.

(Q) What is seed 8?

(A)

Cangelosi and Harnad used Principal Component Analysis (PCA) to display the network's internal states in two dimensions. PCA reduces five activations to two factor scores.
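
To get a feel for what reducing five activations to two scores involves, here is a small PCA sketch in plain Python. It finds the top two eigenvectors of the covariance matrix by power iteration with deflation, which is one standard way to do it and not necessarily what the authors used. The hidden-unit activations are invented: four patterns plus a little noise. The name pca_2d is my own.

```python
import math
import random


def pca_2d(rows, iterations=500):
    """Project rows (equal-length lists) onto their first two principal components."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dim)]
           for a in range(dim)]

    def top_eigenvector(matrix):
        rng = random.Random(0)
        v = [rng.uniform(-1, 1) for _ in range(dim)]
        for _ in range(iterations):
            w = [sum(matrix[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        value = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(dim)) for a in range(dim))
        return value, v

    components = []
    for _ in range(2):
        value, vec = top_eigenvector(cov)
        components.append(vec)
        # Deflation: remove the component just found before looking for the next one.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(dim)] for a in range(dim)]

    return [[sum(c[j] * r[j] for j in range(dim)) for c in components] for r in centred]


# Invented activations: 40 presentations, 10 per category, five hidden units each.
rng = random.Random(1)
patterns = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.5), (0.9, 0.5)]   # 00, A0, 0B, AB
rows = []
for i in range(40):
    a, b = patterns[i % 4]
    rows.append([v + rng.uniform(-0.02, 0.02) for v in (a, b, 0.5, 0.5, 0.5)])
scores = pca_2d(rows)
```

Each entry of scores is a pair of numbers, so the 40 presentations can be drawn as points on a plane.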

(Q) What is on each axis?

(A)

A limitation of PCA is that it does not allow direct comparisons between different conditions. This is because of differences in scale.

(Q) What are the conditions?

(A)

Cangelosi and Harnad normalised the PCA scores to a distribution with a mean of eight and standard deviation of one. As such, an analysis can only compare representations within each condition and not between conditions.
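
The effect of this normalisation can be sketched as follows, using the mean and standard deviation as written in these notes. The scores are invented, and rescale is a made-up name. The sketch assumes each condition is rescaled separately.

```python
from statistics import mean, pstdev


def rescale(scores, target_mean=8.0, target_sd=1.0):
    """Shift and stretch scores to the requested mean and standard deviation."""
    m, s = mean(scores), pstdev(scores)
    return [target_mean + target_sd * (x - m) / s for x in scores]


condition_1 = [0.10, 0.12, 0.09, 0.11]   # invented, tightly packed scores
condition_2 = [-3.0, 1.0, 4.0, -2.0]     # invented, widely spread scores
norm_1 = rescale(condition_1)
norm_2 = rescale(condition_2)
```

Both lists end up with the same mean and spread even though the raw scores were on very different scales, which is why distances can no longer be compared across conditions.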

During the course of learning the actions and calls, the representations form four separable clusters.

(Q) What are these representations?

(A)

(Q) How do the researchers use these representations to analyse the effects of Toil and Theft learning on similarity space directly?

(A)

7.0 Categorical Perception Effects

This is a long, detailed section. It has a number of things that I do not know.

The main thrust of this section is that within-category distances are compressed and between-category distances are expanded.
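
A sketch of how such distances might be measured is below. It assumes the distances are Euclidean distances between points in the two-dimensional representation space, which I have not confirmed against the paper. The points are invented, and mean_distances is a made-up name.

```python
import math
from itertools import combinations


def mean_distances(points, labels):
    """Return (mean within-category distance, mean between-category distance)."""
    within, between = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        (within if a == b else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


labels = ["A", "A", "B", "B"]
before = [(0.4, 0.4), (0.6, 0.6), (0.5, 0.3), (0.7, 0.5)]   # invented: categories overlap
after = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.9)]    # invented: categories pulled apart
within_before, between_before = mean_distances(before, labels)
within_after, between_after = mean_distances(after, labels)
```

In this toy case the within-category mean shrinks and the between-category mean grows, which is the pattern the section describes.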

(Q) What does the paper mean by a category? What does the paper mean by within category distances? What are the within category distances between?

(A)

8.0 Conclusions

?

Binh's Final Thoughts

How do agents learn "eat" and "mark"?

How do agents learn "return" by toiling?

How do agents learn "return" by listening?

How does the genetic algorithm affect things?

I have a large number of questions with this paper. I still do not know clearly how everything works.

The paper does present strong evidence for the importance of symbol grounding. The paper also demonstrates how agents can build symbols on top of symbols.
