I experimented with artificial life and neural networks. Here are the results of my studies and research over the course of the last year. This article is written for a broad audience with an interest in evolution, philosophy and artificial intelligence research.

I keep a long list of books I want to read on Amazon, and whenever one becomes available for little money I buy a copy. That must be how I came to own a copy of „Artificial Life“ by Christoph Adami. In the book, the author describes systems in which artificial life developed; in some of these universes the emergent species even showed intelligent behavior (Polyworld by Larry Yaeger, 1994). The field started with virtual artificial chemistry and advanced to small simulations of nature. If this was already being done in the 80s and 90s, I wondered, why is the technology not more advanced today?

Book by Adami

Artificial Intelligence needs an environment

Everyone in the area of artificial intelligence research and machine learning is currently looking at deep learning. Your interest in AI was probably also sparked by the recent advances in machine learning that made the field so popular. In deep learning we stack many layers of neural networks and train the network against a function that measures performance, i.e. how well a certain problem is solved. Then some values in the network are changed to optimize this performance function. I never fully believed in deep learning as a way to achieve strong intelligence. You optimize the parameters of a function until you get intelligence? And that is all? Because this was not convincing, I looked around to see what other ways there are to build cognitive systems.
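The recipe just described can be reduced to a tiny sketch. This is a deliberately minimal, made-up example (a single weight, mean squared error, a hand-derived gradient), not real deep learning, but it shows the core mechanic the paragraph criticizes: change parameter values until a performance function is optimized.

```python
def loss(w, data):
    # Mean squared error: measures "how well the problem is solved".
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, steps=200, lr=0.05):
    w = 0.0  # start with an arbitrary parameter value
    for _ in range(steps):
        # Gradient of the loss with respect to w, computed analytically.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # change the parameter to reduce the loss
    return w

# Toy dataset for y = 2x; the optimizer recovers the weight 2.0.
data = [(1, 2), (2, 4), (3, 6)]
w = train(data)
print(round(w, 3))
```

Whether stacking many such parameters really adds up to intelligence is exactly the question raised above.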

People in the field of neurorobotics believe that intelligence manifests as behavior in specific environments. Fogel et al. define intelligent behavior like this: „Intelligent behavior is a composite ability to predict one’s environment coupled with a translation of each prediction into a suitable response in light of some objective”.[1] In deep learning the environment is equivalent to the dataset: only the limited dataset is learned and abstracted. When we want to build artificially intelligent systems, we need to talk about cognitive systems: artificial systems that perceive and act in a closed loop inside a more natural environment.[2]

PCA Loop
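To make the contrast with dataset learning concrete, here is a minimal sketch of such a perception-action loop. The classes and the trivial policy are illustrative assumptions, not part of any real framework; the point is only the closed loop between agent and environment.

```python
class Environment:
    def __init__(self):
        self.state = 0

    def observe(self):
        return self.state          # perception

    def apply(self, action):
        self.state += action       # the action changes the world

class Agent:
    def decide(self, observation):
        # A trivial policy standing in for cognition:
        # keep moving until the state reaches 10.
        return 1 if observation < 10 else 0

env, agent = Environment(), Agent()
for _ in range(20):                # the closed perception-action loop
    obs = env.observe()
    act = agent.decide(obs)
    env.apply(act)
print(env.state)  # the agent has driven the state to 10
```

Unlike a dataset, the environment here reacts to the agent's own actions, which is what the cited definition of intelligent behavior requires.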

Information Theory and Life

Adami first describes that it is possible to create artificial life. There are several definitions of life. Many of them are outdated because organisms were found that showed these definitions to be too narrow. The most convincing theory is based on information theory and thermodynamics. Information theory looks at signals and uses mathematics to describe the amount of information in a signal. The information needed to describe the microstate of a system given its macrostate is measured in what we call entropy. There is also a concept of entropy in thermodynamics. You can argue that the two are the same, so that energy and matter (e=mc²) are equivalent to information. But this is a topic of its own and outside the scope of this article.
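For readers who want a concrete handle on the entropy measure, here is Shannon entropy computed in a few lines. This is the standard textbook definition, nothing specific to my system:

```python
import math

def entropy(probabilities):
    # H = -sum p * log2(p): the average information per symbol, in bits.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # a fair coin: 1.0 bit
print(entropy([0.25] * 4))   # four equally likely states: 2.0 bits
```

The more uncertain the system, the more bits are needed to describe its microstate; that is the quantity the arguments below are about.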

Information Theory and Evolution

Next, Adami looks at evolution from the perspective of information theory. Evolution encodes information in a system: part of the entropy of the environment becomes shared with the entropy of the contained system. We can build a system in a computer where genes store this entropy and support evolution.

Here we note that in reality the entropy of the ecosystem, our environment, is infinite, while the entropy in our virtual system is not.

Shared Entropy

You could argue that you could just add a virtual entropy source. The source bubbles up random values, i.e. entropy. But because there is no principle behind the values, it is just noise. In probability theory, events (like a 1 or a 0 appearing) with no connection to other events are called "conditionally independent". Conditionally independent noise cannot be encoded in the DNA. Adding it only increases the right part of the diagram.
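This claim can be illustrated numerically with the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y): a signal shares all of its entropy with a copy of itself, but none with an independent noise source. The two toy bit sequences are made up for the example.

```python
import math
from collections import Counter

def entropy(samples):
    # Empirical Shannon entropy of a sequence, in bits.
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y): the shared entropy.
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

signal = [0, 1, 0, 1, 1, 0, 1, 0]
noise = [1, 1, 0, 0, 1, 1, 0, 0]   # independent of the signal

print(mutual_information(signal, signal))  # all entropy is shared: 1.0
print(mutual_information(signal, noise))   # nothing is shared: 0.0
```

Only shared entropy, the left overlap in the diagram, is information that evolution can store in the DNA.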

Entropy Source

We want the shared entropy, the information, to grow in our system over time. The ecosystem should foster the development of intelligent behavior. We further want behavior that is close to human perception. Therefore the ecosystem must be modeled at the level at which we experience reality.

Skipping millions of years

Am I suggesting that the whole of human evolution should happen inside the computer? Computers are fast, but not fast enough to run our history in high detail. We could try to skip some parts of evolution by utilizing models of systems we understand. We can skip the development of specific cells, or the time needed to develop organs, if we start with models of body parts. Unfortunately, we cannot start with a model of a complete human, because humans are so complex that they only function through development from child to adult during their lifetime. Phenomena emerge from lower-level phenomena. Therefore, we focus on modeling at the level that we as humans can experience. If we use these high-level descriptions, we can skip the evolutionary time needed for a lot of things.

Evolutionary Space

How to select mating partners

Evolutionary algorithms have one strength compared to other optimization algorithms: they can optimize without a mathematical description of how well a specific goal is achieved (an objective function). We actually do not want a function that we optimize; the function should be contained implicitly in the environment. The virtual animals (agents) that perform well in an environment are selected. The selected parents then bear children, and the children receive some mutations. Now we have established the theoretical ground for building a system that should create intelligence.
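A minimal sketch of this selection scheme, under strongly simplifying assumptions: one-dimensional genes, and a made-up survival rule (the agents nearest the food eat, the rest starve) standing in for the environment. No objective-function value is ever handed to the algorithm.

```python
import random

random.seed(0)

FOOD = 3.0  # location of the food; the agents never see this number

def generation(population, mutation=0.1):
    # Selection happens "in the environment": agents close to the food
    # survive and become parents; the others starve.
    survivors = sorted(population, key=lambda g: abs(g - FOOD))
    survivors = survivors[:len(population) // 2]
    # Each child is a mutated copy of a randomly chosen parent.
    return [random.choice(survivors) + random.gauss(0, mutation)
            for _ in range(len(population))]

population = [random.uniform(-5, 5) for _ in range(50)]
for _ in range(100):
    population = generation(population)

mean = sum(population) / len(population)
print(round(mean, 2))  # the population has drifted toward the food
```

After a hundred generations, the population clusters around the food location, even though "fitness" was never computed as a number inside the agents.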

Prototype

Agents

After doing this research and developing the theory of how such a system should be built, I built a prototype. The agents live in a virtual environment (universe). The world contains food and dangers. The agents can move; once they touch food they eat it, and if they touch danger they die. The agents can sense their surroundings with a sensor like a simplified nose. The nose works by averaging the surrounding environment and dividing by the distance. It consists of three channels: one for food, one for dangers and one for other agents. The agents have another input for time and one for their energy level. Inspired by nature, I gave them two noses with a small offset to allow stereo sensing. The outputs are two values for their movement velocity in the x- and y-direction.

Between the input and the output is the brain. The brain consists of McCulloch-Pitts neurons. McCulloch and Pitts derived a mathematical model resembling the biological function: cumulate the weighted inputs and apply some special function, in math notation o_i = f(\sum_j w_{ij} o_j). There are more sophisticated neuron models out there.[3] As the goal is to find the brain and the phenomena it uses by evolution, a non-limiting topology was used, namely a fully connected brain: in the beginning, each neuron is connected with every other neuron. There are no layers.

Another specialty is the time discretization. Usually a neural network creates one output set for each input set. Here, in each update step every neuron only propagates its activation to its neighboring neurons, so there is a small delay along longer paths, as in a real brain. I chose this because I wanted to allow the system to develop loops for something like short-term memory. In every update step a new output is created, so the brain may need time to think if longer neuronal paths are used.
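The described brain can be sketched in a few lines. Everything here is an illustrative assumption (the sizes, tanh as the "special function", Gaussian random weights instead of evolved ones); only the structure follows the text: full connectivity with no layers, and activation that propagates one step per update.

```python
import math
import random

random.seed(1)

N_INPUTS, N_NEURONS = 8, 6   # e.g. 2 noses x 3 channels + time + energy

class Brain:
    def __init__(self):
        # Every neuron receives every input and every other neuron:
        # a fully connected topology, no layers.
        self.w_in = [[random.gauss(0, 1) for _ in range(N_INPUTS)]
                     for _ in range(N_NEURONS)]
        self.w_rec = [[random.gauss(0, 1) for _ in range(N_NEURONS)]
                      for _ in range(N_NEURONS)]
        self.activation = [0.0] * N_NEURONS

    def step(self, inputs):
        # o_i = f(sum_j w_ij * o_j), with f = tanh as an assumed
        # activation function. Using the previous step's activations
        # gives the one-step propagation delay described above.
        prev = self.activation
        self.activation = [
            math.tanh(sum(w * x for w, x in zip(self.w_in[i], inputs)) +
                      sum(w * o for w, o in zip(self.w_rec[i], prev)))
            for i in range(N_NEURONS)
        ]
        # The first two neurons drive the x- and y-velocity outputs.
        return self.activation[0], self.activation[1]

brain = Brain()
for t in range(5):                 # let activation propagate for a while
    vx, vy = brain.step([0.5] * N_INPUTS)
print(vx, vy)  # bounded velocities in (-1, 1)
```

In the real system the weight matrices would be encoded in the DNA and changed by mutation rather than drawn at random.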

Screenshot of the system

The purpose of computing is insight, not numbers, as Richard Hamming said. So I try to use visualization techniques. This is not a trivial task, because visualizations are rarely used in this field, so I have to find my own. The picture shows a visualization I made of the brain after some training progress. The neurons are the blue spheres. The colors indicate positive (green) or negative (red) weights in the connections (in biology, these would be the axons). The size of a neuron indicates its activation at the current time step. In the back are the incoming sensory inputs, which are connected to every neuron.

Neuronal Network (the brain)

One can see that the number of neurons is quite small. The number of neurons can be chosen by evolution, and it prefers fewer neurons. Each new neuron also adds many new connections, which add a lot of noise. Usually the agents do not benefit from this added complexity.

Results

After running the system for a while, you can observe steps in evolution. Over a short time a new behavior, or feature, is developed and replaces the old species. Usually a species first develops moving forward. Later they discover that when you smell danger, you should turn a little or move in the other direction. Advanced species walk in the direction of the smell of food. The environment contains an amount of food such that the total amount of energy in the system always stays constant. When more energy is added to the system, the reward for finding food increases and more children can be created. If the energy is very low, the agents have to find a lot of food, making it harder to survive and therefore to evolve. The hurdle for a new evolutionarily beneficial step becomes higher, and evolution can come to a halt. So there is a range of energy values in which the system keeps evolving. In some sense this behaves like a temperature in chemical or biological processes. We can map the speed of evolution to the parameter and obtain a graph.

A curve showing the theoretical effect of the parameter choice

Currently this graph shows only a model of what I observed; there is no easy way to measure the speed of information gain in the DNA. This energy-amount parameter (ea) is not a parameter we are looking to find. We are trying to find the genes from which we get intelligent behavior; the ea parameter says something about the system in which we search for our real parameters. Borrowing the name from the field of machine learning, we call the ea parameter a hyperparameter. In our system the amount of energy is only one hyperparameter out of many. A related one is the distribution of the energy, ranging from evenly distributed to concentrated on a single point.

Heatmap

What we learn from this is that there are many hyperparameters which must be fine-tuned manually so that the virtual agents find the (best) conditions to develop intelligence. That is no problem, one might say: we just systematically try many hyperparameter values, and once we have good values we keep them. In machine learning this is called grid search, because every combination is tested. The picture shows an example of a grid search, where the rows are different trials with different randomness and the columns are different parameters of the energy distribution. One problem is how the success or "fitness" of one run should be measured. I chose the depth of the oldest family tree; this value is depicted as the cell's value.
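The search pattern itself can be sketched as follows. The simulation is replaced by a hypothetical toy fitness function (a bump around an assumed optimum), since only the grid-search structure matters here.

```python
import itertools
import random

def run_simulation(energy, spread, seed):
    # Stand-in for a full artificial-life run; the real system would
    # return the depth of the oldest family tree. Here: a noisy bump
    # around an assumed optimum at energy=1.0, spread=0.5.
    random.seed(seed)
    return -(energy - 1.0) ** 2 - (spread - 0.5) ** 2 + random.gauss(0, 0.01)

energies = [0.5, 1.0, 1.5]     # columns: one hyperparameter axis
spreads = [0.25, 0.5, 0.75]    # a second hyperparameter axis
trials = range(3)              # rows: repeated runs with different randomness

# Every combination is tested -- hence "grid" search; fitness is
# averaged over the trials to smooth out the randomness.
results = {
    (e, s): sum(run_simulation(e, s, t) for t in trials) / len(trials)
    for e, s in itertools.product(energies, spreads)
}
best = max(results, key=results.get)
print(best)  # the grid cell with the highest average fitness
```

The cost is obvious: the number of runs grows multiplicatively with every additional hyperparameter axis, and each run here is a whole evolution.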

Another interesting result is that trippy art emerges in the system.

Accidentally art

The system does not scale well

At a certain point the system stops evolving. Evolutionary steps become too hard; evolution has learned the environment as much as it can. The complexity is too low, so the system gets stuck. Every DNA has developed relative to the ecosystem. Every genetic encoding can be represented as a point in a high-dimensional space (phase space). We cannot modify the environment during a simulation, because that is like moving the reference point of the phase space. The system evolves around the hyperparameters we set; they are like the constants of our virtual nature. Therefore it does not help to change them.

Further questions

The more I read about chaotic systems, the more I asked myself: is artificial evolution really a chaotic system? To answer this question one has to compute the Lyapunov exponent, a task with which I am still occupied.
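For reference, here is how the Lyapunov exponent is computed for a system where it is easy, the logistic map; a positive exponent indicates chaos. For the evolution simulation one would instead estimate it from the divergence of two nearby trajectories, which is the part I am still working on.

```python
import math

def lyapunov(r, x0=0.4, n=10000, burn_in=100):
    # Lyapunov exponent of the logistic map x -> r*x*(1-x):
    # the average of log|f'(x)| along the orbit, f'(x) = r*(1 - 2x).
    x = x0
    for _ in range(burn_in):        # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        x = r * x * (1 - x)
        total += math.log(abs(r * (1 - 2 * x)))
    return total / n

print(lyapunov(3.2) < 0)   # periodic regime: negative exponent
print(lyapunov(4.0) > 0)   # chaotic regime: positive exponent
```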

Can we show something as an "edge of chaos" in such a system?

Can we make the evolution open-ended if we find a way to build a self-organizing system? Consciousness needs symbolic representation; can we support the system in developing symbols? Will it start to develop philosophy and religion? How can we control the AI (the alignment problem) once it becomes smart? Can it escape the virtual environment?

Building the system on GPU

On a final note, I want to share some thoughts on the computing part; this is probably only interesting for readers who care about the computations. In my first prototype, I computed the distance of objects based on the positions of their centers. In my second prototype, the dangers and food are stored in cells. The benefit of using cells is that a kernel can be used for the costly sensory computation. A kernel is a mathematical operation over a neighborhood of grid cells: a fixed-size set of cells is accumulated with a mathematical operation to compute the sensory input. This means we can build the whole system from local information only. A system that uses only local information is quite close to a cellular automaton; some scientists argue that our whole universe is built up from a cellular automaton. We could copy the whole memory to the GPU and simulate each cell in a compute shader. This should give a massive performance improvement (maybe around 100 times).
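The cell-based smell computation can be sketched like this. The 1/distance weighting follows the nose description earlier in the article; the grid size and kernel radius are arbitrary choices for the example. Because each call touches only a fixed local neighborhood, the same operation maps directly onto one GPU thread per cell in a compute shader.

```python
def smell(grid, cx, cy, radius=2):
    # Kernel over a fixed neighborhood of cells around (cx, cy),
    # weighted by 1/distance: nearer cells smell stronger.
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = cx + dx, cy + dy
            if (dx or dy) and 0 <= y < len(grid) and 0 <= x < len(grid[0]):
                distance = (dx * dx + dy * dy) ** 0.5
                total += grid[y][x] / distance
    return total

# A 5x5 food grid with one piece of food at (x=4, y=2).
grid = [[0.0] * 5 for _ in range(5)]
grid[2][4] = 1.0
print(smell(grid, 2, 2) > smell(grid, 2, 0))  # closer agent smells more
```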

Further reading

Larry Yaeger’s 2011 course „Artificial Life as an approach to Artificial Intelligence“ at Indiana University Bloomington

On the emergence of consciousness:

Douglas Hofstadter: I Am a Strange Loop

algorithm for neuroevolution

Sources/Footnotes:

[1] Fogel et al., 1966, p. 11
[2] David Vernon: Artificial Cognitive Systems: A Primer
[3] e.g. the integrate-and-fire model, the Hodgkin-Huxley model

Recently I saw a newspaper article in which the author held some misconceptions about intelligence quotients. The article made a connection to artificial intelligence, which sparked my motivation to explain some things about quantifying intelligence. Intelligence tests are often the subject of discussion. An intelligence test primarily measures the performance of an individual on that test; the result is called the intelligence quotient (IQ). This argument is often used to dismiss the relevance of the IQ score. However, studies have shown that the result is a good predictor of success in other areas of life. This psychometric test is a useful tool in diagnostic psychology.

Although IQ tests are mainly a tool of diagnostic psychology, the term IQ sometimes appears in discussions about artificial intelligence. When analyzing intelligence, the problem is that there is no known method to quantify the behavior and capabilities of cognitive systems. Because IQ is a famous measure for quantifying intelligence, people keep coming back to it. If we had a number for intelligence, we could optimize algorithms against it, and artificial intelligence would probably be a solved problem. As an alternative, scientists try to evaluate cognitive systems via the environments in which they act; OpenAI, for example, uses Atari games to measure the performance of their algorithms.

An IQ test has to be carefully designed, which is an expensive process. When many people take the test, the distribution of the results is Gaussian-shaped. The scaling is then chosen so that the expected value is 100. Using this knowledge, how do we interpret an IQ score? An IQ score can be compared to this distribution to see in which percentile of the population it lies. By definition the expected value is 100, but a Gaussian distribution has another parameter: the standard deviation. For many tests this is 15, e.g. the Stanford-Binet and the Wechsler adult scale; the Cattell Culture Fair III uses 24. What you need to remember is that raw scores are not comparable across tests, as the same score corresponds to a different percentile depending on the test.
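This percentile comparison can be computed directly from the Gaussian cumulative distribution function. The numbers below use the standard deviations named above (15 and 24):

```python
import math

def percentile(iq, mean=100, sd=15):
    # Cumulative normal distribution via the error function.
    return 0.5 * (1 + math.erf((iq - mean) / (sd * math.sqrt(2))))

# IQ 130 on an sd-15 test is roughly the top 2%...
print(round(percentile(130, sd=15) * 100, 1))   # 97.7
# ...but the same score of 130 on an sd-24 test is far less rare.
print(round(percentile(130, sd=24) * 100, 1))   # 89.4
```

So a "130" means something quite different depending on which test produced it.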

There are two recurring misconceptions about IQ tests. First: it is not possible to assign someone an IQ without the individual taking the test. The IQ is the result of a test. Although a person might appear very intelligent or less so, their result on an IQ test cannot be predicted with high confidence.

Second: very high scores are not possible. Most tests are norm-oriented, which means they are designed for the typical human. If you score very high or very low, the result is not as accurate.

What is the IQ score if every question is answered correctly? On the Stanford-Binet, it is not possible to get results above 160 points.

When people talk about IQ numbers in combination with artificial intelligence, they usually want to talk about a qualitatively new type of intelligence: a type on the next level, called artificial general intelligence or superintelligence. Because it does not exist yet, it is hard to imagine. We may picture a smart person or the artificial intelligence we know from the movies, but a superintelligence is far more.

Related and Recommended Reading:

Robert J. Sternberg: International Handbook of Intelligence

Raven’s Progressive Matrices

Some months ago I found a cheat sheet of navigation keystrokes. They are very helpful for improving your performance in the terminal. Usually, shortcuts are documented to the right of the menu items, but in the terminal you do not have that comfort. The handy cheat sheet has some flaws: some Unix shortcuts do not work on macOS, e.g. Alt+B produces the character "∫" instead of moving to the beginning of the word.

I also found more shortcuts by trial and error and updated the graphic accordingly. You can get it here as a png, pdf or OmniGraffle file.

Cheatsheet for iTerm

If you sometimes use the macOS Terminal application I advise you to upgrade to iTerm 2.

If you do a web search for deep learning, machine learning and the differences between them, you will often find a graph that includes those three terms. I think this can be done in more detail. Since I did not find any graphic showing more than three layers, I made my own. Open the image in a new tab for the full display.

The structure is hierarchical, so everything nests: e.g. convolutional neural networks are used in deep learning, which is part of the neural network landscape, and so on. Circles that cut other circles mean that a part of one field or method is also part of another, or that the classification is ambiguous.

Please write me if you have any suggestions or don't agree with my choice of circles.

Update: Thanks to Jonas K. for pointing out that evolutionary & genetic algorithms are not a subset of machine learning but algorithms used in machine learning. I updated the diagram accordingly.

Diagram showing AI terms

A while ago I tried setting up my developer story on StackOverflow. The form includes a field for influential books, so I wondered which books I had read and what my opinions on them were. As I sometimes wish for good literature recommendations with short reviews, this led me to this article. Here is a list of books I read recently, each with a short review. Some include links to Amazon. I list the available languages after the author. I am currently learning Chinese (我学习汉语), so I hope to add some Chinese books in another post in a few years.

Rise of the Machines: A Cybernetic History

Book cover

by Thomas Rid, Available Languages: ENG, DE

Shines a light on the history of computer technology. The early days are very interesting, while the later days are probably known to people of my generation or older. The book shows the historic importance of cybernetics, which is nowadays almost forgotten. Rating: 👍

Gnosis in High Tech und Science-Fiction

by Franz Wegener, Available Languages: DE

Book cover

This is again more about the history of high-tech. The book is in large parts an analysis of American culture. Many Americans see themselves as Christians but are in fact Gnostics. Gnosis (realization) means the belief that humans are fallen gods or souls: we are prisoners of our biological bodies, and once we realize this we can rise to the divine again. Gnosticism and high-tech, and therefore science, want to free us from our bodies. Gnostic ideas are found in religion, pop culture and high-tech. Wegener analyzes various case studies, including the movie "The Matrix". He tries a little too hard to explain everything with Gnosticism; e.g. he tries to explain with Gnosticism why some Americans have so many problems with abortion, but misses some important details of that discussion. Overall the book is a collection of history and analysis of society and pop culture. Rating: 👍

A Pattern Language: Towns, Buildings, Construction

by Christopher Alexander, Available Languages: ENG, DE

Book cover

A book on architecture, but really more a book about a philosophy. It introduced "pattern thinking", which was later adopted by software engineers (the Gang of Four) and UI designers. Only after software engineers started using patterns did the book get the attention it had earned. It kickstarted my interest in architecture as a way to design (local) society and daily interactions. Rating: 👍

Genetische Algorithmen und Evolutionsstrategien

by Eberhard Schöneburg & Frank Heinzemann & Sven Feddersen, Available Languages: DE

Book cover

The authors explain the differences between evolutionary algorithms, present the notation and give an overview of the methods. There are two schools: genetic algorithms and evolution strategies, and researchers are divided between the two. From the point of view of information theory, the format of the encoding should not matter; this is the reasoning the advocates of evolutionary algorithms also use. Rating: 👍

Wer bin ich — und wenn ja wie viele — Eine philosophische Reise

by R. Precht, Available Languages: DE

Book cover

This book is not really about high-tech but about philosophy; there is no way around philosophy when we talk about science and high-tech. Precht covers every major philosophical question, focusing on modern-day philosophy and taking modern research into account. He does not introduce new ideas of his own, so I recommend reading an equivalent book in your native language. This book introduced me to many philosophical ideas, such as the idea that there may be no single ego. Rating: 👍

Reality is broken

by Jane McGonigal, Available Languages: ENG, DE

Book cover

McGonigal argues that games can be used for good. However, the arguments are not convincing. It sounds like McGonigal is trying to dissolve a cognitive dissonance found in the games industry: "Hey, we are investing our careers into games. I don't have the feeling that we are doing something meaningful with our lives. Let's look for places where games are improving the world." The book itself contains some hints that this was the idea behind writing it. I quit the games industry after running into this same issue. In fact, the gaming industry is nowadays worse than casinos. This book could have convinced me otherwise, but it failed. Rating: 👎

The Singularity is Near

by R. Kurzweil, Available Languages: ENG, DE

Book cover

Kurzweil is famous not only for his inventions but also for his books about technology and futurism. The technological singularity is the concept that accelerating technological progress will reach a point where it grows so fast that it leads to a singularity: our last invention. We will upload our brains to the computer and merge with AI; death will become optional. Many areas of technology are covered, including medicine, genetics, nanobots and computer technology. In parts this is not an easy read, e.g. when he discusses details of neuroscience or the blood-brain barrier. In other parts it becomes repetitive: once the main argument is understood, he keeps repeating it. Rating: 👍

Introduction to Cybernetics

by W. Ross Ashby, Available Languages: ENG

Book cover

It may have been groundbreaking in introducing cybernetics, but today most of the ideas are taught, directly or indirectly, in a computer science degree. Reading it means learning many interesting notations but gaining little new insight. Rating: 👎

Introduction to Artificial Life

by Christoph Adami, Available Languages: ENG

Book cover

Good summary of complexity theory, research on artificial life, and information theory. See evolution and biology from a different perspective using the tools of mathematics and information theory. Rating: 👍

From computer to brain

by William W. Lytton, Available Languages: ENG

Book cover

This book starts by introducing the basics of neuroscience and computer science. It has some weak chapters that are not really relevant to the main idea, or where I disagree. Lytton does not really separate information from encoding; for the result of a computation, the representation does not matter. What is the difference between 0b011 and 3? They both describe the same number. My critique boils down to the quotation "A message of this chapter is that hardware molds software." This is only true if we are optimizing performance, and this is certainly not the right book for software optimization strategies. I still have to finish it and hope to find more ideas than perceptron neural networks. Rating: 👎