Genetic programming

Genetic Programming can be thought of as a Machine Learning paradigm in which our "model" is a computer program. A typical Genetic Programming run has the following steps (in order):

  1. Initialize a population of random programs
  2. Do this a number of times: Select 2 individuals from the population, mix some parts from the first with some parts from the second to create 2 new individuals, and add the new individuals to the population. (Crossover)
  3. Do this a number of times: Select one individual from the population randomly and apply a small change to it. (Mutation)
  4. Do this a number of times: Select one individual from the population, copy it and add the copy to the population. (Reproduction)
  5. Remove the worst individuals from the population till the population size is back to its original value. (Selection)
  6. Repeat steps 2 to 5 till some "end condition" is met.

Basically speaking, we initialize a population of random programs, apply crossover, mutation and reproduction operations to them, select the best of them and repeat the process for some time. Steps 2 to 5, which are repeated, are called a "generation"; as generations pass, the population tends to become better and better. There is no mathematical proof that this will lead to a solution, but experiments show that most of the time this process, called "Genetic Programming" or "evolving programs", will give us some good programs for solving the problem at hand.
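
To make the loop concrete, here is a minimal sketch of it in Python. It assumes a Program class like the one we will write in the next section (with mutate, + for crossover, clone and fitness methods); the population size and the operator counts are arbitrary illustrative values:

import random

POP_SIZE = 100
N_CROSSOVER = 30
N_MUTATION = 10
N_REPRODUCTION = 10
N_GENERATIONS = 50

population = [Program(size=10) for _ in range(POP_SIZE)]   # step 1

for generation in range(N_GENERATIONS):                    # step 6
    for _ in range(N_CROSSOVER):                           # step 2
        a, b = random.sample(population, 2)
        child1, child2 = a + b            # __add__ does crossover
        population += [child1, child2]
    for _ in range(N_MUTATION):                            # step 3
        random.choice(population).mutate()
    for _ in range(N_REPRODUCTION):                        # step 4
        population.append(random.choice(population).clone())
    # step 5: keep only the POP_SIZE fittest individuals
    population.sort(key=lambda p: p.fitness(), reverse=True)
    population = population[:POP_SIZE]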

The above section was vague. In this section, we use Python to write a small experiment that puts those techniques to work. Of course, you can write it in any language you are comfortable with. The reason to use Python here is that it's easy to learn and use, and Python code is almost as readable as pseudocode for people without prior experience with it.

First, we write a Program class. This class supports these methods:

  • mutate or mutation method. This method applies a very small change to the program.
  • __add__ or crossover method. This method mixes 2 individuals and results in 2 new individuals.
  • clone or reproduction method. This method generates a new program from the same program by copying it.
  • fitness method. This method evaluates how "good" this program is for solving our problem. This measure is called "fitness" and the higher it is, the better our program is. When a program solves our problem very well, we say the program or the individual is "very fit". Later we will select only the most fit programs to stay alive for the next generation.
  • __init__. This method generates a new random program with the given size.
  • evaluate. This method executes our program on some input.
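
Here is a skeleton of such a class, just to fix the interface. The method bodies are left empty on purpose, because they depend on the program representation we pick below:

class Program:
    """Skeleton of the Program class; the bodies depend on the chosen representation."""

    def __init__(self, size: int):
        # Generate a new random program of the given size.
        ...

    def mutate(self) -> None:
        # Apply a very small random change to this program.
        ...

    def __add__(self, other: "Program") -> tuple["Program", "Program"]:
        # Crossover: mix this program with `other` and return 2 new programs.
        ...

    def clone(self) -> "Program":
        # Reproduction: return a copy of this program.
        ...

    def fitness(self) -> float:
        # How good this program is at solving our problem; higher is better.
        ...

    def evaluate(self, *inputs: float) -> float:
        # Execute the program on the given inputs and return its output.
        ...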


Now a good question is, what should our programs look like? In other words, what representation should our programs have? There are several representations; each performs better on some classes of problems and worse on others. The NFL theorem, or No Free Lunch theorem, tells us there is no single best approach that fits every problem we have.

When John Koza invented Genetic Programming, he used Lisp S-expressions to represent programs. In this representation, the program is an expression tree. The root and internal nodes of the tree are usually mathematical functions or operators, and the leaves of the tree are either program constants or problem inputs. In this representation, the program has exactly one output. As we will see later, there are representations in which the program can have more than one output. The S-expression representation is today known by many names, including Tree based Genetic Programming, Koza style Genetic Programming, Standard Genetic Programming or just Genetic Programming.

An expression tree.
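
To make that concrete, here is a tiny sketch that evaluates such a tree in Python, using nested tuples as a stand-in for Lisp S-expressions (the eval_tree name and the operator set are just illustrative choices):

import operator

# Each node is either a number (leaf) or a tuple: (operator_name, left, right).
OPS = {"+": operator.add, "*": operator.mul}

def eval_tree(node):
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](eval_tree(left), eval_tree(right))
    return node  # a constant, or a problem input substituted in beforehand

# (+ (* 2 2) 12) as a nested tuple:
tree = ("+", ("*", 2, 2), 12)
print(eval_tree(tree))  # prints 16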

Shortly after the publication of Koza's work, another researcher proposed Stack based Genetic Programming, in which the problem inputs, constants and operators all live on a stack. His experiments showed that this representation performs drastically better than the Tree based one on some problems.

Let's go through an example of executing a program in this representation. Take a simple example: multiplying 2 by 2, then summing the result with 12. We have this stack:

[2] [2] [*] [12] [+]

As you may have guessed, each operator's operands come before the operator. So for the multiply operator, those 2s are the operands, and for the sum, the operands are the result of the multiplication and 12. Here's Python code to execute a stack with only multiplication and sum operators:

def evaluate(stack: list[str | float]) -> float:
    args: list[float] = []
    for item in stack:
        if item == "*":
            arg0 = args.pop()
            arg1 = args.pop()
            args.append(arg0 * arg1)
        elif item == "+":
            arg0 = args.pop()
            arg1 = args.pop()
            args.append(arg0 + arg1)
        elif isinstance(item, (int, float)):
            # a constant: just push it onto the argument stack
            args.append(float(item))
    return args[-1]
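
For example, feeding it the stack from above (operators as strings, numbers as floats):

print(evaluate([2.0, 2.0, "*", 12.0, "+"]))  # prints 16.0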

You can see that if there are numbers left behind in the args variable, they don't count; only the last number is returned. So this stack will also output 16.0:

[2] [3] [2] [*] [10] [+]

After executing the stack, these will be left in args:

[2] [16]

and we only choose the last element, here 16, as the output of our program. We could take other approaches, but this one is cheaper in development time than most and is also simpler for teaching purposes.
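
Running that second stack through evaluate confirms it:

print(evaluate([2.0, 3.0, 2.0, "*", 10.0, "+"]))  # prints 16.0; the first 2.0 stays behind in args and is ignored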

Now we want to write a program which accepts m and n as numeric inputs and computes some function of them, let's say lg(m) + 2n + 5. The lg there stands for the base 2 logarithm, which is frequently used in CS. Here is a stack which represents the above function:

[m] [lg] [n] [2] [*] [+] [5] [+]

Here's another one:

[n] [m] [lg] [swp] [2] [*] [+] [5] [+]

where swp swaps the last 2 arguments in args. To better understand how it works, here's a breakdown of the steps for m = 16 and n = -1:

[-1] [16] [lg] [swp] [2] [*] [+] [5] [+] ; initialization
[-1] [4] [swp] [2] [*] [+] [5] [+] ; lg
[4] [-1] [2] [*] [+] [5] [+] ; swap
[4] [-2] [+] [5] [+] ; multiplication
[2] [5] [+] ; first sum
[7] ; second sum, the result

But why add another operator and expand the search space? You can take whichever approach you like, but the second one has shown better results on some problems.
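
To actually run these two stacks, our evaluate function needs a few more cases: the problem inputs m and n, the lg operator and swp. Here is one way to extend it, as a sketch; passing the inputs in a dictionary and the name evaluate2 are just illustrative choices:

import math

def evaluate2(stack: list[str | float], inputs: dict[str, float]) -> float:
    args: list[float] = []
    for item in stack:
        if item == "*":
            args.append(args.pop() * args.pop())
        elif item == "+":
            args.append(args.pop() + args.pop())
        elif item == "lg":
            args.append(math.log2(args.pop()))
        elif item == "swp":
            # swap the last 2 arguments on the stack
            args[-1], args[-2] = args[-2], args[-1]
        elif isinstance(item, (int, float)):
            args.append(float(item))
        elif item in inputs:
            # a problem input such as "m" or "n"
            args.append(float(inputs[item]))
    return args[-1]

# Both stacks compute lg(m) + 2*n + 5, which is 4 + (-2) + 5 = 7 for m = 16, n = -1:
print(evaluate2(["m", "lg", "n", 2.0, "*", "+", 5.0, "+"], {"m": 16, "n": -1}))         # prints 7.0
print(evaluate2(["n", "m", "lg", "swp", 2.0, "*", "+", 5.0, "+"], {"m": 16, "n": -1}))  # prints 7.0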