In this article I’ll try to summarise a series of posts I did on my company internal blog about genetic programming.
Some time ago I wrote a random program generator (based on a 1000 hex values genome), then slapped on top a mechanism to breed and mutate those programs and finally added some natural selection to favour programs which can perform a specific function (in this case the addition of two numbers).
After about a thousand generations and 63 thousand or so generated programs (randomly or through breeding or mutating) evaluated for fitness against adding two parameters, I got m63113.py (the program I generate are in Python, at that time I wanted to learn a bit more about Python so that was a good excuse). Python not so hard to understand, so even if you don’t know Python you can look at this program and see how “nature” made it evolve to perform its function.
“Natural” selection led that system to generate a program which performs the function I was pressuring them to perform through “nature”. Well, it is quite interesting from my perspective.
After successfully synthesising a program which adds two parameters, I decided I should try to evolve a little bit more complex program and set on evolving a program which returns a sinusoidal function. I probably have to mention that the function math.sin() is not one of the primitive I allow in the genetic program building blocks. So the genetic program will have to evolve a sinusoidal function by itself.
The picture above is the representation of what I got with that evolution. Genome c20035128 has evolved program c20035128.py which as expected is totally unreadable. I’ve tried to simplify it a bit by removing unnecessary lines and do some simplifications overall but it is still quite cryptic to understand…
# DNA = 0ba0dd89e70cc3d135a12299e41faa243c748[...] def simplified(parameters): variables = [0 for i in range(10)] # Initialise all variables to 0. variables = 416 - parameters if parameters: variables = variables & ~1714 variables /= parameters if variables: variables *= variables % 2147483647 variables = variables & 4066 variables = variables | 651 variables = variables ^ parameters variables += 124 variables -= variables variables = variables & ~parameters variables = variables & ~variables variables %= 64 variables += variables variables += variables variables += 1098 variables = ~variables & parameters variables = variables & ~variables if variables: variables = variables ^ variables return variables return 0
I guess this difficulty to understand a genetically evolved program must be expected… just think about the geneticists who are trying to understand the genetic code of any living organism! Not only it is not readable like python code (in this case), but they probably start with no idea of what could do what where… I raise my hat to them!
One thing we can notice is that flat zero value on the 180-360 degrees… That lead to some lesson learned. As you get to be an “older” developer (in my case I’ve started programming at 12 years old and been doing it professionally for more than 20 years now), you tend to believe you’ve seen it all. Even if you try to keep humble about it, you still deep down think you know it all… So for this Genetic Programming exercise I decided to learn Python at the same time. Python is an interesting language, but it comes with some peculiarities.
The last picture I showed (generation of a sinus) through evolution algorithm looked good from 0-180 degree but was a flat line from 180-360 degree. There were two reasons for that. First because the way modulo works on Python (modulo of a negative number is positive) which was skewing the output toward a positive value (a problem for the sinus on the 180-360 range…). My bad, I was assuming another kind of modulo from other programming languages. Second, the fitness function used the operation string.isdigit() which I wrongly assumed would take negative number as digit… not so fast. That function in Python only recognises natural numbers (positive integer). So the negative numbers were interpreted by the fitness function as not a number and were given a bad score…
So, if you learn a new language (or anything), try to make yourself believe you DON’T know it all… deep down inside you’ll still think you know it all 😉 but, maybe, it might help a little.
Since those were two major problems, I rewrote those parts of the code and slapped at the same time some client-server code so that a primitive MapReduce algorithm could be exploited by running multiple instances (with a leader board as the other instance are run by coworkers around me and it motivates them to know they can have high score 😀 ) and we restarted the machine. So the curve is not as good yet, but it shows promises in the negative realm or 180-360 degree!
So how does it works? First I generate a 1000 character long hexadecimal number, the Genetic Code or Genome of the program to be generated. Then I use that Genome to create a Python program. The hexadecimal values in the genome translate to Python code, in fact to a subset of what can be written in Python. Obviously I had to put some “rules” in place to generate executable code. Rules such as:
- We consider only the first 10 parameters, parameters which must be integers.
- We give to the program a pool of variables to use, 10 of them, again integers.
- The considered operations are: Assignments, Basic Math, Boolean, If, While and Code block.
- Assignments to a variable can be made from a parameter, from another variable or from a value generated from the Genome.
- Basic math are: +=, -=, /=, *= and %= chosen based on the Genome.
- Boolean operations are: &=, |=, ^=, &~=, |~= and ^~= chosen based on the Genome.
- While loops are bounded to a maximum of 100000 iterations to prevent dead lock.
- Integers (variables, parameters or values) are always bounded to a MAX_INT value especially since Python allows us to grow an integer to “infinity” which makes calculation long really quickly e.g.: having a *= within a while loop
So as you see, we need a number of rules to make it workable. And those are just the tip of the iceberg, there’s still more required to make it work.
So one learning is that the more complex the language is, the more “rules” are required in order to create “working” programs most of the time. By “working” programs, I don’t mean that the generated program will do what we want of it… that’s evolved with generations, but that it will at least run without errors and terminate in a finite time most of the time (otherwise, natural selection just takes too long).
Based on that fact I was thinking of using a simpler language for Genetic Programming. It happens that I recently read about the account of some peoples who dabbles with Genetic Programming as well: Kory Becker and Anthony Liu just to name those two who have interesting articles about their tries. They have in common that they started their trials using Brainfuck. If you don’t know Brainfuck don’t worry… it has originally been created to “play” with your brain (not to use the other word). Brainfuck is really simple, it contains only 8 primitives, but still with those 8 primitives it is Turing complete. The operations allow you to increments/decrements a value at the current data pointer, or increments/decrements the current data pointer. Do conditional checks on zero or non-zero values at the current pointer. And finally the last two operations are to output the ascii value at the current pointer, or input from the user and put the character value at the current pointer location. Can’t be simpler…
So for example Hello World! Looks like this:
So the usage of Brainfuck doesn’t look so funny after all. From my side of things, I still think I can produce interesting genetic programs following the constrained Python way, and I want to add more diversity to the generated programs. I want to add the handling of strings for example. So I’ll continue to look at that path when I get some time.