Irreducible Complexity – the world is too complex to have evolved! Let’s see…
This project was born on a plane over the Atlantic last week. Given the length of the title, I thought about breaking this up into multiple posts, but I got lazy and figured that if you’re on this site to begin with, chances are you’re familiar with all these concepts already. So here goes.
First a refresher:
The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once.
The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using brute force search). The running time for this approach lies within a polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes impractical even for only 20 cities.
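To get a feel for why brute force fails, here’s a quick back-of-the-envelope sketch (not part of the project code):

```javascript
// The number of distinct orderings of n cities is n!
// (fewer if you fix the start and ignore direction, but the growth is the same).
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

console.log(factorial(10)); // 3628800 (still feasible to enumerate)
console.log(factorial(20)); // 2432902008176640000 (about 2.4e18: hopeless)
```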
Instead of Brute Force I will try to converge on the solution using a genetic algorithm.
At this point you may want to flip to the finished project using Chrome 7+, Firefox 4+ or Opera 10+.
Genetic algorithms are those that exhibit the characteristics of evolution, i.e. a coded sequence, mutation, a fitness function and iterative progression. The typical flow is as follows:
- Develop a sequence of instructions (the DNA)
- Develop a large number of these sequences (the population). In the code I called these “strands”.
- Score the population for fitness.
- Sort by fitness score and select the top-performers.
- Generate a new population based off the top-performers.
- Modify or mutate this new population and repeat the process with this new generation.
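That loop can be sketched like so; the names `randomStrand`, `score` and `mutate` are placeholders for whatever your problem needs, not the project’s actual code:

```javascript
// Minimal GA skeleton. Lower score = fitter (e.g. shorter route).
function evolve(randomStrand, score, mutate, options) {
  const { populationSize, parents, generations, mutationRate } = options;
  let population = [];
  for (let i = 0; i < populationSize; i++) population.push(randomStrand());

  for (let g = 0; g < generations; g++) {
    // Score the population, sort by fitness, keep the top performers.
    population.sort((a, b) => score(a) - score(b));
    const breeders = population.slice(0, parents);

    // Breed a fresh population from the top performers, mutating most children.
    population = [];
    for (let i = 0; i < populationSize; i++) {
      let child = breeders[i % parents].slice(); // copy a parent strand
      if (Math.random() < mutationRate) child = mutate(child);
      population.push(child);
    }
  }
  population.sort((a, b) => score(a) - score(b));
  return population[0]; // fittest strand found
}
```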
Over time the population will tend to converge on a solution. That is a genetic algorithm, and it’s not guaranteed to give the absolute best solution. As you can guess, the rate at which the population converges on a good solution is determined by a number of factors:
- Population Size
- Population diversity
- Mutation Rate
- Fitness Function
- The number of generations
- The number of “parents” selected to breed from.
A closely related process is that of “Hill Climbing”, which I would say is more deterministic (less probabilistic). A typical hill climbing algorithm is as follows:
- Develop a sequence of instructions (the DNA)
- Score the solution for fitness.
- Modify or mutate this solution and score.
- If the child is better than the parent it becomes the new parent.
- Repeat the process (go back to step #2).
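A sketch of that loop, with `score` and `mutate` again standing in as placeholders:

```javascript
// Minimal hill-climbing skeleton. Lower score = fitter; a child only
// replaces its parent if it is strictly better.
function hillClimb(start, score, mutate, iterations) {
  let parent = start;
  let parentScore = score(parent);
  for (let i = 0; i < iterations; i++) {
    const child = mutate(parent.slice()); // mutate a copy of the parent
    const childScore = score(child);
    if (childScore < parentScore) {
      parent = child; // the child becomes the new parent
      parentScore = childScore;
    }
  }
  return parent;
}
```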
So what’s the difference? Well, for one, with hill climbing we only ever advance if the new descendant is better than the current solution; with each generation you can only move closer to a solution, never further away. With a genetic algorithm you advance regardless and hope for convergence over time. Convergence in a GA is not assured.
A more subtle difference is that the Hill Climbing algorithm is fixed on a specific path. It always chooses a better solution, even though there is a chance that this path will never lead to the best solution overall. In a GA, however, the algorithm chooses a set of “good” solutions to breed from, which includes the possibility that a currently inferior solution might ultimately lead to a better solution overall.
This GA approach mimics real evolution much more closely. It allows for the possibility that a currently superior branch might die off and yield to a currently inferior solution. I think of this as more of a “strategic” solution than a “tactical” hill climbing solution.
Applying GA to the Traveling Salesman Problem I ended up with this:
- I create a set of random cities within a specific area.
- Then I create a route through all cities (dna in the source code).
- Then I score the route by calculating the overall length of the route. A score + dna = a single strand.
- Then I create a population of these strands.
- I sort by fitness score and select the 5 best, regardless of how good their scores are in absolute terms.
- I create a new generation based on these 5, mutating approximately 80% of the time.
- A mutation is basically swapping two cities at random. There is no intelligence to the mutation.
- Score this new generation and repeat the process (go back to step 5).
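In code, the scoring and mutation steps might look something like this (a sketch in the same spirit, not the project’s actual source; I’m assuming the route is a closed tour that returns to its starting city):

```javascript
// Score a route: total Euclidean length of the tour, wrapping back to
// the start. `route` is an array of {x, y} city objects (the "dna").
function routeLength(route) {
  let total = 0;
  for (let i = 0; i < route.length; i++) {
    const a = route[i];
    const b = route[(i + 1) % route.length]; // wrap around to close the tour
    total += Math.hypot(b.x - a.x, b.y - a.y);
  }
  return total;
}

// Mutation: swap two cities at random. There is no intelligence to it.
function mutate(route) {
  const i = Math.floor(Math.random() * route.length);
  const j = Math.floor(Math.random() * route.length);
  [route[i], route[j]] = [route[j], route[i]];
  return route;
}
```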
Enter web workers.
I’m using them as follows:
- Create a large population of possible solutions (my solution uses 1000)
- Create a number of workers; my solution uses 10.
- Split up the population across the workers, giving each 100 solutions to work on.
- Mutate and score each one within the worker (10 threads operating in parallel).
- Join back and collect the results.
- Sort the entire population and repeat.
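The fan-out/fan-in could be sketched like this (the worker file name and message shape here are my own illustrations, not the project’s actual code):

```javascript
// Split the population into roughly equal chunks, one per worker.
function partition(population, workerCount) {
  const chunkSize = Math.ceil(population.length / workerCount);
  const chunks = [];
  for (let i = 0; i < population.length; i += chunkSize) {
    chunks.push(population.slice(i, i + chunkSize));
  }
  return chunks;
}

// Browser-only: hand each chunk to a worker, then join the results back up.
function dispatch(population, workerCount, onGenerationDone) {
  const results = [];
  const chunks = partition(population, workerCount);
  let pending = chunks.length;
  chunks.forEach((chunk) => {
    const worker = new Worker("strand-worker.js"); // illustrative file name
    worker.onmessage = (e) => {
      results.push(...JSON.parse(e.data)); // worker replies with a JSON string
      if (--pending === 0) {
        results.sort((a, b) => a.score - b.score); // sort the whole population
        onGenerationDone(results); // breed the next generation and repeat
      }
    };
    worker.postMessage(JSON.stringify(chunk)); // strings only
  });
}
```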
There are a couple of advantages here: not only do we get parallel execution and finish processing quicker, we also don’t tie up the UI thread, so the browser remains responsive throughout.
Web Workers are great for processing but little else. They cannot access the DOM, they cannot generate page events; they cannot do much of anything. What they can do is receive information from the parent page, process it and send it back. This information is limited to strings. You can serialize your objects to a JSON string and then do an eval on the worker side, but ultimately this will be done for you by the browser. So far only Firefox and Opera are capable of serializing JSON objects between the page and the worker thread, hence the example solution only works in these browsers.
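On the worker side, the counterpart is tiny (again a sketch; `scoreStrand` is a placeholder standing in for the route-length calculation):

```javascript
// Placeholder fitness function: sum of the dna values stands in for
// the real route-length scoring.
function scoreStrand(strand) {
  return strand.dna.reduce((sum, v) => sum + v, 0);
}

// Parse the incoming string, process each strand, reply with a string.
function handleMessage(data) {
  const strands = JSON.parse(data);
  strands.forEach((s) => {
    s.score = scoreStrand(s);
  });
  return JSON.stringify(strands);
}

// In the actual worker file this would be wired up to the message events
// (importScripts only exists inside a worker, so this is a safe guard):
if (typeof self !== "undefined" && typeof importScripts === "function") {
  self.onmessage = (e) => self.postMessage(handleMessage(e.data));
}
```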
A couple of things I noticed in working with this approach.
- A larger population does not always guarantee a better solution. There is a balance for any given problem space and fitness function.
- Mutation rate: you cannot simply change EVERYthing on each generation. You must retain some of the core DNA in order to improve; otherwise the solution is too random and does not converge.
- Similarly, too little mutation and the solution converges too slowly.
- Number of breeders – you would think breeding from the single best solution at any given point would be ideal. Surprisingly, the system did better when a higher number, say 10% of the population, is used to breed from. Too many and again you don’t converge; too few and the system can be boxed into a dead-end solution, à la the Hill Climbing approach.
The key thing to remember is at no point have I written an algorithm to find the shortest distance. I’ve simply put pieces together in a closed system, assigned probabilities and set the thing in motion. Even still it’s pretty fascinating.
Over the course of the past week I’ve run this thing many times. It’s the type of thing I can waste hours with. I tried various combinations of mutation rate, population size, etc. to get the right balance. Initially the system did not converge at all. Then I found it would converge but very quickly plateau and run out of ideas, so to speak. Amping up the mutation fixed this but negatively impacted convergence, so I tweaked the number of breeders a little. Back and forth until I got it to where it is today. It’s fun to watch the code try to figure out new pathways; occasionally it’ll drop a segment in favour of a different route, only to bring that same piece back in later. This is where my knowledge of GAs falls short. I’m not sure I’ve gotten it right, and taking it any further would require some heavy-duty research. It’ll suffice for now.
This is a crude system with only 1 test for fitness and no competing goals. Imagine scaling this up to “earth” scale with billions of competing factors, each of varying weights, trillions of instructions per individual, 3.5 billion years of evolution and a system that feeds BACK on itself, altering the fitness functions as it goes. The possibilities are astounding.
As you can tell, I had a lot of fun with this one. Genetic algorithms are a powerful concept, particularly when dealing with difficult-to-solve (e.g. NP-hard) problems. The outcome is highly probabilistic, and the way the algorithm progresses essentially mimics our own progression as life on planet earth. Too cool.
Now flip to the finished project using Chrome 7+, Firefox 4+ or Opera 10+.