Genetic algorithms (GA) are directed search algorithms based on the mechanics of biological evolution. They are powerful tools capable of generating solutions to complex optimization and search problems that would otherwise be extremely hard or time consuming to complete by human trial and error alone.
Components of Genetic Algorithms
The basic principles of GA are the same as biological evolution, only applied in a coding context. The sequence to develop them can be broken into six stages, explained here:
The encoding technique is how you represent a single individual in the population. Each individual has a configuration that represents the biological equivalent of genes on chromosomes. The unique configuration of each individual is a potential solution to the problem you are trying to solve.
Any data structure can be used for encoding (binary, arrays of numbers, permutations of elements, or any other structure), as long as it is configurable and can be edited/rearranged.
Initialization is the term used for building a population of individuals that will be defined by your encoding technique. A population can be generated randomly, pulled from the recent results of a GA run, or taken from another data set.
How you generate your initial population can have an effect on how the population converges on a solution over time. If your initial population is composed of very similar individuals, you risk not having enough genetic variation to improve upon fitness over time to find a solution. Both the size of the initial population, and the variation between individuals, are important factors to consider.
The fitness function is the most important component, and also the most difficult part to design.
In biology, the most fit individuals are those that perform best in their environment and are most likely to pass on their genes to the next generation. Similarly, in a GA, the individuals that perform best are the closest solutions to the problem, and will need to be identified so you can select them in the reproduction stage.
Thus, the fitness function is critically important to the success of the program. It defines and measures the strength of any particular individual in the population at solving the problem. To judge the individuals, your fitness function must address questions such as:
- Which attributes of an individual make them strong (better at solving the problem than other individuals)?
- How do you compare these attributes to others in the population to determine who is most fit?
While developing the fitness function, you also need to determine how close to an actual answer your solution needs to be. Not all problems have an absolute solution. If there won’t be an absolute solution, the fitness function needs to recognize acceptable thresholds to accurately measure and compare individuals.
Selection Of Parents (Reproduction)
In a GA, selection is the process of choosing which members of the population will be parents for the next generation. Not all individuals may get to reproduce, especially if they are very unfit (determined by the fitness function). Although you can set the selection process to be random, it is usually best to select from a percentage of the most fit individuals, while mixing in some less fit members to maintain variation in the next generation. Some parents may also be allowed to stay in the population for the next generation if they are very fit, or you wish to increase randomness in the program.
Recombination And Mutation
The genetic processes of recombination and mutation are applied in a GA to facilitate the creation of children. Once parents are selected, they reproduce to create children who have a data structure that is a combination of their two parents. In biology, offspring have a 50:50 combination of their parents’ genetic information. During recombination in a GA, you can define exactly how much content is provided by each parent. This could be set to 50:50, or any other percentage you wish to test.
Mutations are random adjustments of the encoding of an individual, similar to the process of random mutations in the human genome during reproduction. They increase variation in the next generation, and can sometimes lead to increased fitness of an individual.
The termination criteria is the condition that will end the GA when a sufficient answer is reached. Sometimes the end point is clear, and other times a threshold of quality that is acceptable as a solution must be defined.
Another possibility for termination criteria is to quit the run once maximum fitness has not been improved over a certain number of generations. In this case, the individuals are not getting any better at solving the problem, and it is best to stop the process and use your findings to tweak the program for the next attempt.
How The Components Run Together
Take the example of deciphering an encrypted string of text, where the solution is a complete sentence in English. First, you would define your individuals, build the starting population, and begin evaluating the strength of the individuals at deciphering the text. Some of the fitness function questions you might use to determine how fit any given individual of the population is are:
- Has it solved a word? Has it found single letter words? Solving a really big word is awarded a greater number of fitness points.
- Has it found any trigrams? These are common 3-letter groupings found in language that may be repeated throughout the text. For example, individuals that find “ing” or other trigrams would be awarded additional fitness points.
- Has it found any double letters? There are only a select few letters that occur twice beside each other in the English language, so those can be awarded more points if an individual finds them.
The GA would then continue on repeatedly evaluating individuals, choosing the most fit, selecting parents to reproduce, and using recombination and mutation to generate offspring until a termination criteria is satisfied.
Strategies For Using Genetic Algorithms
Overall, GA can be powerful problem solvers because of their ability to quickly test a vast number of options, continually improving with each generation to find the best solution possible. However, sometimes they can get stuck on a problem and may require intervention to get onto the right track of increasing fitness to find a solution. Some strategies to consider are:
- Increase the size and variation of your initial population
- Toss in randomly generated individuals (can increase variation)
- Parallelize the algorithm to speed up each generation
- Trigger a purge, or a mass mutation, if the population stagnates
It takes a particular type of problem to justify the creation of a genetic algorithm, but it is incredibly satisfying to standby and watch the processes unfold to solve a problem in a fraction of the time it would take a human to accomplish.