Abstract:
In tree-based genetic programming (GP) there is a tendency for the program trees to increase in size from one generation to the next. If this increase in program size is not accompanied by an improvement in fitness
then this unproductive increase is known as bloat. It is standard practice to place some form of control on program size. This can be done
by limiting the number of nodes or the depth of the program trees, or by adding a component to the fitness function that rewards smaller programs (parsimony pressure) or by simplifying individual programs using
algebraic methods. This thesis proposes a novel program simplification
method called numerical simplification that uses only the range of values
the nodes take during fitness evaluation.
The effect of online program simplification, both algebraic and numerical, on program size and resource usage is examined. This thesis also examines the distribution of program fragments within a genetic programming population and how this is changed by using simplification.
It is shown that both simplification approaches result in reductions in
average program size, memory used and computation time and that numerical simplification performs at least as well as algebraic simplification,
and in some cases will outperform algebraic simplification. This reduction
in program size and the resources required to process the GP run come
without any significant reduction in accuracy. It is also shown that although the two online simplification methods destroy some existing program fragments, they generate new fragments during evolution, which
compensates for any negative effects from the disruption of existing fragments. It is also shown that, after the first few generations, the rate new fragments
are created, the rate fragments are lost from the population, and the
number of distinct (different) fragments in the population remain within
a very narrow range of values for the remainder of the run.