
Genetic programming is a new technique for machine learning, program induction and optimization loosely based on an evolutionary paradigm. Genetic programming is easily amenable to parallel computing which help relieve the intrinsic slowness of the approach. We describe a parallel implementation of genetic programming on the T3D computer. We apply the system to a problem of induction of binary decision diagrams used in logical circuit design. It is shown that the results depend in a critical way on the representation of the decision diagrams and that the parallel implementation is able to find the correct solution with less computational effort than the sequential version.
Genetic programming (GP) is a variation of genetic algorithms in which the evolving individuals are themselves computer programs instead of fixed length strings from a limited alphabet of symbols [6]. Programs are represented as trees with ordered branches in which the internal nodes are functions and the leaves are the so-called terminals of the problem. The search space in genetic programming is the space of all computer programs composed of functions and terminals appropriate to the problem domain.
Suitable functions and terminals are determined for the problem at hand and an initial random population of trees is constructed. The population then evolves with fitness being associated to the actual execution of the program and with genetic operators adapted to the tree representation. The crossover operation first selects a random crossover point in each parent tree and then exchanges the sub-trees, giving rise to two offspring trees. Examples related to our application will be given in the next section. There are also provisions for preventing trees from becoming too deep, for simplifying trees and for compressing trees that perform a useful functions into a single reusable module.
Genetic programming is well-suited to parallel implementation. The most popular parallel models are the fine-grained or grid models, and the coarse-grain or island models. In the grid models, large populations of individuals are spatially distributed on a low-dimensional grid and individuals interact locally within a small neighborhood. In the island model the population is subdivided into smaller subpopulations which evolve independently and simultaneously according to a standard EA. Periodic migrations of some selected individuals between islands allow to inject new diversity into converging subpopulations. Micro-processor-based distributed memory machines and workstation clusters are well adapted for the implementation of this model. The advantage of parallel EAs for difficult problems is that they can handle larger populations in reasonable times and favor cooperativity in the search for good solutions ([5], [15]). In the following section we introduce the binary decision diagrams optimization problem and explain in more detail the parallel GP implementation used to solve it.
A binary decision tree is a binary decision diagram that respects a third rule of assemblage: any entry point of a node is connected to only one preceding node.
Since [8] it has been demonstrated that all logical boolean function can be represented by a binary decision diagram. This type of representation finds applications in the test and the implementation of logical functions [3]: the function of a decision node can be implemented by a multiplexor or demultiplexor circuit and a binary decision diagram can be implemented by an interconnection of these circuits. In all these cases, the minimalisation of the number of nodes used is important, for the cost and/or the time of execution of the function. Nevertheless, the complexity of this minimalisation is such that in most cases approximate solutions are accepted [12].
A renewal of interest on the minimalisation of binary decision diagrams is born with the appearance on the market of programmable circuits named FPGA (Field-Programmable Gate Array) [13] [10]. These circuits appear under the form of an array of identical cells (the logic cells), where the user can program the function inside every logic cell (among some possible) and interconnections between cells. Each FPGA manufacturer proposes a different type of logical cells and interconnections. Some, as Actel [1], propose very simple cells, formed of a simple multiplexor circuit. A minimal binary decision diagram can therefore drive to an optimal utilization of the FPGA cells.

Figure 1
In preparing to use genetic programming to solve a problem one has to decide on the set of terminals, the set of primitive functions, the fitness measure, the stopping criterion and the values of some parameters such as population size and crossover rate [6]. The fitness of an individual is defined in our case to be the difference between a perfect solution and the actual number of hits of the given individual on all the input combinations of values. Therefore, a fitness value of 0 means that the individual correctly solves the problem.

Figure 2
While the choice is seldom unique, the nature of the problem suggests suitable terminals and functions. For the representation of binary decision diagrams we tried two representations. In the first one the terminal set was made by the operation codes and the output codes. The only function operating on those terminals was the three-branches IF function. Random-constructed trees are not guaranteed to be valid decision diagrams for our problem since the IF function itself can be the first argument (i.e. the test condition) of another IF function. Furthermore, there is nothing to prevent an output code from also being the first argument to the IF function. Finally, input variables can only appear as test conditions. Likewise, crossing-over valid trees will not necessarily yield admissible offspring (see fig. 1). We therefore penalized invalid trees by giving to them the worst fitness value in order for them to be less likely to be selected for reproduction.
In the second solution we avoided mixing terminals by creating a specialized IF function for each operation code. Example diagrams and a crossover operation are shown for a simple case in fig. 2. With the latter choice of functions and terminals all trees are guaranteed to be valid. Results of the runs of parallel genetic programming using both representations will be discussed in section 3. We now briefly describe our parallel GP implementation choices.
The parallel architectures that best matches the rather coarse grain and variable length of genetic programs are micro-processor-based distributed memory machines, including workstation clusters. On these machines it is easy to implement the island model. Good results were obtained in [9] and [7] using a similar computing architecture.
The T3D multicomputer [4] is a DEC/Alpha-based MIMD machine. The processors are connected by a fast bidirectional 3-D torus interconnect network.The memory of the machine is physically distributed although, depending on the programming model used, it can be globally addressable. Communication latency is low and bandwidth is high due to latency hiding and data transfer optimized hardware and easy routing mechanisms. The T3D array is connected through I/O nodes to a Cray Y-MP host machine on which all program development takes place. Access to peripherals such as disks, tapes and the network is through the host. Three different programming models are available on the T3D: message-passing, work and data sharing using a global address space (CRAFT) and data parallelism.
The message-passing approach perfectly suits the island model for parallel genetic programming. It is based on PVM which is a standard message-passing environment.
We started from the publicily available sgpc GP program [14]. The PVM-based code parallelization was easy except perhaps for the modifications needed to pack and unpack program trees to be sent to other subpopulations (islands). This requires linearizing the trees to be packed in a message buffer and rebuilding them at the destination subpopulation in the new processor's private address space. For efficiency reasons, we pack all migrating individuals in a single message, which minimizes message startup and transmission time.
After code parallelization, suitable values for a number of parameters of the distributed algorithm must be chosen. Besides the usual global population GP values for each subpopulation one has to define the topology and the number of the communicating subpopulations, the size N of the subpopulations, the migration frequency M, the number of migrating individuals K and the individual replacement policy. We found suitable values by trial and error, running many times the parallel algorithm on the well-understood multiplexor problem [6]. The size of the subpopulations for the present problem was thus set at 400, migration took place every 7 generations and the number of individuals exchanged was 8 to 10% of the subpopulation size. These values were close to those used in [9] and in [7].
We experimented with only one processor topology: the ring, and we tested two synchronous exchange policies: simply passing individuals to the next island in the ring, alternating directions at each swap, and passing individuals "modulo" the swap number (fig. 3). The "modulo" swap gave the best results and we retained it for the subsequent tests. We choose to migrate the best K individuals from each island.

Figure 3
The replacement policy used was that the new K individuals displace the worst K individuals of the receiving population. Here also other alternatives are possible. A GP run is terminated either by finding a solution (i.e. a 0 fitness individual) in any subpopulation or by reaching a maximum number of generations. The following pseudo-code gives a schematic description of the algorithm:

Figure 4
The second terminal and function set choice proved to be much more adequate. The parallel GP was able to find the correct solution on most of the 30 runs. A typical successful run is graphically depicted in fig. 5. Comparing it with fig. 4, it is seen that the average fitness is much lower and that the run quickly converges to a 0 fitness solution.
Not only did the parallel algorithm perform much better from the point of view of computing times, which was obviously expected, it also converged more often to the correct solution than the sequential one using a smaller number of fitness evaluations i.e., with a reduced computational effort. For instance, a particular run on a 8-processor system took 33 seconds to complete successfully with a population of 400 individuals per processor whereas for the same total population size (i.e., 400 x 8 = 3200) a sequential execution on one processor took approximately 10 minutes to complete. We observed the same effects in parallel GA systems [9] and the detailed results reported in [7] also agree with this general trend.
It is difficult in the present case to generalize this results. Indeed, we observed that the speedup fluctuates if processors are added to the system. We think that this phenomenon is due to the particular problem treated here, which seems to be not hard enough to require all that parallel processing power. In fact, with more processors one could use larger populations. But larger populations are not needed in our case and subdividing a relatively small population on more processors increases communication overheads and diminishes subpopulation diversity. Preliminary results on a more difficult problem in evolving financial trading models with a parallel GP system showed a more consistent behaviour with nearly linear speedups (work in preparation). Actually, we found that this last problem is only tractable within reasonable time limits by using parallel GP.
In conclusion, in this work we implemented a parallel GP programming model and we did preliminary experimentation with binary decision diagrams optimization problems. Although parallelism was moderately beneficial in this particular case, more complex problems will benefit even more, allowing larger populations and more difficult fitness functions to be treated. The work of others [7] and our own current work in this direction is promising.
[1] Actel. FPGA Data book and design guide. Sunnyvale, Calif., 1995.
[2] S. Arnone, M. Dell'Orto, A. Tettamanzi and M. Tomassini, Highly Parallel Evolutionary Algorithms for Global Optimization, Symbolic Inference and Non-Linear Regression, Proceedings of the 6th International Conference on Physics Computing, European Physical Society, Geneva, 51-54, 1994.
[3] E. Cerny, D. Mange and E. Sanchez, Synthesis of minimal binary decision trees, IEEE Transactions of Computers, 28, 472-482, 1979.
[4] T3D Software Overview Technical Note, SN-2505 1.1, Cray Research Inc., 1993.
[5] V.S. Gordon and D. Whitley, A Machine-Independent Analysis of Parallel genetic Algorithms, Complex Systems, 8, 181-214, 1994.
[6] J.R. Koza, Genetic Programming, MIT Press, Cambridge, MA, 1992.
[7] J.R. Koza and D. Andre, Parallel Genetic Programming on a Network of Transputers, Computer Science Department, Stanford University, Technical Report CS-TR-95-1542, 1995.
[8] C. Y. Lee, Representation of switching circuits by binary-decision programs, Bell Syst. Tech. J., 38, 985-999, 1959.
[9] A. Loraschi, A. Tettamanzi, M. Tomassini and P. Verda, Distributed Genetic Algorithms with an Application to Portfolio Selection Problems, in Proceedings of the Int. Conf. on Artificial Neural Nets and Genetic Algorithms, D.W. Pearson, N.C. Steele and R.F. Albrecht (Editors), Springer-Verlag, 384-387, 1995.
[10] P. Marchal and A. Stauffer, Binary decision diagram oriented FPGAs in Proceedings of FPGA'94, 2nd International ACM/SIGDA Workshop on FPGAs, Berkeley, Calif., Feb. 13-15, 1-10, 1994.
[11] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Second Edition, Berlin, 1994.
[12] B. M. E. Moret, Decision trees and diagrams, Computing Surveys 14, 593-623, 1982.
[13] J. Rose, A. El Gamal and A. Sangiovanni-Vincentelli, Architecture of PPGAs, Proceedings of the IEEE, 81, 1013-1029, 1993.
[14] W.A. Tackett and A. Carmi, Simple Genetic Programming in C, available through the genetic programming archive at ftp.io.com/pub/genetic-programming/code/sgpc1.1.tar.Z.
[15] M. Tomassini, A Survey of Genetic Algorithms, to appear in Annual Reviews of Computational Physics Vol. III, D. Stauffer Editor, World Scientific, 1996.
| ![]() REFER TO CONTENTS |
| ![]() |
|---|