GSoC 2021: Evolutionary Algorithms in Deepbots

Project Description

Introduction

Deepbots was originally developed as an easy interface between Reinforcement Learning algorithms and the Webots simulator. However, the current structure of Deepbots makes it easy to extend the library to support evolutionary algorithms. The main aim of this project was to add support for implementing the Genetic Algorithm in Deepbots. In an evolutionary algorithm, a population of agents is evaluated and mutated to solve a given task. In every generation, the best agents are chosen to mutate, until a good enough solution is reached. We use PyGAD to realise this integration into Deepbots. The simulation of such an algorithm can take two forms:

Single Robot Environments - A single robot in Webots can be used as a proxy to compute the fitness function of each gene in the population. Thus, if the population size is 10, the fitness is calculated sequentially for each gene; for each calculation, the agent interacts with the Webots world, which is played out in simulation.
Multi Robot Environments - Multiple robots can be used to parallelize the calculation of the fitness for each gene, i.e. if the population size is 10, the environment has an equal number of robots arranged in any particular fashion, each of which is used as an agent interacting with the Webots world to calculate the fitness of one of the 10 genes, thus speeding up the operation by up to a factor of 10.

A SupervisorEvolutionary class is added to Deepbots to support evolutionary algorithms. A simple EmitterReceiver Scheme can be used for the communication between the supervisor and the robot in case of single robot environments. For multi-robot environments, we use an extended version of the EmitterReceiver Scheme wherein the supervisor communicates with each robot present in the world using separate channels and separate emitter-receiver pairs.
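The idea of one channel per robot can be illustrated with a minimal, dependency-free sketch. It does not use the Webots or Deepbots APIs: `ChannelBus` and `assign_channels` are hypothetical names invented for this illustration, and the queue-per-channel model is only an assumption about how separate emitter-receiver pairs keep messages apart.

```python
from collections import defaultdict, deque


class ChannelBus:
    """Toy stand-in for radio channels: one FIFO queue per channel number."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, channel, message):
        self._queues[channel].append(message)

    def receive(self, channel):
        queue = self._queues[channel]
        return queue.popleft() if queue else None


def assign_channels(num_robots, first_channel=1):
    """Give every robot its own channel number (hypothetical numbering)."""
    return {robot_id: first_channel + robot_id for robot_id in range(num_robots)}


bus = ChannelBus()
channels = assign_channels(3)

# The supervisor sends a different message to each robot on that robot's channel.
for robot_id, channel in channels.items():
    bus.send(channel, f"action-for-robot-{robot_id}")

# Each robot reads only its own channel, so messages cannot be mixed up.
received = {robot_id: bus.receive(channel) for robot_id, channel in channels.items()}
```

With a single shared channel, all three messages would land in the same queue and a robot could read a message meant for another one; separate channels avoid that by construction.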

PyGAD

PyGAD is a genetic algorithm framework for Python that provides a range of approaches at different levels of abstraction. A genetic algorithm is created as an instance of the pygad.GA class. The pygad.GA class has methods that handle the different stages of the genetic algorithm, namely:

Population Initialisation
Calculation of fitness of each solution
Parent Selection
Crossover
Mutation

This class can further be supplied with the particular fitness function that the user wants to use and other parameters to specify crossover schemes, etc. This framework has been integrated into deepbots using a base class SupervisorEvolutionary which acts as a wrapper to the pygad.GA class.
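To make the five stages concrete, here is a small plain-Python sketch of a genetic algorithm loop. It is not PyGAD's implementation and does not call PyGAD; `run_ga`, `toy_fitness`, and every parameter name are assumptions made for illustration. The selection scheme (keep the fittest), single-point crossover, and Gaussian mutation are just one possible choice of operators.

```python
import random


def run_ga(fitness_func, num_genes, pop_size=10, num_generations=30,
           num_parents=4, mutation_scale=0.1, seed=0):
    """Maximise fitness_func over lists of floats (requires num_genes >= 2)."""
    rng = random.Random(seed)

    # 1. Population initialisation
    population = [[rng.uniform(-1.0, 1.0) for _ in range(num_genes)]
                  for _ in range(pop_size)]
    best_history = []

    for _ in range(num_generations):
        # 2. Calculation of fitness of each solution
        fitness = [fitness_func(solution) for solution in population]
        best_history.append(max(fitness))

        # 3. Parent selection: keep the fittest solutions unchanged
        ranked = sorted(zip(fitness, population), key=lambda pair: pair[0],
                        reverse=True)
        parents = [solution for _, solution in ranked[:num_parents]]

        children = []
        while len(children) < pop_size - num_parents:
            mother, father = rng.sample(parents, 2)
            # 4. Crossover: single cut point
            cut = rng.randint(1, num_genes - 1)
            child = mother[:cut] + father[cut:]
            # 5. Mutation: small Gaussian noise on every gene
            child = [gene + rng.gauss(0.0, mutation_scale) for gene in child]
            children.append(child)

        population = parents + children

    final_fitness = [fitness_func(solution) for solution in population]
    best_fitness, best_solution = max(zip(final_fitness, population),
                                      key=lambda pair: pair[0])
    return best_solution, best_fitness, best_history


def toy_fitness(solution):
    """Toy objective: highest (zero) when every gene equals 0.5."""
    return -sum((gene - 0.5) ** 2 for gene in solution)


best_solution, best_fitness, history = run_ga(toy_fitness, num_genes=4)
```

Because the parents are carried over unchanged, the best fitness recorded in `history` never decreases from one generation to the next. In the Deepbots setting, the fitness function would instead run an episode in the Webots simulation.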

The CartPole problem

The cartpole environment in deepworlds is an extension of the classic cartpole problem from OpenAI Gym to the Webots simulator. Most of the work during the project was done by prototyping the code on the CartPole environment and later generalizing it. The example has been implemented in two forms:

CartPole with discrete action space
CartPole with continuous action space

The goal of the cartpole environment is to keep the pole on the cart upright. This is done by taking actions that move the cart to the left or right; a reward of 1 is given for every time-step the pole is upright.
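As a rough illustration of how this reward could turn into a fitness value for one gene, the sketch below sums a reward of 1 per time-step over a recorded sequence of pole angles. The function name `episode_fitness`, the angle threshold of 0.2 rad, and the choice to end the episode at the first fall are all assumptions for this example, not values taken from the deepworlds environment.

```python
def episode_fitness(pole_angles, max_angle=0.2):
    """Sum a reward of 1 for each time-step the pole stays upright.

    pole_angles: pole angle in radians at each time-step (hypothetical input).
    max_angle:   assumed threshold beyond which the pole counts as fallen.
    """
    total_reward = 0
    for angle in pole_angles:
        if abs(angle) > max_angle:
            break  # assumed: the episode ends once the pole falls
        total_reward += 1
    return total_reward


fitness = episode_fitness([0.0, 0.05, -0.1, 0.3, 0.0])
```

Here the pole stays within the threshold for the first three steps, so the fitness is 3; a gene that balances the pole longer gets a higher fitness.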

Problems Faced

The pygad.GA.cal_pop_fitness() function calculates the fitness of each solution in the population sequentially. This is not suitable for the multi-robot implementation of the evolutionary algorithm, since it does not allow the fitness calculation to be parallelized. To get around this, we plan to create a custom class that inherits from pygad.GA and parallelizes the fitness calculation for each gene in the population, using an extended version of the EmitterReceiver scheme for communication between the supervisor and the robots in the Webots world.
The pygad.GA.run() method calls cal_pop_fitness() twice per generation, and each call invokes fitness_func() once for every gene. Thus fitness_func(), which calculates the fitness of a gene and also interfaces the genetic algorithm with the simulation running in Webots, is called twice per gene in every generation, so the cartpole simulation runs twice per gene, which is unnecessary. We plan to work around this by redefining the run() method in the custom class mentioned above so that it calculates the fitness of each gene only once per generation.
The EmitterReceiver scheme by default uses just one channel of communication between the supervisor and the robot in the environment. This scheme was developed with the single-robot case in mind and breaks when multiple robots are present in the environment, since messages from the last step and the current step overlap, as described here. We get around this by creating separate channels (i.e. a separate emitter-receiver pair for each robot).
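One possible way to avoid simulating the same gene twice in a generation is sketched below. Note that this is a caching wrapper around the fitness function, a different technique from the planned pygad.GA subclass with a redefined run(), and it is shown without PyGAD. The class name `OncePerGenerationFitness` and its methods are hypothetical.

```python
class OncePerGenerationFitness:
    """Caches fitness values so each solution is evaluated once per generation."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # expensive call, e.g. one simulated episode
        self._cache = {}
        self.calls = 0  # number of real evaluations performed

    def new_generation(self):
        """Forget cached values; call this whenever the population changes."""
        self._cache.clear()

    def __call__(self, solution):
        key = tuple(solution)  # genes as a hashable key
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._evaluate(solution)
        return self._cache[key]


# Stand-in for an expensive simulation: here simply the sum of the genes.
fitness = OncePerGenerationFitness(lambda solution: sum(solution))
population = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]

first_pass = [fitness(solution) for solution in population]
second_pass = [fitness(solution) for solution in population]  # from the cache
```

After both passes, `fitness.calls` is 3 rather than 6. One caveat of this approach is that two identical genes in the same generation share a single cached value, which matters if the simulation is not deterministic.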

Completed Work

The following work was done during the course of Google Summer of Code 2021.

Adding Changelog to Deepbots

This PR adds a GitHub workflow to deepbots which generates or updates a changelog every time a new commit is pushed.

PR #87 to deepbots - added changelog

Adding a multi-cartpole environment to deepworlds

The multi-cartpole Webots environment consists of 9 robots arranged in a 3x3 grid world. Each robot has its own emitter-receiver pair and communicates with the supervisor over a separate channel. This environment can facilitate prototyping of multi-agent reinforcement learning algorithms in Webots and parallelization of the genetic algorithm, among many other applications. In the example given below, each robot is being trained using Proximal Policy Optimization.

PR #48 to deepworlds - Added multi-cartpole example

Training GIF:
multi-cartpole training

Adding SupervisorEvolutionary class to deepbots

This PR integrates the base class for implementing all the evolutionary algorithms. The class is initialized with the model to be trained (the SupervisorEvolutionary class supports PyTorch models). The class has CUDA support for the PyTorch models and also facilitates logging via Weights and Biases.

PR #93 to deepbots - Added SupervisorEvolutionary class

Deepworlds environment for evolutionary algorithm using single cartpole

PR #46 to deepworlds - Added Cartpole Evolutionary example

The cartpole evolutionary example implements the training of a single cartpole robot using the genetic algorithm. The algorithm is trained for 75 generations and the performance is as given below.

Best Fitness vs Generation

Training GIF:

Work to do

Extending the SupervisorEvolutionary class to support multiple robots
Adding a multi-cartpole evolutionary example to deepworlds

Relevant Issues

Future Work

Future work can involve building other Webots worlds that support evolutionary algorithms.

Weekly Updates (Timeline)

19/05/21 - Initial Meeting
- Check the changelog PR
- Go through resources for Genetic Algorithms and PyGAD
- Look into spawning multiple robot grids in Webots
- Open a tracker issue
02/06/21 - Catch up Meeting
- Check the emitter-receiver scheme for multiple robots
- Create a webots world with two cartpoles
09/06/21 - Catch up Meeting
- Scale the multi-cartpole example to 9 robots present in a 3 x 3 grid
- Look into PyGAD
- Run a simplified cartpole example using pygad
18/06/21 - Catch up Meeting
- Look into using pygad to train neural networks - decide the best option out of TorchGA and KerasGA
- Create a single cartpole example for training a Neural Network using pygad (genetic algorithm)
30/06/21 - Catch up Meeting
- Look into the custom data method for communication in Webots
- Implement the class for supporting the training of a single cartpole using GA
- Basic API for the SupervisorEvolutionary class

SupervisorEvolutionary(extend SupervisorEnv)
| --- init()
| --- env
| --- methods and attributes for interfacing with webots
| --- type of model (take argument as string / model)
|      | --- mlp, recurrent, etc.
|      | --- layer sizes
|      | --- activations
| --- loss function (take argument as string / class)
| --- kwargs for pygad.GA
| --- utilities
|      | --- plotting
|      | --- logging and debugging
|      | --- tensorboard / wandb integration

07/07/21 - Catch up Meeting
- Discussion about the communication problem in EmitterReceiver scheme in case of multiple robots
14/07/21 - Catch up Meeting
- Debugging the multi-cartpole example
- Discussion about the SupervisorEvolutionary class
28/07/21 - Catch up Meeting
- Debugging the single cartpole evolutionary example.
- Analysing the performance of the cartpole evolutionary example for different hyperparameters.
- Discussion about the problem of the simulation running twice per gene.
17/08/21 - Catch up Meeting
- Come up with a solution for the problem of the simulation running twice per generation
- Come up with an approach for parallelizing the fitness calculation in the multi-cartpole evolutionary example
- Final Report discussion
