GSoC 2021: Evolutionary Algorithms in Deepbots
- Project Description
  i. Introduction
  ii. PyGAD
  iii. CartPole Problem
  iv. Problems Faced
- Completed Work
- Work to do
- Relevant Issues
- Future Work
- Weekly Updates (Timeline)
Mentors: Manos Kirtas, Kostas Tsampazis
Deepbots was originally developed as an easy interface between Reinforcement Learning algorithms and the Webots simulator. However, the current structure of Deepbots makes it easy to extend the library to support evolutionary algorithms. The main aim of this project was to add support for implementing the Genetic Algorithm in Deepbots. In an evolutionary algorithm, a population of agents is trained and mutated to solve a given task. At every episode the best agents are chosen to mutate in order to reach a good enough solution. We use PyGAD to realise this integration into Deepbots. The simulation of such an algorithm can take two forms:
- Single Robot Environments - A single robot in Webots is used as a proxy to evaluate the fitness function of each gene in the population. Thus, if the population size is 10, the fitness is calculated sequentially for each gene; for each calculation, the agent interacts with the Webots world, which is played out in simulation.
- Multi Robot Environments - Multiple robots can be used to parallelize the calculation of the fitness for each gene, i.e. if the population size is 10, the environment has an equal number of robots arranged in any particular fashion, each of which acts as an agent interacting with the Webots world to calculate the fitness of one of the 10 genes, thus speeding up the operation by a factor of 10.
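The difference between the two forms can be illustrated with a toy step counter. This is only a sketch: `toy_reward`, `run_single_robot` and `run_multi_robot` are hypothetical names, no Webots API is involved, and the reward is a stand-in for a real simulation step.

```python
def toy_reward(gene):
    # Hypothetical stand-in for one simulation step of a robot driven by `gene`.
    return 1.0 if sum(gene) > 0 else 0.0


def run_single_robot(population, steps_per_episode):
    """One robot evaluates every gene in turn: one full episode per gene."""
    sim_steps = 0
    fitness = []
    for gene in population:
        score = 0.0
        for _ in range(steps_per_episode):
            score += toy_reward(gene)
            sim_steps += 1  # each step of each episode advances the simulation
        fitness.append(score)
    return fitness, sim_steps


def run_multi_robot(population, steps_per_episode):
    """One robot per gene: all robots advance within the same simulation step."""
    sim_steps = 0
    fitness = [0.0] * len(population)
    for _ in range(steps_per_episode):
        for robot_id, gene in enumerate(population):
            fitness[robot_id] += toy_reward(gene)
        sim_steps += 1  # a single shared step serves every robot
    return fitness, sim_steps


population = [[i - 4.5, 1.0] for i in range(10)]  # 10 toy genes
single_fitness, single_steps = run_single_robot(population, steps_per_episode=5)
multi_fitness, multi_steps = run_multi_robot(population, steps_per_episode=5)
```

Both functions return the same fitness values, but the single-robot version needs ten times as many simulation steps for a population of 10.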
A SupervisorEvolutionary class is added to Deepbots to support evolutionary algorithms. A simple EmitterReceiver scheme can be used for the communication between the supervisor and the robot in the case of single-robot environments. For multi-robot environments, we use an extended version of the EmitterReceiver scheme wherein the supervisor communicates with each robot present in the world using separate channels and separate emitter-receiver pairs.
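The idea of one channel per robot can be sketched with a toy message bus. `ToyRadio` is a hypothetical class written for this illustration; it is not the Webots Emitter/Receiver API and not the Deepbots implementation.

```python
from collections import defaultdict, deque


class ToyRadio:
    """Toy message bus keyed by channel number (illustration only)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, channel, message):
        # Messages are queued per channel, so robots never see each other's data.
        self._queues[channel].append(message)

    def receive(self, channel):
        queue = self._queues[channel]
        return queue.popleft() if queue else None


radio = ToyRadio()
num_robots = 3

# The supervisor sends each robot its own message on that robot's channel.
for robot_id in range(num_robots):
    radio.send(channel=robot_id, message=f"action-for-{robot_id}")

# Each robot reads only from its own channel.
received = [radio.receive(channel=robot_id) for robot_id in range(num_robots)]
leftover = radio.receive(channel=0)
```

Because every robot has a dedicated queue, a message meant for one robot cannot be consumed by another.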
PyGAD is a genetic algorithm framework for Python, which provides an array of different approaches made available through different levels of abstraction. A Genetic Algorithm can be instantiated as an instance of the pygad.GA class. The pygad.GA class has different methods which handle the different stages of the genetic algorithm, namely:
- Population Initialisation
- Calculation of fitness of each solution
- Parent Selection
- Crossover
- Mutation
This class can further be supplied with the particular fitness function that the user wants to use and other parameters to specify crossover schemes, etc. This framework has been integrated into Deepbots using a base class SupervisorEvolutionary which acts as a wrapper around the pygad.GA class.
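The five stages listed above can be sketched in plain Python. This is a minimal illustration using only the standard library; it is not PyGAD's implementation or API, and `run_ga`, its parameters and the toy `fitness_func` are assumptions made for the example.

```python
import random


def fitness_func(solution):
    # Toy objective: genes close to 1.0 score higher (the maximum fitness is 0).
    return -sum((g - 1.0) ** 2 for g in solution)


def run_ga(num_generations=30, pop_size=10, num_genes=4, num_parents=4, seed=0):
    rng = random.Random(seed)
    # 1. Population initialisation
    population = [[rng.uniform(-1, 1) for _ in range(num_genes)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(num_generations):
        # 2. Calculation of fitness of each solution
        ranked = sorted(population, key=fitness_func, reverse=True)
        best_per_generation.append(fitness_func(ranked[0]))
        # 3. Parent selection: keep the fittest solutions unchanged
        parents = ranked[:num_parents]
        children = []
        while len(children) < pop_size - num_parents:
            mother, father = rng.sample(parents, 2)
            # 4. Crossover: single-point crossover builds a new list
            cut = rng.randrange(1, num_genes)
            child = mother[:cut] + father[cut:]
            # 5. Mutation: perturb one randomly chosen gene
            idx = rng.randrange(num_genes)
            child[idx] += rng.gauss(0, 0.1)
            children.append(child)
        population = parents + children
    return best_per_generation


history = run_ga()
```

Since the parents are carried over unchanged, the best fitness in `history` never decreases from one generation to the next.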
The cartpole environment in deepworlds is an extension of the classic cartpole problem from OpenAI Gym to the Webots simulator. Most of the work during the project was done by prototyping the code on the CartPole environment and later generalizing it. The example has been implemented in two forms:
- CartPole with discrete action space
- CartPole with continuous action space
The goal of the cartpole environment is to keep the pole balanced upright on the cart. This is done by taking actions which move the cart to the left or right, with a reward of 1 for every time-step the pole is upright.
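The reward scheme and the two action spaces can be sketched as follows. The 15 degree threshold, the force limit and all function names are assumptions for illustration, not values taken from the deepworlds environment.

```python
import math


def episode_fitness(pole_angles, max_angle=math.radians(15)):
    """Add a reward of 1 for each step until the pole first leaves the upright range."""
    total = 0
    for angle in pole_angles:
        if abs(angle) > max_angle:  # assumed termination condition
            break
        total += 1
    return total


def discrete_action(output_scores):
    # Discrete action space: pick the index with the largest score
    # (assumed convention: 0 = move left, 1 = move right).
    return max(range(len(output_scores)), key=lambda i: output_scores[i])


def continuous_action(raw_output, max_force=1.0):
    # Continuous action space: clamp the output to an assumed force range.
    return max(-max_force, min(max_force, raw_output))


fitness = episode_fitness([0.0, 0.1, 0.2, 0.5, 0.0])  # angles in radians
```

In a genetic algorithm setting, the summed reward of an episode is a natural candidate for the fitness of the gene that controlled the robot.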
- The pygad.GA.cal_pop_fitness() function calculates the fitness of each solution in the population sequentially. This does not suit the multi-robot implementation of the evolutionary algorithm, since it prevents the fitness calculation from being parallelized. To get around this, we plan to create a custom class which inherits from the pygad.GA class and parallelizes the fitness calculation for each gene in the population, using an extended version of the EmitterReceiver scheme for communication between the supervisor and the robots present in the Webots world.
- The pygad.GA.run() method calls the cal_pop_fitness() function twice, which in turn calls fitness_func() once. Thus fitness_func(), which is responsible for calculating the fitness of each gene and also for interfacing the genetic algorithm with the simulation running in Webots, is called twice per generation. Hence the cartpole in the environment runs twice per gene, which is unnecessary. We plan to work around this by redefining the run() method in the custom class mentioned above so that it calculates the fitness of each gene only once per generation.
- By default, the EmitterReceiver scheme uses just one channel of communication between the supervisor and the robot present in the environment. The scheme was developed with the single-robot case in mind and breaks when multiple robots are present in the environment, since messages from the last step overlap with those of the current step, as described here. We get around this by creating separate channels (i.e. a separate emitter-receiver pair for each robot).
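One way to picture the fix for the duplicated simulation is a fitness wrapper that remembers results within a generation. Note that this is a swapped-in technique (caching), not the planned redefinition of run(), and `OncePerGenerationFitness` is a hypothetical name. The `(solution, solution_idx)` call signature is an assumption about the fitness function interface.

```python
class OncePerGenerationFitness:
    """Wrap a simulation so each gene is simulated at most once per generation."""

    def __init__(self, simulate):
        self._simulate = simulate
        self._cache = {}
        self.simulation_runs = 0

    def __call__(self, solution, solution_idx):
        key = (solution_idx, tuple(solution))
        if key not in self._cache:
            # Only the first request for this gene triggers a simulation.
            self._cache[key] = self._simulate(solution)
            self.simulation_runs += 1
        return self._cache[key]

    def new_generation(self):
        # Forget old results once the population has changed.
        self._cache.clear()


def toy_simulation(solution):
    # Hypothetical stand-in for running the cartpole in Webots.
    return float(sum(solution))


fitness = OncePerGenerationFitness(toy_simulation)
population = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Two fitness passes over the same generation, as described above.
first_pass = [fitness(sol, idx) for idx, sol in enumerate(population)]
second_pass = [fitness(sol, idx) for idx, sol in enumerate(population)]
runs_after_two_passes = fitness.simulation_runs
```

Caching only makes sense if the simulation is deterministic for a given gene; otherwise the second evaluation would carry new information.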
The following work was done during the course of Google Summer of Code 2021:
This PR adds a GitHub workflow to deepbots which generates / updates a changelog every time a new commit is pushed.
The multi-cartpole Webots environment consists of 9 robots arranged in a 3x3 grid world. Each robot has its own emitter-receiver pair and communicates with the supervisor over a separate channel. This environment can facilitate prototyping of multi-agent reinforcement learning algorithms in Webots and parallelization of the genetic algorithm, among other applications. In the example given below, each robot is being trained using Proximal Policy Optimization.
Training GIF:
This PR integrates the base class for implementing all the evolutionary algorithms. The class is initialized with the model to be trained (the SupervisorEvolutionary class supports PyTorch models). It has CUDA support for the PyTorch models and also facilitates logging via Weights and Biases.
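To train a neural network with a genetic algorithm, each gene has to be mapped onto the model's weights. The sketch below shows that mapping in plain Python with hypothetical names (`split_gene`, `layer_shapes`); it does not use PyTorch and is not the code of the SupervisorEvolutionary class.

```python
def split_gene(gene, layer_shapes):
    """Cut a flat gene into one weight matrix (a list of rows) per layer shape."""
    expected = sum(rows * cols for rows, cols in layer_shapes)
    if len(gene) != expected:
        raise ValueError(f"gene has {len(gene)} values, expected {expected}")
    matrices = []
    offset = 0
    for rows, cols in layer_shapes:
        flat = gene[offset:offset + rows * cols]
        # Rebuild the matrix row by row from the flat slice.
        matrices.append([flat[r * cols:(r + 1) * cols] for r in range(rows)])
        offset += rows * cols
    return matrices


# Example: a 2x3 weight matrix followed by a 1x2 weight matrix.
weights = split_gene(list(range(8)), layer_shapes=[(2, 3), (1, 2)])
```

The genetic algorithm then only ever sees flat vectors, while the model is rebuilt from a vector whenever a gene has to be evaluated.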
The cartpole evolutionary example implements the training of a single cartpole robot using the genetic algorithm. The algorithm is trained for 75 generations and the performance is as given below.
Training GIF:
- Extend the SupervisorEvolutionary class to support multiple robots
- Add a multi-cartpole evolutionary example to deepworlds
Future work can involve building other Webots worlds that support evolutionary algorithms.
- 19/05/21 - Initial Meeting
  - Check the changelog PR
  - Go through resources for Genetic Algorithms and PyGAD
  - Look into spawning multiple robot grids in Webots
  - Open a tracker issue
- 02/06/21 - Catch up Meeting
  - Check the emitter-receiver scheme for multiple robots
  - Create a Webots world with two cartpoles
- 09/06/21 - Catch up Meeting
  - Scale the multi-cartpole example to 9 robots present in a 3 x 3 grid
  - Look into PyGAD
  - Run a simplified cartpole example using PyGAD
- 18/06/21 - Catch up Meeting
  - Look into using PyGAD to train neural networks - decide the best option out of TorchGA and KerasGA
  - Create a single cartpole example for training a Neural Network using PyGAD (genetic algorithm)
- 30/06/21 - Catch up Meeting
  - Look into the custom data method for communication in Webots
  - Implement the class for supporting the training of a single cartpole using GA
  - Basic API for the SupervisorEvolutionary class
SupervisorEvolutionary(extend SupervisorEnv)
| --- init()
| --- env
| --- methods and attributes for interfacing with webots
| --- type of model (take argument as string / model)
| | --- mlp, recurrent, etc.
| | --- layer sizes
| | --- activations
| --- loss function (take argument as string / class)
| --- kwargs for pygad.GA
| --- utilities
| | --- plotting
| | --- logging and debugging
| | --- tensorboard / wandb integration
- 07/07/21 - Catch up Meeting
  - Discussion about the communication problem in the EmitterReceiver scheme in case of multiple robots
- 14/07/21 - Catch up Meeting
  - Debugging the multi-cartpole example
  - Discussion about the SupervisorEvolutionary class
- 28/07/21 - Catch up Meeting
  - Debugging the single cartpole evolutionary example
  - Analysing the performance of the cartpole evolutionary example for different hyperparameters
  - Discussion about the problem of the simulation running twice per gene
- 17/08/21 - Catch up Meeting
  - Come up with a solution for the problem of the simulation running twice per generation
  - Come up with an approach for parallelizing the fitness calculation in the multi-cartpole evolutionary example
  - Final Report discussion