
GSoC 2021: Evolutionary Algorithms in Deepbots

Vedant Shah edited this page Aug 23, 2021 · 7 revisions

Table of Contents

  1. Project Description
    i. Introduction
    ii. PyGAD
    iii. CartPole Problem
    iv. Problems Faced
  2. Completed Work
  3. Work to do
  4. Relevant Issues
  5. Future Work
  6. Weekly Updates (Timeline)

Project Description

Mentors: Manos Kirtas, Kostas Tsampazis

Introduction

Deepbots was originally developed as an easy interface between reinforcement learning algorithms and the Webots simulator. However, the current structure of Deepbots makes it easy to extend the library to support evolutionary algorithms. The main aim of this project was to add support for implementing the genetic algorithm in Deepbots. In an evolutionary algorithm, a population of agents is trained and mutated to solve a given task. In every generation, the best agents are chosen to mutate in order to reach a good enough solution. We use PyGAD to realise this integration into Deepbots. The simulation of such an algorithm can take two forms:

  • Single Robot Environments - A single robot in Webots is used as a proxy to calculate the fitness function of each gene in the population. Thus, if the population size is 10, the fitness is calculated sequentially for each gene; for each calculation, the agent interacts with the Webots world, which is played out in simulation.
  • Multi Robot Environments - Multiple robots are used to parallelize the fitness calculation. If the population size is 10, the environment contains 10 robots arranged in any particular fashion, and each robot acts as an agent interacting with the Webots world to calculate the fitness of one of the 10 genes, which can speed up the calculation by up to a factor of 10.
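The difference between the two forms can be sketched in plain Python. This is only a toy model: threads stand in for robots that run concurrently in the same Webots world, and `evaluate_on_robot`, `sequential_fitness` and `parallel_fitness` are hypothetical names that do not exist in Deepbots.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_on_robot(robot_id, gene):
    """Stand-in for one robot running an episode with the given gene."""
    return robot_id, sum(gene)


def sequential_fitness(population):
    # Single robot: one proxy robot evaluates every gene in turn.
    return [evaluate_on_robot(0, gene)[1] for gene in population]


def parallel_fitness(population):
    # Multi robot: gene i is evaluated by robot i, all at the same time.
    with ThreadPoolExecutor(max_workers=len(population)) as pool:
        results = pool.map(evaluate_on_robot, range(len(population)), population)
        return [fitness for _, fitness in results]


population = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
```

Both functions return the same fitness values in the same order; only the wall-clock time differs when a single evaluation is expensive.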

A SupervisorEvolutionary class is added to Deepbots to support evolutionary algorithms. A simple EmitterReceiver Scheme can be used for the communication between the supervisor and the robot in case of single robot environments. For multi-robot environments, we use an extended version of the EmitterReceiver Scheme wherein the supervisor communicates with each robot present in the world using separate channels and separate emitter-receiver pairs.
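The idea of one channel per robot can be illustrated without Webots. The snippet below is a minimal sketch, assuming a hypothetical `ChannelBus` class that stands in for the emitter-receiver pairs; it is not part of Deepbots or the Webots API.

```python
from collections import defaultdict, deque


class ChannelBus:
    """Toy stand-in for Webots emitter-receiver pairs (hypothetical class)."""

    def __init__(self):
        # One FIFO queue per channel number, so messages never mix across robots.
        self._queues = defaultdict(deque)

    def send(self, channel, message):
        self._queues[channel].append(message)

    def receive(self, channel):
        """Return the oldest pending message on a channel, or None if it is empty."""
        queue = self._queues[channel]
        return queue.popleft() if queue else None


def dispatch_genes(bus, population):
    """Supervisor side: gene i goes to robot i on channel i."""
    for channel, gene in enumerate(population):
        bus.send(channel, gene)


bus = ChannelBus()
population = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
dispatch_genes(bus, population)

# Each robot listens only on its own channel and sees only its own gene.
received = [bus.receive(channel) for channel in range(len(population))]
```

Because every robot reads from its own queue, a message meant for one robot can never be consumed by another.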

PyGAD

PyGAD is a genetic algorithm framework for Python that offers a range of approaches at different levels of abstraction. A genetic algorithm is created as an instance of the pygad.GA class, whose methods handle the different stages of the genetic algorithm, namely:

  • Population Initialisation
  • Calculation of fitness of each solution
  • Parent Selection
  • Crossover
  • Mutation

This class can additionally be supplied with the fitness function the user wants to use, along with other parameters that specify the crossover scheme and so on. The framework has been integrated into Deepbots through a base class, SupervisorEvolutionary, which acts as a wrapper around the pygad.GA class.
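To make the five stages concrete, here is a from-scratch sketch of a genetic algorithm loop using only the standard library. It is not PyGAD's implementation and does not use its API; `run_ga`, its parameters, and the toy fitness function are all assumptions made for illustration.

```python
import random


def run_ga(fitness_func, num_genes, pop_size=8, num_parents=4,
           num_generations=20, mutation_scale=0.1, seed=0):
    """Minimal genetic algorithm loop (illustrative, not PyGAD's implementation)."""
    rng = random.Random(seed)

    # 1. Population initialisation: random genes in [-1, 1].
    population = [[rng.uniform(-1, 1) for _ in range(num_genes)]
                  for _ in range(pop_size)]

    for _ in range(num_generations):
        # 2. Fitness calculation for each solution (used as the sort key).
        scored = sorted(population, key=fitness_func, reverse=True)

        # 3. Parent selection: keep the fittest solutions.
        parents = scored[:num_parents]

        # 4. Crossover: single-point crossover between two random parents.
        # 5. Mutation: add small Gaussian noise to one gene of each child.
        children = []
        while len(children) < pop_size - num_parents:
            mother, father = rng.sample(parents, 2)
            point = rng.randrange(1, num_genes)
            child = mother[:point] + father[point:]
            child[rng.randrange(num_genes)] += rng.gauss(0, mutation_scale)
            children.append(child)

        population = parents + children

    return max(population, key=fitness_func)


# Toy fitness: higher when genes are close to the target vector.
target = [0.5, -0.2, 0.8]


def toy_fitness(solution):
    return -sum((g - t) ** 2 for g, t in zip(solution, target))


best = run_ga(toy_fitness, num_genes=len(target))
```

In the Deepbots setting, the fitness function would run a simulated episode instead of comparing against a fixed target.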

The CartPole problem

The cartpole environment in deepworlds is an extension of the classic cartpole problem from OpenAI Gym to the Webots simulator. Most of the work during the project was done by prototyping the code on the CartPole environment and later generalizing it. The example has been implemented in two forms:

  • CartPole with discrete action space
  • CartPole with continuous action space

The goal of the cartpole environment is to keep the pole on the cartpole robot upright. This is done by taking actions that move the robot left or right; a reward of 1 is given for every time step the pole stays upright.
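The two action spaces and the reward scheme can be sketched as follows. This is a minimal sketch under stated assumptions: a linear policy, a four-value observation (cart position, cart velocity, pole angle, pole angular velocity, as in the classic problem), and an action encoding of 0 for left and 1 for right. None of these function names come from deepworlds.

```python
import math


def discrete_action(weights, observation):
    """Return 0 (push left) or 1 (push right) from a linear policy."""
    score = sum(w * o for w, o in zip(weights, observation))
    return 1 if score > 0 else 0


def continuous_action(weights, observation):
    """Return a push strength in [-1, 1] from the same linear policy."""
    score = sum(w * o for w, o in zip(weights, observation))
    return math.tanh(score)


def episode_fitness(step_rewards):
    """Fitness of a gene: total reward, i.e. the number of steps the pole stayed upright."""
    return sum(step_rewards)


weights = [0.0, 0.0, 1.0, 0.5]        # hypothetical gene
observation = [0.0, 0.0, 0.1, -0.05]  # assumed observation layout, see above
```

With a reward of 1 per upright time step, an episode that lasts 195 steps yields a fitness of 195 under this scheme.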

Problems Faced

  • The pygad.GA.cal_pop_fitness() method calculates the fitness of each solution in the population sequentially. This does not suit the multi-robot implementation of the evolutionary algorithm, since it prevents the fitness calculation from being parallelized. To get around this, we plan to create a custom class that inherits from pygad.GA and parallelizes the fitness calculation of each gene in the population, using an extended version of the EmitterReceiver scheme for communication between the supervisor and the robots in the Webots world.

  • The pygad.GA.run() method calls cal_pop_fitness() twice per generation, and each call in turn invokes fitness_func() once for every solution. Since fitness_func() is responsible both for calculating the fitness of each gene and for interfacing the genetic algorithm with the simulation running in Webots, the cartpole episode runs twice per gene, which is unnecessary. We plan to work around this by redefining the run() method in the custom class mentioned above so that the fitness of each gene is calculated only once per generation.

  • By default, the EmitterReceiver scheme uses just one communication channel between the supervisor and the robot in the environment. The scheme was designed with a single robot in mind and breaks when multiple robots are present, because messages from the previous step overlap with those from the current step, as described here. We get around this by creating separate channels, i.e. a separate emitter-receiver pair for each robot.
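For the duplicated simulation runs, one possible alternative to redefining run() is to cache fitness values so that a repeated request for the same gene does not trigger a second episode. The sketch below shows that memoization technique only; `CachedFitness` is a hypothetical helper, not the planned custom class, and it assumes the fitness of a gene is deterministic within a generation.

```python
class CachedFitness:
    """Wrap a fitness function so identical genes are simulated only once (hypothetical helper)."""

    def __init__(self, fitness_func):
        self._fitness_func = fitness_func
        self._cache = {}
        self.simulations = 0  # number of real (expensive) evaluations

    def __call__(self, solution):
        key = tuple(solution)  # lists are unhashable, tuples are not
        if key not in self._cache:
            self._cache[key] = self._fitness_func(solution)
            self.simulations += 1
        return self._cache[key]

    def new_generation(self):
        """Drop cached values, e.g. when the simulation is stochastic."""
        self._cache.clear()


def expensive_fitness(solution):
    # Stand-in for running a full Webots episode.
    return sum(solution)


fitness = CachedFitness(expensive_fitness)
population = [[1.0, 2.0], [3.0, 4.0]]

first_pass = [fitness(gene) for gene in population]
second_pass = [fitness(gene) for gene in population]  # served from the cache
```

After both passes, `fitness.simulations` equals the population size, showing that the second pass did not rerun the expensive function.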

Completed Work

The following work was done during Google Summer of Code 2021:

Adding Changelog to Deepbots

This PR adds a GitHub workflow to Deepbots that generates or updates a changelog every time a new commit is pushed.

Adding a multi-cartpole environment to deepworlds

The multi-cartpole Webots environment consists of 9 robots arranged in a 3x3 grid world. Each robot has its own emitter-receiver pair and communicates with the supervisor over a separate channel. This environment can facilitate prototyping of multi-agent reinforcement learning algorithms in Webots and parallelization of the genetic algorithm, among other applications. In the example given below, each robot is trained using Proximal Policy Optimization.

Training GIF:
multi-cartpole training

Adding SupervisorEvolutionary class to deepbots

This PR integrates the base class for implementing all the evolutionary algorithms. The class is initialized with the model to be trained (the SupervisorEvolutionary class supports PyTorch models). It has CUDA support for the PyTorch models and also facilitates logging via Weights and Biases.

Deepworlds environment for evolutionary algorithm using single cartpole

The cartpole evolutionary example implements the training of a single cartpole robot using the genetic algorithm. The algorithm is run for 75 generations, and its performance is shown below.

Best Fitness vs Generation

Training GIF:
Training

Work to do

  • Extending the SupervisorEvolutionary class to support multiple robots
  • Adding a multi-cartpole evolutionary example to deepworlds

Relevant Issues

Future Work

Future work can involve building other webots worlds supporting Evolutionary Algorithms

Weekly Updates (Timeline)

  • 19/05/21 - Initial Meeting

    • Check the changelog PR
    • Go through resources for Genetic Algorithms and PyGAD
    • Look into spawning multiple robot grids in Webots
    • Open a tracker issue
  • 02/06/21 - Catch up Meeting

    • Check the emitter-receiver scheme for multiple robots
    • Create a webots world with two cartpoles
  • 09/06/21 - Catch up Meeting

    • Scale the multi-cartpole example to 9 robots present in a 3 x 3 grid
    • Look into PyGAD
    • Run a simplified cartpole example using pygad
  • 18/06/21 - Catch up Meeting

    • Look into using pygad to train neural networks - decide the best option out of TorchGA and KerasGA
    • Create a single cartpole example for training a Neural Network using pygad (genetic algorithm)
  • 30/06/21 - Catch up Meeting

    • Look into the custom data method for communication in Webots
    • Implement the class for supporting the training of a single cartpole using GA
    • Basic API for the SupervisorEvolutionary class
SupervisorEvolutionary(extend SupervisorEnv)
| --- init()
| --- env
| --- methods and attributes for interfacing with webots
| --- type of model (take argument as string / model)
|      | --- mlp, recurrent, etc.
|      | --- layer sizes
|      | --- activations
| --- loss function (take argument as string / class)
| --- kwargs for pygad.GA
| --- utilities
|      | --- plotting
|      | --- logging and debugging
|      | --- tensorboard / wandb integration
  • 07/07/21 - Catch up Meeting

    • Discussion about the communication problem in EmitterReceiver scheme in case of multiple robots
  • 14/07/21 - Catch up Meeting

    • Debugging the multi-cartpole example
    • Discussion about the SupervisorEvolutionary class
  • 28/07/21 - Catch up Meeting

    • Debugging the single cartpole evolutionary example.
    • Analysing the performance of the cartpole evolutionary example for different hyperparameters.
    • Discussion about the problem of the simulation running twice per gene.
  • 17/08/21 - Catch up Meeting

    • Come up with a solution for the problem of the simulation running twice per generation
    • Come up with an approach for parallelizing the fitness calculation in the multi-cartpole evolutionary example
    • Final Report discussion