Alvtron/PopulationBasedTrainingWithDifferentialEvolution

Improving Population-Based Training (PBT) for Neural Networks

In recent years, there has been a rise in complex and computationally expensive machine learning systems with many hyperparameters, such as deep convolutional neural networks. Historically, hyperparameters have given humans a level of control over the learning process and the performance of the model, and it is well known that different problems require different hyperparameter configurations in order to obtain a good model. However, finding good configurations can be challenging, and the difficulty grows as more hyperparameters are introduced. In addition, research has shown that some tasks benefit from certain hyperparameter schedules, suggesting that there exists an optimal hyperparameter configuration for every training step.

The issue has resulted in a great deal of research on automated hyperparameter optimization, but there is still a lack of purpose-built methods for generating hyperparameter schedules, as they are much harder to estimate. Evolutionary approaches such as Population-Based Training (PBT) have shown great success in estimating good hyperparameter schedules, demonstrating that it is possible to train the network and optimize its hyperparameters at the same time. However, the method relies on simple selection and perturbation techniques when exploring new hyperparameters, and does not consider other, more advanced optimization methods.
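
For concreteness, below is a minimal Python sketch of the exploit-and-explore step used by standard PBT (Jaderberg et al.): a poorly performing member copies the weights and hyperparameters of a top performer and then randomly perturbs the hyperparameters. The `exploit_and_explore` function, the member fields, the 20% truncation cutoff, and the 0.8/1.2 perturbation factors are illustrative assumptions, not this repository's API.

```python
import random

def exploit_and_explore(member, population, perturb_factors=(0.8, 1.2)):
    """Hypothetical PBT step: truncation selection plus random perturbation."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    cutoff = max(1, len(ranked) // 5)              # top/bottom 20% truncation
    if member in ranked[-cutoff:]:                 # member is in the bottom 20%
        donor = random.choice(ranked[:cutoff])     # exploit: copy a top performer
        member.weights = donor.weights.copy()
        member.hyperparameters = {
            # explore: multiply each hyperparameter by a random perturbation factor
            name: value * random.choice(perturb_factors)
            for name, value in donor.hyperparameters.items()
        }
    return member
```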

In this thesis, we set out to improve upon the PBT method by incorporating heuristics from the powerful and versatile evolutionary optimizer Differential Evolution (DE). As a result, three procedures are proposed, named PBT-DE, PBT-SHADE and PBT-LSHADE. Of the proposed procedures, PBT-DE incorporates the originally proposed DE heuristics, while PBT-SHADE and PBT-LSHADE explore more recent and adaptive extensions based on SHADE and LSHADE. In addition, a purpose-built distributed queuing system is used for processing members per generation, allowing for asynchronous parallel training and adaptation of members.
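
To illustrate the DE heuristics the proposed procedures build on, here is a hedged sketch of the classic DE/rand/1/bin operator applied to a hyperparameter vector. The function name, the F and CR defaults, and the clipping bounds are illustrative assumptions rather than the thesis implementation; SHADE and LSHADE instead sample F and CR per member from adaptive historical memories, and LSHADE additionally shrinks the population linearly over the course of training.

```python
import random

def de_rand_1_bin(target, population, F=0.5, CR=0.9, bounds=(0.0, 1.0)):
    """Illustrative DE/rand/1/bin: build a trial hyperparameter vector."""
    # mutation donors: three distinct members other than the target
    a, b, c = random.sample([p for p in population if p is not target], 3)
    j_rand = random.randrange(len(target))         # guarantee one mutated gene
    trial = []
    for j in range(len(target)):
        if random.random() < CR or j == j_rand:
            v = a[j] + F * (b[j] - c[j])           # differential mutation
            trial.append(min(max(v, bounds[0]), bounds[1]))
        else:
            trial.append(target[j])                # crossover: keep target gene
    return trial
```

Following standard DE selection, the trial vector would replace the target only if the resulting member performs at least as well after training.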

In order to assess predictive performance, PBT and the proposed procedures were compared on the MNIST and Fashion-MNIST datasets using the MLP and LeNet-5 network architectures. The empirical results demonstrate a statistically significant difference between PBT and the proposed procedures on the F1 score. Visual and statistical analysis suggests that each proposed procedure outperforms PBT in all tested cases. On the Fashion-MNIST dataset, the results indicate a 0.748% and 0.384% gain in average accuracy with MLP and LeNet-5, respectively, when trained with PBT-LSHADE. On the MNIST dataset, PBT-SHADE improved average accuracy by 0.21% with MLP, and PBT-DE improved average accuracy by 0.068% with LeNet-5. Furthermore, additional testing suggests that PBT-SHADE scales better than PBT with larger population sizes.

References

Improving Population-Based Training for Neural Networks - Master's Thesis @ Østfold University College Archives
Population Based Training of Neural Networks, Jaderberg et al. @ DeepMind
