Potential solution to some of our hyperparameter tuning problems

[This](https://github.com/kach/gradient-descent-the-ultimate-optimizer) repository and the accompanying paper introduce a way of optimizing hyperparameters, such as the learning rate, with SGD itself. According to the authors, this makes training far less sensitive to the initial choice of hyperparameters.
Their package is new and still needs to be tested, so this issue is very low priority.
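
For anyone who wants the intuition before trying the package, here is a minimal sketch of the underlying idea: treat the learning rate as one more parameter and update it by gradient descent on the loss. This is not the package's API. It is a hand-derived hypergradient update for a scalar toy problem, and the names `hypergradient_sgd`, `grad_fn`, and `kappa` (the step size for the learning rate) are made up for illustration. As I understand it, the linked work gets the same kind of gradient through automatic differentiation rather than by hand.

```python
def hypergradient_sgd(grad_fn, w0, alpha0, kappa, steps):
    """SGD on a scalar w where the learning rate alpha is also learned.

    Hypothetical helper for illustration only, not part of the linked package.
    """
    w, alpha = w0, alpha0
    prev_grad = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # The previous update was w_t = w_{t-1} - alpha * g_{t-1}, so
        # d loss(w_t) / d alpha = g_t * (-g_{t-1}).
        # A gradient descent step on alpha therefore adds kappa * g_t * g_{t-1}.
        alpha += kappa * g * prev_grad
        w -= alpha * g
        prev_grad = g
    return w, alpha


def quadratic_grad(w):
    # Gradient of the toy loss (w - 3)^2, which has its minimum at w = 3.
    return 2.0 * (w - 3.0)


# Deliberately poor initial learning rate; kappa = 0 disables the adaptation.
w_fixed, alpha_fixed = hypergradient_sgd(quadratic_grad, 0.0, 0.001, 0.0, 200)
w_tuned, alpha_tuned = hypergradient_sgd(quadratic_grad, 0.0, 0.001, 0.001, 200)

print("fixed alpha:  ", w_fixed, alpha_fixed)
print("learned alpha:", w_tuned, alpha_tuned)
```

On this toy problem the run with a learned learning rate ends up much closer to the minimum than the fixed run, despite the same bad starting value. It only shows the mechanism; whether it helps on our models is exactly what we would need to test.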