Releases: inikishev/torchzero
0.3.11
Additions
- New optimizers: `Adan`, `AdaHessian`, `ESGD`, `MARSCorrection` (see the usage sketch after this list).
- Sharpness-aware minimization: `SAM`, `ASAM`, `MSAM` and `MSAMObjective`.
- A `steihaug_toint_cg` solver for trust regions, and two modules that use it:
  - `TruncatedNewtonCG` - a matrix-free trust-region Newton's method that uses the new solver.
  - `TrustCG` - a trust region that uses the new solver; it can use the Hessian approximation generated by any of the full-matrix and diagonal quasi-newton modules.

  `TruncatedNewtonCG` and `TrustCG` may be merged in the future once I figure out a better API for trust regions.
- Added a MINRES solver which can be selected in `NewtonCG`; it may work better when the Hessian is not positive definite.
- Added a lot of diagonal quasi-newton methods.
- Added an `Online` module which can be used to make any quasi-newton method online by sampling gradient differences from the same batch. `OnlineLBFGS` has been removed because it is now redundant and can be replaced with `Online(LBFGS())` (see the sketch after this list).
- Other new modules: `ShorR`, `BarzilaiBorwein`, `AdaptiveHeavyBall`, `HigherOrderNewton`, `InverseFreeNewton`, `CubicRegularization`, and a few misc and operation modules.
- A lot of wrappers for external libraries: directsearch, fcmaes, mads and optuna, in addition to the existing scipy, nlopt and nevergrad wrappers.
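As a usage illustration for the new optimizer modules — a minimal sketch assuming the `tz.Modular` chaining API and the `backward=True` closure convention shown in the project README; the `tz.m` namespace and exact argument names are assumptions here.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
X, y = torch.randn(64, 10), torch.randn(64, 1)

# chain one of the new modules with a learning rate; Adan is used here,
# AdaHessian / ESGD / MARSCorrection should plug in the same way
opt = tz.Modular(model.parameters(), tz.m.Adan(), tz.m.LR(1e-2))

def closure(backward=True):
    loss = torch.nn.functional.mse_loss(model(X), y)
    if backward:
        opt.zero_grad()
        loss.backward()
    return loss

for _ in range(100):
    opt.step(closure)
```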
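And a sketch of the `Online(LBFGS())` replacement for the removed `OnlineLBFGS`, under the same assumptions about the `tz.m` namespace; the `Backtracking` line-search module name is also an assumption.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

# wrap LBFGS in Online so gradient differences are sampled from the same batch;
# any other quasi-newton module could be wrapped the same way
opt = tz.Modular(
    model.parameters(),
    tz.m.Online(tz.m.LBFGS()),
    tz.m.Backtracking(),  # assumed line-search module name
)
```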
Changes
- All norm-related modules like `ClipNorm` and `Normalize` can now use the mean absolute value with `order="mean_abs"` (see the sketch after this list).
- `RFDM` with formulas that do not use `x_0` now calculates the loss at `x_0` when necessary (e.g. for the descent condition when using a line search); previously it returned the loss at the perturbed point.
- Projection no longer has separate states and settings for projected tensors. Additionally, states for different projection targets (parameters, update and gradients) are now separate.
- Limited-memory Adagrad has been reworked to be much more efficient, based on https://arxiv.org/abs/1806.02958.
- Quasi-newton methods now have `ptol` and `gtol` to skip updates when parameter or gradient differences are extremely small. `ptol_reset` resets the state when the `ptol` tolerance is not satisfied; it is off by default but may help when a trust region is not used.
- `LBFGS` and `LSR1` are now `Transform` subclasses like the rest of the QN methods, and can be updated multiple times per step.
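A sketch of how the new options might be passed, assuming they are plain keyword arguments named exactly as in the notes above (`ptol`, `gtol`, `order="mean_abs"`) and that `ClipNorm` takes the max norm as its first argument; exact signatures may differ.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

opt = tz.Modular(
    model.parameters(),
    # skip quasi-newton updates when parameter/gradient differences are tiny
    tz.m.LBFGS(ptol=1e-10, gtol=1e-10),
    # clip by mean absolute value instead of a vector norm
    tz.m.ClipNorm(1.0, order="mean_abs"),
    tz.m.LR(1e-1),
)
```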
0.3.10
maybe last release before actually releasing this?
`Transform` and `Preconditioner` have been merged into a single class. `Transform` now has `update_tensors` and `apply_tensors` methods that accept a list of state dictionaries and setting dictionaries for each tensor. Those are usually the optimizer per-parameter states, but anything else can be used now.
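A hypothetical sketch of what the merged class enables: driving a `Transform` on arbitrary tensors by supplying your own per-tensor state and settings dicts instead of optimizer-managed ones. The module name `EMA` and the argument names/order of `update_tensors` / `apply_tensors` are assumptions here, not documented signatures.

```python
import torch
import torchzero as tz

# tensors we want to transform, outside of any optimizer
grads = [torch.randn(5), torch.randn(3)]

ema = tz.m.EMA(0.9)             # assumed: any Transform subclass
states   = [{} for _ in grads]  # per-tensor state, persisted by the caller
settings = [{} for _ in grads]  # per-tensor setting overrides

# assumed call pattern based on the description above
ema.update_tensors(grads, states=states, settings=settings)
smoothed = ema.apply_tensors(grads, states=states, settings=settings)
```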
A lot of quasi-newton methods have been added.
0.3.9
The initial gradient descent step size in quasi-newton methods now won't be smaller than epsilon, to avoid performing no step at all.
0.3.8
Quasi-newton `apply` now has no side effects. That means that `update` can be called multiple times before applying, for example to increase the rank of the update by sampling more directions.
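The pattern this enables, shown with a toy stand-in class rather than torchzero's actual module API: `update` mutates state (each call adds one sampled direction, raising the rank of the approximation), while `apply` only reads state, so it can safely be called once after several updates.

```python
import torch

class ToyLowRankApprox:
    """Toy stand-in (not torchzero's API) for a quasi-newton-style module."""

    def __init__(self):
        self.directions = []  # sampled directions s_i
        self.responses = []   # corresponding curvature responses y_i

    def update(self, s, y):
        # mutates state: every call adds one rank to the approximation
        self.directions.append(s)
        self.responses.append(y)

    def apply(self, g):
        # pure: reads state and returns a preconditioned vector, changes nothing
        out = g.clone()
        for s, y in zip(self.directions, self.responses):
            out = out + s * torch.dot(y, g) / (torch.dot(y, s) + 1e-12)
        return out

approx = ToyLowRankApprox()
g = torch.randn(8)
for _ in range(4):                       # several updates before a single apply
    s = torch.randn(8)
    approx.update(s, s + 0.1 * torch.randn(8))
step = approx.apply(g)                   # side-effect-free
```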
Full Changelog: 0.3.6...0.3.8
0.3.6
WHAT DO I PUT THERE!!!
0.3.5
what do i put there.
0.3.4
what am I supposed to put there???
0.3.3
please, github
0.3.2
0.1.8
state_dict support