
Releases: inikishev/torchzero

0.3.11

14 Jul 17:51

Additions

  • New optimizers: Adan, AdaHessian, ESGD, MARSCorrection.
  • Sharpness-aware minimization: SAM, ASAM, MSAM and MSAMObjective.
  • A steihaug_toint_cg solver for trust regions, and two modules that use it:
  • TruncatedNewtonCG: a matrix-free trust-region Newton's method that uses the new solver.
  • TrustCG: a trust region that uses the new solver; it can use the Hessian approximation generated by any of the full-matrix and diagonal quasi-newton modules. TruncatedNewtonCG and TrustCG may be merged in the future once I figure out a better trust-region API.
  • Added a MINRES solver, which can be selected in NewtonCG; it may work better when the Hessian is not positive definite.
  • Added a lot of diagonal quasi-newton methods.
  • Added an Online module which can be used to make any quasi-newton method online by sampling gradient differences from the same batch. OnlineLBFGS has been removed because it is now redundant and can be replaced with Online(LBFGS()) (see the sketch after this list).
  • Other new modules: ShorR, BarzilaiBorwein, AdaptiveHeavyBall, HigherOrderNewton, InverseFreeNewton, CubicRegularization, and a few miscellaneous and operation modules.
  • Many wrappers for external libraries: directsearch, fcmaes, mads, and optuna, in addition to the existing scipy, nlopt, and nevergrad.
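As a rough illustration of how these modules are meant to compose, here is a minimal sketch assuming the tz.Modular / tz.m chaining API shown in the project README and the usual closure-based step; Online and LBFGS are named in the notes above, while Backtracking and the closure convention are assumptions.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

# Online L-BFGS: the Online wrapper samples gradient differences from the
# current batch, replacing the removed OnlineLBFGS module (see notes above).
opt = tz.Modular(
    model.parameters(),
    tz.m.Online(tz.m.LBFGS()),
    tz.m.Backtracking(),  # assumed name of a backtracking line-search module
)

def closure(backward=True):
    # assumed closure convention: compute the loss and optionally backpropagate
    loss = model(torch.randn(32, 10)).pow(2).mean()
    if backward:
        opt.zero_grad()
        loss.backward()
    return loss

for _ in range(10):
    opt.step(closure)
```

Following the same pattern, the new TrustCG trust region would presumably be chained after a full-matrix or diagonal quasi-newton module so it can reuse that module's Hessian approximation.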

Changes:

  • All norm-related modules, such as ClipNorm and Normalize, can now use the mean absolute value via order="mean_abs" (see the sketch after this list).
  • RFDM with formulas that do not use x_0 now calculates the loss at x_0 when necessary (e.g. for the descent condition when using a line search); previously it returned the loss at a perturbed point.
  • Projection no longer has separate states and settings for projected tensors. Additionally, states for different projection targets (parameters, update, and gradients) are now separate.
  • Limited-memory Adagrad has been reworked to be much more efficient, based on https://arxiv.org/abs/1806.02958.
  • Quasi-newton methods now have ptol and gtol to skip updates when parameter or gradient differences are extremely small. ptol_reset resets the state when the ptol tolerance is not satisfied; it is off by default but may help when a trust region is not used.
  • LBFGS and LSR1 are now Transform subclasses like the rest of the quasi-newton methods, and can be updated multiple times per step.
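For instance, the new mean-absolute-value option might be used like this. This is a sketch only: ClipNorm and order="mean_abs" appear in the note above, while tz.m.Adam and tz.m.LR are assumed module names and the tz.Modular composition follows the README.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNorm(1.0, order="mean_abs"),  # clip using the mean absolute value instead of a p-norm
    tz.m.Adam(),    # assumed module name for the Adam update rule
    tz.m.LR(1e-2),  # assumed module name for the learning-rate scaling step
)

loss = model(torch.randn(32, 10)).pow(2).mean()
loss.backward()
opt.step()
```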

0.3.10

16 Jun 13:36

maybe last release before actually releasing this?

Transform and Preconditioner have been merged into a single class. Transform now has update_tensors and apply_tensors methods that accept a list of state dictionaries and setting dictionaries for each tensor. Those are usually the optimizer's per-parameter states, but anything else can be used now.
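A hedged sketch of how that interface might be called: the method names update_tensors and apply_tensors and the list-of-dicts arguments come from the note above, but the exact argument order, keyword names, and the use of Adam as the transform are assumptions.

```python
import torch
import torchzero as tz

transform = tz.m.Adam()  # any Transform subclass; Adam is just an assumed example

tensors = [torch.randn(3), torch.randn(4, 4)]
states = [{} for _ in tensors]    # per-tensor state dicts (normally the optimizer's per-parameter states)
settings = [{} for _ in tensors]  # per-tensor setting dicts

# update internal statistics for each tensor, then apply the transform to them
transform.update_tensors(tensors, states=states, settings=settings)
out = transform.apply_tensors(tensors, states=states, settings=settings)
```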

A lot of quasi-newton methods have been added.

0.3.9

25 May 16:36

The initial gradient-descent step size in quasi-newton methods is now no smaller than epsilon, to avoid performing no step at all.

0.3.8

25 May 15:23

Quasi-newton apply now has no side effects. That means that update can be called multiple times before applying, for example to increase the rank of the update by sampling more directions.
Full Changelog: 0.3.6...0.3.8

0.3.6

23 May 16:32
WHAT DO I PUT THERE!!!

0.3.5

23 May 16:25
what do i put there.

0.3.4

23 May 15:27
what am I supposed to put there???

0.3.3

23 May 15:19
please, github

0.3.2

23 May 15:17
0.3.2

0.1.8

10 Feb 15:14
state_dict support