Releases: inikishev/torchzero
0.3.11
Additions
- New optimizers: `Adan`, `AdaHessian`, `ESGD`, `MARSCorrection` (see the usage sketch after this list).
- Sharpness-aware minimization: `SAM`, `ASAM`, `MSAM` and `MSAMObjective`.
- A `steihaug_toint_cg` solver for trust regions, and two modules that use it:
  - `TruncatedNewtonCG` - a matrix-free trust-region Newton's method that uses the new solver.
  - `TrustCG` - a trust region that uses the new solver; it can use the Hessian approximation generated by any of the full-matrix and diagonal quasi-newton modules.

  `TruncatedNewtonCG` and `TrustCG` may be merged in the future once I figure out a better API for trust regions.
- Added a MINRES solver which can be selected in `NewtonCG`; it may work better when the Hessian is not positive definite.
- Added a lot of diagonal quasi-newton methods.
- Added an `Online` module which can be used to make any quasi-newton method online by sampling gradient differences from the same batch. `OnlineLBFGS` has been removed because it is now redundant and can be replaced with `Online(LBFGS())` (see the sketch after this list).
- Other new modules: `ShorR`, `BarzilaiBorwein`, `AdaptiveHeavyBall`, `HigherOrderNewton`, `InverseFreeNewton`, `CubicRegularization`, and a few misc and operation modules.
- A lot of wrappers for external libraries: directsearch, fcmaes, mads and optuna, in addition to the existing scipy, nlopt and nevergrad wrappers.
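As a usage illustration for the new optimizer modules — a minimal sketch assuming the `tz.Modular` chaining API and the `backward=True` closure convention shown in the project README; the `tz.m` namespace and exact argument names are assumptions here.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
X, y = torch.randn(64, 10), torch.randn(64, 1)

# chain one of the new modules with a learning rate; Adan is used here,
# AdaHessian / ESGD / MARSCorrection should plug in the same way
opt = tz.Modular(model.parameters(), tz.m.Adan(), tz.m.LR(1e-2))

def closure(backward=True):
    loss = torch.nn.functional.mse_loss(model(X), y)
    if backward:
        opt.zero_grad()
        loss.backward()
    return loss

for _ in range(100):
    opt.step(closure)
```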
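And a sketch of the `Online(LBFGS())` replacement for the removed `OnlineLBFGS`, under the same assumptions about the `tz.m` namespace; the `Backtracking` line-search module name is also an assumption.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

# wrap LBFGS in Online so gradient differences are sampled from the same batch;
# any other quasi-newton module could be wrapped the same way
opt = tz.Modular(
    model.parameters(),
    tz.m.Online(tz.m.LBFGS()),
    tz.m.Backtracking(),  # assumed line-search module name
)
```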
Changes
- All norm-related modules like `ClipNorm` and `Normalize` can now use the mean absolute value with `order="mean_abs"` (see the sketch after this list).
- `RFDM` with formulas that do not use `x_0` now calculates the loss at `x_0` when necessary (e.g. for the descent condition when using a line search); previously it returned the loss at the perturbed point.
- Projection no longer has separate states and settings for projected tensors. Additionally, states for different projection targets (parameters, update and gradients) are now separate.
- Limited-memory Adagrad has been reworked to be much more efficient, based on https://arxiv.org/abs/1806.02958.
- Quasi-newton methods now have `ptol` and `gtol` to skip updates when parameter or gradient differences are extremely small. `ptol_reset` resets the state when the `ptol` tolerance is not satisfied; it is off by default but may help when a trust region is not used.
- `LBFGS` and `LSR1` are now `Transform` subclasses like the rest of the QN methods, and can be updated multiple times per step.
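A sketch of how the new options might be passed, assuming they are plain keyword arguments named exactly as in the notes above (`ptol`, `gtol`, `order="mean_abs"`) and that `ClipNorm` takes the max norm as its first argument; exact signatures may differ.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

opt = tz.Modular(
    model.parameters(),
    # skip quasi-newton updates when parameter/gradient differences are tiny
    tz.m.LBFGS(ptol=1e-10, gtol=1e-10),
    # clip by mean absolute value instead of a vector norm
    tz.m.ClipNorm(1.0, order="mean_abs"),
    tz.m.LR(1e-1),
)
```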
0.3.10
maybe last release before actually releasing this?
`Transform` and `Preconditioner` have been merged into a single class. `Transform` now has `update_tensors` and `apply_tensors` methods that accept a list of state dictionaries and setting dictionaries for each tensor. Those are usually the optimizer per-parameter states, but anything else can be used now.
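A hypothetical sketch of what the merged class enables: driving a `Transform` on arbitrary tensors by supplying your own per-tensor state and settings dicts instead of optimizer-managed ones. The module name `EMA` and the argument names/order of `update_tensors` / `apply_tensors` are assumptions here, not documented signatures.

```python
import torch
import torchzero as tz

# tensors we want to transform, outside of any optimizer
grads = [torch.randn(5), torch.randn(3)]

ema = tz.m.EMA(0.9)             # assumed: any Transform subclass
states   = [{} for _ in grads]  # per-tensor state, persisted by the caller
settings = [{} for _ in grads]  # per-tensor setting overrides

# assumed call pattern based on the description above
ema.update_tensors(grads, states=states, settings=settings)
smoothed = ema.apply_tensors(grads, states=states, settings=settings)
```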
A lot of quasi-newton methods have been added.
0.3.9
The initial gradient descent step size in quasi-newton methods now won't be smaller than epsilon, to avoid performing no step at all.
0.3.8
Quasi-newton `apply` now has no side effects. That means that `update` can be called multiple times before applying, for example to increase the rank of the update by sampling more directions.
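The pattern this enables, shown with a toy stand-in class rather than torchzero's actual module API: `update` mutates state (each call adds one sampled direction, raising the rank of the approximation), while `apply` only reads state, so it can safely be called once after several updates.

```python
import torch

class ToyLowRankApprox:
    """Toy stand-in (not torchzero's API) for a quasi-newton-style module."""

    def __init__(self):
        self.directions = []  # sampled directions s_i
        self.responses = []   # corresponding curvature responses y_i

    def update(self, s, y):
        # mutates state: every call adds one rank to the approximation
        self.directions.append(s)
        self.responses.append(y)

    def apply(self, g):
        # pure: reads state and returns a preconditioned vector, changes nothing
        out = g.clone()
        for s, y in zip(self.directions, self.responses):
            out = out + s * torch.dot(y, g) / (torch.dot(y, s) + 1e-12)
        return out

approx = ToyLowRankApprox()
g = torch.randn(8)
for _ in range(4):                       # several updates before a single apply
    s = torch.randn(8)
    approx.update(s, s + 0.1 * torch.randn(8))
step = approx.apply(g)                   # side-effect-free
```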
Full Changelog: 0.3.6...0.3.8
0.3.6
WHAT DO I PUT THERE!!!
0.3.5
what do i put there.
0.3.4
what am I supposed to put there???
0.3.3
please, github
0.3.2
0.1.8
state_dict support