Releases · HomebrewML/HeavyBall
Fixed SOAP, HVP PSGD
Bugfixes:
- @francois-rozet fixed a severe convergence regression in SOAP. It's now faster and converges better than before (#42)
- ADOPT now correctly matches the paper, significantly improving its convergence
- FP64 storage and/or computation now works for more optimizers
Improvements:
- NewtonPSGD now supports exact HVP calculation instead of the previous approximation. (Handles BatchNorm better but doesn't support all architectures; see the sketch below.)
"smart_one_diag"
is a next-to-no-downsidesmemory_save_mode
for PSGD. It reduces memory and compute cost compared tomemory_save_mode=None
and improves convergence compared tomemory_save_mode="one_diag"
*
*Instead of preconditioning all dimensions (memory_save_mode=None
) or preconditioning all but the largest dimension (memory_save_mode="one_diag"
) we remove the largest dimension iff it's larger than the second largest. So, a Linear(128, 1024) will now create one 128x128 preconditioner (instead of 128x128 + 1024x1024, 8x as large as the parameters), while a Linear(128, 128) can still benefit from preconditioning both sides.
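For concreteness, here is a minimal sketch of the dimension-selection rule described above. The helper name `dims_to_precondition` is hypothetical and not part of heavyball's API; it only illustrates which sides of a weight tensor would get a full preconditioner under `"smart_one_diag"`.

```python
def dims_to_precondition(shape: tuple[int, ...]) -> list[int]:
    """Hypothetical helper: return the dimensions that would receive a
    full (non-diagonal) preconditioner under the "smart_one_diag" rule."""
    if len(shape) < 2:
        return list(range(len(shape)))
    largest, second = sorted(shape, reverse=True)[:2]
    if largest > second:
        # Drop the unique largest dimension; it falls back to a diagonal preconditioner.
        skip = shape.index(largest)
        return [i for i in range(len(shape)) if i != skip]
    return list(range(len(shape)))

print(dims_to_precondition((1024, 128)))  # [1] -> one 128x128 preconditioner for Linear(128, 1024)
print(dims_to_precondition((128, 128)))   # [0, 1] -> both sides still get preconditioned
```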
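Relatedly, here is a rough sketch of the difference between the exact Hessian-vector product NewtonPSGD can now use and the earlier finite-difference approximation, written with plain `torch.autograd`. heavyball's internal implementation may differ; `loss_fn`, `params`, and `vec` are placeholders.

```python
import torch

def hvp_exact(loss_fn, params, vec):
    """Exact Hessian-vector product via double backprop: d/dp <grad(loss), vec>."""
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def hvp_finite_difference(loss_fn, params, vec, eps=1e-3):
    """Approximate HVP: (grad(p + eps * v) - grad(p)) / eps."""
    base = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, v in zip(params, vec):
            p.add_(v, alpha=eps)
    shifted = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, v in zip(params, vec):
            p.sub_(v, alpha=eps)
    return [(s - b) / eps for s, b in zip(shifted, base)]

model = torch.nn.Linear(4, 4)
x, y = torch.randn(8, 4), torch.randn(8, 4)
params = list(model.parameters())
vec = [torch.randn_like(p) for p in params]
loss_fn = lambda: torch.nn.functional.mse_loss(model(x), y)
print(hvp_exact(loss_fn, params, vec)[0].shape)
```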
OrthoGrad & PSGD improvements
- General
  - `precond_schedule` matches its docs (@francois-rozet, #31)
  - unified `warmup_steps` API (@francois-rozet, #32)
  - add `eps` arg to `scale_by_adam` (#33)
  - allow external management of LR (for `foreach=True` optimizers)
  - OrthoGrad, a "grokking-first" optimizer that works (a sketch of the idea follows below)
- PSGD
  - no more OOM in `torch.linalg.solve`
  - speed up cache by skipping it when it wouldn't give speedups
  - add newton-PSGD ("hvp-PSGD") using finite-difference approximation
  - caution momentum, not update (-> improved convergence; closer to the paper's intention; sketch below)
- Benchmarks
  - grokking benchmark, using modular addition and wide MLPs
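As a rough illustration of the OrthoGrad idea mentioned above: the component of the gradient parallel to the weights is removed before the update, and the result is rescaled to the original gradient norm. This is a sketch of one common formulation, not heavyball's exact code path; the function name and eps handling are assumptions.

```python
import torch

def orthogonal_gradient(weight: torch.Tensor, grad: torch.Tensor, eps: float = 1e-30) -> torch.Tensor:
    """Sketch: project the gradient onto the subspace orthogonal to the weights,
    then rescale it back to the original gradient norm."""
    w, g = weight.flatten(), grad.flatten()
    g_orth = g - (w @ g) / (w @ w + eps) * w
    g_orth = g_orth * (g.norm() / (g_orth.norm() + eps))
    return g_orth.view_as(grad)
```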
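And a sketch of the "cautioning" idea: momentum components whose sign disagrees with the current gradient are zeroed and the survivors rescaled, following the Cautious Optimizers recipe. Applying this to the momentum buffer (rather than the final update) is what the bullet above refers to; the helper below is illustrative, not heavyball's implementation.

```python
import torch

def caution(momentum: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Zero momentum entries whose sign disagrees with the gradient and rescale
    the remaining entries so the overall magnitude is roughly preserved."""
    mask = (momentum * grad > 0).to(momentum.dtype)
    mask = mask * (mask.numel() / mask.sum().clamp(min=1))
    return momentum * mask
```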
Fix PSGD, spring cleaning
- Previously, only the first parameter of PSGD was trained; this is fixed now
- All PSGDs were `PurePSGD` - now `momentum_into_precond_update` and `exp_avg_input` have their expected effect again
- preliminary support for external changes of `group['lr']` (example below)
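A small usage sketch of changing `group['lr']` externally between steps. `torch.optim.SGD` stands in for a heavyball optimizer here; the pattern is the standard param-group mutation, which these releases make safe for heavyball's foreach optimizers as well.

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # stand-in for a heavyball optimizer

for step in range(100):
    # External LR management: mutate group['lr'] in place before each step,
    # e.g. a linear warmup over the first 10 steps.
    for group in opt.param_groups:
        group['lr'] = 1e-3 * min(1.0, (step + 1) / 10)
    loss = model(torch.randn(8, 4)).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
```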
v1.3.0: faster, less memory, minor fixes
- LaProp/Adam/... are now compilable
- `fused_hook` and `hook_optimizer_into_model`, reducing memory usage by fusing the backward pass with the optimizer step (see the sketch after this list)
- fewer inplace ops, giving better compilation and cleaner code
- scaling ("graft", "scale", "none") for Muon, allowing Adam#Muon at minimal cost
- the `storage_dtype` argument is implemented again
- LaProp is correctly implemented, ADOPT is more stable
- via @ethansmith2000: cleaner, more maintainable `defaults`, reducing the surface for potential errors
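The memory saving from `fused_hook` / `hook_optimizer_into_model` comes from stepping each parameter as soon as its gradient is ready, so gradients don't all have to be held until the end of backward. Below is a generic PyTorch sketch of that mechanism, not heavyball's actual helpers; it assumes PyTorch >= 2.1 for `register_post_accumulate_grad_hook`.

```python
import torch

model = torch.nn.Linear(16, 16)

# One optimizer per parameter, stepped from a hook that fires right after that
# parameter's gradient has been accumulated during backward.
opt_per_param = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}

def step_when_grad_ready(param: torch.Tensor) -> None:
    opt_per_param[param].step()
    opt_per_param[param].zero_grad()

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_when_grad_ready)

loss = model(torch.randn(8, 16)).square().mean()
loss.backward()  # parameters are updated during backward; no separate optimizer.step()
```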
Stability, Muon and Fixes
- utils
  - bugfixes impacting SFAdamW and RMSProp
  - breaking: `zeroth_power_method` no longer supports `eigh` and doesn't allow specification of the number of newtonschulz iterations
  - faster `newtonschulz5` (via @tysam-code; see the sketch at the end of these notes)
  - PSGD preconditioner dampening (via @evanatyourservice)
- chainable
  - implementation of `nesterov_momentum`, `heavyball_momentum` and `orthogonalize_update`
- core
  - heavyball.Muon (by chaining `nesterov_momentum` and `orthogonalize_update`); Muon supports gradient and update clipping out of the box
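For reference, a sketch of the quintic Newton-Schulz iteration behind `newtonschulz5` / `orthogonalize_update`, which approximately replaces a matrix's singular values with 1 (the "zeroth power"). The coefficients follow the widely circulated Muon reference implementation; heavyball's tuned version may differ in dtype and details.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map a 2D tensor to the nearest semi-orthogonal matrix using
    a quintic Newton-Schulz iteration (coefficients from the Muon reference code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g.float()
    transposed = x.size(0) > x.size(1)
    if transposed:  # iterate on the "fat" orientation so the Gram matrix is smaller
        x = x.T
    x = x / (x.norm() + eps)  # normalize so the spectral norm is at most 1
    for _ in range(steps):
        A = x @ x.T
        B = b * A + c * A @ A
        x = a * x + B @ x
    return x.T if transposed else x

update = newton_schulz_orthogonalize(torch.randn(1024, 128))
```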