Open
Description
System Info
NA
Reproduction
The arguments `optim_bits` and `amsgrad` are not used by 8-bit Adam.
The documentation for `optim_bits` is also wrong in this case, because 8 bits are used here.
I'm thinking of 2 ways to solve this:
- Remove the `optim_bits` and `amsgrad` arguments (in fact, even 32-bit Adam doesn't use the `amsgrad` argument).
- If we want to keep the function signature the same, there should be a check that `optim_bits == 8` and `amsgrad == False`.
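
A minimal sketch of the second option, in case it helps the discussion. The helper name `check_adam8bit_args`, its defaults, and the error messages are hypothetical and not taken from the library; the idea is only that the 8-bit Adam constructor would call something like this before doing anything else.

```python
def check_adam8bit_args(optim_bits: int = 8, amsgrad: bool = False) -> None:
    """Hypothetical helper: reject values that 8-bit Adam would otherwise silently ignore."""
    # The optimizer state is always 8-bit here, so any other value is misleading.
    if optim_bits != 8:
        raise ValueError(
            f"8-bit Adam always uses 8-bit optimizer state, got optim_bits={optim_bits}"
        )
    # amsgrad is accepted in the signature but never used, so only False is honest.
    if amsgrad:
        raise ValueError("8-bit Adam does not use amsgrad; pass amsgrad=False")


# Example: the valid combination passes silently, anything else raises.
check_adam8bit_args(optim_bits=8, amsgrad=False)
```

If the existing default for `optim_bits` is not 8, that default would have to change as well, otherwise callers relying on the defaults would start hitting the error.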
Expected behavior
NA