Describe the bug
DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 share the same parameter list at construction, which makes it easy for the two to diverge.
** Details **
In the code that selects the optimizer class,
`Stage3ZeroOptimizer = DeepSpeedZeroOptimizer_Stage3 if not self.super_offload(`
the initializers of DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3 are called with the same parameter list. This adds maintenance overhead whenever either parameter list needs to change, because the two must be kept in sync. There are two observations:
- There is already a mismatch (i.e. param_names), and this will break SuperOffload.
- cpuadam_cores_perc was added to DeepSpeedZeroOptimizer_Stage3 as a parameter but is not used there.
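A minimal sketch of how the shared parameter list can fail. The two classes below are hypothetical stand-ins, not the real DeepSpeed classes, and the sketch assumes the mismatch is that param_names is accepted by one initializer but not by the other:

```python
class ZeroStage3Stub:
    """Hypothetical stand-in for DeepSpeedZeroOptimizer_Stage3."""

    def __init__(self, module, init_optimizer, param_names, cpuadam_cores_perc=0.8):
        self.module = module
        self.init_optimizer = init_optimizer
        self.param_names = param_names
        # cpuadam_cores_perc is accepted only to match the shared call; it is never read.


class SuperOffloadStub:
    """Hypothetical stand-in for SuperOffloadOptimizer_Stage3 (assumed to lack param_names)."""

    def __init__(self, module, init_optimizer, cpuadam_cores_perc=0.8):
        self.module = module
        self.init_optimizer = init_optimizer
        self.cpuadam_cores_perc = cpuadam_cores_perc


def build_shared(use_super_offload, **kwargs):
    # One call site, one keyword list, two different initializers.
    cls = SuperOffloadStub if use_super_offload else ZeroStage3Stub
    return cls(**kwargs)


shared_kwargs = dict(
    module="model",
    init_optimizer="adam",
    param_names={"w": "weight"},
    cpuadam_cores_perc=0.9,
)

# The ZeRO stage 3 path accepts the shared list.
zero = build_shared(False, **shared_kwargs)

# The SuperOffload path rejects it, because its initializer has drifted.
try:
    build_shared(True, **shared_kwargs)
    super_offload_error = None
except TypeError as exc:
    super_offload_error = str(exc)
```

In this sketch the failure only appears when the SuperOffload path is taken, so a change tested only on the default path would not catch it.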
** Suggestion **
Separate the calls to DeepSpeedZeroOptimizer_Stage3 and SuperOffloadOptimizer_Stage3, so that each is constructed with its own parameter list.
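One possible shape for the separated calls, as a sketch only. The stub classes, the build_optimizer helper, and the exact parameter split are assumptions for illustration and do not reflect the real DeepSpeed signatures:

```python
class ZeroStage3Stub:
    """Hypothetical stand-in for DeepSpeedZeroOptimizer_Stage3."""

    def __init__(self, module, init_optimizer, param_names):
        self.module = module
        self.init_optimizer = init_optimizer
        self.param_names = param_names


class SuperOffloadStub:
    """Hypothetical stand-in for SuperOffloadOptimizer_Stage3."""

    def __init__(self, module, init_optimizer, cpuadam_cores_perc):
        self.module = module
        self.init_optimizer = init_optimizer
        self.cpuadam_cores_perc = cpuadam_cores_perc


def build_optimizer(use_super_offload, module, init_optimizer, param_names, cpuadam_cores_perc):
    """Hypothetical helper: each branch passes only what its class accepts."""
    if use_super_offload:
        return SuperOffloadStub(
            module=module,
            init_optimizer=init_optimizer,
            cpuadam_cores_perc=cpuadam_cores_perc,
        )
    return ZeroStage3Stub(
        module=module,
        init_optimizer=init_optimizer,
        param_names=param_names,
    )


zero_opt = build_optimizer(False, "model", "adam", {"w": "weight"}, 0.9)
offload_opt = build_optimizer(True, "model", "adam", {"w": "weight"}, 0.9)
```

With explicit branches, a parameter added to one initializer no longer has to be mirrored in the other, and an unused parameter such as cpuadam_cores_perc does not need to appear on the class that ignores it.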