Issue Description
When training starts with the current configuration, the evaluation results reported at step 0 fluctuate significantly from run to run. This makes the initial evaluation unreliable and hurts the reproducibility of experiments.
Proposed Solution
Enable validation before training starts by setting the following in train_XXX.yaml:
trainer:
  val_before_train: True
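The following is a minimal sketch of how a trainer loop might gate a step-0 validation on this flag. It assumes a plain dict loaded from the YAML above. The names `fit`, `validate`, `train_step` and `val_every` are hypothetical and are not the API of any specific framework; only the `trainer.val_before_train` key comes from the configuration above.

```python
from typing import Callable, Dict


def fit(
    config: Dict,
    validate: Callable[[], float],
    train_step: Callable[[int], None],
    total_steps: int,
    val_every: int = 0,
) -> Dict[int, float]:
    """Hypothetical training loop; returns validation scores keyed by global step."""
    trainer_cfg = config.get("trainer", {})
    val_history: Dict[int, float] = {}

    # If val_before_train is set, evaluate the untrained model once at step 0.
    if trainer_cfg.get("val_before_train", False):
        val_history[0] = validate()

    for step in range(1, total_steps + 1):
        train_step(step)
        # Optional periodic validation (hypothetical; val_every <= 0 disables it).
        if val_every > 0 and step % val_every == 0:
            val_history[step] = validate()

    return val_history


if __name__ == "__main__":
    config = {"trainer": {"val_before_train": True}}
    scores = fit(config, validate=lambda: 0.5, train_step=lambda s: None, total_steps=3)
    print(scores)  # {0: 0.5}
```

With the flag left unset or set to False, no step-0 entry is recorded, so the first validation score would only appear after training has already started.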
Experimental Results
After applying the fix, evaluation results on gsm8k-eval at step 0 became much more stable (see figure below):
