When using ArcticTraining, I found the iteration summary confusing:
- The total step/iteration count is misleading. It is currently computed as `self.config.epochs * len(self.train_dataloader)`, but when `self.config.gradient_accumulation_steps` is set, that total is scaled up by the accumulation factor. For example, with `self.config.gradient_accumulation_steps=4` and 100 global steps/iterations, the progress display always reads xx/400. Is there a particular reason for the current design?
https://github.com/snowflakedb/ArcticTraining/blob/main/arctic_training/trainer/trainer.py#L346-L354
- It would be helpful to add an ETA for the remaining training time, which would make long training runs easier to monitor.
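
To make both points concrete, here is a rough sketch of what I have in mind. It is not ArcticTraining code: the helper names (`total_optimizer_steps`, `estimate_eta`, `format_eta`) are hypothetical, and it assumes that gradient accumulation carries across epoch boundaries and that a trailing incomplete accumulation window does not produce an optimizer step. The real trainer may behave differently.

```python
def total_optimizer_steps(epochs, batches_per_epoch, grad_accum_steps):
    """Total number of optimizer (global) steps for the whole run.

    Assumption: micro-batches are accumulated continuously across epochs
    and a trailing incomplete accumulation window is dropped.
    """
    return (epochs * batches_per_epoch) // grad_accum_steps


def estimate_eta(elapsed_seconds, steps_done, total_steps):
    """Remaining seconds, extrapolated from the average time per finished step.

    Returns None until at least one step has completed.
    """
    if steps_done <= 0:
        return None
    remaining_steps = max(total_steps - steps_done, 0)
    return elapsed_seconds / steps_done * remaining_steps


def format_eta(seconds):
    """Render a duration in seconds as H:MM:SS."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Example from this issue: 400 micro-batches with accumulation factor 4
    # correspond to 100 global steps, so the summary would read xx/100.
    total = total_optimizer_steps(epochs=1, batches_per_epoch=400, grad_accum_steps=4)
    eta = estimate_eta(elapsed_seconds=50.0, steps_done=25, total_steps=total)
    print(f"iter 25/{total}, ETA {format_eta(eta)}")
```

A simple average over all finished steps is used for the ETA here; a moving average over recent steps might track throughput changes better, but that is a design choice for the maintainers.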