Closed
Labels
bug: Something isn't working
Description
Describe the bug
Resuming model training from a checkpoint in Coqui TTS fails with an error.
When a checkpoint is saved for later training, the last training and eval losses are stored as a dict. When training from scratch, the last training loss is stored as a float. As a result, resuming from a checkpoint fails as soon as the new float loss is compared against the restored dict.
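The mismatch can be reproduced without the trainer at all. The sketch below is only an illustration: `compare_losses` is a hypothetical stand-in for the comparison shown in the traceback, and the dict keys `train_loss` and `eval_loss` are assumed, not taken from the library.

```python
# Hypothetical stand-in for the `current_loss < best_loss` check in the traceback.
def compare_losses(current_loss, best_loss):
    return current_loss < best_loss


# Training from scratch: the best loss is a plain float.
fresh_best = 0.42

# Resuming from a checkpoint: the best loss comes back as a dict.
# The key names here are assumptions made for illustration.
restored_best = {"train_loss": 0.40, "eval_loss": 0.42}

# float vs. float compares fine.
ok = compare_losses(0.39, fresh_best)

# float vs. dict raises the TypeError seen in the logs.
try:
    compare_losses(0.39, restored_best)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```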
To Reproduce
- Train a model in Coqui TTS using trainer
- Once a checkpoint for best model is saved, stop the training
- Set the checkpoint folder as continue path in the trainer class
- Restart from the checkpoint
https://colab.research.google.com/drive/1OwemROn306_JIYASjx39d52eXFHS1O_u
Expected behavior
Training should resume from the checkpoint without errors.
Logs
Traceback (most recent call last):
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1808, in fit
    self._fit()
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1771, in _fit
    self.save_best_model()
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/utils/distributed.py", line 35, in wrapped_fn
    return fn(*args, **kwargs)
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/trainer.py", line 1893, in save_best_model
    self.best_loss = save_best_model(
  File "/mnt/Work/anaconda3/envs/tts-env/lib/python3.10/site-packages/trainer/io.py", line 183, in save_best_model
    if current_loss < best_loss:
TypeError: '<' not supported between instances of 'float' and 'dict'
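One possible workaround, sketched here under assumptions, is to reduce both values to a scalar before comparing them. The helpers `as_scalar_loss` and `is_improvement` are hypothetical and not part of the trainer package, and the `eval_loss` key is an assumed name for the entry in the restored dict. This is not the library's actual fix.

```python
from collections.abc import Mapping


def as_scalar_loss(loss, key="eval_loss"):
    """Return a float from either a plain number or a dict of losses.

    `key` is an assumed dict key; adjust it to whatever the checkpoint stores.
    """
    if isinstance(loss, Mapping):
        return float(loss[key])
    return float(loss)


def is_improvement(current_loss, best_loss, key="eval_loss"):
    """Compare two losses after normalizing both to floats."""
    return as_scalar_loss(current_loss, key) < as_scalar_loss(best_loss, key)


# A float loss from the current run vs. a dict restored from a checkpoint.
restored_best = {"train_loss": 0.50, "eval_loss": 0.42}
print(is_improvement(0.39, restored_best))

# Plain floats keep working as before.
print(is_improvement(0.50, 0.42))
```

Normalizing at the comparison point keeps both checkpoint formats usable, at the cost of having to choose which entry of the dict counts as "the" loss.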
Environment
- torch: 2.1.0
- trainer: 0.0.31
- python: 3.10
- OS: EndeavourOS
- cuda: cuda_12.2.r12.2
- GPU: NVIDIA RTX 3060
- pytorch installation: pip
Additional context
No response