
Training fails when the last batch has only one sample. #15

@glicerico

Description

Training fails when the remainder of the training/validation/test set size divided by the batch size is 1.
In my use case, the validation set has 18753 elements, so a batch size of 16 leaves only one element in the last batch, and the following error occurs:
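
The arithmetic behind this can be checked quickly. The sketch below is only an illustration: `last_batch_size` is a hypothetical helper, not part of the repository, and it assumes a non-empty dataset that is batched without dropping the final partial batch.

```python
def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch when the last partial batch is kept.

    Assumes num_samples > 0 and batch_size > 0.
    """
    remainder = num_samples % batch_size
    # A remainder of 0 means the last batch is a full one.
    return remainder if remainder else batch_size


# 18753 = 16 * 1172 + 1, so the final validation batch holds one sample.
print(last_batch_size(18753, 16))  # 1
```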

Epoch 0: 100%|█████████▉| 12208/12209 [1:55:15<00:00,  1.77it/s, loss=1.960, v_num=gsq1]
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    trainer.fit(model)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
    results = self.accelerator_backend.train()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train
    results = self.train_or_test()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 493, in train
    self.train_loop.run_training_epoch()
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 589, in run_training_epoch
    self.trainer.run_evaluation(test_mode=False)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 578, in run_evaluation
    output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 171, in evaluation_step
    output = self.trainer.accelerator_backend.validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 87, in validation_step
    output = self.__validation_step(args)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 95, in __validation_step
    output = self.trainer.model.validation_step(*args)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 82, in validation_step
    loss = F.cross_entropy(logits, targets)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 2260, in nll_loss
    if input.size(0) != target.size(0):
IndexError: dimension specified as 0 but tensor has no dimensions
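
A possible explanation, offered as a guess since the code around line 82 of Trainer.py is not shown here: the call to `log_softmax(input, 1)` succeeds, so the tensor with no dimensions is probably `targets`, and an unqualified `.squeeze()` on a batch of one would produce exactly that. The sketch below reproduces the shape problem and shows two possible workarounds. The tensor shapes, `num_classes`, and the toy dataset are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

num_classes = 4                         # hypothetical number of classes
logits = torch.randn(1, num_classes)    # model output for a batch of one
targets = torch.tensor([[2]])           # assumed label layout: shape (1, 1)

# A bare squeeze() removes every size-1 dimension, including the batch
# dimension, so a batch of one collapses to a 0-dim tensor.
collapsed = targets.squeeze()
print(collapsed.dim())                  # 0

# Workaround 1: squeeze only the trailing dimension, keeping the batch axis.
fixed = targets.squeeze(-1)             # shape (1,)
loss = F.cross_entropy(logits, fixed)   # works for a batch of one

# Workaround 2: drop the incomplete final batch in the DataLoader.
# Toy dataset of 33 samples: two full batches of 16, one leftover sample.
dataset = TensorDataset(torch.randn(33, 8), torch.randint(0, num_classes, (33,)))
loader = DataLoader(dataset, batch_size=16, drop_last=True)
print(len(loader))                      # 2
```

Note that `drop_last=True` discards the leftover samples, so validation metrics would be computed on slightly less data. Keeping the batch dimension intact avoids that trade-off.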
