Ran into this error when custom training - > RuntimeError: CUDA error: device-side assert triggered #1839

johnk2hawaii · 2025-05-13T16:19:35Z

I understand this has been an issue in the past, but I have not found a cure-all solution for it, so I thought I would raise it again. I am trying to do transfer training on some custom data and ran into an error. I am using PyCharm in a virtual environment with CUDA 12.4, Python 3.9.7, and torch 2.6.0+cu124, on an NVIDIA GeForce GTX 1660 GPU. The operating system is Windows. I have checked that my GPU is available, and it seems to be functioning. After I run the command below, the initial setup goes smoothly and then I get the following error:

python tools/train.py -f exps/example/custom/yolox_s.py -d 1 -b 8 --fp16 -o -c yolox_s.pth

2025-05-13 18:08:54 | INFO | yolox.core.trainer:218 - ---> start train epoch1
c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py:106: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=self.amp_training):
c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py:474: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=False):
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [2,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
2025-05-13 18:08:57 | ERROR | yolox.core.trainer:79 - Exception in training:
2025-05-13 18:08:57 | INFO | yolox.core.trainer:200 - Training of experiment is done and the best AP is 0.00
2025-05-13 18:08:57 | ERROR | yolox.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (18824), thread 'MainThread' (26388):
Traceback (most recent call last):

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\YOLOX\tools\train.py", line 138, in <module>
launch(
└ <function launch at 0x0000018D55A0A040>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\launch.py", line 98, in launch
main_func(*args)
│ └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x0000018D5953A700>

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\YOLOX\tools\train.py", line 118, in main
trainer.train()
│ └ <function Trainer.train at 0x0000018D59E61310>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 77, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x0000018D59E61AF0>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 87, in train_in_epoch
self.train_in_iter()
│ └ <function Trainer.train_in_iter at 0x0000018D59E61B80>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 93, in train_in_iter
self.train_one_iter()
│ └ <function Trainer.train_one_iter at 0x0000018D59E61C10>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 107, in train_one_iter
outputs = self.model(inps, targets)
│ │ │ └
│ │ └
│ └ YOLOX(
│ (backbone): YOLOPAFPN(
│ (backbone): CSPDarknet(
│ (stem): Focus(
│ (conv): BaseConv(
│ (conv): ...
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
│ │ │ └ {}
│ │ └
│ └ <function Module._call_impl at 0x0000018D53063820>
└ YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv): BaseConv(
(conv): ...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
│ │ └ {}
│ └
└ <bound method YOLOX.forward of YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolox.py", line 34, in forward
loss, iou_loss, conf_loss, cls_loss, l1_loss, num_fg = self.head(
└ YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv): BaseConv(
(conv): ...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
│ │ │ └ {}
│ │ └
│ └ <function Module._call_impl at 0x0000018D53063820>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
│ │ └ {}
│ └
└ <bound method YOLOXHead.forward of YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 194, in forward
return self.get_losses(
│ └ <function YOLOXHead.get_losses at 0x0000018D59EBF0D0>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 310, in get_losses
) = self.get_assignments( # noqa
│ └ <function YOLOXHead.get_assignments at 0x0000018D59EBF280>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
│ │ └ {}
│ └
└ <function YOLOXHead.get_assignments at 0x0000018D59EBF1F0>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 496, in get_assignments
) = self.simota_matching(cost, pair_wise_ious, gt_classes, num_gt, fg_mask)
│ │ │ │ │ │ └
│ │ │ │ │ └ 11
│ │ │ │ └
│ │ │ └
│ │ └
│ └ <function YOLOXHead.simota_matching at 0x0000018D59EBF3A0>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 551, in simota_matching
_, pos_idx = torch.topk(
│ │ └ <built-in method topk of type object at 0x00007FFCC6D07550>
│ └ <module 'torch' from 'C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\__init__.py'>
└

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
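
The message above suggests passing CUDA_LAUNCH_BLOCKING=1, since the traceback may point at the wrong call (here torch.topk) when kernels are reported asynchronously. A minimal sketch of re-running the same command with that variable set is shown here. The helper names `blocking_env` and `run_training_blocking` are hypothetical, and the sketch assumes it is started from the YOLOX repository root with the same interpreter as the virtual environment.

```python
import os
import subprocess
import sys

# The training command from the post, unchanged, run with the current interpreter.
TRAIN_CMD = [
    sys.executable, "tools/train.py",
    "-f", "exps/example/custom/yolox_s.py",
    "-d", "1", "-b", "8", "--fp16", "-o",
    "-c", "yolox_s.pth",
]


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this set, a failing kernel is reported at the call that launched it,
    # so the Python traceback should point at the real offending line.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training_blocking():
    """Hypothetical helper: launch training with CUDA_LAUNCH_BLOCKING=1."""
    return subprocess.call(TRAIN_CMD, env=blocking_env())


# Usage (from the repository root):
#     exit_code = run_training_blocking()
```

Training will be slower in this mode, so it is only meant for locating the failing call, not for a full run.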

Thanks
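
One possible cause worth ruling out (not confirmed for this run): the "index out of bounds" assertion in ScatterGatherKernel.cu is often reported when a ground-truth class index is greater than or equal to the number of classes configured in the experiment file. The sketch below checks a COCO-format annotation dictionary for such labels. It assumes the custom dataset uses COCO-style JSON and that class indices are assigned by sorting the category ids; `find_bad_labels` and the sample data are hypothetical, and `num_classes` stands for whatever value is set in exps/example/custom/yolox_s.py.

```python
import json


def find_bad_labels(coco, num_classes):
    """Return (annotation id, category id, class index) for labels outside [0, num_classes)."""
    # Assumption: class indices are the positions of the category ids in sorted order.
    cat_ids = sorted(c["id"] for c in coco["categories"])
    index_of = {cid: i for i, cid in enumerate(cat_ids)}
    bad = []
    for ann in coco["annotations"]:
        idx = index_of.get(ann["category_id"])
        # idx is None when the annotation uses a category id that is not declared.
        if idx is None or idx >= num_classes:
            bad.append((ann.get("id"), ann["category_id"], idx))
    return bad


# Tiny made-up example: three categories in the data, but only two configured.
sample = {
    "categories": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 1, "category_id": 1},
        {"id": 2, "category_id": 2},
        {"id": 3, "category_id": 3},
    ],
}
bad = find_bad_labels(sample, num_classes=2)
print(bad)

# For a real file (the path is a placeholder):
#     with open("path/to/train_annotations.json") as f:
#         print(find_bad_labels(json.load(f), num_classes=2))
```

An empty list would suggest the labels fit the configured class count and the cause lies elsewhere.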
