
Error when training on custom data: RuntimeError: CUDA error: device-side assert triggered #1839

Open
johnk2hawaii opened this issue May 13, 2025 · 0 comments

johnk2hawaii commented May 13, 2025

I understand this has been an issue in the past, but I have not found a cure-all solution, so I thought I would raise it again. I am trying to run transfer learning on some custom data and ran into an error. I am using PyCharm with a virtual environment on Windows, with CUDA 12.4, Python 3.9.7, torch 2.6.0+cu124, and an NVIDIA GeForce GTX 1660 GPU. I have checked that the GPU is available, and it seems to be working. After I run the command below, the initial setup goes smoothly, but then I get the following error:

python tools/train.py -f exps/example/custom/yolox_s.py -d 1 -b 8 --fp16 -o -c yolox_s.pth

2025-05-13 18:08:54 | INFO | yolox.core.trainer:218 - ---> start train epoch1
c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py:106: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=self.amp_training):
c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py:474: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=False):
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [2,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
2025-05-13 18:08:57 | ERROR | yolox.core.trainer:79 - Exception in training:
2025-05-13 18:08:57 | INFO | yolox.core.trainer:200 - Training of experiment is done and the best AP is 0.00
2025-05-13 18:08:57 | ERROR | yolox.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (18824), thread 'MainThread' (26388):
Traceback (most recent call last):

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\YOLOX\tools\train.py", line 138, in <module>
launch(
└ <function launch at 0x0000018D55A0A040>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\launch.py", line 98, in launch
main_func(*args)
│ └ (╒═══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x0000018D5953A700>

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\YOLOX\tools\train.py", line 118, in main
trainer.train()
│ └ <function Trainer.train at 0x0000018D59E61310>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 77, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x0000018D59E61AF0>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 87, in train_in_epoch
self.train_in_iter()
│ └ <function Trainer.train_in_iter at 0x0000018D59E61B80>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 93, in train_in_iter
self.train_one_iter()
│ └ <function Trainer.train_one_iter at 0x0000018D59E61C10>
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\core\trainer.py", line 107, in train_one_iter
outputs = self.model(inps, targets)
│ │ │ └
│ │ └
│ └ YOLOX(
│ (backbone): YOLOPAFPN(
│ (backbone): CSPDarknet(
│ (stem): Focus(
│ (conv): BaseConv(
│ (conv): ...
└ <yolox.core.trainer.Trainer object at 0x0000018D59E6E760>

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
│ │ │ └ {}
│ │ └
│ └ <function Module._call_impl at 0x0000018D53063820>
└ YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv): BaseConv(
(conv): ...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
│ │ └ {}
│ └
└ <bound method YOLOX.forward of YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolox.py", line 34, in forward
loss, iou_loss, conf_loss, cls_loss, l1_loss, num_fg = self.head(
└ YOLOX(
(backbone): YOLOPAFPN(
(backbone): CSPDarknet(
(stem): Focus(
(conv): BaseConv(
(conv): ...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
│ │ │ └ {}
│ │ └
│ └ <function Module._call_impl at 0x0000018D53063820>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
│ │ └ {}
│ └
└ <bound method YOLOXHead.forward of YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 194, in forward
return self.get_losses(
│ └ <function YOLOXHead.get_losses at 0x0000018D59EBF0D0>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 310, in get_losses
) = self.get_assignments( # noqa
│ └ <function YOLOXHead.get_assignments at 0x0000018D59EBF280>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
│ │ └ {}
│ └
└ <function YOLOXHead.get_assignments at 0x0000018D59EBF1F0>

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 496, in get_assignments
) = self.simota_matching(cost, pair_wise_ious, gt_classes, num_gt, fg_mask)
│ │ │ │ │ │ └
│ │ │ │ │ └ 11
│ │ │ │ └
│ │ │ └
│ │ └
│ └ <function YOLOXHead.simota_matching at 0x0000018D59EBF3A0>
└ YOLOXHead(
(cls_convs): ModuleList(
(0-2): 3 x Sequential(
(0): BaseConv(
(conv): Conv2d(128, 128, kernel...

File "c:\users\johnk\desktop\pythonproject\yoloxmain\yolox\yolox\models\yolo_head.py", line 551, in simota_matching
_, pos_idx = torch.topk(
│ │ └ <built-in method topk of type object at 0x00007FFCC6D07550>
│ └ <module 'torch' from 'C:\Users\johnk\Desktop\pythonProject\YOLOXMAIN\venv\lib\site-packages\torch\__init__.py'>

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
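The log ends with two concrete leads: the failed check in ScatterGatherKernel.cu is an index-out-of-bounds assertion, and PyTorch suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so the error is reported at the call that actually failed rather than at `torch.topk` in `simota_matching`. A common cause of this assertion with custom detection data, offered here as an assumption rather than a confirmed diagnosis, is a ground-truth class index that is not smaller than `num_classes` in the experiment file. The minimal Python sketch below checks a COCO-style annotation dict for such labels and reruns the same training command with synchronous kernel launches. The helper names (`find_bad_class_indices`, `debug_env`, `run_training_with_blocking_launches`, `TRAIN_CMD`) are hypothetical, and the sorted-category-id mapping is an assumption about how the data loader assigns class indices; check it against your copy of YOLOX.

```python
import os
import subprocess
import sys

# Same arguments as the training command in this issue; run from the YOLOX repo root.
TRAIN_CMD = [
    sys.executable, "tools/train.py",
    "-f", "exps/example/custom/yolox_s.py",
    "-d", "1", "-b", "8", "--fp16", "-o", "-c", "yolox_s.pth",
]


def find_bad_class_indices(coco, num_classes):
    """Hypothetical helper: list annotations whose class index is outside
    [0, num_classes).

    `coco` is a COCO-style dict (e.g. json.load of the annotation file).
    ASSUMPTION: category ids are mapped to contiguous indices in sorted order.
    Returns (annotation id, category_id, contiguous index) tuples; a
    category_id missing from "categories" is reported with index -1.
    """
    class_ids = sorted(cat["id"] for cat in coco["categories"])
    index_of = {cid: i for i, cid in enumerate(class_ids)}
    bad = []
    for ann in coco["annotations"]:
        idx = index_of.get(ann["category_id"], -1)
        # Mirrors the kernel's check: idx_dim >= 0 && idx_dim < index_size
        if not 0 <= idx < num_classes:
            bad.append((ann["id"], ann["category_id"], idx))
    return bad


def debug_env():
    """Copy of the current environment with synchronous CUDA kernel launches."""
    env = dict(os.environ)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training_with_blocking_launches():
    """Hypothetical helper: rerun the training command so the traceback points
    at the kernel that actually failed. Returns the process exit code."""
    return subprocess.run(TRAIN_CMD, env=debug_env()).returncode
```

For example, `find_bad_class_indices(json.load(open(path)), num_classes)`, where `path` is your training annotation JSON and `num_classes` is the value set in `exps/example/custom/yolox_s.py`, should return an empty list if every label fits. Passing the variable through the subprocess environment ensures it is set before CUDA is initialized in the training process.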

Thanks
