Training returns NaN loss when switching from YOLOv8 to YOLOv12 #20083

lansimtech · 2025-04-09T01:11:20Z

lansimtech
Apr 9, 2025

Hi, thank you for your great work on YOLO!

I trained my own dataset successfully using YOLOv8 without any issues. However, when I switch to YOLOv12 and try to train the same dataset, the training loss becomes NaN after a few epochs.

I suspect it might be related to the new architecture or specific layers (e.g., A2C2f), but I would really appreciate if you could provide some insights or guidance on what might be causing the NaNs in YOLOv12.

Any help would be appreciated. Thank you so much!

UltralyticsAssistant · 2025-04-09T01:12:04Z

UltralyticsAssistant
Apr 9, 2025
Maintainer

👋 Hello @lansimtech, thank you for your interest in Ultralytics 🚀! We're thrilled to see you exploring YOLO12. Your feedback is valuable, and your issue deserves attention.

If this is a 🐛 Bug Report, please provide a minimum reproducible example (MRE), including relevant code snippets and details about your environment (e.g., Python version, hardware, and dependencies), to help us debug effectively.

If this is a ❓ Question regarding custom training, please share additional details, such as dataset examples, training logs, and any modifications you've made to the default configuration. This information will help us better understand and address the NaN loss issue you're encountering.

An Ultralytics engineer will review your issue and provide further assistance soon. In the meantime, ensure you're using the latest version of the ultralytics package and that your environment is up to date. YOLO12 introduces various new features and improvements, so it's essential to verify compatibility across your setup.

Thank you for being a part of the Ultralytics community! If you have further questions or updates, feel free to share them here. 😊

9 replies

glenn-jocher Apr 18, 2025
Maintainer

Hi @lansimtech,

Thanks for sharing your results! It's great to hear that reducing the learning rate and disabling AMP resolved the NaN issues. The lower performance compared to YOLO11 is expected - YOLO12's experimental architecture trades some accuracy for its research-oriented design.

If you want to improve YOLO12's performance, you could try:

Gradually increasing training epochs (perhaps to 100+)
Implementing the gradient clipping I suggested earlier
Fine-tuning the box/cls/dfl loss weights for your specific dataset

However, for production use cases, I still recommend YOLO11 as it provides better accuracy-to-speed ratio. Your experience confirms what we've observed internally - YOLO12 is interesting for research but YOLO11 generally delivers superior results in practical applications.

lmst2 Jul 5, 2025

2. Add gradient clipping with close_mosaic=10 and clip_grad=1.0 parameters

hi I have also encountered this problem, and I am running yolo in python through model.train(**params). but I got an error of SyntaxError saying "clip_grad" is not a valid yolo argument

someuser2026 Jul 7, 2025

same for cli run:

(ultralytics) z5428587@k106:~/code $ python3 -m cli.train data_yaml=data_processed/Global/Annotated/variants/segment/planet_full_beach_fine_tune_seed_0/data.yaml imgsz=1024 time=1.75 device=0 batch=4 task=segment weights=/srv/scratch/z5428587/runs/segment/imgsz_256/yolo/convnext_base-no_neck-planet_nearmap_seed0/weights/best.pt run_name=convnext_base-no_neck-planet_full_beach_seed0
[YOLO] CMD: yolo task=segment mode=train model=/srv/scratch/z5428587/runs/segment/imgsz_256/yolo/convnext_base-no_neck-planet_nearmap_seed0/weights/best.pt data=data_processed/Global/Annotated/variants/segment/planet_full_beach_fine_tune_seed_0/data.yaml imgsz=1024 epochs=100 batch=4 device=0 project=/srv/scratch/z5428587/runs/segment/imgsz_1024/yolo name=convnext_base-no_neck-planet_full_beach_seed0 exist_ok=True save=True time=1.75 patience=100 cache=False optimizer=auto seed=0 single_cls=True multi_scale=True box=5 resume=False lr0=0.001 momentum=0.9 clip_grad=1.0 close_mosaic=10 hsv_h=0.015 hsv_s=0.7 degrees=15 translate=0.0 scale=0.25 shear=0.0 flipud=0.4 fliplr=0.4
Traceback (most recent call last):
  File "/srv/scratch/z5428587/mamba/envs/ultralytics/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
             ~~~~~~~~~~^^
  File "/srv/scratch/z5428587/mamba/envs/ultralytics/lib/python3.13/site-packages/ultralytics/cfg/__init__.py", line 910, in entrypoint
    check_dict_alignment(full_args_dict, overrides)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/scratch/z5428587/mamba/envs/ultralytics/lib/python3.13/site-packages/ultralytics/cfg/__init__.py", line 498, in check_dict_alignment
    raise SyntaxError(string + CLI_HELP_MSG) from e
SyntaxError: 'clip_grad' is not a valid YOLO argument. 

    Arguments received: ['yolo', 'task=segment', 'mode=train', 'model=/srv/scratch/z5428587/runs/segment/imgsz_256/yolo/convnext_base-no_neck-planet_nearmap_seed0/weights/best.pt', 'data=data_processed/Global/Annotated/variants/segment/planet_full_beach_fine_tune_seed_0/data.yaml', 'imgsz=1024', 'epochs=100', 'batch=4', 'device=0', 'project=/srv/scratch/z5428587/runs/segment/imgsz_1024/yolo', 'name=convnext_base-no_neck-planet_full_beach_seed0', 'exist_ok=True', 'save=True', 'time=1.75', 'patience=100', 'cache=False', 'optimizer=auto', 'seed=0', 'single_cls=True', 'multi_scale=True', 'box=5', 'resume=False', 'lr0=0.001', 'momentum=0.9', 'clip_grad=1.0', 'close_mosaic=10', 'hsv_h=0.015', 'hsv_s=0.7', 'degrees=15', 'translate=0.0', 'scale=0.25', 'shear=0.0', 'flipud=0.4', 'fliplr=0.4']. Ultralytics 'yolo' commands use the following syntax:

glenn-jocher Jul 7, 2025
Maintainer

Hi @lmst2 and @someuser2026,

I apologize for the confusion - clip_grad is not a valid YOLO training argument. That was my mistake in the previous suggestion.

For YOLO12 training stability, focus on the parameters that are actually available:

Lower learning rate: lr0=0.001
Disable AMP: amp=false
Use close_mosaic=10 (this one is valid)
Reduce batch size if needed

Gradient clipping is handled internally by the training process and isn't exposed as a user parameter. The other suggestions should be sufficient for stabilizing YOLO12 training.

someuser2026 Jul 8, 2025

No worries! Do we have a methods to clip the gradients during training in some way?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ultralytics

Training returns NaN loss when switching from YOLOv8 to YOLOv12 #20083

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 9 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Ultralytics

Training returns NaN loss when switching from YOLOv8 to YOLOv12 #20083

Uh oh!

lansimtech Apr 9, 2025

Replies: 1 comment · 9 replies

Uh oh!

UltralyticsAssistant Apr 9, 2025 Maintainer

Uh oh!

glenn-jocher Apr 18, 2025 Maintainer

Uh oh!

lmst2 Jul 5, 2025

Uh oh!

someuser2026 Jul 7, 2025

Uh oh!

glenn-jocher Jul 7, 2025 Maintainer

Uh oh!

someuser2026 Jul 8, 2025

lansimtech
Apr 9, 2025

Replies: 1 comment 9 replies

UltralyticsAssistant
Apr 9, 2025
Maintainer

glenn-jocher Apr 18, 2025
Maintainer

glenn-jocher Jul 7, 2025
Maintainer