/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [57,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed. #11

@Asma-Bk

Hello,

I'm currently trying to use your notebook from unslothai/notebooks#61 to fine-tune Qwen2.5 VL 3B for structured information extraction from visual documents using GRPO. I'm working on a remote server, and after resolving some installation issues, I was able to launch the training. However, GRPO training fails after just one step, throwing a CUDA error (see full trace below).

Beyond the runtime error, I'm also quite confused by the integration structure between the unsloth repo and the GRPO functionality for VLMs:

In the notebook, you're using GRPOTrainer from trl and only importing FastVisionModel from unsloth. However, you mention that you've modified Unsloth’s trainer to support VLM training with GRPO — yet I don’t see how those changes are actually being invoked.

In contrast, the vlm-grpo repository suggests that VLM fine-tuning with GRPO requires importing VLMGRPOTrainer from your custom vlmgrpo module, not the trl trainer.

This discrepancy is quite puzzling. I’m unsure whether the notebook is outdated or whether there’s an implicit integration I’m missing.

I’d really appreciate any clarification regarding:

Whether the notebook correctly uses your VLM-GRPO modifications.

If VLMGRPOTrainer should always be used when training vision-language models with GRPO.

Whether the CUDA crash is linked to incorrect usage or is a bug in the implementation.

Thank you in advance for your time and help!

Error trace:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [57,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
...
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [63,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    trainer.train()
  File ".../site-packages/transformers/trainer.py", line 2240, in train
    return inner_training_loop(
  File "...", line 315, in _fast_inner_training_loop
  File "...", line 31, in _unsloth_training_step
  File "UnslothGRPOTrainer.py", line 2030, in compute_loss
    loss, completion_length, mean_kl = grpo_accumulated_loss(
  File "UnslothGRPOTrainer.py", line 315, in grpo_accumulated_loss
    ref_hidden_states = trainer.model(
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../peft/peft_model.py", line 1845, in forward
    return self.base_model(
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1845, in _call_impl
    return inner()
  File ".../torch/nn/modules/module.py", line 1793, in inner
    result = forward_call(*args, **kwargs)
  File ".../peft/tuners/tuners_utils.py", line 216, in forward
    return self.model.forward(*args, **kwargs)
  File "unsloth_compiled_module_qwen2_5_vl.py", line 1136, in forward
    return Qwen2_5_VLForConditionalGeneration_forward(...)
  File ".../transformers/utils/generic.py", line 969, in wrapper
    output = func(self, *args, **kwargs)
  File "unsloth_compiled_module_qwen2_5_vl.py", line 964, in Qwen2_5_VLForConditionalGeneration_forward
    outputs = self.model(...)
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../modeling_qwen2_5_vl.py", line 1661, in forward
    image_embeds = self.get_image_features(pixel_values, image_grid_thw)
  File ".../modeling_qwen2_5_vl.py", line 1614, in get_image_features
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../modeling_qwen2_5_vl.py", line 502, in forward
    hidden_states = hidden_states[window_index, :, :]

RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA=1 to enable device-side assertions.
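
For reference, the condition in the failing assertion (-sizes[i] <= index && index < sizes[i]) can be mirrored on the CPU to see which index values would trip it. The following is only a minimal sketch in plain Python: the helper name find_out_of_bounds and the sample values are hypothetical and are not taken from the actual run.

```python
def find_out_of_bounds(indices, size):
    """Return (position, index) pairs that violate -size <= index < size.

    This mirrors the bounds condition quoted in the CUDA assertion;
    it is an illustrative helper, not part of PyTorch or Unsloth.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Hypothetical example: a dimension of size 6 accepts indices -6..5.
window_index = [0, 3, 5, 6, -6, -7]
bad = find_out_of_bounds(window_index, size=6)
print(bad)  # [(3, 6), (5, -7)]
```

In the real run, the equivalent check would presumably be applied to the values of window_index (for example via window_index.tolist()) against hidden_states.shape[0] just before the line reported in the trace; I have not verified this against the actual tensors.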
