/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.

Hello,

I'm currently trying to use your notebook from [unslothai/notebooks#61](https://github.com/unslothai/notebooks/pull/61) to fine-tune Qwen2.5 VL 3B for structured information extraction from visual documents using GRPO. I'm working on a remote server, and after resolving some installation issues, I was able to launch the training. However, GRPO training fails after just one step, throwing a CUDA error (see full trace below).

Beyond the runtime error, I'm also quite confused by the integration structure between the unsloth repo and the GRPO functionality for VLMs:

In the notebook, you're using `GRPOTrainer` from `trl` and only importing `FastVisionModel` from `unsloth`. However, you mention that you've modified Unsloth’s trainer to support VLM training with GRPO, yet I don’t see how those changes are actually being invoked.

In contrast, the vlm-grpo repository suggests that VLM fine-tuning with GRPO requires importing `VLMGRPOTrainer` from your custom `vlmgrpo` module, not the `trl` trainer.

This discrepancy is quite puzzling. I’m unsure whether the notebook is outdated or whether there’s an implicit integration I’m missing.

I’d really appreciate any clarification regarding:

- Whether the notebook correctly uses your VLM-GRPO modifications.
- Whether `VLMGRPOTrainer` should always be used when training vision-language models with GRPO.
- Whether the CUDA crash is caused by incorrect usage on my side or by a bug in the implementation.

Thank you in advance for your time and help!

Error trace: 
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [57,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:94: operator(): block: [24973,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.

Traceback (most recent call last):
  File "train.py", line 98, in <module>
    trainer.train()
  File ".../site-packages/transformers/trainer.py", line 2240, in train
    return inner_training_loop(
  File "...", line 315, in _fast_inner_training_loop
  File "...", line 31, in _unsloth_training_step
  File "UnslothGRPOTrainer.py", line 2030, in compute_loss
    loss, completion_length, mean_kl = grpo_accumulated_loss(
  File "UnslothGRPOTrainer.py", line 315, in grpo_accumulated_loss
    ref_hidden_states = trainer.model(
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../peft/peft_model.py", line 1845, in forward
    return self.base_model(
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1845, in _call_impl
    return inner()
  File ".../torch/nn/modules/module.py", line 1793, in inner
    result = forward_call(*args, **kwargs)
  File ".../peft/tuners/tuners_utils.py", line 216, in forward
    return self.model.forward(*args, **kwargs)
  File "unsloth_compiled_module_qwen2_5_vl.py", line 1136, in forward
    return Qwen2_5_VLForConditionalGeneration_forward(...)
  File ".../transformers/utils/generic.py", line 969, in wrapper
    output = func(self, *args, **kwargs)
  File "unsloth_compiled_module_qwen2_5_vl.py", line 964, in Qwen2_5_VLForConditionalGeneration_forward
    outputs = self.model(...)
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../modeling_qwen2_5_vl.py", line 1661, in forward
    image_embeds = self.get_image_features(pixel_values, image_grid_thw)
  File ".../modeling_qwen2_5_vl.py", line 1614, in get_image_features
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
  File ".../torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".../torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File ".../modeling_qwen2_5_vl.py", line 502, in forward
    hidden_states = hidden_states[window_index, :, :]

RuntimeError: CUDA error: device-side assert triggered  
Compile with `TORCH_USE_CUDA_DSA=1` to enable device-side assertions.
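
The last frame of the trace fails on `hidden_states[window_index, :, :]` inside the vision tower, so an index derived from `image_grid_thw` points past the end of the patch tensor. One possible cause (an assumption, not something confirmed by the trace) is that `pixel_values` and `image_grid_thw` get out of sync somewhere in the GRPO batching. The sketch below checks that before the forward pass. It assumes the Qwen2-VL processor convention that `pixel_values` holds one row per patch and that each row of `image_grid_thw` is a `(t, h, w)` patch grid; `expected_patch_rows` and `check_vision_inputs` are hypothetical helper names, not part of unsloth, trl, or transformers.

```python
from math import prod
from typing import Iterable, Sequence


def expected_patch_rows(grid_thw: Iterable[Sequence[int]]) -> int:
    """Total number of flattened patches implied by a list of (t, h, w) grids."""
    return sum(prod(int(v) for v in thw) for thw in grid_thw)


def check_vision_inputs(num_patch_rows: int, grid_thw: Iterable[Sequence[int]]) -> None:
    """Raise a readable error if the patch count and the grids disagree.

    Assumption: pixel_values has shape (total_patches, features), so
    num_patch_rows is pixel_values.shape[0].
    """
    grids = [tuple(int(v) for v in thw) for thw in grid_thw]
    expected = expected_patch_rows(grids)
    if num_patch_rows != expected:
        raise ValueError(
            f"pixel_values has {num_patch_rows} patch rows, "
            f"but image_grid_thw {grids} implies {expected}"
        )


# Hypothetical usage just before the model call, with tensors from the batch:
# check_vision_inputs(pixel_values.shape[0], image_grid_thw.tolist())
```

If this check passes and the assert still fires, rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the Python traceback points at the operation that actually failed rather than a later one.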

