
Error running SFT on the Nemotron 49B v1.5 model with the NeMo-RL DTensor backend #1135

@gvenkatakris

Description


I'm running into an error while running SFT on Nemotron Super v1.5 with the NeMo-RL DTensor backend. I'm using the NeMo-RL container, which ships transformers v4.53.3:

ray.exceptions.RayTaskError(TypeError): ray::DTensorPolicyWorker.train() (pid=1136606, ip=10.65.31.216, actor_id=4e2e4f22e11bc433d51e86d401000000, repr=DTensorPolicyWorker[rank=0])
  File "/nemo-rl/nemo_rl/utils/nsys.py", line 88, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/nemo-rl/nemo_rl/models/policy/dtensor_policy_worker.py", line 722, in train
    outputs = self.model(**model_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
    return inner()
           ^^^^^^^
  File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1805, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: DeciLMForCausalLM.forward() got an unexpected keyword argument 'flash_attn_kwargs'
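The final `TypeError` indicates that the training step passes a `flash_attn_kwargs` keyword that `DeciLMForCausalLM.forward()` does not declare. Python rejects the call before any computation runs, which also implies the signature has no `**kwargs` catch-all. The sketch below reproduces that failure mode with a hypothetical stand-in class (`DeciLMLikeModel`, not the real DeciLM code). It then shows one generic workaround: a hypothetical helper, `filter_supported_kwargs`, that uses `inspect.signature` to drop keywords the callee cannot accept. This is not NeMo-RL's fix, and it assumes the dropped keywords are safe to ignore. If `flash_attn_kwargs` carries metadata the model actually needs (for example, for packed sequences), silently discarding it could change the results.

```python
import inspect
from typing import Any, Callable, Dict


class DeciLMLikeModel:
    """Hypothetical stand-in for a remote-code model whose forward() has no
    **kwargs catch-all, so any unknown keyword raises TypeError."""

    def forward(self, input_ids, attention_mask=None, position_ids=None):
        # Toy computation: just report how many tokens were received.
        return {"n_tokens": len(input_ids)}

    def __call__(self, *args, **kwargs):
        # Mimics nn.Module.__call__ forwarding everything to forward().
        return self.forward(*args, **kwargs)


def filter_supported_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the keyword arguments `fn` can accept.

    If `fn` declares **kwargs, every key is passed through unchanged.
    """
    params = list(inspect.signature(fn).parameters.values())
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(kwargs)
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in accepted}


model = DeciLMLikeModel()
# Hypothetical arguments, shaped like the `model_args` dict in the traceback.
model_args = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "flash_attn_kwargs": {}}

try:
    model(**model_args)
except TypeError as err:
    # The message names 'flash_attn_kwargs', as in the reported error.
    print(err)

safe_args = filter_supported_kwargs(model.forward, model_args)
print(sorted(safe_args))  # ['attention_mask', 'input_ids']
print(model(**safe_args))  # {'n_tokens': 3}
```

Checking the signature up front turns a mid-training crash into an explicit decision about which arguments the model is allowed to receive.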

