Description
I'm running into an error while running SFT on Nemotron Super v1.5 with the NeMo-RL DTensor backend.
I'm using the NeMo-RL container, which ships with transformers v4.53.3. Traceback:
ray.exceptions.RayTaskError(TypeError): ray::DTensorPolicyWorker.train() (pid=1136606, ip=10.65.31.216, actor_id=4e2e4f22e11bc433d51e86d401000000, repr=DTensorPolicyWorker[rank=0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nemo-rl/nemo_rl/utils/nsys.py", line 88, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/nemo-rl/nemo_rl/models/policy/dtensor_policy_worker.py", line 722, in train
outputs = self.model(**model_args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
return inner()
^^^^^^^
File "/opt/ray_venvs/nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1805, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: DeciLMForCausalLM.forward() got an unexpected keyword argument 'flash_attn_kwargs'
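
The final line of the traceback says the model's `forward()` does not declare a `flash_attn_kwargs` parameter, while the caller passes one via `**model_args`. As a diagnostic sketch only (not the NeMo-RL fix), the snippet below shows one way to check which keyword arguments a callable accepts and drop the rest before calling it. The helper `filter_supported_kwargs` and the stand-in function `strict_forward` are hypothetical names invented for this illustration; the real `DeciLMForCausalLM.forward()` signature is not shown in this report, so the stand-in only mimics "no `flash_attn_kwargs`, no `**kwargs`".

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries of `kwargs` that `fn` can accept by name."""
    params = inspect.signature(fn).parameters
    # A callable that declares **kwargs accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in accepted}


# Hypothetical stand-in for a forward() that lacks `flash_attn_kwargs` (assumption).
def strict_forward(input_ids, attention_mask=None, position_ids=None):
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical argument dict shaped like the `model_args` in the traceback.
model_args = {
    "input_ids": [1, 2, 3],
    "attention_mask": [1, 1, 1],
    "flash_attn_kwargs": {"max_seqlen_q": 3},
}

safe_args = filter_supported_kwargs(strict_forward, model_args)
dropped = sorted(set(model_args) - set(safe_args))
outputs = strict_forward(**safe_args)  # no TypeError: the unsupported key was removed
```

Note that silently dropping an argument only hides the mismatch. If `flash_attn_kwargs` carries information the attention layers need (for example, sequence boundaries), the call would succeed but could compute something different, so logging `dropped` or disabling the feature that produces these kwargs is likely the safer route.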