Thanks for releasing the training code and pipeline. While trying to reproduce the libero-long results, I encountered CUDA errors and had to wrap the attention call with `sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):` on line 215 in 8895cba (`attn_output = F.scaled_dot_product_attention(`).
However, this greatly slows down training: at the current speed, 80,000 steps would take about 21 days to finish.
I tried different Docker images (including the one in this repo) with cu124+torch2.6 and cu126+torch2.7. All of these attempts result in CUDA errors. Could anyone who has successfully started training share their library versions?