This issue seems to have been around for quite some time, but it is difficult to reproduce.

With this dataset, training runs to completion without errors, but the loss is null and the gradient is NaN for the entire run: https://huggingface.co/datasets/LegrandFrederic/grab_food

With exactly the same code, this dataset produces a normal loss and gradient: https://huggingface.co/datasets/LegrandFrederic/clean_desk

The problem has been present since gr00t-n1 and still occurs with gr00t-n1.5. I will keep investigating.
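One hypothesis worth ruling out is that grab_food contains a non-finite value or a constant state/action dimension. This is an assumption and has not been confirmed. If normalization divides by the range or standard deviation of each dimension, a constant dimension would divide by zero and could produce NaN from the first step. The stdlib sketch below flags such dimensions. `find_suspicious_dims` is a hypothetical helper, not part of the gr00t codebase. Loading the dataset columns into `rows` is left out.

```python
import math
from typing import List, Sequence


def find_suspicious_dims(rows: Sequence[Sequence[float]], eps: float = 1e-8) -> List[str]:
    """Flag dimensions that contain NaN/inf or are (near-)constant.

    rows: one entry per frame, each a vector of floats (e.g. a state or action).
    Hypothetical diagnostic helper; not part of any gr00t API.
    """
    if not rows:
        return []
    issues = []
    for d in range(len(rows[0])):
        col = [float(r[d]) for r in rows]
        # NaN or inf anywhere in this dimension will propagate into the loss.
        if any(not math.isfinite(v) for v in col):
            issues.append(f"dim {d}: non-finite value")
            continue
        # A zero range would make (x - min) / (max - min) normalization divide by zero.
        lo, hi = min(col), max(col)
        if hi - lo < eps:
            issues.append(f"dim {d}: constant (range {hi - lo:.2e})")
    return issues


rows = [[0.0, 1.0, float("nan")], [0.0, 2.0, 3.0]]
print(find_suspicious_dims(rows))
# → ['dim 0: constant (range 0.00e+00)', 'dim 2: non-finite value']
```

Running the same check on both datasets would show whether grab_food differs from clean_desk in this respect.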