Hello, I re-trained your model for 1.5M steps. The results are slightly behind SOTA, but they may still be useful as a baseline for comparison in research work. <img width="924" alt="image" src="https://github.com/user-attachments/assets/00dc574e-b2e1-48ef-82cd-76d8d61e95b4">