Loss curve correctness for Llama 3.1 405B #181

Open
bhavya01 opened this issue Mar 28, 2025 · 0 comments
@bhavya01 (Collaborator)

🚀 Test loss curves against a reference implementation such as Hugging Face for a large number of steps (e.g. 1000).

The reference loss should come from a separate implementation (it can come from running Hugging Face models on GPUs or TPUs). One option is to generate a reference loss file with a weekly cron job and compare our implementation against that loss curve nightly to catch regressions. A sketch of such a comparison is shown below.
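
A minimal sketch of the nightly comparison step, assuming the reference and candidate losses are stored as JSON lists of per-step values (the file names and tolerance below are hypothetical, not decided in this issue):

```python
import json

import numpy as np


def compare_loss_curves(reference_path: str, candidate_path: str,
                        rtol: float = 0.05) -> None:
    """Fail if the candidate loss curve drifts from the reference curve."""
    with open(reference_path) as f:
        reference = np.asarray(json.load(f), dtype=np.float64)
    with open(candidate_path) as f:
        candidate = np.asarray(json.load(f), dtype=np.float64)

    # Compare over the overlapping prefix of the two curves.
    steps = min(len(reference), len(candidate))
    if steps == 0:
        raise ValueError("Empty loss curve")

    rel_diff = np.abs(candidate[:steps] - reference[:steps]) / np.abs(reference[:steps])
    worst = float(rel_diff.max())
    if worst > rtol:
        step = int(rel_diff.argmax())
        raise AssertionError(
            f"Loss diverges at step {step}: "
            f"reference={reference[step]:.4f}, candidate={candidate[step]:.4f} "
            f"(relative diff {worst:.2%} > {rtol:.2%})")
    print(f"Loss curves agree within {rtol:.2%} over {steps} steps")


if __name__ == "__main__":
    compare_loss_curves("reference_losses.json", "nightly_losses.json")
```

A per-step tolerance like this catches early divergence; an alternative would be comparing a smoothed or final-window average if step-to-step noise turns out to be too high.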

@bhavya01 bhavya01 added the enhancement New feature or request label Mar 28, 2025
@tengyifei tengyifei changed the title Convergence and loss correctness for Llama 3.1 405B Loss curve correctness for Llama 3.1 405B Mar 28, 2025