🤖 LLM Leaderboard

A comprehensive, community-driven tracking system for Large Language Models with detailed specifications, benchmarks, and analysis.


Track. Compare. Analyze. This repository provides an organized, searchable database of 679 LLMs from 174+ organizations, helping researchers and developers understand the rapidly evolving landscape of AI language models.

✨ Features

  • 📊 679 Models Tracked - From GPT-5 to Gemini, Claude to Llama
  • 🏢 174+ Labs - Coverage across major AI research organizations
  • 📈 Multiple Benchmarks - MMLU, GPQA, HLE performance metrics
  • 🔍 Detailed Model Cards - Architecture, training data, performance analysis
  • 📂 Organized Structure - Browse by lab, architecture, capability, or release date
  • 🚀 Open Source - MIT licensed, community contributions welcome

Total Models: 679

📊 Quick Navigation

πŸ† Top Models by ALScore

ALScore is calculated as: √(Parameters × Tokens) ÷ 300, with parameters and tokens measured in billions.
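As a sanity check, the formula reproduces the table entries below; a minimal sketch in Python (parameter and token counts in billions, as listed in the table):

```python
import math

def alscore(params_b: float, tokens_b: float) -> float:
    """ALScore: sqrt(Parameters x Tokens) / 300, both measured in billions."""
    return math.sqrt(params_b * tokens_b) / 300

# Reproduce two rows of the table (rounded to one decimal place):
print(round(alscore(6000, 100000), 1))  # Claude Opus 4 -> 81.6
print(round(alscore(3000, 114000), 1))  # GPT-4.5 -> 61.6
```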

| Rank | Model | Lab | Params | Tokens | ALScore | MMLU | GPQA | Status |
|------|-------|-----|--------|--------|---------|------|------|--------|
| 1 | Claude Opus 4 | Anthropic | 6000.0B | 100000.0B | 81.6 | - | 83.3 | 🟢 |
| 2 | GPT-4.5 | OpenAI | 3000.0B | 114000.0B | 61.6 | 89.6 | 71.4 | 🟢 |
| 3 | Claude Opus 4.1 | Anthropic | 2000.0B | 100000.0B | 47.1 | - | 80.9 | 🟢 |
| 4 | DeepSeek-R2 | DeepSeek-AI | 1200.0B | 130000.0B | 41.6 | - | - | 🟢 |
| 5 | Claude 3 Opus | Anthropic | 2500.0B | 40000.0B | 33.3 | 86.8 | 59.5 | 🟢 |
| 6 | codex-1 | OpenAI | 600.0B | 100000.0B | 25.8 | - | - | 🟢 |
| 7 | o3 | OpenAI | 600.0B | 100000.0B | 25.8 | 91.2 | 83.3 | 🟢 |
| 8 | Llama 4 Behemoth | Meta AI | 2000.0B | 30000.0B | 25.8 | - | 73.7 | 🔴 |
| 9 | Gemini Ultra | Google DeepMind | 2000.0B | 30000.0B | 25.8 | - | - | - |
| 10 | o3-preview | OpenAI | 600.0B | 100000.0B | 25.8 | - | 87.7 | 🟢 |
| 11 | Grok 4 | xAI | 600.0B | 80000.0B | 23.1 | - | 88.9 | 🟢 |
| 12 | LearnLM | Google DeepMind | 1500.0B | 30000.0B | 22.4 | - | 72.0 | 🟡 |
| 13 | Gemini Ultra 1.0 | Google DeepMind | 1500.0B | 30000.0B | 22.4 | 83.7 | 35.7 | 🟢 |
| 14 | Yi-XLarge | 01-ai | 2000.0B | 20000.0B | 21.1 | 85.1 | 48.2 | 🟢 |
| 15 | Qwen3-Max-Preview | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 64.6 | 🟢 |
| 16 | Qwen3-Max | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 85.4 | 🟢 |
| 17 | GPT-5 | OpenAI | 300.0B | 114000.0B | 19.5 | 91.0 | 89.4 | 🟢 |
| 18 | Grok-3 | xAI | 928.0B | 36200.0B | 19.3 | - | 84.6 | 🟢 |
| 19 | Gemini 2.5 Pro Preview | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 84.0 | 🟢 |
| 20 | Claude Sonnet 4.5 | Anthropic | 400.0B | 80000.0B | 18.9 | - | 83.4 | 🟢 |
| 21 | Gemini 2.5 Pro 06-05 | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 86.4 | 🟢 |
| 22 | Samba-1 | SambaNova | 1400.0B | 20000.0B | 17.6 | - | - | 🟡 |
| 23 | Inflection-2 | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 24 | Inflection-3 Productivity (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 25 | Inflection-3 Pi (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 26 | Inflection-2.5 | Inflection AI | 1200.0B | 20000.0B | 16.3 | 85.5 | 38.4 | 🟢 |
| 27 | SpreadsheetLLM | Microsoft | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 28 | GPT-4 Classic (gpt-4-0314 & gpt-4-0613, non-Turbo) | OpenAI | 1760.0B | 13000.0B | 15.9 | 86.4 | 35.7 | 🟢 |
| 29 | GPT-4 MathMix | OpenAI | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 30 | PanGu 5.0 Super | Huawei | 1000.0B | 20000.0B | 14.9 | - | - | 🟡 |

📈 Dataset Statistics

  • Total Models: 679
  • Public Models: 551
  • Private Models: 102
  • Labs Tracked: 174

🏢 Top Labs by Model Count

πŸ—οΈ Architecture Distribution

  • Dense: 534 models
  • MoE: 131 models
  • MatFormer: 2 models
  • Hybrid: 2 models
  • Compound: 1 model
  • CoE: 1 model

πŸ“ About This Project

This leaderboard tracks LLM models with detailed specifications, benchmarks, and metadata. All data is organized in markdown for easy browsing and version control.

Benchmarks Explained

  • ALScore: Quick power rating based on parameters and training tokens
  • MMLU: Massive Multitask Language Understanding (general knowledge)
  • MMLU-Pro: Advanced version of MMLU
  • GPQA: Graduate-Level Google-Proof Q&A (expert-written science questions)
  • HLE: Humanity's Last Exam (frontier-difficulty academic questions)

Tags

  • SOTA: State-of-the-art performance
  • Reasoning: Advanced reasoning capabilities
  • Dense: Traditional dense architecture
  • MoE: Mixture of Experts architecture

🚀 Getting Started

Browse the Leaderboard

  1. Explore Top Models: Start with the top 30 models ranked by ALScore
  2. Find by Lab: Check out specific organizations in the Labs section
  3. Filter by Capability: Browse Reasoning Models or Architecture Types
  4. View Details: Click any model name to see detailed specifications and benchmarks

Search Options

Model Card Structure

Each model page includes:

  • 📊 Overview and key capabilities
  • 🔧 Technical specifications (parameters, tokens, architecture)
  • 📈 Performance benchmarks with intelligent analysis
  • 🏷️ Tags and categories
  • 📝 Detailed notes and key information
  • 🔗 Links to official resources
  • 🔍 Related models and lab profiles

🤝 Contributing

We welcome contributions! Here's how you can help:

  1. Report Issues: Found incorrect data? Open an issue
  2. Suggest Models: Know a model we're missing? Let us know!
  3. Improve Documentation: Help keep information accurate and up-to-date
  4. Share Feedback: Ideas for better organization or new features?

See CONTRIBUTING.md for detailed guidelines.

📊 Data & Methodology

Data Collection

Data is aggregated from multiple sources:

  • Official model releases and announcements
  • Research papers and technical documentation
  • Public benchmarks and evaluation leaderboards
  • Community verification and contributions

Compute Optimal Training

Models are evaluated against the Chinchilla scaling law (optimal ratio ≥20:1 tokens-to-parameters):

  • ✅ Compute-optimal: Adequately trained with sufficient data (≥20:1)
  • ⚠️ Under-trained: May benefit from additional training (<20:1)
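This classification reduces to a simple tokens-per-parameter ratio; a minimal sketch (the function name is illustrative, figures taken from the table above):

```python
def training_status(params_b: float, tokens_b: float) -> str:
    """Classify a model against the Chinchilla >=20:1 tokens-to-parameters rule."""
    return "compute-optimal" if tokens_b / params_b >= 20 else "under-trained"

print(training_status(300, 114000))  # GPT-5: 380 tokens/param -> compute-optimal
print(training_status(1760, 13000))  # GPT-4 Classic: ~7.4 tokens/param -> under-trained
```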

Benchmark Scoring

  • ALScore Formula: √(Parameters × Tokens) ÷ 300 (parameters and tokens in billions)
    • ≥20: Extremely powerful
    • 10-20: Very powerful
    • 5-10: Powerful
    • 1-5: Mid-tier
    • <1: Lightweight
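The bands above can be applied programmatically; a minimal sketch, assuming parameters and tokens in billions:

```python
import math

def alscore_tier(params_b: float, tokens_b: float) -> tuple[float, str]:
    """Return (ALScore, tier label) using the bands listed above."""
    score = math.sqrt(params_b * tokens_b) / 300
    if score >= 20:
        tier = "Extremely powerful"
    elif score >= 10:
        tier = "Very powerful"
    elif score >= 5:
        tier = "Powerful"
    elif score >= 1:
        tier = "Mid-tier"
    else:
        tier = "Lightweight"
    return round(score, 1), tier

print(alscore_tier(928, 36200))  # Grok-3 -> (19.3, 'Very powerful')
```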

Note: Benchmark scores are from publicly available sources. Some values may be estimates or unavailable (-).

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • All AI research organizations for their contributions to open research
  • The AI community for sharing benchmarks and insights
  • Contributors who help keep this resource accurate and up-to-date

Last Updated: 2025-10-02 | Maintained by: Community Contributors

⭐ Star this repo to stay updated with the latest LLM developments!
