A comprehensive, community-driven tracking system for Large Language Models with detailed specifications, benchmarks, and analysis.
Track. Compare. Analyze. This repository provides an organized, searchable database of 679 LLMs from 174+ organizations, helping researchers and developers understand the rapidly evolving landscape of AI language models.
- 📊 679 Models Tracked - From GPT-5 to Gemini, Claude to Llama
- 🏢 174+ Labs - Coverage across major AI research organizations
- 📈 Multiple Benchmarks - MMLU, GPQA, HLE performance metrics
- 📋 Detailed Model Cards - Architecture, training data, performance analysis
- 🗂️ Organized Structure - Browse by lab, architecture, capability, or release date
- 🔓 Open Source - MIT licensed, community contributions welcome
Total Models: 679
ALScore is calculated as: √(Parameters × Tokens) ÷ 300, with parameter and token counts in billions.
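For example, a minimal sketch of the calculation in Python (the `alscore` helper is illustrative, not a script shipped with this repository; inputs are in billions, matching the table below):

```python
import math

def alscore(params_b: float, tokens_b: float) -> float:
    """Quick power rating: sqrt(parameters x tokens) / 300, both in billions."""
    return math.sqrt(params_b * tokens_b) / 300

# Claude Opus 4 (rank 1 below): 6000B params, 100000B training tokens
print(round(alscore(6000.0, 100000.0), 1))  # 81.6
```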
| Rank | Model | Lab | Params | Tokens | ALScore | MMLU | GPQA | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4 | Anthropic | 6000.0B | 100000.0B | 81.6 | - | 83.3 | 🟢 |
| 2 | GPT-4.5 | OpenAI | 3000.0B | 114000.0B | 61.6 | 89.6 | 71.4 | 🟢 |
| 3 | Claude Opus 4.1 | Anthropic | 2000.0B | 100000.0B | 47.1 | - | 80.9 | 🟢 |
| 4 | DeepSeek-R2 | DeepSeek-AI | 1200.0B | 130000.0B | 41.6 | - | - | 🟢 |
| 5 | Claude 3 Opus | Anthropic | 2500.0B | 40000.0B | 33.3 | 86.8 | 59.5 | 🟢 |
| 6 | codex-1 | OpenAI | 600.0B | 100000.0B | 25.8 | - | - | 🟢 |
| 7 | o3 | OpenAI | 600.0B | 100000.0B | 25.8 | 91.2 | 83.3 | 🟢 |
| 8 | Llama 4 Behemoth | Meta AI | 2000.0B | 30000.0B | 25.8 | - | 73.7 | 🔴 |
| 9 | Gemini Ultra | Google DeepMind | 2000.0B | 30000.0B | 25.8 | - | - | - |
| 10 | o3-preview | OpenAI | 600.0B | 100000.0B | 25.8 | - | 87.7 | 🟢 |
| 11 | Grok 4 | xAI | 600.0B | 80000.0B | 23.1 | - | 88.9 | 🟢 |
| 12 | LearnLM | Google DeepMind | 1500.0B | 30000.0B | 22.4 | - | 72.0 | 🟡 |
| 13 | Gemini Ultra 1.0 | Google DeepMind | 1500.0B | 30000.0B | 22.4 | 83.7 | 35.7 | 🟢 |
| 14 | Yi-XLarge | 01-ai | 2000.0B | 20000.0B | 21.1 | 85.1 | 48.2 | 🟢 |
| 15 | Qwen3-Max-Preview | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 64.6 | 🟢 |
| 16 | Qwen3-Max | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 85.4 | 🟢 |
| 17 | GPT-5 | OpenAI | 300.0B | 114000.0B | 19.5 | 91.0 | 89.4 | 🟢 |
| 18 | Grok-3 | xAI | 928.0B | 36200.0B | 19.3 | - | 84.6 | 🟢 |
| 19 | Gemini 2.5 Pro Preview | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 84.0 | 🟢 |
| 20 | Claude Sonnet 4.5 | Anthropic | 400.0B | 80000.0B | 18.9 | - | 83.4 | 🟢 |
| 21 | Gemini 2.5 Pro 06-05 | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 86.4 | 🟢 |
| 22 | Samba-1 | SambaNova | 1400.0B | 20000.0B | 17.6 | - | - | 🟡 |
| 23 | Inflection-2 | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 24 | Inflection-3 Productivity (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 25 | Inflection-3 Pi (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 26 | Inflection-2.5 | Inflection AI | 1200.0B | 20000.0B | 16.3 | 85.5 | 38.4 | 🟢 |
| 27 | SpreadsheetLLM | Microsoft | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 28 | GPT-4 Classic (gpt-4-0314 & gpt-4-0613, non-Turbo) | OpenAI | 1760.0B | 13000.0B | 15.9 | 86.4 | 35.7 | 🟢 |
| 29 | GPT-4 MathMix | OpenAI | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 30 | PanGu 5.0 Super | Huawei | 1000.0B | 20000.0B | 14.9 | - | - | 🟡 |
- Total Models: 679
- Public Models: 551
- Private Models: 102
- Labs Tracked: 174
- Google DeepMind: 47 models
- Microsoft: 38 models
- Meta AI: 36 models
- OpenAI: 32 models
- Google: 28 models
- Alibaba: 26 models
- NVIDIA: 23 models
- Mistral: 21 models
- DeepSeek-AI: 16 models
- Baidu: 12 models
- Dense: 534 models
- MoE: 131 models
- MatFormer: 2 models
- Hybrid: 2 models
- Compound: 1 model
- CoE: 1 model
This leaderboard tracks LLMs with detailed specifications, benchmarks, and metadata. All data is organized in markdown for easy browsing and version control.
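Because everything lives in plain markdown, the tables are also easy to process programmatically. Here is a hypothetical sketch that filters a leaderboard table by GPQA score; it assumes only the pipe-delimited layout shown above and is not a script shipped with this repository:

```python
def parse_leaderboard(markdown: str) -> list[dict]:
    """Parse a pipe-delimited markdown table into a list of row dicts."""
    lines = [l for l in markdown.splitlines() if l.startswith("|")]
    header = [h.strip() for h in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the |---| separator row
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

table = """
| Rank | Model | Lab | GPQA |
|---|---|---|---|
| 1 | Claude Opus 4 | Anthropic | 83.3 |
| 2 | GPT-4.5 | OpenAI | 71.4 |
"""
models = parse_leaderboard(table)
strong = [m["Model"] for m in models if m["GPQA"] != "-" and float(m["GPQA"]) >= 80]
print(strong)  # ['Claude Opus 4']
```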
- ALScore: Quick power rating based on parameters and training tokens
- MMLU: Massive Multitask Language Understanding (general knowledge)
- MMLU-Pro: Advanced version of MMLU
- GPQA: Graduate-level Q&A benchmark
- HLE: Humanity's Last Exam (frontier-level academic benchmark)
- SOTA: State-of-the-art performance
- Reasoning: Advanced reasoning capabilities
- Dense: Traditional dense architecture
- MoE: Mixture of Experts architecture
- Explore Top Models: Start with the top 30 models ranked by ALScore
- Find by Lab: Check out specific organizations in the Labs section
- Filter by Capability: Browse Reasoning Models or Architecture Types
- View Details: Click any model name to see detailed specifications and benchmarks
- By Performance: MMLU Rankings | By Parameters
- By Time: Release Date - See the latest releases
- By Organization: All Labs - Browse all 174+ organizations
- By Architecture: MoE Models | Dense Models
Each model page includes:
- 📋 Overview and key capabilities
- 🔧 Technical specifications (parameters, tokens, architecture)
- 📊 Performance benchmarks with intelligent analysis
- 🏷️ Tags and categories
- 📝 Detailed notes and key information
- 🔗 Links to official resources
- 🔄 Related models and lab profiles
We welcome contributions! Here's how you can help:
- Report Issues: Found incorrect data? Open an issue
- Suggest Models: Know a model we're missing? Let us know!
- Improve Documentation: Help keep information accurate and up-to-date
- Share Feedback: Ideas for better organization or new features?
See CONTRIBUTING.md for detailed guidelines.
Data is aggregated from multiple sources:
- Official model releases and announcements
- Research papers and technical documentation
- Public benchmarks and evaluation leaderboards
- Community verification and contributions
Models are evaluated against the Chinchilla scaling law (optimal ratio ≥20:1 tokens-to-parameters):
- ✅ Compute-optimal: Adequately trained with sufficient data (≥20:1)
- ⚠️ Under-trained: May benefit from additional training (<20:1)
- ALScore Formula: √(Parameters × Tokens) ÷ 300
- ≥20: Extremely powerful
- 10-20: Very powerful
- 5-10: Powerful
- 1-5: Mid-tier
- <1: Lightweight
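A minimal sketch combining the two checks above (function names are illustrative; parameter and token counts in billions, as in the leaderboard):

```python
import math

def chinchilla_ratio(params_b: float, tokens_b: float) -> float:
    """Tokens-to-parameters ratio; >=20:1 is treated as compute-optimal."""
    return tokens_b / params_b

def alscore_tier(params_b: float, tokens_b: float) -> str:
    """Map the ALScore formula onto the tiers listed above."""
    score = math.sqrt(params_b * tokens_b) / 300
    if score >= 20:
        return "Extremely powerful"
    elif score >= 10:
        return "Very powerful"
    elif score >= 5:
        return "Powerful"
    elif score >= 1:
        return "Mid-tier"
    return "Lightweight"

# GPT-5 per the leaderboard: 300B params, 114000B tokens
print(chinchilla_ratio(300, 114000))  # 380.0 -> compute-optimal (>=20:1)
print(alscore_tier(300, 114000))      # Very powerful (ALScore ~19.5)
```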
Note: Benchmark scores are from publicly available sources. Some values may be estimates or unavailable (-).
This project is licensed under the MIT License - see the LICENSE file for details.
- All AI research organizations for their contributions to open research
- The AI community for sharing benchmarks and insights
- Contributors who help keep this resource accurate and up-to-date
Last Updated: 2025-10-02 | Maintained by: Community Contributors
⭐ Star this repo to stay updated with the latest LLM developments!