🤖 LLM Leaderboard

A comprehensive, community-driven tracking system for Large Language Models with detailed specifications, benchmarks, and analysis.


Track. Compare. Analyze. This repository provides an organized, searchable database of 679 LLMs from 174+ organizations, helping researchers and developers understand the rapidly evolving landscape of AI language models.

✨ Features

  • 📊 679 Models Tracked - From GPT-5 to Gemini, Claude to Llama
  • 🏢 174+ Labs - Coverage across major AI research organizations
  • 📈 Multiple Benchmarks - MMLU, GPQA, HLE performance metrics
  • 🔍 Detailed Model Cards - Architecture, training data, performance analysis
  • 📂 Organized Structure - Browse by lab, architecture, capability, or release date
  • 🚀 Open Source - MIT licensed, community contributions welcome

Total Models: 679

📊 Quick Navigation

πŸ† Top Models by ALScore

ALScore is calculated as: √(Parameters × Tokens) ÷ 300, with parameters and tokens measured in billions.
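As a sanity check, the formula reproduces the table entries below; a minimal sketch in Python (parameter and token counts in billions, as listed in the table):

```python
import math

def alscore(params_b: float, tokens_b: float) -> float:
    """ALScore: sqrt(Parameters x Tokens) / 300, both measured in billions."""
    return math.sqrt(params_b * tokens_b) / 300

# Reproduce two rows of the table (rounded to one decimal place):
print(round(alscore(6000, 100000), 1))  # Claude Opus 4 -> 81.6
print(round(alscore(3000, 114000), 1))  # GPT-4.5 -> 61.6
```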

| Rank | Model | Lab | Params | Tokens | ALScore | MMLU | GPQA | Status |
|------|-------|-----|--------|--------|---------|------|------|--------|
| 1 | Claude Opus 4 | Anthropic | 6000.0B | 100000.0B | 81.6 | - | 83.3 | 🟢 |
| 2 | GPT-4.5 | OpenAI | 3000.0B | 114000.0B | 61.6 | 89.6 | 71.4 | 🟢 |
| 3 | Claude Opus 4.1 | Anthropic | 2000.0B | 100000.0B | 47.1 | - | 80.9 | 🟢 |
| 4 | DeepSeek-R2 | DeepSeek-AI | 1200.0B | 130000.0B | 41.6 | - | - | 🟢 |
| 5 | Claude 3 Opus | Anthropic | 2500.0B | 40000.0B | 33.3 | 86.8 | 59.5 | 🟢 |
| 6 | codex-1 | OpenAI | 600.0B | 100000.0B | 25.8 | - | - | 🟢 |
| 7 | o3 | OpenAI | 600.0B | 100000.0B | 25.8 | 91.2 | 83.3 | 🟢 |
| 8 | Llama 4 Behemoth | Meta AI | 2000.0B | 30000.0B | 25.8 | - | 73.7 | 🔴 |
| 9 | Gemini Ultra | Google DeepMind | 2000.0B | 30000.0B | 25.8 | - | - | - |
| 10 | o3-preview | OpenAI | 600.0B | 100000.0B | 25.8 | - | 87.7 | 🟢 |
| 11 | Grok 4 | xAI | 600.0B | 80000.0B | 23.1 | - | 88.9 | 🟢 |
| 12 | LearnLM | Google DeepMind | 1500.0B | 30000.0B | 22.4 | - | 72.0 | 🟡 |
| 13 | Gemini Ultra 1.0 | Google DeepMind | 1500.0B | 30000.0B | 22.4 | 83.7 | 35.7 | 🟢 |
| 14 | Yi-XLarge | 01-ai | 2000.0B | 20000.0B | 21.1 | 85.1 | 48.2 | 🟢 |
| 15 | Qwen3-Max-Preview | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 64.6 | 🟢 |
| 16 | Qwen3-Max | Alibaba | 1000.0B | 36000.0B | 20.0 | - | 85.4 | 🟢 |
| 17 | GPT-5 | OpenAI | 300.0B | 114000.0B | 19.5 | 91.0 | 89.4 | 🟢 |
| 18 | Grok-3 | xAI | 928.0B | 36200.0B | 19.3 | - | 84.6 | 🟢 |
| 19 | Gemini 2.5 Pro Preview | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 84.0 | 🟢 |
| 20 | Claude Sonnet 4.5 | Anthropic | 400.0B | 80000.0B | 18.9 | - | 83.4 | 🟢 |
| 21 | Gemini 2.5 Pro 06-05 | Google DeepMind | 400.0B | 80000.0B | 18.9 | - | 86.4 | 🟢 |
| 22 | Samba-1 | SambaNova | 1400.0B | 20000.0B | 17.6 | - | - | 🟡 |
| 23 | Inflection-2 | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 24 | Inflection-3 Productivity (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 25 | Inflection-3 Pi (3.0) | Inflection AI | 1200.0B | 20000.0B | 16.3 | - | - | 🟢 |
| 26 | Inflection-2.5 | Inflection AI | 1200.0B | 20000.0B | 16.3 | 85.5 | 38.4 | 🟢 |
| 27 | SpreadsheetLLM | Microsoft | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 28 | GPT-4 Classic (gpt-4-0314 & gpt-4-0613, non-Turbo) | OpenAI | 1760.0B | 13000.0B | 15.9 | 86.4 | 35.7 | 🟢 |
| 29 | GPT-4 MathMix | OpenAI | 1760.0B | 13000.0B | 15.9 | - | - | 🔴 |
| 30 | PanGu 5.0 Super | Huawei | 1000.0B | 20000.0B | 14.9 | - | - | 🟡 |

📈 Dataset Statistics

  • Total Models: 679
  • Public Models: 551
  • Private Models: 102
  • Labs Tracked: 174

🏢 Top Labs by Model Count

πŸ—οΈ Architecture Distribution

  • Dense: 534 models
  • MoE: 131 models
  • MatFormer: 2 models
  • Hybrid: 2 models
  • Compound: 1 model
  • CoE: 1 model

πŸ“ About This Project

This leaderboard tracks LLM models with detailed specifications, benchmarks, and metadata. All data is organized in markdown for easy browsing and version control.

Benchmarks Explained

  • ALScore: Quick power rating based on parameters and training tokens
  • MMLU: Massive Multitask Language Understanding (general knowledge)
  • MMLU-Pro: Advanced version of MMLU
  • GPQA: Graduate-Level Google-Proof Q&A (expert-written science questions)
  • HLE: Humanity's Last Exam (frontier-difficulty academic questions)

Tags

  • SOTA: State-of-the-art performance
  • Reasoning: Advanced reasoning capabilities
  • Dense: Traditional dense architecture
  • MoE: Mixture of Experts architecture

🚀 Getting Started

Browse the Leaderboard

  1. Explore Top Models: Start with the top 30 models ranked by ALScore
  2. Find by Lab: Check out specific organizations in the Labs section
  3. Filter by Capability: Browse Reasoning Models or Architecture Types
  4. View Details: Click any model name to see detailed specifications and benchmarks

Search Options

Model Card Structure

Each model page includes:

  • 📊 Overview and key capabilities
  • 🔧 Technical specifications (parameters, tokens, architecture)
  • 📈 Performance benchmarks with intelligent analysis
  • 🏷️ Tags and categories
  • 📝 Detailed notes and key information
  • 🔗 Links to official resources
  • 🔍 Related models and lab profiles

🤝 Contributing

We welcome contributions! Here's how you can help:

  1. Report Issues: Found incorrect data? Open an issue
  2. Suggest Models: Know a model we're missing? Let us know!
  3. Improve Documentation: Help keep information accurate and up-to-date
  4. Share Feedback: Ideas for better organization or new features?

See CONTRIBUTING.md for detailed guidelines.

📊 Data & Methodology

Data Collection

Data is aggregated from multiple sources:

  • Official model releases and announcements
  • Research papers and technical documentation
  • Public benchmarks and evaluation leaderboards
  • Community verification and contributions

Compute Optimal Training

Models are evaluated against the Chinchilla scaling law (optimal ratio ≥20:1 tokens-to-parameters):

  • ✅ Compute-optimal: Adequately trained with sufficient data (≥20:1)
  • ⚠️ Under-trained: May benefit from additional training (<20:1)
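This classification reduces to a simple tokens-per-parameter ratio; a minimal sketch (the function name is illustrative, figures taken from the table above):

```python
def training_status(params_b: float, tokens_b: float) -> str:
    """Classify a model against the Chinchilla >=20:1 tokens-to-parameters rule."""
    return "compute-optimal" if tokens_b / params_b >= 20 else "under-trained"

print(training_status(300, 114000))  # GPT-5: 380 tokens/param -> compute-optimal
print(training_status(1760, 13000))  # GPT-4 Classic: ~7.4 tokens/param -> under-trained
```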

Benchmark Scoring

  • ALScore Formula: √(Parameters × Tokens) ÷ 300 (parameters and tokens in billions)
    • ≥20: Extremely powerful
    • 10-20: Very powerful
    • 5-10: Powerful
    • 1-5: Mid-tier
    • <1: Lightweight
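The bands above can be applied programmatically; a minimal sketch, assuming parameters and tokens in billions:

```python
import math

def alscore_tier(params_b: float, tokens_b: float) -> tuple[float, str]:
    """Return (ALScore, tier label) using the bands listed above."""
    score = math.sqrt(params_b * tokens_b) / 300
    if score >= 20:
        tier = "Extremely powerful"
    elif score >= 10:
        tier = "Very powerful"
    elif score >= 5:
        tier = "Powerful"
    elif score >= 1:
        tier = "Mid-tier"
    else:
        tier = "Lightweight"
    return round(score, 1), tier

print(alscore_tier(928, 36200))  # Grok-3 -> (19.3, 'Very powerful')
```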

Note: Benchmark scores are from publicly available sources. Some values may be estimates or unavailable (-).

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • All AI research organizations for their contributions to open research
  • The AI community for sharing benchmarks and insights
  • Contributors who help keep this resource accurate and up-to-date

Last Updated: 2025-10-02 | Maintained by: Community Contributors

⭐ Star this repo to stay updated with the latest LLM developments!
