GitHub - DeepWism/DeepWism-R2: DeepWism R2 is a next-generation AGI system built on the T3CEDS framework (Thin-Thick-Thin Crowd Entropy Dynamics System), which redefines intelligence as a process of entropy reduction rather than attention modeling.

1. Introduction

We introduce DeepWism® R2 (Research & Report), a next-generation AI system built on an agent framework, the Thin-Thick-Thin Crowd Entropy Dynamics System (T3CEDS), which treats entropy reduction as the fundamental mechanism underlying crowd intelligence. The design moves away from attention-based mechanisms and makes entropy management the core design principle. The DeepWism® R2 architecture consists of three distinct yet interconnected layers:

Thin Perception Layer: Efficiently captures high-dimensional inputs while preserving essential information and reducing input entropy

Thick Processing Layer: Leverages crowd intelligence mechanisms to actively reduce entropy through structured reasoning, collaborative processing, and deep ranking

Thin Decision Layer: Distills complex representations into coherent outputs, further reducing entropy to facilitate clear, actionable decisions
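The three layers can be pictured as stages that progressively narrow a probability distribution over candidate answers. The sketch below is a toy illustration only: it assumes "entropy" means Shannon entropy, and the function names (`thin_perception`, `thick_processing`, `thin_decision`), the top-k pruning, and the multiplicative pooling of agent opinions are all hypothetical choices, not the DeepWism® R2 implementation.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution given as {candidate: prob}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def thin_perception(raw_scores, keep=4):
    """Hypothetical perception step: keep the `keep` highest-scoring candidates."""
    top = sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    return normalize(dict(top))


def thick_processing(prior, agent_opinions):
    """Hypothetical crowd step: multiply the prior by each agent's scores, then renormalize."""
    pooled = dict(prior)
    for opinion in agent_opinions:
        for cand in pooled:
            pooled[cand] *= opinion.get(cand, 1e-9)
    return normalize(pooled)


def thin_decision(posterior):
    """Hypothetical decision step: return the single most probable candidate."""
    return max(posterior, key=posterior.get)


def run_pipeline(raw_scores, agent_opinions, keep=4):
    """Run the three stages and record the entropy (in bits) after each one."""
    perceived = thin_perception(raw_scores, keep)
    processed = thick_processing(perceived, agent_opinions)
    trace = {
        "input": shannon_entropy(normalize(raw_scores)),
        "perception": shannon_entropy(perceived),
        "processing": shannon_entropy(processed),
    }
    return thin_decision(processed), trace


# Toy data: six candidate answers and three agents that mostly favor "A".
raw = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 1, "F": 1}
agents = [{"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}] * 3
answer, trace = run_pipeline(raw, agents)
print(answer, {stage: round(h, 3) for stage, h in trace.items()})
```

In this toy run the entropy drops at each stage because the agents agree with one another. Neither pruning nor pooling guarantees a decrease in general: agents that disagree can leave the pooled distribution flatter than the prior.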

DeepWism® R2 achieves 27.5% accuracy on Humanity's Last Exam (HLE), surpassing OpenAI DeepResearch's 26.6% and setting a new state of the art among the systems compared in Section 3. It also leads xbench-ScienceQA with 70.0% and scores 64.0% on xbench-DeepSearch, one point behind o3-high (65.0%) and ahead of every other model with a reported score.

These results demonstrate superior capabilities in:

Complex Problem-Solving: Enhanced reasoning through entropy reduction mechanisms

Uncertainty Management: Effective handling of high-entropy scenarios

Multi-Domain Generalization: Robust performance across diverse problem domains including science, deep retrieval, and logic

AI Explainability: Transparent decision-making processes through entropy dynamics

2. Summary

Revolutionary Architecture: Thin-Thick-Thin Crowd Entropy Dynamics System

Entropy-Centric Design: Unlike traditional attention-based models, DeepWism® R2 is fundamentally designed around entropy reduction principles. This paradigm shift enables more effective handling of complex, uncertain problem spaces that challenge conventional AI systems.
Crowd Intelligence Integration: The thick processing layer implements sophisticated crowd intelligence mechanisms that leverage collective reasoning patterns to systematically reduce entropy through collaborative processing and structured analysis.
T3CEDS Framework: The three-layer architecture optimizes information flow from high-entropy inputs to low-entropy, actionable outputs, ensuring maximum coherence and decision clarity at each stage.
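This README does not define "entropy" formally. Assuming it refers to Shannon entropy over a discrete set of n candidate answers (an assumption, not a statement from the authors), "high-entropy" and "low-entropy" can be read as follows:

```latex
H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i , \qquad 0 \le H(p) \le \log_2 n

% Worked values for n = 4 candidates:
% uniform,      p = (0.25, 0.25, 0.25, 0.25):  H = \log_2 4 = 2 \text{ bits}
% concentrated, p = (0.7, 0.1, 0.1, 0.1):      H \approx 1.357 \text{ bits}
% decided,      p = (1, 0, 0, 0):              H = 0 \text{ bits}
```

Under this reading, a high-entropy input is one where many candidates remain about equally plausible, and an actionable output is one where the probability mass has collapsed onto a single candidate.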

Performance Excellence: State-of-the-Art Results

Humanity's Last Exam Leadership: Achieving 27.5% accuracy on HLE, DeepWism® R2 sets a new high score on this complex reasoning benchmark, ahead of leading models including OpenAI DeepResearch, Gemini 2.5, and Claude 3.7 Sonnet.
Multi-Benchmark Performance: Strong results across diverse benchmarks, including AIME2025 (93.3%), GPQA-Diamond (88.0%), MMLU (86.0%), and SWE-bench Verified (72.0%). The first three are the highest scores in the comparison table in Section 3, while Claude 4 Opus leads SWE-bench Verified with 79.4%.
Open Research Commitment: The DeepWism® R2 framework will be fully open-sourced, with global API access for authenticated users, to foster collaborative advancement in entropy-based AI research.

3. Evaluation Results

Comprehensive Benchmark Performance

DeepWism® R2 performs strongly across multiple challenging benchmarks and posts the highest score in the comparison on five of the seven listed below. All scores are percentages; "–" marks a score that is not reported.

Model	xbench-ScienceQA	xbench-DeepSearch	HLE	AIME2025	GPQA-Diamond	MMLU	SWE-bench Verified
DeepWism® R2	70.0	64.0	27.5	93.3	88.0	86.0	72.0
o3-high	60.8	65.0	20.3	–	–	–	–
o3-pro	59.6	–	–	–	–	–	–
Doubao-Seed-1.6	56.6	50.0	–	–	–	–	–
OpenAI DeepResearch	–	–	26.6	–	–	–	–
Gemini 2.5 Pro	59.4	50.0	21.6	88.0	86.4	84.5	67.2
o4 mini	50.4	60.0	18.1	92.7	81.4	82.0	68.1
Claude 4 Opus	–	–	10.7	90.0	83.3	80.7	79.4
DeepSeek-R1-0528	54.6	–	14.0	87.5	81.0	84.0	57.6
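The per-benchmark leaders and their margins can be checked directly from the table. The snippet below transcribes the scores above (with `None` for each "–" entry) and reports, for every benchmark, the top model, its score, and its lead over the runner-up. The names `SCORES` and `leaders` are introduced here for illustration only.

```python
BENCHMARKS = [
    "xbench-ScienceQA", "xbench-DeepSearch", "HLE", "AIME2025",
    "GPQA-Diamond", "MMLU", "SWE-bench Verified",
]

# Scores (%) transcribed from the table; None marks an unreported "–" entry.
SCORES = {
    "DeepWism R2":         [70.0, 64.0, 27.5, 93.3, 88.0, 86.0, 72.0],
    "o3-high":             [60.8, 65.0, 20.3, None, None, None, None],
    "o3-pro":              [59.6, None, None, None, None, None, None],
    "Doubao-Seed-1.6":     [56.6, 50.0, None, None, None, None, None],
    "OpenAI DeepResearch": [None, None, 26.6, None, None, None, None],
    "Gemini 2.5 Pro":      [59.4, 50.0, 21.6, 88.0, 86.4, 84.5, 67.2],
    "o4 mini":             [50.4, 60.0, 18.1, 92.7, 81.4, 82.0, 68.1],
    "Claude 4 Opus":       [None, None, 10.7, 90.0, 83.3, 80.7, 79.4],
    "DeepSeek-R1-0528":    [54.6, None, 14.0, 87.5, 81.0, 84.0, 57.6],
}


def leaders(scores, benchmarks):
    """Return {benchmark: (top model, top score, margin over the runner-up)}."""
    result = {}
    for i, bench in enumerate(benchmarks):
        reported = [(vals[i], model) for model, vals in scores.items()
                    if vals[i] is not None]
        ranked = sorted(reported, reverse=True)
        best_score, best_model = ranked[0]
        margin = round(best_score - ranked[1][0], 1) if len(ranked) > 1 else None
        result[bench] = (best_model, best_score, margin)
    return result


for bench, (model, score, margin) in leaders(SCORES, BENCHMARKS).items():
    print(f"{bench:20s} {model:15s} {score:5.1f}  (+{margin})")
```

Run against this table, DeepWism® R2 leads five of the seven benchmarks; o3-high leads xbench-DeepSearch and Claude 4 Opus leads SWE-bench Verified. Because many cells are unreported, each ranking covers only the models that published a score for that benchmark.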

🔑 Key Performance Highlights

🏆 Humanity's Last Exam (HLE): 27.5% accuracy, surpassing every compared model, including OpenAI DeepResearch (26.6%)
🧪 xbench-ScienceQA: 70.0% accuracy, leading all compared models on complex scientific question answering
🔍 xbench-DeepSearch: 64.0% accuracy, one point behind o3-high (65.0%), demonstrating strong deep retrieval and reasoning capabilities
📊 AIME2025: 93.3% accuracy, showcasing exceptional mathematical reasoning capabilities
🔬 GPQA-Diamond: 88.0% accuracy, indicating superior performance in graduate-level science questions
📚 MMLU: 86.0% accuracy, reflecting strong multi-domain knowledge understanding
💻 SWE-bench Verified: 72.0% accuracy, second to Claude 4 Opus (79.4%) among the compared models, indicating effective software engineering problem-solving skills

4. Chat Website

Try DeepWism® R2 through our interactive platform:

🌐 Chat Interface: i.deepwism.com

Real-time interaction with DeepWism® R2
Entropy visualization in reasoning processes
Multi-domain problem-solving capabilities

5. Contact

For questions, collaborations, or support, please contact us at: r2@deepwism.com

Website: www.deepwism.com

GitHub Issues: Report bugs or request features

Twitter: @DeepWism

Advancing Next Generation AI through Entropy Reduction and Crowd Intelligence
