We introduce DeepWism® R2 (Research & Report), a next-generation AI system built on an agent framework: the Thin-Thick-Thin Crowd Entropy Dynamics System (T3CEDS), which treats entropy reduction as the fundamental mechanism underlying crowd intelligence. This represents a paradigm shift in AI research, moving from traditional attention-based mechanisms to entropy management as the core design principle. The DeepWism® R2 architecture consists of three distinct yet interconnected layers:
- Thin Perception Layer: Efficiently captures high-dimensional inputs while preserving essential information and reducing input entropy
- Thick Processing Layer: Leverages crowd intelligence mechanisms to actively reduce entropy through structured reasoning, collaborative processing, and deep ranking
- Thin Decision Layer: Distills complex representations into coherent outputs, further reducing entropy to facilitate clear, actionable decisions
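The three layers can be read as a narrow-wide-narrow pipeline. The following is a minimal toy sketch of that shape only, not the DeepWism® R2 implementation: the function names `perceive`, `process`, and `decide`, the `Agent` type, and the majority-vote aggregation are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent type: maps a question to one candidate answer.
Agent = Callable[[str], str]


def perceive(raw: str) -> str:
    """Thin perception (stand-in): collapse whitespace and lowercase the input."""
    return " ".join(raw.split()).lower()


def process(question: str, agents: List[Agent]) -> Counter:
    """Thick processing (stand-in): collect one answer per agent and tally them."""
    return Counter(agent(question) for agent in agents)


def decide(votes: Counter) -> str:
    """Thin decision (stand-in): return the single most common answer."""
    return votes.most_common(1)[0][0]


# Three toy agents; two of them agree.
agents: List[Agent] = [lambda q: "4", lambda q: "4", lambda q: "5"]
answer = decide(process(perceive("  What is 2 + 2? "), agents))
print(answer)
```

The wide middle stage is where several agents contribute. The first and last stages each reduce their input to a single, smaller representation.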
DeepWism® R2 achieves 27.5% accuracy on Humanity's Last Exam (HLE), surpassing OpenAI DeepResearch's 26.6% and giving the highest HLE score among the models compared below. It also leads on xbench-ScienceQA with 70.0% and scores 64.0% on xbench-DeepSearch, just behind o3-high's 65.0%.
These results demonstrate superior capabilities in:
- Complex Problem-Solving: Enhanced reasoning through entropy reduction mechanisms
- Uncertainty Management: Effective handling of high-entropy scenarios
- Multi-Domain Generalization: Robust performance across diverse problem domains including science, deep retrieval, and logic
- AI Explainability: Transparent decision-making processes through entropy dynamics
Revolutionary Architecture: Thin-Thick-Thin Crowd Entropy Dynamics System
- Entropy-Centric Design: Unlike traditional attention-based models, DeepWism® R2 is fundamentally designed around entropy reduction principles. This paradigm shift enables more effective handling of complex, uncertain problem spaces that challenge conventional AI systems.
- Crowd Intelligence Integration: The thick processing layer implements sophisticated crowd intelligence mechanisms that leverage collective reasoning patterns to systematically reduce entropy through collaborative processing and structured analysis.
- T3CEDS Framework: The three-layer architecture optimizes information flow from high-entropy inputs to low-entropy, actionable outputs, ensuring maximum coherence and decision clarity at each stage.
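The text does not say how entropy is measured or how agent outputs are combined. As a hedged illustration, the sketch below assumes Shannon entropy over a discrete set of candidate answers and a normalized product of the agents' distributions as the pooling rule. Both choices, and the names `shannon_entropy` and `pool_product`, are assumptions and are not taken from DeepWism® R2.

```python
import math
from typing import List


def shannon_entropy(p: List[float]) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def pool_product(distributions: List[List[float]]) -> List[float]:
    """Pool agent beliefs by an elementwise product, then renormalize."""
    n = len(distributions[0])
    raw = [math.prod(d[i] for d in distributions) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


# Three hypothetical agents, each mildly favouring candidate answer 0.
agents = [
    [0.50, 0.30, 0.20],
    [0.60, 0.20, 0.20],
    [0.50, 0.25, 0.25],
]
pooled = pool_product(agents)

for i, belief in enumerate(agents):
    print(f"agent {i}: {shannon_entropy(belief):.3f} bits")
print(f"pooled : {shannon_entropy(pooled):.3f} bits")
```

In this example the pooled distribution has lower entropy than any single agent, because the agents broadly agree. Product pooling does not guarantee this: two agents holding `[0.9, 0.1]` and `[0.1, 0.9]` pool to a uniform distribution, which has higher entropy than either input.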
Performance Excellence: State-of-the-Art Results
- Humanity's Last Exam Leadership: Achieving 27.5% accuracy on HLE, DeepWism® R2 sets a new benchmark in complex reasoning tasks, outperforming leading models including OpenAI DeepResearch, Gemini 2.5, and Claude 3.7 Sonnet.
- Multi-Benchmark Strength: Strong results across diverse benchmarks, including AIME2025 (93.3%), GPQA-Diamond (88.0%), MMLU (86.0%), and SWE-bench Verified (72.0%), showcasing robust generalization capabilities.
- Open Research Commitment: The DeepWism® R2 framework will be fully open-sourced with global API access for authenticated users, fostering collaborative advancement in entropy-based AI research.
Comprehensive Benchmark Performance
DeepWism® R2 demonstrates exceptional performance across multiple challenging benchmarks, establishing new state-of-the-art results in complex reasoning and problem-solving tasks.
| Model | xbench-ScienceQA | xbench-DeepSearch | HLE | AIME2025 | GPQA-Diamond | MMLU | SWE-bench Verified |
|---|---|---|---|---|---|---|---|
| DeepWism® R2 | 70.0 | 64.0 | 27.5 | 93.3 | 88.0 | 86.0 | 72.0 |
| o3-high | 60.8 | 65.0 | 20.3 | – | – | – | – |
| o3-pro | 59.6 | – | – | – | – | – | – |
| Doubao-Seed-1.6 | 56.6 | 50.0 | – | – | – | – | – |
| OpenAI DeepResearch | – | – | 26.6 | – | – | – | – |
| Gemini 2.5 Pro | 59.4 | 50.0 | 21.6 | 88.0 | 86.4 | 84.5 | 67.2 |
| o4-mini | 50.4 | 60.0 | 18.1 | 92.7 | 81.4 | 82.0 | 68.1 |
| Claude 4 Opus | – | – | 10.7 | 90.0 | 83.3 | 80.7 | 79.4 |
| DeepSeek-R1-0528 | 54.6 | – | 14.0 | 87.5 | 81.0 | 84.0 | 57.6 |
- 🏆 Humanity's Last Exam (HLE): 27.5% accuracy, the highest among the compared models, ahead of OpenAI DeepResearch (26.6%)
- 🧪 xbench-ScienceQA: 70.0% accuracy, leading all compared models on complex scientific question answering
- 🔍 xbench-DeepSearch: 64.0% accuracy, second only to o3-high (65.0%), demonstrating top-tier deep retrieval and reasoning capabilities
- 📊 AIME2025: 93.3% accuracy, showcasing exceptional mathematical reasoning capabilities
- 🔬 GPQA-Diamond: 88.0% accuracy, indicating superior performance in graduate-level science questions
- 📚 MMLU: 86.0% accuracy, reflecting strong multi-domain knowledge understanding
- 💻 SWE-bench Verified: 72.0% accuracy, second to Claude 4 Opus (79.4%) among the compared models, indicating effective software engineering problem-solving skills
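To check the per-benchmark rankings against the table, the scores can be transcribed and the top entry in each column computed. This is a small sketch; the variable and function names are illustrative, and `None` marks a score the table does not report.

```python
BENCHMARKS = [
    "xbench-ScienceQA", "xbench-DeepSearch", "HLE", "AIME2025",
    "GPQA-Diamond", "MMLU", "SWE-bench Verified",
]

# Scores transcribed from the table above; None means "not reported".
SCORES = {
    "DeepWism® R2":        [70.0, 64.0, 27.5, 93.3, 88.0, 86.0, 72.0],
    "o3-high":             [60.8, 65.0, 20.3, None, None, None, None],
    "o3-pro":              [59.6, None, None, None, None, None, None],
    "Doubao-Seed-1.6":     [56.6, 50.0, None, None, None, None, None],
    "OpenAI DeepResearch": [None, None, 26.6, None, None, None, None],
    "Gemini 2.5 Pro":      [59.4, 50.0, 21.6, 88.0, 86.4, 84.5, 67.2],
    "o4-mini":             [50.4, 60.0, 18.1, 92.7, 81.4, 82.0, 68.1],
    "Claude 4 Opus":       [None, None, 10.7, 90.0, 83.3, 80.7, 79.4],
    "DeepSeek-R1-0528":    [54.6, None, 14.0, 87.5, 81.0, 84.0, 57.6],
}


def leaders(scores):
    """Return {benchmark: (model, score)} for the best reported score per column."""
    result = {}
    for col, bench in enumerate(BENCHMARKS):
        reported = [(row[col], model) for model, row in scores.items()
                    if row[col] is not None]
        best_score, best_model = max(reported)
        result[bench] = (best_model, best_score)
    return result


top = leaders(SCORES)
for bench, (model, score) in top.items():
    print(f"{bench:20s} {model:20s} {score}")
```

On the transcribed numbers, DeepWism® R2 has the top reported score on five of the seven benchmarks. o3-high leads xbench-DeepSearch and Claude 4 Opus leads SWE-bench Verified.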
Experience DeepWism® R2's revolutionary capabilities through our interactive platforms:
🌐 Chat Interface: i.deepwism.com
- Real-time interaction with DeepWism® R2
- Entropy visualization in reasoning processes
- Multi-domain problem-solving capabilities
For questions, collaborations, or support, please contact us at: r2@deepwism.com
Website: www.deepwism.com
GitHub Issues: Report bugs or request features
Twitter: @DeepWism
Advancing Next Generation AI through Entropy Reduction and Crowd Intelligence
DeepWism® AI © 2025