Hi DeepSeek Team,
First of all, thank you for open-sourcing DeepSeek-V3.2-Exp and sharing the impressive benchmark results.
Our team has been working to reproduce the results reported in your official release, but we’ve encountered some discrepancies that we’d like to understand better.
Reproduction Setup
- Inference: SGLang v0.5.4.post1 on 8× H200
- Model weights: from the official Hugging Face repository (commit 9d2f599, main branch)
- Sampling parameters (a request sketch follows this list):
  - temperature = 0.6
  - top_p = 0.95
  - max_tokens = 32768
  - k = 64 (samples per problem for pass@1 computation, following the DeepSeek-R1 paper)
- Datasets: AIME 2025
- Prompt format:
  > Solve the following math problem step by step. The last line of your response should be of the form Answer: $ANSWER (without quotes) where $ANSWER is the answer to the problem. {question} Remember to put your answer on its own line after "Answer:", and you do not need to use a \boxed command.
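For concreteness, below is a minimal sketch of how we issue each of the k samples against SGLang's OpenAI-compatible endpoint. The base URL, port, and served model name are placeholders for our local deployment, and the actual harness batches requests rather than looping:

```python
# Minimal sketch: one request per sample against the SGLang OpenAI-compatible
# server. base_url and model name are placeholders for our local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

PROMPT_TEMPLATE = (
    "Solve the following math problem step by step. The last line of your "
    "response should be of the form Answer: $ANSWER (without quotes) where "
    "$ANSWER is the answer to the problem.\n\n{question}\n\n"
    'Remember to put your answer on its own line after "Answer:", and you do '
    "not need to use a \\boxed command."
)

def sample_completions(question: str, k: int = 64) -> list[str]:
    """Draw k independent samples with the sampling parameters listed above."""
    outputs = []
    for _ in range(k):
        resp = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-V3.2-Exp",
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(question=question)}],
            temperature=0.6,
            top_p=0.95,
            max_tokens=32768,
        )
        outputs.append(resp.choices[0].message.content)
    return outputs
```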
Our Reproduction Result
| Mode | AIME 2025 Pass@1 (reported → ours) |
|---|---|
| Thinking | 89.3 → 64.9 |
We have double-checked dataset integrity, prompts, and sampling settings, but still observed a noticeable performance gap.
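To rule out a scoring difference on our side, the sketch below shows roughly how we extract answers and compute pass@1. The exact-match check is simplified here (our harness normalizes answers before comparison), and the function names are illustrative:

```python
import re

def extract_answer(completion: str) -> str | None:
    """Return the content of the last 'Answer: ...' line, per the prompt format."""
    matches = re.findall(r"(?im)^answer:\s*(.+?)\s*$", completion)
    return matches[-1] if matches else None

def pass_at_1(samples: list[str], reference: str) -> float:
    """pass@1 for one problem: the fraction of the k samples that are correct,
    following the averaging described in the DeepSeek-R1 paper."""
    correct = sum(extract_answer(s) == reference for s in samples)  # simplified exact match
    return correct / len(samples)

def benchmark_pass_at_1(per_problem: list[tuple[list[str], str]]) -> float:
    """Benchmark score: mean of per-problem pass@1 over all AIME 2025 problems."""
    return sum(pass_at_1(samples, ref) for samples, ref in per_problem) / len(per_problem)
```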
Questions
Could you please share a bit more about your evaluation configuration? For example:
- The exact inference environment or framework version used
- Whether any custom decoding, filtering, or re-ranking steps were applied
- Any additional prompt preprocessing or formatting details
Any clarification would be greatly appreciated!
Thanks again for releasing such a high-quality model.