Hi, I ran the test and found that many models can only answer 320 questions correctly. Why is that, and why exactly 320? If I enter 330 questions, everything after the initial "context window" fails. #1

@red-co

I tested gemini2.0-flash, gemini2.5-flash, gemma3-27b, and qwen3_235b-a22b.
Of these, gemini2.0-flash, gemini2.5-flash, and qwen3_235b-a22b all stop at exactly 320 correct answers.

This is my prompt:

Here are n five-digit additions in the form of
Qn. xn+yn,

You need to answer in the form of
An. {anwser}
, do not group,

example:
`
A1. 79281
A2. 138779
A3. 139180
...
`
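
To make the 320 cutoff easier to reproduce, here is a minimal Python sketch of how a prompt in this format could be generated and how a reply could be scored. It is not the repository's test harness: the names `make_questions`, `format_questions`, and `score_reply` are hypothetical, the header text is copied from the prompt above, and the model call itself is left out.

```python
import random
import re

# Header copied verbatim from the prompt in this post.
PROMPT_HEADER = (
    "Here are n five-digit additions in the form of\n"
    "Qn. xn+yn,\n\n"
    "You need to answer in the form of\n"
    "An. {anwser}\n"
    ", do not group,\n"
)

# Matches one answer line such as "A12. 138779".
ANSWER_RE = re.compile(r"^A(\d+)\.[ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


def make_questions(n, seed=0):
    """Return n pairs of random five-digit integers (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randint(10000, 99999), rng.randint(10000, 99999)) for _ in range(n)]


def format_questions(pairs):
    """Render the pairs as 'Q1. x+y' lines, numbered from 1."""
    return "\n".join(f"Q{i}. {x}+{y}" for i, (x, y) in enumerate(pairs, start=1))


def score_reply(pairs, reply):
    """Return (number of correct answers, index of first wrong or missing answer)."""
    answers = {int(i): int(v) for i, v in ANSWER_RE.findall(reply)}
    correct = [answers.get(i) == x + y for i, (x, y) in enumerate(pairs, start=1)]
    first_wrong = next((i for i, ok in enumerate(correct, start=1) if not ok), None)
    return sum(correct), first_wrong


if __name__ == "__main__":
    pairs = make_questions(330, seed=1)
    prompt = PROMPT_HEADER + "\n" + format_questions(pairs)
    # Simulate a reply that stops after answer 320, as described above.
    reply = "\n".join(f"A{i}. {x + y}" for i, (x, y) in enumerate(pairs[:320], start=1))
    print(score_reply(pairs, reply))
```

Reporting the index of the first wrong or missing answer alongside the total helps tell apart a reply that was cut off at answer 320 from one that kept going with wrong sums.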
