Overview

A heavily simplified, custom evaluation protocol. Results obtained with this repo are not comparable to those from the original repo, which served only as inspiration. I mainly use this repo for personal evaluation.

Limited use: not for commercial applications.

Single-answer

MATH-500
AIME 2025
LIMO

Multi-choice

GPQA
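
The two categories above differ in how a model's final answer is read out of its completion. The following is a minimal sketch of that difference, not this repo's actual grading code: it assumes single-answer completions end with a `\boxed{...}` expression and multi-choice completions end with a line such as `Answer: C`. The function names (`extract_boxed`, `extract_choice`, `is_correct`) and both output conventions are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Braces are counted so that nested LaTeX such as \\frac{1}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    content = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content).strip()
        content.append(ch)
    return None  # closing brace never found


def extract_choice(text: str) -> Optional[str]:
    """Return the last 'Answer: X' letter (A-D) in text, or None."""
    matches = re.findall(r"Answer:\s*\(?([A-D])\)?", text)
    return matches[-1] if matches else None


def is_correct(prediction: Optional[str], reference: str) -> bool:
    """Exact match after removing spaces; a missing prediction is wrong."""
    if prediction is None:
        return False
    return prediction.replace(" ", "") == reference.replace(" ", "")


# Single-answer style (assumed convention for MATH-500, AIME 2025, LIMO).
single = extract_boxed("So the result is \\boxed{\\frac{1}{2}}.")
# Multi-choice style (assumed convention for GPQA).
multi = extract_choice("B looks tempting, but no.\nAnswer: (C)")
```

Exact string matching is the simplest possible comparison. It treats `0.5` and `\frac{1}{2}` as different answers, so a real grader would likely need some normalization on top of this.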

Install

pip install -e .

For a simple test, first set the OS environment variables required by the provider of your choice, then run:

python example.py

Inference Settings

For all experiments that require intensive reasoning, I used:

max_completion_tokens=38912

Qwen3 & Qwen2.5 (Based on HF suggestions for Qwen3)

temperature=0.6
top_p=0.95
top_k=20

Gemma3 (Based on unsloth suggestions)

temperature=1.0
top_p=0.95
top_k=64

Mistral Small 3.2

temperature=0.15
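
The settings above can be collected in one place so that each model family is always queried with the same parameters. This is a small sketch under stated assumptions: the numeric values are copied from this section, while the dictionary layout, the family keys (`"qwen"`, `"gemma3"`, `"mistral-small-3.2"`), and the helper `build_request_params` are hypothetical and not part of this repo's API.

```python
from typing import Any, Dict

# Token budget listed above for experiments that require intensive reasoning.
MAX_COMPLETION_TOKENS = 38912

# Sampling values copied from the section above; the key names are illustrative.
SAMPLING_PRESETS: Dict[str, Dict[str, Any]] = {
    "qwen": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "gemma3": {"temperature": 1.0, "top_p": 0.95, "top_k": 64},
    # Only a temperature is listed for Mistral Small 3.2.
    "mistral-small-3.2": {"temperature": 0.15},
}


def build_request_params(family: str) -> Dict[str, Any]:
    """Return a copy of the family's sampling preset plus the token budget."""
    if family not in SAMPLING_PRESETS:
        raise KeyError(f"No sampling preset for model family: {family!r}")
    params = dict(SAMPLING_PRESETS[family])  # copy so the preset is not mutated
    params["max_completion_tokens"] = MAX_COMPLETION_TOKENS
    return params


if __name__ == "__main__":
    for name in SAMPLING_PRESETS:
        print(name, build_request_params(name))
```

Whether a given provider accepts `top_k` depends on its API, so the resulting dictionary may need filtering before it is passed to a client.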

fedric95/simpler-simple-evals
