Great job, questions about the results #4

yanghu819 · 2025-02-24T05:45:26Z

I ran:

python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/

python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5

The result is:
[2025-02-24 13:14:58,570] total: 1319, corr: 68, acc: 0.05155420773313116
[2025-02-24 13:14:58,570] time: 315.3894371986389s
[2025-02-24 13:14:58,571] Mean: 0.05155420773313116, Std: 0.0

Am I doing this right? Thank you so much for looking into this issue.

yanghu819 · 2025-02-27T11:16:26Z

I found that acc: 0.05 was caused by my incomplete training data. After switching to the correct gsm8k data, the result is a lot better, but there is still a gap.

The training and evaluation commands are:
python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/

python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5

The final result is acc: 0.19863532979529946 (about 19.9%).
This falls short of the 32.6 reported in the paper.
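
For reference, here is a quick sanity check of the numbers in both runs. It is only a sketch of the arithmetic, not code from the repository: the helper names are hypothetical, and it assumes that the second run was evaluated on the same 1319 examples as the first and that the paper's 32.6 is an accuracy in percent.

```python
# Sanity check of the accuracy numbers quoted in this issue.
# Assumptions (not confirmed by the repo): both runs use the same
# 1319 test examples, and the paper's 32.6 is a percentage accuracy.

def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly answered examples."""
    return correct / total


def implied_correct(acc: float, total: int) -> int:
    """Number of correct answers implied by an accuracy and a test-set size."""
    return round(acc * total)


TOTAL = 1319                      # "total" from the first evaluation log
first_acc = accuracy(68, TOTAL)   # "corr: 68" from the first evaluation log

second_acc = 0.19863532979529946  # accuracy reported for the second run
second_correct = implied_correct(second_acc, TOTAL)

paper_pct = 32.6                  # figure quoted from the paper
gap_points = paper_pct - 100 * second_acc

print(f"first run:  {first_acc:.4%}")
print(f"second run: {second_acc:.4%} ({second_correct} of {TOTAL} correct)")
print(f"gap to paper: {gap_points:.2f} percentage points")
```

Under those assumptions the second run corresponds to 262 correct answers out of 1319, which leaves a gap of roughly 12.7 percentage points to the paper's figure.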
