Question about run_fast_rloo_7b.sh & run_rloo_7b.sh

Nice work! I have two questions.

Can I run **run_fast_rloo_7b.sh** as-is to reproduce the results in your paper? On my setup, one epoch seems to take 30 hours (device: 4 × NVIDIA H100 80GB).

Does Qwen/Qwen2.5-Math-7B need to be run with **use_chat_template=False**, as stated in the README? I ask because when I run the **run_fast_rloo_7b.sh** script, I get some weird predictions, shown below:

![Image](https://github.com/user-attachments/assets/94867673-a7fd-4579-b242-b1773e942607)

Any advice?

**My env:**
Package                                  Version                   Editable project location
---------------------------------------- ------------------------- ------------------------------------------------------------------
absl-py                                  2.3.0
accelerate                               1.7.0
docstring_parser                         0.16
einops                                   0.8.1
email_validator                          2.2.0
exceptiongroup                           1.3.0
executing                                2.2.0
huggingface-hub                          0.32.4
hupper                                   1.12.1
hydra-core                               1.3.2
kiwisolver                               1.4.8
lark                                     1.2.2
liger_kernel                             0.5.10
lighteval                                0.8.0                     
pbkdf2                                   1.3
peft                                     0.15.2
pexpect                                  4.9.0
pillow                                   11.1.0
pytest                                   8.3.5
python-dateutil                          2.9.0.post0
python-dotenv                            1.1.0
python-json-logger                       3.3.0
python-multipart                         0.0.20
python3-openid                           3.2.0
pytz                                     2025.2
pyzmq                                    26.4.0
qwen-vl-utils                            0.0.11
tokenizers                               0.21.1
torch                                    2.6.0
torch_memory_saver                       0.0.6
torchao                                  0.11.0
torchaudio                               2.6.0
torchdata                                0.11.0
torchvision                              0.21.0
tqdm                                     4.67.1
traitlets                                5.14.3
transaction                              5.0
transformers                             4.52.4
translationstring                        1.4
triton                                   3.2.0
venusian                                 3.1.1
verl                                     0.3.1.dev0                
virtualenv                               20.31.2
vllm                                     0.8.5
wandb                                    0.20.1



