Open
Description
I ran generation with the following configuration:
python3 -m verl.trainer.main_generation \
    trainer.nnodes=1 \
    trainer.n_gpus_per_node=4 \
    data.path=./processed_data/${DATA_TYPE}.parquet \
    data.output_path=${OUTPUT_DIR}/${DATA_TYPE}.json \
    data.n_samples=12 \
    data.batch_size=1024 \
    model.path=${MODEL_PATH} \
    rollout.temperature=0.6 \
    rollout.response_length=16384 \
    rollout.top_k=-1 \
    rollout.top_p=0.95 \
    rollout.gpu_memory_utilization=0.85 \
    rollout.tensor_model_parallel_size=1 \
    +data.skip_format_reward=True
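The dataset is a mix of MATH train and AIME. pass@1 comes out at only 0.04, and when I inspected the generated data I found that the responses are out of order: many entries contain responses to other questions. Could the authors tell me what is going wrong here? Any debugging insight would be greatly appreciated!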
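To make the symptom concrete, here is a minimal, purely illustrative sketch of one generic way this kind of scrambling can arise when each prompt is sampled `n` times: the prompts are repeated in one order but the flat outputs are regrouped assuming another. This is not verl code and not a diagnosis; whether anything like this happens in `main_generation` is an assumption that still needs checking. All names (`repeat_interleave`, `tile`, `group_consecutive`, `misaligned_rows`, `fake_generate`) are hypothetical helpers written for this example.

```python
def repeat_interleave(items, n):
    """[a, b] with n=2 -> [a, a, b, b]: copies of one prompt stay adjacent."""
    return [x for x in items for _ in range(n)]


def tile(items, n):
    """[a, b] with n=2 -> [a, b, a, b]: the whole batch is repeated n times."""
    return [x for _ in range(n) for x in items]


def group_consecutive(flat, n):
    """Split a flat list into consecutive chunks of size n (one chunk per prompt)."""
    return [flat[i:i + n] for i in range(0, len(flat), n)]


def fake_generate(batch):
    """Stand-in for the model (hypothetical): tags each output with its prompt."""
    return [f"answer to {p}" for p in batch]


def misaligned_rows(prompts, grouped):
    """Indices of prompts whose group holds a response for a different prompt."""
    return [
        i
        for i, (p, group) in enumerate(zip(prompts, grouped))
        if any(r != f"answer to {p}" for r in group)
    ]


prompts = ["q0", "q1", "q2"]
n = 2

# Repetition order and regrouping agree: every row is aligned.
ok = group_consecutive(fake_generate(repeat_interleave(prompts, n)), n)

# Repetition is tiled but regrouping assumes interleaving: rows get mixed up.
bad = group_consecutive(fake_generate(tile(prompts, n)), n)

print(misaligned_rows(prompts, ok))
print(misaligned_rows(prompts, bad))
```

A similar check on the real output would need some way to tell which prompt a response belongs to, for example by rerunning with `data.n_samples=1` and a small batch and comparing a few rows by hand.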