Regarding the validation and test set split of datasets like MultiArith/SVAMP #3

Hello, I see in your paper that for datasets like MultiArith/SVAMP, you randomly sampled 500 data points to serve as the validation set and used the rest as the test set. Have you made this validation/test split public, or released the corresponding index files? I only found val_index.npy for gsm8k, and it samples only 200 data points from the training set. That does not seem consistent with what the paper says about "sampling 500 data points from the test set to serve as the validation set". Could you clarify which split was actually used?
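
To make clear what kind of index file I am asking about, here is a minimal sketch of how such a split could be generated and saved. This is not the authors' code: the helper `make_split`, the seed, the dataset size `n_total=1000`, and the output file name are all assumptions of mine for illustration, so it will not reproduce the split used in the paper.

```python
import numpy as np


def make_split(n_total, n_val=500, seed=0):
    """Randomly pick n_val indices for validation; the rest become the test set.

    The seed and sizes are hypothetical; the paper's actual values are unknown to me.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)       # shuffled indices 0..n_total-1
    val_idx = np.sort(perm[:n_val])       # first n_val shuffled indices
    test_idx = np.sort(perm[n_val:])      # everything that is left
    return val_idx, test_idx


# Hypothetical dataset size and file name, for illustration only.
val_idx, test_idx = make_split(n_total=1000)
np.save("example_val_index.npy", val_idx)
```

Publishing the saved index file (or the seed together with the exact sampling code) would let others evaluate on the same test examples.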
