How to evaluate BLEU score on LM1B? #6
Dear authors,
I understand that you plan to release your code in January, but could you share more details on how you evaluate the BLEU score and PPL on the LM1B dataset? I am also working on diffusion models for text and may cite your paper. Thanks!

Comments
Hi, we computed the BLEU score with all test data as references and reported the average BLEU score over the generated sentences. We sampled 1K sentences each for evaluating BLEU and Self-BLEU (S-BLEU).
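For concreteness, below is a minimal sketch of that protocol (not the repository's actual evaluation script): BLEU of each generated sentence against all test-set references, and Self-BLEU of each sample against the remaining generated samples. The use of NLTK's `sentence_bleu`, whitespace tokenization, and the function names are my own assumptions.

```python
# Illustrative sketch (not the repository's code): average BLEU with all test
# sentences as references, plus Self-BLEU over the generated samples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_smooth = SmoothingFunction().method1  # avoid zero scores on short sentences

def avg_bleu(generated, test_sentences):
    """Mean BLEU of each generated sentence against ALL test references."""
    refs = [s.split() for s in test_sentences]  # whitespace tokenization (assumption)
    scores = [sentence_bleu(refs, g.split(), smoothing_function=_smooth)
              for g in generated]
    return sum(scores) / len(scores)

def self_bleu(generated):
    """Each generated sample is scored against the remaining generated samples."""
    toks = [g.split() for g in generated]
    scores = []
    for i, hyp in enumerate(toks):
        others = toks[:i] + toks[i + 1:]
        scores.append(sentence_bleu(others, hyp, smoothing_function=_smooth))
    return sum(scores) / len(scores)
```

With 1K samples and the full LM1B test set as references this loop is expensive, so a real script would likely cache reference n-gram counts or batch the computation.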
@Hzfinfdu Thanks for the great work!
@yujianll Hi,
@Hzfinfdu Thanks for the reply!
@yujianll Hi, we trained DiffusionBERT with 512 diffusion steps and used DDIM sampling to uniformly sample 128 steps on the test set, both for NLL calculation and generation. Hope this helps!
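As a rough illustration of that step-skipping scheme (a sketch under my own assumptions, not the authors' implementation), one way to pick a uniformly spaced 128-step subsequence of the 512 training steps is:

```python
# Illustrative only: choose a uniformly spaced 128-step subsequence of the
# 512-step training schedule and iterate over it in reverse at inference time.
import numpy as np

T_TRAIN, T_EVAL = 512, 128
timesteps = np.linspace(0, T_TRAIN - 1, T_EVAL).round().astype(int)  # roughly 0, 4, 8, ..., 511

for t in reversed(timesteps):
    # x = denoise_step(model, x, t)  # hypothetical single reverse-diffusion update
    pass
```

The same subsequence would then be used both for generation and for evaluating NLL on the test set, as the comment above describes.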
Thanks, this helps a lot!