Dear authors,
In issue #11 you explained that "Once the sampling-feedback-training cycle is completed, we use the current model as the reference model for the next round of training, conducting a total of N rounds". However, in the code the reference model stays fixed throughout the entire training process rather than being updated at each training epoch. This is what confuses me. Looking forward to your reply!
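To make the difference concrete, here is a minimal sketch of how I understand the two behaviors. It is not taken from your repository: `ToyPolicy`, `train_one_round`, and `iterative_training` are hypothetical placeholders, and the "training" step only shifts a scalar weight so the reference snapshot used in each round is easy to see.

```python
import copy


class ToyPolicy:
    """Hypothetical stand-in for a policy model: a single scalar parameter."""

    def __init__(self, weight=0.0):
        self.weight = weight


def train_one_round(policy, ref_model):
    # Hypothetical placeholder for one sampling -> feedback -> training cycle.
    # A real implementation would use ref_model inside the loss (e.g. a KL or
    # DPO-style term); here the policy weight is simply nudged by 1.0.
    policy.weight += 1.0
    return policy


def iterative_training(num_rounds, update_ref=True):
    """Return the reference weight used in each round."""
    policy = ToyPolicy()
    ref_model = copy.deepcopy(policy)  # snapshot of the initial policy
    ref_weights = []
    for _ in range(num_rounds):
        ref_weights.append(ref_model.weight)
        policy = train_one_round(policy, ref_model)
        if update_ref:
            # Behavior described in issue #11: the just-trained model
            # becomes the reference model for the next round.
            ref_model = copy.deepcopy(policy)
    return ref_weights


print(iterative_training(3, update_ref=True))   # [0.0, 1.0, 2.0]
print(iterative_training(3, update_ref=False))  # [0.0, 0.0, 0.0]
```

With `update_ref=True` each round is regularized toward the previous round's model; with `update_ref=False` every round is regularized toward the initial model, which is what the current code appears to do.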
Sincerely.