Finetune algorithms log only train regret #76

Description

@DT6A

All of the algorithms with offline-to-online finetuning log the training regret (the regret accumulated by the online interactions used for training) under both train/regret and eval/regret. So we report only train regret, which differs from the Cal-QL work, where the authors report eval regret. Reporting eval regret is arguably strange, because the quantity we really want to minimize in practice is the train regret, so this bug is not critical, but it should be kept in mind. I will fix it, but without rerunning all of the algorithms due to compute limitations (we may rerun them later).
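For reference, a minimal sketch of how the two metrics could be logged under separate keys. This is not code from this repository: the function name, the normalized-regret definition (1 minus the mean normalized score), and all numeric values are illustrative assumptions.

```python
import numpy as np

def normalized_regret(episode_returns, ref_min_score, ref_max_score):
    """Regret as 1 - mean normalized score over a set of episode returns (illustrative definition)."""
    returns = np.asarray(episode_returns, dtype=np.float64)
    normalized = (returns - ref_min_score) / (ref_max_score - ref_min_score)
    return 1.0 - float(normalized.mean())

# Illustrative reference scores and returns only.
ref_min_score, ref_max_score = 0.0, 100.0
online_training_returns = [42.0, 55.0, 61.0]  # returns of the online rollouts used for training
separate_eval_returns = [70.0, 68.0, 73.0]    # returns of held-out evaluation rollouts

metrics = {
    # regret of the online interactions used for training (what is currently reported)
    "train/regret": normalized_regret(online_training_returns, ref_min_score, ref_max_score),
    # regret of separate evaluation episodes (what Cal-QL reports)
    "eval/regret": normalized_regret(separate_eval_returns, ref_min_score, ref_max_score),
}
print(metrics)
```

The point of the sketch is only that the two keys should be fed from different sets of episodes, rather than logging the same training-time value under both.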


    Labels

    bug (Something isn't working), wontfix (This will not be worked on)
