[Feature] Migrate CQL from D4RLExperienceReplay to MinariExperienceReplay + fix W&B logging and SLURM usage #3035
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3035
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -181,13 +182,13 @@ def make_replay_buffer(
 def make_offline_replay_buffer(rb_cfg):
-    data = D4RLExperienceReplay(
+    data = MinariExperienceReplay(
```
Replaced D4RLExperienceReplay with MinariExperienceReplay, following the official deprecation of the D4RL datasets.
See: https://github.com/Farama-Foundation/d4rl
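For context, a minimal sketch of what the updated factory might look like after the swap (the diff above only shows the constructor change; the `dataset` and `batch_size` fields on `rb_cfg` are assumed names for illustration):

```python
from torchrl.data.datasets import MinariExperienceReplay


def make_offline_replay_buffer(rb_cfg):
    # Swap the deprecated D4RL loader for the Minari-backed dataset.
    # The config field names below are assumptions, not taken from this PR.
    data = MinariExperienceReplay(
        dataset_id=rb_cfg.dataset,  # e.g. "mujoco/hopper/expert-v0"
        batch_size=rb_cfg.batch_size,
    )
    return data
```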
Happy with these changes. Do you have a run log to share?
Yes, you can find the run here: It includes a run of the updated offline CQL sota-implementation using MinariExperienceReplay on the mujoco/hopper/expert-v0 dataset.
I cannot see it, unfortunately.
MinariExperienceReplay Report.pdf
You can see the W&B logs in this PDF.
This is a comparison between the training logs using the Minari and D4RL replay buffers. Although they generally display similar behavior, D4RL shows higher noise in the Q-function loss during training. See D4RL v Minari Report.pdf. While the overall training behavior remains consistent, the increased Q-function noise with D4RL suggests that Minari may provide slightly more stable learning curves.
The sota-implementations runs seem to be failing because of this (I also fixed a minor issue with redq, so I rebased your branch on main to make it look cleaner).
Can you have a look?
Compare: 4fa5618 to 9eba388
@vmoens I can see the sota-checks were successful: https://github.com/pytorch/rl/actions/runs/16122638416/workflow However, can you clarify what you mean by the sota-implementations runs failing? Is this referring to a different workflow or step beyond the sota-check?
Ok let's merge then!
Description
This PR updates the offline CQL sota-implementation by migrating from the deprecated D4RLExperienceReplay to the new MinariExperienceReplay, following the recent deprecation notice from the D4RL maintainers. It also includes fixes and improvements for tooling and script robustness.
Motivation and Context
D4RL’s official repositories have announced that the D4RL datasets are deprecated, with maintenance moving to the Farama Foundation’s Minari project.
To stay aligned with the evolving ecosystem and maintain long-term compatibility, this PR updates the offline dataset interface to use MinariExperienceReplay.
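As a quick illustration of the new interface, here is a hedged standalone snippet for loading and sampling the dataset used in the run above (the batch size is an arbitrary choice for demonstration, not this PR's configuration):

```python
from torchrl.data.datasets import MinariExperienceReplay

# Download (if not cached) and sample from the Minari hopper expert dataset.
data = MinariExperienceReplay(
    dataset_id="mujoco/hopper/expert-v0",
    batch_size=256,  # arbitrary batch size for this smoke test
    download=True,   # fetch the dataset if it is not available locally
)
batch = data.sample()  # returns a TensorDict batch of transitions
print(batch)
```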
Additionally, it fixes minor bugs affecting W&B logging and SLURM job script behavior, while also ensuring full compliance with the repo’s linting and formatting standards.
No functional changes have been introduced beyond the mentioned components.
Fixes #3034
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!