[Feature] Migrate CQL from D4RLExperienceReplay to MinariExperienceReplay + fix W&B logging and SLURM usage #3035

Merged
vmoens merged 6 commits into pytorch:main from feature/minari-replay
Jul 8, 2025

Conversation

Ibinarriaga8
Contributor

@Ibinarriaga8 Ibinarriaga8 commented Jul 4, 2025

Description

This PR updates the offline CQL sota-implementation by migrating from the deprecated D4RLExperienceReplay to the new MinariExperienceReplay, following the recent deprecation notice from the D4RL maintainers. It also includes fixes and improvements for tooling and script robustness.

Motivation and Context

D4RL’s official repositories have announced that:

All offline datasets in D4RL have been moved to Minari, and all online environments have been transferred to Gymnasium, MiniGrid, and Gymnasium-Robotics.

To stay aligned with the evolving ecosystem and maintain long-term compatibility, this PR updates the offline dataset interface to use MinariExperienceReplay.

Additionally, it fixes minor bugs affecting W&B logging and SLURM job script behavior, while also ensuring full compliance with the repo’s linting and formatting standards.

No functional changes have been introduced beyond the mentioned components.

Fixes #3034

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

pytorch-bot bot commented Jul 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3035

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 4, 2025
@Ibinarriaga8 Ibinarriaga8 changed the title [Offline RL] Migrate CQL from D4RLExperienceReplay to MinariExperienceReplay + fix W&B logging and SLURM usage [Feature] Migrate CQL from D4RLExperienceReplay to MinariExperienceReplay + fix W&B logging and SLURM usage Jul 4, 2025
@@ -181,13 +182,13 @@ def make_replay_buffer(


 def make_offline_replay_buffer(rb_cfg):
-    data = D4RLExperienceReplay(
+    data = MinariExperienceReplay(
Contributor Author

Replaced D4RLExperienceReplay with MinariExperienceReplay following official deprecation of D4RL datasets.
See: https://github.com/Farama-Foundation/d4rl
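
For context, here is a minimal sketch of what the migrated helper could look like. It is an illustration under assumptions, not the exact code merged in this PR: `minari_buffer_kwargs` is a hypothetical helper, the `rb_cfg.dataset` and `rb_cfg.batch_size` field names are assumed, and the batch size of 256 is an arbitrary example value. The dataset id is the one used for the run mentioned in this thread.

```python
from types import SimpleNamespace


def minari_buffer_kwargs(rb_cfg):
    """Collect constructor arguments from a config object (hypothetical helper)."""
    return {
        "dataset_id": rb_cfg.dataset,      # assumed config field name
        "batch_size": rb_cfg.batch_size,   # assumed config field name
        "split_trajs": False,
        "download": True,
    }


def make_offline_replay_buffer(rb_cfg):
    # Imported lazily so the helper above can be inspected without torchrl installed.
    from torchrl.data.datasets import MinariExperienceReplay

    return MinariExperienceReplay(**minari_buffer_kwargs(rb_cfg))


# Example config; 256 is an arbitrary value chosen for illustration.
example_cfg = SimpleNamespace(dataset="mujoco/hopper/expert-v0", batch_size=256)
example_kwargs = minari_buffer_kwargs(example_cfg)
```

Calling `make_offline_replay_buffer(example_cfg)` requires `torchrl` and `minari` to be installed and, with `download=True`, fetches the dataset if it is not already available locally.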

Collaborator

@vmoens vmoens left a comment

Happy with these changes, do you have a run log to share?

@Ibinarriaga8
Contributor Author

> Happy with these changes, do you have a run log to share?

Yes, you can find the run here:
https://wandb.ai/202206789-universidad-pontificia-comillas/torchrl_example_cql/workspace?nw=nwuser202206789

It includes a run of the updated offline CQL sota-implementation using MinariExperienceReplay on the mujoco/hopper/expert-v0 dataset.

@vmoens
Collaborator

vmoens commented Jul 4, 2025

Unfortunately, I cannot see it.

@Ibinarriaga8
Contributor Author

MinariExperienceReplay Report.pdf

You can see the W&B logs in this PDF.

@Ibinarriaga8
Contributor Author

Ibinarriaga8 commented Jul 7, 2025

This is a comparison of the training logs from the Minari and D4RL replay buffers. The two generally behave similarly, but D4RL shows higher noise in the Q-function loss during training.

D4RL v Minari Report.pdf
W&B Report

Overall training behavior is consistent across the two; the noisier Q-function loss with D4RL suggests Minari may give slightly more stable learning curves.

Collaborator

@vmoens vmoens left a comment

The sota-implementations runs seem to be failing because of this. (I also fixed a minor issue with redq, so I rebased your branch on main to make it look cleaner.)
Can you have a look?

@vmoens vmoens force-pushed the feature/minari-replay branch from 4fa5618 to 9eba388 Compare July 7, 2025 16:34
@Ibinarriaga8
Contributor Author

Ibinarriaga8 commented Jul 8, 2025

@vmoens I can see the sota-checks were successful: https://github.com/pytorch/rl/actions/runs/16122638416/workflow

However, can you clarify what you mean by the sota-implementations runs failing? Does this refer to a different workflow or step beyond the sota-check?

@vmoens vmoens added the enhancement label Jul 8, 2025
Collaborator

@vmoens vmoens left a comment

Ok let's merge then!

@vmoens vmoens merged commit 3cf1df0 into pytorch:main Jul 8, 2025
1 check passed
Labels
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), enhancement (new feature or request)
Development

Successfully merging this pull request may close these issues.

[Feature Request] Migrate D4RLExperienceReplay to MinariExperienceReplay following official deprecation
3 participants