Upgrade gymnasium to 1.0.0 #502

Draft
wants to merge 10 commits into
base: master

Conversation

@sdpkjc (Collaborator) commented Mar 2, 2025

Description

Upgrade gymnasium to 1.0.0

  • gymnasium classic control

    • c51.py
    • c51_jax.py
    • dqn.py
    • dqn_jax.py
    • ppo.py
    • pqn.py
  • gymnasium mujoco

    • ddpg_continuous_action.py
    • ddpg_continuous_action_jax.py
    • td3_continuous_action.py
    • td3_continuous_action_jax.py
    • sac_continuous_action.py
    • ppo_continuous_action.py
    • rpo_continuous_action.py
  • gymnasium atari (EpisodicLifeEnv conflicts with gymnasium v1.0.0's RecordEpisodeStatistics and will be fixed later.)

    • c51_atari.py
    • c51_atari_jax.py
    • dqn_atari.py
    • dqn_atari_jax.py
    • qdagger_dqn_atari_impalacnn.py
    • qdagger_dqn_atari_jax_impalacnn.py
    • sac_atari.py
    • ppo_atari.py
    • ppo_atari_lstm.py
    • ppo_atari_multigpu.py
  • envpool

    • ppo_rnd_envpool.py
    • pqn_atari_envpool_lstm.py
    • pqn_atari_envpool.py
    • ppo_atari_envpool.py
    • ppo_atari_envpool_xla_jax.py
    • ppo_atari_envpool_xla_jax_scan.py
  • other

    • ppg_procgen.py
    • ppo_pettingzoo_ma_atari.py
    • ppo_procgen.py
    • ppo_trxl.py
    • ppo_continuous_action_isaacgym.py

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel bot commented Mar 2, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
cleanrl ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 4, 2025 9:24am

@pseudo-rnd-thoughts (Collaborator) left a comment

Thanks for starting this @sdpkjc

Ale-py should be updated to v0.10.1

And the auto reset mode of the vector environment should be updated

pyproject.toml Outdated
-stable-baselines3 = "2.0.0"
-gymnasium = ">=0.28.1"
+stable-baselines3 = ">=2.4.0"
+gymnasium = ">=1.0.0"

Collaborator

Can this be specified as v1.1.0? If sb3 is the limitation, I can see whether I can update it.

Collaborator Author

Yes, sb3 depends on gymnasium <=1.0.0.

@pseudo-rnd-thoughts (Collaborator) commented Mar 3, 2025

If I remember correctly, SB3 is used for the replay buffer and the Atari wrappers. IMO, those features can probably be moved into cleanrl_utils directly; however, this should happen in a separate, later PR.

@sdpkjc changed the title from "Upgrade gymnasium to 1.1.0" to "Upgrade gymnasium to 1.0.0" on Mar 4, 2025


# Only for gymnasium v1.0.0
class SameModelSyncVectorEnv(gym.vector.SyncVectorEnv):

Collaborator

This should probably be called SameStepModeSyncVectorEnv or we just shift to gymnasium v1.1.0

@@ -12,13 +12,13 @@ license="MIT"
readme = "README.md"

[tool.poetry.dependencies]
-python = ">=3.8,<3.11"
+python = ">=3.9,<3.11"

Collaborator

What is limiting us from increasing this?

tensorboard = "^2.10.0"
wandb = "^0.13.11"
gym = "0.23.1"
torch = ">=1.12.1"
-stable-baselines3 = "2.0.0"
-gymnasium = ">=0.28.1"
+stable-baselines3 = "^2.4.0"

Collaborator

I suspect we will have a new release of sb3 with support for gymnasium v1.1.0 as no changes seem to be required on their end (DLR-RM/stable-baselines3#2095)

@MarcusBinderDTU

Hey, with the updates in gymnasium 1.1 would it not be easier to simply use the 'Same-Step Mode' or am I missing something?

Does it have to do with the support for the other wrappers that are only supported in the 'Next step' mode?

@pseudo-rnd-thoughts (Collaborator)

@MarcusBinderDTU it is more about minimising implementation changes.
The old implementation used same step, therefore, as this is a module update, we are trying to minimise code changes.
Moving to next step (which for the PPO implementations could be beneficial) would be a separate or

@MarcusBinderDTU

@MarcusBinderDTU it is more about minimising implementation changes. The old implementation used same step, therefore, as this is a module update, we are trying to minimise code changes. Moving to next step (which for the PPO implementations could be beneficial) would be a separate or

Thanks for the fast reply!

I agree, but I don't understand why we don't go directly to gymnasium 1.1 and then use autoreset_mode=gym.vector.AutoresetMode.SAME_STEP for the environment creation.

Would that not be the easiest way of doing it?

@sdpkjc (Collaborator, Author) commented Mar 20, 2025

Thanks for your suggestion. However, since the currently released version of sb3 depends on gymnasium < 1.1, we can’t upgrade to 1.1 directly. Once #505 is merged, we'll remove the sb3 dependency and then update to gymnasium 1.1, which will allow us to use autoreset_mode=gym.vector.AutoresetMode.SAME_STEP for environment creation. So this PR is on hold until #505 goes in.
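
For readers following the thread, the difference between the two autoreset modes discussed here can be illustrated with a toy sketch that does not use gymnasium at all. `ToyEnv`, `SameStepRunner`, and `NextStepRunner` are hypothetical names invented for this illustration; they only mimic the timing of the reset for a single environment and are not gymnasium's implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical counter env: the observation is the step count."""
    horizon: int = 3
    t: int = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        return self.t, self.t >= self.horizon  # (obs, terminated)


class SameStepRunner:
    """Resets inside the step that ends the episode (same-step behaviour)."""

    def __init__(self, env):
        self.env = env
        self.env.reset()

    def step(self):
        obs, terminated = self.env.step()
        info = {}
        if terminated:
            info["final_obs"] = obs  # last observation of the finished episode
            obs = self.env.reset()   # returned obs already belongs to the new episode
        return obs, terminated, info


class NextStepRunner:
    """Returns the terminal obs unchanged and resets on the following call."""

    def __init__(self, env):
        self.env = env
        self.env.reset()
        self.needs_reset = False

    def step(self):
        if self.needs_reset:
            self.needs_reset = False
            return self.env.reset(), False, {}  # this call only performs the reset
        obs, terminated = self.env.step()
        self.needs_reset = terminated
        return obs, terminated, {}


same = SameStepRunner(ToyEnv())
same_trace = [same.step() for _ in range(4)]

nxt = NextStepRunner(ToyEnv())
next_trace = [nxt.step() for _ in range(5)]
```

In the same-step trace, the transition that ends the episode already returns the first observation of the next episode, so the true terminal observation has to be read from the info dict. In the next-step trace, one extra call is spent on the reset.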

@MarcusBinderDTU

Ahh, now I see! Thanks for clarifying it, that makes sense :)

@jugheadjones10 commented May 8, 2025

@sdpkjc
For this line:
real_next_obs[idx] = infos["final_observation"][idx]
I think "final_observation" should be "final_obs".
I tried running it: "final_observation" raises a KeyError, but "final_obs" does not.
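
A minimal sketch of how the lookup could tolerate both key names is shown below. The helper name `real_next_observations` is hypothetical, plain lists stand in for arrays, and the assumption that entries for sub-environments that did not finish are `None` should be checked against the installed gymnasium version.

```python
def real_next_observations(next_obs, truncations, infos):
    """Hypothetical helper: use the final observation for truncated sub-envs."""
    # Prefer the newer key reported in this thread, fall back to the older one.
    final_obs = infos.get("final_obs")
    if final_obs is None:
        final_obs = infos.get("final_observation")

    real_next_obs = list(next_obs)  # shallow copy so the caller's batch is untouched
    if final_obs is None:
        return real_next_obs  # no episode ended in this step

    for idx, truncated in enumerate(truncations):
        # Assumption: unfinished sub-envs carry None in the final-observation batch.
        if truncated and final_obs[idx] is not None:
            real_next_obs[idx] = final_obs[idx]
    return real_next_obs


batch = [[0.1, 0.2], [0.0, 0.0]]
new_style = real_next_observations(batch, [False, True], {"final_obs": [None, [9.0, 9.0]]})
old_style = real_next_observations(batch, [False, True], {"final_observation": [None, [7.0, 7.0]]})
untouched = real_next_observations(batch, [False, False], {})
```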

@jugheadjones10

@sdpkjc I noticed that the scripts in cleanrl_utils/evals also still use the old way of getting episodic returns:

if "final_info" in infos:
    for info in infos["final_info"]:
        if "episode" not in info:

So we might need to update all the files in this directory too!
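
One possible shape for the updated check is sketched below. It assumes a dict-of-arrays layout in which `infos["episode"]` holds per-environment sequences `"r"` (return) and `"l"` (length), and `infos["_episode"]` is a boolean mask marking which entries are valid. That layout is an assumption about gymnasium v1.0.0 and is not confirmed in this thread; `finished_episode_returns` is a hypothetical helper name.

```python
def finished_episode_returns(infos):
    """Hypothetical helper: (env_index, return, length) for episodes that just ended."""
    if "episode" not in infos:
        return []
    stats = infos["episode"]
    # Assumption: "_episode" masks the valid entries; without it, treat all as valid.
    mask = infos.get("_episode", [True] * len(stats["r"]))
    return [
        (idx, float(stats["r"][idx]), int(stats["l"][idx]))
        for idx, finished in enumerate(mask)
        if finished
    ]


example_infos = {
    "episode": {"r": [0.0, 21.0], "l": [0, 200]},
    "_episode": [False, True],
}
finished = finished_episode_returns(example_infos)
nothing = finished_episode_returns({})
```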

4 participants