Official code for NeurIPS 2024 (poster): "A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective" on gym tasks.


A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

Official codebase for the paper A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective.

Overview

**TL;DR:** This paper presents Advantage-Aware Policy Optimization (A2PO), a novel method designed to improve offline reinforcement learning (RL) performance on datasets of varied quality. A2PO addresses the constraint conflicts common in mixed-quality offline datasets by learning an advantage-aware policy that uses a conditional variational autoencoder (CVAE) to disentangle the diverse behavior policies underlying the data. By conditioning on advantage values, A2PO selectively steers the policy toward high-value actions, surpassing conventional advantage-weighted methods in benchmark evaluations. Experimental results demonstrate A2PO's effectiveness in recovering near-optimal policies on both single- and mixed-quality datasets in comparison with existing offline RL baselines.
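The advantage-aware conditioning described above can be illustrated with a minimal sketch (hypothetical helper names, not the repository's actual code): the critic's advantage estimate A(s, a) = Q(s, a) - V(s) is squashed to a bounded range and used as the condition fed to the CVAE-based policy.

```python
import numpy as np

def advantage_condition(q_values, v_value):
    """Compute advantages A(s, a) = Q(s, a) - V(s) and squash them to
    (-1, 1) with tanh so they can serve as a bounded CVAE condition
    signal. Hypothetical illustration, not the repo's actual code."""
    adv = np.asarray(q_values, dtype=float) - float(v_value)
    return np.tanh(adv)

# Example: three candidate actions evaluated in one state
cond = advantage_condition([1.5, 0.5, -0.5], v_value=0.5)
# The highest-advantage action receives the condition closest to +1
best = int(np.argmax(cond))
```

Conditioning the policy on this scalar, rather than weighting the loss by it, is what lets A2PO disentangle behavior policies of different quality instead of averaging over them.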

Installation

Please execute the following command:

conda env create -f environment.yml
conda activate offrl

Datasets

Our experiments are performed on D4RL datasets. In addition, we ran experiments on more diverse datasets constructed from the original D4RL trajectories. These newly constructed datasets can be found at this link.

Running experiments

The paper's results can be reproduced with:

python main.py --env=<env_name> --seed=<seed_id>

To examine the influence of different components, the command can be extended as below:

python main.py --env=<env_name> --use_discrete=<Bool> --epsilon=<epsilon> --vae_step=<vae step> --seed=<seed_id>
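For reference, a sketch of how the flags shown above might be parsed with `argparse`; the flag names come from the commands in this README, but the types and defaults are assumptions, not the repository's actual code:

```python
import argparse

def build_parser():
    """Hypothetical parser for the A2PO command-line flags listed above.
    Types and defaults are illustrative assumptions."""
    p = argparse.ArgumentParser(description="A2PO runner (sketch)")
    p.add_argument("--env", type=str, required=True)       # D4RL env name
    p.add_argument("--seed", type=int, default=0)          # random seed
    # Booleans arrive as strings on the command line
    p.add_argument("--use_discrete", type=lambda s: s.lower() == "true",
                   default=False)
    p.add_argument("--epsilon", type=float, default=0.1)   # assumed default
    p.add_argument("--vae_step", type=int, default=100000) # assumed default
    return p

# Example invocation mirroring the ablation command above
args = build_parser().parse_args(
    ["--env", "halfcheetah-medium-v2", "--use_discrete", "True",
     "--epsilon", "0.2", "--seed", "1"]
)
```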

Citation

If you find this work useful for your research, please cite our paper:

@article{qing2024advantage,
  title={Advantage-Aware Policy Optimization for Offline Reinforcement Learning},
  author={Qing, Yunpeng and Cong, Jingyuan and Chen, Kaixuan and Zhou, Yihe and Song, Mingli and others},
  journal={arXiv preprint arXiv:2403.07262},
  year={2024}
}

Contact

Please feel free to contact us via email (qingyunpeng@zju.edu.cn, liushunyu@zju.edu.cn) if you are interested in our research :)
