Transformer-based-Any-step-Dynamics-Model-for-MBRL

This project introduces TADM (Transformer-based Any-step Dynamics Model), which combines the flexible long-range prediction of Any-step Dynamics Models [9] with the representational power of Transformers [12]. By conditioning on an anchor state and a sequence of actions, TADM avoids the error buildup of step-by-step prediction and removes the need for recurrent networks.

For anyone interested in this project, please feel free to read the project report. The code is largely inherited from https://github.com/HxLyn3/ADMPO (the ADMPO implementation accompanying [9]).
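To make the prediction scheme concrete, below is a minimal PyTorch sketch of the idea: the anchor state and the action sequence are embedded as a single token sequence, encoded in one Transformer pass, and the k-step-ahead state is read out directly. Everything here (the class name, token layout, and dimensions) is an illustrative assumption, not the repository's actual API; see the project report and code for the real model.

```python
import torch
import torch.nn as nn

class TADM(nn.Module):
    """Sketch: predict the state k steps ahead from an anchor state plus k actions."""

    def __init__(self, state_dim, action_dim, d_model=128, nhead=4,
                 num_layers=2, max_len=32):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)
        self.action_embed = nn.Linear(action_dim, d_model)
        self.pos_embed = nn.Embedding(max_len + 1, d_model)  # anchor + up to max_len actions
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, anchor_state, actions):
        # anchor_state: (B, state_dim); actions: (B, k, action_dim)
        tokens = torch.cat([self.state_embed(anchor_state).unsqueeze(1),
                            self.action_embed(actions)], dim=1)  # (B, k+1, d_model)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        tokens = tokens + self.pos_embed(positions)
        encoded = self.encoder(tokens)    # one parallel pass, no recurrence
        return self.head(encoded[:, -1])  # read the prediction off the last token

# A k-step prediction is a single forward call, so errors from k chained
# one-step predictions never accumulate.
model = TADM(state_dim=17, action_dim=6)
s_t = torch.randn(8, 17)      # batch of anchor states
a_seq = torch.randn(8, 5, 6)  # k = 5 actions per trajectory
s_pred = model(s_t, a_seq)    # predicted s_{t+5}, shape (8, 17)
```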


References:

[1] F.-M. Luo, T. Xu, H. Lai, X.-H. Chen, W. Zhang, and Y. Yu. A survey on model-based reinforcement learning. Science China Information Sciences, 67(2):121101, 2024.

[2] D. Ha and J. Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018.

[3] M. Bhardwaj, T. Xie, B. Boots, N. Jiang, and C.-A. Cheng. Adversarial model for offline reinforcement learning. Advances in Neural Information Processing Systems, 36:1245–1269, 2023.

[4] X.-H. Chen, Y. Yu, Z. Zhu, Z. Yu, Z. Chen, C. Wang, Y. Wu, R.-J. Qin, H. Wu, R. Ding, et al. Adversarial counterfactual environment model learning. Advances in Neural Information Processing Systems, 36:70654–70706, 2023.

[5] Z.-M. Zhu, X.-H. Chen, H.-L. Tian, K. Zhang, and Y. Yu. Offline reinforcement learning with causal structured world models. arXiv preprint arXiv:2206.01474, 2022.

[6] T. Yu, G. Thomas, L. Yu, S. Ermon, J. Y. Zou, S. Levine, C. Finn, and T. Ma. MOPO: Model-based offline policy optimization. Advances in Neural Information Processing Systems, 33:14129–14142, 2020.

[7] M. Janner, J. Fu, M. Zhang, and S. Levine. When to trust your model: Model-based policy optimization. Advances in Neural Information Processing Systems, 32, 2019.

[8] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. A Bradford Book, 2018.

[9] H. Lin, Y.-Y. Xu, Y. Sun, Z. Zhang, Y.-C. Li, C. Jia, J. Ye, J. Zhang, and Y. Yu. Any-step dynamics model improves future predictions for online and offline reinforcement learning. arXiv preprint arXiv:2405.17031, 2024.

[10] C. Chen, J. Yoon, Y.-F. Wu, and S. Ahn. TransDreamer: Reinforcement learning with transformer world models, 2022. URL: https://openreview.net/forum?id=s3K0arSRl4d.

[11] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603, 2019.

[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
