The official implementation of "Mind the Gap: Offline Policy Optimization for Imperfect Rewards" (ICLR2023)
Updated Mar 3, 2023 - Python
# Mind-the-Gap
Mind the Gap aims to enhance Chain of Thought (CoT) tuning for better AI performance. Join us in exploring innovative solutions and contributing to the project! 🐙🌟