What are the recommended strategies for configuring Unizero in long-horizon planning scenarios? #377
I'm interested in configuring Unizero for longer-horizon planning tasks. In particular, I'd like to know whether there are any known rules of thumb, practical guidelines, or prior experience when adjusting hyperparameters like
For example, in the default CartPole setup:
When increasing
From my own experiments:
Open Questions:
Understandably, CartPole isn’t an ideal environment to test the real benefits of long-horizon credit assignment (the episodes are short and reward is immediate), but it serves as a clean and simple testbed to experiment with config changes before moving on to more complex environments.
I would love to hear from anyone who's experimented with:
This was written by a human, but was edited with an LLM to improve English readability.
Replies: 1 comment
Thank you very much for your insightful question.

Variable Definitions
max_blocks
num_unroll_steps
context_length

Responses to Your Questions (For Discussion Only)

I hope these clarifications and responses help address your questions.
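
To make it easier to reason about `max_blocks`, `num_unroll_steps`, and `context_length` together when scaling up the horizon, here is a minimal sketch of a helper that derives all three from two inputs. Everything in it is an assumption for discussion, not taken from the UniZero source: the helper name `derive_horizon_config`, the `infer_context_steps` parameter, the idea that each timestep contributes two tokens (one observation, one action), and the constraint that the inference context should not exceed the unroll length. Please check these relationships against the config file in your own checkout before relying on them.

```python
from dataclasses import dataclass

# Assumption: each environment timestep is encoded as one observation token
# plus one action token. Verify this against your UniZero version.
TOKENS_PER_STEP = 2


@dataclass(frozen=True)
class HorizonConfig:
    """The three horizon-related values named in this thread."""
    num_unroll_steps: int  # timesteps unrolled per training sample
    max_blocks: int        # timesteps (blocks) the world model can hold
    context_length: int    # tokens kept as context at inference time


def derive_horizon_config(num_unroll_steps: int, infer_context_steps: int) -> HorizonConfig:
    """Hypothetical helper: scale the three values together.

    `infer_context_steps` is an illustrative parameter (timesteps of context
    at inference), not a confirmed UniZero config key.
    """
    if num_unroll_steps < 1 or infer_context_steps < 1:
        raise ValueError("both step counts must be positive")
    # Assumed constraint: the model cannot attend over more timesteps at
    # inference than it was unrolled over during training.
    if infer_context_steps > num_unroll_steps:
        raise ValueError("infer_context_steps must not exceed num_unroll_steps")
    return HorizonConfig(
        num_unroll_steps=num_unroll_steps,
        max_blocks=num_unroll_steps,
        context_length=TOKENS_PER_STEP * infer_context_steps,
    )


if __name__ == "__main__":
    # Example: compare a short-horizon and a longer-horizon setting.
    for unroll, ctx in [(10, 4), (20, 8)]:
        print(derive_horizon_config(unroll, ctx))
```

Deriving the values from one place keeps them consistent while you sweep the horizon, instead of editing each one by hand and risking a mismatch between the training unroll and the inference context.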