Skip to content
Discussion options

You must be logged in to vote

Thank you very much for your insightful question.

Variable Definitions

  1. max_blocks

    • Refers to the range of timesteps considered during all phases of training and testing.
    • Note: Each timestep comprises two tokens (one for the state and one for the action). Therefore, max_tokens = max_blocks × 2.
  2. num_unroll_steps

    • Represents the length of the sequence unrolled during training of the value-equivalent model.
    • It should be less than or equal to max_blocks.
    • If num_unroll_steps is less than max_blocks, part of the horizon will not be trained—which is equivalent to setting num_unroll_steps equal to max_blocks. In most cases, we simply set num_unroll_steps = max_blocks.
  3. context_length

    • Indic…

Replies: 1 comment

Comment options

You must be logged in to vote
0 replies
Answer selected by ryanon4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants