Hello, very impressive work here!

I am trying to understand the inference process. My understanding is that we could refine previous input tokens, since r_k, the current prediction, attends to all r_i for i = 1, ..., k-1. How can I modify the inference process to refine previous tokens? Do I need to disable caching and pass the entire context to the transformer at each iteration?

Thank you for your help!
@matt-bendel
Thank you for your interest in our work and for this thoughtful question!
You're absolutely right about the attention mechanism: at step k, the model can attend to all previous tokens r₁ through rₖ₋₁. However, the model doesn't actually modify or "refine" previous tokens during inference. Instead, it uses the existing sequence (all previous scale tokens) purely as context to predict the next scale token in the sequence.
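To make this concrete, here is a minimal sketch of a cached next-scale sampling loop. Note that `model`, `sample`, `start_token`, and `use_cache` are generic stand-ins, not the actual repo code; the real forward also builds each input from upsampled previous scale maps plus the condition token.

```python
# Minimal sketch (placeholder names, not the actual VAR API): previously
# generated scale tokens r_1 .. r_{k-1} serve purely as frozen context via
# the KV cache; they are never re-sampled or refined.
import torch

@torch.no_grad()
def generate_scales(model, start_token, num_scales, sample):
    context = [start_token]                  # scale tokens produced so far
    for _ in range(num_scales - 1):
        # With KV caching, only the newest token map is fed forward; attention
        # still reaches every earlier scale through the cached keys/values.
        logits = model(context[-1], use_cache=True)
        r_k = sample(logits)                 # predict the next scale token map
        context.append(r_k)                  # appended once, never revisited
    return context
```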
A quick follow-up question. Suppose I have a ground-truth (GT) sequence of scale tokens, e.g., r_seq = [r1, r2, r3], and I want to use it as the initial context for VAR. Is the appropriate approach to use logic like that in the VAR forward function to construct the input for predicting r4? With this approach, do I need to pass the full context to the model at each iteration, or could I still use caching?
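For concreteness, here is roughly what I have in mind, using the same placeholder interface as the sketch above (`model`, `sample`, `use_cache` are assumptions, not the repo's actual signatures): run the GT scales through the model once to populate the cache, then continue sampling as usual.

```python
# Rough sketch of what I mean by "priming" with a GT prefix; simplified,
# since the real VAR forward builds each input from upsampled previous
# scale maps plus the condition token.
import torch

@torch.no_grad()
def generate_from_prefix(model, gt_prefix, num_scales, sample):
    # 1) Prefill: push the known scales (e.g., r_1..r_3) through the model
    #    once so their keys/values populate the cache; the last logits
    #    already predict the next scale (r_4).
    logits = None
    for r_k in gt_prefix:
        logits = model(r_k, use_cache=True)

    # 2) Continue as in normal cached inference: sample one new scale at a
    #    time, feeding only the newest token map each step.
    context = list(gt_prefix)
    for _ in range(num_scales - len(gt_prefix)):
        r_k = sample(logits)
        context.append(r_k)
        logits = model(r_k, use_cache=True)
    return context
```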