I'm wondering why we need to initialize the cache for every block here. Couldn't we reuse the `past_key_values` from the previous model inference for every block after the first iteration of the loop? Thanks!
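To make the question concrete, here is a toy sketch that is not the repository's code. It compares rebuilding the cache over the whole prefix before each block with extending one cache by only the most recently finished block. The names `toy_kv`, `decode_reinit`, and `decode_reuse` are hypothetical. The sketch assumes that the keys and values of finished blocks do not change once those blocks are decoded, and that assumption may not hold for this model.

```python
from typing import List, Tuple

KV = Tuple[int, int]


def toy_kv(token: int) -> KV:
    """Hypothetical stand-in for computing one token's (key, value) pair."""
    return (token * 2, token * 3)


def decode_reinit(blocks: List[List[int]]) -> Tuple[List[KV], int]:
    """Current pattern (as I understand it): rebuild the cache before every block."""
    calls = 0
    prefix: List[int] = []
    cache: List[KV] = []
    for block in blocks:
        cache = []  # fresh cache for this block
        for tok in prefix:  # recompute the entire finished prefix
            cache.append(toy_kv(tok))
            calls += 1
        prefix.extend(block)  # assume the block is finalized after decoding
    return cache, calls


def decode_reuse(blocks: List[List[int]]) -> Tuple[List[KV], int]:
    """Proposed pattern: keep one cache and append only the previous block."""
    calls = 0
    cache: List[KV] = []
    for i in range(len(blocks)):
        if i > 0:
            # Assumes earlier entries are still valid, so only the block
            # finished in the previous iteration needs new entries.
            for tok in blocks[i - 1]:
                cache.append(toy_kv(tok))
                calls += 1
    return cache, calls


blocks = [[1, 2], [3, 4], [5, 6]]
print(decode_reinit(blocks))
print(decode_reuse(blocks))
```

Under this assumption, both versions end up with the same cache for the last block. The reuse version computes 4 entries instead of 6, and the gap grows as the number of blocks increases.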