
Question: please tell me more about how --batch-size affects prompt ingestion #2463

Answered by slaren
vmajor asked this question in Q&A

When a token is evaluated, its keys and values are stored in the KV cache, and all previously evaluated tokens are taken into account when generating a new token. It doesn't matter whether you evaluate the prompt one token at a time or in batches of any size: the result in the KV cache is the same.
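A minimal sketch of that property, assuming a toy two-layer attention model rather than llama.cpp's actual code: `make_weights`, `eval_batch`, `ingest`, `DIM` and `N_LAYERS` are hypothetical names, and `batch_size` only plays the role of `--batch-size`. The toy has no residual connections, MLPs or positional encodings, and each token's projections are computed separately, so the cached floats match exactly here; real batched kernels may accumulate in a different order and differ slightly by rounding.

```python
import math
import random

DIM = 4       # toy hidden size (assumption)
N_LAYERS = 2  # toy depth (assumption)


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def make_weights(seed=0):
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

    return [{"q": mat(), "k": mat(), "v": mat()} for _ in range(N_LAYERS)]


def attend(q, keys, values):
    # Softmax-weighted sum of the cached values at the given positions.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum(e * v[d] for e, v in zip(exps, values)) / total for d in range(DIM)]


def eval_batch(weights, cache, batch):
    """Evaluate a batch of token vectors, appending their K/V to the cache."""
    hidden = batch
    for layer, w in enumerate(weights):
        keys, values = cache[layer]
        start = len(keys)
        # Store K and V for every token in this batch.
        for x in hidden:
            keys.append(matvec(w["k"], x))
            values.append(matvec(w["v"], x))
        # Causal mask: the token at position start+i sees positions 0..start+i only.
        hidden = [
            attend(matvec(w["q"], x), keys[: start + i + 1], values[: start + i + 1])
            for i, x in enumerate(hidden)
        ]
    return hidden


def ingest(weights, prompt, batch_size):
    """Feed the whole prompt in chunks of batch_size; return the per-layer KV cache."""
    cache = [([], []) for _ in weights]
    for i in range(0, len(prompt), batch_size):
        eval_batch(weights, cache, prompt[i : i + batch_size])
    return cache


if __name__ == "__main__":
    weights = make_weights()
    rng = random.Random(1)
    prompt = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(10)]
    reference = ingest(weights, prompt, batch_size=1)
    for bs in (2, 3, 10, 512):
        print(bs, ingest(weights, prompt, bs) == reference)  # expected: True
```

Batching only changes how many tokens pass through each step together; the causal mask still limits every token to the same earlier positions, which is why the cache comes out identical.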

Replies: 1 comment 5 replies

@BarfingLemurs
@slaren
@vmajor
@slaren
Answer selected by vmajor
@vmajor