My understanding is that unless --batch-size matches the prompt length, the model will not evaluate all of the prompt tokens and may therefore not consider all the information in the prompt when generating a response. In one particular use case, I feed the model a list of news summaries and want an overall conclusion based on all of them. If I use a --batch-size shorter than the prompt length, the model will not actually look at all the summaries; instead it will slide a --batch-size window across the prompt and summarise only the tokens that fit inside that window. If this is correct, then --batch-size is highly problematic: it would in effect be no different from using embeddings and a search over a vector store, just without actually creating and indexing the embeddings.
When a token is evaluated, the result is stored in the KV cache, and all previously evaluated tokens are considered when generating a new token. It doesn't matter whether you evaluate the prompt one token at a time or in batches of any size; the resulting KV cache is the same.
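A minimal sketch of the idea, assuming toy dimensions and stand-in key/value projections (plain NumPy, not llama.cpp's actual implementation): the batch size only controls how many tokens are appended to the KV cache per step, so the final cache, and therefore what later tokens attend to, is identical whether the prompt is processed one token at a time or in larger batches.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt = 8, 16                      # hypothetical toy sizes
W_k = rng.normal(size=(d_model, d_model))      # stand-in key projection
W_v = rng.normal(size=(d_model, d_model))      # stand-in value projection
prompt = rng.normal(size=(n_prompt, d_model))  # pretend token embeddings

def evaluate(tokens, batch_size):
    """Append keys/values to a KV cache, batch_size tokens at a time."""
    k_cache, v_cache = [], []
    for start in range(0, len(tokens), batch_size):
        chunk = tokens[start:start + batch_size]  # current batch of tokens
        k_cache.append(chunk @ W_k)               # keys for this batch
        v_cache.append(chunk @ W_v)               # values for this batch
    return np.vstack(k_cache), np.vstack(v_cache)

# Evaluate the same prompt token-by-token and in batches of 4.
k_one, v_one = evaluate(prompt, batch_size=1)
k_big, v_big = evaluate(prompt, batch_size=4)

# The caches match, so attention over them (and thus generation) is the same.
assert np.allclose(k_one, k_big) and np.allclose(v_one, v_big)
print("KV caches identical regardless of batch size")
```

In other words, --batch-size is a throughput/memory knob for prompt evaluation, not a window over which information the model can see.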