[CB] Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching #255

@yannicks1

Reduce wastage in prefill compute and pad blocks in homogeneous continuous batching

Implement the optimization idea by @JRosenkranz: run prefill only up to the next multiple of the block size, and then during decode pad with a (valid) block ID. This reduces prefill compute and does not waste any valid block IDs when whole blocks are padded to make tkv homogeneous.
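
A rough sketch of the idea, not the actual implementation: the names `BLOCK_SIZE`, `padded_prefill_len`, and `pad_block_table` are hypothetical, the block size of 64 is an assumed value, and the sketch assumes pad blocks go on the left of the block table and that attention masks out the padded positions.

```python
BLOCK_SIZE = 64  # assumed value, for illustration only


def padded_prefill_len(prompt_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Round the prompt length up to the next multiple of the block size."""
    if prompt_len <= 0:
        raise ValueError("prompt_len must be positive")
    return ((prompt_len + block_size - 1) // block_size) * block_size


def pad_block_table(own_blocks: list[int], target_tkv: int,
                    block_size: int = BLOCK_SIZE) -> list[int]:
    """Left-pad a block table so that it spans target_tkv tokens.

    Pad entries reuse a block ID the sequence already owns, so no
    additional block is allocated just to hold padding.
    """
    if target_tkv % block_size != 0:
        raise ValueError("target_tkv must be a multiple of block_size")
    n_pad = target_tkv // block_size - len(own_blocks)
    if n_pad < 0:
        raise ValueError("sequence is longer than target_tkv")
    # Assumption: the padded positions are masked, so the contents of
    # the reused block are never attended to.
    return [own_blocks[0]] * n_pad + list(own_blocks)


# A 100-token prompt is prefilled on 128 tokens (2 blocks).
prefill_len = padded_prefill_len(100)
# Aligning it to a batch tkv of 256 adds 2 pad entries, 0 new blocks.
table = pad_block_table([7, 8], target_tkv=256)
```

Here `table` holds four entries but references only the two block IDs the sequence already owns, which is the saving the issue describes.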

related PR #262
