Commit aaa4ac1

Disable prefix cache by default for benchmark (vllm-project#18639)
Signed-off-by: cascade812 <cascade812@outlook.com>
1 parent 06a0338 commit aaa4ac1

2 files changed: 6 additions & 0 deletions

benchmarks/benchmark_latency.py

Lines changed: 3 additions & 0 deletions
@@ -189,5 +189,8 @@ def run_to_completion(profile_dir: Optional[str] = None):
     )
 
     parser = EngineArgs.add_cli_args(parser)
+    # V1 enables prefix caching by default which skews the latency
+    # numbers. We need to disable prefix caching by default.
+    parser.set_defaults(enable_prefix_caching=False)
     args = parser.parse_args()
     main(args)

vllm/benchmarks/latency.py

Lines changed: 3 additions & 0 deletions
@@ -80,6 +80,9 @@ def add_cli_args(parser: argparse.ArgumentParser):
     )
 
     parser = EngineArgs.add_cli_args(parser)
+    # V1 enables prefix caching by default which skews the latency
+    # numbers. We need to disable prefix caching by default.
+    parser.set_defaults(enable_prefix_caching=False)
 
 
 def main(args: argparse.Namespace):
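
Both hunks rely on argparse's `set_defaults` to change a flag's default after the flag has already been registered. The sketch below illustrates that mechanism in isolation. It is not vLLM code: `build_parser` is a hypothetical helper, and the `--enable-prefix-caching` flag is re-declared here with `argparse.BooleanOptionalAction` (Python 3.9+) purely as a stand-in for whatever `EngineArgs.add_cli_args` registers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the benchmark's parser setup."""
    parser = argparse.ArgumentParser()
    # Assumption: the engine registers the flag with a default of True,
    # mirroring "V1 enables prefix caching by default".
    parser.add_argument(
        "--enable-prefix-caching",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    # Parser-level defaults override argument-level defaults, so the
    # benchmark now starts with prefix caching off.
    parser.set_defaults(enable_prefix_caching=False)
    return parser


# No flag given: the overridden default applies.
default_args = build_parser().parse_args([])
# Flag given explicitly: the user's choice still wins over the default.
explicit_args = build_parser().parse_args(["--enable-prefix-caching"])

print(default_args.enable_prefix_caching, explicit_args.enable_prefix_caching)
```

Under these assumptions, only the default changes: a user who wants to benchmark with prefix caching can still opt in on the command line.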
