Commit 257b00e

zou3519 authored and yangw-dev committed
Disable remote caching when calling compile_fx (vllm-project#16611)
Signed-off-by: rzou <zou3519@gmail.com> Signed-off-by: Yang Wang <elainewy@meta.com>
1 parent 6d0994d commit 257b00e

1 file changed: +13 -0 lines changed

vllm/compilation/compiler_interface.py

Lines changed: 13 additions & 0 deletions
@@ -290,6 +290,19 @@ def _get_shape_env() -> AlwaysHitShapeEnv:
             # Dynamo metrics context, see method for more details.
             stack.enter_context(self.metrics_context())
 
+            # Disable remote caching. When these are on, on remote cache-hit,
+            # the monkey-patched functions never actually get called.
+            # vLLM today assumes and requires the monkey-patched functions to
+            # get hit.
+            # TODO(zou3519): we're going to replace this all with
+            # standalone_compile sometime.
+            if is_torch_equal_or_newer("2.6"):
+                stack.enter_context(
+                    torch._inductor.config.patch(fx_graph_remote_cache=False))
+                stack.enter_context(
+                    torch._functorch.config.patch(
+                        enable_remote_autograd_cache=False))
+
             compiled_graph = compile_fx(
                 graph,
                 example_inputs,
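
The scoping pattern in this hunk can be sketched without torch. The following is only an illustration under stated assumptions: `FakeConfig` and `compile_with_local_cache_only` are hypothetical names invented here, and `FakeConfig.patch` merely imitates a context manager that overrides settings and restores them on exit. It is not the torch implementation. The point it shows is that overrides entered on an `ExitStack` apply only while the compile call runs and are undone afterwards.

```python
from contextlib import ExitStack, contextmanager


class FakeConfig:
    """Hypothetical stand-in for a torch config module (not a real torch API)."""

    def __init__(self, **defaults):
        self.__dict__.update(defaults)

    @contextmanager
    def patch(self, **overrides):
        # Remember the current values, apply the overrides, restore on exit.
        saved = {name: getattr(self, name) for name in overrides}
        self.__dict__.update(overrides)
        try:
            yield
        finally:
            self.__dict__.update(saved)


# Assumed defaults for the illustration: both remote caches start enabled.
inductor_config = FakeConfig(fx_graph_remote_cache=True)
functorch_config = FakeConfig(enable_remote_autograd_cache=True)


def compile_with_local_cache_only(supports_remote_cache: bool) -> tuple:
    """Return the flag values seen by the (simulated) compile call."""
    with ExitStack() as stack:
        # The flag stands in for the is_torch_equal_or_newer("2.6") check.
        if supports_remote_cache:
            stack.enter_context(
                inductor_config.patch(fx_graph_remote_cache=False))
            stack.enter_context(
                functorch_config.patch(enable_remote_autograd_cache=False))
        # The real code calls compile_fx(...) here; this sketch only
        # reports the flags that such a call would observe.
        return (inductor_config.fx_graph_remote_cache,
                functorch_config.enable_remote_autograd_cache)


seen = compile_with_local_cache_only(True)
```

Because the overrides are registered on the stack rather than assigned globally, both flags return to their previous values once the `with` block exits, so code outside the compile call is unaffected.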
