[RFC]: vLLM-compile low-hanging fruit cold start improvements #20451

@zou3519

Description

Motivation.

This issue tracks potential low-hanging fruit for improving vLLM-compile cold start time. @anijain2305, @BoyuanFeng, and I sat down to look at some traces and noticed some things we can improve.

There are also longer-term projects for improving torch.compile cold start time, but those will probably take a while to land.

Proposed Change.

  • vLLM's custom bytecode hook seems to take a long time (~7 seconds on the llama-3.1-70b model). I'm not sure how much of this is actually needed for runtime execution. We should guard the decompilation step behind an env var: if VLLM_COMPILE_DEPYF=0 (the default), we write out a transformed_code.py containing only a comment that says "Please set VLLM_COMPILE_DEPYF=1 to populate this file" (see the sketch after this list).
  • In llama-3.1-70b, with piecewise cudagraphs, we split the module into ~80 subgraphs, and a lot of these subgraphs are literally identical. However, subgraphs 2-79 (approximately) cache-hit in fx_graph_cache but cache-miss in AOTAutogradCache. This needs more investigation into why they miss there (see the counter-checking sketch below).
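
A minimal sketch of the guard proposed in the first bullet, assuming the bytecode hook's expensive step is a depyf.decompile call that produces transformed_code.py. VLLM_COMPILE_DEPYF and the dump_transformed_code helper are the proposal/illustration here, not existing vLLM flags or functions:

```python
import os

import depyf  # decompiler used when populating transformed_code.py


def dump_transformed_code(code_obj, path: str) -> None:
    """Write transformed_code.py, skipping the slow decompilation by default."""
    if os.environ.get("VLLM_COMPILE_DEPYF", "0") == "1":
        # Decompilation is the expensive part (~7 s on llama-3.1-70b),
        # so only run it when explicitly requested.
        src = depyf.decompile(code_obj)
    else:
        src = ("# Decompilation skipped for faster cold start.\n"
               "# Please set VLLM_COMPILE_DEPYF=1 to populate this file.\n")
    with open(path, "w") as f:
        f.write(src)
```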

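One possible starting point for the second bullet is to compare PyTorch's internal cache counters after a cold compile. The counter names below (fxgraph_cache_hit/miss for fx_graph_cache, autograd_cache_hit/miss for AOTAutogradCache) are an assumption based on recent PyTorch versions and may differ across releases; the toy compiled function is only there to trigger the caches:

```python
import torch
from torch._dynamo.utils import counters


@torch.compile
def f(x):
    # Stand-in for a compiled vLLM subgraph.
    return torch.relu(x) + 1


f(torch.randn(8))  # trigger compilation (caches may need enabling via config)

# Compare how the two cache layers behaved for the compiled graphs.
print("fx_graph_cache:   hits=%d misses=%d" % (
    counters["inductor"]["fxgraph_cache_hit"],
    counters["inductor"]["fxgraph_cache_miss"],
))
print("AOTAutogradCache: hits=%d misses=%d" % (
    counters["aot_autograd"]["autograd_cache_hit"],
    counters["aot_autograd"]["autograd_cache_miss"],
))
```
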
Feedback Period.

7/2-7/11, but really, anytime until these things are fixed.

CC List.

cc @ProExpertProg @youkaichao @WoosukKwon @jamesjwu @zhxchen17

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
