Commit a96fec4

[Doc] Guide for Incremental Compilation Workflow (vllm-project#19109)
1 parent aa538ff commit a96fec4

File tree

4 files changed: +313 −0 lines changed

docs/contributing/README.md

Lines changed: 3 additions & 0 deletions

@@ -29,6 +29,8 @@ See <gh-file:LICENSE>.

Depending on the kind of development you'd like to do (e.g. Python, CUDA), you can choose to build vLLM with or without compilation.
Check out the [building from source][build-from-source] documentation for details.

+ For an optimized workflow when iterating on C++/CUDA kernels, see the [Incremental Compilation Workflow](./incremental_build.md) for recommendations.

### Building the docs with MkDocs

#### Introduction to MkDocs

@@ -188,6 +190,7 @@ The PR needs to meet the following code quality standards:

### Adding or Changing Kernels

+ When actively developing or modifying kernels, using the [Incremental Compilation Workflow](./incremental_build.md) is highly recommended for faster build times.

Each custom kernel needs a schema and one or more implementations to be registered with PyTorch.

- Make sure custom ops are registered following PyTorch guidelines:
docs/contributing/incremental_build.md

Lines changed: 138 additions & 0 deletions

# Incremental Compilation Workflow for vLLM Development

When working on vLLM's C++/CUDA kernels located in the `csrc/` directory, recompiling the entire project with `uv pip install -e .` for every change can be time-consuming. An incremental compilation workflow using CMake allows for faster iteration by recompiling only the necessary components after an initial setup. This guide details how to set up and use such a workflow, which complements your editable Python installation.

## Prerequisites

Before setting up the incremental build:

1. **vLLM Editable Install:** Ensure you have vLLM installed from source in editable mode. Using pre-compiled wheels for the initial editable setup can be faster, as the CMake workflow will handle subsequent kernel recompilations.

    ```console
    uv venv --python 3.12 --seed
    source .venv/bin/activate
    VLLM_USE_PRECOMPILED=1 uv pip install -U -e . --torch-backend=auto
    ```

2. **CUDA Toolkit:** Verify that the NVIDIA CUDA Toolkit is correctly installed and that `nvcc` is accessible in your `PATH`. CMake relies on `nvcc` to compile CUDA code. You can typically find `nvcc` at `$CUDA_HOME/bin/nvcc` or by running `which nvcc`. If you encounter issues, refer to the [official CUDA Toolkit installation guides](https://developer.nvidia.com/cuda-toolkit-archive) and vLLM's main [GPU installation documentation](../getting_started/installation/gpu/cuda.inc.md#troubleshooting) for troubleshooting. The `CMAKE_CUDA_COMPILER` variable in your `CMakeUserPresets.json` should also point to your `nvcc` binary.

3. **Build Tools:** It is highly recommended to install `ccache` for fast rebuilds by caching compilation results (e.g., `sudo apt install ccache` or `conda install ccache`). Also ensure that the core build dependencies `cmake` and `ninja` are installed. These are installable through `requirements/build.txt` or your system's package manager.

    ```console
    uv pip install -r requirements/build.txt --torch-backend=auto
    ```
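Before moving on, it can help to confirm that the prerequisite tools are actually discoverable. The sketch below is not part of vLLM; it is a minimal standard-library check whose tool list simply mirrors the prerequisites above:

```python
from shutil import which


def missing_tools(tools):
    """Return the subset of the given tools that cannot be found on PATH."""
    return [t for t in tools if which(t) is None]


# Tools this workflow relies on; ccache/sccache are optional but recommended.
required = ["nvcc", "cmake", "ninja"]
optional = ["ccache", "sccache"]

print("Missing required tools:", missing_tools(required))
print("Missing optional tools:", missing_tools(optional))
```

If any required tool is reported missing, revisit the corresponding prerequisite before configuring CMake.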
## Setting up the CMake Build Environment

The incremental build process is managed through CMake. You can configure your build settings using a `CMakeUserPresets.json` file at the root of the vLLM repository.

### Generate `CMakeUserPresets.json` using the helper script

To simplify the setup, vLLM provides a helper script that attempts to auto-detect your system's configuration (such as the CUDA path, Python environment, and CPU core count) and generates the `CMakeUserPresets.json` file for you.

**Run the script:**

Navigate to the root of your vLLM clone and execute the following command:

```console
python tools/generate_cmake_presets.py
```

The script will prompt you if it cannot automatically determine certain paths (e.g., `nvcc` or a specific Python executable for your vLLM development environment). Follow the on-screen prompts. If an existing `CMakeUserPresets.json` is found, the script will ask for confirmation before overwriting it.

After running the script, a `CMakeUserPresets.json` file will be created in the root of your vLLM repository.

### Example `CMakeUserPresets.json`

Below is an example of what the generated `CMakeUserPresets.json` might look like. The script will tailor these values based on your system and any input you provide.

```json
{
    "version": 6,
    "cmakeMinimumRequired": {
        "major": 3,
        "minor": 26,
        "patch": 1
    },
    "configurePresets": [
        {
            "name": "release",
            "generator": "Ninja",
            "binaryDir": "${sourceDir}/cmake-build-release",
            "cacheVariables": {
                "CMAKE_CUDA_COMPILER": "/usr/local/cuda/bin/nvcc",
                "CMAKE_C_COMPILER_LAUNCHER": "ccache",
                "CMAKE_CXX_COMPILER_LAUNCHER": "ccache",
                "CMAKE_CUDA_COMPILER_LAUNCHER": "ccache",
                "CMAKE_BUILD_TYPE": "Release",
                "VLLM_PYTHON_EXECUTABLE": "/home/user/venvs/vllm/bin/python",
                "CMAKE_INSTALL_PREFIX": "${sourceDir}",
                "CMAKE_CUDA_FLAGS": "",
                "NVCC_THREADS": "4",
                "CMAKE_JOB_POOLS": "compile=32"
            }
        }
    ],
    "buildPresets": [
        {
            "name": "release",
            "configurePreset": "release",
            "jobs": 32
        }
    ]
}
```

**What do the various configurations mean?**

- `CMAKE_CUDA_COMPILER`: Path to your `nvcc` binary. The script attempts to find this automatically.
- `CMAKE_C_COMPILER_LAUNCHER`, `CMAKE_CXX_COMPILER_LAUNCHER`, `CMAKE_CUDA_COMPILER_LAUNCHER`: Setting these to `ccache` (or `sccache`) significantly speeds up rebuilds by caching compilation results. Ensure `ccache` is installed (e.g., `sudo apt install ccache` or `conda install ccache`). The script sets these by default.
- `VLLM_PYTHON_EXECUTABLE`: Path to the Python executable in your vLLM development environment. The script will prompt for this, defaulting to the current Python environment if suitable.
- `CMAKE_INSTALL_PREFIX: "${sourceDir}"`: Specifies that the compiled components should be installed back into your vLLM source directory. This is crucial for the editable install, as it makes the newly built kernels immediately available to your Python environment.
- `CMAKE_JOB_POOLS` and `jobs` in build presets: Control the parallelism of the build. The script sets these based on the number of CPU cores detected on your system.
- `binaryDir`: Specifies where the build artifacts will be stored (e.g., `cmake-build-release`).
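The parallelism defaults follow a simple heuristic, mirrored below as a sketch of the helper script's logic: `NVCC_THREADS` is capped at 4, and the CMake compile-job count is the CPU core count divided by the NVCC thread count.

```python
import multiprocessing


def parallelism_settings(cpu_cores):
    """Heuristic used by the helper script for build parallelism."""
    nvcc_threads = min(4, cpu_cores)                # threads per nvcc invocation
    cmake_jobs = max(1, cpu_cores // nvcc_threads)  # concurrent compile jobs
    return nvcc_threads, cmake_jobs


cores = multiprocessing.cpu_count()
threads, jobs = parallelism_settings(cores)
print(f"{cores} cores -> NVCC_THREADS={threads}, CMAKE_JOB_POOLS=compile={jobs}")
```

For example, a 128-core machine yields `NVCC_THREADS=4` and 32 compile jobs, matching the example preset above.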
## Building and Installing with CMake

Once your `CMakeUserPresets.json` is configured:

1. **Initialize the CMake build environment:**
    This step configures the build system according to your chosen preset (e.g., `release`) and creates the build directory at `binaryDir`.

    ```console
    cmake --preset release
    ```

2. **Build and install the vLLM components:**
    This command compiles the code and installs the resulting binaries into your vLLM source directory, making them available to your editable Python installation.

    ```console
    cmake --build --preset release --target install
    ```

3. **Make changes and repeat!**
    Now start using your editable install of vLLM, testing and making changes as needed. When you need to rebuild to pick up changes, simply run the build command again; only the affected files will be recompiled.

    ```console
    cmake --build --preset release --target install
    ```
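If you drive the rebuild step from a script or test harness, the same command can be invoked programmatically. This is a hypothetical convenience wrapper, not part of vLLM, assuming `cmake` is on `PATH` and a `release` preset exists:

```python
import subprocess
from shutil import which


def build_command(preset="release"):
    """Command line for an incremental rebuild+install of the given preset."""
    return ["cmake", "--build", "--preset", preset, "--target", "install"]


def rebuild(preset="release"):
    """Run the incremental rebuild; raises if cmake is unavailable or fails."""
    if which("cmake") is None:
        raise RuntimeError("cmake not found on PATH")
    subprocess.run(build_command(preset), check=True)
```

Calling `rebuild()` from the repository root is equivalent to re-running the console command above.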
## Verifying the Build

After a successful build, you will find a populated build directory (e.g., `cmake-build-release/` if you used the `release` preset and the example configuration).

```console
> ls cmake-build-release/
bin             cmake_install.cmake      _deps                                 machete_generation.log
build.ninja     CPackConfig.cmake        detect_cuda_compute_capabilities.cu  marlin_generation.log
_C.abi3.so      CPackSourceConfig.cmake  detect_cuda_version.cc               _moe_C.abi3.so
CMakeCache.txt  ctest                    _flashmla_C.abi3.so                  moe_marlin_generation.log
CMakeFiles      cumem_allocator.abi3.so  install_local_manifest.txt           vllm-flash-attn
```

The `cmake --build ... --target install` command copies the compiled shared libraries (such as `_C.abi3.so` and `_moe_C.abi3.so`) into the appropriate `vllm` package directory within your source tree. This updates your editable installation with the newly compiled kernels.
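You can also spot-check which extension modules a build produced. A minimal sketch (the `cmake-build-release` path assumes the example preset above):

```python
from pathlib import Path


def built_extensions(build_dir):
    """List compiled CPython extension modules (*.abi3.so) in a build directory."""
    return sorted(p.name for p in Path(build_dir).glob("*.abi3.so"))


# Prints an empty list if the directory does not exist or holds no extensions.
print(built_extensions("cmake-build-release"))
```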
## Additional Tips

- **Adjust Parallelism:** Fine-tune `CMAKE_JOB_POOLS` in `configurePresets` and `jobs` in `buildPresets` in your `CMakeUserPresets.json`. Too many jobs can overload systems with limited RAM or CPU cores, leading to slower builds or system instability; too few will not fully utilize the available resources.
- **Clean Builds When Necessary:** If you encounter persistent or strange build errors, especially after significant changes or switching branches, consider removing the CMake build directory (e.g., `rm -rf cmake-build-release`) and re-running the `cmake --preset` and `cmake --build` commands.
- **Specific Target Builds:** For even faster iteration when working on a specific module, you can sometimes build a specific target instead of the full `install` target, though `install` ensures that all necessary components are updated in your Python environment. Refer to the CMake documentation for more advanced target management.
docs/getting_started/installation/gpu/cuda.inc.md

Lines changed: 3 additions & 0 deletions

@@ -151,6 +151,9 @@ pip install -e .

[sccache](https://github.com/mozilla/sccache) works similarly to `ccache`, but has the capability to utilize caching in remote storage environments.
The following environment variables can be set to configure the vLLM `sccache` remote: `SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1`. We also recommend setting `SCCACHE_IDLE_TIMEOUT=0`.

+ !!! note "Faster Kernel Development"
+     For frequent C++/CUDA kernel changes, after the initial `pip install -e .` setup, consider using the [Incremental Compilation Workflow](../../contributing/incremental_build.md) for significantly faster rebuilds of only the modified kernel code.

##### Use an existing PyTorch installation

There are scenarios where the PyTorch dependency cannot be easily installed via pip, e.g.:

tools/generate_cmake_presets.py

Lines changed: 169 additions & 0 deletions

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import json
import multiprocessing
import os
import sys
from shutil import which

try:
    # Try to get CUDA_HOME from PyTorch installation, which is the
    # most reliable source of truth for vLLM's build.
    from torch.utils.cpp_extension import CUDA_HOME
except ImportError:
    print("Warning: PyTorch not found. "
          "Falling back to CUDA_HOME environment variable.")
    CUDA_HOME = os.environ.get("CUDA_HOME")


def get_python_executable():
    """Get the current Python executable, which is used to run this script."""
    return sys.executable


def get_cpu_cores():
    """Get the number of CPU cores."""
    return multiprocessing.cpu_count()


def generate_presets(output_path="CMakeUserPresets.json"):
    """Generates the CMakeUserPresets.json file."""

    print("Attempting to detect your system configuration...")

    # Detect NVCC
    nvcc_path = None
    if CUDA_HOME:
        prospective_path = os.path.join(CUDA_HOME, "bin", "nvcc")
        if os.path.exists(prospective_path):
            nvcc_path = prospective_path
            print("Found nvcc via torch.utils.cpp_extension.CUDA_HOME: "
                  f"{nvcc_path}")

    if not nvcc_path:
        nvcc_path = which("nvcc")
        if nvcc_path:
            print(f"Found nvcc in PATH: {nvcc_path}")

    if not nvcc_path:
        nvcc_path_input = input(
            "Could not automatically find 'nvcc'. Please provide the full "
            "path to nvcc (e.g., /usr/local/cuda/bin/nvcc): ")
        nvcc_path = nvcc_path_input.strip()
    print(f"Using NVCC path: {nvcc_path}")

    # Detect Python executable
    python_executable = get_python_executable()
    if python_executable:
        print(f"Found Python via sys.executable: {python_executable}")
    else:
        python_executable_prompt = (
            "Could not automatically find Python executable. Please provide "
            "the full path to your Python executable for vLLM development "
            "(typically from your virtual environment, e.g., "
            "/home/user/venvs/vllm/bin/python): ")
        python_executable = input(python_executable_prompt).strip()
        if not python_executable:
            raise ValueError(
                "Could not determine Python executable. Please provide it "
                "manually.")

    print(f"Using Python executable: {python_executable}")

    # Get CPU cores
    cpu_cores = get_cpu_cores()
    nvcc_threads = min(4, cpu_cores)
    cmake_jobs = max(1, cpu_cores // nvcc_threads)
    print(f"Detected {cpu_cores} CPU cores. "
          f"Setting NVCC_THREADS={nvcc_threads} and CMake jobs={cmake_jobs}.")

    # Get vLLM project root (assuming this script is in vllm/tools/)
    project_root = os.path.abspath(
        os.path.join(os.path.dirname(__file__), ".."))
    print(f"VLLM project root detected as: {project_root}")

    # Ensure python_executable path is absolute or resolvable
    if not os.path.isabs(python_executable) and which(python_executable):
        python_executable = os.path.abspath(which(python_executable))
    elif not os.path.isabs(python_executable):
        print(f"Warning: Python executable '{python_executable}' is not an "
              "absolute path and not found in PATH. CMake might not find it.")

    cache_variables = {
        "CMAKE_CUDA_COMPILER": nvcc_path,
        "CMAKE_BUILD_TYPE": "Release",
        "VLLM_PYTHON_EXECUTABLE": python_executable,
        "CMAKE_INSTALL_PREFIX": "${sourceDir}",
        "CMAKE_CUDA_FLAGS": "",
        "NVCC_THREADS": str(nvcc_threads),
    }

    # Detect compiler cache
    if which("sccache"):
        print("Using sccache for compiler caching.")
        for launcher in ("C", "CXX", "CUDA", "HIP"):
            cache_variables[f"CMAKE_{launcher}_COMPILER_LAUNCHER"] = "sccache"
    elif which("ccache"):
        print("Using ccache for compiler caching.")
        for launcher in ("C", "CXX", "CUDA", "HIP"):
            cache_variables[f"CMAKE_{launcher}_COMPILER_LAUNCHER"] = "ccache"
    else:
        print("No compiler cache ('ccache' or 'sccache') found.")

    configure_preset = {
        "name": "release",
        "binaryDir": "${sourceDir}/cmake-build-release",
        "cacheVariables": cache_variables,
    }
    if which("ninja"):
        print("Using Ninja generator.")
        configure_preset["generator"] = "Ninja"
        cache_variables["CMAKE_JOB_POOLS"] = f"compile={cmake_jobs}"
    else:
        print("Ninja not found, using default generator. "
              "Build may be slower.")

    presets = {
        "version": 6,
        # Keep in sync with CMakeLists.txt and requirements/build.txt
        "cmakeMinimumRequired": {
            "major": 3,
            "minor": 26,
            "patch": 1
        },
        "configurePresets": [configure_preset],
        "buildPresets": [{
            "name": "release",
            "configurePreset": "release",
            "jobs": cmake_jobs,
        }],
    }

    output_file_path = os.path.join(project_root, output_path)

    if os.path.exists(output_file_path):
        overwrite = input(
            f"'{output_file_path}' already exists. Overwrite? (y/N): "
        ).strip().lower()
        if overwrite != 'y':
            print("Generation cancelled.")
            return

    try:
        with open(output_file_path, "w") as f:
            json.dump(presets, f, indent=4)
        print(f"Successfully generated '{output_file_path}'")
        print("\nTo use this preset:")
        print(
            f"1. Ensure you are in the vLLM root directory: cd {project_root}")
        print("2. Initialize CMake: cmake --preset release")
        print("3. Build+install: cmake --build --preset release "
              "--target install")

    except OSError as e:
        print(f"Error writing file: {e}")


if __name__ == "__main__":
    generate_presets()
```
