Commit 64a9af5

Simplify ep kernels installation (vllm-project#19412)
Signed-off-by: youkaichao <youkaichao@gmail.com>
1 parent e424884 commit 64a9af5

5 files changed: +26 -69 lines changed

tools/ep_kernels/README.md

Lines changed: 4 additions & 6 deletions
@@ -1,11 +1,10 @@
 Large-scale cluster-level expert parallel, as described in the [DeepSeek-V3 Technical Report](http://arxiv.org/abs/2412.19437), is an efficient way to deploy sparse MoE models with many experts. However, such deployment requires many components beyond a normal Python package, including system package support and system driver support. It is impossible to bundle all these components into a Python package.
 
-Here we break down the requirements in 3 steps:
+Here we break down the requirements in 2 steps:
 1. Build and install the Python libraries (both [pplx-kernels](https://github.com/ppl-ai/pplx-kernels) and [DeepEP](https://github.com/deepseek-ai/DeepEP)), including necessary dependencies like NVSHMEM. This step does not require any privileged access. Any user can do this.
-2. Build and install the system libraries (GDR Copy). This step requires root access. You can do it inside a Docker container so that they can be shipped as a single image.
-3. Build and install the system drivers (GDR Copy, and necessary modifications to NVIDIA driver to enable IBGDA). This step requires root access, and must be done on the host machine.
+2. Configure NVIDIA driver to enable IBGDA. This step requires root access, and must be done on the host machine.
 
-2 and 3 are necessary for multi-node deployment.
+2 is necessary for multi-node deployment.
 
 All scripts accept a positional argument as workspace path for staging the build, defaulting to `$(pwd)/ep_kernels_workspace`.
 
@@ -21,7 +20,6 @@ bash install_python_libraries.sh
 
 ```bash
 bash install_python_libraries.sh
-sudo bash install_system_libraries.sh
-sudo bash install_system_drivers.sh
+sudo bash configure_system_drivers.sh
 sudo reboot # Reboot is required to load the new driver
 ```
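
The README text in this diff states that every script takes an optional positional workspace path, defaulting to `$(pwd)/ep_kernels_workspace`. A minimal sketch of that pattern follows. The helper name `resolve_workspace` is hypothetical and not part of the commit, and the real scripts may implement the default differently.

```shell
# Hypothetical helper, not part of the commit: resolve the staging directory
# from an optional first argument, falling back to the documented default.
resolve_workspace() {
    # ${1:-...} expands to the default when no argument (or an empty one) is given.
    workspace="${1:-$(pwd)/ep_kernels_workspace}"
    # Create the directory if it does not exist yet; -p makes this a no-op otherwise.
    mkdir -p "$workspace"
    printf '%s\n' "$workspace"
}
```

Under this convention, `bash install_python_libraries.sh /path/to/workspace` would stage the build in the given directory, while omitting the argument would stage it under the current directory.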

tools/ep_kernels/configure_system_drivers.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+set -ex
+
+# turn on IBGDA
+echo 'options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"' | tee -a /etc/modprobe.d/nvidia.conf
+update-initramfs -u
+
+echo "Please reboot the system to apply the changes"
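
Because the new script appends with `tee -a`, running it a second time would leave two identical `options nvidia` lines in `/etc/modprobe.d/nvidia.conf`. One way to make the append idempotent is sketched below. The helper `append_once` is hypothetical and not part of the commit, and the demonstration writes to a temporary file rather than the real config.

```shell
# Hypothetical helper, not part of the commit: append a line to a file
# only if an identical line is not already present.
append_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrate on a scratch file; the real target would be
# /etc/modprobe.d/nvidia.conf, which requires root to modify.
conf="$(mktemp)"
opts='options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"'
append_once "$opts" "$conf"
append_once "$opts" "$conf"   # second call finds the line and appends nothing
```

The whole-line, fixed-string match matters here because the options string contains characters such as `;` and `=` that should not be interpreted as a pattern.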

tools/ep_kernels/install_python_libraries.sh

Lines changed: 15 additions & 21 deletions
@@ -13,16 +13,6 @@ fi
 # install dependencies if not installed
 pip3 install cmake torch ninja
 
-# build gdrcopy, required by nvshmem
-pushd $WORKSPACE
-wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.4.4.tar.gz
-mkdir -p gdrcopy_src
-tar -xvf v2.4.4.tar.gz -C gdrcopy_src --strip-components=1
-pushd gdrcopy_src
-make -j$(nproc)
-make prefix=$WORKSPACE/gdrcopy_install install
-popd
-
 # build nvshmem
 pushd $WORKSPACE
 mkdir -p nvshmem_src
@@ -34,26 +24,30 @@ git init
 git apply -vvv nvshmem.patch
 
 # assume CUDA_HOME is set correctly
-export GDRCOPY_HOME=$WORKSPACE/gdrcopy_install
+if [ -z "$CUDA_HOME" ]; then
+    echo "CUDA_HOME is not set, please set it to your CUDA installation directory."
+    exit 1
+fi
+
+# disable all features except IBGDA
+export NVSHMEM_IBGDA_SUPPORT=1
+
 export NVSHMEM_SHMEM_SUPPORT=0
 export NVSHMEM_UCX_SUPPORT=0
 export NVSHMEM_USE_NCCL=0
-export NVSHMEM_IBGDA_SUPPORT=1
 export NVSHMEM_PMIX_SUPPORT=0
 export NVSHMEM_TIMEOUT_DEVICE_POLLING=0
-export NVSHMEM_USE_GDRCOPY=1
-export NVSHMEM_IBRC_SUPPORT=1
-
-# remove MPI dependency
+export NVSHMEM_USE_GDRCOPY=0
+export NVSHMEM_IBRC_SUPPORT=0
 export NVSHMEM_BUILD_TESTS=0
 export NVSHMEM_BUILD_EXAMPLES=0
 export NVSHMEM_MPI_SUPPORT=0
+export NVSHMEM_BUILD_HYDRA_LAUNCHER=0
+export NVSHMEM_BUILD_TXZ_PACKAGE=0
+export NVSHMEM_TIMEOUT_DEVICE_POLLING=0
 
-cmake -S . -B $WORKSPACE/nvshmem_build/ -DCMAKE_INSTALL_PREFIX=$WORKSPACE/nvshmem_install
-
-cd $WORKSPACE/nvshmem_build/
-make -j$(nproc)
-make install
+cmake -G Ninja -S . -B $WORKSPACE/nvshmem_build/ -DCMAKE_INSTALL_PREFIX=$WORKSPACE/nvshmem_install
+cmake --build $WORKSPACE/nvshmem_build/ --target install
 
 popd
 
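
The hunk above adds an explicit guard that stops the build when `CUDA_HOME` is empty. If more variables needed the same treatment, the check could be factored into a small function, as in the sketch below. The name `require_var` is hypothetical and not part of the commit, which inlines the check for `CUDA_HOME` only.

```shell
# Hypothetical helper, not part of the commit: fail with a message when the
# named environment variable is unset or empty.
require_var() {
    name="$1"
    # Indirect lookup via eval keeps this POSIX sh compatible;
    # "$name" must be a plain identifier such as CUDA_HOME.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set, please export it before building." >&2
        return 1
    fi
}
```

A build script could then call `require_var CUDA_HOME || exit 1` before invoking `cmake`, which mirrors the behavior of the inline `if [ -z "$CUDA_HOME" ]` block in the diff.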
tools/ep_kernels/install_system_drivers.sh

Lines changed: 0 additions & 24 deletions
This file was deleted.

tools/ep_kernels/install_system_libraries.sh

Lines changed: 0 additions & 18 deletions
This file was deleted.
