[wip] add nccl allocator and symm memory and enable TP all reduce for nccl symm #21383

Amir-19 · 2025-07-22T14:59:03Z · edited by github-actions bot

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

github-actions · 2025-07-22T14:59:11Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify · 2025-07-22T14:59:37Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Amir-19.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request introduces a custom NCCL allocator for symmetric memory to optimize tensor-parallel all-reduce operations. The changes are well-structured, including new tests for the functionality. However, I've found a couple of critical issues in the new allocator implementation regarding error handling and thread safety that must be addressed. I've also pointed out a couple of high-severity issues related to debug logging that should be cleaned up.

vllm/distributed/device_communicators/pynccl_allocator.py

vllm/distributed/device_communicators/cuda_communicator.py

Amir-19 added 4 commits July 16, 2025 11:49

4daed93 add nccl symm memory
fc6960a wip
92ba937 wip
8c0372f wip

mergify bot added the needs-rebase label Jul 22, 2025

gemini-code-assist bot reviewed Jul 22, 2025


Amir-19 added 4 commits July 22, 2025 10:34

d625672 wip
f4c2640 wip
04a152a wip
e0d95ee wip

mergify bot added the deepseek Related to DeepSeek models label Jul 22, 2025

Amir-19 added 6 commits July 22, 2025 13:19

c8a3d50 wip
f4a24ed wip
1817c59 wip
ee9cbc1 wip
4858fd2 wip
f119bbc wip
