forked from sgl-project/sglang
Automatically detects RDMA devices, eliminating complex manual setup for mooncake #5
Open
whybeyoung wants to merge 3 commits into main from auto-detect-net
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
#!/usr/bin/env python | ||
# coding:utf-8 | ||
""" | ||
@author: nivic ybyang7 | ||
@license: Apache Licence | ||
@file: ib_devices | ||
@time: 2025/04/03 | ||
@contact: ybyang7@iflytek.com | ||
@site: | ||
@software: PyCharm | ||
|
||
# Code is far away from bugs with the god animal protecting | ||
I love animals. They taste delicious. | ||
┏┓ ┏┓ | ||
┏┛┻━━━┛┻┓ | ||
┃ ☃ ┃ | ||
┃ ┳┛ ┗┳ ┃ | ||
┃ ┻ ┃ | ||
┗━┓ ┏━┛ | ||
┃ ┗━━━┓ | ||
┃ God Bless ┣┓ | ||
┃ No BUG! ┏┛ | ||
┗┓┓┏━┳┓┏┛ | ||
┃┫┫ ┃┫┫ | ||
┗┻┛ ┗┻┛ | ||
""" | ||
import os | ||
|
||
# Copyright (c) 2022. Lorem ipsum dolor sit amet, consectetur adipiscing elit. | ||
# Morbi non lorem porttitor neque feugiat blandit. Ut vitae ipsum eget quam lacinia accumsan. | ||
# Etiam sed turpis ac ipsum condimentum fringilla. Maecenas magna. | ||
# Proin dapibus sapien vel ante. Aliquam erat volutpat. Pellentesque sagittis ligula eget metus. | ||
# Vestibulum commodo. Ut rhoncus gravida arcu. | ||
import pyverbs.device as d | ||
import pynvml | ||
|
||
|
||
def get_device_list(prefix, gpu_no=0, roce_version=2, port_num=1): | ||
""" | ||
Get a list of RDMA devices matching the specified prefix. | ||
|
||
Args: | ||
prefix (str): Device name prefix to filter (e.g., 'mlx') | ||
gpu_no (int): GPU device number (default: 0); currently unused by this function | ||
roce_version (int): RoCE version; passed to query_gid as the GID table index (default: 2) | ||
port_num (int): Port number to query (default: 1) | ||
|
||
Returns: | ||
dict: Dictionary mapping RDMA device names to their PCI addresses | ||
""" | ||
lst = d.get_device_list() | ||
if len(lst) == 0: | ||
print("No IB devices") | ||
return {}  # Keep the return type a dict; callers iterate with .items() | ||
device_list = {} | ||
for dev in lst: | ||
if dev.name.decode().startswith(prefix): | ||
with d.Context(name=dev.name.decode()) as ctx: | ||
gid_tbl_len = ctx.query_port(port_num).gid_tbl_len | ||
if gid_tbl_len > 0: | ||
ctx.query_gid(port_num=port_num, index=roce_version) | ||
# Get PCI address from sysfs | ||
dev_path = f"/sys/class/infiniband/{dev.name.decode()}/device" | ||
if os.path.exists(dev_path): | ||
pci_addr = os.readlink(dev_path).split("/")[-1] # Format like "0000:19:00.0" | ||
device_list[dev.name.decode()] = pci_addr | ||
|
||
return device_list | ||
|
||
|
||
def get_gpu_pci_address(gpu_no): | ||
""" | ||
Get the PCI address of a specified GPU device. | ||
|
||
Args: | ||
gpu_no (int): GPU device number | ||
|
||
Returns: | ||
str: PCI address of the GPU device | ||
""" | ||
pynvml.nvmlInit() | ||
handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_no) | ||
pci_info = pynvml.nvmlDeviceGetPciInfo(handle) | ||
pynvml.nvmlShutdown() | ||
return pci_info.busId | ||
|
||
|
||
def get_net_device_from_rdma(rdma_dev): | ||
""" | ||
Get the network interface name corresponding to a RoCE device. | ||
|
||
Args: | ||
rdma_dev (str): RDMA device name | ||
|
||
Returns: | ||
str: Network interface name or None if not found | ||
""" | ||
net_path = f"/sys/class/infiniband/{rdma_dev}/device/net" | ||
if os.path.exists(net_path): | ||
return os.listdir(net_path)[0] # Read network interface name | ||
return None | ||
|
||
|
||
def normalize_pci_addr(pci_addr): | ||
""" | ||
Standardize PCI address format. | ||
|
||
Args: | ||
pci_addr (str): PCI address to normalize | ||
|
||
Returns: | ||
str: Normalized PCI address in format "0000:08:00.0" | ||
""" | ||
parts = pci_addr.split(":") | ||
if len(parts) == 3: # Format like "00000000:08:00.0" | ||
return f"{int(parts[0], 16):04x}:{parts[1]}:{parts[2]}" # Convert to "0000:08:00.0" | ||
return pci_addr # Return original format | ||
|
||
|
||
def find_best_rdma_device_for_gpu(gpu_no, prefix="mlx"): | ||
""" | ||
Find the RoCE network card with the closest affinity to a given GPU. | ||
|
||
Args: | ||
gpu_no (int): GPU device number | ||
prefix (str): RDMA device name prefix (default: "mlx") | ||
|
||
Returns: | ||
tuple: (best_rdma_dev, net_dev) containing the best RDMA device and its network interface | ||
""" | ||
gpu_pci = normalize_pci_addr(get_gpu_pci_address(gpu_no)) | ||
roce_devices = {k: normalize_pci_addr(v) for k, v in get_device_list(prefix).items()} | ||
|
||
best_rdma_dev = None | ||
min_distance = float("inf") | ||
|
||
for rdma_dev, rdma_pci in roce_devices.items(): | ||
if rdma_pci[:5] == gpu_pci[:5]: # Same PCI domain (e.g. "0000:"); this does not check the NUMA node | ||
distance = abs(int(rdma_pci.split(":")[1], 16) - int(gpu_pci.split(":")[1], 16)) | ||
if distance < min_distance: | ||
min_distance = distance | ||
best_rdma_dev = rdma_dev | ||
|
||
if best_rdma_dev is None: | ||
return None, None  # No RDMA device shares the GPU's PCI domain | ||
return best_rdma_dev, get_net_device_from_rdma(best_rdma_dev) | ||
|
||
|
||
if __name__ == '__main__': | ||
gpu_no = 0 # GPU device number to query | ||
rdma_dev, net_dev = find_best_rdma_device_for_gpu(gpu_no) | ||
print(f"GPU {gpu_no}: closest-affinity RDMA device: {rdma_dev}, corresponding network interface: {net_dev}") |
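The selection heuristic in find_best_rdma_device_for_gpu can be exercised without a GPU or RDMA hardware if the PCI addresses are passed in as plain data. The following is a minimal sketch under that assumption, not the PR's implementation: `pick_closest_rdma_device` is a hypothetical helper, the addresses are made up, and every address is assumed to have the full "domain:bus:device.function" form. It also lowercases addresses, which the PR's normalize_pci_addr does not do.

```python
from typing import Dict, Optional


def normalize_pci_addr(pci_addr: str) -> str:
    """Lowercase the address and shrink an 8-digit PCI domain to 4 digits."""
    parts = pci_addr.lower().split(":")
    if len(parts) == 3:  # e.g. "00000000:1B:00.0" -> "0000:1b:00.0"
        return f"{int(parts[0], 16):04x}:{parts[1]}:{parts[2]}"
    return pci_addr.lower()


def pick_closest_rdma_device(
    gpu_pci: str, rdma_pci_by_name: Dict[str, str]
) -> Optional[str]:
    """Return the RDMA device whose PCI bus number is nearest the GPU's.

    Only devices in the GPU's PCI domain are considered. Bus-number distance
    is a heuristic: it does not prove that two devices share a PCIe switch
    or a NUMA node.
    """
    gpu_domain, gpu_bus, _ = normalize_pci_addr(gpu_pci).split(":")
    best_name, best_distance = None, float("inf")
    for name, addr in rdma_pci_by_name.items():
        domain, bus, _ = normalize_pci_addr(addr).split(":")
        if domain != gpu_domain:
            continue  # Different PCI domain: skip this device
        distance = abs(int(bus, 16) - int(gpu_bus, 16))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# Hypothetical addresses, for illustration only.
devices = {"mlx5_0": "0000:19:00.0", "mlx5_1": "0000:86:00.0"}
print(pick_closest_rdma_device("00000000:1B:00.0", devices))  # prints mlx5_0
```

Returning None when no device shares the GPU's PCI domain lets the caller decide how to fall back, rather than failing on tuple unpacking.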
We think it's better to put topology detection inside the Mooncake Transfer Engine. You can check out our PR here and try it out. To enable this, just leave the device_name items blank.
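As a rough illustration of the "leave device_name blank" convention, a caller could check whether a configuration asks for automatic detection before doing any device lookup of its own. This is only a sketch: the `device_name` field name comes from the comment above, while the JSON layout and the `wants_auto_topology` helper are assumptions and not part of Mooncake or SGLang.

```python
import json


def wants_auto_topology(config_text: str) -> bool:
    """Return True when device_name is missing or blank in a JSON config.

    Hypothetical helper: the JSON shape is assumed for illustration.
    """
    config = json.loads(config_text)
    # Treat a missing key, null, "" and whitespace-only values as blank.
    return not str(config.get("device_name") or "").strip()


print(wants_auto_topology('{"device_name": ""}'))        # prints True
print(wants_auto_topology('{"device_name": "mlx5_0"}'))  # prints False
```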