How to use queues with the volcano vgpu plugin #54

@zj619

After enabling the volcano vgpu plugin, we want to use queues to manage vGPU resources. The cluster has 4 GPU cards, each with 15 GB of memory.

1. Create the following queue:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: gpu-work
spec:
  capability:
    volcano.sh/vgpu-number: 3

2. First, we submitted 3 exclusive tasks:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-5
  namespace: gpu-test
  annotations:
    nvidia.com/use-gputype: "T4"
    scheduling.volcano.sh/queue-name: gpu-work
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container
      image: ezone.ksyun.com/ezone/dh_docker/snapshot/ubuntu:22.04
      command: ["bash", "-c", "sleep 86400"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1 # unit: number of vGPUs

3. All 3 pods reached Running. We then submitted 2 shared tasks (setting both vgpu-number and vgpu-memory):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-5
  namespace: gpu-test
  annotations:
    nvidia.com/use-gputype: "T4"
    #scheduling.volcano.sh/queue-name: gpu-work
spec:
  schedulerName: volcano
  containers:
    - name: cuda-container
      image: ezone.ksyun.com/ezone/dh_docker/snapshot/ubuntu:22.04
      command: ["bash", "-c", "sleep 86400"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1 # unit: number of vGPUs
          volcano.sh/vgpu-memory: 5000 # unit: MiB

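4. Result: the 5th pod is stuck in Pending. It looks as if the queue does not support shared vGPU mode. How should the vgpu plugin and queues be used together in volcano? I could not find any documentation on this.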
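To spell out why the Pending pod is surprising, here is a rough sketch of the device-level arithmetic only. It is not Volcano's scheduling logic. It assumes that "15g" means roughly 15000 units of `volcano.sh/vgpu-memory` per card, that a pod without `vgpu-memory` occupies a whole card, and that placement is simple first-fit. The `place` helper is hypothetical.

```python
CARD_MEMORY = 15000  # assumption: "15g" per card treated as 15000 memory units
NUM_CARDS = 4        # from the issue: the cluster has 4 GPU cards


def place(pods, num_cards=NUM_CARDS, card_memory=CARD_MEMORY):
    """First-fit placement; returns a card index per pod, or None if nothing fits."""
    free = [card_memory] * num_cards
    placement = []
    for pod in pods:
        # Assumption: a pod without vgpu-memory is exclusive and needs a full card.
        need = pod.get("volcano.sh/vgpu-memory", card_memory)
        slot = next((i for i, f in enumerate(free) if f >= need), None)
        if slot is not None:
            free[slot] -= need
        placement.append(slot)
    return placement


exclusive = {"volcano.sh/vgpu-number": 1}
shared = {"volcano.sh/vgpu-number": 1, "volcano.sh/vgpu-memory": 5000}

# Three exclusive pods followed by two shared pods, as in steps 2 and 3.
placement = place([exclusive] * 3 + [shared] * 2)
print(placement)  # [0, 1, 2, 3, 3]
```

Under these assumptions both shared pods fit on the fourth card, so device memory alone does not explain why the 5th pod stays Pending.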

Image
