[Bug/Help] portal: In a multi-node cluster where only one node has GPU resources, the web interface reports the wrong GPU count #1495

@397325475

Description

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

What happened

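In a 7-node cluster, only one node has GPUs (4 cards), but the web interface reports 28 GPUs. When the GPU node is moved into a partition of its own, the GPU count is reported correctly, but when creating a new interactive job only GPUs can be selected and CPU resources cannot be set.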
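
The reported figure of 28 equals 7 × 4, which would be consistent with one node's GPU count being multiplied by the number of nodes in the partition instead of the per-node counts being summed. This is only a guess from the numbers; it has not been checked against the SCOW or slurm-adapter code. A minimal sketch of the two calculations, using hypothetical node names:

```python
# Hypothetical inventory matching the report: 7 nodes, only one of which has 4 GPUs.
gpus_per_node = {f"cn{i}": 0 for i in range(1, 7)}
gpus_per_node["gpu1"] = 4


def total_gpus_summed(nodes):
    """Add up the GPUs that each node actually has."""
    return sum(nodes.values())


def total_gpus_multiplied(nodes):
    """Assume every node has as many GPUs as the best-equipped node (suspected miscount)."""
    return len(nodes) * max(nodes.values())


print(total_gpus_summed(gpus_per_node))      # 4
print(total_gpus_multiplied(gpus_per_node))  # 28
```

The same reasoning would explain why the count is correct once the GPU node sits alone in its partition: with a single node, both calculations give 4.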

What did you expect to happen

With all nodes in a single partition, the system should report the correct number of GPUs.

Did this work before?

I have not tried this setup before; this is the first time I have run into the problem.

Steps To Reproduce

1. Set up a Slurm cluster in which only one node has GPUs.
2. Log in to the web interface; the home page shows the wrong GPU count.
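
To compare the number shown on the home page with what Slurm itself reports, the per-node GRES can be listed with `sinfo -N -h -o "%N %G"` and summed. The sketch below parses that kind of output; the helper name `count_gpus` and the sample text (node names, GRES strings) are illustrative assumptions, not output captured from the affected cluster.

```python
import re


def count_gpus(sinfo_output):
    """Map node name -> GPU count from lines of the form '<node> <gres>'."""
    per_node = {}
    for line in sinfo_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        node, gres = parts
        # Drop parenthesised parts such as "(S:0-1)" or "(null)".
        gres = re.sub(r"\([^)]*\)", "", gres)
        gpus = 0
        for entry in gres.split(","):
            fields = entry.strip().split(":")
            if fields[0] != "gpu":
                continue
            # "gpu:4" and "gpu:tesla:4" both end with the count; a bare "gpu" means 1.
            gpus += int(fields[-1]) if fields[-1].isdigit() else 1
        # Keyed by node, so a node listed once per partition is counted only once.
        per_node[node] = gpus
    return per_node


# Hypothetical sample: six CPU-only nodes and one node with 4 GPUs.
SAMPLE = """\
cn1 (null)
cn2 (null)
cn3 (null)
cn4 (null)
cn5 (null)
cn6 (null)
gpu1 gpu:4(S:0-1)
"""

counts = count_gpus(SAMPLE)
print(sum(counts.values()))  # 4
```

If the total obtained this way is 4 while the portal shows 28, the discrepancy most likely arises after Slurm, somewhere in how the adapter or portal aggregates the node data.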

Environment

- OS: CentOS 7.9
- Scheduler: Slurm 22.05.8
- Docker: 26.1.4
- Docker-compose: V2.7.0
- SCOW cli: 1.6.4
- SCOW: v1.6.4
- Adapter: slurm-adapter v1.6

Anything else?

No response

Labels: bug (Something isn't working)