GPUDeviceCoreAllocated is not 100 when only requesting nvidia.com/gpu (exclusive memory) #1353

@xrwang8

Description

What would you like to be added:

Add a conditional default for NVIDIA GPU core requests. When a container requests nvidia.com/gpu with exclusive GPU memory (nvidia.com/gpumem-percentage is 100, or is unset and therefore defaults to 100%) and does not explicitly set nvidia.com/gpucores, default nvidia.com/gpucores to 100. Also document this behavior.
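The rule requested above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not HAMi's actual code: the `containerRequest` type, the `effectiveCores` and `memoryIsExclusive` functions, and the use of nil pointers to mean "unset" are all hypothetical, and the sketch only looks at the memory percentage, ignoring any other way of requesting memory.

```go
package main

import "fmt"

// containerRequest is a hypothetical, simplified view of one container's
// NVIDIA resource requests. A nil pointer means the resource was not set.
type containerRequest struct {
	GPUs          int  // nvidia.com/gpu
	MemPercentage *int // nvidia.com/gpumem-percentage
	Cores         *int // nvidia.com/gpucores
}

// memoryIsExclusive reports whether the request claims the whole device
// memory: the percentage is either unset (treated as 100%) or exactly 100.
func memoryIsExclusive(r containerRequest) bool {
	return r.MemPercentage == nil || *r.MemPercentage == 100
}

// effectiveCores returns the core percentage to account for a request.
// defaultCores stands in for nvidia.defaultCores (0 by default).
func effectiveCores(r containerRequest, defaultCores int) int {
	// An explicit nvidia.com/gpucores always wins.
	if r.Cores != nil {
		return *r.Cores
	}
	// Proposed conditional default: exclusive memory implies 100 cores.
	if r.GPUs > 0 && memoryIsExclusive(r) {
		return 100
	}
	// Shareable requests keep the existing global default.
	return defaultCores
}

// intPtr is a small helper for building requests with explicit values.
func intPtr(v int) *int { return &v }

func main() {
	exclusive := containerRequest{GPUs: 1}
	shared := containerRequest{GPUs: 1, MemPercentage: intPtr(50)}
	explicit := containerRequest{GPUs: 1, Cores: intPtr(30)}

	fmt.Println(effectiveCores(exclusive, 0))
	fmt.Println(effectiveCores(shared, 0))
	fmt.Println(effectiveCores(explicit, 0))
}
```

Keeping the explicit-value check first means the new default can never override a user-supplied nvidia.com/gpucores, and falling through to `defaultCores` leaves shareable requests exactly as they are today.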

What type of PR is this?

/kind feature

What this PR does / why we need it:

Users commonly interpret nvidia.com/gpu as “exclusive GPU”. HAMi already defaults the memory to 100% in this case, effectively preventing other pods from sharing the device. However, metrics like GPUDeviceCoreAllocated remain at 0 unless nvidia.com/gpucores is explicitly set, which is misleading. This change aligns semantics and metrics by defaulting cores to 100 only when memory is exclusive and the core request is unset. Non‑exclusive memory cases (e.g., nvidia.com/gpumem-percentage: 50) remain shareable and are not affected.
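To make the metric mismatch concrete, here is a minimal sketch that assumes GPUDeviceCoreAllocated behaves like a per-device sum of the core percentages accounted to each container. That aggregation is an assumption for illustration only, and the `allocation` type and `coreAllocatedByDevice` function are hypothetical names, not part of HAMi's exporter.

```go
package main

import "fmt"

// allocation is a hypothetical record of one container's share of a device.
type allocation struct {
	DeviceUUID string
	Cores      int // accounted nvidia.com/gpucores; 0 when left unset today
}

// coreAllocatedByDevice sums accounted core percentages per device,
// standing in for how a metric like GPUDeviceCoreAllocated might be derived.
func coreAllocatedByDevice(allocs []allocation) map[string]int {
	totals := make(map[string]int)
	for _, a := range allocs {
		totals[a.DeviceUUID] += a.Cores
	}
	return totals
}

func main() {
	// Today: an exclusive pod with unset cores is accounted as 0.
	before := coreAllocatedByDevice([]allocation{{DeviceUUID: "GPU-0", Cores: 0}})
	// Proposed: the same pod is accounted as 100.
	after := coreAllocatedByDevice([]allocation{{DeviceUUID: "GPU-0", Cores: 100}})

	fmt.Println("before:", before["GPU-0"], "after:", after["GPU-0"])
}
```

Under this assumption, a device held exclusively by one pod reports 0 allocated cores today even though no other pod can be scheduled onto it, which is the inconsistency the conditional default is meant to remove.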

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • Explicit nvidia.com/gpucores always takes precedence and is not overridden.
  • MIG mode: behavior remains consistent; “exclusive memory” refers to the selected MIG instance’s memory. Core defaulting to 100 applies only when cores are unset.
  • Memory oversubscription (nvidia.deviceMemoryScaling > 1): “exclusive” refers to the registered (scaled) memory. The conditional default still applies.
  • No change to nvidia.defaultCores (remains 0 by default), preserving backward compatibility.

Does this PR introduce a user-facing change?:

Yes.

  • Default nvidia.com/gpucores to 100 when memory is exclusive and cores are not explicitly set. Non‑exclusive memory requests remain unchanged and shareable.
