nvme_driver: nvme device interrupts not spread across CPUs #1664


Description

NVMe devices have a fixed number of interrupt vectors (IVs). The nvme_driver creates one IoIssuer and IO Queue Pair (IOQP) per interrupt vector. When the number of IVs is less than the number of vCPUs, some vCPUs must share the same IOQP and IV. The nvme_driver assigns them greedily, based on the CPU on which the guest issued the IO. Under certain guest workloads, this means that every NVMe device in OpenHCL can end up concentrating its IVs on a relatively small subset of CPUs. The problem is amplified when NVMe devices back a striped disk: a single IO (say, a write on CPU 0) can cause multiple NVMe devices to each create an IO issuer on CPU 0.
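
To make the failure mode concrete, here is a minimal sketch of the greedy pattern described above. All names (`Device`, `issuer_for_cpu`, etc.) are hypothetical stand-ins, not the actual nvme_driver types; the point is only that each device independently binds an IV/IOQP to whichever CPU issues IO to it first, so striped IO piles every device onto the same CPU:

```rust
use std::collections::HashMap;

struct Device {
    max_ivs: usize,
    // CPU -> index of the IV/IOQP servicing that CPU.
    issuer_for_cpu: HashMap<u32, usize>,
}

impl Device {
    fn issuer(&mut self, cpu: u32) -> usize {
        // Greedy: each new issuing CPU gets a fresh IV until the device
        // runs out; after that, later CPUs share the last IOQP created.
        let next = self.issuer_for_cpu.len().min(self.max_ivs - 1);
        *self.issuer_for_cpu.entry(cpu).or_insert(next)
    }
}

fn main() {
    // Four devices striped into one disk, each with very few IVs.
    let mut devices: Vec<Device> = (0..4)
        .map(|_| Device { max_ivs: 2, issuer_for_cpu: HashMap::new() })
        .collect();
    // One write to the striped disk on CPU 0 touches every device,
    // so all four devices bind an issuer (and its IV) to CPU 0.
    for d in &mut devices {
        d.issuer(0);
    }
    assert!(devices.iter().all(|d| d.issuer_for_cpu.contains_key(&0)));
}
```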

We should change the algorithm so that it does not overload a small subset of CPUs. Some options:

  1. Option 1: the vtl2 settings worker knows how many NVMe devices an OpenHCL VM will have. When those settings are supplied, compute a global per-CPU cap. Respect that cap in the nvme_driver.
  2. Option 2: the nvme_driver can keep a tally of the number of IVs assigned to each CPU. When the gap between the max and min counts becomes too great, fall back to a "close" CPU instead (see the fallback sketch further below).
  3. Option 3: the supported number of IVs and vCPUs is known, so the IVs can be "spread out" across CPUs by generating a stride. Be careful not to start the stride on the same CPU for every NVMe device (see the sketch after this list).
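
A minimal sketch of the stride idea in option 3. `device_index` is a hypothetical stand-in for any stable per-device value (e.g. a hash of the device ID) used to offset each device's stride so that devices do not all start at CPU 0:

```rust
/// Map each of a device's `num_ivs` interrupt vectors to a vCPU by striding
/// across all `num_cpus`, rotated by a per-device offset.
fn iv_to_cpu(iv: usize, num_ivs: usize, num_cpus: usize, device_index: usize) -> u32 {
    // Spread IVs evenly: with 4 IVs and 16 CPUs, the stride is 4, so the
    // IVs land on CPUs {0, 4, 8, 12} before the offset is applied.
    let stride = (num_cpus / num_ivs).max(1);
    // Rotate the whole pattern per device, so a second device lands on
    // {1, 5, 9, 13}, a third on {2, 6, 10, 14}, and so on.
    let offset = device_index % num_cpus;
    ((iv * stride + offset) % num_cpus) as u32
}

fn main() {
    // Two devices, 4 IVs each, 16 vCPUs: no shared CPUs between them.
    let a: Vec<u32> = (0..4).map(|iv| iv_to_cpu(iv, 4, 16, 0)).collect();
    let b: Vec<u32> = (0..4).map(|iv| iv_to_cpu(iv, 4, 16, 1)).collect();
    assert_eq!(a, vec![0, 4, 8, 12]);
    assert_eq!(b, vec![1, 5, 9, 13]);
    assert!(a.iter().all(|cpu| !b.contains(cpu)));
}
```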

In general, any fallback to a "close" CPU should try to preserve NUMA locality, as sketched below.
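
A sketch of a NUMA-aware fallback for options 1 and 2, assuming hypothetical inputs: `tally` is the per-CPU IV count from option 2, and `node_of` maps each CPU to its NUMA node (the real driver would get topology from the host):

```rust
/// When the preferred CPU's IV tally is already too far above the global
/// minimum, prefer the least-loaded CPU on the same NUMA node before
/// spilling to other nodes.
fn pick_cpu(preferred: u32, tally: &[u32], node_of: &[u32], max_imbalance: u32) -> u32 {
    let min = *tally.iter().min().unwrap();
    // The preferred CPU is fine unless it is too imbalanced.
    if tally[preferred as usize] - min <= max_imbalance {
        return preferred;
    }
    // Least-loaded CPU on the same NUMA node as the preferred CPU.
    let node = node_of[preferred as usize];
    let same_node = (0..tally.len())
        .filter(|&c| node_of[c] == node)
        .min_by_key(|&c| tally[c])
        .unwrap();
    if tally[same_node] - min <= max_imbalance {
        same_node as u32
    } else {
        // The whole node is overloaded; take the global least-loaded CPU.
        (0..tally.len()).min_by_key(|&c| tally[c]).unwrap() as u32
    }
}

fn main() {
    // 4 CPUs, two NUMA nodes: {0, 1} and {2, 3}. CPU 0 is overloaded.
    let tally = [3, 1, 0, 0];
    let node_of = [0, 0, 1, 1];
    // Stay on node 0 by moving to CPU 1 rather than jumping nodes.
    assert_eq!(pick_cpu(0, &tally, &node_of, 1), 1);
}
```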

Reported-By: @fliang-ms
