Description
NVMe devices have a fixed number of interrupt vectors (IVs). The `nvme_driver` creates one IoIssuer and IO queue pair (IOQP) per interrupt vector. When the number of IVs is less than the number of vCPUs, some vCPUs must share the same IOQP and IV. The `nvme_driver` creates them in a greedy fashion, based on the CPU on which the guest issued the IO. Under certain guest workloads, this means that the IVs of every NVMe device in OpenHCL can pile up on a relatively small subset of CPUs. The problem is amplified when NVMe devices back a striped disk: a single IO (say, a write on CPU 0) can cause multiple NVMe devices to create an IO issuer on CPU 0.
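To make the greedy behavior concrete, here is a minimal sketch, not the actual `nvme_driver` code and with all names hypothetical, of an issuer being created for whichever CPU happens to issue IO first, until the device's IVs are exhausted:

```rust
use std::collections::HashMap;

struct IoIssuer {
    cpu: u32,
    iv: u32,
}

struct PerDeviceIssuers {
    max_ivs: u32,
    issuers: HashMap<u32, IoIssuer>, // keyed by issuing CPU
}

impl PerDeviceIssuers {
    /// Greedy: create a new issuer (consuming an IV) for any CPU that issues
    /// IO while IVs remain; afterwards, later CPUs share an already-created
    /// issuer (the sharing policy is elided here).
    fn issuer_for_cpu(&mut self, cpu: u32) -> &IoIssuer {
        if !self.issuers.contains_key(&cpu) && (self.issuers.len() as u32) < self.max_ivs {
            let iv = self.issuers.len() as u32;
            self.issuers.insert(cpu, IoIssuer { cpu, iv });
        }
        self.issuers
            .get(&cpu)
            .or_else(|| self.issuers.values().next())
            .expect("at least one issuer must exist")
    }
}

fn main() {
    let mut dev = PerDeviceIssuers { max_ivs: 2, issuers: HashMap::new() };
    // IO arrives on CPUs 0, 1, 2: CPUs 0 and 1 each get an issuer/IV; CPU 2
    // must share because this device has only 2 IVs.
    for cpu in [0, 1, 2] {
        let issuer = dev.issuer_for_cpu(cpu);
        println!("IO on CPU {cpu} -> issuer on CPU {} (IV {})", issuer.cpu, issuer.iv);
    }
}
```

With several devices behind one striped disk, each device runs this same greedy logic independently, so they all tend to land their issuers on the same few CPUs that happen to issue the IO.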
We should change the algorithm to not overload a subset of CPUs. Some options:
- Option 1: The vtl2 settings worker knows how many NVMe devices an OpenHCL VM will have. When those settings are supplied, create a global cap and respect it in the `nvme_driver`.
- Option 2: The `nvme_driver` can keep a tally of the number of IVs assigned to each CPU. When the gap between the max and min counts becomes too great, pick a "close" CPU instead.
- Option 3: The supported number of IVs and vCPUs is known, so "spread out" the IVs across CPUs by generating a stride. Be careful not to start this stride on the same CPU for all NVMe devices (see the sketch after this list).
In general, any fallback to a close CPU should try to preserve NUMA locality.
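For Option 3, a minimal sketch of a stride-based spread, assuming the driver knows the vCPU count, its IV count, and a stable per-device index (all names below are hypothetical, and a real implementation would also need to respect the NUMA note above):

```rust
/// Hypothetical helper for Option 3: choose the CPU for each of a device's
/// interrupt vectors by striding across the vCPU space, with a per-device
/// starting offset so that multiple NVMe devices do not all begin on CPU 0.
fn spread_ivs_across_cpus(num_vcpus: u32, num_ivs: u32, device_index: u32) -> Vec<u32> {
    assert!(num_vcpus > 0 && num_ivs > 0);
    // Stride so the IVs land roughly evenly across the vCPU range.
    let stride = (num_vcpus / num_ivs).max(1);
    // Start each device at a different offset within one stride window.
    let start = device_index % stride;
    (0..num_ivs)
        .map(|iv| (start + iv * stride) % num_vcpus)
        .collect()
}

fn main() {
    // Example: 16 vCPUs, 4 IVs per device, three devices striping one disk.
    for dev in 0..3 {
        println!("device {dev}: IV -> CPU {:?}", spread_ivs_across_cpus(16, 4, dev));
    }
}
```

Offsetting the starting CPU by the device index keeps the devices backing a striped disk from all beginning their stride on the same CPU, which is the overlap described above.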
Reported-By: @fliang-ms