CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183

edwintorok · 2024-12-12T22:52:37Z

Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2).

Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475

(We are going to optimize get_record separately so it doesn't go through serialization)

This still requires validation and measurements (e.g. using @contificate 's benchmark)

psafont · 2024-12-13T11:33:34Z

So the change here is avoiding the None value? Why can't the function's signature be change to remove the optional codepath to explicitly remove the n^2 behaviour?

robhoes · 2024-12-13T11:50:40Z

I agree with Pau: the get_internal_records_where branch is still there, but effectively made dead code?

Also, the trade-off here, which triggered the original change 11 years ago, is that if you are not no the coordinator, the get_internal_records_where call is a single round-trip over the network, where now you have a round-trip for each VBD. The network overhead may outweigh the DB handling speedup in that case. It depends whether this function may run on any host or just on the coordinator.

Vincent-lau

So the O(N^2) behaviour comes from the fact that we retrieve a vbd ref but never uses it?

edwintorok · 2025-02-04T18:45:00Z

I agree with Pau: the get_internal_records_where branch is still there, but effectively made dead code?

It is still used by assert_operation_valid, which operates on a single VDI (so the old code is the best we can do for now).

I should probably try to optimize the db filter code to actually use the "index" when available, and avoid all those linear scans (that might improve other parts of the codebase too).

s that if you are not no the coordinator, the get_internal_records_where call is a single round-trip over the network, where now you have a round-trip for each VBD.

if a VM has many disks in a pool, and this gets called on a non-member host, then yes. It is not very clear where this function is called from: I see callers from message_forwarding, so that should be OK, but there are calls from elsewhere too.

Perhaps the code could check in which situation it is: if we are pool master then fetch all the VBDs individually as here, and if not fall back to the other call (and later we can optimize the other call).

edwintorok · 2025-02-04T18:50:09Z

Perhaps the code could check in which situation it is: if we are pool master then fetch all the VBDs individually as here, and if not fall back to the other call (and later we can optimize the other call).

I've implemented this change

ocaml/xapi/xapi_vdi.ml

Create a XAPI database with a number of VMs/VDIs/VBDs and measure how long update_allowed_operations takes. Can't really use 2400 VMs here yet, because even with 240 VMs takes ~15s to initialize the test. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10145.1862 mjw/run│ 7412588.8431 mnw/run│ 53244625.3769 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53244625.376923 (confidence: 53612028.619469 to 53103082.519729); r² = Some 0.992851 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>

``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10096.0354 mjw/run│ 7412629.8723 mnw/run│ 53075833.0400 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53075833.040000 (confidence: 53469156.908088 to 52924201.003476); r² = Some 0.991453 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>

O(log n) instead of O(n) complexity. Also filtering can be done more efficiently. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9874.6062 mjw/run│ 7412584.1277 mnw/run│ 51364914.0200 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 51364914.020000 (confidence: 51756944.767313 to 51194994.874696); r² = Some 0.990799 } ``` On this test ~2-3% improvement (potentially more on larger lists). Signed-off-by: Edwin Török <edwin.torok@cloud.com>

Perform the cheaper check first, so that it will short-circuit the evaluation when false. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9884.5615 mjw/run│ 7412534.5215 mnw/run│ 53189355.8923 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53189355.892308 (confidence: 53722938.915014 to 52908047.166446); r² = Some 0.992578 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>

Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2). Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475 (We are going to optimize get_record separately so it doesn't go through serialization) For now only do this when run on the coordinator to avoid potentially large number of VBD round-trip database fetches. We'll need to optimize the 'get_internal_record_where' later to also speed up the pool case. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9205.8042 mjw/run│ 964577.0228 mnw/run│ 2868770.0725 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 2868770.072546 (confidence: 2947963.590731 to 2834338.835371); r² = Some 0.404284 } ``` Compared to the previous commit this is 18x faster. Signed-off-by: Edwin Török <edwin.torok@cloud.com>

edwintorok · 2025-03-04T18:08:25Z

The python failure is unrelated (coverage), I rebased on latest master to get the CI fixes.

Vincent-lau reviewed Jan 23, 2025

View reviewed changes

edwintorok force-pushed the pr/CP-52880 branch 2 times, most recently from 0532c5e to a59fc48 Compare February 4, 2025 18:49

edwintorok marked this pull request as ready for review February 4, 2025 18:50

edwintorok changed the base branch from feature/perf to master February 4, 2025 18:50

psafont reviewed Feb 5, 2025

View reviewed changes

ocaml/xapi/xapi_vdi.ml Show resolved Hide resolved

psafont approved these changes Feb 5, 2025

View reviewed changes

edwintorok force-pushed the pr/CP-52880 branch from a59fc48 to 3c89345 Compare February 6, 2025 18:34

edwintorok enabled auto-merge February 6, 2025 18:36

edwintorok force-pushed the pr/CP-52880 branch 2 times, most recently from e74ee31 to 1106d14 Compare February 13, 2025 13:57

edwintorok added 5 commits March 4, 2025 18:07

edwintorok force-pushed the pr/CP-52880 branch from 1106d14 to 50a39d7 Compare March 4, 2025 18:08

robhoes approved these changes Apr 28, 2025

View reviewed changes

edwintorok added this pull request to the merge queue Apr 28, 2025

Merged via the queue into xapi-project:master with commit 625449c Apr 28, 2025
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183

CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183

Uh oh!

edwintorok commented Dec 12, 2024

Uh oh!

psafont commented Dec 13, 2024

Uh oh!

robhoes commented Dec 13, 2024

Uh oh!

Vincent-lau left a comment

Uh oh!

edwintorok commented Feb 4, 2025

Uh oh!

edwintorok commented Feb 4, 2025

Uh oh!

Uh oh!

edwintorok commented Mar 4, 2025

Uh oh!

Uh oh!

Uh oh!

CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183

CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183

Uh oh!

Conversation

edwintorok commented Dec 12, 2024

Uh oh!

psafont commented Dec 13, 2024

Uh oh!

robhoes commented Dec 13, 2024

Uh oh!

Vincent-lau left a comment

Choose a reason for hiding this comment

Uh oh!

edwintorok commented Feb 4, 2025

Uh oh!

edwintorok commented Feb 4, 2025

Uh oh!

Uh oh!

edwintorok commented Mar 4, 2025

Uh oh!

Uh oh!

Uh oh!