-
Notifications
You must be signed in to change notification settings - Fork 292
CP-52880: Avoid O(N^2) behaviour in Xapi_vdi.update_allowed_operations #6183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
So the change here is avoiding the None value? Why can't the function's signature be change to remove the optional codepath to explicitly remove the n^2 behaviour? |
I agree with Pau: the Also, the trade-off here, which triggered the original change 11 years ago, is that if you are not no the coordinator, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the O(N^2) behaviour comes from the fact that we retrieve a vbd ref but never uses it?
It is still used by I should probably try to optimize the db filter code to actually use the "index" when available, and avoid all those linear scans (that might improve other parts of the codebase too).
if a VM has many disks in a pool, and this gets called on a non-member host, then yes. It is not very clear where this function is called from: I see callers from message_forwarding, so that should be OK, but there are calls from elsewhere too. Perhaps the code could check in which situation it is: if we are pool master then fetch all the VBDs individually as here, and if not fall back to the other call (and later we can optimize the other call). |
0532c5e
to
a59fc48
Compare
I've implemented this change |
a59fc48
to
3c89345
Compare
e74ee31
to
1106d14
Compare
Create a XAPI database with a number of VMs/VDIs/VBDs and measure how long update_allowed_operations takes. Can't really use 2400 VMs here yet, because even with 240 VMs takes ~15s to initialize the test. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10145.1862 mjw/run│ 7412588.8431 mnw/run│ 53244625.3769 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53244625.376923 (confidence: 53612028.619469 to 53103082.519729); r² = Some 0.992851 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10096.0354 mjw/run│ 7412629.8723 mnw/run│ 53075833.0400 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53075833.040000 (confidence: 53469156.908088 to 52924201.003476); r² = Some 0.991453 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
O(log n) instead of O(n) complexity. Also filtering can be done more efficiently. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9874.6062 mjw/run│ 7412584.1277 mnw/run│ 51364914.0200 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 51364914.020000 (confidence: 51756944.767313 to 51194994.874696); r² = Some 0.990799 } ``` On this test ~2-3% improvement (potentially more on larger lists). Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Perform the cheaper check first, so that it will short-circuit the evaluation when false. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9884.5615 mjw/run│ 7412534.5215 mnw/run│ 53189355.8923 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53189355.892308 (confidence: 53722938.915014 to 52908047.166446); r² = Some 0.992578 } ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2). Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475 (We are going to optimize get_record separately so it doesn't go through serialization) For now only do this when run on the coordinator to avoid potentially large number of VBD round-trip database fetches. We'll need to optimize the 'get_internal_record_where' later to also speed up the pool case. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9205.8042 mjw/run│ 964577.0228 mnw/run│ 2868770.0725 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 2868770.072546 (confidence: 2947963.590731 to 2834338.835371); r² = Some 0.404284 } ``` Compared to the previous commit this is 18x faster. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The python failure is unrelated (coverage), I rebased on latest master to get the CI fixes. |
Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2).
Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475
(We are going to optimize get_record separately so it doesn't go through serialization)
This still requires validation and measurements (e.g. using @contificate 's benchmark)