CA-399757: Add CAS style check for SR scan #6113

Merged

merged 1 commit into xapi-project:master from private/shul2/scan-cas on Dec 3, 2024

Conversation

@Vincent-lau Vincent-lau (Contributor) commented Nov 15, 2024

This is to reduce the chance of a race between SR.scan and VDI.db_introduce. More details are in the comment; this won't fix the problem entirely, but it makes it less likely to happen...

I have seen a few instances of this happening in SXM tests now, so we need to prioritise fixing this.

I am trying to run some tests on this, but it's hard to know how effective it is, as storage migration tends to require lots of resources and does not get run very often. Plus, this issue is not very reproducible.
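
For context, the shape of the mitigation is roughly the following. This is a minimal OCaml sketch with hypothetical names, not the actual change in the diff:

(* Hedged sketch of a CAS-style guard: snapshot the DB view, run the
   slow scan, re-read, and only commit if nothing changed underneath us;
   otherwise retry a bounded number of times. Every name here is
   illustrative, not a real xapi function. *)
let cas_scan ~read ~scan ~commit ~retries =
  let rec go n =
    let before = read () in
    let result = scan () in
    let after = read () in
    if after <> before && n > 0 then go (n - 1) else commit after result
  in
  go retries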

@Vincent-lau Vincent-lau force-pushed the private/shul2/scan-cas branch from a7928b1 to 95322fe on November 15, 2024 at 12:08
@psafont psafont (Member) left a comment

Can there be a situation where the two lists don't converge?

@Vincent-lau Vincent-lau (Contributor, Author) commented Nov 15, 2024

> Can there be a situation where the two lists don't converge?

Well, then someone would have to be constantly changing it under our feet. I think the only source that could do that would be SMAPIv1 calling back into xapi, which shouldn't happen all the time...

@contificate contificate (Contributor) left a comment

See commentary above.

I think the lack of an else begin ... end makes the following a bug:

if db_vdis_after <> db_vdis_before then
  scan_rec () ;
update_vdis ~__context ~sr db_vdis_after vs ;

If you do recursively call scan_rec and it converges, you will call update_vdis, then return to the caller, which calls update_vdis again with its (now stale) version of db_vdis_after. This is probably not the desired semantics.

I think this should be a loop guard with a max retry bound, not recursion.
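
For illustration, the suggested shape makes the retry and the update mutually exclusive and bounds the retries. This is a sketch reusing the names from the snippet above; threading a limit argument through scan_rec is an assumption here:

(* Retry only while changes are still observed and retries remain;
   otherwise fall through to a single update. *)
if db_vdis_after <> db_vdis_before && limit > 0 then
  scan_rec (limit - 1)
else
  update_vdis ~__context ~sr db_vdis_after vs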

@contificate contificate (Contributor) commented

Could we mitigate the TOCTOU issue somewhat by holding the DB lock when computing after, such that we continue holding it after we break the loop (fixpoint convergence), then do all the updates whilst holding the lock for the entirety (ensuring not to hold it during the scan itself), and then release it?

I don't know how expensive all the storage operations are in general or if this is the TOCTOU issue you mention, so this suggestion may be deeply destructive if you hold the lock for too long. This assumes the TOCTOU issue you're talking about is the potential to interleave changes between the breaking of the loop and the subsequent updates.

@Vincent-lau Vincent-lau (Contributor, Author) commented

> Could we mitigate the TOCTOU issue somewhat by holding the DB lock when computing after, such that we continue holding it after we break the loop (fixpoint convergence), then do all the updates whilst holding the lock for the entirety (ensuring not to hold it during the scan itself), and then release it?
>
> I don't know how expensive all the storage operations are in general or if this is the TOCTOU issue you mention, so this suggestion may be deeply destructive if you hold the lock for too long. This assumes the TOCTOU issue you're talking about is the potential to interleave changes between the breaking of the loop and the subsequent updates.

The TOCTOU problem happens when the db gets changed after we do the check db_vdis_after <> db_vdis_before, at which point we shouldn't really proceed.

So you are suggesting taking the lock before if db_vdis_after <> db_vdis_before then and releasing it only after we finish updating the db? How do you hold a db lock for only part of the operation, though, given the db_lock interface?

@contificate contificate (Contributor) commented

> So you are suggesting taking the lock before if db_vdis_after <> db_vdis_before then and releasing it only after we finish updating the db? How do you hold a db lock for only part of the operation, though, given the db_lock interface?

Something like:

while not fixpoint do
  before = ...
  action
  acquire db_lock
  after = ...
  fixpoint = (before == after)
  if not fixpoint:
    release db_lock
done
# still holding the db lock here
bunch of updates
release db_lock

I don't think the database lock exposes such an interface; I think we restricted it to a with_lock function with a callback to avoid misuse. You also won't be able to do this if you spawn new threads within any of the "after" or "bunch of updates" computation (the holder of the lock is determined by thread ID, so you would need to be careful that you can still make progress while holding the lock).
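
For concreteness, a with_lock-style interface confines the critical section to a callback, so the lock cannot be acquired inside a loop and released after it. A minimal sketch with assumed names, not the actual xapi module:

let db_lock = Mutex.create ()

(* The lock is held exactly for the duration of [f] and released even if
   [f] raises, so a caller cannot keep it across the end of a loop. *)
let with_lock f =
  Mutex.lock db_lock ;
  Fun.protect ~finally:(fun () -> Mutex.unlock db_lock) f

(* Consequently, the final comparison and all the updates would have to
   live inside a single callback to share one critical section. *)
let commit_if_unchanged ~read ~before ~update =
  with_lock (fun () -> if read () = before then update ())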

@Vincent-lau Vincent-lau (Contributor, Author) commented Nov 19, 2024

> I don't think the database lock exposes such an interface; I think we restricted it to a with_lock function with a callback to avoid misuse. You also won't be able to do this if you spawn new threads within any of the "after" or "bunch of updates" computation (the holder of the lock is determined by thread ID, so you would need to be careful that you can still make progress while holding the lock).

Yes, I am worried about deadlocking if we hold this lock. It could be that SR.scan runs on a pool member and holds the lock for a period while invoking updates such as Db.SR...., which run on the coordinator; because the member is holding the lock, the coordinator would never be able to acquire the db lock at all. And this is just one case; there might be other deadlock scenarios as well.

In general there is just no way to solve this cleanly...

@Vincent-lau Vincent-lau force-pushed the private/shul2/scan-cas branch from 95322fe to 1e9ed52 on November 19, 2024 at 14:46
@robhoes robhoes (Member) commented Nov 28, 2024

> Yes, I am worried about deadlocking if we hold this lock. It could be that SR.scan runs on a pool member and holds the lock for a period while invoking updates such as Db.SR...., which run on the coordinator; because the member is holding the lock, the coordinator would never be able to acquire the db lock at all. And this is just one case; there might be other deadlock scenarios as well.

This function can run on any host (e.g. a pool member for a local SR), but the coordinator runs the database and only the coordinator has access to DB locks.

@robhoes robhoes (Member) commented Nov 28, 2024

I think this is fine after adding the iteration limit, as discussed above.

@Vincent-lau Vincent-lau force-pushed the private/shul2/scan-cas branch 2 times, most recently from f03e740 to 2756dd0 on December 3, 2024 at 14:39
@contificate contificate (Contributor) left a comment

The recent changes look good.

My only nitpick is that the order of operands in db_vdis_after <> db_vdis_before && limit > 0 could be swapped, under the general idea that less costly computations that can short-circuit a condition should be tested first. Of course, it is the single-retry limit case that benefits most, and only marginally.
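
Concretely, the swapped condition reads as follows (a sketch; the helper is hypothetical):

(* Cheap check first: && short-circuits, so the structural comparison of
   the two VDI lists is skipped once the retry budget reaches zero. *)
let should_retry ~limit ~db_vdis_before ~db_vdis_after =
  limit > 0 && db_vdis_after <> db_vdis_before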

@Vincent-lau Vincent-lau force-pushed the private/shul2/scan-cas branch from 2756dd0 to 87ca90e on December 3, 2024 at 14:53

Commit message:
SR.scan is currently not an atomic operation, and this has caused
problems: during the scan itself, there might be other calls changing
the state of the database, such as VDI.db_introduce called by SM when
using SMAPIv1. This confuses SR.scan, as it sees an outdated snapshot.

The proposed workaround is to add a CAS-style check to SR.scan, which
will refuse to update the db if it detects changes. This is still
subject to the TOCTOU problem, but it should reduce the race window.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
@robhoes robhoes added this pull request to the merge queue Dec 3, 2024
Merged via the queue into xapi-project:master with commit b12a6b0 Dec 3, 2024
15 checks passed