-
Notifications
You must be signed in to change notification settings - Fork 96
FileStorage: fix rare data corruption when using restore after multiple undos #395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -672,28 +672,33 @@ def _data_find(self, tpos, oid, data): | |
tid, tl, status, ul, dl, el = unpack(TRANS_HDR, h) | ||
status = as_text(status) | ||
self._file.read(ul + dl + el) | ||
tend = tpos + tl + 8 | ||
tend = tpos + tl | ||
pos = self._file.tell() | ||
backpointer = None | ||
while pos < tend: | ||
h = self._read_data_header(pos) | ||
if h.oid == oid: | ||
# Make sure this looks like the right data record | ||
if h.plen == 0: | ||
# This is also a backpointer. Gotta trust it. | ||
# This is also a backpointer, remember it and keep | ||
# looking at other records from this transaction, | ||
# if there is multiple undo records, we use the | ||
# oldest. | ||
backpointer = pos | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why are we sure that the "backpointer" found at the highest position is the right one? I.e. why have older undo records a higher position? With your change, the code no longer matches the introductory source comment. The source comment should get updated. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hello everyone. I had the same concern and actually this issue is not only about undo, but also how FileStorage handles and/or should handle a situation when one transaction contains two data records with the same oid. What loadBefore should return for such case - data from the first data record or from the second? Current behaviour is to return the data from the second data record - the one that was last to be create database with multiple data entries in the same transaction for oid=1---- 8< ---- (
$ python -m zodbtools.zodb commit 1.fs 0000000000000000 <1.txn
03f63ee48fa55c00 $ fs1 dump 1.fs
Trans #00000 tid=03f63ee48fa55c00 time=2024-02-01 10:44:33.667016 offset=35
status=' ' user='author' description='T1'
data #00000 oid=0000000000000001 size=3 class=?.?
data #00001 oid=0000000000000001 size=3 class=?.? try to read that oid=1---- 8< ---- ( from __future__ import print_function
from ZODB.FileStorage import FileStorage
from ZODB.utils import p64, u64
stor = FileStorage('1.fs', read_only=True)
head = stor.lastTransaction()
head_ = p64(u64(head)+1)
oid = p64(1)
x = stor.loadBefore(oid, head_)
print(x) $ python catobj.py
('BBB', '\x03\xf6>\xe4\x8f\xa5\\\x00', None) I'm not sure 100% but probably this is kind of expected behaviour. So taking this into account for consistency it should be also the latest undo record that is taking the effect. And for generality we should also consider cases like:
because here it should be data3 to be returned by loadBefore, not data from T1. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (undo record can be, and probably should be, perceived as regular store with instruction to copy data from particular previous data record) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With the above point of view I suggest to amend the patch with the following: --- a/src/ZODB/FileStorage/FileStorage.py
+++ b/src/ZODB/FileStorage/FileStorage.py
@@ -663,6 +663,10 @@ def _data_find(self, tpos, oid, data):
# Else oid's data record contains the data, and the file offset of
# oid's data record is returned. This data record should contain
# a pickle identical to the 'data' argument.
+ # When looking for oid's data record we scan all data records in
+ # the transaction till the end looking for the latest record with oid,
+ # even if there are multiple such records. This matches load behaviour
+ # which uses the data record created by last store.
# Unclear: If the length of the stored data doesn't match len(data),
# an exception is raised. If the lengths match but the data isn't
@@ -674,31 +678,36 @@ def _data_find(self, tpos, oid, data):
self._file.read(ul + dl + el)
tend = tpos + tl
pos = self._file.tell()
- backpointer = None
+ data_hdr = None
+ data_pos = 0
+ # scan all data records in this transaction looking for the latest
+ # record with our oid
while pos < tend:
h = self._read_data_header(pos)
if h.oid == oid:
- # Make sure this looks like the right data record
- if h.plen == 0:
- # This is also a backpointer, remember it and keep
- # looking at other records from this transaction,
- # if there is multiple undo records, we use the
- # oldest.
- backpointer = pos
- else:
- if h.plen != len(data):
- # The expected data doesn't match what's in the
- # backpointer. Something is wrong.
- logger.error("Mismatch between data and"
- " backpointer at %d", pos)
- return 0
- _data = self._file.read(h.plen)
- if data != _data:
- return 0
- return pos
+ data_hdr = h
+ data_pos = pos
pos += h.recordlen()
self._file.seek(pos)
- return backpointer or 0
+ if data_hdr is None:
+ return 0
+
+ # return position of found data record, but make sure this looks like
+ # the right data record to return.
+ if data_hdr.plen == 0:
+ # This is also a backpointer, Gotta trust it.
+ return data_pos
+ else:
+ if data_hdr.plen != len(data):
+ # The expected data doesn't match what's in the
+ # backpointer. Something is wrong.
+ logger.error("Mismatch between data and"
+ " backpointer at %d", pos)
+ return 0
+ _data = self._file.read(data_hdr.plen)
+ if data != _data:
+ return 0
+ return data_pos
def restore(self, oid, serial, data, version, prev_txn, transaction):
# A lot like store() but without all the consistency checks. This
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you, I like this patch better, it's more logical. I have integrated this in the pull request. When I wrote that patch, the reason to take the latest data record was to match the behavior of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, Jérome. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And I see about your original observations. |
||
else: | ||
if h.plen != len(data): | ||
# The expected data doesn't match what's in the | ||
# backpointer. Something is wrong. | ||
logger.error("Mismatch between data and" | ||
" backpointer at %d", pos) | ||
return 0 | ||
_data = self._file.read(h.plen) | ||
if data != _data: | ||
return 0 | ||
return pos | ||
if h.plen != len(data): | ||
# The expected data doesn't match what's in the | ||
# backpointer. Something is wrong. | ||
logger.error("Mismatch between data and" | ||
" backpointer at %d", pos) | ||
return 0 | ||
_data = self._file.read(h.plen) | ||
if data != _data: | ||
return 0 | ||
return pos | ||
pos += h.recordlen() | ||
self._file.seek(pos) | ||
return 0 | ||
return backpointer or 0 | ||
|
||
def restore(self, oid, serial, data, version, prev_txn, transaction): | ||
# A lot like store() but without all the consistency checks. This | ||
|
Uh oh!
There was an error while loading. Please reload this page.