Skip to content

Commit 55261f3

Browse files
committed
loadAt
loadAt is new optional storage interface that is intended to replace loadBefore with more clean and uniform semantic. Compared to loadBefore, loadAt: 1) returns data=None and serial of the removal, when loaded object was found to be deleted. loadBefore is returning only data=None in such case. This loadAt property allows to fix DemoStorage data corruption when whiteouts in overlay part were not previously correctly taken into account. #318 2) for regular data records, does not require storages to return next_serial, in addition to (data, serial). loadBefore requirement to return both serial and next_serial is constraining storages unnecessarily, and, while for FileStorage it is free to implement, for other storages it is not - for example for NEO and RelStorage, finding out next_serial, after looking up oid@at data record, costs one more SQL query: https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508 https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199 next_serial is not only about execution overhead - it is semantically redundant to be there and can be removed from load return. The reason I say that next_serial can be removed is that in ZODB/py the only place, that I could find, where next_serial is used on client side is in client cache (e.g. in NEO client cache), and that cache can be remade to work without using that next_serial at all. In simple words whenever after loadAt(oid, at) -> (data, serial) query, the cache can remember data for oid in [serial, at] range. Next, when invalidation message from server is received, cache entries, that had at == client_head, are extended (at -> new_head) for oids that are not present in invalidation message, while for oids that are present in invalidation message no such extension is done. This allows to maintain cache in correct state, invalidate it when there is a need to invalidate, and not to throw away cache entries that should remain live. This of course requires ZODB server to include both modified and just-created objects into invalidation messages ( zopefoundation/ZEO#160 , #319 ). Switching to loadAt should thus allow storages like NEO and, maybe, RelStorage, to do 2x less SQL queries on every object access. #318 (comment) In other words loadAt unifies return signature to always be (data, serial) instead of POSKeyError object does not exist at all None object was removed (data, serial, next_serial) regular data record used by loadBefore. This patch: - introduces new interface. - introduces ZODB.utils.loadAt helper, that uses either storage.loadAt, or, if the storage does not implement loadAt interface, tries to mimic loadAt semantic via storage.loadBefore to possible extent + emits corresponding warning. - converts MVCCAdapter to use loadAt instead of loadBefore. - changes DemoStorage to use loadAt, and this way fixes above-mentioned data corruption issue; adds corresponding test; converts DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt. - adds loadAt implementation to FileStorage and MappingStorage. - adapts other tests/code correspondingly. /cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
1 parent 22d1405 commit 55261f3

17 files changed

+353
-79
lines changed

src/ZODB/BaseStorage.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ class BaseStorage(UndoLogCompatible):
5959
6060
If it stores multiple revisions, it should implement
6161
loadSerial()
62-
loadBefore()
62+
loadAt()
6363
6464
Each storage will have two locks that are accessed via lock
6565
acquire and release methods bound to the instance. (Yuck.)
@@ -267,9 +267,8 @@ def loadSerial(self, oid, serial):
267267
raise POSException.Unsupported(
268268
"Retrieval of historical revisions is not supported")
269269

270-
def loadBefore(self, oid, tid):
271-
"""Return most recent revision of oid before tid committed."""
272-
return None
270+
# do not provide loadAt/loadBefore here in BaseStorage - if child forgets
271+
# to override it - storage will always return "no data" instead of failing.
273272

274273
def copyTransactionsFrom(self, other, verbose=0):
275274
"""Copy transactions from another storage.

src/ZODB/DB.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -726,8 +726,7 @@ def open(self, transaction_manager=None, at=None, before=None):
726726
- `before`: like `at`, but opens the readonly state before the
727727
tid or datetime.
728728
"""
729-
# `at` is normalized to `before`, since we use storage.loadBefore
730-
# as the underlying implementation of both.
729+
# `at` is normalized to `before`.
731730
before = getTID(at, before)
732731
if (before is not None and
733732
before > self.lastTransaction() and

src/ZODB/DemoStorage.py

Lines changed: 48 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
import random
2525
import weakref
2626
import tempfile
27+
import warnings
2728
import ZODB.BaseStorage
2829
import ZODB.blob
2930
import ZODB.interfaces
@@ -217,45 +218,55 @@ def __len__(self):
217218
# still want load for old clients (e.g. zeo servers)
218219
load = load_current
219220

220-
def loadBefore(self, oid, tid):
221-
try:
222-
result = self.changes.loadBefore(oid, tid)
223-
except ZODB.POSException.POSKeyError:
224-
# The oid isn't in the changes, so defer to base
225-
return self.base.loadBefore(oid, tid)
221+
def loadAt(self, oid, at):
222+
data, serial = ZODB.utils.loadAt(self.changes, oid, at)
223+
if (data is not None) or (serial != ZODB.utils.z64):
224+
# object is present in changes either as data or deletion record.
225+
return data, serial
226226

227-
if result is None:
228-
# The oid *was* in the changes, but there aren't any
229-
# earlier records. Maybe there are in the base.
230-
try:
231-
result = self.base.loadBefore(oid, tid)
232-
except ZODB.POSException.POSKeyError:
233-
# The oid isn't in the base, so None will be the right result
234-
pass
227+
# object is not present in changes at all - use base
228+
return ZODB.utils.loadAt(self.base, oid, at)
229+
230+
def loadBefore(self, oid, before):
231+
warnings.warn("loadBefore is deprecated - use loadAt instead",
232+
DeprecationWarning, stacklevel=2)
233+
p64 = ZODB.utils.p64
234+
u64 = ZODB.utils.u64
235+
236+
if before in (maxtid, ZODB.utils.z64):
237+
at = before
238+
else:
239+
at = p64(u64(before)-1)
240+
241+
data, serial = self.loadAt(oid, at)
242+
243+
# find out next_serial.
244+
# it is ok to use dumb/slow implementation since loadBefore should not
245+
# be used and is provided only for backward compatibility.
246+
next_serial = maxtid
247+
while 1:
248+
_, s = self.loadAt(oid, p64(u64(next_serial)-1))
249+
assert s >= serial
250+
if s == serial:
251+
# found - next_serial is serial of the next data record
252+
break
253+
next_serial = s
254+
255+
if next_serial == maxtid:
256+
next_serial = None
257+
258+
# next_serial found -> return/raise what loadBefore users expect
259+
if data is None:
260+
if next_serial is None:
261+
# object was never created
262+
raise ZODB.POSException.POSKeyError(oid)
235263
else:
236-
if result and not result[-1]:
237-
# The oid is current in the base. We need to find
238-
# the end tid in the base by fining the first tid
239-
# in the changes. Unfortunately, there isn't an
240-
# api for this, so we have to walk back using
241-
# loadBefore.
242-
243-
if tid == maxtid:
244-
# Special case: we were looking for the
245-
# current value. We won't find anything in
246-
# changes, so we're done.
247-
return result
248-
249-
end_tid = maxtid
250-
t = self.changes.loadBefore(oid, end_tid)
251-
while t:
252-
end_tid = t[1]
253-
t = self.changes.loadBefore(oid, end_tid)
254-
result = result[:2] + (
255-
end_tid if end_tid != maxtid else None,
256-
)
257-
258-
return result
264+
# object was deleted
265+
return None
266+
267+
# regular data record
268+
return data, serial, next_serial
269+
259270

260271
def loadBlob(self, oid, serial):
261272
try:

src/ZODB/FileStorage/FileStorage.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import logging
2222
import os
2323
import time
24+
import warnings
2425
from struct import pack
2526
from struct import unpack
2627

@@ -57,6 +58,7 @@
5758
from ZODB.interfaces import IStorageIteration
5859
from ZODB.interfaces import IStorageRestoreable
5960
from ZODB.interfaces import IStorageUndoable
61+
from ZODB.interfaces import IStorageLoadAt
6062
from ZODB.POSException import ConflictError
6163
from ZODB.POSException import MultipleUndoErrors
6264
from ZODB.POSException import POSKeyError
@@ -133,6 +135,7 @@ def __init__(self, afile):
133135
IStorageCurrentRecordIteration,
134136
IExternalGC,
135137
IStorage,
138+
IStorageLoadAt,
136139
)
137140
class FileStorage(
138141
FileStorageFormatter,
@@ -559,7 +562,40 @@ def loadSerial(self, oid, serial):
559562
else:
560563
return self._loadBack_impl(oid, h.back)[0]
561564

565+
def loadAt(self, oid, at):
566+
"""loadAt implements IStorageLoadAt."""
567+
with self._files.get() as _file:
568+
try:
569+
pos = self._lookup_pos(oid)
570+
except POSKeyError:
571+
# object does not exist
572+
return None, z64
573+
574+
while 1:
575+
h = self._read_data_header(pos, oid, _file)
576+
if h.tid <= at:
577+
break
578+
pos = h.prev
579+
if not pos:
580+
# object not yet created as of at
581+
return None, z64
582+
583+
# h is the most recent DataHeader with .tid <= at
584+
if h.plen:
585+
# regular data record
586+
return _file.read(h.plen), h.tid
587+
elif h.back:
588+
# backpointer
589+
data, _, _, _ = self._loadBack_impl(oid, h.back,
590+
fail=False, _file=_file)
591+
return data, h.tid
592+
else:
593+
# deletion
594+
return None, h.tid
595+
562596
def loadBefore(self, oid, tid):
597+
warnings.warn("loadBefore is deprecated - use loadAt instead",
598+
DeprecationWarning, stacklevel=2)
563599
with self._files.get() as _file:
564600
pos = self._lookup_pos(oid)
565601
end_tid = None

src/ZODB/MappingStorage.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919

2020
import BTrees
2121
import time
22+
import warnings
2223
import ZODB.BaseStorage
2324
import ZODB.interfaces
2425
import ZODB.POSException
@@ -30,6 +31,7 @@
3031
@zope.interface.implementer(
3132
ZODB.interfaces.IStorage,
3233
ZODB.interfaces.IStorageIteration,
34+
ZODB.interfaces.IStorageLoadAt,
3335
)
3436
class MappingStorage(object):
3537
"""In-memory storage implementation
@@ -148,9 +150,26 @@ def __len__(self):
148150

149151
load = ZODB.utils.load_current
150152

153+
# ZODB.interfaces.IStorageLoadAt
154+
@ZODB.utils.locked(opened)
155+
def loadAt(self, oid, at):
156+
z64 = ZODB.utils.z64
157+
tid_data = self._data.get(oid)
158+
if not tid_data:
159+
return None, z64
160+
if at == z64:
161+
return None, z64
162+
tids_at = tid_data.keys(None, at)
163+
if not tids_at:
164+
return None, z64
165+
serial = tids_at[-1]
166+
return tid_data[serial], serial
167+
151168
# ZODB.interfaces.IStorage
152169
@ZODB.utils.locked(opened)
153170
def loadBefore(self, oid, tid):
171+
warnings.warn("loadBefore is deprecated - use loadAt instead",
172+
DeprecationWarning, stacklevel=2)
154173
tid_data = self._data.get(oid)
155174
if tid_data:
156175
before = ZODB.utils.u64(tid)

src/ZODB/blob.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -866,10 +866,10 @@ def undo(self, serial_id, transaction):
866866
for oid in self.fshelper.getOIDsForSerial(serial_id):
867867
# we want to find the serial id of the previous revision
868868
# of this blob object.
869-
load_result = self.loadBefore(oid, serial_id)
870-
871-
if load_result is None:
869+
at_before = utils.p64(utils.u64(serial_id)-1)
870+
_, serial_before = utils.loadAt(self, oid, at_before)
872871

872+
if serial_before == utils.z64:
873873
# There was no previous revision of this blob
874874
# object. The blob was created in the transaction
875875
# represented by serial_id. We copy the blob data
@@ -884,7 +884,6 @@ def undo(self, serial_id, transaction):
884884
# transaction implied by "serial_id". We copy the blob
885885
# data to a new file that references the undo transaction
886886
# in case a user wishes to undo this undo.
887-
data, serial_before, serial_after = load_result
888887
orig_fn = self.fshelper.getBlobFilename(oid, serial_before)
889888
new_fn = self.fshelper.getBlobFilename(oid, undo_serial)
890889
with open(orig_fn, "rb") as orig:

src/ZODB/historical_connections.rst

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ development continues on a "development" head.
3636

3737
A database can be opened historically ``at`` or ``before`` a given transaction
3838
serial or datetime. Here's a simple example. It should work with any storage
39-
that supports ``loadBefore``.
39+
that supports ``loadAt`` or ``loadBefore``.
4040

4141
We'll begin our example with a fairly standard set up. We
4242

@@ -138,10 +138,9 @@ root.
138138
>>> historical_conn.root()['first']['count']
139139
0
140140

141-
In fact, ``at`` arguments are translated into ``before`` values because the
142-
underlying mechanism is a storage's loadBefore method. When you look at a
143-
connection's ``before`` attribute, it is normalized into a ``before`` serial,
144-
no matter what you pass into ``db.open``.
141+
In fact, ``at`` arguments are translated into ``before`` values.
142+
When you look at a connection's ``before`` attribute, it is normalized into a
143+
``before`` serial, no matter what you pass into ``db.open``.
145144

146145
>>> print(conn.before)
147146
None

src/ZODB/interfaces.py

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -710,6 +710,9 @@ def __len__():
710710
def loadBefore(oid, tid):
711711
"""Load the object data written before a transaction id
712712
713+
( This method is deprecated and kept for backward-compatibility.
714+
Please use loadAt instead. )
715+
713716
If there isn't data before the object before the given
714717
transaction, then None is returned, otherwise three values are
715718
returned:
@@ -886,6 +889,24 @@ def prefetch(oids, tid):
886889
more than once.
887890
"""
888891

892+
class IStorageLoadAt(Interface):
893+
894+
def loadAt(oid, at): # -> (data, serial)
895+
"""Load object data as observed at given database state.
896+
897+
loadAt returns data for object with given object ID as observed by
898+
database state ≤ at. Two values are returned:
899+
900+
- The data record,
901+
- The transaction ID of the data record.
902+
903+
If the object does not exist, or is deleted as of `at` database state,
904+
loadAt returns data=None, and serial indicates transaction ID of the
905+
most recent deletion done in transaction with ID ≤ at, or null tid if
906+
there is no such deletion.
907+
908+
Note: no POSKeyError is raised even if object id is not in the storage.
909+
"""
889910

890911
class IMultiCommitStorage(IStorage):
891912
"""A multi-commit storage can commit multiple transactions at once.

src/ZODB/mvccadapter.py

Lines changed: 24 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
import zope.interface
1111

1212
from . import interfaces, serialize, POSException
13-
from .utils import p64, u64, Lock, oid_repr, tid_repr
13+
from .utils import p64, u64, z64, maxtid, Lock, loadAt, oid_repr, tid_repr
1414

1515
class Base(object):
1616

@@ -99,7 +99,7 @@ class MVCCAdapterInstance(Base):
9999
'checkCurrentSerialInTransaction', 'tpc_abort',
100100
)
101101

102-
_start = None # Transaction start time
102+
_start = None # Transaction start time (before)
103103
_ltid = b'' # Last storage transaction id
104104

105105
def __init__(self, base):
@@ -151,8 +151,20 @@ def poll_invalidations(self):
151151

152152
def load(self, oid):
153153
assert self._start is not None
154-
r = self._storage.loadBefore(oid, self._start)
155-
if r is None:
154+
at = p64(u64(self._start)-1)
155+
data, serial = loadAt(self._storage, oid, at)
156+
if data is None:
157+
# raise POSKeyError if object does not exist at all
158+
# TODO raise POSKeyError always and switch to raising ReadOnlyError only when
159+
# actually detecting that load is being affected by simultaneous pack (see below).
160+
if serial == z64:
161+
# XXX second call to loadAt - it will become unneeded once we
162+
# switch to raising POSKeyError.
163+
_, serial_exists = loadAt(self._storage, oid, maxtid)
164+
if serial_exists == z64:
165+
# object does not exist at all
166+
raise POSException.POSKeyError(oid)
167+
156168
# object was deleted or not-yet-created.
157169
# raise ReadConflictError - not - POSKeyError due to backward
158170
# compatibility: a pack(t+δ) could be running simultaneously to our
@@ -174,11 +186,14 @@ def load(self, oid):
174186
# whether pack is/was actually running and its details, take that
175187
# into account, and raise ReadConflictError only in the presence of
176188
# database being simultaneously updated from back of its log.
189+
#
190+
# See https://github.com/zopefoundation/ZODB/pull/322 for
191+
# preliminary steps in this direction.
177192
raise POSException.ReadConflictError(
178193
"load %s @%s: object deleted, likely by simultaneous pack" %
179194
(oid_repr(oid), tid_repr(p64(u64(self._start) - 1))))
180195

181-
return r[:2]
196+
return data, serial
182197

183198
def prefetch(self, oids):
184199
try:
@@ -257,10 +272,11 @@ def poll_invalidations(self):
257272
new_oid = pack = store = read_only_writer
258273

259274
def load(self, oid, version=''):
260-
r = self._storage.loadBefore(oid, self._before)
261-
if r is None:
275+
at = p64(u64(self._before)-1)
276+
data, serial = loadAt(self._storage, oid, at)
277+
if data is None:
262278
raise POSException.POSKeyError(oid)
263-
return r[:2]
279+
return data, serial
264280

265281

266282
class UndoAdapterInstance(Base):

0 commit comments

Comments
 (0)