Skip to content

Commit 42ed738

Browse files
committed
loadAt
loadAt is new optional storage interface that is intended to replace loadBefore with more clean and uniform semantic. Compared to loadBefore, loadAt: 1) returns data=None and serial of the removal, when loaded object was found to be deleted. loadBefore is returning only data=None in such case. This loadAt property allows to fix DemoStorage data corruption when whiteouts in overlay part were not previously correctly taken into account. #318 2) for regular data records, does not require storages to return next_serial, in addition to (data, serial). loadBefore requirement to return both serial and next_serial is constraining storages unnecessarily, and, while for FileStorage it is free to implement, for other storages it is not - for example for NEO and RelStorage, finding out next_serial, after looking up oid@at data record, costs one more SQL query: https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508 https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264 https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199 next_serial is not only about execution overhead - it is semantically redundant to be there and can be removed from load return. The reason I say that next_serial can be removed is that in ZODB/py the only place, that I could find, where next_serial is used on client side is in client cache (e.g. in NEO client cache), and that cache can be remade to work without using that next_serial at all. In simple words whenever after loadAt(oid, at) -> (data, serial) query, the cache can remember data for oid in [serial, at] range. Next, when invalidation message from server is received, cache entries, that had at == client_head, are extended (at -> new_head) for oids that are not present in invalidation message, while for oids that are present in invalidation message no such extension is done. This allows to maintain cache in correct state, invalidate it when there is a need to invalidate, and not to throw away cache entries that should remain live. This of course requires ZODB server to include both modified and just-created objects into invalidation messages ( zopefoundation/ZEO#160 , #319 ). Switching to loadAt should thus allow storages like NEO and, maybe, RelStorage, to do 2x less SQL queries on every object access. #318 (comment) In other words loadAt unifies return signature to always be (data, serial) instead of POSKeyError object does not exist at all None object was removed (data, serial, next_serial) regular data record used by loadBefore. This patch: - introduces new interface. - introduces ZODB.utils.loadAt helper, that uses either storage.loadAt, or, if the storage does not implement loadAt interface, tries to mimic loadAt semantic via storage.loadBefore to possible extent + emits corresponding warning. - converts MVCCAdapter to use loadAt instead of loadBefore. - changes DemoStorage to use loadAt, and this way fixes above-mentioned data corruption issue; adds corresponding test; converts DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt. - adds loadAt implementation to FileStorage and MappingStorage. - adapts other tests/code correspondingly. /cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
1 parent b446045 commit 42ed738

17 files changed

+335
-79
lines changed

src/ZODB/BaseStorage.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,7 @@ class BaseStorage(UndoLogCompatible):
5959
6060
If it stores multiple revisions, it should implement
6161
loadSerial()
62-
loadBefore()
62+
loadAt()
6363
6464
Each storage will have two locks that are accessed via lock
6565
acquire and release methods bound to the instance. (Yuck.)
@@ -267,9 +267,8 @@ def loadSerial(self, oid, serial):
267267
raise POSException.Unsupported(
268268
"Retrieval of historical revisions is not supported")
269269

270-
def loadBefore(self, oid, tid):
271-
"""Return most recent revision of oid before tid committed."""
272-
return None
270+
# do not provide loadAt/loadBefore here in BaseStorage - if child forgets
271+
# to override it - storage will always return "no data" instead of failing.
273272

274273
def copyTransactionsFrom(self, other, verbose=0):
275274
"""Copy transactions from another storage.

src/ZODB/DB.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -726,8 +726,7 @@ def open(self, transaction_manager=None, at=None, before=None):
726726
- `before`: like `at`, but opens the readonly state before the
727727
tid or datetime.
728728
"""
729-
# `at` is normalized to `before`, since we use storage.loadBefore
730-
# as the underlying implementation of both.
729+
# `at` is normalized to `before`.
731730
before = getTID(at, before)
732731
if (before is not None and
733732
before > self.lastTransaction() and

src/ZODB/DemoStorage.py

Lines changed: 48 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
import random
2525
import weakref
2626
import tempfile
27+
import warnings
2728
import ZODB.BaseStorage
2829
import ZODB.blob
2930
import ZODB.interfaces
@@ -223,45 +224,55 @@ def __len__(self):
223224
# still want load for old clients (e.g. zeo servers)
224225
load = load_current
225226

226-
def loadBefore(self, oid, tid):
227-
try:
228-
result = self.changes.loadBefore(oid, tid)
229-
except ZODB.POSException.POSKeyError:
230-
# The oid isn't in the changes, so defer to base
231-
return self.base.loadBefore(oid, tid)
227+
def loadAt(self, oid, at):
228+
data, serial = ZODB.utils.loadAt(self.changes, oid, at)
229+
if (data is not None) or (serial != ZODB.utils.z64):
230+
# object is present in changes either as data or deletion record.
231+
return data, serial
232232

233-
if result is None:
234-
# The oid *was* in the changes, but there aren't any
235-
# earlier records. Maybe there are in the base.
236-
try:
237-
result = self.base.loadBefore(oid, tid)
238-
except ZODB.POSException.POSKeyError:
239-
# The oid isn't in the base, so None will be the right result
240-
pass
233+
# object is not present in changes at all - use base
234+
return ZODB.utils.loadAt(self.base, oid, at)
235+
236+
def loadBefore(self, oid, before):
237+
warnings.warn("loadBefore is deprecated - use loadAt instead",
238+
DeprecationWarning, stacklevel=2)
239+
p64 = ZODB.utils.p64
240+
u64 = ZODB.utils.u64
241+
242+
if before in (maxtid, ZODB.utils.z64):
243+
at = before
244+
else:
245+
at = p64(u64(before)-1)
246+
247+
data, serial = self.loadAt(oid, at)
248+
249+
# find out next_serial.
250+
# it is ok to use dumb/slow implementation since loadBefore should not
251+
# be used and is provided only for backward compatibility.
252+
next_serial = maxtid
253+
while 1:
254+
_, s = self.loadAt(oid, p64(u64(next_serial)-1))
255+
assert s >= serial
256+
if s == serial:
257+
# found - next_serial is serial of the next data record
258+
break
259+
next_serial = s
260+
261+
if next_serial == maxtid:
262+
next_serial = None
263+
264+
# next_serial found -> return/raise what loadBefore users expect
265+
if data is None:
266+
if next_serial is None:
267+
# object was never created
268+
raise ZODB.POSException.POSKeyError(oid)
241269
else:
242-
if result and not result[-1]:
243-
# The oid is current in the base. We need to find
244-
# the end tid in the base by fining the first tid
245-
# in the changes. Unfortunately, there isn't an
246-
# api for this, so we have to walk back using
247-
# loadBefore.
248-
249-
if tid == maxtid:
250-
# Special case: we were looking for the
251-
# current value. We won't find anything in
252-
# changes, so we're done.
253-
return result
254-
255-
end_tid = maxtid
256-
t = self.changes.loadBefore(oid, end_tid)
257-
while t:
258-
end_tid = t[1]
259-
t = self.changes.loadBefore(oid, end_tid)
260-
result = result[:2] + (
261-
end_tid if end_tid != maxtid else None,
262-
)
263-
264-
return result
270+
# object was deleted
271+
return None
272+
273+
# regular data record
274+
return data, serial, next_serial
275+
265276

266277
def loadBlob(self, oid, serial):
267278
try:

src/ZODB/FileStorage/FileStorage.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import logging
2222
import os
2323
import time
24+
import warnings
2425
from struct import pack
2526
from struct import unpack
2627

@@ -58,6 +59,7 @@
5859
from ZODB.interfaces import IStorageRestoreable
5960
from ZODB.interfaces import IStorageUndoable
6061
from ZODB.interfaces import IStorageLastPack
62+
from ZODB.interfaces import IStorageLoadAt
6163
from ZODB.POSException import ConflictError
6264
from ZODB.POSException import MultipleUndoErrors
6365
from ZODB.POSException import POSKeyError
@@ -135,6 +137,7 @@ def __init__(self, afile):
135137
IExternalGC,
136138
IStorage,
137139
IStorageLastPack,
140+
IStorageLoadAt,
138141
)
139142
class FileStorage(
140143
FileStorageFormatter,
@@ -566,7 +569,40 @@ def loadSerial(self, oid, serial):
566569
else:
567570
return self._loadBack_impl(oid, h.back)[0]
568571

572+
def loadAt(self, oid, at):
573+
"""loadAt implements IStorageLoadAt."""
574+
with self._files.get() as _file:
575+
try:
576+
pos = self._lookup_pos(oid)
577+
except POSKeyError:
578+
# object does not exist
579+
return None, z64
580+
581+
while 1:
582+
h = self._read_data_header(pos, oid, _file)
583+
if h.tid <= at:
584+
break
585+
pos = h.prev
586+
if not pos:
587+
# object not yet created as of at
588+
return None, z64
589+
590+
# h is the most recent DataHeader with .tid <= at
591+
if h.plen:
592+
# regular data record
593+
return _file.read(h.plen), h.tid
594+
elif h.back:
595+
# backpointer
596+
data, _, _, _ = self._loadBack_impl(oid, h.back,
597+
fail=False, _file=_file)
598+
return data, h.tid
599+
else:
600+
# deletion
601+
return None, h.tid
602+
569603
def loadBefore(self, oid, tid):
604+
warnings.warn("loadBefore is deprecated - use loadAt instead",
605+
DeprecationWarning, stacklevel=2)
570606
with self._files.get() as _file:
571607
pos = self._lookup_pos(oid)
572608
end_tid = None

src/ZODB/MappingStorage.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919

2020
import BTrees
2121
import time
22+
import warnings
2223
import ZODB.BaseStorage
2324
import ZODB.interfaces
2425
import ZODB.POSException
@@ -31,6 +32,7 @@
3132
ZODB.interfaces.IStorage,
3233
ZODB.interfaces.IStorageIteration,
3334
ZODB.interfaces.IStorageLastPack,
35+
ZODB.interfaces.IStorageLoadAt,
3436
)
3537
class MappingStorage(object):
3638
"""In-memory storage implementation
@@ -149,9 +151,26 @@ def __len__(self):
149151

150152
load = ZODB.utils.load_current
151153

154+
# ZODB.interfaces.IStorageLoadAt
155+
@ZODB.utils.locked(opened)
156+
def loadAt(self, oid, at):
157+
z64 = ZODB.utils.z64
158+
tid_data = self._data.get(oid)
159+
if not tid_data:
160+
return None, z64
161+
if at == z64:
162+
return None, z64
163+
tids_at = tid_data.keys(None, at)
164+
if not tids_at:
165+
return None, z64
166+
serial = tids_at[-1]
167+
return tid_data[serial], serial
168+
152169
# ZODB.interfaces.IStorage
153170
@ZODB.utils.locked(opened)
154171
def loadBefore(self, oid, tid):
172+
warnings.warn("loadBefore is deprecated - use loadAt instead",
173+
DeprecationWarning, stacklevel=2)
155174
tid_data = self._data.get(oid)
156175
if tid_data:
157176
before = ZODB.utils.u64(tid)

src/ZODB/blob.py

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -866,10 +866,10 @@ def undo(self, serial_id, transaction):
866866
for oid in self.fshelper.getOIDsForSerial(serial_id):
867867
# we want to find the serial id of the previous revision
868868
# of this blob object.
869-
load_result = self.loadBefore(oid, serial_id)
870-
871-
if load_result is None:
869+
at_before = utils.p64(utils.u64(serial_id)-1)
870+
_, serial_before = utils.loadAt(self, oid, at_before)
872871

872+
if serial_before == utils.z64:
873873
# There was no previous revision of this blob
874874
# object. The blob was created in the transaction
875875
# represented by serial_id. We copy the blob data
@@ -884,7 +884,6 @@ def undo(self, serial_id, transaction):
884884
# transaction implied by "serial_id". We copy the blob
885885
# data to a new file that references the undo transaction
886886
# in case a user wishes to undo this undo.
887-
data, serial_before, serial_after = load_result
888887
orig_fn = self.fshelper.getBlobFilename(oid, serial_before)
889888
new_fn = self.fshelper.getBlobFilename(oid, undo_serial)
890889
with open(orig_fn, "rb") as orig:

src/ZODB/historical_connections.rst

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ development continues on a "development" head.
3636

3737
A database can be opened historically ``at`` or ``before`` a given transaction
3838
serial or datetime. Here's a simple example. It should work with any storage
39-
that supports ``loadBefore``.
39+
that supports ``loadAt`` or ``loadBefore``.
4040

4141
We'll begin our example with a fairly standard set up. We
4242

@@ -138,10 +138,9 @@ root.
138138
>>> historical_conn.root()['first']['count']
139139
0
140140

141-
In fact, ``at`` arguments are translated into ``before`` values because the
142-
underlying mechanism is a storage's loadBefore method. When you look at a
143-
connection's ``before`` attribute, it is normalized into a ``before`` serial,
144-
no matter what you pass into ``db.open``.
141+
In fact, ``at`` arguments are translated into ``before`` values.
142+
When you look at a connection's ``before`` attribute, it is normalized into a
143+
``before`` serial, no matter what you pass into ``db.open``.
145144

146145
>>> print(conn.before)
147146
None

src/ZODB/interfaces.py

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -701,6 +701,9 @@ def __len__():
701701
def loadBefore(oid, tid):
702702
"""Load the object data written before a transaction id
703703
704+
( This method is deprecated and kept for backward-compatibility.
705+
Please use loadAt instead. )
706+
704707
If there isn't data before the object before the given
705708
transaction, then None is returned, otherwise three values are
706709
returned:
@@ -907,6 +910,24 @@ def lastPack(): # -> pack-cut-point (tid)
907910
to perform round-trip and synchronize with the server.
908911
"""
909912

913+
class IStorageLoadAt(Interface):
914+
915+
def loadAt(oid, at): # -> (data, serial)
916+
"""Load object data as observed at given database state.
917+
918+
loadAt returns data for object with given object ID as observed by
919+
database state ≤ at. Two values are returned:
920+
921+
- The data record,
922+
- The transaction ID of the data record.
923+
924+
If the object does not exist, or is deleted as of `at` database state,
925+
loadAt returns data=None, and serial indicates transaction ID of the
926+
most recent deletion done in transaction with ID ≤ at, or null tid if
927+
there is no such deletion.
928+
929+
Note: no POSKeyError is raised even if object id is not in the storage.
930+
"""
910931

911932
class IMultiCommitStorage(IStorage):
912933
"""A multi-commit storage can commit multiple transactions at once.

src/ZODB/mvccadapter.py

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
import zope.interface
1111

1212
from . import interfaces, serialize, POSException
13-
from .utils import p64, u64, Lock, oid_repr, tid_repr
13+
from .utils import p64, u64, Lock, loadAt, oid_repr, tid_repr
1414

1515
class Base(object):
1616

@@ -99,7 +99,7 @@ class MVCCAdapterInstance(Base):
9999
'checkCurrentSerialInTransaction', 'tpc_abort',
100100
)
101101

102-
_start = None # Transaction start time
102+
_start = None # Transaction start time (before)
103103
_ltid = b'' # Last storage transaction id
104104

105105
def __init__(self, base):
@@ -151,8 +151,9 @@ def poll_invalidations(self):
151151

152152
def load(self, oid):
153153
assert self._start is not None
154-
r = self._storage.loadBefore(oid, self._start)
155-
if r is None:
154+
at = p64(u64(self._start)-1)
155+
data, serial = loadAt(self._storage, oid, at)
156+
if data is None:
156157
# object was deleted or not-yet-created.
157158
# raise POSKeyError, or ReadConflictError, if the deletion is
158159
# potentially due to simultaneous pack: a pack(t+δ) could be
@@ -186,7 +187,7 @@ def load(self, oid):
186187
# no simultaneous pack detected, or lastPack was before our view of the database
187188
raise POSException.POSKeyError(oid)
188189

189-
return r[:2]
190+
return data, serial
190191

191192
def prefetch(self, oids):
192193
try:
@@ -265,10 +266,11 @@ def poll_invalidations(self):
265266
new_oid = pack = store = read_only_writer
266267

267268
def load(self, oid, version=''):
268-
r = self._storage.loadBefore(oid, self._before)
269-
if r is None:
269+
at = p64(u64(self._before)-1)
270+
data, serial = loadAt(self._storage, oid, at)
271+
if data is None:
270272
raise POSException.POSKeyError(oid)
271-
return r[:2]
273+
return data, serial
272274

273275

274276
class UndoAdapterInstance(Base):

0 commit comments

Comments
 (0)