Querying subspace end #78

@keijokapp

There doesn't seem to be a mechanism to do range operations from an arbitrary key to the end of the subspace. This has been bothering me for years but I've managed to work around or avoid it so far.

I've gone down the rabbit hole trying to understand how subspace ends should be handled in the context of range operations. Maybe it should be a topic for the FDB forum but the issue is at least partially specific to NodeJS bindings.

Problems

  1. Suppose I need a keyspace for arbitrary keys (byte strings), for example binary compressed IDs for a users collection. Any section of the keyspace should be enumerable: from the very beginning to an arbitrary key, and from an arbitrary key to the very end. It's currently not possible to query from an arbitrary key to the end of the subspace. At best, I could query until '\xff'.

  2. When using a key encoding, it might not be possible to enumerate even until '\xff'. For example, tuple encoding has no equivalent to '\xff'. At best, I could hope that no actual keys start with a version stamp (0x33) or a UUID (0x30) and use those as the end key.

Workarounds

  1. Use Subspace and .at to remove the configured key encoder and do all packing/unpacking manually (or not configure a key encoder in the first place). That way, the developer can apply strInc to the end key to get the desired behavior.

  2. Use a custom key encoder, similar to a prefix or tuple encoder but with a special value (e.g. an ES6 symbol) to indicate "end of keyspace". If the encoder encounters that special value, it returns '\xff'; otherwise it returns the original key with e.g. a \x00 (or any non-\xff) prefix.
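Both workarounds can be sketched in a few lines. This is only an illustration under assumptions: `strInc` here is a local reimplementation of the usual "first key after this prefix" helper, and `END` and `endAwareEncoder` are hypothetical names for an object with `pack`/`unpack` methods, which is the shape I'd expect a key encoder to have. It is not code taken from the bindings.

```javascript
// Local stand-in for strInc: the smallest key greater than every key that
// starts with `key`. Trailing 0xff bytes are dropped, then the last byte is
// incremented.
function strInc(key) {
  const buf = Buffer.from(key);
  let end = buf.length;
  while (end > 0 && buf[end - 1] === 0xff) end--;
  if (end === 0) throw new Error('cannot increment a key made only of 0xff bytes');
  const out = Buffer.from(buf.subarray(0, end)); // copy, do not mutate the input
  out[end - 1] += 1;
  return out;
}

// Hypothetical marker meaning "end of keyspace" (workaround 2).
const END = Symbol('end of keyspace');

// Hypothetical encoder: regular keys get a \x00 prefix, END becomes \xff,
// so END sorts after every regular key inside the subspace.
const endAwareEncoder = {
  pack(key) {
    if (key === END) return Buffer.from([0xff]);
    return Buffer.concat([Buffer.from([0x00]), Buffer.from(key)]);
  },
  unpack(buf) {
    if (buf[0] !== 0x00) throw new Error('not a regular key');
    return buf.subarray(1);
  },
};

// Workaround 1: with raw keys, the end of the 'foo' subspace is strInc('foo').
const rawEnd = strInc('foo'); // bytes of 'fop'

// Workaround 2: a range from 'bar' to END packs to [\x00bar, \xff).
const begin = endAwareEncoder.pack('bar');
const end = endAwareEncoder.pack(END);
```

The cost of the second approach is one extra byte per key, plus the need to migrate any keys already written without the \x00 prefix.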

Other bindings

Other bindings avoid the problem with a combination of the following:

  1. Using tuples for everything: subspace prefixes are always tuples and keys are always tuples (Python, Go). That removes the question of whether keys >=\xff are part of the subspace, because tuples don't start with \xff.

  2. Expecting users to pack their keys explicitly, enabling them to do strInc or whatever as needed. (Python, Go)

  3. Optional start and end parameters, defaulting to '' and '\xff' respectively. (Python) Though it may or may not be relevant in the context of subspaces.

NodeJS bindings are unique in the sense that they are not specialized for tuple encoding and allow "hiding" an arbitrary key encoder in the subspace, which in turn is "hidden" in the transaction object. That way, the developer can't directly use raw keys.

Keys >='\xff'

There's a forum post explaining that the keyspace foo should not contain entries in the range ['foo\xff', 'fop'). That's not a big problem with the official bindings because keys are expected to be tuples. The official bindings also have inconsistencies (eg. Subspace.contains returning True for a key that wouldn't actually be part of the range).

I don't think it's a reasonable limitation. The root \xff keyspace is an abstraction leak which prevents users from using the keyspace for arbitrary keys. The leak makes some sense for the root keyspace but artificially recreating it for subspaces does not. In my opinion, the benefits are negligible but the implications are significant.

If keys >='\xff' really are not meant to be part of a subspace, then getRange('') and clearRange('') should apply only to the entries up to '\xff' instead of up to the next key after the keyspace, as they currently do. Of course it wouldn't make a difference if there are no keys >='\xff', but it's still an inconsistency.

I think that keys >='\xff' should be considered first-class citizens in the subspace and should be fully enumerable just like all other keys.
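To make the gap concrete, here is a small byte-comparison sketch. It does not touch the database; `inRange` is a hypothetical helper, and the key 'foo\xff\x01' is just an example of an arbitrary byte-string key inside the foo subspace.

```javascript
// Half-open range check [begin, end) using bytewise lexicographic order,
// which is how FoundationDB orders keys.
const inRange = (key, begin, end) =>
  Buffer.compare(key, begin) >= 0 && Buffer.compare(key, end) < 0;

const prefix = Buffer.from('foo');
const key = Buffer.concat([prefix, Buffer.from([0xff, 0x01])]); // 'foo\xff\x01'

const untilFF = Buffer.concat([prefix, Buffer.from([0xff])]); // 'foo\xff'
const untilPrefixEnd = Buffer.from('fop'); // next key after the subspace

// A range that stops at 'foo\xff' skips the key...
const coveredUntilFF = inRange(key, prefix, untilFF); // false
// ...while a range that stops at 'fop' includes it.
const coveredUntilPrefixEnd = inRange(key, prefix, untilPrefixEnd); // true
```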

Solutions

Ideally, the end parameter for the getRange and clearRange methods would default to the next key after the subspace instead of the end of the keys prefixed by start. Users could use the existing *StartsWith methods to get the current behavior. That's similar to the Python bindings. (Also similarly, start could be optional, defaulting to '' independent of the encoder.)

Changing this is dangerous though. db.at('foo').clearRange('bar') currently clears only ['foobar', 'foobas'), but with the new behavior it would clear ['foobar', 'fop'). A careless developer might not notice the change before deploying to production.

An alternative would be to introduce a special value (in addition to null and undefined, which are currently used to indicate "end of prefix") that would be used in place of end. An ES6 symbol would be fine.
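A rough sketch of how the special value could resolve to raw key bounds. Everything here is hypothetical: `SUBSPACE_END` and `resolveRange` are illustrative names rather than anything in the bindings, keys are treated as raw bytes with no encoder, and `strInc` is a local reimplementation.

```javascript
// Local stand-in for strInc: first key after all keys prefixed by `key`.
function strInc(key) {
  const buf = Buffer.from(key);
  let end = buf.length;
  while (end > 0 && buf[end - 1] === 0xff) end--;
  if (end === 0) throw new Error('cannot increment a key made only of 0xff bytes');
  const out = Buffer.from(buf.subarray(0, end));
  out[end - 1] += 1;
  return out;
}

// Hypothetical marker passed in place of `end`.
const SUBSPACE_END = Symbol('subspace end');

// Resolve (start, end) inside a subspace prefix to raw [begin, end) keys.
function resolveRange(prefix, start, end) {
  const p = Buffer.from(prefix);
  const begin = Buffer.concat([p, Buffer.from(start)]);
  if (end === SUBSPACE_END) {
    // Proposed: run to the next key after the whole subspace.
    return { begin, end: strInc(p) };
  }
  if (end == null) {
    // Current behavior as described above: "end of prefix" of the start key.
    return { begin, end: strInc(begin) };
  }
  return { begin, end: Buffer.concat([p, Buffer.from(end)]) };
}

const current = resolveRange('foo', 'bar');                // ['foobar', 'foobas')
const proposed = resolveRange('foo', 'bar', SUBSPACE_END); // ['foobar', 'fop')
```

Because null and undefined keep their existing meaning, existing calls such as clearRange('bar') would be unaffected; only callers that opt in with the symbol get the wider range.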
