-
Notifications
You must be signed in to change notification settings - Fork 220
Description
ZeroTrie has been designed for "forward" lookup (string to integer). We could also support "backward" lookup (offset to string) by adding nodes to the ZeroTrie that allow jumping backward through the trie.
For example, a basic ZeroTrie containing "axb" and "ayc" has these bytes:
b'a'
0b11000010 <-- a branch node containing 2 edges
b'x'
b'y'
2 <-- part of the branch node: jump over 2 bytes if the value is 'y'
b'b'
0b10000000 <-- value 0
b'c'
0b10000001 <-- value 1
Reading the trie forward is as follows; suppose you are looking up "ayc":
- First node is an 'a'. Good, continue.
- Second node is a branch node with 2 values. Binary-search to find 'y', then skip 2 bytes past the end of the branch node.
- Third node is 'c'. Good, continue.
- Fourth node is the value node. This means "ayc" is in the trie.
The jump-backward nodes would be added as follows:
b'a'
0b11000010 <-- a branch node (0b110) containing 2 edges
b'x'
b'y'
3 <-- part of the branch node: jump over 3 bytes if the value is 'y'
0b11100010 <-- a jump-backward node (0b1110) that says to jump 4 bytes back
b'b'
0b10000000 <-- value node (0b10) with value 0
0b11100101 <-- a jump-backward node (0b1110) that says to jump 7 bytes back
b'c'
0b10000001 <-- value node (0b10) with value 1
Reading the trie backward would be as follows; suppose you are trying to extract the string "axb":
- Start at index 6 (this index is the integer that needs to be stored externally in order to perform reverse lookup)
- The node at index 6 is 'b'. Initialize the string with that character.
- Walk back. There is a jump-backward node, which tells us to go to the branch node.
- Evaluate the branch node to find the edge that led to our earlier position. It is a 'x'; prepend it to the string.
- Walk back. The node is 'a'; prepend it to the string.
- We are at the beginning of the string, and we successfully extracted "axb".
Alternatively, we could make a separate data structure that only supports reverse lookup:
b'a'
b'x'
b'b'
0b11100001 <-- skip 2
b'y'
b'c'
With this simpler structure, we only store jump-back nodes and remove everything else. Again, we start at the end of the string, and then walk backward, obeying skip nodes until we reach the start of the string. (We could instead store it in reverse order and walk forward.)
Related: #1397