-
Notifications
You must be signed in to change notification settings - Fork 14
Hash Field Expiration RFC #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't read all of it, but don't spend too much time on the document. I'd like to see a working implementation. :)
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see us making progress on our top feature request!
NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `EXPIRE`) | ||
* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued. | ||
* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) | ||
NOTE 1 - for the initial implementation the plan is to emit `hexpired` event for each field expiry, however it might be a valid future performance optimization to batch multiple expirations on the same key into a single event reporting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like something we shouldn't optimize for later if we think it's important. That would require client changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure which is better. the event has no indication about the field being expired, BUT if we batch to the same event the application will not get events in the same number as the expired items. not sure which is better performance vs functionality. It is true that batching it later will probably be a breaking change (but in case it will break it also means that some applications are logically dependent on this separation :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?
If there are no examples of events spanning multiple keys/fields, I think it's better to maintain consistency & simplicity. It's very probable that nobody will subscribe to these events, which make performance a non-issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?
Most commands generate a single event per command and not per field. For example hdel is reported once per command. In this proposal the hexpire event will ALSO be reported once per command and not per item. the porposal DOES state that the new hexpired
event will be reported once per item. However I can think of improving that if we decide to. For example we can identify that we expired items in the start/end of the access context and decide to issue the event in the closing of the access context.
* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic. | ||
the following cases will avoid lazy expiring items: | ||
- during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume also client pause and coordinated failovers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. true. I mentioned at another place that the expiration context will have the same apply logic as generic key expiration (ie when expirations are paused or on replica etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Said another way... You're saying that READ commands (like HGETALL) shouldn't perform modifications (WRITES) to the data structure. Right?
Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.
No. saying that would basically mean we do not support lazy expirations. I am saying that there is a common expiration logic which is applied for generic keys. currently this logic can be observed by reading the content of expireIfNeededWithDictIndex
which performs various checks on importing mode, replication client, eviction pause etc... we will also comply with the same logic when deciding to expire hash items.
* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic. | ||
the following cases will avoid lazy expiring items: | ||
- during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume also client pause and coordinated failovers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't follow why HGETALL wouldn't emit deletion events? KEYS *
does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK KEYS command does NOT expire keys. it will just avoid adding them in the response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we don't want READ commands to be generating replication traffic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we don't want READ commands to be generating replication traffic.
Well, we probably don't want it because of the implementation complexity, however lazy expiring keys is basically what this feature provides. we can decide to avoid READ commands lazy expirations if we want to.
|
||
### Volatile hash entry memory layout | ||
|
||
Currently a field is always an SDS in Valkey. Although it is possible to match a key with external metadata (eg TTL) by mapping the key to the relevant metadata, it will incur extra memory utilization to hold the mapping and will require to use extra CPU cycles in order to locate the TTL per each query. Some dictionaries use objects with embedded keys were the metadata can be set as part of the object. However that would require every dictionary which needs TTL support to use objects with embedded keys and might significantly complicate existing code paths as well as require extra memory in order hold the object metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the folks from VSS want to replace hash values with VSS indexed positions as well, I think that's covered with your referenced value, but wanted to comment.
Cons: | ||
- Error prone - there are many cases where an item is accessed | ||
- Might require extensive code changes. | ||
- In some cases can lead to performance degradation on the good-path - It is possible that in order to avoid code complexity we would consider to apply the `itemExpireIfNeeded` logic by first searching the item (provided in the command arguments) and then proceed with the normal implementation. Since double searching the element would probably maintain cache locality for the second search, in reality we observed 2-3% degradation by applying double search on items in the command processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might consider having a dedicated "HashWithExpire" type, that also includes the number of expirable items. That ways we can efficiently skip any logic if we are operating on the special type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, however I wanted to check if we can have a change with minimal footprint, without sacrifice other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding dynamically change the hashtable type... it is doable, small change probably, lets evaluate during the PR review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that also includes the number of expirable items. That ways we can efficiently skip any logic if we are operating on the special type.
That one is planned and will be part of the initial draft.
we will have both hashTypeHasVolatileItems()
and hashTypeNumVolatileItems()
- Minimal code changes. | ||
- Less error prone - since this will be applied in every hashtable access we will reduce the risk of missing item being accessed in some flows. | ||
Cons: | ||
- Additional check for `accessElement` existence in the generic hashtable implementation (we have not yet evaluated if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition is that this will be more degrading than we expect, but it's hard to intuit this without actually seeing the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh. the draft will be issues this week (if all goes well)
* Support Redis compatible API to set, get and manipulate Hash field TTL. | ||
* Support both Lazy and active expiry mechanisms for Hash field expirations. | ||
* Support Replication of elements TTL as well as expired element replication. | ||
* Extended support for the same functionality with Sets and Sorted sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just go through several parts quickly, and the Sets and Sorted Set are not mentioned in this RFC, can we just remove these words? Secondly, is there any reason a List can not have the TTL for its element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SETs and SortedSets are something we should plan for IMO. I am just not sure if we will be able to make it for 9.0 timeline with both of them so we prioritised Hashes first. SETs are the next target.
is there any reason a List can not have the TTL for its element
No special reason aside for the fact that there was no clear request for it in the community. you could use a set or sorted set for that matter right?
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
### Tenets | ||
|
||
* **memory efficiency** - At the highest priority we target a minimal metadata overhead used in order to store and manage items TTL. While the optimal overhead to maintain item TTL is 8 bytes (could be less if we allow keeping offsets from the existing epoch diff time), we understand that maintaining active expiry logic will require use of more bytes for metadata. We will make our top priority effort to minimize this overhead. | ||
* **Latency** - After memory efficiency considerations we will require a solution which provides low latency for hash operations. Hash objects are expected to operate in O(1) for all single access operations (get, set, delete etc...) and we will not break this promise even for items with expiry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"provides low latency" is vague. Do you mean 'doesn’t regress existing latency by a noticeable margin'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO I was clear about asymptotic guarantees.
HFE.md
Outdated
1. VALID - meaning the item is to be treated as any existing item and be included in replies as well as operated on during actions and mutations. | ||
2. INVALID - meaning the item should be treated as “not exists” and is NOT to be included in any operations and/or replies. | ||
3. INVALIDATED(deleted) - meaning the item is to-be-removed immediately (i.e expired) and thus does not exist anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The terms VALID, INVALID, and INVALIDATED are too similar and not intuitive. INVALID suggests corruption, but an expired field is still well-formed and valid. Consider clearer terms like LIVE/NORMAL, EXPIRED, and DELETED/PURGED to reflect TTL state accurately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. In general, I think we should use the word "expired" to mean that the time has passed for a key, whether it still exists in memory or not. An expired key can exist in the database if we haven't deleted it yet.
We should not use this word as a transitive verb as in "to expire a key". Instead we should say "to delete an expired key".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about this terminology in the scope of lazy expiration (i.e. in the RFC). I used these terms since these are the states recognized by the hashtable which I would like to keep out of scope of "expiration" and "volatile".
I will change it in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xbasel / @zuiderkwast fixed. looks more to your point?
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
NOTE - we can also consider adding keys_with_volatile_items statistic to track how many objects have | ||
volatile items. eg: | ||
|
||
``` | ||
db0:keys=1,expires=0,avg_ttl=0,volatile_items=16,keys_with_volatile_items=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have anything for items currently so it seems fine to me to skip the info about volatile items initially.
If we want it, then I guess we should also add the total number of items (including non-volatile) and the number of keys with items (including non-volatile items).
Regarding the nameing, shouldn't it match the naming used for keys? Here "expires" means the number of volatile keys. To match that, we could use "items", "item_expires" and "item_avg_ttl".
Later, if we add expiration on set elements, sorted set elements, etc. then all these are considered items too, right? Isn't it more useful to have the number of hash fields with expire, set elements with expire and sorted set elements with expire as separate metrics? The full picture starts to look like Wen's KEYSIZES fields: valkey-io/valkey#1967
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full picture starts to look like Wen's KEYSIZES fields: valkey-io/valkey#1967
Exactly. I think we will be able to add items statistics and such later on as part of the new KEYS observability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I suggest a new line. Also, in the new line, you can't use a prefix like db0:
as this might be searched for explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked what redis does. From their docs of the INFO command, we can see they added "subexpiry":
# Keyspace
db0:keys=112125,expires=456,avg_ttl=31368299122246,subexpiry=0
1. volatile_items will be added to the Keyspace section per-db line. eg: | ||
|
||
``` | ||
db0:keys=1,expires=0,avg_ttl=0,volatile_items=16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
volatile_items
is technically clear, but slightly ambiguous, it might be confused with expires
. Maybe:
expire_fields
expiring_fields
volatile_fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend against modification of this existing INFO line.
This line - containing the number of keys - is one of the most likely lines in INFO to be parsed by a client application (or even a client library). Altering this line has the opportunity to break a large number of existing client applications.
I suggest you add an additional line (or lines) containing the new information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can hope that clients that parse this split by comma and can handle extra fields. It may break a few but the alternative (a new line per db) may be uglier.
Redis added "subexpiry=0". Can we do the same? We copy their command API...
* **Latency** - After memory efficiency considerations we will require a solution which provides low latency for hash operations. Hash objects are expected to operate in O(1) for all single access operations (get, set, delete etc...) and we will not break this promise even for items with expiry. | ||
* **CPU efficiency** - After latency we priorities system CPU efficiency. For example we would like to avoid high CPU utilization caused by need to perform null active expiry checks during cron runs. | ||
* **Compatability** - We will avoid breaking clients which are already using HFE API provided by other providers. | ||
* **Coherency** - We would like the reported system state to match the logical state as much as possible. For example the reported number of keys per DB is expected to match the number of keys which are NOT logically expired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure what you're saying here. Are you comparing items in a hash to items in the DB?
From the DB perspective, when we perform INFO, the number of keys reported in the last line will include keys which have passed the expiration time but have not yet been physically deleted, right?
Are you suggesting that HASHes should behave "properly"? and only report the number of unexpired items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that HASHes should behave "properly"? and only report the number of unexpired items?
I was mainly setting a tenet here. We will not fully support this at this point (for example HLEN will, return the number of items even though some of the items have already been expired). In some implementations, though, we can basically know EXACTLY how many items are expired. For example in case we track all volatile hash items in rax (like the client's timeout rax) we can provide an O(1) report for the number of items which are already expired. However this would cost much memory to maintain, thus this is currently avoided and the fact that this tenet is lower priority than the memory efficiency is providing the justification for such a decision.
NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `EXPIRE`) | ||
* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued. | ||
* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) | ||
NOTE 1 - for the initial implementation the plan is to emit `hexpired` event for each field expiry, however it might be a valid future performance optimization to batch multiple expirations on the same key into a single event reporting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?
If there are no examples of events spanning multiple keys/fields, I think it's better to maintain consistency & simplicity. It's very probable that nobody will subscribe to these events, which make performance a non-issue.
1. volatile_items will be added to the Keyspace section per-db line. eg: | ||
|
||
``` | ||
db0:keys=1,expires=0,avg_ttl=0,volatile_items=16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend against modification of this existing INFO line.
This line - containing the number of keys - is one of the most likely lines in INFO to be parsed by a client application (or even a client library). Altering this line has the opportunity to break a large number of existing client applications.
I suggest you add an additional line (or lines) containing the new information.
NOTE - we can also consider adding keys_with_volatile_items statistic to track how many objects have | ||
volatile items. eg: | ||
|
||
``` | ||
db0:keys=1,expires=0,avg_ttl=0,volatile_items=16,keys_with_volatile_items=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I suggest a new line. Also, in the new line, you can't use a prefix like db0:
as this might be searched for explicitly.
* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic. | ||
the following cases will avoid lazy expiring items: | ||
- during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Said another way... You're saying that READ commands (like HGETALL) shouldn't perform modifications (WRITES) to the data structure. Right?
Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.
* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic. | ||
the following cases will avoid lazy expiring items: | ||
- during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we don't want READ commands to be generating replication traffic.
The same `hexpire` event will be issued for all different commands which manipulate item TTL (e.g. `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` etc...) | ||
NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `HEXPIRE`) | ||
* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued. | ||
* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding "being expired" terminology, I think the word expired should always refer to logically expired.
* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) | |
* A new `hexpired` event will be issued whenever an expired item is detected and deleted (either actively or lazily) |
No description provided.