Hash Field Expiration RFC #22

ranshid · 2025-05-07T13:10:56Z

No description provided.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

HFE.md

zuiderkwast

I didn't read all of it, but don't spend too much time on the document. I'd like to see a working implementation. :)

HFE.md

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

madolson

Glad to see us making progress on our top feature request!

HFE.md

madolson · 2025-05-07T16:37:52Z

HFE.md

+    NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `EXPIRE`)
+* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued.
+* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) 
+    NOTE 1 - for the initial implementation the plan is to emit `hexpired` event for each field expiry, however it might be a valid future performance optimization to batch multiple expirations on the same key into a single event reporting.  


Sounds like something we shouldn't optimize for later if we think it's important. That would require client changes.

I am not sure which is better. the event has no indication about the field being expired, BUT if we batch to the same event the application will not get events in the same number as the expired items. not sure which is better performance vs functionality. It is true that batching it later will probably be a breaking change (but in case it will break it also means that some applications are logically dependent on this separation :) )

Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?

If there are no examples of events spanning multiple keys/fields, I think it's better to maintain consistency & simplicity. It's very probable that nobody will subscribe to these events, which make performance a non-issue.

Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?

Most commands generate a single event per command and not per field. For example hdel is reported once per command. In this proposal the hexpire event will ALSO be reported once per command and not per item. the porposal DOES state that the new hexpired event will be reported once per item. However I can think of improving that if we decide to. For example we can identify that we expired items in the start/end of the access context and decide to issue the event in the closing of the access context.

madolson · 2025-05-08T15:46:42Z

HFE.md

+* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic.
+    the following cases will avoid lazy expiring items:
+     - during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading.


I assume also client pause and coordinated failovers.

Yes. true. I mentioned at another place that the expiration context will have the same apply logic as generic key expiration (ie when expirations are paused or on replica etc...)

Said another way... You're saying that READ commands (like HGETALL) shouldn't perform modifications (WRITES) to the data structure. Right?

Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.

Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.

No. saying that would basically mean we do not support lazy expirations. I am saying that there is a common expiration logic which is applied for generic keys. currently this logic can be observed by reading the content of expireIfNeededWithDictIndex which performs various checks on importing mode, replication client, eviction pause etc... we will also comply with the same logic when deciding to expire hash items.

HFE.md

madolson · 2025-05-08T15:47:37Z

HFE.md

+* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic.
+    the following cases will avoid lazy expiring items:
+     - during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading.


I assume also client pause and coordinated failovers.

I also don't follow why HGETALL wouldn't emit deletion events? KEYS * does?

AFAIK KEYS command does NOT expire keys. it will just avoid adding them in the response

Yes, we don't want READ commands to be generating replication traffic.

Yes, we don't want READ commands to be generating replication traffic.

Well, we probably don't want it because of the implementation complexity, however lazy expiring keys is basically what this feature provides. we can decide to avoid READ commands lazy expirations if we want to.

madolson · 2025-05-08T15:53:28Z

HFE.md

+
+### Volatile hash entry memory layout
+
+Currently a field is always an SDS in Valkey. Although it is possible to match a key with external metadata (eg TTL) by mapping the key to the relevant metadata, it will incur extra memory utilization to hold the mapping and will require to use extra CPU cycles in order to locate the TTL per each query. Some dictionaries use objects with embedded keys were the metadata can be set as part of the object. However that would require every dictionary which needs TTL support to use objects with embedded  keys and might significantly complicate existing code paths as well as require extra memory in order hold the object metadata.


I think the folks from VSS want to replace hash values with VSS indexed positions as well, I think that's covered with your referenced value, but wanted to comment.

HFE.md

madolson · 2025-05-08T15:59:47Z

HFE.md

+    Cons:
+     - Error prone - there are many cases where an item is accessed  
+     - Might require extensive code changes.
+     - In some cases can lead to performance degradation on the good-path - It is possible that in order to avoid code complexity we would consider to apply the `itemExpireIfNeeded` logic by first searching the item (provided in the command arguments) and then proceed with the normal implementation. Since double searching the element would probably maintain cache locality for the second search, in reality we observed 2-3% degradation by applying double search on items in the command processing.


We might consider having a dedicated "HashWithExpire" type, that also includes the number of expirable items. That ways we can efficiently skip any logic if we are operating on the special type.

It is possible, however I wanted to check if we can have a change with minimal footprint, without sacrifice other things.

regarding dynamically change the hashtable type... it is doable, small change probably, lets evaluate during the PR review?

that also includes the number of expirable items. That ways we can efficiently skip any logic if we are operating on the special type.

That one is planned and will be part of the initial draft.
we will have both hashTypeHasVolatileItems() and hashTypeNumVolatileItems()

madolson · 2025-05-08T16:02:35Z

HFE.md

+     - Minimal code changes.
+     - Less error prone - since this will be applied in every hashtable access we will reduce the risk of missing item being                 accessed in some flows. 
+Cons:
+      - Additional check for `accessElement` existence in the generic hashtable implementation (we have not yet evaluated if


My intuition is that this will be more degrading than we expect, but it's hard to intuit this without actually seeing the code.

Yeh. the draft will be issues this week (if all goes well)

hwware · 2025-05-08T18:28:09Z

HFE.md

+* Support Redis compatible API to set, get and manipulate Hash field TTL.
+* Support both Lazy and active expiry mechanisms for Hash field expirations.
+* Support Replication of elements TTL as well as expired element replication.
+* Extended support for the same functionality with Sets and Sorted sets.


I just go through several parts quickly, and the Sets and Sorted Set are not mentioned in this RFC, can we just remove these words? Secondly, is there any reason a List can not have the TTL for its element?

SETs and SortedSets are something we should plan for IMO. I am just not sure if we will be able to make it for 9.0 timeline with both of them so we prioritised Hashes first. SETs are the next target.

is there any reason a List can not have the TTL for its element

No special reason aside for the fact that there was no clear request for it in the community. you could use a set or sorted set for that matter right?

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

xbasel · 2025-05-11T09:40:09Z

HFE.md

+### Tenets
+
+* **memory efficiency** - At the highest priority we target a minimal metadata overhead used in order to store and manage items TTL. While the optimal overhead to maintain item TTL is 8 bytes (could be less if we allow keeping offsets from the existing epoch diff time), we understand that maintaining active expiry logic will require use of more bytes for metadata. We will make our top priority effort to minimize this overhead.
+* **Latency** - After memory efficiency considerations we will require a solution which provides low latency for hash operations. Hash objects are expected to operate in O(1) for all single access operations (get, set, delete etc...) and we will not break this promise even for items with expiry.


"provides low latency" is vague. Do you mean 'doesn’t regress existing latency by a noticeable margin'?

IMO I was clear about asymptotic guarantees.

xbasel · 2025-05-11T10:46:23Z

HFE.md

+1. VALID - meaning the item is to be treated as any existing item and be included in replies as well as operated on during actions and mutations.     
+2. INVALID - meaning the item should be treated as “not exists” and is NOT to be included in any operations and/or replies.
+3. INVALIDATED(deleted) - meaning the item is to-be-removed immediately (i.e expired) and thus does not exist anymore. 


The terms VALID, INVALID, and INVALIDATED are too similar and not intuitive. INVALID suggests corruption, but an expired field is still well-formed and valid. Consider clearer terms like LIVE/NORMAL, EXPIRED, and DELETED/PURGED to reflect TTL state accurately.

I agree. In general, I think we should use the word "expired" to mean that the time has passed for a key, whether it still exists in memory or not. An expired key can exist in the database if we haven't deleted it yet.

We should not use this word as a transitive verb as in "to expire a key". Instead we should say "to delete an expired key".

I agree about this terminology in the scope of lazy expiration (i.e. in the RFC). I used these terms since these are the states recognized by the hashtable which I would like to keep out of scope of "expiration" and "volatile".
I will change it in the RFC.

@xbasel / @zuiderkwast fixed. looks more to your point?

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

zuiderkwast · 2025-05-11T20:45:29Z

HFE.md

+NOTE - we can also consider adding keys_with_volatile_items statistic to track how many objects have 
+volatile items. eg:
+
+```
+db0:keys=1,expires=0,avg_ttl=0,volatile_items=16,keys_with_volatile_items=1


We don't have anything for items currently so it seems fine to me to skip the info about volatile items initially.

If we want it, then I guess we should also add the total number of items (including non-volatile) and the number of keys with items (including non-volatile items).

Regarding the nameing, shouldn't it match the naming used for keys? Here "expires" means the number of volatile keys. To match that, we could use "items", "item_expires" and "item_avg_ttl".

Later, if we add expiration on set elements, sorted set elements, etc. then all these are considered items too, right? Isn't it more useful to have the number of hash fields with expire, set elements with expire and sorted set elements with expire as separate metrics? The full picture starts to look like Wen's KEYSIZES fields: valkey-io/valkey#1967

The full picture starts to look like Wen's KEYSIZES fields: valkey-io/valkey#1967

Exactly. I think we will be able to add items statistics and such later on as part of the new KEYS observability

Again, I suggest a new line. Also, in the new line, you can't use a prefix like db0: as this might be searched for explicitly.

I just checked what redis does. From their docs of the INFO command, we can see they added "subexpiry":

# Keyspace db0:keys=112125,expires=456,avg_ttl=31368299122246,subexpiry=0

xbasel · 2025-05-12T12:43:59Z

HFE.md

+1. volatile_items will be added to the Keyspace section per-db line. eg:
+
+```
+db0:keys=1,expires=0,avg_ttl=0,volatile_items=16


volatile_items is technically clear, but slightly ambiguous, it might be confused with expires. Maybe:
expire_fields
expiring_fields
volatile_fields

I recommend against modification of this existing INFO line.
This line - containing the number of keys - is one of the most likely lines in INFO to be parsed by a client application (or even a client library). Altering this line has the opportunity to break a large number of existing client applications.

I suggest you add an additional line (or lines) containing the new information.

We can hope that clients that parse this split by comma and can handle extra fields. It may break a few but the alternative (a new line per db) may be uglier.

Redis added "subexpiry=0". Can we do the same? We copy their command API...

JimB123 · 2025-05-16T22:04:49Z

HFE.md

+* **Latency** - After memory efficiency considerations we will require a solution which provides low latency for hash operations. Hash objects are expected to operate in O(1) for all single access operations (get, set, delete etc...) and we will not break this promise even for items with expiry.
+* **CPU efficiency** - After latency we priorities system CPU efficiency. For example we would like to avoid high CPU utilization caused by need to perform null active expiry checks during cron runs.
+* **Compatability** - We will avoid breaking clients which are already using HFE API provided by other providers.
+* **Coherency** -  We would like the reported system state to match the logical state as much as possible. For example the reported number of keys per DB is expected to match the number of keys which are NOT logically expired.  


I'm not completely sure what you're saying here. Are you comparing items in a hash to items in the DB?

From the DB perspective, when we perform INFO, the number of keys reported in the last line will include keys which have passed the expiration time but have not yet been physically deleted, right?

Are you suggesting that HASHes should behave "properly"? and only report the number of unexpired items?

Are you suggesting that HASHes should behave "properly"? and only report the number of unexpired items?

I was mainly setting a tenet here. We will not fully support this at this point (for example HLEN will, return the number of items even though some of the items have already been expired). In some implementations, though, we can basically know EXACTLY how many items are expired. For example in case we track all volatile hash items in rax (like the client's timeout rax) we can provide an O(1) report for the number of items which are already expired. However this would cost much memory to maintain, thus this is currently avoided and the fact that this tenet is lower priority than the memory efficiency is providing the justification for such a decision.

JimB123 · 2025-05-16T22:12:56Z

HFE.md

+    NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `EXPIRE`)
+* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued.
+* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) 
+    NOTE 1 - for the initial implementation the plan is to emit `hexpired` event for each field expiry, however it might be a valid future performance optimization to batch multiple expirations on the same key into a single event reporting.  


Are there any other examples of keyspace events which mention multiple keys(fields)? For instance, does HMSET, generate a single event? or multiple?

If there are no examples of events spanning multiple keys/fields, I think it's better to maintain consistency & simplicity. It's very probable that nobody will subscribe to these events, which make performance a non-issue.

JimB123 · 2025-05-16T22:19:33Z

HFE.md

+1. volatile_items will be added to the Keyspace section per-db line. eg:
+
+```
+db0:keys=1,expires=0,avg_ttl=0,volatile_items=16


I recommend against modification of this existing INFO line.
This line - containing the number of keys - is one of the most likely lines in INFO to be parsed by a client application (or even a client library). Altering this line has the opportunity to break a large number of existing client applications.

I suggest you add an additional line (or lines) containing the new information.

JimB123 · 2025-05-16T22:20:59Z

HFE.md

+NOTE - we can also consider adding keys_with_volatile_items statistic to track how many objects have 
+volatile items. eg:
+
+```
+db0:keys=1,expires=0,avg_ttl=0,volatile_items=16,keys_with_volatile_items=1


Again, I suggest a new line. Also, in the new line, you can't use a prefix like db0: as this might be searched for explicitly.

JimB123 · 2025-05-16T22:23:41Z

HFE.md

+* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic.
+    the following cases will avoid lazy expiring items:
+     - during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading.


Said another way... You're saying that READ commands (like HGETALL) shouldn't perform modifications (WRITES) to the data structure. Right?

Modification of the structure (even deleting logically expired data) should be avoided during a "READ" operation.

JimB123 · 2025-05-16T22:24:21Z

HFE.md

+* As item expiration will produce replication content, in some cases we will avoid applying full expiration logic.
+    the following cases will avoid lazy expiring items:
+     - during HSCAN, HGETALL, Copy (when duplicating an element) and RDB/EOF loading.


Yes, we don't want READ commands to be generating replication traffic.

zuiderkwast · 2025-05-17T09:42:56Z

HFE.md

+    The same `hexpire` event will be issued for all different commands which manipulate item TTL (e.g.  `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` etc...)
+    NOTE that for some cases (e.g HSETEX, there will be 2 events issued: `HSET` and `HEXPIRE`)
+* A new `hpersist` event will be issued whenever an item is persisted. this can be when `HPERSIST` was issued.
+* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily) 


Regarding "being expired" terminology, I think the word expired should always refer to logically expired.

Suggested change

* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily)

* A new `hexpired` event will be issued whenever an expired item is detected and deleted (either actively or lazily)

xbasel · 2025-06-24T20:31:57Z

HFE.md

+### Active expiry cycle and credits 
+
+We plan to introduce a new type of active expiry cycle (in addition to `ACTIVE_EXPIRE_CYCLE_FAST` and `ACTIVE_EXPIRE_CYCLE_SLOW` ): `ACTIVE_EXPIRE_CYCLE_ITEMS`.
+The new expiry cycle will use the same overall logic as the regular active expiry cycle with the following adjustments:  


Key active expiry is built around iterating over databases, and for each DB it scans db->expires using a hit-or-miss approach over keys with expiry. Hash field expiry doesn’t follow the same model, it’s not based on random sampling, and its pacing is entirely different.

I think trying to couple field expiry with key expiry would require a significant refactor, and I don’t see a clear benefit. Expiring keys and fields are logically unrelated and should remain decoupled. I don’t see how field expiry fits naturally into the existing loops in activeExpireCycle(). Entry expiry should have its own database iterator IMO.

Using the same logic for both means combining two almost entirely independent mechanisms into one, which adds complexity without meaningful gain. Is it really worth it?

Separate sounds good to me, unless Ran has some good motivation for coupling them.

Hash Field Expiration RFC

63b66b5

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

ranshid marked this pull request as draft May 7, 2025 13:14

madolson reviewed May 7, 2025

View reviewed changes

HFE.md Show resolved Hide resolved

zuiderkwast reviewed May 7, 2025

View reviewed changes

HFE.md Outdated Show resolved Hide resolved

HFE.md Outdated Show resolved Hide resolved

HFE.md Outdated Show resolved Hide resolved

refactor to better match the RFC template

a96c297

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

madolson reviewed May 8, 2025

View reviewed changes

hwware reviewed May 8, 2025

View reviewed changes

Update HFE.md

a58e32d

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

xbasel reviewed May 11, 2025

View reviewed changes

ranshid added 3 commits May 11, 2025 18:42

add AOF, RDB and configuration sections

e9b9775

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

fix PR review comment about lazy expiration terminology

9c7ba43

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

Add Observability section

3ba5a46

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

ranshid marked this pull request as ready for review May 11, 2025 16:51

Add RFC header

2f16944

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>

zuiderkwast reviewed May 11, 2025

View reviewed changes

xbasel reviewed May 12, 2025

View reviewed changes

This was referenced May 13, 2025

Support field level expire/TTL for hash, set and sorted set valkey-io/valkey#640

Open

Introduce HASH items expiration valkey-io/valkey#2089

Open

JimB123 reviewed May 16, 2025

View reviewed changes

zuiderkwast reviewed May 17, 2025

View reviewed changes

xbasel reviewed Jun 24, 2025

View reviewed changes


		### Volatile hash entry memory layout

		Currently a field is always an SDS in Valkey. Although it is possible to match a key with external metadata (eg TTL) by mapping the key to the relevant metadata, it will incur extra memory utilization to hold the mapping and will require to use extra CPU cycles in order to locate the TTL per each query. Some dictionaries use objects with embedded keys were the metadata can be set as part of the object. However that would require every dictionary which needs TTL support to use objects with embedded keys and might significantly complicate existing code paths as well as require extra memory in order hold the object metadata.

	* A new `hexpired` event will be issued whenever an item is actually being expired (either actively or lazily)
	* A new `hexpired` event will be issued whenever an expired item is detected and deleted (either actively or lazily)

Hash Field Expiration RFC #22

Are you sure you want to change the base?

Hash Field Expiration RFC #22

Uh oh!

Conversation

ranshid commented May 7, 2025

Uh oh!

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ranshid May 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ranshid May 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ranshid May 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ranshid May 11, 2025 •

edited

Loading

ranshid May 11, 2025 •

edited

Loading

ranshid May 11, 2025 •

edited

Loading

ranshid May 18, 2025 •

edited

Loading