
[NEW]Info Keysizes section #1969


Open
hwware opened this issue Apr 17, 2025 · 5 comments
Labels
major-decision-pending Major decision pending by TSC team

Comments

@hwware
Member

hwware commented Apr 17, 2025

The problem/use-case that the feature addresses

In Valkey, there are two different ways to measure how large a key is: bigkey and memkey (aka large key). (See the valkey-cli options at https://valkey.io/topics/cli/.)

bigkey: the number of elements stored under the key for the List, Set, Hash, and Sorted Set types, or the length of the value for the String type.
memkey (large key): the number of bytes that a key and its value require to be stored in RAM.

For example:

127.0.0.1:7000> lpush mylist "Hello" "World" "and" "Hello" "Valkey"
(integer) 5

bigkey for mylist (it returns 5):

127.0.0.1:7000> llen mylist
(integer) 5

memkey for mylist (it returns 57 in Valkey 8.1):

127.0.0.1:7000> MEMORY USAGE mylist
(integer) 57

Currently, Valkey does not directly calculate the total memory consumed by all keys in real time, because calculating the memory usage of every key directly is too expensive. Instead, it uses a reverse calculation: used_memory - overhead_memory. For the same reason, tracking memkeys (large keys) in real time is challenging, so bigkeys are sometimes used instead as an approximate way of finding large keys.

Description of the feature

For reference, Redis 8 provides this feature as follows:

127.0.0.1:6379> INFO keysizes

# Keysizes
db0_distrib_strings_sizes:1=19,2=655,4=3918,8=19,16=23,32=326,64=5,128=1,256=47,512=100899,1K=31,2K=29,4K=23,8K=16,16K=3,32K=2
db0_distrib_lists_items:1=5784492,2=151535,4=46670,8=20453,16=8637,32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42
db0_distrib_sets_items:1=7355615,2=207181,4=50612,8=21462,16=8864,32=2905,64=1365,128=974,256=773,512=675,1K=395,2K=292,4K=154,8K=89,16K=48,32K=21,64K=27,128K=16,256K=5,512K=4,1M=2
db0_distrib_hashes_items:2=1,4=544,8=746485,16=324174,32=141169,64=207329,128=4349,256=136226,1K=1

The POC PR is here: #1967
I have already implemented part of this feature for several commands, such as LPUSH and HSET.
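A minimal sketch of how a per-type histogram could be kept current as write commands such as LPUSH or HSET change a key's size. It assumes a key with n ≥ 1 elements is counted in the bucket labeled with the largest power of two ≤ n, which is consistent with the labels in the Redis output above (1, 2, 4, ..., 512, 1K, ..., 1M). The names `KeySizeHistogram`, `bucket_of`, and `bucket_label`, and the `"0"` bucket for empty values, are hypothetical and not taken from the POC PR.

```python
from collections import Counter
from typing import Optional


def bucket_of(size: int) -> int:
    """Power-of-two bucket index: floor(log2(size)) for size >= 1, -1 for size 0."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return size.bit_length() - 1


def bucket_label(exp: int) -> str:
    """Render a bucket's lower bound as 1, 2, ..., 512, 1K, ..., 512K, 1M, ..."""
    if exp < 0:
        return "0"  # hypothetical bucket for empty values
    for unit, shift in (("G", 30), ("M", 20), ("K", 10)):
        if exp >= shift:
            return f"{1 << (exp - shift)}{unit}"
    return str(1 << exp)


class KeySizeHistogram:
    """Hypothetical per-db, per-type histogram, updated on every size change."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, old_size: Optional[int], new_size: Optional[int]) -> None:
        # old_size is None when the key is created; new_size is None when it is deleted.
        if old_size is not None:
            old = bucket_of(old_size)
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        if new_size is not None:
            self.counts[bucket_of(new_size)] += 1

    def render(self, db: int, field: str) -> str:
        body = ",".join(f"{bucket_label(b)}={n}" for b, n in sorted(self.counts.items()))
        return f"db{db}_distrib_{field}:{body}"


h = KeySizeHistogram()
h.update(None, 5)  # LPUSH creates a list with 5 items
h.update(5, 8)     # a later LPUSH grows it to 8 items
h.update(None, 1)  # another list with a single item
print(h.render(0, "lists_items"))  # db0_distrib_lists_items:1=1,8=1
```

Each write touches at most two counters, so no keyspace scan is needed; the actual per-command overhead would still have to be benchmarked.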

Ref: see also the related BigKeyLog feature request, #1827.


hwware added the major-decision-pending (Major decision pending by TSC team) label Apr 17, 2025
@PingXie
Member

PingXie commented Apr 28, 2025

@hwware, with the upcoming atomic slot migration and slot-level metrics (CPU/memory/network), operators can do their work without key-level metrics anymore, IMO. This seems to be geared more towards developers as opposed to operators? Can you talk a bit about the developer use cases?

@wuranxx

wuranxx commented Apr 29, 2025

INFO keysizes shows the distribution of key sizes. I think it can be used to check whether the current Valkey instance has large keys.
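As a rough illustration of that use case, this sketch parses one `db<N>_distrib_*` line from the Redis output above and counts the keys whose bucket lower bound is at least a given threshold. It assumes each label is the lower bound of a power-of-two bucket, with K/M/G meaning 2^10/2^20/2^30; the function names are hypothetical. Because only the lower bound is recorded, the count is exact only when the threshold is itself a power of two.

```python
from __future__ import annotations

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_distrib(line: str) -> tuple[str, dict[int, int]]:
    """Split 'db0_distrib_lists_items:1=5,4K=1' into its field name and {lower_bound: count}."""
    name, _, body = line.strip().partition(":")
    buckets: dict[int, int] = {}
    for pair in filter(None, body.split(",")):
        label, _, count = pair.partition("=")
        if label[-1] in UNITS:
            lower = int(label[:-1]) * UNITS[label[-1]]
        else:
            lower = int(label)
        buckets[lower] = int(count)
    return name, buckets


def keys_at_least(buckets: dict[int, int], threshold: int) -> int:
    """Number of keys whose bucket lower bound is >= threshold."""
    return sum(n for lower, n in buckets.items() if lower >= threshold)


line = ("db0_distrib_lists_items:1=5784492,2=151535,4=46670,8=20453,16=8637,"
        "32=3558,64=1047,128=676,256=533,512=218,4K=1,8K=42")
name, buckets = parse_distrib(line)
print(name, keys_at_least(buckets, 1024))  # db0_distrib_lists_items 43
```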

@hwware
Member Author

hwware commented Apr 30, 2025


Do you mean issue #852? I think slot-level metrics cannot replace key-level metrics, for the following reasons:

  1. In standalone mode there is no concept of slots, so slot-level metrics won't work.
  2. Key-level metrics reflect the key-size distribution for every data type; I do not remember slot-level metrics covering this kind of requirement. (If I am wrong, please correct me.)

@hwware
Member Author

hwware commented May 1, 2025


#1803 is much closer to the INFO keysizes feature, but it provides another perspective for admins.

@madolson
Member

madolson commented May 5, 2025

Discussed during the core team meeting:

  1. Consensus was that an INFO field is the better approach compared to a dedicated command such as the one proposed in "Adding KEYINFO command to find out keys that have large number of elements" (#1151).
  2. The implementation needs some cleanup: we can use the location of the first (most significant) set bit instead of looping over the exponential buckets.
  3. We want to evaluate whether the implementation can be naturally extended to include memory in the future. If it can, then we can support both with a single configuration option without impacting performance.
  4. We need to evaluate the performance and decide whether this needs to be behind a configuration flag.
  5. @PingXie will talk to wen offline about use cases.
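Point 2 can be illustrated with a small Python sketch, assuming the buckets are the powers of two used in the Redis output above: the first function walks the buckets one by one, while the second reads the bucket index directly from the position of the most significant set bit via `int.bit_length()`. In C the same index is usually obtained with a count-leading-zeros builtin, e.g. `63 - __builtin_clzll(size)` on GCC/Clang for a nonzero 64-bit size (`__builtin_clzll(0)` is undefined, so zero needs a separate check). Both function names are hypothetical.

```python
def bucket_by_loop(size: int) -> int:
    """Walk buckets 1, 2, 4, ... until the next one would exceed size (size >= 1)."""
    exp = 0
    while (2 << exp) <= size:
        exp += 1
    return exp


def bucket_by_msb(size: int) -> int:
    """Index of the most significant set bit, i.e. floor(log2(size)) (size >= 1)."""
    return size.bit_length() - 1


# Both give the same bucket; the bit-position version needs no loop over the buckets.
assert all(bucket_by_loop(n) == bucket_by_msb(n) for n in range(1, 1 << 16))
```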
