Skip to content

Adding support for DumpSerializedValue API #2331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

cdorantes05
Copy link

@cdorantes05 cdorantes05 commented Jul 8, 2025

The purpose of this API is for Modules to be able to access a key and retrieve its associated value in order to serialize that value. This allows modules to extract and store data in a format that can later be reconstructed.

A use case is when building a CDC module, we need to generate the serialized version of any given key (of any data type). This allows for any specific key and its value(s) to be logged in the CDC message.

Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall.

This is a new module API, so I believe this requires voting @valkey-io/core-team. Please vote 👍 / 👎

@@ -7532,6 +7533,32 @@ ValkeyModuleString *VM_SaveDataTypeToString(ValkeyModuleCtx *ctx, void *data, co
}
}

/**
* Serialize the value of a given key from the database of the module context's client.
* If the serialization fails for any reason (e.g. key doesn't exist, or rdbSaveObject() fails), return NULL
Copy link
Collaborator

@hpatro hpatro Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the module benefit from knowing the actual reason of failure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this operation rdbSaveObject fail ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdbSaveObject can fail when writing out to network, but it never fails when pushing an object in-memory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought so. We can simplify the comment to if NULL, the key doesn't exists.

src/module.c Outdated
// Create a ValkeyModuleString (robj*) with the result SDS and return it.
sds result = rdb.io.buffer.ptr;
ValkeyModuleString *o = createObject(OBJ_STRING, result);
if (ctx != NULL) autoMemoryAdd(ctx, VALKEYMODULE_AM_STRING, o);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the entry of the method we check if ctx is NULL, we perform an early exit so we could avoid the NULL check here.

Suggested change
if (ctx != NULL) autoMemoryAdd(ctx, VALKEYMODULE_AM_STRING, o);
autoMemoryAdd(ctx, VALKEYMODULE_AM_STRING, o);

src/module.c Outdated
Comment on lines 7548 to 7550
if (value == NULL) {
return NULL;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistent with your code style

Suggested change
if (value == NULL) {
return NULL;
}
if (!value) {
return NULL;
}

@@ -362,6 +362,7 @@
#define RedisModule_SelectDb ValkeyModule_SelectDb
#define RedisModule_KeyExists ValkeyModule_KeyExists
#define RedisModule_OpenKey ValkeyModule_OpenKey
#define RedisModule_SaveObjectToString ValkeyModule_SaveObjectToString
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we keep adding this for new API(s) as well? @PingXie

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these are only for redis module API compat. We can add it if redis has it imo, but otherwise not.

@hpatro hpatro added major-decision-pending Major decision pending by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Jul 8, 2025
Copy link

codecov bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 7.69231% with 12 lines in your changes missing coverage. Please review.

Project coverage is 71.42%. Comparing base (4827720) to head (733f77a).
Report is 36 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 7.69% 12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2331      +/-   ##
============================================
- Coverage     71.43%   71.42%   -0.01%     
============================================
  Files           123      123              
  Lines         66908    67101     +193     
============================================
+ Hits          47795    47930     +135     
- Misses        19113    19171      +58     
Files with missing lines Coverage Δ
src/module.c 9.55% <7.69%> (-0.01%) ⬇️

... and 30 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Jul 8, 2025

I'm fine with this API but I would like to know more about the use case(s). What module do you want to use it for and why?

src/module.c Outdated
* If the serialization fails for any reason (e.g. key doesn't exist, or rdbSaveObject() fails), return NULL
* Otherwise return ValkeyModuleString which contains the serialized value for the key.
*/
ValkeyModuleString *VM_SaveObjectToString(ValkeyModuleCtx *ctx, ValkeyModuleString *key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a better name since it basically retruns serialized value?
e.g., VM_DumpSerializedValue

src/module.c Outdated
if (value == NULL) {
return NULL;
}
rio rdb;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
rio rdb;
rio payload;

Comment on lines 134 to 147

test {DataType: Save Object to String correctly serialzies strings} {
r set key value
set serialized [r datatype.save_object key]
r restore newkey 0 $serialized
r get newkey
} {value}

test {DataType: Save Object to String correctly serializes hashes} {
r hset myhash field1 myvalue
set serialized [r datatype.save_object myhash]
r restore newhash 0 $serialized
r hgetall newhash
} {field1 myvalue}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add a test for nonexistent key case

@hpatro
Copy link
Collaborator

hpatro commented Jul 8, 2025

I'm fine with this API but I would like to know more about the use case(s). What module do you want to use it for and why?

The team is exploring change data capture use case around Valkey. On a key space notification to the module, they would like to create a serialized dump of the value and pass it forward.

@zuiderkwast
Copy link
Contributor

Change data capture can be used for implementing custom replication and AOF-like backups, etc.?

If the new API returns a dump of a key in RDB format, it's very similar to the DUMP command? like VM_Call(DUMP)

@hpatro
Copy link
Collaborator

hpatro commented Jul 8, 2025

If the new API returns a dump of a key in RDB format, it's very similar to the DUMP command? like VM_Call(DUMP)

Yeah, basically the same. Talked to @QuChen88 about this yesterday. Do we plan on introducing more information to the module API or does VM_Call suffice?

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the high level proposal, but think we should flow through the normal mechanism of opening a key instead of a new way to access keys. Otherwise I agree with others that the right way would be to do VM_Call + DUMP.

src/module.c Outdated
return NULL;
}

int flags = LOOKUP_NOTOUCH | LOOKUP_NONOTIFY | LOOKUP_NOEXPIRE | LOOKUP_NOEFFECTS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int flags = LOOKUP_NOTOUCH | LOOKUP_NONOTIFY | LOOKUP_NOEXPIRE | LOOKUP_NOEFFECTS;
int flags = LOOKUP_NOEFFECTS;

LOOKUP_NOEFFECTS implies the other three :)

@@ -7532,6 +7533,32 @@ ValkeyModuleString *VM_SaveDataTypeToString(ValkeyModuleCtx *ctx, void *data, co
}
}

/**
* Serialize the value of a given key from the database of the module context's client.
* If the serialization fails for any reason (e.g. key doesn't exist, or rdbSaveObject() fails), return NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdbSaveObject can fail when writing out to network, but it never fails when pushing an object in-memory.

* If the serialization fails for any reason (e.g. key doesn't exist, or rdbSaveObject() fails), return NULL
* Otherwise return ValkeyModuleString which contains the serialized value for the key.
*/
ValkeyModuleString *VM_DumpSerializedValue(ValkeyModuleCtx *ctx, ValkeyModuleString *key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more correct API would be to operate on an already opened Key. The handling of the DB parameters here isn't quite right (it will always be DB zero). something like:

ValkeyModuleString *VM_DumpSerializedValue(ValkeyModuleKey *key);

@QuChen88
Copy link
Contributor

QuChen88 commented Jul 9, 2025

Yes this module API is for change data capture where we can capture keyspace notifications and generate a message containing the information about key name / type of update / serialized value post modification, and making the changes available elsewhere in a different system.

Doing it via VM_Call() with DUMP command requires constructing a fake client with the command and passing it through the full command processing flow. It carries a fair amount of overhead compared to the current route of opening the key in the module and getting the dump of its value directly.

Signed-off-by: Carolyn Dorantes <cdorants@amazon.com>
@cdorantes05 cdorantes05 changed the title Adding support for SaveObjectToString API Adding support for DumpSerializedValue API Jul 9, 2025
@zuiderkwast
Copy link
Contributor

@QuChen88 yes VM_Call has a lot of overhead. It's just good to point out the equivalents and alternatives.

Maybe since this is equivalent to the DUMP command, should we call it VM_Dump?

What about VM_Restore, do we need it too? Dump/restore are used together and providing both maybe be more complete?

@QuChen88
Copy link
Contributor

QuChen88 commented Jul 10, 2025

@zuiderkwast For the API name, I think either VM_DumpKey() or VM_DumpValue() is fine.

Even though VM_Restore is not required for CDC, I agree that we can create another module API VM_RestoreKey() as well. Maybe something like: int VM_RestoreKey(ValkeyModuleCtx *ctx, ValkeyModuleString *key, ValkeyModuleString *serializedValue). The new key name and the serialized value are passed in as arguments. The method returns VALKEYMODULE_OK if success, and VALKEYMODULE_ERR if failure.

A couple of questions here about the implement for VM_RestoreKey() due to the complexity of the regular RESTORE command:

  • Can we return ERR if the key already exists?
  • Do we also need to pass in a TTL to create the key with like for regular RESTORE command?
  • Do we also need to do signalModifiedKey() and notifyKeyspaceEvent() here?

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Jul 11, 2025

I have no strong opinion about the name. DumpSerializedValue is OK. It should take a ValkeyModuleKey though.

We can postpone Restore to a separate PR if it's non-trivial. (It too should operate on ValkeyNoduleKey probably.)

@JimB123
Copy link
Contributor

JimB123 commented Jul 11, 2025

Doing it via VM_Call() with DUMP command requires constructing a fake client with the command and passing it through the full command processing flow. It carries a fair amount of overhead compared to the current route of opening the key in the module and getting the dump of its value directly.

I am strongly against creating new module APIs as an alternative for calling existing commands. VM_Call is designed expressly for modules to execute commands. If the module API is inherently inefficient, we should address that inefficiency (if it exists) rather than creating new APIs, willy-nilly, for each command where we suspect that VM_Call is adding "too much" overhead.

In this case, in particular, as far as I can tell, there has been no measurements to show that VM_Call is introducing any significant overhead. A change at this point would be (at best) pre-mature optimization. At worst, this sets a precedent for introducing, documenting, and maintaining hundreds of new module API functions.

We need to address API changes with caution.

@QuChen88
Copy link
Contributor

QuChen88 commented Jul 11, 2025

I looked into the implementation of VM_Call(), it pretty much copies most of the validation logic from processCommand() before it reaches the actual command execution logic via call(). See https://github.com/valkey-io/valkey/blob/unstable/src/module.c#L6336. I am not sure if that was by design or not. Maybe we can optimize out with a new API option to skip validation and directly get to call()?

I agree that in the meantime we can proceed with VM_Call() in the module to achieve the equivalent functionality and do some perf measurements later to understand how much overhead this is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants