Allow key value based transformers with custom replacement function #10

Merged (1 commit) on Mar 12, 2025

@dfangl dfangl commented Mar 6, 2025

Motivation

Currently, our KeyValueBasedTransformer only allows static values for replacement.
However, this limits the kinds of transformers we can build. In my case, I wanted to build a key-value-based transformer where the length of the value is part of the transformation.

Changes

  • Allow replacement function instead of static value and rename to KeyValueBasedTransformerFunctionReplacement (a bit bulky)
  • Create a new KeyValueBasedTransformer with a simple function returning the static value for compatibility with existing code (and convenience)
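As a rough illustration of the two changes above, here is a minimal, self-contained sketch. It is not the project's implementation: `transform_by_key`, `static_replacement`, and `length_replacement` are hypothetical names, and only the `(key, value) -> str` callable shape is taken from the signature in the diff.

```python
from typing import Any, Callable


def transform_by_key(
    data: Any,
    match_key: str,
    replacement_function: Callable[[str, Any], str],
) -> Any:
    """Return a copy of data in which string values under match_key are replaced.

    Hypothetical stand-in for a key-value-based transformer.
    """
    if isinstance(data, dict):
        return {
            k: replacement_function(k, v)
            if k == match_key and isinstance(v, str)
            else transform_by_key(v, match_key, replacement_function)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [transform_by_key(item, match_key, replacement_function) for item in data]
    return data


def static_replacement(value: str) -> Callable[[str, Any], str]:
    """Compatibility helper (assumed): wrap a static value in a function."""
    return lambda _key, _value: value


def length_replacement(key: str, value: Any) -> str:
    """Example replacement function that encodes the value's length."""
    return f"<placeholder({len(value)})>"


sample = {"hello": "world", "path": {"to": {"hello": "friend"}}}
by_length = transform_by_key(sample, "hello", length_replacement)
by_static = transform_by_key(sample, "hello", static_replacement("<hello>"))
```

The static variant is just the function variant with a constant function, which is why existing callers can stay unchanged in this sketch.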

Open questions

  • The reference replacement is based on the replacement value - so different length replacement within the same snapshot session will have different counters - they are still identifiable, but we do have two ":1" replacements for the same transformer, if the generated replacement differs. I think this is not a big problem, but maybe something we do not want.
  • In general, we can ask ourselves the question if a transformer asserting the length is really something we need (however I argue the more accurate the better).
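The first open question can be pictured with a small sketch. It assumes, as described above, that the reference counter is keyed by the generated replacement value; `ReferenceCounter` is a hypothetical helper, not a class from the project.

```python
from collections import defaultdict


class ReferenceCounter:
    """Hypothetical helper: one counter per generated replacement value."""

    def __init__(self) -> None:
        # generated replacement -> {original value: reference index}
        self._seen: dict[str, dict[str, int]] = defaultdict(dict)

    def reference(self, replacement: str, original: str) -> str:
        seen = self._seen[replacement]
        if original not in seen:
            seen[original] = len(seen) + 1
        return f"<{replacement}:{seen[original]}>"


counter = ReferenceCounter()
first = counter.reference("placeholder(5)", "world")    # length 5, first value
second = counter.reference("placeholder(6)", "friend")  # length 6, separate counter
repeat = counter.reference("placeholder(5)", "world")   # same value, same reference
```

Under this assumption both `first` and `second` end in ":1", because each generated replacement starts its own count, while the full strings remain distinguishable.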

@dfangl dfangl marked this pull request as ready for review March 6, 2025 14:08
@dfangl dfangl requested review from dominikschubert and tiurin March 6, 2025 16:55
Contributor

@tiurin tiurin left a comment

Great addition @dfangl, thank you! Love the flexibility: the example (and maybe a typical use) is for length, but your implementation allows for more complex use cases as well. 👍
I suggested a test enhancement in a separate PR, would love to know your thoughts.

def key_value_replacement_function(
    key: str,
    replacement_function: Callable[[str, Any], str] = None,
    reference_replacement: bool = True,
Contributor

thought: so far I've found reference_replacement not too intuitive when it defaults to True, especially because it can change anything in the snapshot, including keys. Many times it goes like this: add a key_value transformer, see that the comparison breaks, set reference_replacement to False.

Maybe we shouldn't set a default at all for this new enhanced function, so that with each use the user thinks about whether they need a reference replacement. I see a drawback in that the behavior would then differ from existing transformers, but I'm also of the opinion that the default does more harm than good.

WDYT?

Member Author

Interesting, I made a different observation - usually we want to have reference replacements where possible. We just have to avoid reference replacing values in predefined keys - usually this happens if the transformer matches too much, as random data is not usually in our snapshot keys.

Contributor

Good point, thanks for sharing! I do see value in reference replacements by default, just not in keys. This is a separate discussion though, happy to leave the current behavior for this transformer. 👍

Member Author

I agree with the key replacement, this is more a side effect than anything else. Happy to work together to fix this in the future!
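To make the key side effect discussed in this thread concrete, here is a hedged sketch. It assumes reference replacement works as a plain substitution of the matched original value across the serialized snapshot; the thread only states that it "can change anything in the snapshot, including keys", so the mechanism shown is an assumption, and the data is made up for illustration.

```python
import json

# Hypothetical snapshot in which the matched value "friend" also appears as a key.
snapshot = {"hello": "friend", "path": {"friend": {"hello": "friend"}}}

# Assumed: the transformer registered this original -> reference mapping.
replacements = {"friend": "<placeholder(6):1>"}

# Assumed mechanism: substitute every quoted occurrence in the serialized form.
serialized = json.dumps(snapshot)
for original, reference in replacements.items():
    serialized = serialized.replace(f'"{original}"', f'"{reference}"')

result = json.loads(serialized)
```

In this sketch the key "friend" is rewritten along with the values, which is the kind of side effect that makes a comparison against predefined keys break.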

"hello": "<placeholder(5):1>",
"hello2": "again",
"path": {
    "to": {"anotherkey": "hi", "<placeholder(6):1>": {"hello": "<placeholder(6):1>"}}
Contributor

suggestion: test it with another same-length value to see if we correctly get :2 in the replacement. Also, for the case without reference replacement, it would be great to test what happens when the function gives the same result for two different strings.
I added a PR for that: #11
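The two cases in this suggestion can be sketched as follows. This is not the test from #11; `by_length` and the sample values are hypothetical, and the numbering scheme assumes counters are kept per generated replacement.

```python
def by_length(value: str) -> str:
    """Hypothetical replacement function: encode only the value's length."""
    return f"placeholder({len(value)})"


values = ["world", "hello", "friend"]  # "world" and "hello" share length 5

# Without reference replacement: same-length values collapse into one placeholder.
without_refs = [f"<{by_length(v)}>" for v in values]

# With reference replacement (assumed scheme): number distinct originals
# separately for each generated replacement.
counters: dict[str, dict[str, int]] = {}
with_refs = []
for v in values:
    base = by_length(v)
    seen = counters.setdefault(base, {})
    index = seen.setdefault(v, len(seen) + 1)
    with_refs.append(f"<{base}:{index}>")
```

Here "world" and "hello" are indistinguishable without references, while the assumed numbering scheme separates them as ":1" and ":2".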

Contributor

tiurin commented Mar 7, 2025

The reference replacement is based on the replacement value - so different length replacement within the same snapshot session will have different counters - they are still identifiable, but we do have two ":1" replacements for the same transformer, if the generated replacement differs. I think this is not a big problem, but maybe something we do not want.

I'd expect such behavior from reference replacement, and I don't see a problem here tbh. With the kv function transformer you basically explicitly add a pre-processor before the transformer, so you should expect it to give two different replacements for two different function results.

The length example can be a bit misleading though, because two different strings can have the same length: I show that in the test change in #11. But since the flexibility of a function goes beyond just length, I don't see a problem.

@dfangl dfangl merged commit 8e37a87 into main Mar 12, 2025
3 checks passed
@tiurin tiurin deleted the function-key-value-transformer branch March 13, 2025 13:55