Skip to content

Add transformer for JSON strings #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
May 15, 2025
Merged

Add transformer for JSON strings #7

merged 16 commits into from
May 15, 2025

Conversation

tiurin
Copy link
Contributor

@tiurin tiurin commented Jan 29, 2025

Motivation

Some services response contains JSON strings within JSON strings, e.g. Cloudwatch Logs for EventBridge Pipes events.

We do have first level JSON strings parsed but we don't go recursively into parsed JSONs. This leads to the need of extracting and parsing such payloads separately in the test because otherwise only regex transformers can be applied to such string. Ideally, snapshot library should take care of this. See corresponding discussion (thanks @gregfurman for inspiration!).

Also, in the default parser we don't parse top-level JSON arrays.

Changes

Starting to parse nested JSONs in common transformation flow is a breaking change for existing recorded snapshots. This PR proposes non-breaking approach by introducing a separate transformer that can be applied on demand. This draft shows a usage example: https://github.com/localstack/localstack-ext/pull/3971

Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for thinking about a backward-compatible way of adding such a neat feature, making lists of JSON-encoded snapshots much easier to handle 👏 .

As discussed, the main open part is handling the special edge case of empty strings. Otherwise, I added some suggestions and ideas for future iterations 📈

Idea 💡: One of the disadvantages of automatic parsing is that we are loosing structural parity information, we can:

  • a) highlight this in our snapshot testing guidelines
  • b) implement a technical solution at some point such as using some kind of metadata to store the raw value or structural information (e.g., key__META__: {"json": True})

Sidenotes (as discussed and partly related):

  • Clarify dev installation instructions with troubleshooting for the weird editable install issue (separate quick docs PR)
  • Improve Utility inheritance structure such that LocalStack Core only adds AWS-specific transformer utils (e.g., lambda_api) and utils such as sorting are implemented in the snapshot library

@pytest.mark.parametrize(
"input_value,transformed_value",
[
pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great parametrization coverage 💯
I like the inline id within pytest.param rather having a disconnected id list

class JsonStringTransformer:
"""
Parses JSON string at key.
Additionally, attempts to parse any JSON strings inside the parsed JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could provide a simple example (e.g., the first test case) to clarify usage and foster easy adoption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, highlighting the difference to the default behavior might be most relevant, clarifying that it adds extra coverage for JSON lists starting with [ such as [{},{}]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?

I don't think we're losing parity information here, at least not any more than with other transformers. Conceptually, it is the same as with any other transformer - we apply some transformation to data that can't be compared between AWS and Localstack test runs, or between different runs.
Applying this transformer is an explicit choice of its user. This means that the user decided to parse JSON string at specific key in order to reliably compare its contents - after decoding other transformers like timestamp, UUID, ARNs, etc. will automatically apply.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docstring as suggested, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great clarifications and comparison in the docstring 💯 👍
Our future selfes will thank us :)

json_value = json.loads(input_data)
input_data = self._transform_nested(json_value)
except JSONDecodeError:
SNAPSHOT_LOGGER.debug(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good log level choice given nested parsing is best effort and we might encounter false positives here 👏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think it's worth logging here rather than continuing silently similar to the top-level behavior here:

pass # parsing error can be ignored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd err on logging side. By explicitly applying this transformer user expects correct nested JSON parsing, unlike with top-level behavior that is always a best effort, and an implicit one. If there is anything wrong with JSON parsing here I'd rather let user know about it.

if k == self.key and isinstance(v, str):
try:
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON")
json_value = json.loads(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, we're missing a test and fix for empty strings here "" to avoid verbose exception logging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks! Added several empty values test cases - empty string, empty list, empty object. Also added extra check for string to condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding extra coverage for these cases 👍


key: str

def __init__(self, key: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: To facilitate an easy transition, it could be nice to have an easy way to apply this transformer globally, for example, when not providing a key.

sidenote: it ultimately comes down to separating anchors (e.g., JSON path selectors) from actual transformer functionality, but that's far out of scope :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformer handles a specific case and I'd not recommend to apply it globally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let's not introduce unnecessary complexity 👍

@joe4dev
Copy link
Member

joe4dev commented Feb 4, 2025

👍 I tested this change against the localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 with the following changes (not necessary anymore once the Utility re-use hierarchy is fixed) and can confirm it works as expected (apart from the minor empty string stack trace):

         snapshot.add_transformer(
-            snapshot.transform.sorting(
+            SortingTransformer(
                 "events",
                 lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
             )
         )
-        snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        snapshot.add_transformer(JsonStringTransformer("payload"))

✅ There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document")) in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration (X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)

@tiurin tiurin force-pushed the parse-nested-json-strings branch from a3efa66 to 6a56a9d Compare May 13, 2025 22:34
@tiurin tiurin requested a review from joe4dev May 13, 2025 22:56
Copy link
Member

@joe4dev joe4dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pushing this over the line and extending unit test coverage @tiurin 👍

I re-tested the following two cases:

🟢 There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document")) in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration (X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)

🟡/🟢 The localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 requires the following changes. This is primarily syntactic sugar for importing. The behavior works properly for the payload field 👍 :

```diff
         snapshot.add_transformer(
-            snapshot.transform.sorting(
+            SortingTransformer(
                 "events",
                 lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
             )
         )
-        snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        snapshot.add_transformer(JsonStringTransformer("payload"))
```

🚀 This unblocks https://github.com/localstack/localstack-ext/pull/3971

@tiurin Are you planning to adjust the utility re-use hierarchy in a follow-up, or do we advocate for direct imports?

class JsonStringTransformer:
"""
Parses JSON string at key.
Additionally, attempts to parse any JSON strings inside the parsed JSON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great clarifications and comparison in the docstring 💯 👍
Our future selfes will thank us :)


key: str

def __init__(self, key: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let's not introduce unnecessary complexity 👍

if k == self.key and isinstance(v, str):
try:
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON")
json_value = json.loads(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding extra coverage for these cases 👍

@tiurin
Copy link
Contributor Author

tiurin commented May 14, 2025

Are you planning to adjust the utility re-use hierarchy in a follow-up, or do we advocate for direct imports?

@joe4dev I was about to propose advocating for direct imports but then key_value factory actually does some additional processing for transformer initialization. Also there is a resource_name one. Let's analyze and elaborate separately.

One thing I surely don't like is having snapshot.add_transformer(snapshot.transform. - passing the parameter from snapshot session to a function also defined in a snapshot session. It's quite difficult to read every time I see it. So definitely some utility and fixture rework is welcome, let's keep this conversation going outside this PR.

In any case, as long as we have the utility, I think TransformerUtility in localstack codebase should inherit the one snapshot library. Right now both are imported in different places across community and pro repos and having the same name they are so easy to confuse. Snapshot fixture even issues a warning about type mismatch when I open it in PyCharm. Also, TransformerUtilityExt inherits the utility from localstack, so right now it doesn't get new methods added to the library one. That's why example PR in ext doesn't compile - this is how I identified the issue first, just didn't change the example yet.
I will definitely address this issue in upcoming PRs, it is a bit of a time bomb.

@tiurin tiurin merged commit 6259903 into main May 15, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants