Add transformer for JSON strings #7

tiurin · 2025-01-29T16:50:06Z

Motivation

Some service responses contain JSON strings nested within JSON strings, e.g. CloudWatch Logs for EventBridge Pipes events.

We do parse first-level JSON strings, but we don't recurse into the parsed JSON. As a result, such payloads have to be extracted and parsed separately in the test, because otherwise only regex transformers can be applied to the string. Ideally, the snapshot library should take care of this. See the corresponding discussion (thanks @gregfurman for the inspiration!).

Also, in the default parser we don't parse top-level JSON arrays.
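
To make the motivation concrete, here is a minimal sketch using only the standard `json` module. The event shape and field names are made up for illustration and are not taken from a real CloudWatch Logs response.

```python
import json

# Hypothetical payload: the "message" field is a JSON string that itself
# embeds another JSON string under "payload".
inner = json.dumps({"id": "abc", "count": 1})
message = json.dumps({"executionId": "e-1", "payload": inner})
event = {"message": message}

# One level of parsing leaves the inner document as an opaque string.
first_level = json.loads(event["message"])
assert isinstance(first_level["payload"], str)

# Parsing the inner string as well exposes its fields for comparison.
second_level = json.loads(first_level["payload"])
assert second_level == {"id": "abc", "count": 1}

# A top-level JSON array is valid JSON too and parses to a list.
top_level_array = json.loads('[{"a": 1}, {"b": 2}]')
```

Without the second parsing step, the inner document can only be matched as text, which is why only regex transformers apply to it.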

Changes

Parsing nested JSON in the common transformation flow would be a breaking change for existing recorded snapshots. This PR proposes a non-breaking approach by introducing a separate transformer that can be applied on demand. This draft shows a usage example: https://github.com/localstack/localstack-ext/pull/3971
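
The opt-in behavior can be sketched roughly as follows. This is an illustration only, not the code in `localstack_snapshot/snapshots/transformer.py`: the class name `JsonStringSketch`, its method names, and the "starts with `{` or `[`" guard are assumptions made for this example.

```python
import json
from json import JSONDecodeError


class JsonStringSketch:
    """Illustrative only: parse the JSON string stored at `key`,
    then parse any JSON strings nested inside the parsed value."""

    def __init__(self, key: str):
        self.key = key

    def transform(self, data):
        # Walk the structure; only values stored at `key` are decoded.
        if isinstance(data, dict):
            return {
                k: self._parse(v) if k == self.key and isinstance(v, str) else self.transform(v)
                for k, v in data.items()
            }
        if isinstance(data, list):
            return [self.transform(item) for item in data]
        return data

    def _parse(self, value: str):
        # Assumed guard: only strings that look like an object or array are tried.
        if value.strip()[:1] not in ("{", "["):
            return value
        try:
            parsed = json.loads(value)
        except JSONDecodeError:
            return value  # best effort: keep the raw string
        return self._parse_nested(parsed)

    def _parse_nested(self, data):
        # Inside an already decoded document, decode every JSON-looking string.
        if isinstance(data, dict):
            return {k: self._parse_nested(v) for k, v in data.items()}
        if isinstance(data, list):
            return [self._parse_nested(item) for item in data]
        if isinstance(data, str):
            return self._parse(data)
        return data
```

Because nothing happens unless a test adds the transformer for a specific key, snapshots that were recorded without it stay unchanged.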

tests/test_transformer.py

joe4dev

Thank you for thinking about a backward-compatible way of adding such a neat feature, making lists of JSON-encoded snapshots much easier to handle 👏 .

As discussed, the main open part is handling the special edge case of empty strings. Otherwise, I added some suggestions and ideas for future iterations 📈

Idea 💡: One disadvantage of automatic parsing is that we lose structural parity information. We can:

a) highlight this in our snapshot testing guidelines
b) implement a technical solution at some point such as using some kind of metadata to store the raw value or structural information (e.g., key__META__: {"json": True})
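
Option b) could look roughly like the sketch below. The helper name `parse_with_meta` is hypothetical, and the `__META__` suffix simply follows the example in the idea above. Nothing like this exists in the library today.

```python
import json


def parse_with_meta(data: dict, key: str) -> dict:
    """Hypothetical helper: decode data[key] and record that it was JSON-encoded."""
    result = dict(data)
    value = data.get(key)
    if isinstance(value, str):
        try:
            result[key] = json.loads(value)
        except json.JSONDecodeError:
            return result  # not JSON: leave the value and add no metadata
        # Sibling key that preserves the structural information.
        result[f"{key}__META__"] = {"json": True}
    return result
```

A snapshot produced this way would still show that the original response carried a JSON-encoded string at that key.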

Sidenotes (as discussed and partly related):

Clarify dev installation instructions with troubleshooting for the weird editable install issue (separate quick docs PR)
Improve Utility inheritance structure such that LocalStack Core only adds AWS-specific transformer utils (e.g., lambda_api) and utils such as sorting are implemented in the snapshot library

joe4dev · 2025-01-31T18:19:32Z

tests/test_transformer.py

+    @pytest.mark.parametrize(
+        "input_value,transformed_value",
+        [
+            pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"),


Great parametrization coverage 💯
I like the inline id within pytest.param rather than having a disconnected id list

joe4dev · 2025-02-03T16:46:48Z

localstack_snapshot/snapshots/transformer.py

+class JsonStringTransformer:
+    """
+    Parses JSON string at key.
+    Additionally, attempts to parse any JSON strings inside the parsed JSON


docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?

We could provide a simple example (e.g., the first test case) to clarify usage and foster easy adoption.

In particular, highlighting the difference to the default behavior might be most relevant, clarifying that it adds extra coverage for JSON lists starting with [ such as [{},{}]

docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?

I don't think we're losing parity information here, at least not any more than with other transformers. Conceptually, it is the same as with any other transformer: we apply some transformation to data that can't be compared between AWS and LocalStack test runs, or between different runs.
Applying this transformer is an explicit choice by its user. It means the user decided to parse the JSON string at a specific key in order to reliably compare its contents. After decoding, other transformers (timestamp, UUID, ARN, etc.) apply automatically.

Updated docstring as suggested, thanks!

Great clarifications and comparison in the docstring 💯 👍
Our future selves will thank us :)

joe4dev · 2025-02-04T09:00:58Z

localstack_snapshot/snapshots/transformer.py

+                json_value = json.loads(input_data)
+                input_data = self._transform_nested(json_value)
+            except JSONDecodeError:
+                SNAPSHOT_LOGGER.debug(


Good log level choice given nested parsing is best effort and we might encounter false positives here 👏

Do we think it's worth logging here rather than continuing silently similar to the top-level behavior here:

localstack-snapshot/localstack_snapshot/snapshots/prototype.py

Line 279 in 302c993

pass # parsing error can be ignored

I'd err on the side of logging. By explicitly applying this transformer, the user expects correct nested JSON parsing, unlike the top-level behavior, which is always best effort and implicit. If anything goes wrong with JSON parsing here, I'd rather let the user know about it.
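
The two options discussed in this thread, side by side, as a minimal sketch. The logger name and both function names are hypothetical. The library uses its own `SNAPSHOT_LOGGER`, and its message wording may differ.

```python
import json
import logging
from json import JSONDecodeError

LOG = logging.getLogger("snapshot_sketch")  # hypothetical logger name


def parse_silently(value: str):
    """Implicit best-effort parsing: a failure is ignored without a trace."""
    try:
        return json.loads(value)
    except JSONDecodeError:
        pass  # parsing error can be ignored
    return value


def parse_with_debug_log(value: str, key: str):
    """Explicit parsing: a failure is reported at debug level, the raw value is kept."""
    try:
        return json.loads(value)
    except JSONDecodeError as e:
        LOG.debug("Value at key %r is not valid JSON, keeping the raw string: %s", key, e)
        return value
```

Both variants return the original string on failure. The only difference is whether the user can find out why a value was left undecoded.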

joe4dev · 2025-02-04T15:08:11Z

localstack_snapshot/snapshots/transformer.py

+            if k == self.key and isinstance(v, str):
+                try:
+                    SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON")
+                    json_value = json.loads(v)


As discussed, we're missing a test and a fix for empty strings ("") here to avoid verbose exception logging.

Good point, thanks! Added several test cases for empty values: empty string, empty list, empty object. Also added an extra check on the string to the condition.

Thanks for adding extra coverage for these cases 👍
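
For reference, `json.loads("")` raises `JSONDecodeError`, which is what produced the verbose logging for empty strings. A guard along the following lines avoids the exception entirely. The function name `maybe_load_json` and the exact condition are assumptions for this sketch. The check that was actually merged may differ.

```python
import json


def maybe_load_json(value):
    """Hypothetical guard: only call json.loads on strings that can
    plausibly hold a JSON object or array."""
    if not isinstance(value, str):
        return value
    stripped = value.strip()  # tolerate surrounding whitespace
    if not stripped or stripped[0] not in "{[":
        return value  # empty or plain string: nothing to parse, nothing to log
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        return value
```

With this guard, an empty string passes through untouched, while the empty structures `"[]"` and `"{}"` decode to an empty list and an empty dict.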

joe4dev · 2025-02-04T15:11:52Z

localstack_snapshot/snapshots/transformer.py

+
+    key: str
+
+    def __init__(self, key: str):


Idea: To facilitate an easy transition, it could be nice to have an easy way to apply this transformer globally, for example, when not providing a key.

sidenote: it ultimately comes down to separating anchors (e.g., JSON path selectors) from actual transformer functionality, but that's far out of scope :)

This transformer handles a specific case, and I wouldn't recommend applying it globally.

Good point. Let's not introduce unnecessary complexity 👍

joe4dev · 2025-02-04T16:10:01Z

👍 I tested this change against the localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 with the following changes (not necessary anymore once the Utility re-use hierarchy is fixed) and can confirm it works as expected (apart from the minor empty string stack trace):

```diff
         snapshot.add_transformer(
-            snapshot.transform.sorting(
+            SortingTransformer(
                 "events",
                 lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
             )
         )
-        snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        snapshot.add_transformer(JsonStringTransformer("payload"))
```

✅ There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document")) in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration (X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)

joe4dev

Thank you for pushing this over the line and extending unit test coverage @tiurin 👍

I re-tested the following two cases:

🟢 There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document")) in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration (X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)

🟡/🟢 The localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 requires the following changes. This is primarily syntactic sugar for importing. The behavior works properly for the payload field 👍 :

```diff
         snapshot.add_transformer(
-            snapshot.transform.sorting(
+            SortingTransformer(
                 "events",
                 lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
             )
         )
-        snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+        snapshot.add_transformer(JsonStringTransformer("payload"))
```

🚀 This unblocks https://github.com/localstack/localstack-ext/pull/3971

@tiurin Are you planning to adjust the utility re-use hierarchy in a follow-up, or do we advocate for direct imports?

joe4dev · 2025-05-14T09:02:09Z

localstack_snapshot/snapshots/transformer.py

+class JsonStringTransformer:
+    """
+    Parses JSON string at key.
+    Additionally, attempts to parse any JSON strings inside the parsed JSON


Great clarifications and comparison in the docstring 💯 👍
Our future selves will thank us :)

joe4dev · 2025-05-14T09:28:17Z

localstack_snapshot/snapshots/transformer.py

+
+    key: str
+
+    def __init__(self, key: str):


Good point. Let's not introduce unnecessary complexity 👍

joe4dev · 2025-05-14T09:28:48Z

localstack_snapshot/snapshots/transformer.py

+            if k == self.key and isinstance(v, str):
+                try:
+                    SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON")
+                    json_value = json.loads(v)


Thanks for adding extra coverage for these cases 👍

tiurin · 2025-05-14T17:10:13Z

Are you planning to adjust the utility re-use hierarchy in a follow-up, or do we advocate for direct imports?

@joe4dev I was about to propose advocating for direct imports, but the key_value factory actually does some additional processing when initializing the transformer. There is also a resource_name one. Let's analyze and elaborate on this separately.

One thing I really don't like is having snapshot.add_transformer(snapshot.transform. in tests, i.e. passing to a snapshot session method the result of a function that is also defined on the snapshot session. It's quite difficult to read every time I see it. So some utility and fixture rework is definitely welcome. Let's keep this conversation going outside this PR.

In any case, as long as we have the utility, I think TransformerUtility in the LocalStack codebase should inherit from the one in the snapshot library. Right now both are imported in different places across the community and pro repos, and because they share the same name they are easy to confuse. The snapshot fixture even triggers a type mismatch warning when I open it in PyCharm. Also, TransformerUtilityExt inherits the utility from LocalStack, so right now it doesn't get new methods added to the library one. That's why the example PR in ext doesn't compile. This is how I first identified the issue; I just haven't changed the example yet.
I will definitely address this issue in upcoming PRs, as it is a bit of a time bomb.
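
The hierarchy problem described above can be reproduced with stand-in classes. All class names below are placeholders, and the method bodies just return strings. Only the inheritance relationships mirror what is described in this comment.

```python
class LibraryTransformerUtility:
    """Stand-in for the utility shipped with the snapshot library."""

    @staticmethod
    def key_value(key):
        return f"key_value({key})"

    @staticmethod
    def json_string(key):  # factory method newly added to the library
        return f"json_string({key})"


class CoreUtilityToday:
    """Stand-in for the same-named utility in LocalStack, currently unrelated to the library one."""

    @staticmethod
    def key_value(key):
        return f"key_value({key})"

    @staticmethod
    def lambda_api():
        return "lambda_api()"


class ExtUtilityToday(CoreUtilityToday):
    """Inherits from core, so it never sees methods added to the library."""


class CoreUtilityProposed(LibraryTransformerUtility):
    """Proposed: core inherits from the library and only adds AWS-specific helpers."""

    @staticmethod
    def lambda_api():
        return "lambda_api()"


class ExtUtilityProposed(CoreUtilityProposed):
    """Gets library additions such as json_string automatically."""
```

With the current layout, `ExtUtilityToday` has no `json_string` attribute, which matches the failure seen in the ext example PR. With the proposed layout, it is inherited without any change in ext.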

tiurin requested review from dominikschubert, joe4dev and gregfurman January 29, 2025 16:50

tiurin added the semver: minor label Jan 29, 2025

tiurin commented Feb 4, 2025

joe4dev reviewed Feb 4, 2025

tiurin and others added 16 commits May 12, 2025 20:54

Parse nested JSON strings

aefca69

Revert "Parse nested JSON strings"

a42909a

This reverts commit 32b9010. This change is somewhat nuclear option for existing snapshots, will be implemented as a separate transformer first.

Add JsonStringTransformer test

bd8a489

Add JsonStringTransformer implementation

1e4c883

Add tests for nested JSON strings

ac9803f

Parse nested JSON strings

cbcdf8c

Add test for nested key

1ea3d28

Parse nested JSON string

4738d98

Add docstrings

fb2632c

Improve logging

d700da8

Handle possible whitespaces

5429a91

Add missing factory methods to TransformerUtility

32cce9f

Add comment and TODO to common transformation flow

0f18a57

Enhance transformer's docstring

257e3d9

Add empty structures edge cases to tests

023423d

Add sanity check before trying to load JSON at key

6a56a9d

Avoid verbose exception logging when not necessary.

tiurin force-pushed the parse-nested-json-strings branch from a3efa66 to 6a56a9d Compare May 13, 2025 22:34

tiurin requested a review from joe4dev May 13, 2025 22:56

joe4dev approved these changes May 14, 2025

tiurin merged commit 6259903 into main May 15, 2025
3 checks passed
