-
Notifications
You must be signed in to change notification settings - Fork 1
Add transformer for JSON strings #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for thinking about a backward-compatible way of adding such a neat feature, making lists of JSON-encoded snapshots much easier to handle 👏 .
As discussed, the main open part is handling the special edge case of empty strings. Otherwise, I added some suggestions and ideas for future iterations 📈
Idea 💡: One of the disadvantages of automatic parsing is that we are loosing structural parity information, we can:
- a) highlight this in our snapshot testing guidelines
- b) implement a technical solution at some point such as using some kind of metadata to store the raw value or structural information (e.g.,
key__META__
: {"json": True})
Sidenotes (as discussed and partly related):
- Clarify dev installation instructions with troubleshooting for the weird editable install issue (separate quick docs PR)
- Improve Utility inheritance structure such that LocalStack Core only adds AWS-specific transformer utils (e.g., lambda_api) and utils such as
sorting
are implemented in the snapshot library
@pytest.mark.parametrize( | ||
"input_value,transformed_value", | ||
[ | ||
pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great parametrization coverage 💯
I like the inline id
within pytest.param
rather having a disconnected id list
class JsonStringTransformer: | ||
""" | ||
Parses JSON string at key. | ||
Additionally, attempts to parse any JSON strings inside the parsed JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could provide a simple example (e.g., the first test case) to clarify usage and foster easy adoption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, highlighting the difference to the default behavior might be most relevant, clarifying that it adds extra coverage for JSON lists starting with [
such as [{},{}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?
I don't think we're losing parity information here, at least not any more than with other transformers. Conceptually, it is the same as with any other transformer - we apply some transformation to data that can't be compared between AWS and Localstack test runs, or between different runs.
Applying this transformer is an explicit choice of its user. This means that the user decided to parse JSON string at specific key in order to reliably compare its contents - after decoding other transformers like timestamp, UUID, ARNs, etc. will automatically apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated docstring as suggested, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great clarifications and comparison in the docstring 💯 👍
Our future selfes will thank us :)
json_value = json.loads(input_data) | ||
input_data = self._transform_nested(json_value) | ||
except JSONDecodeError: | ||
SNAPSHOT_LOGGER.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good log level choice given nested parsing is best effort and we might encounter false positives here 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think it's worth logging here rather than continuing silently similar to the top-level behavior here:
pass # parsing error can be ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd err on logging side. By explicitly applying this transformer user expects correct nested JSON parsing, unlike with top-level behavior that is always a best effort, and an implicit one. If there is anything wrong with JSON parsing here I'd rather let user know about it.
if k == self.key and isinstance(v, str): | ||
try: | ||
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON") | ||
json_value = json.loads(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, we're missing a test and fix for empty strings here ""
to avoid verbose exception logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks! Added several empty values test cases - empty string, empty list, empty object. Also added extra check for string to condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding extra coverage for these cases 👍
|
||
key: str | ||
|
||
def __init__(self, key: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: To facilitate an easy transition, it could be nice to have an easy way to apply this transformer globally, for example, when not providing a key.
sidenote: it ultimately comes down to separating anchors (e.g., JSON path selectors) from actual transformer functionality, but that's far out of scope :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transformer handles a specific case and I'd not recommend to apply it globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Let's not introduce unnecessary complexity 👍
👍 I tested this change against the localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 with the following changes (not necessary anymore once the Utility re-use hierarchy is fixed) and can confirm it works as expected (apart from the minor empty string stack trace): snapshot.add_transformer(
- snapshot.transform.sorting(
+ SortingTransformer(
"events",
lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
)
)
- snapshot.add_transformer(snapshot.transform.json_string("payload"))
+ # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+ snapshot.add_transformer(JsonStringTransformer("payload")) ✅ There is no regression (identity check) using this transformer via |
This reverts commit 32b9010. This change is somewhat nuclear option for existing snapshots, will be implemented as a separate transformer first.
Avoid verbose exception logging when not necessary.
a3efa66
to
6a56a9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pushing this over the line and extending unit test coverage @tiurin 👍
I re-tested the following two cases:
🟢 There is no regression (identity check) using this transformer via snapshot.add_transformer(JsonStringTransformer("Document"))
in the X-Ray test tests.aws.services.lambda_.test_lambda_xray.TestLambdaXrayIntegration.test_basic_xray_integration
(X-Ray commonly uses nested JSON documents, but that's already handled by the default transformer)
🟡/🟢 The localstack-ext PR https://github.com/localstack/localstack-ext/pull/3971 requires the following changes. This is primarily syntactic sugar for importing. The behavior works properly for the payload field 👍 :
```diff
snapshot.add_transformer(
- snapshot.transform.sorting(
+ SortingTransformer(
"events",
lambda event: (event["message"]["executionId"], event["message"]["messageType"]),
)
)
- snapshot.add_transformer(snapshot.transform.json_string("payload"))
+ # snapshot.add_transformer(snapshot.transform.json_string("payload"))
+ snapshot.add_transformer(JsonStringTransformer("payload"))
```
🚀 This unblocks https://github.com/localstack/localstack-ext/pull/3971
@tiurin Are you planning to adjust the utility re-use hierarchy in a follow-up, or do we advocate for direct imports?
class JsonStringTransformer: | ||
""" | ||
Parses JSON string at key. | ||
Additionally, attempts to parse any JSON strings inside the parsed JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great clarifications and comparison in the docstring 💯 👍
Our future selfes will thank us :)
|
||
key: str | ||
|
||
def __init__(self, key: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Let's not introduce unnecessary complexity 👍
if k == self.key and isinstance(v, str): | ||
try: | ||
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON") | ||
json_value = json.loads(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding extra coverage for these cases 👍
@joe4dev I was about to propose advocating for direct imports but then One thing I surely don't like is having In any case, as long as we have the utility, I think |
Motivation
Some services response contains JSON strings within JSON strings, e.g. Cloudwatch Logs for EventBridge Pipes events.
We do have first level JSON strings parsed but we don't go recursively into parsed JSONs. This leads to the need of extracting and parsing such payloads separately in the test because otherwise only regex transformers can be applied to such string. Ideally, snapshot library should take care of this. See corresponding discussion (thanks @gregfurman for inspiration!).
Also, in the default parser we don't parse top-level JSON arrays.
Changes
Starting to parse nested JSONs in common transformation flow is a breaking change for existing recorded snapshots. This PR proposes non-breaking approach by introducing a separate transformer that can be applied on demand. This draft shows a usage example: https://github.com/localstack/localstack-ext/pull/3971