-
Notifications
You must be signed in to change notification settings - Fork 1
Add transformer for JSON strings #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
aefca69
a42909a
bd8a489
1e4c883
ac9803f
cbcdf8c
1ea3d28
4738d98
fb2632c
d700da8
5429a91
32cce9f
0f18a57
257e3d9
023423d
6a56a9d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -1,8 +1,10 @@ | ||||
import copy | ||||
import json | ||||
import logging | ||||
import os | ||||
import re | ||||
from datetime import datetime | ||||
from json import JSONDecodeError | ||||
from re import Pattern | ||||
from typing import Any, Callable, Optional, Protocol | ||||
|
||||
|
@@ -375,3 +377,78 @@ def replace_val(s): | |||
f"Registering text pattern '{self.text}' in snapshot with '{self.replacement}'" | ||||
) | ||||
return input_data | ||||
|
||||
|
||||
class JsonStringTransformer: | ||||
""" | ||||
Parses JSON string at the specified key. | ||||
Additionally, attempts to parse any JSON strings inside the parsed JSON | ||||
|
||||
This transformer complements the default parsing of JSON strings in | ||||
localstack_snapshot.snapshots.prototype.SnapshotSession._transform_dict_to_parseable_values | ||||
|
||||
Shortcomings of the default parser that this transformer addresses: | ||||
- parsing of nested JSON strings '{"a": "{\\"b\\":42}"}' | ||||
- parsing of JSON arrays at the specified key, e.g. '["a", "b"]' | ||||
|
||||
Such parsing allows applying transformations further to the elements of the parsed JSON - timestamps, ARNs, etc. | ||||
|
||||
Such parsing is not done by default because it's not a common use case. | ||||
Whether to parse a JSON string or not should be decided by the user on a case by case basis. | ||||
Limited general parsing that we already have is preserved for backwards compatibility. | ||||
""" | ||||
|
||||
key: str | ||||
|
||||
def __init__(self, key: str): | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Idea: To facilitate an easy transition, it could be nice to have an easy way to apply this transformer globally, for example, when not providing a key. sidenote: it ultimately comes down to separating anchors (e.g., JSON path selectors) from actual transformer functionality, but that's far out of scope :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This transformer handles a specific case and I'd not recommend to apply it globally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. Let's not introduce unnecessary complexity 👍 |
||||
self.key = key | ||||
|
||||
def transform(self, input_data: dict, *, ctx: TransformContext = None) -> dict: | ||||
return self._transform_dict(input_data, ctx=ctx) | ||||
|
||||
def _transform(self, input_data: Any, ctx: TransformContext = None) -> Any: | ||||
if isinstance(input_data, dict): | ||||
return self._transform_dict(input_data, ctx=ctx) | ||||
elif isinstance(input_data, list): | ||||
return self._transform_list(input_data, ctx=ctx) | ||||
return input_data | ||||
|
||||
def _transform_dict(self, input_data: dict, ctx: TransformContext = None) -> dict: | ||||
for k, v in input_data.items(): | ||||
if k == self.key and isinstance(v, str) and v.strip().startswith(("{", "[")): | ||||
try: | ||||
SNAPSHOT_LOGGER.debug(f"Replacing string value of {k} with parsed JSON") | ||||
json_value = json.loads(v) | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as discussed, we're missing a test and fix for empty strings here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, thanks! Added several empty values test cases - empty string, empty list, empty object. Also added extra check for string to condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for adding extra coverage for these cases 👍 |
||||
input_data[k] = self._transform_nested(json_value) | ||||
except JSONDecodeError: | ||||
SNAPSHOT_LOGGER.exception( | ||||
f'Value mapped to "{k}" key is not a valid JSON string and won\'t be transformed. Value: {v}' | ||||
) | ||||
else: | ||||
input_data[k] = self._transform(v, ctx=ctx) | ||||
return input_data | ||||
|
||||
def _transform_list(self, input_data: list, ctx: TransformContext = None) -> list: | ||||
return [self._transform(item, ctx=ctx) for item in input_data] | ||||
|
||||
def _transform_nested(self, input_data: Any) -> Any: | ||||
""" | ||||
Separate method from the main `_transform_dict` one because | ||||
it checks every string while the main one attempts to load at specified key only. | ||||
This one is implicit, best-effort attempt, | ||||
while the main one is explicit about at which key transform should happen | ||||
""" | ||||
if isinstance(input_data, list): | ||||
input_data = [self._transform_nested(item) for item in input_data] | ||||
if isinstance(input_data, dict): | ||||
for k, v in input_data.items(): | ||||
input_data[k] = self._transform_nested(v) | ||||
if isinstance(input_data, str) and input_data.strip().startswith(("{", "[")): | ||||
try: | ||||
json_value = json.loads(input_data) | ||||
input_data = self._transform_nested(json_value) | ||||
except JSONDecodeError: | ||||
SNAPSHOT_LOGGER.debug( | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good log level choice given nested parsing is best effort and we might encounter false positives here 👏 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we think it's worth logging here rather than continuing silently similar to the top-level behavior here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd err on logging side. By explicitly applying this transformer user expects correct nested JSON parsing, unlike with top-level behavior that is always a best effort, and an implicit one. If there is anything wrong with JSON parsing here I'd rather let user know about it. |
||||
f"The value is not a valid JSON string and won't be transformed. The value: {input_data}" | ||||
) | ||||
return input_data |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ | |
import pytest | ||
|
||
from localstack_snapshot.snapshots.transformer import ( | ||
JsonStringTransformer, | ||
SortingTransformer, | ||
TimestampTransformer, | ||
TransformContext, | ||
|
@@ -311,6 +312,57 @@ def test_text(self, value): | |
output = sr(output) | ||
assert json.loads(output) == expected | ||
|
||
@pytest.mark.parametrize( | ||
"input_value,transformed_value", | ||
[ | ||
pytest.param('{"a": "b"}', {"a": "b"}, id="simple_json_object"), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great parametrization coverage 💯
tiurin marked this conversation as resolved.
Show resolved
Hide resolved
|
||
pytest.param('{\n "a": "b"\n}', {"a": "b"}, id="formatted_json_object"), | ||
pytest.param('\n {"a": "b"}', {"a": "b"}, id="json_with_whitespaces"), | ||
pytest.param('{"a": 42}malformed', '{"a": 42}malformed', id="malformed_json"), | ||
pytest.param('["a", "b"]', ["a", "b"], id="simple_json_list"), | ||
pytest.param('{"a": "{\\"b\\":42}"}', {"a": {"b": 42}}, id="nested_json_object"), | ||
pytest.param( | ||
'{"a": "\\n {\\n \\"b\\":42}"}', | ||
{"a": {"b": 42}}, | ||
id="nested_formatted_json_object_with_whitespaces", | ||
), | ||
pytest.param( | ||
'{"a": "[{\\"b\\":\\"c\\"}]"}', {"a": [{"b": "c"}]}, id="nested_json_list" | ||
), | ||
pytest.param( | ||
'{"a": "{\\"b\\":42malformed}"}', | ||
{"a": '{"b":42malformed}'}, | ||
id="malformed_nested_json", | ||
), | ||
pytest.param("[]", [], id="empty_list"), | ||
pytest.param("{}", {}, id="empty_object"), | ||
pytest.param("", "", id="empty_string"), | ||
], | ||
) | ||
def test_json_string(self, input_value, transformed_value): | ||
key = "key" | ||
input_data = {key: input_value} | ||
expected = {key: transformed_value} | ||
|
||
transformer = JsonStringTransformer(key) | ||
|
||
ctx = TransformContext() | ||
output = transformer.transform(input_data, ctx=ctx) | ||
|
||
assert output == expected | ||
|
||
def test_json_string_in_a_nested_key(self): | ||
key = "nested-key-in-an-object-hidden-inside-a-list" | ||
input_data = {"top-level-key": [{key: '{"a": "b"}'}]} | ||
expected = {"top-level-key": [{key: {"a": "b"}}]} | ||
|
||
transformer = JsonStringTransformer(key) | ||
|
||
ctx = TransformContext() | ||
output = transformer.transform(input_data, ctx=ctx) | ||
|
||
assert output == expected | ||
|
||
|
||
class TestTimestampTransformer: | ||
def test_generic_timestamp_transformer(self): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs: Are there any limitations (e.g., we're losing parity information that the original response is actually JSON-encoded and should have some tests covering that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could provide a simple example (e.g., the first test case) to clarify usage and foster easy adoption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, highlighting the difference to the default behavior might be most relevant, clarifying that it adds extra coverage for JSON lists starting with
[
such as[{},{}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we're losing parity information here, at least not any more than with other transformers. Conceptually, it is the same as with any other transformer - we apply some transformation to data that can't be compared between AWS and Localstack test runs, or between different runs.
Applying this transformer is an explicit choice of its user. This means that the user decided to parse JSON string at specific key in order to reliably compare its contents - after decoding other transformers like timestamp, UUID, ARNs, etc. will automatically apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated docstring as suggested, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great clarifications and comparison in the docstring 💯 👍
Our future selfes will thank us :)