-
Notifications
You must be signed in to change notification settings - Fork 82
feat(clp-s): Expose JSON serialization in SchemaReader
.
#1406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors SchemaReader JSON generation to accept a Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant SR as SchemaReader
participant JS as JsonSerializer
Caller->>SR: initialize_serializer()
Note right of SR #DDEEFF: Serializer setup (public)
Caller->>SR: generate_json_string(message_index)
activate SR
SR->>SR: Extract values using message_index
SR->>JS: Add fields/values for message_index
JS-->>SR: Serialized string
SR-->>Caller: std::string (JSON)
deactivate SR
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp_s/SchemaReader.cpp
(8 hunks)components/core/src/clp_s/SchemaReader.hpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp_s/SchemaReader.cpp
components/core/src/clp_s/SchemaReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/SchemaReader.cpp (2)
components/core/src/clp_s/SchemaReader.hpp (4)
message_index
(162-162)message
(169-169)message
(177-177)message
(187-192)components/core/src/clp_s/JsonSerializer.hpp (4)
column
(133-136)column
(133-133)column
(138-143)column
(139-139)
🔇 Additional comments (6)
components/core/src/clp_s/SchemaReader.hpp (2)
157-162
: LGTM! Well-documented public API.The new signature with
message_index
parameter andstd::string
return type enables external callers to generate JSON for specific messages. The documentation clearly describes the parameter and return value.
207-210
: LGTM! Public initializer exposed.Making
initialize_serializer
public allows external callers to prepare the serialization state before callinggenerate_json_string
.components/core/src/clp_s/SchemaReader.cpp (4)
103-177
: LGTM! Consistent parameter usage.All value extractions correctly use
message_index
instead ofm_cur_message
, enabling external callers to serialize arbitrary message indices.
193-193
: LGTM! Proper return value.Returning the serialized string enables external callers to retrieve the JSON without direct access to internal state.
204-204
: LGTM! Call sites correctly updated.All internal call sites properly assign the returned string and pass
m_cur_message
as the message index argument.Also applies to: 225-225, 257-257
66-194
: Thread-safety: Protect shared serializer.
m_json_serializer
is a mutable member used across calls. Confirm whethergenerate_json_string
may be invoked concurrently; if so, serialize access with a mutex or switch to a thread-local serializer.
} | ||
|
||
void SchemaReader::generate_json_string() { | ||
std::string SchemaReader::generate_json_string(uint64_t message_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add bounds validation for message_index
.
The function doesn't validate that message_index < m_num_messages
. Out-of-bounds access in column readers could cause undefined behaviour or crashes.
Apply this diff to add bounds checking:
std::string SchemaReader::generate_json_string(uint64_t message_index) {
+ if (message_index >= m_num_messages) {
+ throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
+ }
m_json_serializer.reset();
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
std::string SchemaReader::generate_json_string(uint64_t message_index) { | |
std::string SchemaReader::generate_json_string(uint64_t message_index) { | |
if (message_index >= m_num_messages) { | |
throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__); | |
} | |
m_json_serializer.reset(); | |
// … | |
} |
🤖 Prompt for AI Agents
In components/core/src/clp_s/SchemaReader.cpp around line 66, add a guard that
validates message_index is within bounds (message_index < m_num_messages) before
using it; if the check fails, throw a std::out_of_range with a clear message (or
otherwise return an error) to prevent out-of-bounds access in column readers,
and add #include <stdexcept> to the file if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments.
Also, could we do a quick benchmark to confirm there aren't performance regressions? I don't expect this change to affect performance, but still worth checking.
* Generates a json string from the extracted values | ||
* @param message_index the index of the message to generate the json string for | ||
* @return the generated json string | ||
*/ | ||
std::string generate_json_string(uint64_t message_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Generates a json string from the extracted values | |
* @param message_index the index of the message to generate the json string for | |
* @return the generated json string | |
*/ | |
std::string generate_json_string(uint64_t message_index); | |
* Generates a json string from the encoded columns. | |
* @param message_index The index of the message to generate the JSON string for. | |
* @return The generated JSON string. | |
*/ | |
[[nodiscard]] auto generate_json_string(uint64_t message_index) -> std::string; |
Updating the comment a bit since I'm pretty sure that extracted values line refers to a very old implementation + updating the prototype to comply with our current standard since we're touching this anyway.
} | ||
|
||
void SchemaReader::generate_json_string() { | ||
std::string SchemaReader::generate_json_string(uint64_t message_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string SchemaReader::generate_json_string(uint64_t message_index) { | |
auto SchemaReader::generate_json_string(uint64_t message_index) -> std::string { |
Same as other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/core/src/clp_s/SchemaReader.cpp (1)
66-66
: Add bounds validation formessage_index
.The function doesn't validate that
message_index < m_num_messages
. Out-of-bounds access in column readers could cause undefined behaviour or crashes.Apply this diff to add bounds checking:
auto SchemaReader::generate_json_string(uint64_t message_index) -> std::string { + if (message_index >= m_num_messages) { + throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__); + } m_json_serializer.reset();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp_s/SchemaReader.cpp
(8 hunks)components/core/src/clp_s/SchemaReader.hpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp_s/SchemaReader.cpp
components/core/src/clp_s/SchemaReader.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/SchemaReader.cpp (2)
components/core/src/clp_s/SchemaReader.hpp (5)
message_index
(162-162)message
(169-169)message
(177-177)message
(187-192)m_cur_message
(248-248)components/core/src/clp_s/JsonSerializer.hpp (4)
column
(133-136)column
(133-133)column
(138-143)column
(139-139)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (7)
components/core/src/clp_s/SchemaReader.hpp (1)
157-162
: LGTM!The method signature and documentation are clear and appropriate. The
[[nodiscard]]
attribute correctly ensures the returned string isn't ignored, and the trailing return type syntax aligns with modern C++ standards.components/core/src/clp_s/SchemaReader.cpp (6)
98-158
: LGTM!All value extractions correctly use the
message_index
parameter instead ofm_cur_message
. The changes are consistent across integer, float, and boolean operations.
130-180
: LGTM!All serializer append operations correctly propagate the
message_index
parameter to the column readers. The changes maintain consistency across formatted floats, strings, and arrays.
192-194
: LGTM!The return statement correctly returns the serialized JSON string, making the API more intuitive and explicit.
196-212
: LGTM!The call site correctly uses the returned string from
generate_json_string(m_cur_message)
, simplifying the code and improving readability.
214-237
: LGTM!The filtered message retrieval correctly uses the returned string from
generate_json_string(m_cur_message)
, maintaining consistency with other call sites.
239-272
: LGTM!The message retrieval with metadata correctly uses the returned string from
generate_json_string(m_cur_message)
, completing the refactor across all call sites.
void initialize_filter_with_column_map(FilterClass* filter); | ||
|
||
/** | ||
* Initializes all internal data structured required to serialize records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Initializes all internal data structured required to serialize records. | |
* Initializes all internal data structures required to serialize records. |
Just explicitly adding my other comment as an edit suggestion. After this should be good to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp_s/SchemaReader.hpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp_s/SchemaReader.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (1)
components/core/src/clp_s/SchemaReader.hpp (1)
207-210
: LGTM!The typo fix from "structured" to "structures" has been applied as requested in previous reviews. Making this method public aligns with the PR objective of supporting external JSON serialization.
/** | ||
* Generates a JSON string from the encoded columns | ||
* @param message_index The index of the message to generate the JSON string for. | ||
* @return The generated JSON string | ||
*/ | ||
[[nodiscard]] auto generate_json_string(uint64_t message_index) -> std::string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify bounds checking and consider documenting preconditions.
The method signature looks correct and uses appropriate modern C++ idioms ([[nodiscard]]
, trailing return type). However, since this is now a public API that accepts an arbitrary message_index
, consider the following:
- The implementation should validate that
message_index < m_num_messages
to prevent out-of-bounds access. - The documentation could be enhanced to specify:
- Precondition:
message_index
must be less thanget_num_messages()
- What happens on invalid input (throws exception, returns empty string, etc.)
- Whether the serializer must be initialized first via
initialize_serializer()
- Precondition:
Run the following script to verify bounds checking is implemented:
🏁 Script executed:
#!/bin/bash
# Description: Verify that generate_json_string performs bounds checking on message_index
# Search for the implementation and check for bounds validation
ast-grep --pattern $'auto SchemaReader::generate_json_string($_, message_index) -> std::string {
$$$
}'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Locate generate_json_string implementation and check for bounds validation
rg -nP "SchemaReader::generate_json_string\s*\(.*\)\s*->\s*std::string" -n -C5 --type=cpp
Length of output: 878
🏁 Script executed:
#!/bin/bash
# Display the full body of generate_json_string to check for bounds validation
rg -nP "auto SchemaReader::generate_json_string" -n -A100 components/core/src/clp_s/SchemaReader.cpp
Length of output: 5090
Validate message_index in generate_json_string and document preconditions
- In SchemaReader.cpp (line 66), add a check
if (message_index >= get_num_messages()) throw OperationFailed(ErrorCodeInvalidArgument, __FILENAME__, __LINE__);
to prevent out-of-bounds access. - Update the header doc (SchemaReader.hpp lines 157–162) to specify that
message_index
must be< get_num_messages()
, describe the exception on invalid input, and note thatinitialize_serializer()
must be called first.
🤖 Prompt for AI Agents
In components/core/src/clp_s/SchemaReader.hpp around lines 157–162, update the
method documentation for generate_json_string to state that message_index must
be < get_num_messages(), that initialize_serializer() must have been called
before invoking this method, and that the method throws
OperationFailed(ErrorCodeInvalidArgument, __FILENAME__, __LINE__) on invalid
message_index; and in components/core/src/clp_s/SchemaReader.cpp at ~line 66,
add a runtime check: if message_index >= get_num_messages() then throw
OperationFailed(ErrorCodeInvalidArgument, __FILENAME__, __LINE__) to prevent
out-of-bounds access before proceeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Did some quick benchmarking, and this seems to slow down decompression by ~1.5%, but that should be okay for now.
For title how about feat(clp-s): Expose JSON serialization in `SchemaReader`.
SchemaReader
.
Description
This PR exposes two previously private JSON serialization methods by making them public and updates
generate_json_string
to accept amessage_index
parameter and return the serialized string. The internal serialization logic has been modified to pass this index, enabling external callers to generate JSON strings for specific messages.The main goal is to support
CLP_GET_JSON_STRING()
in Presto/Velox integration, allowing users to retrieve serialized JSON representations of messages.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Refactor