Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 16 additions & 19 deletions components/core/src/clp_s/SchemaReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ SchemaReader::load(std::shared_ptr<char[]> stream_buffer, size_t offset, size_t
}
}

void SchemaReader::generate_json_string() {
std::string SchemaReader::generate_json_string(uint64_t message_index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add bounds validation for message_index.

The function doesn't validate that message_index < m_num_messages. Out-of-bounds access in column readers could cause undefined behaviour or crashes.

Apply this diff to add bounds checking:

 std::string SchemaReader::generate_json_string(uint64_t message_index) {
+    if (message_index >= m_num_messages) {
+        throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
+    }
     m_json_serializer.reset();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::string SchemaReader::generate_json_string(uint64_t message_index) {
std::string SchemaReader::generate_json_string(uint64_t message_index) {
if (message_index >= m_num_messages) {
throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
}
m_json_serializer.reset();
//
}
🤖 Prompt for AI Agents
In components/core/src/clp_s/SchemaReader.cpp around line 66, add a guard that
validates message_index is within bounds (message_index < m_num_messages) before
using it; if the check fails, throw a std::out_of_range with a clear message (or
otherwise return an error) to prevent out-of-bounds access in column readers,
and add #include <stdexcept> to the file if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::string SchemaReader::generate_json_string(uint64_t message_index) {
auto SchemaReader::generate_json_string(uint64_t message_index) -> std::string {

Same as other comment.

m_json_serializer.reset();
m_json_serializer.begin_document();
size_t column_id_index = 0;
Expand Down Expand Up @@ -100,14 +100,14 @@ void SchemaReader::generate_json_string() {
auto name = m_global_schema_tree->get_node(column->get_id()).get_key_name();
m_json_serializer.append_key(name);
m_json_serializer.append_value(
std::to_string(std::get<int64_t>(column->extract_value(m_cur_message)))
std::to_string(std::get<int64_t>(column->extract_value(message_index)))
);
break;
}
case JsonSerializer::Op::AddIntValue: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_value(
std::to_string(std::get<int64_t>(column->extract_value(m_cur_message)))
std::to_string(std::get<int64_t>(column->extract_value(message_index)))
);
break;
}
Expand All @@ -116,43 +116,43 @@ void SchemaReader::generate_json_string() {
auto name = m_global_schema_tree->get_node(column->get_id()).get_key_name();
m_json_serializer.append_key(name);
m_json_serializer.append_value(
std::to_string(std::get<double>(column->extract_value(m_cur_message)))
std::to_string(std::get<double>(column->extract_value(message_index)))
);
break;
}
case JsonSerializer::Op::AddFloatValue: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_value(
std::to_string(std::get<double>(column->extract_value(m_cur_message)))
std::to_string(std::get<double>(column->extract_value(message_index)))
);
break;
}
case JsonSerializer::Op::AddFormattedFloatField: {
column = m_reordered_columns[column_id_index++];
auto name = m_global_schema_tree->get_node(column->get_id()).get_key_name();
m_json_serializer.append_key(name);
m_json_serializer.append_value_from_column(column, m_cur_message);
m_json_serializer.append_value_from_column(column, message_index);
break;
}
case JsonSerializer::Op::AddFormattedFloatValue: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_value_from_column(column, m_cur_message);
m_json_serializer.append_value_from_column(column, message_index);
break;
}
case JsonSerializer::Op::AddBoolField: {
column = m_reordered_columns[column_id_index++];
auto name = m_global_schema_tree->get_node(column->get_id()).get_key_name();
m_json_serializer.append_key(name);
m_json_serializer.append_value(
std::get<uint8_t>(column->extract_value(m_cur_message)) != 0 ? "true"
std::get<uint8_t>(column->extract_value(message_index)) != 0 ? "true"
: "false"
);
break;
}
case JsonSerializer::Op::AddBoolValue: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_value(
std::get<uint8_t>(column->extract_value(m_cur_message)) != 0 ? "true"
std::get<uint8_t>(column->extract_value(message_index)) != 0 ? "true"
: "false"
);
break;
Expand All @@ -161,20 +161,20 @@ void SchemaReader::generate_json_string() {
column = m_reordered_columns[column_id_index++];
auto name = m_global_schema_tree->get_node(column->get_id()).get_key_name();
m_json_serializer.append_key(name);
m_json_serializer.append_value_from_column_with_quotes(column, m_cur_message);
m_json_serializer.append_value_from_column_with_quotes(column, message_index);
break;
}
case JsonSerializer::Op::AddStringValue: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_value_from_column_with_quotes(column, m_cur_message);
m_json_serializer.append_value_from_column_with_quotes(column, message_index);
break;
}
case JsonSerializer::Op::AddArrayField: {
column = m_reordered_columns[column_id_index++];
m_json_serializer.append_key(
m_global_schema_tree->get_node(column->get_id()).get_key_name()
);
m_json_serializer.append_value_from_column(column, m_cur_message);
m_json_serializer.append_value_from_column(column, message_index);
break;
}
case JsonSerializer::Op::AddNullField: {
Expand All @@ -190,6 +190,7 @@ void SchemaReader::generate_json_string() {
}

m_json_serializer.end_document();
return m_json_serializer.get_serialized_string();
}

bool SchemaReader::get_next_message(std::string& message) {
Expand All @@ -200,9 +201,7 @@ bool SchemaReader::get_next_message(std::string& message) {
if (false == m_serializer_initialized) {
initialize_serializer();
}
generate_json_string();

message = m_json_serializer.get_serialized_string();
message = generate_json_string(m_cur_message);

if (message.back() != '\n') {
message += '\n';
Expand All @@ -223,8 +222,7 @@ bool SchemaReader::get_next_message(std::string& message, FilterClass* filter) {
if (false == m_serializer_initialized) {
initialize_serializer();
}
generate_json_string();
message = m_json_serializer.get_serialized_string();
message = generate_json_string(m_cur_message);

if (message.back() != '\n') {
message += '\n';
Expand Down Expand Up @@ -256,8 +254,7 @@ bool SchemaReader::get_next_message_with_metadata(
if (false == m_serializer_initialized) {
initialize_serializer();
}
generate_json_string();
message = m_json_serializer.get_serialized_string();
message = generate_json_string(m_cur_message);

if (message.back() != '\n') {
message += '\n';
Expand Down
22 changes: 12 additions & 10 deletions components/core/src/clp_s/SchemaReader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,13 @@ class SchemaReader {
*/
uint64_t get_num_messages() const { return m_num_messages; }

/**
* Generates a json string from the extracted values
* @param message_index the index of the message to generate the json string for
* @return the generated json string
*/
std::string generate_json_string(uint64_t message_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Generates a json string from the extracted values
* @param message_index the index of the message to generate the json string for
* @return the generated json string
*/
std::string generate_json_string(uint64_t message_index);
* Generates a json string from the encoded columns.
* @param message_index The index of the message to generate the JSON string for.
* @return The generated JSON string.
*/
[[nodiscard]] auto generate_json_string(uint64_t message_index) -> std::string;

Updating the comment a bit since I'm pretty sure that extracted values line refers to a very old implementation + updating the prototype to comply with our current standard since we're touching this anyway.


/**
* Gets next message
* @param message
Expand Down Expand Up @@ -197,6 +204,11 @@ class SchemaReader {
*/
void initialize_filter_with_column_map(FilterClass* filter);

/**
* Initializes all internal data structured required to serialize records.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Initializes all internal data structured required to serialize records.
* Initializes all internal data structures required to serialize records.

Just explicitly adding my other comment as an edit suggestion. After this should be good to merge.

*/
void initialize_serializer();

/**
* Marks a column as timestamp
* @param column_reader
Expand Down Expand Up @@ -299,16 +311,6 @@ class SchemaReader {
std::vector<int32_t>& path_to_intersection
);

/**
* Generates a json string from the extracted values
*/
void generate_json_string();

/**
* Initializes all internal data structured required to serialize records.
*/
void initialize_serializer();

int32_t m_schema_id;
uint64_t m_num_messages;
uint64_t m_cur_message;
Expand Down
Loading