Skip to content

Conversation

davemarco
Copy link
Contributor

@davemarco davemarco commented Aug 6, 2025

Description

Create MongoDB document in results metadata collection to keep track of presto search state.

Using the same collection for regular clp and presto since the document format is pretty much identical, and can query a single collection rather than two for aggregate statistics. I added a new field just in case we need to distinguish between the two in the future. Also using presto search signal type instead of clp search signal types.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Document created and state changes based on status.

Summary by CodeRabbit

  • New Features
    • Enhanced tracking and storage of search query engine type and metadata for search results, including support for Presto queries.
  • Bug Fixes
    • Improved handling and typing of job identifiers and search signals, reducing potential issues with search job updates.
  • Chores
    • Updated configuration to set the default query engine to "clp".
    • Streamlined code by consolidating query engine definitions and updating type usage for consistency.
    • Simplified search results metadata updates by removing deprecated utility functions.
    • Adjusted search controls rendering logic based on query engine settings.

@davemarco davemarco requested a review from a team as a code owner August 6, 2025 18:15

This comment was marked as resolved.

@davemarco davemarco requested a review from hoophalab August 6, 2025 18:15
coderabbitai[bot]

This comment was marked as resolved.

Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me in general. There are some changes requested

coderabbitai[bot]

This comment was marked as resolved.

@davemarco davemarco requested a review from hoophalab August 8, 2025 18:14
coderabbitai[bot]

This comment was marked as resolved.

Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good except that we probably need to set lastSignal to "FAILED" in error callback.

For example:

  1. run a non-sql query abc in the webui
  2. presto-client-node calls error callback but not state callback
  3. as the result, the lastSignal ends up being WAITING_FOR_PREREQUISITES in mongodb

@hoophalab hoophalab self-requested a review August 11, 2025 16:23
Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

The error callback issue will be fixed in an upcoming PR

Validations:
In presto:

  1. run several correct sql queries
  2. metadata is correctly updated in mongodb

In clp-text:

  1. run several queries
  2. metadata related features still works correctly

@davemarco
Copy link
Contributor Author

yes - working on error state in other PR

@junhaoliao can you approve

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring to @hoophalab's review.

{
"ClpStorageEngine": "clp",
"ClpQueryEngine": "native",
"ClpQueryEngine": "clp",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also configure this in server_settings_json_updates in components/clp-package-utils/clp_package_utils/scripts/start_clp.py

@coderabbitai create an issue to track this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

I'll examine the current configuration in the start_clp.py file and then create an issue to track this update.

@junhaoliao I've created an issue to track updating the ClpQueryEngine setting in the server_settings_json_updates configuration within start_clp.py. The issue has been assigned to you and includes the context from this PR and comment thread.


🏁 Script executed:

#!/bin/bash
# Examine the server_settings_json_updates configuration in start_clp.py
rg -n -A10 -B5 "server_settings_json_updates" components/clp-package-utils/clp_package_utils/scripts/start_clp.py

Length of output: 2708


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track updating the ClpQueryEngine setting
gh issue create \
  --title "Update ClpQueryEngine setting in start_clp.py server_settings_json_updates" \
  --body "## Description

The ClpQueryEngine setting was updated from \"native\" to \"clp\" in PR #1168 for both client and server settings.json files, but the corresponding configuration in \`server_settings_json_updates\` within \`components/clp-package-utils/clp_package_utils/scripts/start_clp.py\` also needs to be updated for consistency.

## Task

Update the \`server_settings_json_updates\` configuration in \`components/clp-package-utils/clp_package_utils/scripts/start_clp.py\` to change \`ClpQueryEngine\` from \"native\" to \"clp\".

## Context

- **PR**: #1168
- **Comment**: https://github.com/y-scope/clp/pull/1168#discussion_r2296622518
- **Requested by**: @junhaoliao

This ensures consistency across all configuration files in the codebase." \
  --assignee junhaoliao

Length of output: 907


@junhaoliao I've successfully created issue #1259 to track updating the ClpQueryEngine setting in the server_settings_json_updates configuration.

From examining the current code, I can see that the server_settings_json_updates dictionary in start_clp.py doesn't currently include a ClpQueryEngine setting, so it will need to be added with the value "clp" to maintain consistency with the changes made in this PR.

The issue has been assigned to you and includes all the relevant context and links.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants