Skip to content

Conversation

wraymo
Copy link
Member

@wraymo wraymo commented Oct 3, 2025

Description

This PR adds docs for CLP_GET_* and CLP_WILDCARD_* UDFs.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • Documentation
    • Added a CLP Functions guide covering path-based JSON helpers and wildcard column functions, with return types, constraints, and NULL behaviour.
    • Documented usage patterns, supported operators, and filter-only restrictions, plus examples for strings, integers, floats, booleans, and arrays.
    • Clarified performance expectations: queries are optimised with no penalty.
    • Note: The new CLP Functions content is currently duplicated in a nearby section and will be consolidated in a follow-up.

Copy link

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adds a new CLP Functions section to the CLP connector documentation describing path-based JSON helper functions and wildcard column functions, their constraints, supported operators, behaviour on invalid paths, examples, and notes on optimizer rewrites to KQL. The section content appears duplicated within the same document.

Changes

Cohort / File(s) Summary
Documentation — CLP functions
presto-docs/src/main/sphinx/connector/clp.rst
Introduces documentation for path-based functions (CLP_GET_STRING/… for JSON paths) and wildcard column functions (CLP_WILDCARD_*_COLUMN), including usage constraints, supported operators, NULL behaviour, example queries, and note on optimizer rewrites to KQL. Mentions duplicated section content within the file.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description includes a description and a contributor checklist but does not follow the repository’s required template structure, as it is missing the Motivation and Context, Impact, Test Plan and Release Notes sections and uses incorrect heading levels. It does not provide reviewers with the rationale for the change, details on its impact or how it was tested, or guidance on release notes. Without these sections, the description is incomplete and hinders a thorough review. Please update the PR description to match the repository template by adding and populating the Motivation and Context, Impact, Test Plan, and Release Notes sections, use the correct “##” headings for each section, and include any relevant issue links, testing steps, and release note entries.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title follows the Conventional Commits format and clearly conveys that documentation for CLP UDFs is being added, which matches the core change in the pull request.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 971443a and e750458.

📒 Files selected for processing (1)
  • presto-docs/src/main/sphinx/connector/clp.rst (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
PR: y-scope/presto#64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-docs/src/main/sphinx/connector/clp.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +343 to +346
.. function:: CLP_GET_BOOL(varchar) -> boolean

Returns the double value of the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix return-type description

The doc for CLP_GET_BOOL says it returns “the double value,” which is a copy-paste typo. Update the text to state it returns the boolean value.

🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 343 to 346, update
the return-type description for CLP_GET_BOOL which currently reads "Returns the
double value..." to instead say it returns the boolean value; change the phrase
to "Returns the boolean value of the given JSON path, where the column type is
``Boolean``. Returns a Presto ``BOOLEAN``." to fix the copy-paste typo.

Comment on lines +362 to +365
SELECT CLP_GET_STRING(msg.author) AS author
FROM clp.default.table_1
WHERE CLP_GET_INT('msg.timestamp') > 1620000000;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Correct function name in example

CLP_GET_INT is not documented (the integer accessor you describe is CLP_GET_BIGINT). Please update the example to call the published function to avoid sending users on a wild goose chase.

🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 362 to 365, the
example uses a non-existent function name CLP_GET_INT; replace it with the
documented integer accessor CLP_GET_BIGINT so the example matches the published
API and will work for users; update the WHERE clause to call
CLP_GET_BIGINT('msg.timestamp') and keep the rest of the example unchanged.

Comment on lines +362 to +368
SELECT CLP_GET_STRING(msg.author) AS author
FROM clp.default.table_1
WHERE CLP_GET_INT('msg.timestamp') > 1620000000;
SELECT CLP_GET_STRING_ARRAY(msg.tags) AS tags
FROM clp.default.table_2
WHERE CLP_GET_BOOL('msg.is_active') = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix JSON path examples to match literal requirement

Both sample queries pass msg.author, msg.timestamp, and msg.tags without quotes, yet the note just above states JSON paths must be constant string literals. Update the examples to use quoted paths (e.g., CLP_GET_STRING('msg.author')) so readers aren’t misled into syntax errors.

🤖 Prompt for AI Agents
presto-docs/src/main/sphinx/connector/clp.rst around lines 362 to 368: the
example queries use unquoted JSON path identifiers (msg.author, msg.timestamp,
msg.tags) which contradict the note that JSON paths must be constant string
literals; update the examples to pass quoted string literals (e.g.,
CLP_GET_STRING('msg.author'), CLP_GET_INT('msg.timestamp'),
CLP_GET_STRING_ARRAY('msg.tags')) and similarly quote the boolean path
(CLP_GET_BOOL('msg.is_active')) so the examples match the documented literal
requirement.

@kirkrodrigues kirkrodrigues self-requested a review October 6, 2025 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant