-
Notifications
You must be signed in to change notification settings - Fork 2
feat: Add CLP UDF docs #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-0.293-clp-connector
Are you sure you want to change the base?
Conversation
WalkthroughAdds a new CLP Functions section to the CLP connector documentation describing path-based JSON helper functions and wildcard column functions, their constraints, supported operators, behaviour on invalid paths, examples, and notes on optimizer rewrites to KQL. The section content appears duplicated within the same document. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-docs/src/main/sphinx/connector/clp.rst
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
PR: y-scope/presto#64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
presto-docs/src/main/sphinx/connector/clp.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: test (:presto-docs)
- GitHub Check: prestissimo-worker-images-build
.. function:: CLP_GET_BOOL(varchar) -> boolean | ||
|
||
Returns the double value of the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix return-type description
The doc for CLP_GET_BOOL
says it returns “the double value,” which is a copy-paste typo. Update the text to state it returns the boolean value.
🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 343 to 346, update
the return-type description for CLP_GET_BOOL which currently reads "Returns the
double value..." to instead say it returns the boolean value; change the phrase
to "Returns the boolean value of the given JSON path, where the column type is
``Boolean``. Returns a Presto ``BOOLEAN``." to fix the copy-paste typo.
SELECT CLP_GET_STRING(msg.author) AS author | ||
FROM clp.default.table_1 | ||
WHERE CLP_GET_INT('msg.timestamp') > 1620000000; | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct function name in example
CLP_GET_INT
is not documented (the integer accessor you describe is CLP_GET_BIGINT
). Please update the example to call the published function to avoid sending users on a wild goose chase.
🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 362 to 365, the
example uses a non-existent function name CLP_GET_INT; replace it with the
documented integer accessor CLP_GET_BIGINT so the example matches the published
API and will work for users; update the WHERE clause to call
CLP_GET_BIGINT('msg.timestamp') and keep the rest of the example unchanged.
SELECT CLP_GET_STRING(msg.author) AS author | ||
FROM clp.default.table_1 | ||
WHERE CLP_GET_INT('msg.timestamp') > 1620000000; | ||
SELECT CLP_GET_STRING_ARRAY(msg.tags) AS tags | ||
FROM clp.default.table_2 | ||
WHERE CLP_GET_BOOL('msg.is_active') = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix JSON path examples to match literal requirement
Both sample queries pass msg.author
, msg.timestamp
, and msg.tags
without quotes, yet the note just above states JSON paths must be constant string literals. Update the examples to use quoted paths (e.g., CLP_GET_STRING('msg.author')
) so readers aren’t misled into syntax errors.
🤖 Prompt for AI Agents
presto-docs/src/main/sphinx/connector/clp.rst around lines 362 to 368: the
example queries use unquoted JSON path identifiers (msg.author, msg.timestamp,
msg.tags) which contradict the note that JSON paths must be constant string
literals; update the examples to pass quoted string literals (e.g.,
CLP_GET_STRING('msg.author'), CLP_GET_INT('msg.timestamp'),
CLP_GET_STRING_ARRAY('msg.tags')) and similarly quote the boolean path
(CLP_GET_BOOL('msg.is_active')) so the examples match the documented literal
requirement.
Description
This PR adds docs for
CLP_GET_*
andCLP_WILDCARD_*
UDFs.Checklist
breaking change.
Validation performed
Summary by CodeRabbit