Replies: 2 comments
---
✅ **Inputs:** Example:

```json
{
```

🔍 **Work:**
- Columns required: `user_id`, `amount`, `created_at`, `name`
- Filters: `created_at >= NOW() - INTERVAL 30 DAY`
- Aggregations: `SUM(amount)`
- Group By: `user_id`
- Joins (if required): join `orders.user_id` with `users.id`

**Output (Jinja2):**

```sql
SELECT users.name, SUM(orders.amount) AS total_amount
```
---
Hey @Mayank-cyber-cell, thanks for fleshing out the example! This gives us a good case to move forward with: simple but significant. Quick note on the Jinja2 part: what you've shown is the final SQL output, but we need the actual template with placeholders. Something like:

```jinja2
SELECT
    {{ select_columns | join(', ') }}
FROM {{ main_table }}
{% for join in joins %}
JOIN {{ join.table }} ON {{ join.condition }}
{% endfor %}
{% if filters %}
WHERE {{ filters | join(' AND ') }}
{% endif %}
{% if group_by %}
GROUP BY {{ group_by | join(', ') }}
{% endif %}
```

Now for the tricky parts.

**Templating approach:** let's start constrained. Instead of trying to handle all of SQL, we create specific templates for common patterns:
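To make the template-per-pattern idea concrete, here's a minimal sketch: a small registry of Jinja2 templates rendered with the features extracted from the example above. The registry shape, the `time_series_agg` pattern name, and the `features` dict layout are all assumptions for illustration, not settled design.

```python
from jinja2 import Template

# Hypothetical registry: one constrained template per supported query pattern.
# The pipeline selects a pattern, it never free-generates SQL.
TEMPLATES = {
    "time_series_agg": Template(
        """SELECT {{ select_columns | join(', ') }}
FROM {{ main_table }}
{% for join in joins %}JOIN {{ join.table }} ON {{ join.condition }}
{% endfor %}{% if filters %}WHERE {{ filters | join(' AND ') }}
{% endif %}{% if group_by %}GROUP BY {{ group_by | join(', ') }}{% endif %}"""
    ),
}

# Features extracted from the example in the previous comment.
features = {
    "select_columns": ["users.name", "SUM(orders.amount) AS total_amount"],
    "main_table": "orders",
    "joins": [{"table": "users", "condition": "orders.user_id = users.id"}],
    "filters": ["created_at >= NOW() - INTERVAL 30 DAY"],
    "group_by": ["user_id"],
}

sql = TEMPLATES["time_series_agg"].render(**features)
print(sql)
```

Rendering this reproduces the example's output query, with the JOIN, WHERE, and GROUP BY clauses filled in from the extracted features.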
Each template handles its use case well, and the pipeline picks the right one based on the extracted features.

**Filter generation** (especially that time filter) is where it gets risky. Even with "safe" queries, LLMs can hallucinate table names or produce queries that look right but subtly corrupt your data. SQL injection is old news, but prompt injection is the new kid on the block doing the same damage with a fresh coat of paint. So here's where we can show the power of deterministic pipelines. Like SaaS filtering GUIs, we could extract each filter into a structured spec:
```json
{
  "time_filter": {
    "column": "created_at",
    "operator": ">=",
    "reference": "now",
    "offset": {
      "value": 30,
      "unit": "days",
      "direction": "past"
    }
  }
}
```
…and then render it through per-dialect patterns:

```python
TIME_PATTERNS = {
    "mysql": "{{ column }} >= NOW() - INTERVAL {{ value }} {{ unit }}",
    "postgres": "{{ column }} >= CURRENT_DATE - INTERVAL '{{ value }} {{ unit }}'",
    # ...
}
```

This gives us dropdown-style safety with natural-language flexibility: no arbitrary code generation, no prompt injection risk, just a deterministic transformation. I prefer solution (1) because we can have a common model for the building blocks and just use templates to adapt to each SQL dialect. And I like the idea of having enumerated values for the possibilities (operator, reference…).

Want to collaborate on building this out as a proper pipeline? We could start with the time-series aggregation case and expand from there.
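As a minimal sketch of that deterministic step, here's what rendering the structured spec through a dialect pattern could look like. The `render_time_filter` helper and the allow-list sets are hypothetical names for illustration; note the patterns above hardcode `>=`, so the spec's `operator` is only validated here, not interpolated.

```python
from jinja2 import Template

# Enumerated values: anything outside these sets is rejected outright.
ALLOWED_OPERATORS = {">=", "<=", ">", "<", "="}
ALLOWED_UNITS = {"days", "months", "years"}

TIME_PATTERNS = {
    "mysql": "{{ column }} >= NOW() - INTERVAL {{ value }} {{ unit }}",
    "postgres": "{{ column }} >= CURRENT_DATE - INTERVAL '{{ value }} {{ unit }}'",
}

def render_time_filter(spec: dict, dialect: str) -> str:
    """Deterministically turn a validated time-filter spec into a SQL clause."""
    if spec["operator"] not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {spec['operator']}")
    unit = spec["offset"]["unit"]
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    # MySQL interval units are singular keywords (DAY, MONTH, YEAR).
    sql_unit = unit.rstrip("s").upper() if dialect == "mysql" else unit
    return Template(TIME_PATTERNS[dialect]).render(
        column=spec["column"], value=spec["offset"]["value"], unit=sql_unit
    )

spec = {
    "column": "created_at",
    "operator": ">=",
    "reference": "now",
    "offset": {"value": 30, "unit": "days", "direction": "past"},
}
print(render_time_filter(spec, "mysql"))
# created_at >= NOW() - INTERVAL 30 DAY
```

The key property: the LLM only ever fills slots in the spec, and every slot is checked against an enumerated set before any SQL string exists.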