A production-ready suite of modular, asynchronous tools for analyzing, refactoring, commenting, and auditing SQL code using Azure OpenAI (GPT-4o).
Designed with security, performance, auditability, and HIPAA/HITECH compliance in mind.
sql_tools/
βββ app.py # CLI interface and controller
βββ requirements.txt
βββ README.md
βββ LICENSE
βββ core/ # Framework and shared logic
β βββ base_ai_client.py # Async Azure OpenAI client
β βββ config_loader.py # Configuration loader (mirrors original config.py)
β βββ logger.py # HIPAA-compliant logging utility
β βββ sql_task_base.py # Abstract base class for GenAI SQL tasks
βββ tasks/ # Modular GenAI SQL task classes
β βββ sql_analyzer.py
β βββ sql_commenter.py
β βββ sql_data_masker.py
β βββ sql_refactorer.py
β βββ sql_explainer.py
β βββ sql_security_auditor.py
β βββ sql_test_generator.py
β βββ sql_performance_benchmark.py # New: Added performance benchmarking and optimization
β βββ sql_data_masker.py # New: Data masking and anonymization
β βββ sql_visualizer.py # New: Visualization and insights
β βββ sql_error_corrector.py # New: Error correction and debugging
β βββ sql_style_enforcer.py # New: Style guide enforcement
β βββ natural_language_to_sql.py # New: Natural language to SQL conversion
βββ learn/ # Learning Mode folder
β βββ sql_learn_mode.py # Interactive tutorials for SQL learning
βββ prompts/ # Centralized prompt management
β βββ index.yaml # YAML file defining all prompts and metadata
β βββ summarization/ # Summarization-related prompt templates
β βββ classification/ # Classification-related prompt templates
βββ utils/
βββ file_utils.py # File I/O, backup, and directory handling
βββ prompt_manager.py # Centralized prompt loading and validation
βββ sanitizer.py # LLM output cleaner (markdown, GPT comments)
βββ dynamic_sql_detector.py # New: Utility for dynamic SQL detection
- π§Ή Data Masking and Anonymization - Automatically masks sensitive data such as emails, phone numbers, credit card numbers, and SSNs.
- βοΈ Modular task engine (comment, analyze, refactor, audit, explain, test)
- π Query Simulation and Validation
- π Centralized prompt management via
prompts/index.yaml
- β‘ Asynchronous OpenAI integration using
httpx
- π§Ό Sanitized output with
--sanitize
- π Directory recursion and batching with
--recursive
- π§ͺ Preview results before modifying files with
--dry-run
- π Backup and HIPAA-safe logging
- π Git integration: auto-stage files with
--git
- π€ Export to separate files using
--output
- π Support for multiple SQL dialects using nl_to_sql task (e.g., T-SQL, PostgreSQL, Oracle)
- π Natural Language to SQL Conversion
- π Enhanced security audits with SQL injection and role misuse detection
- π Performance benchmarking and optimization
- π§ Data masking and anonymization
- π¨ SQL Style Guide Enforcement
- π Dynamic SQL Detection
- π§βπ« SQL Education Mode (Interactive Tutorials)
python app.py --task=mask --path=example.sql --output=masked_example.sql
- Task:
mask
β Automatically identifies and masks sensitive data such as:- Email addresses.
- Phone numbers.
- Credit card numbers.
- Social Security Numbers (SSNs).
--output=...
: Writes the masked SQL to a new file.
python app.py --task=style_enforce --path=example.sql --sql_dialect=PostgreSQL --output=styled_example.sql
- Task:
style_enforce
β Enforces SQL coding standards dynamically using AI. --sql_dialect=...
: Specifies the SQL dialect (e.g., PostgreSQL, T-SQL).--output=...
: Writes the styled SQL to a new file.
python app.py --task=comment --path=example.sql
python app.py --task=comment --path=example.sql --sanitize --output=cleaned_example.sql
python app.py --task=refactor --path=query.sql --dry-run
python app.py --task=analyze --path=./sql_scripts --recursive --backup
python app.py --task=audit --path=query.sql --git
python app.py --task=test --path=example.sql --dry-run
python app.py --task=nl_to_sql --path="list all patients diagnosed with diabetes last month" --sql_dialect="PostgreSQL" --schema_path="schema/schema.json" --dry-run
python app.py --task=nl_to_sql --path=queries/nl_query.txt --sql_dialect="T-SQL" --schema_path="schema/HealthClaimsDW.json" --output=output/generated_query.sql
python app.py --task=benchmark --path=example.sql --dry-run
python app.py --task=visualize --path=example.sql
Detect dynamic SQL patterns and analyze risks/optimizations:
python app.py --task=dynamic_sql --path="queries/sample_query.sql" --detect_only --dry-run
python app.py --task=dynamic_sql --path="queries/" --recursive
The sql_learn_mode.py script provides an interactive platform for learning SQL concepts through quizzes, practice, and conversational guidance. It leverages an AI client (BaseAIClient) for generating dynamic SQL content, such as quiz questions and feedback on queries.
python learn/sql_learn_mode.py
Optionally, you can Set as startup file from within VS Professional and click Start/F5 to run the console app. This will start the AI Agent.
Edit core/config_loader.py
to match your Azure OpenAI deployment:
AOPAI_KEY = "your-api-key"
API_BASE = "https://your-resource.openai.azure.com/"
AOPAI_API_VERSION = "2025-01-01-preview"
AOPAI_DEPLOY_MODEL = "gpt-4o-dev"
pip install -r requirements.txt
All prompts are defined in a single index.yaml
file, which maps specific tasks to their associated prompt templates. This design enables:
- Consistent prompt formatting
- Easier version control and auditing
- Reusability across multiple modules
Each task class dynamically loads its associated prompt using metadata from this file.
commenter.add_comments:
inline: |
"You are a T-SQL expert. Given the SQL code below, please:
1. Prepend a comment header block with:
-- =============================================
-- Author: {user}
-- Create date: {timestamp}
-- Description: <Provide a detailed overview of this query>
-- =============================================
2. Add or improve inline comments throughout the query.
3. Only return the updated SQL code with no markdown formatting.
SQL Code:
{sql_query}"
used_by: tasks.sql_commenter.SQLCommenter
inputs:
- sql_query
- user
- timestamp
version: 1.0
description: Add comments and metadata headers to SQL queries.
- Logs are stored per task under the
logs/
directory - Safe use of T-SQL comments (
--
,/* ... */
) - Output sanitized for code-only results when using
--sanitize
- Aligned with HIPAA/HITECH compliance standards
This project is licensed under the MIT License.
- Fork and open a PR
- Follow modular design and solid principles
- Ensure proper logging, error handling, and secure configuration
- Vision and engineering by Hans Esquivel
- Powered by Python & Azure OpenAI
- Thanks to the SQL and ML community for insights and best practices
Please open an issue or start a discussion if you want to get in touch.