Skip to content

GenAI-SQL is a modular, extensible suite of AI-powered tools for automating SQL code improvement, documentation, and validation. Built for developers, analysts, and data engineers, it leverages Azure OpenAI (GPT-4o) to analyze, refactor, comment, explain, test, and audit SQL β€” all within a secure, asynchronous, and HIPAA-compliant framework.

License

Notifications You must be signed in to change notification settings

Club-Innovate/GenAI-SQL-CLI

Repository files navigation

🧠 GenAI SQL Tools Suite

A production-ready suite of modular, asynchronous tools for analyzing, refactoring, commenting, and auditing SQL code using Azure OpenAI (GPT-4o).

Designed with security, performance, auditability, and HIPAA/HITECH compliance in mind.


πŸ“ Project Structure

sql_tools/
β”œβ”€β”€ app.py                      # CLI interface and controller
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md
β”œβ”€β”€ LICENSE
β”œβ”€β”€ core/                       # Framework and shared logic
β”‚   β”œβ”€β”€ base_ai_client.py       # Async Azure OpenAI client
β”‚   β”œβ”€β”€ config_loader.py        # Configuration loader (mirrors original config.py)
β”‚   β”œβ”€β”€ logger.py               # HIPAA-compliant logging utility
β”‚   └── sql_task_base.py        # Abstract base class for GenAI SQL tasks
β”œβ”€β”€ tasks/                      # Modular GenAI SQL task classes
β”‚   β”œβ”€β”€ sql_analyzer.py
β”‚   β”œβ”€β”€ sql_commenter.py
β”‚   β”œβ”€β”€ sql_data_masker.py
β”‚   β”œβ”€β”€ sql_refactorer.py
β”‚   β”œβ”€β”€ sql_explainer.py
β”‚   β”œβ”€β”€ sql_security_auditor.py
β”‚   β”œβ”€β”€ sql_test_generator.py
β”‚   β”œβ”€β”€ sql_performance_benchmark.py # New: Added performance benchmarking and optimization
β”‚   β”œβ”€β”€ sql_data_masker.py           # New: Data masking and anonymization
β”‚   β”œβ”€β”€ sql_visualizer.py            # New: Visualization and insights
β”‚   β”œβ”€β”€ sql_error_corrector.py       # New: Error correction and debugging
β”‚   β”œβ”€β”€ sql_style_enforcer.py        # New: Style guide enforcement
β”‚   β”œβ”€β”€ natural_language_to_sql.py   # New: Natural language to SQL conversion
β”œβ”€β”€ learn/                      # Learning Mode folder
β”‚   └── sql_learn_mode.py       # Interactive tutorials for SQL learning
β”œβ”€β”€ prompts/                    # Centralized prompt management
β”‚   β”œβ”€β”€ index.yaml              # YAML file defining all prompts and metadata
β”‚   β”œβ”€β”€ summarization/          # Summarization-related prompt templates
β”‚   β”œβ”€β”€ classification/         # Classification-related prompt templates
└── utils/
    β”œβ”€β”€ file_utils.py           # File I/O, backup, and directory handling
    β”œβ”€β”€ prompt_manager.py       # Centralized prompt loading and validation
    β”œβ”€β”€ sanitizer.py            # LLM output cleaner (markdown, GPT comments)
    β”œβ”€β”€ dynamic_sql_detector.py # New: Utility for dynamic SQL detection

βœ… Features

  • 🧹 Data Masking and Anonymization - Automatically masks sensitive data such as emails, phone numbers, credit card numbers, and SSNs.
  • βš™οΈ Modular task engine (comment, analyze, refactor, audit, explain, test)
  • πŸ” Query Simulation and Validation
  • πŸ“‹ Centralized prompt management via prompts/index.yaml
  • ⚑ Asynchronous OpenAI integration using httpx
  • 🧼 Sanitized output with --sanitize
  • πŸ” Directory recursion and batching with --recursive
  • πŸ§ͺ Preview results before modifying files with --dry-run
  • πŸ” Backup and HIPAA-safe logging
  • πŸ”€ Git integration: auto-stage files with --git
  • πŸ“€ Export to separate files using --output
  • 🌐 Support for multiple SQL dialects using nl_to_sql task (e.g., T-SQL, PostgreSQL, Oracle)
  • πŸš€ Natural Language to SQL Conversion
  • πŸ” Enhanced security audits with SQL injection and role misuse detection
  • πŸ“Š Performance benchmarking and optimization
  • πŸ”§ Data masking and anonymization
  • 🎨 SQL Style Guide Enforcement
  • πŸ›  Dynamic SQL Detection
  • πŸ§‘β€πŸ« SQL Education Mode (Interactive Tutorials)

πŸ§ͺ CLI Usage

πŸ”’ Mask Sensitive Data in SQL Queries

python app.py --task=mask --path=example.sql --output=masked_example.sql

πŸ” What It Does:

  • Task: mask β€” Automatically identifies and masks sensitive data such as:
    • Email addresses.
    • Phone numbers.
    • Credit card numbers.
    • Social Security Numbers (SSNs).
  • --output=...: Writes the masked SQL to a new file.

πŸ”§ Enforce SQL Style Guide

python app.py --task=style_enforce --path=example.sql --sql_dialect=PostgreSQL --output=styled_example.sql

What It Does:

  • Task: style_enforce β€” Enforces SQL coding standards dynamically using AI.
  • --sql_dialect=...: Specifies the SQL dialect (e.g., PostgreSQL, T-SQL).
  • --output=...: Writes the styled SQL to a new file.

πŸ”§ Comment a SQL file

python app.py --task=comment --path=example.sql

🧼 Clean and save to new file

python app.py --task=comment --path=example.sql --sanitize --output=cleaned_example.sql

πŸ” Preview refactored query (no overwrite)

python app.py --task=refactor --path=query.sql --dry-run

πŸ—ƒοΈ Process all .sql files in folder (with backups)

python app.py --task=analyze --path=./sql_scripts --recursive --backup

πŸ” Run security audit and stage for Git

python app.py --task=audit --path=query.sql --git

πŸ§ͺ Generate SQL test cases

python app.py --task=test --path=example.sql --dry-run

πŸš€ Natural Language to SQL Conversion (inline query)

python app.py --task=nl_to_sql --path="list all patients diagnosed with diabetes last month" --sql_dialect="PostgreSQL" --schema_path="schema/schema.json" --dry-run

πŸš€ Natural Language to SQL Conversion (query file)

python app.py --task=nl_to_sql --path=queries/nl_query.txt --sql_dialect="T-SQL" --schema_path="schema/HealthClaimsDW.json" --output=output/generated_query.sql

πŸ“Š Benchmark SQL query performance

python app.py --task=benchmark --path=example.sql --dry-run

πŸ“ˆ Visualize query execution plan

python app.py --task=visualize --path=example.sql

πŸ” Dynamic SQL Detection

Detect dynamic SQL patterns and analyze risks/optimizations:

Analyze risks and optimizations - Detect patterns only:

python app.py --task=dynamic_sql --path="queries/sample_query.sql" --detect_only --dry-run

Process all SQL files in a directory recursively:

python app.py --task=dynamic_sql --path="queries/" --recursive

πŸ‘©β€πŸ« SQL Learning Mode (Interactive Tutorials)

The sql_learn_mode.py script provides an interactive platform for learning SQL concepts through quizzes, practice, and conversational guidance. It leverages an AI client (BaseAIClient) for generating dynamic SQL content, such as quiz questions and feedback on queries.

Run the Learning Mode directly:

python learn/sql_learn_mode.py

Optionally, you can Set as startup file from within VS Professional and click Start/F5 to run the console app. This will start the AI Agent.


πŸ›  Configuration

Edit core/config_loader.py to match your Azure OpenAI deployment:

AOPAI_KEY = "your-api-key"
API_BASE = "https://your-resource.openai.azure.com/"
AOPAI_API_VERSION = "2025-01-01-preview"
AOPAI_DEPLOY_MODEL = "gpt-4o-dev"

πŸ“¦ Install Requirements

pip install -r requirements.txt

πŸ“‹ Centralized Prompt Management

All prompts are defined in a single index.yaml file, which maps specific tasks to their associated prompt templates. This design enables:

  • Consistent prompt formatting
  • Easier version control and auditing
  • Reusability across multiple modules

Each task class dynamically loads its associated prompt using metadata from this file.

🧾 Example index.yaml Entry

commenter.add_comments:
  inline: |
    "You are a T-SQL expert. Given the SQL code below, please:
    1. Prepend a comment header block with:
       -- =============================================
       -- Author:      {user}
       -- Create date: {timestamp}
       -- Description: <Provide a detailed overview of this query>
       -- =============================================

    2. Add or improve inline comments throughout the query.
    3. Only return the updated SQL code with no markdown formatting.

    SQL Code:
    {sql_query}"
  used_by: tasks.sql_commenter.SQLCommenter
  inputs:
    - sql_query
    - user
    - timestamp
  version: 1.0
  description: Add comments and metadata headers to SQL queries.

πŸ” Security & Compliance

  • Logs are stored per task under the logs/ directory
  • Safe use of T-SQL comments (--, /* ... */)
  • Output sanitized for code-only results when using --sanitize
  • Aligned with HIPAA/HITECH compliance standards

πŸ“„ License

This project is licensed under the MIT License.


🀝 Contributing (Comming Soon)

  • Fork and open a PR
  • Follow modular design and solid principles
  • Ensure proper logging, error handling, and secure configuration

πŸ™Œ Authors & Acknowledgments

  • Vision and engineering by Hans Esquivel
  • Powered by Python & Azure OpenAI
  • Thanks to the SQL and ML community for insights and best practices

Contact

Please open an issue or start a discussion if you want to get in touch.


About

GenAI-SQL is a modular, extensible suite of AI-powered tools for automating SQL code improvement, documentation, and validation. Built for developers, analysts, and data engineers, it leverages Azure OpenAI (GPT-4o) to analyze, refactor, comment, explain, test, and audit SQL β€” all within a secure, asynchronous, and HIPAA-compliant framework.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published