This repository contains a minimal Streamlit application for querying an Oracle HeatWave instance using natural language. The app relies on a large language model (LLM) to translate user prompts into SQL queries.
Before starting the deployment, ensure the following networking setup is in place:
- The VCN (Virtual Cloud Network) must include at least two subnets:
- One public subnet
- One private subnet
- The private subnet must have the following ports open to HeatWave:
3306
(classic MySQL)33060
(MySQL X Protocol)- ℹ️ If your HeatWave system uses custom ports, ensure these are used consistently throughout the deployment process.
- The public subnet must have port
8501
open to the internet to allow public access to the app.
Ensure that your Virtual Machine (VM) meets the following conditions:
- Operating System: RHEL 8 or newer
- Network Placement: The VM must reside in the same VCN as the pre-configured HeatWave DB system.
- Internet Connectivity:
- If the VM is in a public subnet, it must be accessible via its public IP.
- If deployed in a private subnet, it must still have outbound internet access (e.g., via a NAT gateway).
To download this repository directly, use the following command:
wget https://github.com/Oscardenas1000/NL2SQL/archive/refs/heads/main.zip
Once the download is complete, unzip the file:
unzip main.zip
Navigate into the extracted project directory:
cd NL2SQL-main
This repository includes an executable setup script named setup.sh
.
To make it executable, run:
chmod +x setup.sh
Then execute the script:
./setup.sh
To make the script executable, run:
chmod +x oci_cli_setup.sh
Then execute it:
./oci_cli_setup.sh
Proceed through the OCI CLI setup by providing the following:
User OCID
Tenancy OCID
Region
Then provide a path and filename for your OCI CLI configuration file, along with the public and private API keys.
To finalize setup, follow the official Oracle documentation here: Signing Requests with API Key to ensure your API keys are properly configured.
To complete the deployment, supply your HeatWave_OCID
and credentials so they can be automatically embedded into the app's .py
file.
The app expects the following environment variables to be defined:
HW_HOST
– hostname of the HeatWave serviceHW_DB_USER
– database user nameHW_DB_PASS
– database passwordHW_DB_NAME
– database schema to useMODEL_ID
– HeatWave GenAI Model ID
Start the Streamlit server with:
streamlit run nl2sql_app.py
Enter a natural language query, and the app will display the generated SQL statement and the query results from HeatWave.
-
Edit the configuration variables at the top of
nl2sql_app.py
to match your HeatWave credentials. -
Launch the app:
streamlit run nl2sql_app.py
-
When the browser opens, type a question about your data, for example:
Which airlines operate flights from SFO to JFK?
The app will generate a SQL query, execute it, and display the results. If the query returns fewer than 25 rows, a short natural language summary is also shown.
The app works at the database/schema level. It queries into the selected schema and pulls metadata like table names, column names, column types, and a required column comment that explains why each column is important.
TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | COLUMN_COMMENT |
---|---|---|---|
airline | airline_id | smallint | Unique identifier for each airline. |
airline | iata | char(2) | Two-character IATA code assigned to the airline, used globally for identification. |
airline | airlinename | varchar(30) | The full name of the airline. |
airline | base_airport | smallint | ID of the base airport for the airline, referring to the primary operational hub. |
airplane | airplane_id | int | Unique identifier for each airplane. This is the primary key and is auto-incremented. |
airplane | capacity | mediumint unsigned | Maximum number of passengers that the airplane can accommodate. |
airplane | type_id | int | Identifier for the airplane model/type. This is a foreign key referencing the airplane_type table. |
airplane | airline_id | int | Identifier of the airline that owns or operates the airplane. This is a foreign key referencing the airline table. |
airplane_type | type_id | int | Unique identifier for each airplane type or model. |
airplane_type | identifier | varchar(50) | Model identifier or code for the airplane type. |
airplane_type | description | text | Additional details or specifications about the airplane type. |
airport | airport_id | smallint | Unique identifier for each airport. |
If your table doesn’t have column comments yet, follow these steps.
Check the current structure and see which columns lack comments:
SHOW FULL COLUMNS FROM your_table_name;
This will display a Comment
column at the end.
In MySQL, you must modify the column definition to add a comment. You cannot just attach a comment separately.
Use this syntax:
ALTER TABLE your_table_name
MODIFY COLUMN column_name column_definition COMMENT 'Your comment here';
Given this table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
hire_date DATE
);
To add comments:
-
Add to
employee_id
:ALTER TABLE employees MODIFY COLUMN employee_id INT PRIMARY KEY COMMENT 'Unique identifier for each employee';
-
Add to
name
:ALTER TABLE employees MODIFY COLUMN name VARCHAR(100) COMMENT 'Full name of the employee';
-
Add to
hire_date
:ALTER TABLE employees MODIFY COLUMN hire_date DATE COMMENT 'The date the employee was hired';
Run:
SHOW FULL COLUMNS FROM employees;
You should now see your comments listed.
The flowchart below outlines the full workflow of how natural language queries (in any language) are processed and transformed into SQL queries to retrieve data from a database.
-
Input in Natural Language (NL) The user submits a question in natural language, possibly in any language, such as:
¿Cuáles son los productos más vendidos?
-
Retrieve Schema Information The system first executes:
SELECT * FROM information_schema.tables
to discover all available tables in the database. This ensures the system knows which data sources it can query.
-
Check Input Language The system determines if the query is already in English:
- If yes → Proceeds to generate SQL.
- If not → Translates it into English using a translation engine.
-
Generate SQL Code Based on the now-English input and known schema, the app generates a raw SQL query matching the user's intent.
-
Clean Up SQL Code The raw SQL is cleaned or adjusted to:
- Fix structural issues.
- Improve compatibility with the actual database schema.
-
Execute the SQL Code The cleaned SQL query is run against the database.
-
Check Execution Result:
- If the query fails (due to auto-generated syntax or schema issues):
The procedure tries to self-correct the query for multiple times. - If the query succeeds:
- If the result has fewer than 24 rows, it is translated back into natural language and returned to the user in sentence form.
- If the result has 24 rows or more, it is shown as a tabular output.
- If the query fails (due to auto-generated syntax or schema issues):
This project is licensed under the MIT License - see the LICENSE file for details.