This project demonstrates how to insert large datasets from JSON files into a PostgreSQL database and extract selected data back into a JSON file. It includes database table creation, data insertion, and JSON conversion processes, organized with clear, reusable utility modules.
The goal of this project is to automate the process of importing JSON data into a PostgreSQL database and exporting filtered rows back to a JSON file. This setup is useful for scenarios where large or complex JSON structures need to be stored, queried, and partially retrieved. The process is broken down into two major steps:
- Data Ingestion: Reads JSON files and inserts data into relevant PostgreSQL tables.
- Data Extraction: Performs custom SQL queries on these tables and saves the results into a new JSON file.
By following the steps below, you’ll be able to replicate or extend this functionality in your own PostgreSQL environment.
- Data Ingestion: Inserts data from JSON files into PostgreSQL tables.
- Data Extraction: Exports filtered data from the database to a JSON file.
- Structured Code: Organized into reusable utilities for database and file operations.
- Logging: Generates logs to help trace the data flow and debug issues.
- Flexible Querying: Easily configure which rows or columns get exported.
- Python 3.x
- PostgreSQL (latest version recommended)
- Docker Desktop (optional, if running PostgreSQL in a container)
```
.
├── main.py                     # Entry point of the application
├── .env                        # Environment variables (not in repo)
├── data/                       # JSON files for input/output
├── db/                         # SQL scripts for creating tables
├── src/
│   ├── __init__.py             # Module initialization
│   ├── app.py                  # Main logic for running insertions and extraction
│   └── utils/
│       ├── __init__.py         # Utility initialization
│       ├── database_functions.py
│       └── file_functions.py
└── requirements.txt            # Python dependencies
```
- `main.py`
  - Runs the primary flow: reads JSON files, inserts data, and optionally extracts data to a new JSON file.
- `.env`
  - Holds PostgreSQL credentials (`DB_NAME`, `DB_USER`, `DB_PASSWORD`) and other environment variables.
- `data/`
  - Contains JSON files to be ingested, and where the output JSON file (with filtered data) is generated.
- `db/`
  - Contains SQL scripts for creating the `meter_data`, `meter_readings`, and `mandate_data` tables (or any additional tables).
- `src/`
  - `app.py`: Central logic for coordinating reading JSON, inserting records, and exporting results.
  - `utils/`: Utility modules for database connections/operations and file handling.
- Clone the Repository:

  ```bash
  git clone https://github.com/JorgeAJT/sql-fundamentals-final-project.git
  cd sql-fundamentals-final-project
  ```
- Install dependencies (if you have a `requirements.txt`):

  ```bash
  pip install -r requirements.txt
  ```

  Or install manually:

  ```bash
  pip install psycopg2-binary python-dotenv
  ```
- Create a `.env` File

  In the project root, create a file named `.env` with your PostgreSQL credentials:

  ```
  DB_NAME=your_db_name
  DB_USER=your_db_user
  DB_PASSWORD=your_db_password
  ```

  These values are used by the utility functions to connect to your PostgreSQL database.
- Set Up the Database
  - Ensure PostgreSQL is running (either locally or via Docker).
  - Run the SQL scripts located in the `db/` folder to create the required tables (`meter_data`, `meter_readings`, `mandate_data`, etc.).
  - Verify that your user has permission to read/write these tables.
- Prepare JSON Data
  - Place your source JSON files into the `data/` folder.
  - The scripts will look for these files when inserting records.
- Execute the Main Script

  ```bash
  python main.py
  ```

  By default, it will:
  - Read the specified JSON files (e.g., `data/meter_data.json`)
  - Insert the data into the corresponding tables (`meter_data`, `meter_readings`, `mandate_data`)
  - Filter rows based on certain conditions (for example, `mandate_status = 'N'`)
  - Export the filtered rows to a JSON file (e.g., `data/selected_data.json`)

  A hedged sketch of this end-to-end flow, including loading the `.env` credentials, is shown right after these setup steps.
- Customize Queries/Logic
  - Edit `src/app.py` or the utility functions in `src/utils/database_functions.py` to change how data is inserted or which rows are extracted.
  - Modify `src/utils/file_functions.py` to customize file paths or handle different file formats.
- Review Logs (Optional)
  - Depending on your implementation, you may have logging statements in these utility functions.
  - Check the logs to confirm data was processed successfully or diagnose issues if something goes wrong.
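For orientation, here is a minimal, self-contained sketch of the flow described above. It is not the project's actual `main.py`/`app.py` code: the input file `data/mandate_data.json`, the `mandate_id` column, and the optional `DB_HOST`/`DB_PORT` variables are illustrative assumptions; only the `mandate_status = 'N'` filter and the output path `data/selected_data.json` come from the steps above.

```python
import json
import os

import psycopg2
from dotenv import load_dotenv


def main():
    # Load DB_NAME, DB_USER and DB_PASSWORD from the .env file in the project root.
    load_dotenv()
    conn = psycopg2.connect(
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
        host=os.getenv("DB_HOST", "localhost"),  # DB_HOST/DB_PORT are assumptions,
        port=os.getenv("DB_PORT", "5432"),       # not part of the documented .env keys
    )
    try:
        # "with conn" wraps everything in a single transaction and commits on success.
        with conn, conn.cursor() as cur:
            # Ingestion: read a JSON array and insert each record.
            # File name and columns are illustrative, not the project's real schema.
            with open("data/mandate_data.json", encoding="utf-8") as f:
                records = json.load(f)
            for record in records:
                cur.execute(
                    "INSERT INTO mandate_data (mandate_id, mandate_status) VALUES (%s, %s)",
                    (record["mandate_id"], record["mandate_status"]),
                )

            # Extraction: filter rows (e.g. mandate_status = 'N')...
            cur.execute(
                "SELECT mandate_id, mandate_status FROM mandate_data WHERE mandate_status = %s",
                ("N",),
            )
            columns = [desc[0] for desc in cur.description]
            rows = [dict(zip(columns, row)) for row in cur.fetchall()]

        # ...and export them to a new JSON file.
        with open("data/selected_data.json", "w", encoding="utf-8") as f:
            json.dump(rows, f, indent=4, default=str)
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```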
Located in `src/utils/database_functions.py`:
- Connection Handling: Manages PostgreSQL connections using credentials from `.env`.
- Bulk Insertions: Functions to insert complex JSON data into tables without manually specifying each column.
- Custom Queries: Methods for running queries to filter or transform data before exporting.
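As an illustration of the bulk-insertion idea, here is a hedged sketch that derives the column list from the JSON keys and inserts all rows in one round trip with `psycopg2.extras.execute_values`. The function name `bulk_insert_json` is hypothetical, not the module's actual API.

```python
import json

from psycopg2.extras import execute_values


def bulk_insert_json(conn, table, json_path):
    """Insert every record of a JSON array file into `table`,
    deriving the column list from the keys of the first record."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    if not records:
        return 0

    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    columns = list(records[0].keys())
    rows = [tuple(record[col] for col in columns) for record in records]
    query = "INSERT INTO {} ({}) VALUES %s".format(table, ", ".join(columns))

    with conn.cursor() as cur:
        # execute_values expands the single %s into one VALUES tuple per row.
        execute_values(cur, query, rows)
    conn.commit()
    return len(rows)
```

A call such as `bulk_insert_json(conn, "meter_data", "data/meter_data.json")` would then load an entire file into the `meter_data` table.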
Located in `src/utils/file_functions.py`:
- JSON Handling: Reads input JSON files and writes output JSON files.
- Data Validation: Optionally checks file structure before processing.
- Flexible Querying: Allows you to export the results of custom SQL queries to new JSON files.
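A possible shape for these helpers, assuming the hypothetical names `read_json_file` and `write_json_file` (the real module may differ):

```python
import json
from pathlib import Path


def read_json_file(path):
    """Load a JSON file and check that it contains a list of objects."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError(f"{path}: expected a JSON array of objects")
    return data


def write_json_file(path, rows):
    """Write query results (a list of dicts) to a JSON file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # default=str serialises values such as dates and Decimals returned by PostgreSQL.
        json.dump(rows, f, indent=4, default=str)
```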
- Debugging: Log messages (if configured) let you see each step: connection to the database, reading files, inserting rows, exporting JSON, etc.
- Troubleshooting: Errors are logged to help identify where a failure occurred (e.g., invalid file format, connection issues).
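If you want that kind of tracing, a standard `logging` setup is enough. The sketch below is an assumption about how it could be wired up (the `app.log` file name and the messages are examples), not the project's actual configuration:

```python
import logging

# One-time setup (e.g. at the top of main.py): log to a file and to the console.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
    handlers=[logging.FileHandler("app.log"), logging.StreamHandler()],
)

logger = logging.getLogger(__name__)
logger.info("Inserted %d rows into %s", 42, "meter_data")                        # example trace
logger.error("Failed to read %s: invalid file format", "data/meter_data.json")  # example error
```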
Feel free to fork this repository, open issues, or submit pull requests. All contributions are welcome!
Please follow these steps:
- Fork the project.
- Create a new branch with your changes: `git checkout -b my-branch`
- Make your changes and commit them: `git commit -m "Description of my changes"`
- Push your changes to your forked repository: `git push origin my-branch`
- Create a Pull Request on GitHub.
👨‍💻 Jorge Jiménez - JorgeAJT 🏋️‍♂️
Questions or suggestions? Feel free to open an issue or submit a pull request!