This project implements an analytical database for a UPS delivery system with features for optimized query performance, full-text search capabilities, and table partitioning.
schema.sql
- Creates tables without constraints for faster data loadingdata_generator.sql
- Generates test data usingplpython3u
with optimized batch processingindexes.sql
- Creates strategic indexes for query optimizationqueries.sql
- Contains sample analytical queries with execution planstable_partitioning.sql
- Implements date-based table partitioningrun.sql
- Main script that executes all components in sequencecleanup.sql
- Removes all database objects for clean restartsdetails.md
- Technical documentation of implementation decisionsTODO.md
- Original assignment and requirements
PostgreSQL
16 or higherplpython3u
extension enabled
To install the plpython3u
extension on Ubuntu:
# Determine your PostgreSQL version
psql --version
# Install the appropriate package
sudo apt update
sudo apt install postgresql-plpython3-16 # adjust version number as needed
# Restart PostgreSQL
sudo systemctl restart postgresql
- Clone this repository
- Create a database for the project:
psql -U postgres -c "CREATE DATABASE ups_analytics;"
- Run the main script:
psql -U postgres -d ups_analytics -f ./run.sql
This project uses several techniques to generate millions of records efficiently:
- Batch Processing: Records are inserted in configurable batch sizes (10,000-50,000)
- Deferred Constraints: Foreign keys and other constraints are added after data generation
- Progress Reporting: Real-time feedback on generation progress and insertion rates
- Realistic Data: Transportation comments are generated with varied vocabulary for testing full-text search
The project demonstrates several PostgreSQL optimization techniques:
- Strategic B-Tree Indexes: Created on columns commonly used in WHERE clauses and joins
- GIN Indexes: For efficient full-text search operations
- Query Analysis: EXPLAIN ANALYZE output to compare optimized vs. non-optimized queries
- Index Toggling: Queries are run with and without indexes to demonstrate performance impact
The item_transportation
table (10+ million rows) is partitioned by date:
- Range Partitioning: Monthly partitions based on the
created_at
column - Utility Functions: Helper functions provided to create new partitions and drop old ones:
create_transportation_partition(year, month)
- Creates a new partitiondrop_old_transportation_partition(months_old)
- Drops partitions older than specified months
- Partition Management Demo:
run.sql
includes a demonstration of these functions in action
The project provides a comprehensive implementation of PostgreSQL's text search:
- Functional GIN Index: On
to_tsvector('english', comment)
for optimized search - Trigram Index: Using the
pg_trgm
extension forLIKE
/ILIKE
queries - Performance Comparison: Queries demonstrate both approaches, allowing performance comparison
When run with default settings, the script generates:
- 10 retail centers
- 10 transport events
- 1,000,000 shipped items
- 10,000,000 item transportation records
For testing purposes, you can adjust these volumes in data_generator.sql
.