This README outlines the steps to install PostgreSQL, generate and load TPC-H benchmark data into PostgreSQL, and monitor system performance during SQL query execution. This setup uses an Ubuntu operating system on a Parallels virtual machine running on an M3 MacBook configured with an 8-core CPU and 8 GB of RAM.
-
Update your system's package index:
sudo apt update sudo apt upgrade
-
Install PostgreSQL and its contrib package:
sudo apt install postgresql postgresql-contrib
-
Start and enable PostgreSQL:
sudo systemctl start postgresql sudo systemctl enable postgresql
-
Install sysstat:
sudo apt-get install sysstat
-
Enable sysstat to collect system performance data:
sudo nano /etc/default/sysstat # Change ENABLED="false" to ENABLED="true" # Save and exit sudo service sysstat restart
-
Install necessary packages:
sudo apt install git make gcc
-
Clone and build TPC-H DBGen:
git clone https://github.com/electrum/tpch-dbgen.git cd tpch-dbgen make
Generate data for testing (where "1" represents a 1 GB data size):
./dbgen -s 1
-
Start PostgreSQL and Change the PostgreSQL user password:
sudo -u postgres psql ALTER USER postgres WITH PASSWORD '0000';
-
Create the TPC-H database and switch to it:
CREATE DATABASE tpch; \c tpch
-
Create tables for TPC-H data:
CREATE TABLE nation ( n_nationkey INTEGER not null, n_name CHAR(25) not null, n_regionkey INTEGER not null, n_comment VARCHAR(152) ); CREATE TABLE region ( r_regionkey INTEGER not null, r_name CHAR(25) not null, r_comment VARCHAR(152) ); CREATE TABLE part ( p_partkey BIGINT not null, p_name VARCHAR(55) not null, p_mfgr CHAR(25) not null, p_brand CHAR(10) not null, p_type VARCHAR(25) not null, p_size INTEGER not null, p_container CHAR(10) not null, p_retailprice DOUBLE PRECISION not null, p_comment VARCHAR(23) not null ); CREATE TABLE supplier ( s_suppkey BIGINT not null, s_name CHAR(25) not null, s_address VARCHAR(40) not null, s_nationkey INTEGER not null, s_phone CHAR(15) not null, s_acctbal DOUBLE PRECISION not null, s_comment VARCHAR(101) not null ); CREATE TABLE partsupp ( ps_partkey BIGINT not null, ps_suppkey BIGINT not null, ps_availqty BIGINT not null, ps_supplycost DOUBLE PRECISION not null, ps_comment VARCHAR(199) not null ); CREATE TABLE customer ( c_custkey BIGINT not null, c_name VARCHAR(25) not null, c_address VARCHAR(40) not null, c_nationkey INTEGER not null, c_phone CHAR(15) not null, c_acctbal DOUBLE PRECISION not null, c_mktsegment CHAR(10) not null, c_comment VARCHAR(117) not null ); CREATE TABLE orders ( o_orderkey BIGINT not null, o_custkey BIGINT not null, o_orderstatus CHAR(1) not null, o_totalprice DOUBLE PRECISION not null, o_orderdate DATE not null, o_orderpriority CHAR(15) not null, o_clerk CHAR(15) not null, o_shippriority INTEGER not null, o_comment VARCHAR(79) not null ); CREATE TABLE lineitem ( l_orderkey BIGINT not null, l_partkey BIGINT not null, l_suppkey BIGINT not null, l_linenumber BIGINT not null, l_quantity DOUBLE PRECISION not null, l_extendedprice DOUBLE PRECISION not null, l_discount DOUBLE PRECISION not null, l_tax DOUBLE PRECISION not null, l_returnflag CHAR(1) not null, l_linestatus CHAR(1) not null, l_shipdate DATE not null, l_commitdate DATE not null, l_receiptdate DATE not null, l_shipinstruct CHAR(25) not null, l_shipmode CHAR(10) not null, l_comment VARCHAR(44) not null );
Move TPC-H generated .tbl
files to PostgreSQL accessible directory and prepare them:
sudo cp /home/parallels/tpch-dbgen/*.tbl /var/lib/postgresql/
sudo chown postgres:postgres /var/lib/postgresql/*.tbl
# Remove the vertical bar (|) at the end of each line in each `.tbl` file
sudo sed 's/|$//' /var/lib/postgresql/customer.tbl > /tmp/customer_fixed.tbl
sudo mv /tmp/customer_fixed.tbl /var/lib/postgresql/customer_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/customer_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/orders.tbl > /tmp/orders_fixed.tbl
sudo mv /tmp/orders_fixed.tbl /var/lib/postgresql/orders_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/orders_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/lineitem.tbl > /tmp/lineitem_fixed.tbl
sudo mv /tmp/lineitem_fixed.tbl /var/lib/postgresql/lineitem_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/lineitem_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/nation.tbl > /tmp/nation_fixed.tbl
sudo mv /tmp/nation_fixed.tbl /var/lib/postgresql/nation_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/nation_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/region.tbl > /tmp/region_fixed.tbl
sudo mv /tmp/region_fixed.tbl /var/lib/postgresql/region_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/region_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/part.tbl > /tmp/part_fixed.tbl
sudo mv /tmp/part_fixed.tbl /var/lib/postgresql/part_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/part_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/supplier.tbl > /tmp/supplier_fixed.tbl
sudo mv /tmp/supplier_fixed.tbl /var/lib/postgresql/supplier_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/supplier_fixed.tbl
sudo sed 's/|$//' /var/lib/postgresql/partsupp.tbl > /tmp/partsupp_fixed.tbl
sudo mv /tmp/partsupp_fixed.tbl /var/lib/postgresql/partsupp_fixed.tbl
sudo chown postgres:postgres /var/lib/postgresql/partsupp_fixed.tbl
Load data into PostgreSQL using the \COPY command:
\COPY customer FROM '/var/lib/postgresql/customer_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY orders FROM '/var/lib/postgresql/orders_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY lineitem FROM '/var/lib/postgresql/lineitem_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY nation FROM '/var/lib/postgresql/nation_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY region FROM '/var/lib/postgresql/region_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY part FROM '/var/lib/postgresql/part_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY supplier FROM '/var/lib/postgresql/supplier_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
\COPY partsupp FROM '/var/lib/postgresql/partsupp_fixed.tbl' WITH (FORMAT csv, DELIMITER '|', HEADER false, NULL 'null');
-
Created a script called
postgresql.sh
that is attached to execute SQL queries and monitor system resources:The
postgresql_queries
file includes the TPC-H benchmark SQL queries that have been modified to work with PostgreSQL.The
postgresql.sh
script handles executing queries from a specified directory and logs their execution time and system resource usage. The monitoring processes likevmstat
,iostat
, andsar
are started before executing the queries and stopped afterward.The script also calculates and logs throughput based on total execution time and number of queries.
-
Usage (open a new terminal in the same location as the
postgresql.sh
file)Make sure to mark
postgresql.sh
as executable and run it:chmod +x postgresql.sh ./postgresql.sh
This README provides a comprehensive guide for setting up a PostgreSQL database with TPC-H data and instructions for monitoring the performance of SQL queries executed against this database. Adjust paths and parameters as needed for your specific setup.