This is the code for LLMOpt: Query Optimization utilizing Large Language Models.
- Prepare PostgreSQL. You can download the corresponding version of PostgreSQL from https://www.postgresql.org/ftp/source/v16.1/ and build it:
```bash
cd postgresql-16.1/
./configure --prefix=/usr/local/pgsql --without-icu
make
make install
```
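After `make install`, you can initialize a cluster and start the server; the data directory and log file below are placeholders, not paths fixed by this repo:

```bash
# Initialize a data directory and start the server (paths are placeholders).
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
```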
- Install pg_hint_plan. We modified it as [pg_hint_plan_lucifer](https://github.com/lucifer12346/pg_hint_plan_lucifer); build and install it against the server above:
```bash
cd ./pg_hint_plan_lucifer/
make
make install
```
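To sanity-check the extension, you can load it in a session and attach a hint comment to a query; the table name `t1` is a placeholder:

```sql
-- Load pg_hint_plan for this session and force a sequential scan on t1.
LOAD 'pg_hint_plan';
/*+ SeqScan(t1) */ EXPLAIN SELECT * FROM t1 WHERE id = 1;
```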
- Python requirements. For LLM training, we use:
```
pytorch 2.4.1+cu118
torchaudio 2.4.1+cu118
torchvision 0.19.1+cu118
transformers 4.44.2
accelerate 0.30.1
deepspeed 0.14.4
datasets 2.21.0
tensorboard 2.14.0
math
tqdm
```

under Python 3.8.5.
For LLM inference, we use:

```
vllm==0.6.3.post1
```

under Python 3.9.0.
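A possible installation sequence is sketched below; the PyTorch wheel index is an assumption about how the CUDA 11.8 builds were obtained, and `math` needs no installation since it is part of the standard library:

```bash
# Training environment (Python 3.8.5).
pip install torch==2.4.1 torchaudio==2.4.1 torchvision==0.19.1 \
    --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.44.2 accelerate==0.30.1 deepspeed==0.14.4 \
    datasets==2.21.0 tensorboard==2.14.0 tqdm

# Inference environment (Python 3.9.0).
pip install vllm==0.6.3.post1
```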
The exact environment may vary across different hardware.
- Training. Modify `YOUR_ACCELERATE_CONFIG` and `PRETRAINED_MODEL` in the script to your accelerate configuration and model path, then run:
```bash
cd scripts
bash train_sel.bash
```
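For orientation, the placeholders typically appear in an `accelerate launch` invocation like the hypothetical excerpt below; the script name `train.py` and the flag `--model_name_or_path` are illustrative, not necessarily what `train_sel.bash` contains:

```bash
# Hypothetical excerpt of scripts/train_sel.bash.
YOUR_ACCELERATE_CONFIG=/path/to/accelerate_config.yaml  # produced by `accelerate config`
PRETRAINED_MODEL=/path/to/base/model                    # local path or model identifier
accelerate launch --config_file $YOUR_ACCELERATE_CONFIG train.py \
    --model_name_or_path $PRETRAINED_MODEL
```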
- Inference. Modify `ckpt` in the script to your trained checkpoint, then run:
```bash
bash infer_sel.bash
```
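Under the hood, inference with the pinned vLLM version follows the pattern below; the checkpoint path, prompt contents, and sampling settings are illustrative assumptions, not the script's exact values:

```python
# Minimal vLLM sketch of what infer_sel.bash drives.
from vllm import LLM, SamplingParams

ckpt = "/path/to/finetuned/checkpoint"        # the `ckpt` value set in the script
llm = LLM(model=ckpt)
params = SamplingParams(temperature=0.0, max_tokens=256)  # greedy decoding

outputs = llm.generate(["<serialized query prompt>"], params)
print(outputs[0].outputs[0].text)             # the predicted hint text
```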
- We use the Tree-CNN from BAO to select the optimal hint from `pred_hints`; a sketch of this step follows.
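A minimal sketch of that selection step, assuming a hypothetical `plan_for` helper that produces the hinted plan and a `tree_cnn` value model with a `predict` method (neither name is this repo's API):

```python
def select_hint(query, pred_hints, tree_cnn, plan_for):
    """Return the hint whose hinted plan the Tree-CNN predicts to be fastest."""
    best_hint, best_cost = None, float("inf")
    for hint in pred_hints:
        plan = plan_for(query, hint)   # e.g. EXPLAIN output with the hint applied
        cost = tree_cnn.predict(plan)  # predicted latency for that plan
        if cost < best_cost:
            best_hint, best_cost = hint, cost
    return best_hint
```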
- Following BAO, we use the most common hints to prepare data.
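In BAO, hint sets are built from boolean planner toggles; one representative set is sketched below as an illustration, not this repo's exact list:

```sql
-- One BAO-style hint set: disable nested-loop joins and sequential scans,
-- leaving the remaining operators (hash/merge join, index scans) enabled.
SET enable_nestloop = off;
SET enable_seqscan = off;
```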
- Training. Modify `YOUR_ACCELERATE_CONFIG` and `PRETRAINED_MODEL` in the script to your accelerate configuration and model path, then run:
```bash
cd scripts
bash train_gen.bash
```
- Inference. Modify `ckpt` in the script to your trained checkpoint, then run:
```bash
bash infer_gen.bash
```
The hint in `pred_hints` is the chosen one.
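To actually run a query under the chosen hint, the hint text can be prepended as a pg_hint_plan comment; the `psycopg2` usage and connection string below are illustrative assumptions, not part of this repo:

```python
import psycopg2

def run_with_hint(sql, hint, dsn="dbname=your_db"):
    """Execute `sql` with `hint` applied via a leading pg_hint_plan comment."""
    hinted_sql = f"/*+ {hint} */ {sql}"
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("LOAD 'pg_hint_plan';")  # per-session load of the extension
        cur.execute(hinted_sql)
        return cur.fetchall()
```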