Fine-tune a lightweight Phi‑2 model with LoRA to perform chain‑of‑thought reasoning on 2D grid maps, solving simple maze and spatial planning problems step by step.

- Grid‑based visual reasoning using spatial CoT prompts
- LoRA fine‑tuning to keep compute and model size low (see the sketch below)
- Example maze puzzles with a web UI (`map_interface.html`)
- Training and evaluation scripts included (`plot_trainer_state_cli.py`, `flask_api.py`)
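The actual training hyperparameters live in `configs/`; purely as an illustration, a minimal LoRA setup for Phi-2 with `peft` could look like the sketch below. The rank, alpha, and target modules here are assumptions, not the values used in this repo.

```python
# Minimal LoRA fine-tuning setup for Phi-2 (illustrative; the real settings are in configs/)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed adapter settings; check configs/ for the values actually used
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # Phi-2 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained
```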
Simulate an agent navigating a 10x10 grid using discrete action steps.
The objective is to compare different input formats and reasoning strategies (example samples are sketched after the list):
- CoT with vector inputs
- NLP-based commands
- Direct vector-to-position reasoning (baseline)
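The field names and prompt wording below are hypothetical; the real training samples live in the JSONL files under `data/`. The sketch only illustrates how the three strategies differ in what the model is asked to produce.

```python
# Hypothetical shape of one training sample per strategy (field names and
# prompt wording are illustrative; see data/ for the real JSONL files)
cot_sample = {
    "prompt": "Start at (0, 0). Actions: (1, 0), (0, 1), (1, 0). Where does the agent end?",
    "completion": "Step 1: (0,0)+(1,0)=(1,0). Step 2: (1,0)+(0,1)=(1,1). "
                  "Step 3: (1,1)+(1,0)=(2,1). Final position: (2, 1)",
}

nlp_sample = {
    "prompt": "Start at (0, 0). Commands: right, up, right. Where does the agent end?",
    "completion": "right -> (1, 0); up -> (1, 1); right -> (2, 1). Final position: (2, 1)",
}

vec_sample = {  # baseline: no reasoning trace, final answer only
    "prompt": "Start at (0, 0). Actions: (1, 0), (0, 1), (1, 0).",
    "completion": "(2, 1)",
}
```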
Model Folder | Format | Output | Description |
---|---|---|---|
`phi2-CoT-finetune5` | `(dx, dy)` | CoT trace + final pos | Full reasoning with 5 starting points |
`phi2-NLP-finetune1` | `up/down/...` | CoT trace + final pos | Instruction-following version |
`phi2-vec-finetune` | `(dx, dy)` | Final position only | Baseline model, no step-by-step explanation |

Each model is under `outputs/`, and each `.bin` file is under 100 MB.
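A minimal sketch of loading one of these adapters for inference with `peft` is shown below; the repo's own inference scripts are under `source/`, and the example prompt wording is an assumption.

```python
# Load the base Phi-2 model and attach a fine-tuned LoRA adapter from outputs/
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

model = PeftModel.from_pretrained(base, "outputs/phi2-CoT-finetune5")  # adapter < 100 MB
model.eval()

prompt = "Start at (0, 0). Actions: (1, 0), (0, 1). Where does the agent end?"  # example wording
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```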
Model Folder | Format | Output | Description |
---|---|---|---|
`phi2-CoT-finetune11x11` | `(dx, dy)` | CoT trace + final pos | Trained on the 11x11 map-free world; perfect accuracy |
`phi2-CoT-finetune11x11_map` | `(dx, dy)` | CoT trace + final pos + SG map | Input includes a grid map with S; the model returns the final map with S and G |
`phi2-Label-finetune1` | `(dx, dy)` | CoT trace + label | Labels path validity on a map with walls (future extension) |
- `11x11`: basic spatial trace task, vector action → position (no map)
- `11x11 map`: adds map context to the input; the model must parse the visual structure (a rough illustration of the map encoding follows this list)
- `label`: data includes a correctness classification (`correct`, `loop`, etc.)
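The sketch below only illustrates the idea of a text grid marked with S (start) and G (goal); the actual cell symbols and serialization used in the training data under `data/` may differ.

```python
# Illustrative 11x11 text-grid encoding with S (start) and G (goal);
# the real map format used for training may differ
def render_map(size, start, goal=None):
    grid = [["." for _ in range(size)] for _ in range(size)]
    sx, sy = start
    grid[sy][sx] = "S"
    if goal is not None:
        gx, gy = goal
        grid[gy][gx] = "G"
    return "\n".join("".join(row) for row in grid)

print(render_map(11, start=(0, 0)))               # input map: start only
print(render_map(11, start=(0, 0), goal=(4, 7)))  # output map: start + goal
```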
```
GPT-CoT/
├── configs/            # LoRA training config files (YAML)
├── data/               # JSONL training files
├── outputs/            # Fine-tuned models (6 total)
│   ├── phi2-CoT-finetune5/
│   ├── phi2-CoT-finetune11x11/
│   ├── phi2-CoT-finetune11x11_map/
│   ├── phi2-Label-finetune1/
│   ├── phi2-NLP-finetune1/
│   └── phi2-vec-finetune/
├── source/             # Training and inference scripts
├── .gitignore
├── README.md
└── requirements.txt
```
```bash
git clone https://github.com/Seanaaa0/GPT-CoT.git
cd GPT-CoT
conda activate gpt-env   # or your preferred environment
pip install -r requirements.txt
```
- New fine-tuned model: `phi2-Label-finetune1`
- Task: given a series of vector actions `(dx, dy)`, reason step by step to compute the final position and classify the path as one of `correct`, `too short`, `too long`, `loop`, `out of bound`, or `wrong` (illustrative labeling rules are sketched after this list)
- Training data: `10x10_vec_labeled.jsonl`
- Inference script: `inference_phi2_vec.py`
- Accuracy: ~95%, supports full CoT + label correctness tracking
- Output example includes a `"label"` and a `"correct"` field for each prediction
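The exact criteria used to assign labels in `10x10_vec_labeled.jsonl` are not spelled out here, so the rules below are only an assumed, simplified version meant to convey what each category means.

```python
# Assumed, simplified labeling rules for a vector-action path on a 10x10 grid;
# the actual criteria behind 10x10_vec_labeled.jsonl may differ
def label_path(start, actions, goal, grid_size=10):
    x, y = start
    visited = {(x, y)}
    for dx, dy in actions:
        x, y = x + dx, y + dy
        if not (0 <= x < grid_size and 0 <= y < grid_size):
            return "out of bound"
        if (x, y) in visited:
            return "loop"       # the path revisits a cell
        visited.add((x, y))
    if (x, y) == goal:
        return "correct"
    # "too short" / "too long" would additionally compare len(actions) against
    # an optimal path length; omitted in this sketch
    return "wrong"

print(label_path((0, 0), [(1, 0), (0, 1)], goal=(1, 1)))  # -> "correct"
```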
- `map_interface.html`: displays a 10x10 grid and agent paths interactively
- `flask_api.py`: serves model predictions and links the frontend ↔ backend (a minimal sketch follows this list)
- Future integration with live inference and editing
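The route name, payload fields, and port in the sketch below are assumptions rather than the actual API exposed by `flask_api.py`; it only shows the general shape of a Flask bridge between the web UI and the model.

```python
# Minimal sketch of a Flask endpoint bridging the web UI and the model;
# route name and payload fields are assumptions, not the real flask_api.py API
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_model(prompt: str) -> str:
    # Placeholder: the real script would call the fine-tuned Phi-2 model here
    return f"echo: {prompt}"

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    prompt = payload["prompt"]      # e.g. start position + action list from the UI
    return jsonify({"output": run_model(prompt)})

if __name__ == "__main__":
    app.run(port=5000)
```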
We provide a Python tool to visualize inference traces from `test_label.jsonl`. The script will:
- Parse GPT output traces
- Generate per-sample visualizations
- Combine up to 25 images into a grid (a simplified sketch of this tiling step is shown below)
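The file-name pattern and layout below are assumptions; this is only a simplified version of the tiling step performed by `generate_trace_images.py`.

```python
# Tile up to 25 per-sample images into a 5x5 grid (simplified; file names assumed)
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

paths = sorted(glob.glob("trace_*.png"))[:25]     # at most 25 samples
fig, axes = plt.subplots(5, 5, figsize=(15, 15))
axes = axes.flatten()
for ax, path in zip(axes, paths):
    ax.imshow(mpimg.imread(path))
    ax.set_title(path, fontsize=6)
    ax.axis("off")
for ax in axes[len(paths):]:                      # hide any unused cells
    ax.axis("off")
fig.savefig("trace_grid.png", dpi=150, bbox_inches="tight")
```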
Use `plot_trainer_state_cli.py` to visualize training loss and gradients:

```bash
python plot_trainer_state_cli.py --file results/trainer_state/trainer_stateX.json --metrics 1
```

- Metric options:
  - `1`: Loss
  - `2`: Grad norm
  - `3`: Learning rate

Output PNG files are saved to `results/png/`.
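For orientation, `trainer_state.json` is the Hugging Face Trainer checkpoint log, with per-step entries under `log_history`. The sketch below shows how a loss curve can be read from it; the CLI script adds metric selection and the grad-norm / learning-rate plots on top of this.

```python
# Read a Hugging Face trainer_state.json and plot the training-loss curve
# (a minimal sketch of what plot_trainer_state_cli.py does for --metrics 1)
import json
import matplotlib.pyplot as plt

with open("results/trainer_state/trainer_stateX.json") as f:
    state = json.load(f)

# The Trainer stores per-step logs under "log_history"; not every entry has "loss"
entries = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in entries]
loss = [e["loss"] for e in entries]

plt.plot(steps, loss)
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("results/png/loss.png")
```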
```bash
cd source/data/test_output
python generate_trace_images.py
```
---
## TODO
- [✅] Train LoRA on vector trace task
- [✅] NLP command version
- [✅] Multi-entry point generalization
- [✅] Trace classification (valid/invalid)
- [ ] Decision Transformer for path generation
- [ ] Add goal-aware discriminator
---
## 📜 License
MIT