This repository accompanies our paper "Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs", published in SIGMOD 2025. The full paper is available at https://doi.org/10.1145/3709681.
Dialogue benchmarks are crucial for training and evaluating chatbots that engage in domain-specific conversations. Knowledge graphs (KGs) represent semantically rich and well-organized data spanning various domains, such as DBLP, DBpedia, and YAGO. Traditionally, dialogue benchmarks have been manually created from documents, neglecting the potential of KGs to automate this process. Some question-answering benchmarks are automatically generated from KGs using extensive preprocessing, but they do not support dialogue generation. This paper introduces Chatty-Gen, a novel multi-stage retrieval-augmented generation platform for automatically generating high-quality dialogue benchmarks tailored to a specific domain using a KG. Chatty-Gen decomposes the generation process into manageable stages and uses assertion rules for automatic validation between stages. Our approach enables control over intermediate results to prevent time-consuming restarts due to hallucinations. It also reduces reliance on costly and more powerful commercial LLMs. Chatty-Gen eliminates upfront processing of the entire KG by using efficient query-based retrieval to find representative subgraphs based on the dialogue context. Our experiments with several real and large KGs demonstrate that Chatty-Gen significantly outperforms state-of-the-art systems and ensures consistent model and system performance across multiple LLMs of diverse capabilities, such as GPT-4o, Gemini 1.5, Llama 3, and Mistral.
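To make the query-based retrieval idea concrete, below is a small, purely illustrative sketch of fetching a one-hop subgraph around a seed entity from a SPARQL endpoint with `SPARQLWrapper`; the endpoint URL and seed URI are placeholders, and Chatty-Gen's actual retrieval queries are more elaborate:

```python
# Illustrative sketch only: retrieve the one-hop subgraph around a seed
# entity via a SPARQL query, instead of preprocessing the entire KG upfront.
# The endpoint URL and seed URI are placeholders, not Chatty-Gen's defaults.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8894/sparql")  # placeholder endpoint
seed = "http://example.org/resource/SomeEntity"         # placeholder seed entity

sparql.setQuery(f"SELECT ?p ?o WHERE {{ <{seed}> ?p ?o }} LIMIT 100")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each (predicate, object) pair is one outgoing edge of the seed's subgraph.
for row in results["results"]["bindings"]:
    print(row["p"]["value"], "->", row["o"]["value"])
```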
The following system requirements are needed to run the codebase:
- Operating System: e.g., Ubuntu 20.04, macOS, or Windows 10
- Python: 3.8+
- Optional software: Docker and Docker Compose
To set up the codebase locally:
- Clone the repo and move into its directory: `cd repo-directory`
- Create a new Python virtual environment:

```bash
python3 -m venv .venv
```

- Make sure pip is installed and up to date:

```bash
sudo apt install python3-pip
python3 -m pip install --upgrade pip
```

- Activate the virtual environment and install the dependencies:

```bash
source .venv/bin/activate
pip3 install -r requirements.txt
```
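To confirm the environment is active, you can check that the interpreter and pip now resolve inside `.venv`:

```bash
python3 --version
which python3   # should point into .venv/bin
pip3 --version
```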
To set up with Docker instead:
- Make sure Docker is installed:

```bash
docker --version
```

If it is not found, follow the official Docker installation guide.

- Clone the repo, move into its directory, and bring the services up:

```bash
cd repo-directory
docker compose up --build
```
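Once the build finishes, you can confirm the services are up with standard Docker Compose commands, for example:

```bash
docker compose ps        # list the services and their status
docker compose logs -f   # follow the container logs
```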
To run the experiments, you first need to configure the run-config YAML file. An example `runconfig.yaml` looks like this:
```yaml
kghost: 206.12.95.86   # knowledge graph SPARQL endpoint host
kgport: 8894           # knowledge graph SPARQL endpoint port
redishost: localhost
outputdir: ./results/docker-test/dblp/singleshot/gpt-3.5-turbo  # output directory path for generated benchmark data
kgname: dblp           # the knowledge graph name
pipeline_type: original
dataset_size: 1
dialogue_size: 5
wandb_project: cov-kg-benchmark
approach:
  - single-shot
  - subgraph-summarized
comman_model:
  model_type: "openai"
  model_name: "gpt-3.5-turbo"
  model_endpoint: ""
  model_apikey: "<OPENAI_API_KEY>"
use_label: true
tracing: true
logging: true
```
- Update `benchmark/appconfig.py` with the location of your `runconfig.yaml` file. A sketch of how the fields of this file can be read follows below.
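For orientation, here is a minimal sketch of reading such a config with PyYAML; the actual loading logic lives in `benchmark/appconfig.py`, and the `/sparql` endpoint path below is an assumption, not something the config file specifies:

```python
# Minimal sketch, not the repository's actual loader: read runconfig.yaml
# with PyYAML (pip install pyyaml) and assemble the SPARQL endpoint URL.
import yaml

with open("runconfig.yaml") as f:
    cfg = yaml.safe_load(f)

# The "/sparql" path is an assumption about the endpoint layout.
endpoint = f"http://{cfg['kghost']}:{cfg['kgport']}/sparql"

print("KG:", cfg["kgname"], "->", endpoint)
print("Approaches:", cfg["approach"])
print("Model:", cfg["comman_model"]["model_name"])
```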
To run without Docker:
- Activate the virtual environment:

```bash
source .venv/bin/activate
```

- Install the dependencies:

```bash
pip install -r requirements.txt
```

- Make sure you have updated `runconfig.yaml` and its path in `benchmark/appconfig.py`. An optional endpoint sanity check is sketched after these steps.
- Run the pipeline; the generated data is stored at the `outputdir` path from `runconfig.yaml`:

```bash
python3 benchmark/main.py
```
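Before launching a run, it can save time to verify that the SPARQL endpoint from your `runconfig.yaml` is reachable. A minimal check, assuming the endpoint serves queries under a `/sparql` path:

```bash
# Optional sanity check: send a trivial ASK query to the configured endpoint.
# The /sparql path is an assumption about the endpoint layout.
curl -G "http://206.12.95.86:8894/sparql" \
     --data-urlencode "query=ASK { ?s ?p ?o }"
```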
To run with Docker:
- Make sure you have updated `runconfig.yaml` and its path in `benchmark/appconfig.py`.
- Run:

```bash
docker compose up --build
```
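After a run completes, the generated benchmark should appear under the `outputdir` configured in `runconfig.yaml`; for Docker runs this assumes the compose file mounts the results directory on the host. For the example config above:

```bash
ls ./results/docker-test/dblp/singleshot/gpt-3.5-turbo
```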
Please cite our paper as follows:

```bibtex
@article{chattygen,
  author  = {Reham Omar and Omij Mangukiya and Essam Mansour},
  title   = {Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs},
  journal = {Proceedings of the ACM on Management of Data (SIGMOD)},
  year    = {2025},
  doi     = {10.1145/3709681},
}
```
For any queries, feel free to send an e-mail to reham.omar@mail.concordia.ca or essam.mansour@concordia.ca. We look forward to receiving your feedback.