Chatan

Create diverse, synthetic datasets. Start from scratch or augment an existing dataset. Define your dataset schema as a set of generators, typically LLM calls with a prompt describing the kind of examples you want.
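Conceptually, each column in the schema is either a sampler or a prompt template whose `{name}` placeholders are filled from columns generated earlier in the same row. A minimal plain-Python sketch of that substitution idea (illustrative only, not chatan's internals):

```python
import random

def build_row(schema):
    """Build one row: callables produce a value, strings are
    templates filled from previously generated columns."""
    row = {}
    for name, spec in schema.items():
        if callable(spec):
            row[name] = spec(row)           # sampler / generator stand-in
        else:
            row[name] = spec.format(**row)  # template like "{topic}"
    return row

schema = {
    "topic": lambda row: random.choice(["Python", "Rust"]),
    "prompt": "write a programming question about {topic}",
}
row = build_row(schema)
```

Because columns are built in order, `prompt` can reference `topic` the same way the examples below reference earlier columns.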

Installation

Basic installation (includes OpenAI, Anthropic, and core functionality):

pip install chatan

With optional features:

# For local model support (transformers + PyTorch)
pip install chatan[local]

# For advanced evaluation features (semantic similarity, BLEU score)
pip install chatan[eval]

# For all optional features
pip install chatan[all]

Getting Started

import chatan

# Create a generator
gen = chatan.generator("openai", "YOUR_API_KEY")

# Define a dataset schema
ds = chatan.dataset({
    "topic": chatan.sample.choice(["Python", "JavaScript", "Rust"]),
    "prompt": gen("write a programming question about {topic}"),
    "response": gen("answer this question: {prompt}")
})

# Generate the data with a progress bar
df = ds.generate(n=10)

Generator Options

API-based Generators (included in base install)

# OpenAI
gen = chatan.generator("openai", "YOUR_OPENAI_API_KEY")

# Anthropic
gen = chatan.generator("anthropic", "YOUR_ANTHROPIC_API_KEY")

Local Model Support (requires pip install chatan[local])

# HuggingFace Transformers
gen = chatan.generator("transformers", model="microsoft/DialoGPT-medium")

Examples

Create Data Mixes

from chatan import dataset, generator, sample
import uuid

gen = generator("openai", "YOUR_API_KEY")

mix = [
    "san antonio, tx",
    "marfa, tx",
    "paris, fr"
]

ds = dataset({
    "id": sample.uuid(),
    "topic": sample.choice(mix),
    "prompt": gen("write an example question about the history of {topic}"),
    "response": gen("respond to: {prompt}"),
})
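The two samplers used here map onto standard-library primitives; an illustrative stand-in for what `sample.uuid()` and `sample.choice(mix)` each contribute per row (not chatan's implementation):

```python
import random
import uuid

mix = ["san antonio, tx", "marfa, tx", "paris, fr"]

# Stand-ins: each call yields one cell value for a generated row.
row_id = str(uuid.uuid4())   # like sample.uuid()
topic = random.choice(mix)   # like sample.choice(mix)
```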

Augment datasets

from chatan import generator, dataset, sample
from datasets import load_dataset

gen = generator("openai", "YOUR_API_KEY")
hf_data = load_dataset("some/dataset")

ds = dataset({
    "original_prompt": sample.from_dataset(hf_data, "prompt"),
    "variation": gen("rewrite this prompt: {original_prompt}"),
    "response": gen("respond to: {variation}")
})
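`sample.from_dataset` presumably draws values from a column of the existing dataset to seed each new row. A rough plain-Python stand-in over a list of records (the records here are invented for illustration):

```python
import random

# Stand-in corpus in place of a Hugging Face dataset: a list of
# records, each with a "prompt" field.
records = [
    {"prompt": "What is a closure?"},
    {"prompt": "Explain borrowing in Rust."},
]

# Rough equivalent of sample.from_dataset(hf_data, "prompt"):
# pick one existing value, which the generators then rewrite.
original_prompt = random.choice(records)["prompt"]
```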

Evaluation

Evaluate rows inline or compute aggregate metrics:

from chatan import dataset, eval, sample

ds = dataset({
    "col1": sample.choice(["a", "a", "b"]),
    "col2": "b",
    "score": eval.exact_match("col1", "col2")
})

df = ds.generate()
aggregate = ds.evaluate({
    "exact_match": ds.eval.exact_match("col1", "col2")
})
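Exact match is per-row string equality, and the aggregate is its mean over the dataset. A plain-Python sketch of that computation on the columns above (illustrative only):

```python
# Per-row exact match (1.0 on equality, else 0.0), then the mean
# as the aggregate score.
col1 = ["a", "a", "b"]
col2 = ["b", "b", "b"]

scores = [1.0 if x == y else 0.0 for x, y in zip(col1, col2)]
aggregate = sum(scores) / len(scores)
```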

Advanced Evaluation (requires pip install chatan[eval])

# Semantic similarity using sentence transformers
aggregate = ds.evaluate({
    "semantic_sim": ds.eval.semantic_similarity("col1", "col2")
})

# BLEU score evaluation
aggregate = ds.evaluate({
    "bleu": ds.eval.bleu_score("col1", "col2")
})
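As a rough intuition for what a BLEU-style score measures, here is a heavily simplified clipped unigram precision (real BLEU combines clipped n-gram precisions over several n with a brevity penalty; this is not chatan's or NLTK's implementation):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that appear in the reference,
    with per-token counts clipped to the reference counts."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    return overlap / max(sum(cand.values()), 1)

p = unigram_precision("the cat sat", "the cat sat down")
```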

Installation Options Summary

| Feature | Install Command | What's Included |
| --- | --- | --- |
| Basic | `pip install chatan` | OpenAI, Anthropic, core sampling, basic evaluation |
| Local Models | `pip install chatan[local]` | + HuggingFace Transformers, PyTorch |
| Advanced Eval | `pip install chatan[eval]` | + Semantic similarity, BLEU scores, NLTK |
| Everything | `pip install chatan[all]` | All features above |

Citation

If you use this code in your research, please cite:

@software{reetz2025chatan,
  author = {Reetz, Christian},
  title = {chatan: Create synthetic datasets with LLM generators.},
  url = {https://github.com/cdreetz/chatan},
  year = {2025}
}

Contributing

Community contributions are more than welcome: bug reports, bug fixes, feature requests, and feature additions. Please refer to the Issues tab.
