Local Realtime Speech Agents

A fully local speech-to-speech AI pipeline that combines real-time speech recognition, LLM reasoning, tool augmentation, and low-latency text-to-speech. It is built to run entirely on-device for faster, private, and extensible interactions.

Paper | Demo

Pipeline

User Speech → STT → LLM (Tool-Augmented) → TTS → Audio Response

STT: Converts audio into text (RealtimeSTT)
LLM + Tools: A local LLM (in models/) processes the transcribed text and can invoke external tools (from tools/)
TTS: Streams the LLM response as audio (RealtimeTTS)
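
The stages above can be sketched as a chain of plain Python callables. This is a minimal illustration, not the repository's implementation: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins for RealtimeSTT, the local LLM, and RealtimeTTS, and `sentence_chunks` is an assumed helper showing one way to hand text to the TTS stage sentence by sentence instead of waiting for the full response.

```python
from typing import Callable, Iterable, Iterator, List


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so TTS can start early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush whenever the buffer currently ends a sentence.
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], Iterable[str]],
    tts: Callable[[str], None],
) -> None:
    """One conversational turn: audio -> text -> streamed reply -> speech."""
    text = stt(audio)
    for sentence in sentence_chunks(llm(text)):
        tts(sentence)


# Hypothetical stand-ins for the real stages (assumptions, not repo code).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> Iterator[str]:
    yield from ["You ", "said: ", prompt, ". ", "Anything ", "else?"]


spoken: List[str] = []


def fake_tts(sentence: str) -> None:
    spoken.append(sentence)  # a real TTS stage would play audio here


run_turn(b"hello", fake_stt, fake_llm, fake_tts)
print(spoken)
```

Passing the stages in as callables keeps each one swappable, so a real recognizer or synthesizer could replace a stub without changing `run_turn`.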

Setup

# Install uv
pip install uv

git clone https://github.com/ThePickleGawd/realtime-speech-agents.git
cd realtime-speech-agents
uv sync

Running the Agent

uv run models/V1.py

Models

V1.py, V2.py, V3.py: Variants of the core speech agent.
Each variant includes LLM orchestration logic, response synthesis, and tool invocation.
See the paper for more details.
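
Tool invocation of the kind mentioned above is commonly built as a registry that maps tool names to functions, with the model emitting a structured call that the agent dispatches. The sketch below assumes a JSON call format (`{"tool": ..., "args": ...}`) and a hypothetical `add` tool; neither is taken from the repository's tools/ directory or from the paper.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by function name.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as an invokable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Hypothetical example tool."""
    return a + b


def dispatch(llm_output: str) -> str:
    """Run a tool if the output is a JSON tool call; otherwise pass it through."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return llm_output  # plain text reply, nothing to invoke
    if not isinstance(call, dict):
        return llm_output
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return llm_output
    result = TOOLS[name](**call.get("args", {}))
    return str(result)


print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))
print(dispatch("Just a plain reply."))
```

In a full agent the tool result would typically be fed back to the LLM for a final spoken answer rather than returned directly.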

Once the agent is running, speak into the microphone and it will respond in real time.
