ThePickleGawd/realtime-speech-agents

Local Realtime Speech Agents

A fully local speech-to-speech AI pipeline that combines real-time speech recognition, LLM reasoning, tool augmentation, and low-latency text-to-speech. It runs entirely on-device for fast, private, and extensible interactions.

Paper | Demo

Pipeline

User Speech → STT → LLM (Tool-Augmented) → TTS → Audio Response
  • STT: Converts audio into text (RealtimeSTT)
  • LLM + Tools: Text input is processed by a local LLM (in models/) which can invoke external tools (from tools/)
  • TTS: Streams the LLM response as audio (RealtimeTTS)
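
As a rough illustration of how the three stages fit together, the sketch below wires them up as plain callables and groups streamed LLM tokens into sentences so speech can start before the full response is finished. It is not the code in models/: the names `sentences`, `run_turn`, `transcribe`, `generate`, and `speak` are hypothetical stand-ins for the RealtimeSTT, LLM, and RealtimeTTS components.

```python
from typing import Callable, Iterable, Iterator


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences so TTS can start early."""
    buf = ""
    for tok in tokens:
        buf += tok
        # Flush whenever the buffer ends with sentence punctuation.
        if buf.rstrip().endswith((".", "!", "?")):
            yield buf.strip()
            buf = ""
    # Flush any trailing text that had no closing punctuation.
    if buf.strip():
        yield buf.strip()


def run_turn(
    transcribe: Callable[[], str],             # stand-in for the STT stage
    generate: Callable[[str], Iterable[str]],  # stand-in for the LLM (yields tokens)
    speak: Callable[[str], None],              # stand-in for the TTS stage
) -> str:
    """Run one user turn: STT -> LLM -> TTS. Returns the text that was spoken."""
    user_text = transcribe()
    spoken = []
    for sentence in sentences(generate(user_text)):
        speak(sentence)  # hand each sentence to TTS as soon as it is complete
        spoken.append(sentence)
    return " ".join(spoken)
```

In a real run, `transcribe` would block on microphone input and `speak` would feed an audio stream; here they are ordinary functions so the control flow is easy to follow.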

Setup

# Install uv
pip install uv

git clone https://github.com/ThePickleGawd/realtime-speech-agents.git
cd realtime-speech-agents
uv sync

Running the Agent

uv run models/V1.py

Models

  • V1.py, V2.py, V3.py: Variants of the core speech agent.
  • Each variant includes LLM orchestration logic, response synthesis, and tool invocation.
  • See the paper for more details.
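
Tool invocation can be pictured as a small dispatch step between the LLM and response synthesis. The sketch below assumes the LLM signals a tool call by emitting a JSON object of the form `{"tool": ..., "args": {...}}`. That format, the `TOOLS` registry, the `tool` decorator, and the example `add` tool are all hypothetical and are not taken from this repository's tools/ directory.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> str:
    """Example tool: add two numbers and return the result as text."""
    return str(a + b)


def dispatch(llm_output: str) -> str:
    """Run a tool if the output is a JSON tool call; otherwise return it unchanged."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return llm_output  # ordinary text, pass straight through to TTS
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return llm_output  # valid JSON but not a known tool call
    return TOOLS[call["tool"]](**call.get("args", {}))
```

Plain-text responses pass through untouched, so the same function can sit in front of the TTS stage regardless of whether a tool was requested.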

Once the agent is running, speak into the mic and it will respond in real time.

About

An agentic speech-to-speech LLM for macOS
