Real-time conversational AI with voice cloning and emotion detection. Analyses conversation context to deliver dramatically expressive responses using your cloned voice. Built with FastRTC and Chatterbox TTS for natural, emotionally-aware voice interactions.
- 🎭 Voice Cloning: Use any voice from a single reference audio file
- 🎯 Natural Emotion Detection: Analyses conversation context to detect emotions automatically
- 🎪 Dramatic Expression: Dynamic voice synthesis with exaggeration, temperature, and cfg_weight adjustments
- ⚡ Real-time Streaming: Low-latency audio generation and playback
- 💬 Dual Interface: WebSocket text chat and Gradio voice chat
- 🧠 Smart Context: Maintains conversation history with emotional awareness
- 🎵 12 Preset Emotions: Excited, happy, sad, angry, surprised, confused, tired, worried, calm, frustrated, enthusiastic, neutral
- Python 3.10+
- CUDA-compatible GPU (RTX 4090 recommended for real-time performance)
- Ollama with Gemma 3 4B model
- Clone the repository

  ```bash
  git clone https://github.com/dwain-barnes/chatterbox-fastrtc-realtime-emotion.git
  cd chatterbox-fastrtc-realtime-emotion
  ```

- Install PyTorch for your system

  ```bash
  # For CUDA 11.8 (check pytorch.org for your specific setup)
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

- Install requirements

  ```bash
  pip install -r requirements.txt
  ```

- Install Chatterbox TTS (avoiding numpy conflicts)

  ```bash
  # Important: install without dependencies to avoid numpy==1.26.0 conflicts
  pip install --no-deps chatterbox-tts
  ```

- Install and run Ollama with Gemma 3 4B

  ```bash
  # Install Ollama from https://ollama.ai
  ollama pull gemma3:4b
  ollama serve
  ```

- Add your voice reference (optional)

  ```bash
  # Place your reference voice file in the project directory
  cp /path/to/your/voice.wav reference_voice.wav
  ```
```bash
python realtime_emotion.py
```
- Text Chat: http://localhost:8000/
- Voice Chat: http://localhost:8000/gradio
- Record a 10-30 second clear audio sample of the target voice
- Save it as `reference_voice.wav` in the project directory
- Restart the application
- The cloned voice will be used for all emotional responses
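Under the hood, the reference file is passed to the TTS model as an audio prompt. A minimal sketch using Chatterbox TTS's published API (the app wires this up automatically; the text and output filename here are arbitrary):

```python
import torchaudio
from chatterbox.tts import ChatterboxTTS

# Load the TTS model (assumes a CUDA device, per the prerequisites above)
model = ChatterboxTTS.from_pretrained(device="cuda")

# Clone the voice from the single reference file in the project directory
wav = model.generate(
    "Hello in your cloned voice!",
    audio_prompt_path="reference_voice.wav",
)
torchaudio.save("sample.wav", wav, model.sr)
```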
Each emotion uses carefully tuned parameters for dramatic expression:
- Exaggeration: 0.05 (tired) to 0.95 (excited)
- CFG Weight: 0.2 (angry) to 0.95 (tired)
- Temperature: 0.3 (tired) to 1.3 (excited)
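For instance, the two extremes above correspond to entries like these (an illustrative slice; the full table covering all 12 emotions lives in the code):

```python
EMOTION_PARAMETERS = {
    "excited": {"exaggeration": 0.95, "cfg_weight": 0.2,  "temperature": 1.3},
    "tired":   {"exaggeration": 0.05, "cfg_weight": 0.95, "temperature": 0.3},
}
```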
- Recommended: RTX 4090 GPU for real-time generation
- Minimum: RTX 3070 or equivalent
- Model: Gemma 3 4B for optimal speed/quality balance
- RAM: 16GB+ recommended
- Frontend: FastAPI + WebSocket + HTML/CSS/JS
- Voice Interface: Gradio + FastRTC
- TTS: Chatterbox TTS with voice cloning
- STT: FastRTC STT model
- LLM: Ollama (Gemma 3 4B)
- Emotion Detection: Context-based pattern matching
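On the voice side, FastRTC's documented `Stream`/`ReplyOnPause` pattern is the standard way to wire up such a handler. A hedged sketch (the handler body is a placeholder, not the project's actual pipeline):

```python
import numpy as np
from fastrtc import ReplyOnPause, Stream

def voice_handler(audio: tuple[int, np.ndarray]):
    # The real app runs STT -> Gemma 3 -> emotion detection -> Chatterbox TTS;
    # this placeholder just echoes the caller's audio to show the contract.
    yield audio

stream = Stream(ReplyOnPause(voice_handler), modality="audio", mode="send-receive")
stream.ui.launch()  # serves the Gradio voice interface
```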
- Input Processing: Text or voice input is received
- LLM Response: Gemma 3 generates contextual response
- Emotion Detection: Analyses response text for emotional patterns
- Voice Synthesis: Applies dramatic parameters based on detected emotion
- Real-time Streaming: Audio chunks streamed as they're generated
- Playback: Client receives and plays audio with minimal latency
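The emotion detection and synthesis hand-off boils down to a dictionary lookup. A minimal, self-contained sketch (function names, the reduced pattern set, and the `neutral` values are illustrative, not the project's actual code):

```python
EMOTION_PARAMETERS = {
    "excited": {"exaggeration": 0.95, "cfg_weight": 0.2, "temperature": 1.3},
    "neutral": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
}

def detect_emotion(text: str) -> str:
    # Context-based pattern matching, reduced here to simple keyword spotting
    if any(word in text.lower() for word in ("amazing", "fantastic", "wow")):
        return "excited"
    return "neutral"

def synthesize(response_text: str) -> None:
    # Look up the dramatic parameters for the detected emotion; the real app
    # feeds these to Chatterbox TTS and streams the resulting audio chunks
    params = EMOTION_PARAMETERS[detect_emotion(response_text)]
    print(f"Synthesizing with {params}")

synthesize("Wow, that is fantastic news!")  # selects the 'excited' parameters
```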
Modify `EMOTION_PARAMETERS` in the code to adjust emotional expression:

```python
"excited": {
    "exaggeration": 0.95,  # Higher = more expressive
    "cfg_weight": 0.2,     # Lower = more variation
    "temperature": 1.3     # Higher = more dynamic
}
```
- Change the LLM model in the `init_chat_model` call (see the sketch below)
- Adjust chunk duration for latency vs. quality trade-offs
- Modify sample rates for different audio quality
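For example, assuming the project uses LangChain's standard `init_chat_model` helper (requires the `langchain-ollama` integration; the variable name is illustrative):

```python
from langchain.chat_models import init_chat_model

# Point at any other model you have pulled into Ollama by changing the string
llm = init_chat_model("gemma3:4b", model_provider="ollama")
```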
Key dependencies include:
- `fastapi` - Web framework
- `fastrtc` - Real-time communication
- `chatterbox-tts` - Voice synthesis and cloning
- `langchain` - LLM integration
- `gradio` - Voice interface
- `torch` - Deep learning framework
- `numpy` - Numerical computing
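A quick sanity check that the stack resolved after installation (note that the `chatterbox-tts` package imports as `chatterbox`):

```python
import fastapi, fastrtc, gradio, langchain, numpy, torch
from chatterbox.tts import ChatterboxTTS  # pip package: chatterbox-tts

print("CUDA available:", torch.cuda.is_available())  # True for real-time use
```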
- Fork the repository
- Create a feature branch
- Make your changes
- Test with different emotions and voices
- Submit a pull request
MIT License - see LICENSE file for details.
- Chatterbox TTS Streaming for TTS
- FastRTC for real-time communication
- Ollama for local LLM serving
Experience emotional conversations with your own cloned voice! 🎭🎤