A production-ready voice agent implementation using LiveKit and Python, featuring advanced conversational AI capabilities and optional telephony integration.
┌─────────────────────────────────────────────────────────┐
│            LiveKit Voice Agent Architecture             │
└─────────────────────────────────────────────────────────┘

┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│  Web Client   │    │ Phone System  │    │  Mobile App   │
│   (Next.js)   │    │   (Twilio)    │    │    (React)    │
└───────┬───────┘    └───────┬───────┘    └───────┬───────┘
        │                    │                    │
        └────────────────────┼────────────────────┘
                             │
                ┌────────────▼────────────┐
                │     LiveKit Server      │
                │    (WebRTC Gateway)     │
                └────────────┬────────────┘
                             │
                ┌────────────▼────────────┐
                │  Voice Pipeline Agent   │
                │                         │
                │  ┌───────────────────┐  │
                │  │  Turn Detection   │  │
                │  │     (Silero)      │  │
                │  └───────────────────┘  │
                │                         │
                │  ┌───────────────────┐  │
                │  │  Audio Pipeline   │  │
                │  │  Krisp (Noise     │  │
                │  │  Cancellation)    │  │
                │  └───────────────────┘  │
                └────────────┬────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Speech-to-    │    │ Language      │    │ Text-to-      │
│ Text (STT)    │    │ Model (LLM)   │    │ Speech (TTS)  │
│               │    │               │    │               │
│ Deepgram API  │    │ OpenAI API    │    │ ElevenLabs API│
└───────┬───────┘    └───────┬───────┘    └───────┬───────┘
        │                    │                    │
        │          ┌─────────▼─────────┐          │
        │          │ Function Calling  │          │
        │          │ • Weather Service │          │
        │          │ • Clock Service   │          │
        │          │ • Custom Tools    │          │
        │          └───────────────────┘          │
        │                                         │
        └────────────────────┬────────────────────┘
                             │
                   ┌─────────▼─────────┐
                   │ Logging &         │
                   │ Analytics         │
                   │ • Usage Metrics   │
                   │ • Conversation    │
                   │   Summaries       │
                   │ • Performance     │
                   │   Monitoring      │
                   └───────────────────┘

┌─────────────────────────────────────────────────────────┐
│                    Data Flow Process                    │
└─────────────────────────────────────────────────────────┘

1. Audio Input → 2. Noise Cancellation → 3. Speech Detection → 4. STT Processing
                                                                       ↓
8. Audio Output ← 7. TTS Generation ← 6. Response Generation ← 5. LLM Processing
                                                                       ↕
                                                              Function Execution
                                                            (Weather, Clock, etc.)
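
The numbered steps can be read as a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of that control flow only. Every function name here is a hypothetical stand-in (none come from this repository or the LiveKit SDK), and plain strings replace audio frames; the real stages stream audio asynchronously.

```python
from typing import Callable, List

# Hypothetical stand-ins for the real services. Each stage takes and returns
# a plain string so the order of operations is easy to follow.
def denoise(audio: str) -> str:
    return audio.replace("<noise>", "")

def transcribe(audio: str) -> str:
    return audio.strip().lower()

def generate_reply(text: str) -> str:
    # A real LLM call could trigger function execution here (weather, clock).
    return f"You said: {text}"

def synthesize(text: str) -> str:
    return f"<audio:{text}>"

PIPELINE: List[Callable[[str], str]] = [denoise, transcribe, generate_reply, synthesize]

def run_pipeline(audio: str) -> str:
    """Pass the input through each stage in order."""
    result = audio
    for stage in PIPELINE:
        result = stage(result)
    return result

print(run_pipeline("<noise> What time is it? "))
```

Keeping the stages in a list makes the ordering explicit and lets a stage be swapped or removed without touching the others.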

┌─────────────────────────────────────────────────────────┐
│                  Telephony Integration                  │
└─────────────────────────────────────────────────────────┘

Phone Call → Twilio SIP → LiveKit SIP Gateway → Voice Agent → Response Pipeline
    ↑                                                                 │
    └───────────────────────── Audio Response ────────────────────────┘

Regional SIP Configuration (confirm these addresses against Twilio's published SIP signaling IP list for your region before use):
• US East: 54.172.60.0, 54.244.51.0
• US West: 54.171.127.192, 35.156.191.128
• Europe: 54.171.127.200, 35.156.191.140
• Asia Pacific: 54.169.127.128, 52.65.191.64
- Users connect via web browsers, mobile apps, or phone calls
- LiveKit server handles WebRTC connections and SIP integration
- Agent detects whether the user connected from a web/mobile client or a phone call and selects models suited to that connection type
- Input: Raw audio from user's microphone or phone
- Noise Cancellation: Krisp AI removes background noise
- Turn Detection: Silero VAD detects when user starts/stops speaking
- Speech-to-Text: Deepgram converts speech to text in real-time
- Language Understanding: OpenAI processes user intent
- Function Calling: Agent can execute tools (weather, time, custom functions)
- Context Management: Maintains conversation history and state
- Text Generation: LLM creates appropriate responses
- Text-to-Speech: ElevenLabs converts text to natural speech
- Audio Delivery: Processed audio sent back to user
- Real-time performance metrics
- Conversation logging and summaries
- Usage analytics and optimization insights
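
Turn detection decides when the user has started and finished speaking. The agent uses Silero VAD for this; the sketch below swaps in a much simpler energy threshold with a silence timeout, purely to illustrate the idea. The function name, threshold, and frame counts are assumptions, not values from this project.

```python
from typing import List, Tuple

def detect_turns(energies: List[float], threshold: float = 0.1,
                 min_silence_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments.

    A turn starts at the first frame whose energy exceeds `threshold` and ends
    once `min_silence_frames` consecutive quiet frames have been seen.
    """
    turns = []
    start = None
    silence = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                turns.append((start, i - silence))  # index of last speech frame
                start, silence = None, 0
    if start is not None:
        # Input ended mid-turn: close the segment at the last speech frame.
        turns.append((start, len(energies) - 1 - silence))
    return turns

frames = [0.0, 0.5, 0.6, 0.0, 0.7, 0.0, 0.0, 0.0, 0.4]
print(detect_turns(frames))
```

The short pause at frame 3 does not end the turn because it is shorter than `min_silence_frames`; tolerating brief pauses is what keeps the agent from interrupting mid-sentence.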
- Intelligent Turn Detection: Natural conversation flow with automatic speech detection
- Function Calling: Extensible tool integration including:
  - Weather information retrieval
  - Real-time clock functionality
- Comprehensive Logging: Usage analytics and conversation summaries
- Telephony Integration: Inbound call support via Twilio SIP trunking
- Audio Enhancement: Krisp noise cancellation for crystal-clear communication
- Optimized Models: Automatic model switching for telephony vs. web-based interactions
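
Model switching can be pictured as choosing a configuration from the connection type. This is a sketch only: the `ModelConfig` class, the `select_models` helper, and the model names and sample rates are illustrative assumptions, not values taken from this repository.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    stt_model: str
    sample_rate_hz: int

# Illustrative values; check the repository's agent.py for the real ones.
WEB = ModelConfig(stt_model="nova-2-general", sample_rate_hz=48000)
PHONE = ModelConfig(stt_model="nova-2-phonecall", sample_rate_hz=8000)

def select_models(connection_type: str) -> ModelConfig:
    """Return the phone profile for SIP callers, the web profile otherwise."""
    return PHONE if connection_type.lower() == "sip" else WEB

print(select_models("sip").stt_model)
print(select_models("webrtc").stt_model)
```

Phone audio is narrowband, so a speech model tuned for call audio may transcribe it better than a general-purpose one.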
- Python 3.8 or higher
- LiveKit Cloud account or self-hosted LiveKit server
- API keys for required services (OpenAI, ElevenLabs, Deepgram)
- Optional: Twilio account for telephony features
- Clone and navigate to the repository:
    git clone https://github.com/danieladdisonorg/livekit-voice-agent.git
    cd livekit-voice-agent
- Set up the Python environment:
  Linux/macOS:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    python3 agent.py download-files
  Windows (use `python` if `python3` is not on your PATH):
    python -m venv venv
    venv\Scripts\activate
    pip install -r requirements.txt
    python agent.py download-files
- Environment Setup: Copy the example environment file and configure your API credentials:
    cp .env.example .env.local
- Required Environment Variables:
    LIVEKIT_URL=your_livekit_server_url
    LIVEKIT_API_KEY=your_api_key
    LIVEKIT_API_SECRET=your_api_secret
    OPENAI_API_KEY=your_openai_key
    ELEVEN_API_KEY=your_elevenlabs_key
    DEEPGRAM_API_KEY=your_deepgram_key
- Automated Configuration (Optional): If using LiveKit Cloud, you can auto-configure using the CLI:
    lk app env
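
Before starting the agent, it can help to confirm that every required variable is actually set. The following is a small standard-library sketch; the `missing_vars` helper is hypothetical, and it assumes the variables have already been loaded into the process environment (it does not read `.env.local` itself).

```python
import os
from typing import List, Mapping

# Names taken from the required environment variables listed in this README.
REQUIRED_VARS = [
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY", "ELEVEN_API_KEY", "DEEPGRAM_API_KEY",
]

def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Taking the environment as a parameter, rather than reading `os.environ` inside the function, keeps the check easy to test.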
Start the agent in development mode:
python3 agent.py dev
This agent requires a compatible frontend application. We recommend using the LiveKit Next.js Voice Agent Interface for a complete solution.
Enable inbound phone calls through Twilio SIP integration.
- LiveKit CLI installed and authenticated
- Twilio account with phone number
- SIP trunk configuration
- Install LiveKit CLI (macOS):
    brew update && brew install livekit-cli
- Authenticate with LiveKit Cloud:
    lk cloud auth
- Create Twilio Resources:
  - Sign up for a Twilio account
  - Purchase a phone number
  - Create a new SIP trunk in the Twilio Console
- Configure SIP Trunk:
  - Navigate to: Elastic SIP Trunking → SIP Trunks → Create
  - Add Origination URI:
      <YOUR_LIVEKIT_SIP_URI>;transport=tcp
  - Associate your phone number with priority 1, weight 1
- Deploy LiveKit SIP Configuration:
  Create Inbound Trunk:
    lk sip inbound create inbound-trunk.json
  Create Dispatch Rule:
    lk sip dispatch create dispatch-rule.json
Update inbound-trunk.json with the Twilio SIP signaling IP addresses for your region. The default configuration includes US IP addresses.
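
When editing the address list in inbound-trunk.json, checking that every entry parses as an IP address or CIDR block can catch typos before the trunk is created. The following is a small sketch using only the standard library; the `invalid_addresses` helper is hypothetical, and the sample entries are for illustration only.

```python
import ipaddress
from typing import Iterable, List

def invalid_addresses(addresses: Iterable[str]) -> List[str]:
    """Return the entries that are not valid IP addresses or CIDR blocks."""
    bad = []
    for addr in addresses:
        try:
            # strict=False accepts a plain address as well as a CIDR block.
            ipaddress.ip_network(addr, strict=False)
        except ValueError:
            bad.append(addr)
    return bad

# The last entry is missing an octet and should be reported.
print(invalid_addresses(["54.172.60.0", "54.244.51.0/30", "54.172.60"]))
```

This only checks syntax. Whether an address really belongs to Twilio's signaling range for your region still has to be confirmed against Twilio's own list.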
- Agent Core - Main conversation logic and state management
- Function Registry - Extensible tool calling system
- Audio Pipeline - Real-time audio processing with noise cancellation
- SIP Integration - Telephony gateway for inbound calls
- Logging System - Comprehensive usage and performance analytics
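
One common way to build an extensible function registry is a decorator that maps tool names to callables. The sketch below illustrates that pattern only; `TOOLS`, `tool`, and `call_tool` are hypothetical names, not this project's or LiveKit's API, and the weather tool is a placeholder rather than a real lookup.

```python
from datetime import datetime, timezone
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function under `name`."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("clock")
def current_time() -> str:
    return datetime.now(timezone.utc).strftime("%H:%M UTC")

@tool("weather")
def weather(city: str) -> str:
    # Placeholder: a real tool would call a weather API here.
    return f"Weather lookup for {city} is not implemented in this sketch."

def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call by name, as an LLM function call would."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**kwargs)

print(call_tool("weather", city="Berlin"))
```

Adding a custom tool then only requires decorating a new function; the dispatch code stays unchanged.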
For issues and questions:
- Check the LiveKit Documentation
- Review existing GitHub issues
- Contact support through your LiveKit Cloud dashboard