A professional voice assistant system powered by Google's Gemini 2.5 Flash model, featuring both standalone voice interaction and real-time telephony integration with Asterisk ARI and Gemini Live API.
- Migrated from OpenAI to Gemini 2.5 Flash: More efficient and cost-effective AI responses
- NEW: Real-time Gemini Live API Integration: Direct voice-to-voice conversation with ultra-low latency
- NEW: Asterisk ARI with externalMedia: Bidirectional audio streaming for telephony integration
- NEW: Voice Activity Detection: Intelligent interruption handling for natural conversations
- NEW: slin16 Audio Format: Optimized for Asterisk with 16-bit signed linear PCM at 16kHz
- Professional Architecture: Complete restructure with modular design
- Package Structure: Proper Python package with clear separation of concerns
- Enhanced Configuration: Pydantic-based settings management
- Better Logging: Comprehensive logging and error handling
- Test Coverage: Unit tests and testing framework
- Documentation: Complete documentation and setup guides
# 1. Activate virtual environment
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure API key
cp .env.example .env
# Edit .env and add your Google API key
# 4. Run the voice assistant
python src/main.py
voice_assistant_ari_llm/
├── src/                              # Source code
│   ├── voice_assistant/              # Main package
│   │   ├── core/                     # Core assistant logic
│   │   │   ├── assistant.py          # Main VoiceAssistant class
│   │   │   └── conversation.py       # Conversation management
│   │   ├── ai/                       # AI integration
│   │   │   ├── gemini_client.py      # Gemini 2.5 Flash client
│   │   │   └── prompts.py            # System prompts
│   │   ├── audio/                    # Audio processing
│   │   │   ├── speech_recognition.py # Speech-to-text
│   │   │   ├── text_to_speech.py     # Text-to-speech
│   │   │   └── audio_utils.py        # Audio utilities
│   │   ├── telephony/                # Telephony integration
│   │   │   ├── ari_handler.py        # Asterisk ARI handler
│   │   │   └── call_manager.py       # Call management
│   │   └── utils/                    # Utilities
│   │       ├── logger.py             # Logging configuration
│   │       └── exceptions.py         # Custom exceptions
│   └── main.py                       # Main entry point
├── config/                           # Configuration
│   ├── settings.py                   # Pydantic settings
│   └── environment.py                # Environment management
├── tests/                            # Test suite
│   ├── test_ai/                      # AI component tests
│   ├── test_audio/                   # Audio component tests
│   └── test_core/                    # Core logic tests
├── docs/                             # Documentation
│   ├── README.md                     # Detailed documentation
│   ├── API.md                        # API reference
│   └── SETUP.md                      # Setup instructions
├── scripts/                          # Utility scripts
│   ├── run_assistant.py              # Simple run script
│   └── setup.py                      # Setup utilities
├── asterisk-config/                  # Asterisk configuration
├── sounds/                           # Audio files
├── requirements.txt                  # Dependencies
├── .env.example                      # Environment template
└── README.md                         # This file
- Latest Model: Uses Google's Gemini 2.5 Flash for intelligent responses
- Cost Efficient: More affordable than previous OpenAI integration
- Fast Responses: Optimized for real-time conversation
- Fallback System: Graceful handling of API failures
- Speech Recognition: Google Speech Recognition for accurate voice input
- Text-to-Speech: Google TTS with standard voice for clear output
- Audio Utils: Comprehensive audio processing utilities
- Real-time Processing: Low-latency audio handling
- Asterisk ARI: Full integration with Asterisk PBX
- Call Management: Handle incoming/outgoing calls
- Real-time Audio: Process phone conversations in real-time
- Multi-channel: Support multiple concurrent calls
- Modular Design: Clean separation of concerns
- Type Safety: Full type hints throughout
- Error Handling: Comprehensive exception management
- Logging: Structured logging with configurable levels
- Testing: Unit tests and testing framework
- Python 3.8+
- Google API key (free tier available)
- Microphone and speakers
- (Optional) Asterisk PBX for telephony
- Environment Setup:
  # Ensure virtual environment is active
  .venv\Scripts\activate
  # Verify Python version
  python --version # Should be 3.8+
- Install Dependencies:
  pip install -r requirements.txt
- Get Google API Key:
  - Visit Google AI Studio
  - Sign in and create a new API key
  - Copy the key for configuration
- Configure Environment:
  cp .env.example .env
  # Edit .env and set GOOGLE_API_KEY=your-key-here
- Test Installation:
  python src/main.py
  pytest -v # Run all test cases
The flagship feature - real-time conversational AI through phone calls:
# Quick start with real-time integration
./start_realtime.sh
# Or manually
python src/run_realtime_server.py
Real-time Features:
- Bidirectional Audio Streaming: Direct WebSocket audio with Asterisk externalMedia
- Voice Activity Detection: Intelligent speech start/stop detection
- Ultra-low Latency: Direct Gemini Live API integration
- Interruption Handling: Natural conversation flow with mid-response interruptions
- slin16 Format: Optimized 16-bit signed linear PCM at 16kHz
- Session Management: Complete conversation state tracking
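To make the voice activity detection idea concrete, here is a minimal sketch of an energy-based detector over slin16 frames. It is not the project's implementation: the function names are hypothetical, the default threshold of 300 simply mirrors the `VAD_ENERGY_THRESHOLD` value from the configuration template, and little-endian samples are an assumption (the byte order on the wire depends on the transport).

```python
import math
import struct


def rms_energy(frame: bytes) -> float:
    """Return the RMS amplitude of a mono 16-bit PCM frame (little-endian assumed)."""
    sample_count = len(frame) // 2  # two bytes per 16-bit sample
    if sample_count == 0:
        return 0.0
    samples = struct.unpack(f"<{sample_count}h", frame[: sample_count * 2])
    return math.sqrt(sum(s * s for s in samples) / sample_count)


def is_speech(frame: bytes, energy_threshold: float = 300.0) -> bool:
    """Classify a frame as speech when its RMS energy reaches the threshold."""
    return rms_energy(frame) >= energy_threshold


# 20 ms of audio at 16 kHz is 320 samples, i.e. 640 bytes of 16-bit PCM.
silence = bytes(640)
tone = struct.pack("<320h", *([1000, -1000] * 160))
print(is_speech(silence), is_speech(tone))  # prints: False True
```

A real detector would likely smooth this decision over several frames before reporting speech start or stop, so that a single noisy frame does not trigger an interruption.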
Test Extensions:
- 1000: Main Gemini Voice Assistant (full real-time integration)
- 1001: External Media Test (direct WebSocket audio)
- 1002: Basic Audio Test (echo and playback)
python src/main.py
Features:
- Voice input with timeout handling
- AI processing with Gemini 2.5 Flash
- Speech output with Google TTS
- Real-time status updates
- Session statistics
- Normal conversation: Speak naturally after the "Listening" status appears
- Exit: Say "quit", "exit", "goodbye", or "bye"
- Force quit: Press Ctrl+C
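The exit phrases above could be matched with something like the following sketch. The function name and the normalization rules (case folding, trimming trailing punctuation) are assumptions for illustration; the actual assistant may match phrases differently.

```python
EXIT_PHRASES = {"quit", "exit", "goodbye", "bye"}


def is_exit_command(transcript: str) -> bool:
    """Return True when the whole utterance is one of the exit phrases."""
    # Speech recognizers often add capitalization or punctuation, so normalize first.
    normalized = transcript.strip().lower().strip(".!?,")
    return normalized in EXIT_PHRASES


print(is_exit_command(" Goodbye! "))        # prints: True
print(is_exit_command("how do I exit vim"))  # prints: False
```

Matching the whole utterance rather than searching for a substring avoids ending the session when an exit word merely appears inside a longer question.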
For basic phone-based interactions:
# Start basic ARI handler
uvicorn src.voice_assistant.telephony.ari_handler:create_ari_app --host 0.0.0.0 --port 8000
# Configure Asterisk to send events to your handler
# See asterisk-config/ for configuration examples
For the real-time Gemini Live API integration:
# Run automated setup
python scripts/setup_realtime.py
# This will:
# - Check environment requirements
# - Validate configuration
# - Create required directories
# - Test connections
# - Generate startup scripts
Quick Setup:
- Copy .env.example to .env
- Set your GOOGLE_API_KEY
- Configure Asterisk (copy asterisk-config/*)
- Run ./start_realtime.sh
Detailed Setup Guide: See docs/REALTIME_SETUP.md
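For orientation, the real-time integration relies on Asterisk's ARI `externalMedia` channel, which is created with a POST to `/channels/externalMedia`. The sketch below only builds that request URL from values matching the configuration template; the helper name is hypothetical, and only the `app`, `external_host`, and `format` parameters are shown. How this project actually issues the request, and which transport options it sets, is not covered here.

```python
from urllib.parse import urlencode


def external_media_url(ari_base_url: str, app: str, host: str, port: int,
                       audio_format: str = "slin16") -> str:
    """Build the ARI URL that asks Asterisk to stream channel audio to host:port."""
    query = urlencode(
        {"app": app, "external_host": f"{host}:{port}", "format": audio_format},
        safe=":",  # keep host:port readable instead of percent-encoding the colon
    )
    return f"{ari_base_url.rstrip('/')}/channels/externalMedia?{query}"


# Values taken from the example configuration; the request itself would be an
# authenticated POST using ARI_USERNAME and ARI_PASSWORD.
print(external_media_url("http://localhost:8088/ari", "gemini-voice-assistant",
                         "localhost", 8090))
```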
# Required
GOOGLE_API_KEY=your-google-api-key-here
# AI Settings
GEMINI_MODEL=gemini-2.5-flash
GEMINI_LIVE_MODEL=gemini-2.0-flash-exp
GEMINI_VOICE=Puck
MAX_TOKENS=150
TEMPERATURE=0.7
# Real-time Audio Settings
AUDIO_FORMAT=slin16
AUDIO_SAMPLE_RATE=16000
AUDIO_CHUNK_SIZE=320
AUDIO_BUFFER_SIZE=1600
# Voice Activity Detection
VAD_ENERGY_THRESHOLD=300
VAD_SILENCE_THRESHOLD=0.5
VAD_SPEECH_THRESHOLD=0.1
# Assistant Settings
ASSISTANT_NAME=ARI
VOICE_LANGUAGE=en
LISTEN_TIMEOUT=20.0
PHRASE_TIME_LIMIT=15.0
# Audio Settings
VOICE_VOLUME=0.9
# Logging
LOG_LEVEL=INFO
# LOG_FILE=logs/assistant.log # Optional file logging
# Asterisk ARI Configuration
ARI_BASE_URL=http://localhost:8088/ari
ARI_USERNAME=asterisk
ARI_PASSWORD=1234
STASIS_APP=gemini-voice-assistant
# External Media Configuration
EXTERNAL_MEDIA_HOST=localhost
EXTERNAL_MEDIA_PORT=8090
# Real-time Processing
ENABLE_INTERRUPTION_HANDLING=true
MAX_CALL_DURATION=3600
AUTO_ANSWER_CALLS=true
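The project manages these values with Pydantic settings in config/settings.py. As a dependency-free illustration of the same idea (typed fields, defaults, and a required API key), here is a sketch using a standard-library dataclass instead of Pydantic. The class and function names are hypothetical, and only a handful of the variables above are included.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AssistantSettings:
    """A small subset of the settings, with the defaults from the template."""
    google_api_key: str
    gemini_model: str = "gemini-2.5-flash"
    max_tokens: int = 150
    temperature: float = 0.7
    audio_sample_rate: int = 16000
    log_level: str = "INFO"


def load_settings(env: Optional[Mapping[str, str]] = None) -> AssistantSettings:
    """Read settings from a mapping (os.environ by default) and convert types."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("Google API key is required: set GOOGLE_API_KEY")
    return AssistantSettings(
        google_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        max_tokens=int(env.get("MAX_TOKENS", "150")),
        temperature=float(env.get("TEMPERATURE", "0.7")),
        audio_sample_rate=int(env.get("AUDIO_SAMPLE_RATE", "16000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"GOOGLE_API_KEY": "example-key", "MAX_TOKENS": "200"})
print(settings.gemini_model, settings.max_tokens)  # prints: gemini-2.5-flash 200
```

Passing the environment in as a mapping keeps the loader easy to test without touching real environment variables.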
The system follows a clean, modular architecture:
- Core Layer: Main assistant logic and conversation management
- AI Layer: Gemini integration and response generation
- Audio Layer: Speech recognition and text-to-speech
- Telephony Layer: Asterisk ARI integration
- Utils Layer: Logging, exceptions, and utilities
- Config Layer: Settings and environment management
The modular design makes it easy to extend:
# Add new AI provider
from voice_assistant.ai.base_client import BaseAIClient

class NewAIClient(BaseAIClient):
    def generate_response(self, text: str) -> str:
        # Your implementation
        pass

# Add new audio processor
from voice_assistant.audio.base_processor import BaseAudioProcessor

class NewAudioProcessor(BaseAudioProcessor):
    def process_audio(self, audio_data: bytes) -> str:
        # Your implementation
        pass
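The extension pattern above pairs naturally with the fallback behavior mentioned under the AI features. The following self-contained sketch shows one way a provider interface and a fallback wrapper could look. The `BaseAIClient` defined here is a stand-in written for this example, not the project's actual base class, and `EchoClient` and `FallbackAIClient` are hypothetical names.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)


class BaseAIClient(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def generate_response(self, text: str) -> str:
        """Return the assistant's reply to the user's text."""


class EchoClient(BaseAIClient):
    """Trivial provider used here only to exercise the interface."""

    def generate_response(self, text: str) -> str:
        return f"You said: {text}"


class FallbackAIClient(BaseAIClient):
    """Wrap a provider and return a canned reply if it raises."""

    def __init__(self, primary: BaseAIClient,
                 fallback_message: str = "Sorry, I am having trouble responding right now.") -> None:
        self.primary = primary
        self.fallback_message = fallback_message

    def generate_response(self, text: str) -> str:
        try:
            return self.primary.generate_response(text)
        except Exception:
            # Log the full traceback but keep the conversation going.
            logger.exception("Primary AI client failed; using fallback reply")
            return self.fallback_message


print(FallbackAIClient(EchoClient()).generate_response("hello"))  # prints: You said: hello
```

Because the wrapper implements the same interface, the rest of the assistant would not need to know whether it is talking to a provider directly or through the fallback layer.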
# Run all tests
pytest tests/
# Run with coverage
pytest tests/ --cov=src/voice_assistant --cov-report=html
# Run specific test file
pytest tests/test_gemini_client.py -v
The real-time integration provides comprehensive monitoring:
API Endpoints:
- System Status: GET http://localhost:8000/status
- Active Calls: GET http://localhost:8000/calls
- Call Details: GET http://localhost:8000/calls/{channel_id}
- Health Check: GET http://localhost:8000/health
- API Documentation: GET http://localhost:8000/docs
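A small client for these endpoints might look like the sketch below. The paths come from the list above; the helper names are hypothetical, and the assumption that the endpoints return JSON is not confirmed here, so check the /docs page for the real response schemas.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

MONITORING_PATHS = {
    "status": "/status",
    "calls": "/calls",
    "call": "/calls/{channel_id}",
    "health": "/health",
}


def monitoring_url(name: str, base_url: str = "http://localhost:8000", **params: str) -> str:
    """Build the URL for a named monitoring endpoint, escaping path parameters."""
    escaped = {key: quote(str(value), safe="") for key, value in params.items()}
    return base_url.rstrip("/") + MONITORING_PATHS[name].format(**escaped)


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(monitoring_url("health"))
print(monitoring_url("call", channel_id="1700000000.12"))
```

With the server running, `fetch_json(monitoring_url("health"))` would retrieve the health payload; the channel ID shown is only an example value.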
Real-time Metrics:
- Audio Processing: Latency, buffer sizes, packet counts
- Session Management: Active sessions, conversation turns, duration
- Voice Activity: Speech detection accuracy, interruption handling
- Gemini Live API: Connection status, response times, error rates
- External Media: WebSocket connections, audio quality metrics
The assistant also reports the following during a session:
- Real-time Status: State changes and processing updates
- Conversation Metrics: Success rates and response times
- Error Tracking: Detailed error logs and fallback handling
- Performance Stats: Session duration and interaction counts
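As an illustration of how such metrics could be accumulated, here is a sketch of a per-session statistics object covering success rate, average response time, session duration, and interaction count. The class and its fields are hypothetical and are not taken from the project's code.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionStats:
    """Running totals for one assistant session."""
    started_at: float = field(default_factory=time.monotonic)
    interactions: int = 0
    failures: int = 0
    total_response_time: float = 0.0

    def record(self, response_time: float, ok: bool = True) -> None:
        """Record one request/response turn and whether it succeeded."""
        self.interactions += 1
        self.total_response_time += response_time
        if not ok:
            self.failures += 1

    @property
    def success_rate(self) -> float:
        if self.interactions == 0:
            return 0.0
        return (self.interactions - self.failures) / self.interactions

    @property
    def average_response_time(self) -> float:
        if self.interactions == 0:
            return 0.0
        return self.total_response_time / self.interactions

    def duration(self, now: Optional[float] = None) -> float:
        """Seconds elapsed since the session started."""
        return (time.monotonic() if now is None else now) - self.started_at


stats = SessionStats(started_at=0.0)
stats.record(0.5)
stats.record(1.5, ok=False)
print(stats.interactions, stats.success_rate, stats.average_response_time)  # prints: 2 0.5 1.0
```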
Example output:
Voice Assistant with Gemini 2.5 Flash
============================================================
✅ System Information:
Assistant Name: ARI
AI Model: gemini-2.5-flash
Voice Language: en
Listen Timeout: 20.0s
✅ Virtual environment: Active
✅ Configuration: .env file found
✅ Google API Key: Configured
[Ready - Waiting for input]
[Listening - Speak now]
You: Hello, how are you?
[Processing - Thinking...]
[Speaking - Response ready]
Assistant: Hello! I'm doing great, thank you for asking. I'm ARI, your voice assistant powered by Gemini 2.5 Flash. How can I help you today?
- "Google API key is required":
  - Check that the .env file exists and contains GOOGLE_API_KEY
  - Verify the API key is valid and has proper permissions
- Microphone not detected:
  - Check microphone permissions in system settings
  - Try: pip install pyaudio for better microphone support
  - Test with different microphone devices
- Audio playback issues:
  - Verify speakers/headphones are connected
  - Check system audio settings
  - Try different audio output devices
- Import errors:
  - Ensure virtual environment is activated
  - Run: pip install -r requirements.txt
  - Check Python version (3.8+ required)
Enable detailed logging:
# Set in .env file
LOG_LEVEL=DEBUG
# Or set environment variable
export LOG_LEVEL=DEBUG # Linux/Mac
set LOG_LEVEL=DEBUG # Windows
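For readers curious how a `LOG_LEVEL` value typically reaches Python's logging module, here is a minimal sketch. The function name and log format are assumptions; the project's own setup lives in utils/logger.py and may work differently.

```python
import logging
import os
from typing import Optional


def configure_logging(level_name: Optional[str] = None, log_file: Optional[str] = None) -> int:
    """Configure root logging from a level name, defaulting to LOG_LEVEL or INFO."""
    name = (level_name or os.environ.get("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {name!r}")
    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))  # optional file logging
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=handlers,
        force=True,  # replace any earlier configuration (Python 3.8+)
    )
    return level


configure_logging("debug")
logging.getLogger("voice_assistant").debug("Debug logging is enabled")
```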
If upgrading from the old OpenAI-based version:
- Backup your data: Save any important configurations
- Update dependencies: pip install -r requirements.txt
- Update environment: Replace OPENAI_API_KEY with GOOGLE_API_KEY
- Test functionality: Run python src/main.py to verify
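The environment step can be checked mechanically. The sketch below scans the text of a .env file for a leftover `OPENAI_API_KEY` and for a missing or placeholder `GOOGLE_API_KEY`. Both helper functions are hypothetical, and the parser handles only simple KEY=value lines.

```python
from typing import Dict, List

PLACEHOLDER_PREFIX = "your-"  # template values look like your-key-here


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def migration_issues(env_text: str) -> List[str]:
    """Return human-readable problems that would block the Gemini migration."""
    values = parse_env(env_text)
    issues: List[str] = []
    if "OPENAI_API_KEY" in values:
        issues.append("OPENAI_API_KEY is no longer used and can be removed")
    google_key = values.get("GOOGLE_API_KEY", "")
    if not google_key or google_key.startswith(PLACEHOLDER_PREFIX):
        issues.append("GOOGLE_API_KEY is missing or still a placeholder")
    return issues


for issue in migration_issues("OPENAI_API_KEY=old-key\nGOOGLE_API_KEY=your-key-here\n"):
    print(issue)
```

In practice the text would come from reading the .env file, for example with `pathlib.Path(".env").read_text()`.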
- Removed: OpenAI dependency and API key
- Added: Google Generative AI (Gemini 2.5 Flash)
- Updated: Professional project structure
- Improved: Error handling and logging
- Added: Test suite and documentation
This project is licensed under the MIT License.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes with proper tests
- Commit: git commit -m 'Add amazing feature'
- Push: git push origin feature/amazing-feature
- Open a Pull Request
- Documentation: Check docs/README.md for detailed guides
- Issues: Report bugs on GitHub Issues
- Features: Request features on GitHub Discussions
- Contact: Open an issue for support questions
Ready to start talking to your AI assistant?
Run python src/main.py and start your conversation with Gemini 2.5 Flash!