
A professional voice assistant system powered by Google's Gemini 2.5 Flash model, featuring both standalone voice interaction and real-time telephony integration with Asterisk ARI and Gemini Live API.


deepakchaudharigit/NPCL-Asterisk-ARI-Assistant


🤖 Voice Assistant with Gemini 2.5 Flash & Real-time Live API

A professional voice assistant system powered by Google's Gemini 2.5 Flash model, featuring both standalone voice interaction and real-time telephony integration with Asterisk ARI and Gemini Live API.

✨ What's New in Version 2.0

  • 🔄 Migrated from OpenAI to Gemini 2.5 Flash: More efficient and cost-effective AI responses
  • 🎆 NEW: Real-time Gemini Live API Integration: Direct voice-to-voice conversation with ultra-low latency
  • 📡 NEW: Asterisk ARI with externalMedia: Bidirectional audio streaming for telephony integration
  • 🎤 NEW: Voice Activity Detection: Intelligent interruption handling for natural conversations
  • 🔊 NEW: slin16 Audio Format: Optimized for Asterisk with 16-bit signed linear PCM at 16 kHz
  • 🏢 Professional Architecture: Complete restructure with modular design
  • 📦 Package Structure: Proper Python package with clear separation of concerns
  • 🔧 Enhanced Configuration: Pydantic-based settings management
  • 📈 Better Logging: Comprehensive logging and error handling
  • 🧪 Test Coverage: Unit tests and testing framework
  • 📚 Documentation: Complete documentation and setup guides
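
As a rough illustration of what the slin16 format implies for buffer sizes, the arithmetic can be sketched in a few lines of Python. The helper names below are hypothetical and not part of this repository, and the sketch assumes mono audio.

```python
SAMPLE_RATE_HZ = 16_000   # slin16 runs at 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit signed linear PCM; mono is an assumption


def bytes_for_duration(duration_ms: int) -> int:
    """Number of bytes of mono slin16 audio covering duration_ms."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def duration_ms_for_bytes(num_bytes: int) -> float:
    """Playback time in milliseconds for num_bytes of mono slin16 audio."""
    samples = num_bytes / BYTES_PER_SAMPLE
    return samples * 1000 / SAMPLE_RATE_HZ


print(bytes_for_duration(20))        # 640
print(bytes_for_duration(1000))      # 32000
print(duration_ms_for_bytes(320))    # 10.0
```

Under these assumptions, the AUDIO_CHUNK_SIZE=320 default listed under Environment Variables would correspond to 10 ms of audio if it counts bytes, or 20 ms if it counts samples; this README does not say which.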

🚀 Quick Start

# 1. Activate virtual environment
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API key
cp .env.example .env  # Windows cmd: copy .env.example .env
# Edit .env and set GOOGLE_API_KEY to your Google API key

# 4. Run the voice assistant
python src/main.py

📁 Professional Project Structure

voice_assistant_ari_llm/
├── src/                           # 🎯 Source code
│   ├── voice_assistant/           # 📦 Main package
│   │   ├── core/                  # 🧠 Core assistant logic
│   │   │   ├── assistant.py       # Main VoiceAssistant class
│   │   │   └── conversation.py    # Conversation management
│   │   ├── ai/                    # 🤖 AI integration
│   │   │   ├── gemini_client.py   # Gemini 2.5 Flash client
│   │   │   └── prompts.py         # System prompts
│   │   ├── audio/                 # 🎵 Audio processing
│   │   │   ├── speech_recognition.py  # Speech-to-text
│   │   │   ├── text_to_speech.py      # Text-to-speech
│   │   │   └── audio_utils.py         # Audio utilities
│   │   ├── telephony/             # 📞 Telephony integration
│   │   │   ├── ari_handler.py     # Asterisk ARI handler
│   │   │   └── call_manager.py    # Call management
│   │   └── utils/                 # 🛠️ Utilities
│   │       ├── logger.py          # Logging configuration
│   │       └── exceptions.py      # Custom exceptions
│   └── main.py                    # 🚀 Main entry point
├── config/                        # ⚙️ Configuration
│   ├── settings.py                # Pydantic settings
│   └── environment.py             # Environment management
├── tests/                         # 🧪 Test suite
│   ├── test_ai/                   # AI component tests
│   ├── test_audio/                # Audio component tests
│   └── test_core/                 # Core logic tests
├── docs/                          # 📚 Documentation
│   ├── README.md                  # Detailed documentation
│   ├── API.md                     # API reference
│   └── SETUP.md                   # Setup instructions
├── scripts/                       # 📜 Utility scripts
│   ├── run_assistant.py           # Simple run script
│   └── setup.py                   # Setup utilities
├── asterisk-config/               # 📞 Asterisk configuration
├── sounds/                        # 🔊 Audio files
├── requirements.txt               # 📋 Dependencies
├── .env.example                   # 📝 Environment template
└── README.md                      # 📖 This file

🎯 Key Features

🤖 AI-Powered with Gemini 2.5 Flash

  • Latest Model: Uses Google's Gemini 2.5 Flash for intelligent responses
  • Cost Efficient: More affordable than previous OpenAI integration
  • Fast Responses: Optimized for real-time conversation
  • Fallback System: Graceful handling of API failures
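
The fallback behaviour mentioned above could look roughly like the following. This is a minimal sketch, not the repository's implementation: `respond_with_fallback`, its parameters, and the canned reply are all hypothetical, and `generate` stands in for whatever function calls the model.

```python
from typing import Callable


def respond_with_fallback(
    generate: Callable[[str], str],
    prompt: str,
    fallback: str = "Sorry, I am having trouble answering right now.",
    retries: int = 1,
) -> str:
    """Call generate(prompt); return a canned reply if every attempt fails."""
    for _ in range(retries + 1):
        try:
            reply = generate(prompt)
            if reply and reply.strip():
                return reply
        except Exception:
            # A real client would log the error and catch a narrower exception type.
            continue
    return fallback


def always_fails(prompt: str) -> str:
    """Stub that simulates an API outage."""
    raise ConnectionError("simulated API outage")


print(respond_with_fallback(always_fails, "Hello"))
print(respond_with_fallback(lambda p: "Hi there!", "Hello"))
```

Keeping the model call behind a plain callable means the same wrapper works for any provider, and a spoken fallback keeps the conversation going instead of leaving the caller in silence.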

🎤 Professional Audio Processing

  • Speech Recognition: Google Speech Recognition for accurate voice input
  • Text-to-Speech: Google TTS with standard voice for clear output
  • Audio Utils: Comprehensive audio processing utilities
  • Real-time Processing: Low-latency audio handling

📞 Telephony Integration

  • Asterisk ARI: Full integration with Asterisk PBX
  • Call Management: Handle incoming/outgoing calls
  • Real-time Audio: Process phone conversations in real-time
  • Multi-channel: Support multiple concurrent calls

🏗️ Professional Architecture

  • Modular Design: Clean separation of concerns
  • Type Safety: Full type hints throughout
  • Error Handling: Comprehensive exception management
  • Logging: Structured logging with configurable levels
  • Testing: Unit tests and testing framework

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • Google API key (free tier available)
  • Microphone and speakers
  • (Optional) Asterisk PBX for telephony

Detailed Setup

  1. Environment Setup:

    # Ensure virtual environment is active
    .venv\Scripts\activate
    
    # Verify Python version
    python --version  # Should be 3.8+
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Get Google API Key:

    • Visit Google AI Studio
    • Sign in and create a new API key
    • Copy the key for configuration
  4. Configure Environment:

    cp .env.example .env
    # Edit .env and set GOOGLE_API_KEY=your-key-here
  5. Test Installation:

    python src/main.py
    pytest -v  # runs all test cases
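
Before running the assistant, it can help to confirm that the .env file really contains a usable key. The following is a small standard-library sketch; `parse_env_text` and `has_api_key` are hypothetical helpers, not part of this project, and the parser ignores inline comments after values.

```python
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_path: str = ".env") -> bool:
    """True if env_path exists and sets a non-placeholder GOOGLE_API_KEY."""
    path = Path(env_path)
    if not path.is_file():
        return False
    key = parse_env_text(path.read_text(encoding="utf-8")).get("GOOGLE_API_KEY", "")
    # The template values in this README start with "your-".
    return bool(key) and not key.startswith("your-")


if __name__ == "__main__":
    print("GOOGLE_API_KEY configured:", has_api_key())
```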

🎮 Usage

🎆 Real-time Telephony Integration (NEW!)

The flagship feature - real-time conversational AI through phone calls:

# Quick start with real-time integration
./start_realtime.sh

# Or manually
python src/run_realtime_server.py

Real-time Features:

  • 📡 Bidirectional Audio Streaming: Direct WebSocket audio with Asterisk externalMedia
  • 🎤 Voice Activity Detection: Intelligent speech start/stop detection
  • ⚡ Ultra-low Latency: Direct Gemini Live API integration
  • 🔄 Interruption Handling: Natural conversation flow with mid-response interruptions
  • 🔊 slin16 Format: Optimized 16-bit signed linear PCM at 16 kHz
  • 📈 Session Management: Complete conversation state tracking
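
To make the voice activity detection idea concrete, here is a minimal energy-based sketch. It is not the project's detector: the `EnergyVAD` class is hypothetical, it assumes little-endian 16-bit mono frames of 20 ms, and it reuses the 300 energy threshold and 0.5 s silence value from the Environment Variables section only as plausible defaults.

```python
import math
import struct
from typing import List


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit PCM frame (little-endian assumed)."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EnergyVAD:
    """Flags speech when RMS reaches a threshold; ends it after enough quiet frames."""

    def __init__(self, energy_threshold: float = 300.0,
                 silence_seconds: float = 0.5, frame_seconds: float = 0.02) -> None:
        self.energy_threshold = energy_threshold
        self.silence_frames_needed = max(1, round(silence_seconds / frame_seconds))
        self.speaking = False
        self._quiet_frames = 0

    def process(self, frame: bytes) -> List[str]:
        """Return the events triggered by this frame: 'speech_start' or 'speech_end'."""
        events: List[str] = []
        if frame_rms(frame) >= self.energy_threshold:
            self._quiet_frames = 0
            if not self.speaking:
                self.speaking = True
                events.append("speech_start")
        elif self.speaking:
            self._quiet_frames += 1
            if self._quiet_frames >= self.silence_frames_needed:
                self.speaking = False
                self._quiet_frames = 0
                events.append("speech_end")
        return events


vad = EnergyVAD()
loud_frame = struct.pack("<320h", *([1000] * 320))  # 20 ms of constant amplitude
quiet_frame = bytes(640)                             # 20 ms of digital silence
print(vad.process(loud_frame))                       # ['speech_start']
```

In a setup like the one described above, a `speech_start` event that arrives while the assistant is still playing a response could be treated as an interruption and used to stop playback.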

Test Extensions:

  • 1000: Main Gemini Voice Assistant (full real-time integration)
  • 1001: External Media Test (direct WebSocket audio)
  • 1002: Basic Audio Test (echo and playback)

Standalone Voice Assistant

python src/main.py

Features:

  • 🎤 Voice input with timeout handling
  • 🧠 AI processing with Gemini 2.5 Flash
  • 🗣️ Speech output with Google TTS
  • 📊 Real-time status updates
  • 📈 Session statistics

Voice Commands

  • Normal conversation: Speak naturally after "🎤 Listening"
  • Exit: Say "quit", "exit", "goodbye", or "bye"
  • Force quit: Press Ctrl+C
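
Recognising the exit words can be sketched as a simple check on the transcribed text. The function below is hypothetical and assumes the whole utterance must be an exit word; the actual assistant may match more loosely.

```python
import string

EXIT_WORDS = {"quit", "exit", "goodbye", "bye"}


def is_exit_command(transcript: str) -> bool:
    """True if the recognised text is exactly one of the exit words."""
    cleaned = transcript.lower().strip().strip(string.punctuation).strip()
    return cleaned in EXIT_WORDS


print(is_exit_command("Goodbye!"))                  # True
print(is_exit_command("Tell me about exit polls"))  # False
```

Matching the whole utterance avoids ending the session when an exit word merely appears inside a longer sentence.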

Legacy Telephony Mode

For basic phone-based interactions:

# Start basic ARI handler
# (if create_ari_app is an application factory, uvicorn's --factory flag declares that explicitly)
uvicorn src.voice_assistant.telephony.ari_handler:create_ari_app --host 0.0.0.0 --port 8000

# Configure Asterisk to send events to your handler
# See asterisk-config/ for configuration examples

⚙️ Configuration

🎆 Real-time Integration Setup

For the real-time Gemini Live API integration:

# Run automated setup
python scripts/setup_realtime.py

# This will:
# - Check environment requirements
# - Validate configuration
# - Create required directories
# - Test connections
# - Generate startup scripts

Quick Setup:

  1. Copy .env.example to .env
  2. Set your GOOGLE_API_KEY
  3. Configure Asterisk (copy asterisk-config/*)
  4. Run ./start_realtime.sh

📚 Detailed Setup Guide: See docs/REALTIME_SETUP.md

Environment Variables

# Required
GOOGLE_API_KEY=your-google-api-key-here

# AI Settings
GEMINI_MODEL=gemini-2.5-flash
GEMINI_LIVE_MODEL=gemini-2.0-flash-exp
GEMINI_VOICE=Puck
MAX_TOKENS=150
TEMPERATURE=0.7

# Real-time Audio Settings
AUDIO_FORMAT=slin16
AUDIO_SAMPLE_RATE=16000
AUDIO_CHUNK_SIZE=320
AUDIO_BUFFER_SIZE=1600

# Voice Activity Detection
VAD_ENERGY_THRESHOLD=300
VAD_SILENCE_THRESHOLD=0.5
VAD_SPEECH_THRESHOLD=0.1

# Assistant Settings
ASSISTANT_NAME=ARI
VOICE_LANGUAGE=en
LISTEN_TIMEOUT=20.0
PHRASE_TIME_LIMIT=15.0

# Audio Settings
VOICE_VOLUME=0.9

# Logging
LOG_LEVEL=INFO
# LOG_FILE=logs/assistant.log  # Optional file logging

# Asterisk ARI Configuration
ARI_BASE_URL=http://localhost:8088/ari
ARI_USERNAME=asterisk
ARI_PASSWORD=1234
STASIS_APP=gemini-voice-assistant

# External Media Configuration
EXTERNAL_MEDIA_HOST=localhost
EXTERNAL_MEDIA_PORT=8090

# Real-time Processing
ENABLE_INTERRUPTION_HANDLING=true
MAX_CALL_DURATION=3600
AUTO_ANSWER_CALLS=true
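
To show how the Asterisk and external media settings fit together, the sketch below builds the URL for ARI's `POST /channels/externalMedia` request from these variables. The function name is hypothetical, the defaults mirror the values above, and whether this project issues the request in exactly this form is an assumption. The optional transport and encapsulation parameters are left at Asterisk's defaults.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode


def external_media_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the ARI externalMedia request URL from environment-style settings."""
    if env is None:
        env = os.environ
    base = env.get("ARI_BASE_URL", "http://localhost:8088/ari").rstrip("/")
    host = env.get("EXTERNAL_MEDIA_HOST", "localhost")
    port = env.get("EXTERNAL_MEDIA_PORT", "8090")
    query = urlencode({
        "app": env.get("STASIS_APP", "gemini-voice-assistant"),  # Stasis application name
        "external_host": f"{host}:{port}",                       # where Asterisk sends audio
        "format": env.get("AUDIO_FORMAT", "slin16"),             # media format
    })
    return f"{base}/channels/externalMedia?{query}"


print(external_media_url({}))
```

The request itself would be sent as an HTTP POST authenticated with ARI_USERNAME and ARI_PASSWORD; that step is omitted here so the sketch runs without an Asterisk server.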

🔧 Development

Architecture Overview

The system follows a clean, modular architecture:

  1. Core Layer: Main assistant logic and conversation management
  2. AI Layer: Gemini integration and response generation
  3. Audio Layer: Speech recognition and text-to-speech
  4. Telephony Layer: Asterisk ARI integration
  5. Utils Layer: Logging, exceptions, and utilities
  6. Config Layer: Settings and environment management

Adding Features

The modular design makes it easy to extend:

# Add new AI provider
from voice_assistant.ai.base_client import BaseAIClient

class NewAIClient(BaseAIClient):
    def generate_response(self, text: str) -> str:
        # Your implementation
        raise NotImplementedError

# Add new audio processor
from voice_assistant.audio.base_processor import BaseAudioProcessor

class NewAudioProcessor(BaseAudioProcessor):
    def process_audio(self, audio_data: bytes) -> str:
        # Your implementation
        raise NotImplementedError
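
The base classes imported above are not listed in the project structure, so the following self-contained sketch uses a hypothetical stand-in for `BaseAIClient` to show the extension pattern end to end. The real base class may define more methods or a different signature.

```python
from abc import ABC, abstractmethod


class BaseAIClient(ABC):
    """Hypothetical stand-in for voice_assistant.ai.base_client.BaseAIClient."""

    @abstractmethod
    def generate_response(self, text: str) -> str:
        """Return the assistant's reply to the given user text."""


class EchoAIClient(BaseAIClient):
    """Toy provider that repeats the input; useful for tests without network access."""

    def generate_response(self, text: str) -> str:
        return f"You said: {text}"


client: BaseAIClient = EchoAIClient()
print(client.generate_response("hello"))  # You said: hello
```

Because callers depend only on the abstract interface, a provider like this can be swapped in for the Gemini client during offline testing.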

Running Tests

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src/voice_assistant --cov-report=html

# Run specific test file
pytest tests/test_gemini_client.py -v

📊 Monitoring & Logging

🎆 Real-time System Monitoring

The real-time integration provides comprehensive monitoring:

API Endpoints:

  • System Status: GET http://localhost:8000/status
  • Active Calls: GET http://localhost:8000/calls
  • Call Details: GET http://localhost:8000/calls/{channel_id}
  • Health Check: GET http://localhost:8000/health
  • API Documentation: GET http://localhost:8000/docs
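
These endpoints can be polled from a short script. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the endpoints return JSON, which this README does not state.

```python
import json
from typing import Any, Dict, Optional
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen


def monitoring_urls(base: str = "http://localhost:8000", channel_id: str = "") -> Dict[str, str]:
    """Map endpoint names to URLs; adds a per-call URL when channel_id is given."""
    base = base.rstrip("/")
    urls = {
        "status": f"{base}/status",
        "calls": f"{base}/calls",
        "health": f"{base}/health",
    }
    if channel_id:
        urls["call"] = f"{base}/calls/{quote(channel_id, safe='')}"
    return urls


def fetch_json(url: str, timeout: float = 3.0) -> Optional[Any]:
    """GET url and decode JSON; return None if the server is unreachable or replies with non-JSON."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    for name, url in monitoring_urls().items():
        print(name, "->", fetch_json(url))
```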

Real-time Metrics:

  • Audio Processing: Latency, buffer sizes, packet counts
  • Session Management: Active sessions, conversation turns, duration
  • Voice Activity: Speech detection accuracy, interruption handling
  • Gemini Live API: Connection status, response times, error rates
  • External Media: WebSocket connections, audio quality metrics

Traditional Monitoring

The standalone assistant reports the following:

  • Real-time Status: State changes and processing updates
  • Conversation Metrics: Success rates and response times
  • Error Tracking: Detailed error logs and fallback handling
  • Performance Stats: Session duration and interaction counts

Example output:

🤖 Voice Assistant with Gemini 2.5 Flash
============================================================
✅ System Information:
   Assistant Name: ARI
   AI Model: gemini-2.5-flash
   Voice Language: en
   Listen Timeout: 20.0s
✅ Virtual environment: Active
✅ Configuration: .env file found
✅ Google API Key: Configured

[💤 Ready - Waiting for input]
[🎤 Listening - Speak now]
👤 You: Hello, how are you?
[🧠 Processing - Thinking...]
[🗣️ Speaking - Response ready]
🤖 Assistant: Hello! I'm doing great, thank you for asking. I'm ARI, your voice assistant powered by Gemini 2.5 Flash. How can I help you today?

🚨 Troubleshooting

Common Issues

  1. "Google API key is required":

    • Check .env file exists and contains GOOGLE_API_KEY
    • Verify API key is valid and has proper permissions
  2. Microphone not detected:

    • Check microphone permissions in system settings
    • Try: pip install pyaudio for better microphone support
    • Test with different microphone devices
  3. Audio playback issues:

    • Verify speakers/headphones are connected
    • Check system audio settings
    • Try different audio output devices
  4. Import errors:

    • Ensure virtual environment is activated
    • Run: pip install -r requirements.txt
    • Check Python version (3.8+ required)

Debug Mode

Enable detailed logging:

# Set in .env file
LOG_LEVEL=DEBUG

# Or set an environment variable
# Linux/Mac:
export LOG_LEVEL=DEBUG
# Windows (cmd); keep the line free of trailing text, which would become part of the value:
set LOG_LEVEL=DEBUG
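
For reference, a LOG_LEVEL value like this is typically mapped onto Python's logging module along the following lines. This is a sketch under assumptions: `configure_logging` is a hypothetical helper, and the project's own logger.py may be set up differently.

```python
import logging
import os
from typing import List, Optional


def configure_logging(level_name: Optional[str] = None, log_file: Optional[str] = None) -> int:
    """Configure root logging from a level name, falling back to INFO if it is unknown."""
    name = (level_name or os.environ.get("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO
    handlers: List[logging.Handler] = [logging.StreamHandler()]
    if log_file:
        # Optional file logging, comparable to the LOG_FILE setting.
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=handlers,
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level


configure_logging("DEBUG")
logging.getLogger("voice_assistant").debug("debug logging is enabled")
```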

🆚 Migration from Previous Version

If upgrading from the old OpenAI-based version:

  1. Backup your data: Save any important configurations
  2. Update dependencies: pip install -r requirements.txt
  3. Update environment: Replace OPENAI_API_KEY with GOOGLE_API_KEY
  4. Test functionality: Run python src/main.py to verify

Key Changes

  • ❌ Removed: OpenAI dependency and API key
  • ✅ Added: Google Generative AI (Gemini 2.5 Flash)
  • 🔄 Updated: Professional project structure
  • 📈 Improved: Error handling and logging
  • 🧪 Added: Test suite and documentation

📝 License

This project is licensed under the MIT License.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with proper tests
  4. Commit: git commit -m 'Add amazing feature'
  5. Push: git push origin feature/amazing-feature
  6. Open a Pull Request

📞 Support

  • 📚 Documentation: Check docs/README.md for detailed guides
  • 🐛 Issues: Report bugs on GitHub Issues
  • 💡 Features: Request features on GitHub Discussions
  • 📧 Contact: Open an issue for support questions

🎉 Ready to start talking to your AI assistant!

Run python src/main.py and start your conversation with Gemini 2.5 Flash! 🚀
