AI Voice Assistant

A sophisticated voice-based AI assistant that operates entirely locally, integrating speech-to-text, text-to-speech, and large language model capabilities without relying on cloud services.

Key Components

  • Speech Recognition: Powered by OpenAI's Whisper model for accurate speech-to-text conversion
  • Voice Synthesis: Implements Coqui TTS for natural-sounding text-to-speech responses
  • Language Processing: Connects with Ollama to run large language models locally
  • User Interface: Features an intuitive Gradio-based interface
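
In one conversational turn these components form a simple pipeline: record audio, transcribe it, generate a reply, speak it. A minimal sketch, assuming the optional ollama Python client (pip install ollama) and illustrative model names; the project's actual wiring may differ:

import whisper                         # OpenAI Whisper: speech-to-text
import ollama                          # client for the local Ollama server
from TTS.api import TTS                # Coqui TTS: text-to-speech

def one_turn(audio_path, out_path="reply.wav"):
    # 1. Transcribe the recorded user audio to text
    text = whisper.load_model("base").transcribe(audio_path)["text"]
    # 2. Ask the local LLM for a reply
    reply = ollama.chat(model="gemma3:1b",
                        messages=[{"role": "user", "content": text}])
    # 3. Synthesize the reply to a WAV file and return its path
    tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
    tts.tts_to_file(text=reply["message"]["content"], file_path=out_path)
    return out_path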

Architecture

[Architecture diagram: AI Voice Assistant]

System Requirements

  • Python 3.12 or newer
  • 8GB RAM minimum (16GB recommended)
  • 2GB of storage for base models
  • NVIDIA GPU recommended for optimal performance

Installation

1. Environment Setup

Create and activate a Python virtual environment:

# Windows
python -m venv venv
.\venv\Scripts\activate

# macOS/Linux (including Ubuntu)
python -m venv venv
source venv/bin/activate

2. Dependency Installation

# Core dependencies
pip install -U openai-whisper coqui-tts sounddevice soundfile gradio

# For NVIDIA GPU acceleration
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 
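
After installing the CUDA wheels, you can confirm that PyTorch actually sees your GPU before launching the app:

import torch
print(torch.cuda.is_available())   # True when the CUDA build detects an NVIDIA GPU
print(torch.version.cuda)          # CUDA version the installed wheels target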

3. Ollama Setup

Ensure Ollama is installed on your system for local LLM functionality.

# Pull a recommended model
ollama pull gemma3:1b
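
Ollama serves a REST API on localhost:11434, so you can smoke-test the pulled model from Python (uses the requests package and Ollama's /api/generate endpoint):

import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "gemma3:1b",
                        "prompt": "Reply with one short sentence.",
                        "stream": False})
print(r.json()["response"])        # the model's generated text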

Usage

Launch the application:

python main.py

The Gradio interface will start locally and can be accessed via your web browser.
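
For reference, a minimal sketch of the kind of audio-in, audio-out interface the app exposes (assumes Gradio 4.x; respond() is a stand-in for the project's pipeline, not its real handler):

import gradio as gr

def respond(audio_path):
    # placeholder: transcribe, query the LLM, synthesize (see the pipeline sketch above)
    return audio_path              # echoes the recording back for demonstration

demo = gr.Interface(fn=respond,
                    inputs=gr.Audio(sources=["microphone"], type="filepath"),
                    outputs=gr.Audio(type="filepath"))
demo.launch()                      # serves on http://127.0.0.1:7860 by default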

Features

Speech Processing

  • Offline Speech Recognition: Transcribe voice input without internet connectivity
  • Natural Voice Output: Generate human-like speech responses
  • Voice Customization: Multiple voice options available through Coqui TTS models
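
For example, the coqui-tts package can enumerate its pretrained voices and load a multi-speaker model (the model id and speaker pick below are illustrative, and list_models() output varies across coqui-tts releases):

from TTS.api import TTS

print(TTS().list_models())                 # enumerate available pretrained models
tts = TTS("tts_models/en/vctk/vits")       # a multi-speaker English model
tts.tts_to_file(text="Hello there!",
                speaker=tts.speakers[0],   # one of the model's built-in voices
                file_path="hello.wav")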

AI Capabilities

  • Contextual Understanding: Maintains conversation history for coherent interactions (see the sketch after this list)
  • Local Processing: All data remains on your device for enhanced privacy
  • Extensible Architecture: Easily integrate additional models or functionality
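
A sketch of carrying conversation history across turns with the ollama client (the role/content message format is Ollama's chat API; the helper itself is illustrative):

import ollama

history = []                               # messages accumulated so far

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = ollama.chat(model="gemma3:1b", messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})   # keep context
    return answer

print(chat("My name is Andi."))
print(chat("What is my name?"))            # answerable only via the stored history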

Video Example

assistant-example.mp4

Troubleshooting

For common issues, see our troubleshooting guide.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.
