A sophisticated voice-based AI assistant that operates entirely locally, integrating speech-to-text, text-to-speech, and large language model capabilities without relying on cloud services.
- Speech Recognition: Powered by OpenAI's Whisper model for accurate speech-to-text conversion
- Voice Synthesis: Implements Coqui TTS for natural-sounding text-to-speech responses
- Language Processing: Connects with Ollama to run large language models locally
- User Interface: Features an intuitive Gradio-based interface
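These components chain together in a simple record → transcribe → generate → speak loop. The sketch below shows one way the pieces could be wired up, assuming Ollama's default port and illustrative model names; the actual logic lives in `main.py` and may differ.

```python
import requests
import whisper
from TTS.api import TTS

# Load the local models once at startup (model choices here are illustrative)
stt = whisper.load_model("base")
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

def respond(audio_path: str) -> str:
    """Transcribe a recording, query the local LLM, and speak the reply."""
    # 1. Speech-to-text with Whisper
    text = stt.transcribe(audio_path)["text"]

    # 2. Text generation via Ollama's local HTTP API (port 11434 by default)
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma3:1b", "prompt": text, "stream": False},
        timeout=120,
    ).json()["response"]

    # 3. Text-to-speech with Coqui TTS
    tts.tts_to_file(text=reply, file_path="reply.wav")
    return "reply.wav"
```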
- Python 3.12 or newer
- 8GB RAM minimum (16GB recommended)
- 2GB of free storage for base models
- NVIDIA GPU recommended for optimal performance
Create and activate a Python virtual environment:
# Windows
python -m venv venv
.\venv\Scripts\activate
# macOS/Linux (including Ubuntu)
python -m venv venv
source venv/bin/activate
# Core dependencies
pip install -U openai-whisper coqui-tts sounddevice soundfile gradio
# For NVIDIA GPU acceleration
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
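After installing the CUDA build, you can confirm that PyTorch actually sees your GPU before running the assistant:

```python
import torch

# Prints True when the CUDA build is active and models can run on the GPU
print(torch.cuda.is_available())
```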
Ensure Ollama is installed on your system for local LLM functionality.
# Pull a recommended model
ollama pull gemma3:1b
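The assistant talks to Ollama over its local HTTP API, which listens on port 11434 by default. A quick check that the server is running and the model has been pulled:

```python
import requests

# Lists locally pulled models; fails if the Ollama server is not running
print(requests.get("http://localhost:11434/api/tags", timeout=5).json())
```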
Launch the application:
python main.py
The Gradio interface starts locally and can be opened in your web browser, by default at http://127.0.0.1:7860.
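`main.py` builds the UI with Gradio. A stripped-down sketch of such an interface, with a placeholder standing in for the real speech pipeline, looks roughly like this:

```python
import gradio as gr

def respond(audio_path: str) -> str:
    # Placeholder: the real function runs STT -> LLM -> TTS and returns a reply wav
    return audio_path

demo = gr.Interface(
    fn=respond,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=gr.Audio(type="filepath", label="Assistant reply"),
    title="Local Voice Assistant",
)

demo.launch()  # served at http://127.0.0.1:7860 by default
```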
- Offline Speech Recognition: Transcribe voice input without internet connectivity
- Natural Voice Output: Generate human-like speech responses
- Voice Customization: Multiple voice options available through Coqui TTS models
- Contextual Understanding: Maintains conversation history for coherent interactions (see the sketch after this list)
- Local Processing: All data remains on your device for enhanced privacy
- Extensible Architecture: Easily integrate additional models or functionality
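Conversation memory, for example, can be implemented by passing a growing message list to Ollama's chat endpoint on each turn; a minimal sketch, assuming the same gemma3:1b model as above:

```python
import requests

history = []  # grows across turns so the model sees prior context

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "gemma3:1b", "messages": history, "stream": False},
        timeout=120,
    ).json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```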
assistant-example.mp4
For common issues, see our troubleshooting guide.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.