Offline-Voice-LLM-Assistant

This project is an experiment in running small but capable language models entirely offline on your own laptop. The goal is to create a private, local AI assistant that does not send your data anywhere, with full control over the model and the process — no cloud, no telemetry, and no limitations.

You can use it to interact with your own compact assistant model that remembers context, follows system instructions, and generates helpful responses — all locally on your machine.

For convenience, I added voice input and a simple GUI, so you can communicate with the assistant faster and more productively.


🔥 DEMO

🔒 Why?

  • 🛡️ Privacy-first: Your chats stay on your device
  • 💡 Hackable: Full control over generation, logic, prompts
  • 🧪 Experimental: Try different models, quantizations, and runtimes
  • 💻 Runs on consumer GPUs
  • 😂 TO HAVE FUN!

🤖 Models Used

I use HuggingFaceTB/SmolLM3-3B — a compact 3B parameter chat-tuned model. It is licensed under the Apache License 2.0.

The full text of the Apache License 2.0 is included in the LICENSE file in this repository.

Massive thanks and respect to the authors of SmolLM3 and the HuggingFace community for making these tools open and accessible!

For voice recognition, Vosk is a great option: it keeps everything fast while delivering high-quality transcription!


🚀 Setup Instructions

1. Clone the Repo and Create a Virtual Environment

git clone https://github.com/Vitgracer/Offline-Voice-LLM-Assistant.git
cd Offline-Voice-LLM-Assistant
python -m venv assistant_venv
# On Windows:
assistant_venv\Scripts\activate
# On Linux: source assistant_venv/bin/activate

2. Install Dependencies (for CUDA 12.4 and Torch 2.5.1)

# for llm
pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip3 install -U transformers accelerate bitsandbytes
# for voice recognition 
pip install vosk sounddevice

Also, don't forget to download a Vosk model and put it into project/voice/models. I used the fastest (smallest) version, but you can experiment with different ones.
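
To sanity-check the microphone pipeline on its own, a minimal Vosk + sounddevice loop looks roughly like this (a sketch; the model folder name below is just an example, so point it at whichever Vosk model you downloaded):

import json
import queue

import sounddevice as sd
from vosk import KaldiRecognizer, Model

# Example path: adjust to the Vosk model you placed under project/voice/models
model = Model("voice/models/vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)
audio_queue = queue.Queue()

def callback(indata, frames, time, status):
    # Push raw microphone bytes to the main loop
    audio_queue.put(bytes(indata))

with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype="int16",
                       channels=1, callback=callback):
    print("Listening... press Ctrl+C to stop")
    while True:
        if recognizer.AcceptWaveform(audio_queue.get()):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                print("You said:", text)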

3. 💬 Usage

Run the assistant with context and a system prompt using the included script.

Features:

  • Maintains full chat history (context)
  • Supports a system role message
  • Uses 4-bit quantized loading with bitsandbytes
  • Optimized for fast response time
  • Supports voice input
  • Supports simple GUI

To chat interactively, run:

python local_chat.py

or, if you prefer a simple GUI:

python local_chat_gui.py
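
For reference, the core idea behind the chat script (4-bit quantized loading with bitsandbytes plus a rolling chat history that starts with a system message) can be sketched roughly as follows; this is an illustration of the approach, not the exact contents of local_chat.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "HuggingFaceTB/SmolLM3-3B"

# Load the model in 4-bit via bitsandbytes so it fits on a consumer GPU
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=quant_config, device_map="auto"
)

# Full chat history, starting with a system role message
history = [{"role": "system", "content": "You are a helpful offline assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What can you do for me offline?"))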

🧪 Optional: Further Optimization

If you want to push performance further on Linux, you can explore:

exllamav2 - an extremely fast inference engine for quantized LLMs.

Requirements: CUDA Toolkit 12.4 installed locally, with its path exposed through an environment variable (e.g. CUDA_HOME).

vLLM - a high-performance LLM inference engine with paged attention. Ideal for serving multiple prompts or streaming.
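
As an illustration, a minimal vLLM script could look like the sketch below (assuming your vLLM build supports SmolLM3-3B):

from vllm import LLM, SamplingParams

# Batched offline generation; vLLM handles paged attention under the hood
llm = LLM(model="HuggingFaceTB/SmolLM3-3B")
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Summarize why local LLMs matter.", "Suggest three journaling prompts."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)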

👋 Final Words

This is a lightweight personal assistant that respects your data and your control. Perfect for:

  • Offline Q&A
  • Personal journaling or reminders
  • Local experiments with new models

Happy tinkering, and thanks again to all open-source LLM developers! 😍😍😍

🌟 Like it? Star it!

If this little offline AI assistant made you smile — don’t forget to smash that ⭐️ button!
