🗣️ Speak to your device of choice
Your favorite LLM is ready to answer (Qwen3 by default).
Or just interact via the console or the web interface.
You can ask it for the weather, to turn lights on or off, to play music on Spotify, and more.
It should just work like any home assistant.
The default wake word to activate speech detection is "Ivy" (trained with openwakeword).
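For reference, wake-word detection with openwakeword roughly looks like the sketch below; the "ivy.onnx" path, the frame size and the threshold are assumptions, not the project's actual values:

```python
# Minimal openwakeword sketch; "ivy.onnx", the frame size and the threshold
# are assumptions, not the values used by t2yLLM.
import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["ivy.onnx"])  # trained "Ivy" model (assumed path)

def is_wake_word(frame: np.ndarray, threshold: float = 0.5) -> bool:
    """frame: 16 kHz mono int16 samples (e.g. 1280 samples = 80 ms)."""
    scores = oww.predict(frame)            # {model_name: score}
    return max(scores.values()) >= threshold
```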
-
requirements :
- one PC equipped with a microphone and a speaker. Tested on :
- Jabra Speak2
- Jabra Speak 810
- Seeed Studio respeaker mic array V2.0
-
Available plugins :
- Meteo : searches for weather information using your OpenWeather API key
- Pokemon : look up any Pokémon info using the Tyradex API (French) or PokeAPI (English and others)
- Wikipedia : perform Wikipedia searches using the Python API
- Vector Search : stores summarized information in ChromaDB when needed and can retrieve the memorized info
- Yeelight : control lights and create rooms to manage them
- Spotify : ask your LLM to play, pause, or shuffle your favorite songs (requires a Spotify Premium account)
-
Install PyTorch for your CUDA version (see https://pytorch.org/get-started/locally/) :
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
for CUDA 12.8.
For now, on Blackwell GPUs, you need to uninstall the current build and install the nightly one :
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
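A quick way to verify that the CUDA build of PyTorch is active before going further (just a sanity check, not part of the project):

```python
# Sanity check that a CUDA-enabled PyTorch build is installed and sees the GPU.
import torch

print(torch.__version__)          # should show a cu12x build
print(torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```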
-
Install vllm :
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
If you have a Blackwell GPU, you should follow the vLLM guidelines :
https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html#use-an-existing-pytorch-installation
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
-
(Optional) Install FlashAttention 2 (flash-attention)
-
Clone the repository :
git clone https://github.com/Saga9103/t2yLLM.git
cd t2yLLM
pip install -e .
-
(Optional) Create a Porcupine account on Picovoice
-
At the moment, if you want a custom wake word, it is mandatory to create a Picovoice account (check Picovoice), then train and download a custom keyword to get a working pipeline.
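Once you have an access key and a trained keyword file, Porcupine-based detection roughly looks like this sketch; the "ivy_custom.ppn" path is an assumption:

```python
# Minimal pvporcupine sketch; "ivy_custom.ppn" is an assumed keyword file name.
import os
import pvporcupine

porcupine = pvporcupine.create(
    access_key=os.environ["PORCUPINE_KEY"],  # see the environment variables section
    keyword_paths=["ivy_custom.ppn"],        # custom keyword trained on Picovoice
)

# porcupine.process() expects frames of porcupine.frame_length 16-bit samples
# at porcupine.sample_rate Hz and returns the detected keyword index, or -1.
```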
-
Make sure faster-whisper is up to date.
-
(Optional) Install Caddy to access the web UI on your local network (see README.md in the config folder and the Caddy configuration for more information).
-
See the examples in ./examples for how to import and use the engines.
-
The AssistantEngine class receives user prompts (text/str) generated via Faster-Whisper, browses APIs if needed, generates an answer (token by token with the async engine of vLLM) and forwards it to the dispatcher. The related Python script should be installed on your server/desktop.
- The VoiceEngine class runs Faster-Whisper, piperTTS, silero-vad and openwakeword/porcupine. It is responsible for :
- transforming answers generated by llm_backend_async.py into audio chunks (.flac via piperTTS) and sending them over UDP to your Raspberry Pi.
- getting audio from the Raspberry Pi and converting it to text via Faster-Whisper with as low latency as possible. This script should be installed on your server/desktop (by default the same machine as llm_backend_async.py, but it can be different).
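As a rough illustration of how the two engines fit together, here is a purely hypothetical sketch; the import path, constructors and method names are assumptions, so refer to ./examples for the real usage:

```python
# Hypothetical sketch only: import path and method names are assumptions,
# see ./examples (e.g. llm_example.py) for the actual usage.
from t2yLLM import AssistantEngine, VoiceEngine  # assumed import location

assistant = AssistantEngine()  # LLM backend built on vLLM's async engine
voice = VoiceEngine()          # wake word + VAD + STT + TTS front end

# The LLM backend is loaded first (it is the slow part), then the voice
# engine streams audio in and receives text/audio answers back.
assistant.start()              # assumed method name
voice.start()                  # assumed method name
```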
-
The .yaml config files should be used to tweak parameters like silence detection, models, directories, etc.
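The exact keys depend on the shipped config files; as an illustration, reading such a file is plain YAML (the file name and keys below are made up):

```python
# Illustrative YAML read with PyYAML; the file name and keys shown here
# ("silence_threshold", "whisper_model") are hypothetical, not the real ones.
import yaml

with open("config/example.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("silence_threshold"))  # hypothetical key
print(cfg.get("whisper_model"))      # hypothetical key
```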
-
You can check the models I use in the config files and also in the faster_whisper directory. To make things work on a 16GB GPU, quantization is needed. You also need to fully load llm_backend_async.py first via llm_example.py. There should be no problem on 24GB GPUs.
-
Different vLLM parameters can be used to save VRAM, like enforce_eager, max_model_len, etc. The vLLM documentation is very rich.
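These VRAM-saving knobs map onto vLLM's engine arguments; here is a minimal sketch, with a placeholder model id and values that are not the project's defaults:

```python
# Minimal sketch of vLLM's async engine with VRAM-saving options.
# Model id and values are placeholders, not the project defaults.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="Qwen/Qwen3-14B-GPTQ-Int4",  # placeholder GPTQ-quantized model id
    quantization="gptq",
    max_model_len=4096,                # shorter context means a smaller KV cache
    enforce_eager=True,                # skip CUDA graphs to save some VRAM
    gpu_memory_utilization=0.90,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```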
- Web UI for interactive talk and for displaying code or math formulas : /examples/llm_example_webui.py
- vLLM : really fast and well-documented inference pipeline for your favorite LLM
- Faster-Whisper : incredibly fast STT for realtime text generation (see the config files for the model used)
- piperTTS : fast text-to-speech generating a natural voice, maybe the best for French at the moment
- Silero-vad : processes the audio buffer and prevents Whisper hallucinations
- Openwakeword : open source keyword detection
- pvporcupine : keyword detection
- Chromadb : a vector search database that serves as the model memory
- default LLM : Qwen3 14B or other variants, GPTQ 4-bit quantized
- Pytorch
- FastAPI
- Librespot
- A 16GB GPU for default config
- Tested devices : Jabra Speak2, Jabra Speak 810, Seeed Studio ReSpeaker Mic Array v2.0
- t2yLLM uses AsyncLLMEngine from vLLM in combination with faster-whisper in order to generate text from speech and stream tokens as fast as possible.
- In distributed mode : the audio dispatcher processes text received from the LLM, transforms it into .flac segments and sends them to the client (Raspberry Pi)
- Sound received from the Jabra Speak2 or the Raspberry Pi 5 is analyzed by Silero VAD to detect speech, in addition to openwakeword or pvporcupine
- Relevant sound is then transcribed by Faster-Whisper with low latency (see the sketch after this list)
- The audio dispatcher transforms the LLM answer into speech with piperTTS and sends it as .flac over the network to reduce bandwidth usage and decrease latency
- Configuration should be done via the .yaml config file, without having to directly interact with the code
- Configuration can be extended via YamlConfigLoader.py
- t2yLLM should be used on a local network only, since everything is sent in clear text for now
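For the transcription step referenced above, a minimal Faster-Whisper sketch looks like this; the model size, compute type and file name are placeholders rather than the project's configured values:

```python
# Minimal Faster-Whisper sketch; model size, compute type and audio file name
# are placeholders, not the values from the project config.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("utterance.flac", vad_filter=True)
print("detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```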
Create a .env file and use python-dotenv, or edit your ~/.bashrc :
- export PORCUPINE_KEY='myporcupinekey'
- export OPENWEATHERMAP_API_KEY='myopenweatherkey'
- export VLLM_ATTENTION_BACKEND=FLASH_ATTN #for V1 engine
- export VLLM_FLASH_ATTN_VERSION="2" #for V1 engine
- export VLLM_USE_V1=1
- export VLLM_WORKER_MULTIPROC_METHOD="spawn" #for V1 engine
- export TORCH_CUDA_ARCH_LIST='myarchitecture' #if needed
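If you go the .env route, python-dotenv loads these variables into the environment at startup; a minimal sketch:

```python
# Load variables from a local .env file into the process environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory by default

porcupine_key = os.getenv("PORCUPINE_KEY")
weather_key = os.getenv("OPENWEATHERMAP_API_KEY")
```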
Repositories used in t2yLLM project :
- vLLM
- Faster-Whisper
- Silero-vad
- pvporcupine
- openwakeword
- Whisper-streaming
- RealtimeSTT
- Chromadb
- piperTTS
- json_repair
- FastAPI
- pydantic
- librespot
- Spotipy
- Tyradex
- pokeapi
- OpenWeather
- Spotify-dev
- Plugins can be added to the ./plugins folder. See example.py in ./plugins and pluginManager.py for implementation details (a hedged sketch follows the plugin list below).
- Plugins have to be activated and deactivated via the config
- List of supported plugins :
- date
- time
- weather
- wikipedia
- pokemon
- yeelight *
- Spotify *
- Yeelight and Spotify (marked *) are harder to set up; read the setup process first.
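To give a rough idea of what a plugin could look like, here is a purely hypothetical skeleton; the class name, method names and registration mechanism are assumptions, so treat ./plugins/example.py and pluginManager.py as the authoritative reference:

```python
# Purely hypothetical plugin skeleton: class and method names are assumptions,
# see ./plugins/example.py and pluginManager.py for the real interface.
class GreetingPlugin:
    """Toy plugin that answers greetings (illustrative only)."""

    name = "greeting"  # assumed: plugins expose a name used in the config

    def matches(self, prompt: str) -> bool:
        # assumed: the plugin manager asks each plugin whether it handles a prompt
        return prompt.lower().startswith(("hello", "hi "))

    def run(self, prompt: str) -> str:
        # assumed: the plugin returns text that is merged into the LLM answer
        return "Hello! How can I help you today?"
```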
- async memory handler for non-blocking operations when dealing with memory
- WebOS plugin implementation
- use FastAPI errors
- use pydantic for UUID validation
This code is under the MIT license. Please mention me as the author if you find this code useful.
Copyright (c) 2025 Saga9103
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.