🗣️ Speak to your device of choice
Your favorite LLM is ready to answer (Qwen3 by default).
Or just interact via the console or the web interface.
You can ask it for the weather, to turn lights on or off, to play music on Spotify, and more.
It should just work like any home assistant.
The default wake word to activate speech detection is "Ivy" (trained with openwakeword).
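For reference, wake-word detection with openwakeword roughly looks like the sketch below; the "ivy.onnx" path, the frame size and the threshold are assumptions, not the project's actual values:

```python
# Minimal openwakeword sketch; "ivy.onnx", the frame size and the threshold
# are assumptions, not the values used by t2yLLM.
import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["ivy.onnx"])  # trained "Ivy" model (assumed path)

def is_wake_word(frame: np.ndarray, threshold: float = 0.5) -> bool:
    """frame: 16 kHz mono int16 samples (e.g. 1280 samples = 80 ms)."""
    scores = oww.predict(frame)            # {model_name: score}
    return max(scores.values()) >= threshold
```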
-
requirements :
- one PC equipped with a microphone and a speaker. Tested on :
- Jabra Speak2
- Jabra Speak 810
- Seeed Studio respeaker mic array V2.0
-
Available plugins :
- Meteo : searches for weather information using your OpenWeather API key
- Pokemon : look up any Pokémon info using the Tyradex API (French) or PokeAPI (English and others)
- Wikipedia : perform Wikipedia searches using the Python API
- Vector Search : stores summarized information in ChromaDB when needed and can retrieve the memorized info
- Yeelight : control lights and create rooms to manage them
- Spotify : ask your LLM to play, pause, or shuffle your favorite songs (requires a Spotify Premium account)
-
Install PyTorch for your CUDA version (see https://pytorch.org/get-started/locally/) :
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
for CUDA 12.8.
For now, on Blackwell GPUs, you need to uninstall the current build and install the nightly one :
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
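A quick way to verify that the CUDA build of PyTorch is active before going further (just a sanity check, not part of the project):

```python
# Sanity check that a CUDA-enabled PyTorch build is installed and sees the GPU.
import torch

print(torch.__version__)          # should show a cu12x build
print(torch.cuda.is_available())  # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```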
-
Install vllm :
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
If you have a Blackwell GPU, you should follow the vLLM guidelines :
https://docs.vllm.ai/en/stable/getting_started/installation/gpu.html#use-an-existing-pytorch-installation
git clone https://github.com/vllm-project/vllm.git
cd vllm
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
-
(Optional) Install FlashAttention 2 (flash-attention)
-
Clone the repository :
git clone https://github.com/Saga9103/t2yLLM.git
cd t2yLLM
pip install -e .
-
(Optional) Create a Porcupine account on Picovoice
-
At the moment, if you want a custom wake word, it is mandatory to create a Picovoice account (check Picovoice), then train and download a custom keyword to get a working pipeline.
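Once you have an access key and a trained keyword file, Porcupine-based detection roughly looks like this sketch; the "ivy_custom.ppn" path is an assumption:

```python
# Minimal pvporcupine sketch; "ivy_custom.ppn" is an assumed keyword file name.
import os
import pvporcupine

porcupine = pvporcupine.create(
    access_key=os.environ["PORCUPINE_KEY"],  # see the environment variables section
    keyword_paths=["ivy_custom.ppn"],        # custom keyword trained on Picovoice
)

# porcupine.process() expects frames of porcupine.frame_length 16-bit samples
# at porcupine.sample_rate Hz and returns the detected keyword index, or -1.
```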
-
Make sure faster-whisper is up to date.
-
(Optional) Install Caddy to access the web UI on your local network (see README.md in the config folder and the Caddy configuration for more information).
-
See the examples in ./examples for how to import and use the engines.
-
The AssistantEngine class receives user prompts (text/str) generated via Faster-Whisper, browses APIs if needed, generates an answer (token by token with the async engine of vLLM) and forwards it to the dispatcher. The related Python script should be installed on your server/desktop.
- The VoiceEngine class runs Faster-Whisper, piperTTS, silero-vad and openwakeword/porcupine. It is responsible for :
- transforming answers generated by llm_backend_async.py into audio chunks (.flac via piperTTS) and sending them over UDP to your Raspberry Pi.
- getting audio from the Raspberry Pi and converting it to text via Faster-Whisper with as low latency as possible. This script should be installed on your server/desktop (by default the same machine as llm_backend_async.py, but it can be different).
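As a rough illustration of how the two engines fit together, here is a purely hypothetical sketch; the import path, constructors and method names are assumptions, so refer to ./examples for the real usage:

```python
# Hypothetical sketch only: import path and method names are assumptions,
# see ./examples (e.g. llm_example.py) for the actual usage.
from t2yLLM import AssistantEngine, VoiceEngine  # assumed import location

assistant = AssistantEngine()  # LLM backend built on vLLM's async engine
voice = VoiceEngine()          # wake word + VAD + STT + TTS front end

# The LLM backend is loaded first (it is the slow part), then the voice
# engine streams audio in and receives text/audio answers back.
assistant.start()              # assumed method name
voice.start()                  # assumed method name
```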
-
The .yaml config files should be used to tweak parameters like silence detection, models, directories, etc.
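The exact keys depend on the shipped config files; as an illustration, reading such a file is plain YAML (the file name and keys below are made up):

```python
# Illustrative YAML read with PyYAML; the file name and keys shown here
# ("silence_threshold", "whisper_model") are hypothetical, not the real ones.
import yaml

with open("config/example.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("silence_threshold"))  # hypothetical key
print(cfg.get("whisper_model"))      # hypothetical key
```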
-
You can check the models I use in the config files and also in the faster_whisper directory. To make things work on a 16GB GPU, quantization is needed. You also need to fully load llm_backend_async.py first via llm_example.py. There should be no problem on 24GB GPUs.
-
Different vLLM parameters can be used to save VRAM, like enforce_eager, max_model_len, etc. The vLLM documentation is very rich.
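These VRAM-saving knobs map onto vLLM's engine arguments; here is a minimal sketch, with a placeholder model id and values that are not the project's defaults:

```python
# Minimal sketch of vLLM's async engine with VRAM-saving options.
# Model id and values are placeholders, not the project defaults.
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="Qwen/Qwen3-14B-GPTQ-Int4",  # placeholder GPTQ-quantized model id
    quantization="gptq",
    max_model_len=4096,                # shorter context means a smaller KV cache
    enforce_eager=True,                # skip CUDA graphs to save some VRAM
    gpu_memory_utilization=0.90,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```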
- Web UI for interactive talk and for displaying code or math formulas : /examples/llm_example_webui.py
- vLLM : really fast and well-documented inference pipeline for your favorite LLM
- Faster-Whisper : incredibly fast STT for realtime text generation (see the config files for the model used)
- piperTTS : fast text-to-speech generating a natural voice, maybe the best for French at the moment
- Silero-vad : processes the audio buffer and prevents Whisper hallucinations
- Openwakeword : open source keyword detection
- pvporcupine : keyword detection
- Chromadb : a vector search database that serves as the model memory
- default LLM : Qwen3 14B or other variants, GPTQ 4-bit quantized
- Pytorch
- FastAPI
- Librespot
- A 16GB GPU for default config
- Tested devices : Jabra Speak2, Jabra Speak 810, Seeed Studio ReSpeaker Mic Array v2.0
- t2yLLM uses AsyncLLMEngine from vLLM in combination with faster-whisper in order to generate text from speech and stream tokens as fast as possible.
- In distributed mode : the audio dispatcher processes text received from the LLM, transforms it into .flac segments and sends them to the client (Raspberry Pi)
- Sound received from the Jabra Speak2 or the Raspberry Pi 5 is analyzed by Silero VAD to detect speech, in addition to openwakeword or pvporcupine
- Relevant sound is then transcribed by Faster-Whisper with low latency (see the sketch after this list)
- The audio dispatcher transforms the LLM answer into speech with piperTTS and sends it as .flac over the network to reduce bandwidth usage and decrease latency
- Configuration should be done via the .yaml config file, without having to directly interact with the code
- Configuration can be extended via YamlConfigLoader.py
- t2yLLM should be used on a local network only, since everything is sent in clear text for now
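For the transcription step referenced above, a minimal Faster-Whisper sketch looks like this; the model size, compute type and file name are placeholders rather than the project's configured values:

```python
# Minimal Faster-Whisper sketch; model size, compute type and audio file name
# are placeholders, not the values from the project config.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("utterance.flac", vad_filter=True)
print("detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```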
Create a .env file and use python-dotenv, or edit your ~/.bashrc :
- export PORCUPINE_KEY='myporcupinekey'
- export OPENWEATHERMAP_API_KEY='myopenweatherkey'
- export VLLM_ATTENTION_BACKEND=FLASH_ATTN #for V1 engine
- export VLLM_FLASH_ATTN_VERSION="2" #for V1 engine
- export VLLM_USE_V1=1
- export VLLM_WORKER_MULTIPROC_METHOD="spawn" #for V1 engine
- export TORCH_CUDA_ARCH_LIST='myarchitecture' #if needed
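If you go the .env route, python-dotenv loads these variables into the environment at startup; a minimal sketch:

```python
# Load variables from a local .env file into the process environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory by default

porcupine_key = os.getenv("PORCUPINE_KEY")
weather_key = os.getenv("OPENWEATHERMAP_API_KEY")
```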
Repositories used in t2yLLM project :
- vLLM
- Faster-Whisper
- Silero-vad
- pvporcupine
- openwakeword
- Whisper-streaming
- RealtimeSTT
- Chromadb
- piperTTS
- json_repair
- FastAPI
- pydantic
- librespot
- Spotipy
- Tyradex
- pokeapi
- OpenWeather
- Spotify-dev
- Plugins can be added to the ./plugins folder. See example.py in ./plugins and pluginManager.py for implementation details (a hedged sketch follows the plugin list below).
- Plugins have to be activated and deactivated via the config
- List of supported plugins :
- date
- time
- weather
- wikipedia
- pokemon
- yeelight *
- Spotify *
- Yeelight and Spotify (marked *) are harder to set up; read the setup process first.
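To give a rough idea of what a plugin could look like, here is a purely hypothetical skeleton; the class name, method names and registration mechanism are assumptions, so treat ./plugins/example.py and pluginManager.py as the authoritative reference:

```python
# Purely hypothetical plugin skeleton: class and method names are assumptions,
# see ./plugins/example.py and pluginManager.py for the real interface.
class GreetingPlugin:
    """Toy plugin that answers greetings (illustrative only)."""

    name = "greeting"  # assumed: plugins expose a name used in the config

    def matches(self, prompt: str) -> bool:
        # assumed: the plugin manager asks each plugin whether it handles a prompt
        return prompt.lower().startswith(("hello", "hi "))

    def run(self, prompt: str) -> str:
        # assumed: the plugin returns text that is merged into the LLM answer
        return "Hello! How can I help you today?"
```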
- async memory handler for non-blocking operations when dealing with memory
- WebOS plugin implementation
- use FastAPI errors
- use pydantic for UUID validation
This code is under the MIT license. Please mention me as the author if you find this code useful.
Copyright (c) 2025 Saga9103
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.