🦙 Llama Assist



Llama Assist is a Home Assistant integration that lets you interact with almost any LLM (Large Language Model) through any OpenAI-API-compatible backend, such as llama.cpp.

This integration creates a new Conversation agent in Home Assistant, which can be selected in the Voice Assistants section of the Home Assistant UI and used to interact with the LLM.

Important

This is NOT a llama.cpp backend; it connects to an existing llama.cpp backend running on your local network or accessible over the internet.

🧰 Features

  • Lightweight and fast
  • Easy to set up and use
  • Supports any LLM supported by llama.cpp (or other OpenAI-API-compatible backends)
  • Supports all built-in Home Assistant Assist actions
  • Supports embeddings for lightning-fast responses (about 50% less time) and a lower token count (about 65% fewer tokens)
  • Additional actions for more advanced interactions (COMING SOON)

📖 Installation

Via HACS


  1. Install HACS if not already installed.
  2. In Home Assistant, go to "HACS" in the sidebar.
  3. Click on "Integrations."
  4. Click on the three dots in the top right corner and select "Custom repositories."
  5. Paste the following URL in the "Repo" field: https://github.com/M4TH1EU/llama-assist
  6. Select "Integration" from the "Category" dropdown.
  7. Click "Add."
  8. Search for "Llama Assist" and click "Install."

Manually

  1. Download the latest release from the GitHub repository.
  2. Extract the downloaded ZIP file.
  3. Copy the custom_components/llama_assist directory to the config/custom_components/ directory in your Home Assistant instance (see the example commands below).
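
For reference, here is a minimal shell sketch of the manual installation. The download URL grabs the main branch rather than a specific release, and the Home Assistant config path is an assumption; adjust both to your setup.

#!/bin/bash
# Hedged example: manual installation from a shell (paths are placeholders).
cd /tmp
wget https://github.com/M4TH1EU/llama-assist/archive/refs/heads/main.zip -O llama-assist.zip
unzip llama-assist.zip
# Copy the integration into your Home Assistant configuration directory.
cp -r llama-assist-main/custom_components/llama_assist /path/to/homeassistant/config/custom_components/
# Restart Home Assistant afterwards so the integration is picked up.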

πŸ› οΈ Usage

Go to Settings -> Devices & Services -> Add Integration, search for "Llama Assist", and fill in the required fields.

To use this integration, you must set up a llama.cpp HTTP backend. See the instructions here
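
As a rough sketch (not an official configuration), a llama.cpp HTTP backend can be started with llama-server. The model path, host, port, and context size below are placeholders, and on recent llama.cpp builds the --jinja flag is typically needed for tool/function calling to work.

#!/bin/bash
# Hedged example: serve a chat model over HTTP for Llama Assist to connect to.
# Model path, host, port and context size are placeholders.
./llama-server \
  -m /path/to/Qwen3-4B-Q4_K_M.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 8192 \
  --jinja

The integration then points at the server's URL (for example http://<server-ip>:8080); llama-server exposes its OpenAI-compatible endpoints under /v1.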

🖥️ Other backends?

Using llama.cpp is the recommended way, but any OpenAI-API-compatible backend with tool/function calling support should also work with this integration, although this is untested.

The official OpenAI API is not supported yet, but support will probably be added in the future.
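
If you want to check whether another backend speaks the OpenAI-style API at all, querying its model list endpoint is a quick, non-authoritative test; the host and port below are placeholders.

#!/bin/bash
# Hedged check: most OpenAI-API-compatible servers answer on /v1/models.
curl -s http://192.168.1.50:8080/v1/models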

Recommended models

This is only a personal recommendation based on my testing; you can use any model you want, as long as it is compatible with the llama.cpp backend or your OpenAI-API-compatible backend.

The model you choose must support tool/function calling.

| Model Name | Size | Notes |
|------------|------|-------|
| Qwen3 | 0.6B | Fast and lightweight, reasonable for CPU (with reasoning enabled) |
| Qwen3 | 1.7B | Better quality but slower on CPU |
| Qwen3 | 4B | Good quality, almost instant answers on GPU (without reasoning) |
| Qwen3 | 14B | High quality, requires GPU for reasonable performance |
| Qwen3 | 32B | Wake up J.A.R.V.I.S. Daddy's home* |

Note

If you have good experiences with other models, please open an issue or a pull request to add them to this list.
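
As a hedged illustration only, recent llama-server builds (compiled with CURL support) can pull a GGUF straight from Hugging Face with -hf; the repository name below is an assumption, so check Hugging Face for the exact Qwen3 GGUF repo and quantization you want.

#!/bin/bash
# Hedged example: download and serve a Qwen3 GGUF directly from Hugging Face.
# The repository name is an assumption; a quantization tag such as :Q4_K_M can
# usually be appended to pick a specific file.
./llama-server -hf Qwen/Qwen3-4B-GGUF --host 0.0.0.0 --port 8080 --jinja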

💨 Embeddings

Llama Assist supports embeddings, which can significantly improve the assistant's performance by reducing the number of entity and function descriptions the LLM has to process in the initial and subsequent requests. This is especially useful on low-end systems or when you have many entities and functions in your Home Assistant.

Note

Embeddings work by analyzing the user input and the available entities and functions in Home Assistant, then selecting the most relevant ones to use when building the response.
While this is generally very effective, it can sometimes lead to unexpected results, such as the system not recognizing an entity or function that you expect it to.
Please report any issues you encounter with embeddings to help improve the system.

Embeddings are disabled by default; you can enable them in the configuration if you want to use them.

Note

To use embeddings with the llama.cpp backend, you will have to run a separate instance of the llama.cpp server with the --embedding flag enabled. See here for more details.
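
For example, a second llama-server instance dedicated to embeddings could be launched as follows; the embedding model and port are placeholders, and only the --embedding flag itself comes from the note above.

#!/bin/bash
# Hedged example: a dedicated llama.cpp server instance for embeddings.
# Embedding model path and port are placeholders.
./llama-server \
  -m /path/to/nomic-embed-text-v1.5.Q8_0.gguf \
  --embedding \
  --host 0.0.0.0 --port 8081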

⚡ Performance

In this example, we compare the system behavior with and without embeddings on a low-end system (CPU only, Intel i5-11400, 4 cores) for a simple request:

User: Hi Jarvis!
Assistant (1): Hello! How can I assist you today?
User: Add strawberries to my shopping list.
ToolCall (2): HassShoppingListAddItem
Assistant (3): Strawberries have been added to your shopping list.

Without embeddings:

| Message | Time (prompt + completion, ms) | Tokens (prompt + completion) | Content summary |
|---------|--------------------------------|------------------------------|-----------------|
| 1 | 7855 + 2581 = 10s | 1920 + 84 | Greeting |
| 2 | 8477 + 4282 = 13s | 1947 + 136 | 🔧 ToolCall → Add to Shopping List |
| 3 | 712 + 3944 = 5s | 2042 + 120 | ✅ Confirmation (Strawberries added) |
| Total | ~28s | ~6200 | |

With embeddings:

| Message | Time (prompt + completion, ms) | Tokens (prompt + completion) | Content summary |
|---------|--------------------------------|------------------------------|-----------------|
| 1 | 1700 + 2312 = 4s | 584 + 90 | Greeting |
| 2 | 1483 + 2554 = 4s | 497 + 102 | 🔧 ToolCall → Add to Shopping List |
| 3 | 445 + 3375 = 4s | 592 + 131 | ✅ Confirmation (Strawberries added) |
| Total | ~12s | ~2000 | |

This reduction in time and tokens enables low-end systems to use LLMs more effectively.

βš™οΈ Build llama.cpp

Official documentation can be found here.

You might be able to use the pre-built executables available in the releases of the llama.cpp repository.

Note

These scripts are provided as examples that worked for me; you may need to adapt them to your system.
Please do NOT open issues related to building llama.cpp; that is not the purpose of this repository.
If you have issues, please open an issue on the llama.cpp repository.

Intel CPUs (oneAPI)

This script builds llama.cpp with the Intel oneAPI compiler.

#!/bin/bash
sudo apt install intel-oneapi-base-toolkit # Required to build llama.cpp for Intel CPUs

rm -Rf llama.cpp
git clone --depth=1 https://github.com/ggerganov/llama.cpp.git llama.cpp

source /opt/intel/oneapi/setvars.sh # Skip this step if running inside the oneapi-basekit Docker image; it is only required for manual installations
cd llama.cpp/
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
cmake --build build --config Release

The executable will be at llama.cpp/build/bin/llama-server.
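
A hedged example of launching the resulting binary; the model path, thread count, and port are placeholders. Because the build links against oneAPI libraries, you may need to source setvars.sh in the shell that runs the server as well.

#!/bin/bash
# Hedged example: run the oneAPI build on CPU; adjust paths, threads and port.
source /opt/intel/oneapi/setvars.sh # make the oneAPI runtime libraries available
./llama.cpp/build/bin/llama-server -m /path/to/model.gguf -t 8 --host 0.0.0.0 --port 8080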

AMD GPU (ROCm)

This script builds llama.cpp with the AMD ROCm toolchain; it has been tested on Fedora 42 with ROCm 6.3.1.

#!/bin/bash

# This script compiles llamacpp for ROCM under fedora (tested on 42), must have all 'rocm*'
# packages installed along with hipblas and other stuff...
# sudo dnf install 'rocm*' 'hipblaslt' 'hipblas-*' rocblas-devel make gcc cmake libcurl-devel

rm -rf sources/
git clone --depth=1 https://github.com/ggerganov/llama.cpp.git sources

cd sources/

MAX_THREADS=8

# Automatically detect HIP configuration paths
HIPCXX=$(hipconfig -l)/clang
HIP_PATH=$(hipconfig -R)
HIP_VISIBLE_DEVICES=0 # GPU index to expose to HIP (first GPU)

# Ensure hipconfig is successful
if [[ -z "$HIP_PATH" ]]; then
  echo "Error: Unable to detect HIP_PATH. Ensure HIP is correctly installed."
  exit 1
fi

# Automatically detect AMDGPU_TARGETS
AMDGPU_TARGET=$(rocminfo | grep gfx | head -1 | awk '{print $2}')
if [[ -z "$AMDGPU_TARGET" ]]; then
  echo "Error: Unable to detect AMDGPU target using rocminfo."
  exit 1
fi

# Find HIP device library path
HIP_DEVICE_LIB_PATH=$(find "${HIP_PATH}" -name "oclc_abi_version_400.bc" -exec dirname {} \; | head -n 1)
if [[ -z "$HIP_DEVICE_LIB_PATH" ]]; then
  echo "Error: Unable to find oclc_abi_version_400.bc under HIP_PATH."
  exit 1
fi

# Export necessary paths
export HIPCXX
export HIP_PATH
export HIP_VISIBLE_DEVICES
export HIP_DEVICE_LIB_PATH
export DEVICE_LIB_PATH=$HIP_DEVICE_LIB_PATH
export ROCM_PATH=/usr/

# Automatically detect clang and clang++ if installed
CLANG_C_COMPILER=$(which clang)
CLANG_CXX_COMPILER=$(which clang++)

# Ensure clang is detected
if [[ ! -x "$CLANG_C_COMPILER" ]]; then
  echo "Error: clang compiler not found."
  exit 1
fi
if [[ ! -x "$CLANG_CXX_COMPILER" ]]; then
  echo "Error: clang++ compiler not found."
  exit 1
fi

# Clean build directory
rm -rf build/*
# Run cmake with dynamically detected variables
cmake -S . -B build \
  -DGGML_HIPBLAS=ON \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS="$AMDGPU_TARGET" \
  -DCMAKE_C_COMPILER="$CLANG_C_COMPILER" \
  -DCMAKE_CXX_COMPILER="$CLANG_CXX_COMPILER" \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_PREFIX_PATH=$ROCM_PATH
  
# Build the project
cmake --build build --config Release -- -j $MAX_THREADS

The executable will be at sources/build/bin/llama-server.
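
A hedged example of running the ROCm build with the model offloaded to the GPU; the model path, port, and GPU index are placeholders.

#!/bin/bash
# Hedged example: run the ROCm build with all layers offloaded to the first GPU.
export HIP_VISIBLE_DEVICES=0 # index of the GPU to use
./sources/build/bin/llama-server -m /path/to/Qwen3-4B-Q4_K_M.gguf -ngl 99 --host 0.0.0.0 --port 8080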
