Home Assistant integration #575
-
Yes, I would say an OpenAI-compatible API. We haven't committed to an Ollama-compatible API yet, but we would be open to it (we run llama-server or a vllm server under the covers).
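For context, here is a minimal sketch of what talking to that OpenAI-compatible endpoint could look like; the port, model name, and use of the `openai` client are illustrative assumptions, not details confirmed in this thread.
```python
# Minimal sketch: query an OpenAI-compatible endpoint such as the one exposed
# by llama-server or vllm. The base_url and model name below are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint
    api_key="unused",                     # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="granite",  # placeholder model name
    messages=[{"role": "user", "content": "Turn on the living room lights."}],
)
print(response.choices[0].message.content)
```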
We would certainly need the help here, but yes, I agree.
-
It's an AMD 7600 XT, and I'm running with
```
devices:
  - "/dev/dri"   # render nodes
  - "/dev/kfd"   # ROCm compute interface
```
> On Thursday, January 16th, 2025 at 9:29 PM, Eric Curtin wrote:
> What's your GPU?
-
I got here from your shoutout on the Ollama Vulkan issue. I figured I'd continue here.
I'm working on a Raspberry Pi mini-ITX board that supports discrete AMD GPUs, with the goal of running LLMs inside Home Assistant. But there is no ROCm for ARM.
This needs several puzzle pieces.
So I'm very interested to explore what RamaLama brings to the table here. From what I understand, you're basically running llama.cpp in a container with an Ollama-compatible API, correct?
Do you support the native Ollama/OpenAI function-calling features (of the kind sketched below), or are you depending on the inference engine or client-side templates to support this?
My current implementation is llama.cpp + home-llm, which works, but the function calling isn't great.
What I think could be very interesting is a RamaLama HA integration/add-on that can automatically launch the right inference engine.
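On the function-calling question, below is a hedged sketch of what an OpenAI-style tool call against such an endpoint looks like; whether the model actually emits usable tool calls depends on the model and on the chat template the inference engine applies, which is exactly the uncertainty raised above. The endpoint, model name, and tool schema are illustrative assumptions.
```python
# Sketch of OpenAI-style function (tool) calling against an OpenAI-compatible
# server. Endpoint, model name, and the tool schema are illustrative only;
# real-world behavior depends on the model and the server's chat template.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "set_light_state",  # hypothetical Home Assistant-style action
        "description": "Turn a light on or off.",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string"},
                "state": {"type": "string", "enum": ["on", "off"]},
            },
            "required": ["entity_id", "state"],
        },
    },
}]

response = client.chat.completions.create(
    model="granite",  # placeholder model name
    messages=[{"role": "user", "content": "Turn off the kitchen light."}],
    tools=tools,
)

# If the model and template support tool calling, arguments arrive as JSON text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```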