-
Any reason why you haven't installed Ubuntu natively on your main box? Various ML packages seem to run primarily on Ubuntu.
-
I set up 4x MI250 GPU passthrough in a VM with Ubuntu. It worked with llama.cpp.
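For reference, the host side of a VFIO passthrough setup generally looks like the sketch below; the device IDs and grep pattern are placeholders you'd read off your own machine, not MI250-specific values:

```bash
# 1. Find the GPU's vendor:device ID pair (placeholder grep pattern):
lspci -nn | grep -i 'Instinct'

# 2. On the kernel command line, enable the IOMMU and bind the card to
#    vfio-pci so the guest VM can claim it (<vvvv:dddd> from step 1):
#    amd_iommu=on iommu=pt vfio-pci.ids=<vvvv:dddd>

# 3. After reboot, confirm vfio-pci (not amdgpu) owns the device:
lspci -nnk -d <vvvv:dddd>
```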
-
The ROCm multi-GPU system I am running requires P2P to be disabled to get reasonable output. Do you know if peer-to-peer is enabled on your system?
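A quick way to inspect this, plus the workaround that is often reported to fix garbled multi-GPU output on ROCm, is sketched below; the binary name and flags assume a recent llama.cpp build:

```bash
# Show the inter-GPU link topology (whether P2P/XGMI paths exist):
rocm-smi --showtopo

# Commonly reported workaround: disable the SDMA engines so inter-GPU
# transfers take a different path, avoiding broken P2P copies:
HSA_ENABLE_SDMA=0 ./llama-cli -m model.gguf -ngl 99
```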
-
Hello, I have an RX 7900 XTX and see exactly the same errors in the dmesg log on llama.cpp startup.
-
Observing the same problem on Gentoo, but perhaps my build is off, since I'm trying it on unsupported hardware. Did any of you succeed in fixing it on your side?
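On officially unsupported cards, one widely used workaround is to override the gfx target that ROCm detects; which override (if any) works depends on the card, so treat the values below as examples rather than a recommendation:

```bash
# See which gfx target the runtime detects for your card:
rocminfo | grep -i gfx

# Many RDNA2 consumer cards run by spoofing the supported gfx1030
# target; RDNA3 cards typically use 11.0.0 instead:
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-cli -m model.gguf -ngl 99
```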
-
I wanted to share what I've learned in the many, many hours spent trying to get x2 MI100s working with llama.cpp. This guide may also apply to other cards such as the MI25, MI60, MI200, MI300, etc. Hopefully this saves someone from the nightmare that is ROCm. If anyone has any questions, please don't hesitate to ask!
The Problem: x1 MI100 works fine on Arch Linux; however, x2+ GPUs result in segfaults and eventually crashes. I believe this may be caused by the requirement for the proprietary amdgpu-dkms driver, but I'm not 100% sure.
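If you want to check which driver stack your cards are actually using, something like the following works; the assumption that the DKMS build reports an explicit version string reflects typical packaged builds:

```bash
# Which kernel module is bound to the cards:
lspci -nnk | grep -iA3 'Instinct'

# Whether the amdgpu-dkms driver is installed at all:
dkms status

# The DKMS build usually reports an explicit version string; the
# in-tree amdgpu module typically does not:
modinfo amdgpu | grep -i '^version'
```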
Errors on Arch Linux with the latest rocm-hip-sdk
Steps to get Multi-GPU working
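As a rough baseline (not necessarily the exact procedure used here), a HIP build of llama.cpp targeting the MI100's gfx908 architecture usually looks something like the following; flag names vary between llama.cpp versions, and older trees use -DLLAMA_HIPBLAS=ON instead:

```bash
# Build with the HIP backend for gfx908 (MI100):
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx908
cmake --build build --config Release -j

# Run with all layers offloaded, splitting the model across both cards:
./build/bin/llama-cli -m model.gguf -ngl 99 --split-mode layer
```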
Additional Notes:
x2 MI100 Speed - 70B t/s with Q6_K
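If you want to reproduce or compare numbers like these, llama.cpp ships a benchmark tool; the model path below is a placeholder:

```bash
# Report prompt-processing and generation speed in t/s with all
# layers offloaded to the GPUs:
./llama-bench -m model-70b-q6_k.gguf -ngl 99
```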
I hope this saves someone from the nightmare of ROCm. Have fun!