-
Any reason why you haven't installed Ubuntu natively on your main box? Various ML packages seem to run primarily on Ubuntu.
-
I set up 4x MI250 GPU passthrough in a VM with Ubuntu. It worked with llama.cpp.
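For reference, the host side of a VFIO passthrough setup generally looks like the sketch below; the device IDs and grep pattern are placeholders you'd read off your own machine, not MI250-specific values:

```bash
# 1. Find the GPU's vendor:device ID pair (placeholder grep pattern):
lspci -nn | grep -i 'Instinct'

# 2. On the kernel command line, enable the IOMMU and bind the card to
#    vfio-pci so the guest VM can claim it (<vvvv:dddd> from step 1):
#    amd_iommu=on iommu=pt vfio-pci.ids=<vvvv:dddd>

# 3. After reboot, confirm vfio-pci (not amdgpu) owns the device:
lspci -nnk -d <vvvv:dddd>
```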
-
The ROCm multi-GPU system I am running requires P2P to be disabled to get reasonable output. Do you know if peer-to-peer is enabled on your system?
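A quick way to inspect this, plus the workaround that is often reported to fix garbled multi-GPU output on ROCm, is sketched below; the binary name and flags assume a recent llama.cpp build:

```bash
# Show the inter-GPU link topology (whether P2P/XGMI paths exist):
rocm-smi --showtopo

# Commonly reported workaround: disable the SDMA engines so inter-GPU
# transfers take a different path, avoiding broken P2P copies:
HSA_ENABLE_SDMA=0 ./llama-cli -m model.gguf -ngl 99
```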
-
Hello, I have an RX 7900 XTX and see exactly the same errors in the dmesg log on llama.cpp startup.
-
Observing the same problem on Gentoo, but perhaps my build is off, since I'm trying it on unsupported hardware. Did any of you succeed in fixing it on your side?
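On officially unsupported cards, one widely used workaround is to override the gfx target that ROCm detects; which override (if any) works depends on the card, so treat the values below as examples rather than a recommendation:

```bash
# See which gfx target the runtime detects for your card:
rocminfo | grep -i gfx

# Many RDNA2 consumer cards run by spoofing the supported gfx1030
# target; RDNA3 cards typically use 11.0.0 instead:
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./llama-cli -m model.gguf -ngl 99
```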
-
I wanted to share what I've learned in the many, many hours spent trying to get x2 MI100s working with llama.cpp. This guide may also apply to other cards such as the MI25, MI60, MI200, MI300, etc. Hopefully this saves someone from the nightmare that is ROCm. If anyone has any questions, please don't hesitate to ask!
The Problem: x1 MI100 works fine on Arch Linux; however, x2+ GPUs result in segfaults and eventually crashes. I believe this may be caused by the requirement for the proprietary amdgpu-dkms driver, but I'm not 100% sure.
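If you want to check which driver stack your cards are actually using, something like the following works; the assumption that the DKMS build reports an explicit version string reflects typical packaged builds:

```bash
# Which kernel module is bound to the cards:
lspci -nnk | grep -iA3 'Instinct'

# Whether the amdgpu-dkms driver is installed at all:
dkms status

# The DKMS build usually reports an explicit version string; the
# in-tree amdgpu module typically does not:
modinfo amdgpu | grep -i '^version'
```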
Errors on Arch Linux with the latest rocm-hip-sdk
Steps to get Multi-GPU working
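As a rough baseline (not necessarily the exact procedure used here), a HIP build of llama.cpp targeting the MI100's gfx908 architecture usually looks something like the following; flag names vary between llama.cpp versions, and older trees use -DLLAMA_HIPBLAS=ON instead:

```bash
# Build with the HIP backend for gfx908 (MI100):
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx908
cmake --build build --config Release -j

# Run with all layers offloaded, splitting the model across both cards:
./build/bin/llama-cli -m model.gguf -ngl 99 --split-mode layer
```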
Additional Notes:
x2 MI100 Speed - 70B t/s with Q6_K
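If you want to reproduce or compare numbers like these, llama.cpp ships a benchmark tool; the model path below is a placeholder:

```bash
# Report prompt-processing and generation speed in t/s with all
# layers offloaded to the GPUs:
./llama-bench -m model-70b-q6_k.gguf -ngl 99
```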
I hope this saves someone from the nightmare of ROCm. Have fun!