Description
Describe the issue
Past 34% progress, the build randomly hangs until I press Ctrl+C and restart the whole process.
But now, 20+ hours into the compilation, it has been permanently stuck at 61% no matter what I do.
<redacted to reduce spam>
[ 48%] Built target onnxruntime_lora
[ 52%] Built target onnxruntime_optimizer
[ 54%] Built target onnxruntime_session
[ 61%] Built target onnxruntime_providers
[ 61%] Built target onnx_test_runner
[ 61%] Built target onnxruntime_perf_test
<redacted to reduce spam>
[ 61%] Building HIP object CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vr_psddv_bias.cpp.o
[ 61%] Building HIP object CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psskddv_bias.cpp.o
Oddly enough, all 16 logical cores are at 100% CPU utilization, implying that it is doing something.
And the memory usage is nowhere near enough to trigger swapping: 19.3GB out of 64GB.
I'm unsure if this is a problem with the project itself, or a bug with clang-rocm.
Also I did try to rule out DDR5 memory issues by running the VT3 memory test bundled with y-cruncher and nothing out of the ordinary showed up.
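For anyone trying to reproduce the CPU observation, here is a minimal sketch of how the long-running compiler jobs could be listed. It assumes the procps-ng `ps` shipped with Ubuntu (which supports the `etimes` column) and that the compiler processes have `clang` or `hipcc` in their command name; `slow_compilers` and the 600-second threshold are hypothetical choices for illustration, not something I actually ran during the build.

```shell
# Keep rows whose command name (column 5) looks like a compiler and whose
# elapsed time in seconds (column 2) is at least the threshold in $1.
# The default threshold of 600 seconds is an arbitrary, illustrative value.
slow_compilers() {
    awk -v min="${1:-600}" '$5 ~ /clang|hipcc/ && $2 >= min'
}

# Columns: PID, elapsed seconds, %CPU, resident memory (KiB), command name.
# The trailing "=" on each column suppresses the header line.
ps -eo pid=,etimes=,pcpu=,rss=,comm= | slow_compilers 600
```

If the same handful of PIDs keep showing up with ever-growing elapsed times and roughly 100% CPU each, that would suggest individual translation units are compiling extremely slowly rather than the build being deadlocked.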
In the meantime, is someone able to provide me pre-built x86-64 binaries, with ROCm & MIGraphX execution providers + debug symbols for Ubuntu 24.04?
Urgency
No response
Target platform
Unprivileged Ubuntu 24.04 LXC (Host: Fedora 42, 9800X3D, 64GB CL28 6000MHz, 7900XTX)
Build script
# Might have missed one or two commands
apt update && apt upgrade
apt install build-essential
wget https://repo.radeon.com/amdgpu-install/6.4.2/ubuntu/noble/amdgpu-install_6.4.60402-1_all.deb
sudo apt install ./amdgpu-install_6.4.60402-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm
wget https://repo.radeon.com/amdgpu-install/6.4.2/ubuntu/noble/amdgpu-install_6.4.60402-1_all.deb
sudo apt install ./amdgpu-install_6.4.60402-1_all.deb
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms
git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
git checkout tags/v1.22.1
./build.sh \
--config RelWithDebInfo \
--parallel \
--use_rocm --rocm_home /opt/rocm \
--use_migraphx --migraphx_home /opt/rocm \
--allow_running_as_root # if you cba to make user account
Error / output
None
Visual Studio Version
None
GCC / Compiler Version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0