
[Build] Build hangs with ROCm #25596

@copper-s

Description

Describe the issue

Past 34%, the build hangs at random points, and each time I have to press Ctrl+C and restart the whole process.
Now, more than 20 hours into the compilation, it has been stuck at 61% no matter what I do.

<redacted to reduce spam>

[ 48%] Built target onnxruntime_lora
[ 52%] Built target onnxruntime_optimizer
[ 54%] Built target onnxruntime_session
[ 61%] Built target onnxruntime_providers
[ 61%] Built target onnx_test_runner
[ 61%] Built target onnxruntime_perf_test

<redacted to reduce spam>

[ 61%] Building HIP object CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vr_psddv_bias.cpp.o
[ 61%] Building HIP object CMakeFiles/onnxruntime_composable_kernel_fmha.dir/_deps/composable_kernel-build/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_w32x32x16_qr_async_vc_psskddv_bias.cpp.o

Oddly enough, all 16 logical cores are at 100% CPU utilization, which implies that it is doing something.
Memory usage is nowhere near high enough to trigger swapping: 19.3 GB out of 64 GB.
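One way to check whether that 100% CPU load comes from compiler processes that are still progressing, or from a process that has been working on the same source file for hours, is to look at each compiler process's age, accumulated CPU time and memory. The snippet below is a hypothetical helper (filter_compiler_procs is an illustrative name, not part of the onnxruntime build); it assumes procps ps is installed and that ROCm's toolchain appears in the process table as clang*, hipcc* or ld.lld.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; not part of the onnxruntime build scripts.
# Reads "ps -eo pid,etime,time,rss,args" output on stdin and keeps the header
# plus any process whose executable name starts with clang, hipcc or ld.lld
# (assumed to be how ROCm's compiler toolchain shows up).
#   ELAPSED = wall-clock time since the process started
#   TIME    = CPU time the process has accumulated
#   RSS     = resident memory in KiB
#   COMMAND = full command line, including the source file being compiled
filter_compiler_procs() {
  awk 'NR == 1 || $5 ~ "(^|/)(clang|hipcc|ld[.]lld)[^/]*$"'
}

# Take one snapshot if procps' ps is available.
if command -v ps >/dev/null 2>&1; then
  ps -eo pid,etime,time,rss,args | filter_compiler_procs
fi
```

Taking two snapshots a few minutes apart shows whether TIME still grows for the same PID, and the COMMAND column shows which fmha_fwd_*.cpp file each process is compiling. A single clang++ process whose ELAPSED runs into hours on one file would suggest a compiler problem rather than a build-system one.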

I'm unsure whether this is a problem with the project itself or a bug in clang-rocm.

I also tried to rule out DDR5 memory issues by running the VT3 memory test bundled with y-cruncher; nothing out of the ordinary showed up.

In the meantime, could someone provide pre-built x86-64 binaries for Ubuntu 24.04 with the ROCm and MIGraphX execution providers and debug symbols?

Urgency

No response

Target platform

Unprivileged Ubuntu 24.04 LXC container (host: Fedora 42, Ryzen 7 9800X3D, 64 GB DDR5-6000 CL28, Radeon RX 7900 XTX)

Build script

# Might have missed one or two commands

apt update && apt upgrade
apt install build-essential

wget https://repo.radeon.com/amdgpu-install/6.4.2/ubuntu/noble/amdgpu-install_6.4.60402-1_all.deb
sudo apt install ./amdgpu-install_6.4.60402-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm

wget https://repo.radeon.com/amdgpu-install/6.4.2/ubuntu/noble/amdgpu-install_6.4.60402-1_all.deb
sudo apt install ./amdgpu-install_6.4.60402-1_all.deb
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms

git clone https://github.com/microsoft/onnxruntime.git
cd onnxruntime
git checkout tags/v1.22.1

./build.sh \
  --config RelWithDebInfo \
  --parallel \
  --use_rocm --rocm_home /opt/rocm \
  --use_migraphx --migraphx_home /opt/rocm \
  --allow_running_as_root # only needed when building as root without a separate user account
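If the stall turns out to be one or more extremely slow composable_kernel FMHA compiles rather than a true deadlock, a rebuild with fewer concurrent jobs and with HIP code generated only for the local GPU may be worth trying. This is a hypothetical sketch (gfx_targets and ort_build_args are illustrative names). It assumes that build.sh's --parallel accepts an optional job count, that the standard CMake variable CMAKE_HIP_ARCHITECTURES is honored when passed through --cmake_extra_defines, and that the RX 7900 XTX reports itself as gfx1100 in rocminfo; none of this is confirmed to fix the hang.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Extract GPU ISA names (e.g. gfx1100) from rocminfo output on stdin.
# Agent sections contain lines like "  Name:   gfx1100". ISA lines
# ("Name: amdgcn-amd-amdhsa--gfx1100") and "Marketing Name:" lines are skipped.
gfx_targets() {
  awk '$1 == "Name:" && $2 ~ /^gfx/ { print $2 }' | sort -u
}

# Print build.sh arguments one per line.
#   $1 = max parallel jobs (default 4; a bare --parallel is assumed to mean
#        one job per logical core)
#   $2 = HIP target list for CMAKE_HIP_ARCHITECTURES (default gfx1100)
ort_build_args() {
  local jobs="${1:-4}"
  local arch="${2:-gfx1100}"
  printf '%s\n' \
    --config RelWithDebInfo \
    --parallel "$jobs" \
    --use_rocm --rocm_home /opt/rocm \
    --use_migraphx --migraphx_home /opt/rocm \
    --cmake_extra_defines "CMAKE_HIP_ARCHITECTURES=$arch" \
    --allow_running_as_root
}

# Usage, from the onnxruntime checkout:
#   arch="$(rocminfo | gfx_targets | paste -sd ';' -)"
#   mapfile -t args < <(ort_build_args 4 "$arch")
#   ./build.sh "${args[@]}"
```

A lower job count trades wall-clock time for less contention between the very large FMHA translation units. If the same fmha_fwd_* object still never finishes even with a single job, that would point more strongly at the compiler than at the build system.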

Error / output

None

Visual Studio Version

None

GCC / Compiler Version

gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

Metadata

Labels

build: build issues; typically submitted using template
ep:MIGraphX: issues related to AMD MI GraphX execution provider
ep:ROCm: questions/issues related to ROCm execution provider
