fix(docker) rocm 6.3 based image #8152
base: main
docker/Dockerfile:

```diff
@@ -31,7 +31,8 @@ RUN --mount=type=cache,target=/var/cache/apt \
     libglx-mesa0 \
     build-essential \
     libopencv-dev \
-    libstdc++-10-dev
+    libstdc++-10-dev \
+    wget

 ENV \
     PYTHONUNBUFFERED=1 \
```
```diff
@@ -44,7 +45,6 @@ ENV \
     UV_MANAGED_PYTHON=1 \
     UV_LINK_MODE=copy \
     UV_PROJECT_ENVIRONMENT=/opt/venv \
-    UV_INDEX="https://download.pytorch.org/whl/cu124" \
     INVOKEAI_ROOT=/invokeai \
     INVOKEAI_HOST=0.0.0.0 \
     INVOKEAI_PORT=9090 \
```
```diff
@@ -54,6 +54,10 @@ ENV \

 ARG GPU_DRIVER=cuda

+ARG CUDA_TORCH="https://download.pytorch.org/whl/cu124"
+ARG CPU_TORCH="https://download.pytorch.org/whl/cpu"
+ARG ROCM_TORCH="https://download.pytorch.org/whl/rocm6.2.4"
+
 # Install `uv` for package management
 COPY --from=ghcr.io/astral-sh/uv:0.6.9 /uv /uvx /bin/
```
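Promoting the index URLs to build args means a variant image can be built without editing the Dockerfile. A hypothetical invocation (the image tag, Dockerfile path, and rocm6.3 index URL below are illustrative, not part of this diff):

```sh
# Hypothetical build of the ROCm variant, overriding the default torch index.
docker build \
  --build-arg GPU_DRIVER=rocm \
  --build-arg ROCM_TORCH="https://download.pytorch.org/whl/rocm6.3" \
  -f docker/Dockerfile \
  -t invokeai:rocm .
```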
```diff
@@ -72,23 +76,41 @@ WORKDIR ${INVOKEAI_SRC}
 # x86_64/CUDA is the default
 RUN --mount=type=cache,target=/root/.cache/uv \
     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
-    --mount=type=bind,source=uv.lock,target=uv.lock \
+    # Cannot use uv sync and uv.lock as that is locked to CUDA version packages, which breaks rocm...
+    # --mount=type=bind,source=uv.lock,target=uv.lock \
+    # this is just to get the package manager to recognize that the project exists, without making changes to the docker layer
+    --mount=type=bind,source=invokeai/version,target=invokeai/version \
-    if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then UV_INDEX="https://download.pytorch.org/whl/cpu"; \
-    elif [ "$GPU_DRIVER" = "rocm" ]; then UV_INDEX="https://download.pytorch.org/whl/rocm6.2"; \
+    ulimit -n 30000 && \
+    if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then export UV_INDEX="$CPU_TORCH"; \
+    elif [ "$GPU_DRIVER" = "rocm" ]; then export UV_INDEX="$ROCM_TORCH"; \
+    else export UV_INDEX="$CUDA_TORCH"; \
     fi && \
-    uv sync --frozen
+    uv venv --python 3.12 && \
+    # Use the public version to install existing known dependencies but using the UV_INDEX, not the hardcoded URLs within the uv.lock
+    uv pip install invokeai
```
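For clarity, the selection above behaves like this standalone shell sketch (the URLs mirror the ARG defaults; TARGETPLATFORM is supplied by BuildKit during a real build):

```sh
# Standalone sketch of the index selection above; values mirror the ARG defaults.
GPU_DRIVER="${GPU_DRIVER:-cuda}"
if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then
    export UV_INDEX="https://download.pytorch.org/whl/cpu"
elif [ "$GPU_DRIVER" = "rocm" ]; then
    export UV_INDEX="https://download.pytorch.org/whl/rocm6.2.4"
else
    export UV_INDEX="https://download.pytorch.org/whl/cu124"
fi
echo "uv will resolve torch from: $UV_INDEX"
```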
```diff
+RUN --mount=type=cache,target=/var/cache/apt \
+    --mount=type=cache,target=/var/lib/apt \
+    if [ "$GPU_DRIVER" = "rocm" ]; then \
+        wget -O /tmp/amdgpu-install.deb \
+            https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb && \
+        apt install -y /tmp/amdgpu-install.deb && \
+        apt update && \
+        amdgpu-install --usecase=rocm -y && \
+        apt-get autoclean && \
+        apt clean && \
+        rm -rf /tmp/* /var/tmp/* && \
+        usermod -a -G render ubuntu && \
+        usermod -a -G video ubuntu && \
+        echo "\\n/opt/rocm/lib\\n/opt/rocm/lib64" >> /etc/ld.so.conf.d/rocm.conf && \
+        ldconfig && \
+        update-alternatives --auto rocm; \
+    fi
```

> **Reviewer:** This is likely unnecessary. The GPU driver should be provided by the kernel, and ROCm itself is usually not needed in the image because it's already bundled with PyTorch. That is, unless something changed in the most recent torch/ROCm that makes this a requirement. (To be clear, the video/render group additions for the ubuntu user are needed and should be kept.)
>
> **Author:** Skipped the ROCm install, but kept the groups, and got: […] I went and looked at the rocm-pytorch Docker image, and they install the full rocm-dev; I limited it to just the ROCm binaries (also tried HIP alone, but that still errored). Suggestions?
>
> **Reviewer:** To be sure - are you using […]
>
> **Author:** No, that's my goal: I don't want to have to modify the host, and I want to ensure the container has everything. I'm running a Proxmox host with a Docker LXC.
>
> **Author:** If this isn't ideal, I can split that logic out into my own build and have this one build the minimal way, or make it another config?
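If the reviewer's point holds (driver from the host kernel, ROCm userspace bundled with the torch wheels), the whole block could in principle shrink to just the group additions. A minimal sketch of that alternative, assuming the bundled libraries really are sufficient:

```dockerfile
# Minimal sketch: keep only the device-access group additions and skip
# amdgpu-install entirely. Assumes the rocm torch wheels bundle the needed
# ROCm userspace libraries and the host kernel provides the amdgpu driver.
RUN if [ "$GPU_DRIVER" = "rocm" ]; then \
        usermod -a -G render ubuntu && \
        usermod -a -G video ubuntu; \
    fi
```

Per the author's report above, this was not sufficient in their Proxmox/LXC setup, which is why the full amdgpu-install path remains in the diff.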
```diff
 # build patchmatch
 RUN cd /usr/lib/$(uname -p)-linux-gnu/pkgconfig/ && ln -sf opencv4.pc opencv.pc
 RUN python -c "from patchmatch import patch_match"

 # Link amdgpu.ids for ROCm builds
 # contributed by https://github.com/Rubonnek
 RUN mkdir -p "/opt/amdgpu/share/libdrm" &&\
     ln -s "/usr/share/libdrm/amdgpu.ids" "/opt/amdgpu/share/libdrm/amdgpu.ids"
```

(ebr marked this conversation as resolved.)

```diff
 RUN mkdir -p ${INVOKEAI_ROOT} && chown -R ${CONTAINER_UID}:${CONTAINER_GID} ${INVOKEAI_ROOT}

 COPY docker/docker-entrypoint.sh ./
```
```diff
@@ -105,9 +127,12 @@ COPY invokeai ${INVOKEAI_SRC}/invokeai
 # in a previous layer
 RUN --mount=type=cache,target=/root/.cache/uv \
     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
-    --mount=type=bind,source=uv.lock,target=uv.lock \
-    if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then UV_INDEX="https://download.pytorch.org/whl/cpu"; \
-    elif [ "$GPU_DRIVER" = "rocm" ]; then UV_INDEX="https://download.pytorch.org/whl/rocm6.2"; \
+    # Cannot use the uv.lock as that is locked to CUDA version packages, which breaks rocm...
+    # --mount=type=bind,source=uv.lock,target=uv.lock \
+    ulimit -n 30000 && \
+    if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then export UV_INDEX="$CPU_TORCH"; \
+    elif [ "$GPU_DRIVER" = "rocm" ]; then export UV_INDEX="$ROCM_TORCH"; \
+    else export UV_INDEX="$CUDA_TORCH"; \
     fi && \
     uv pip install -e .
```

> **Reviewer:** This ulimit doesn't affect much; wondering what's the reason for it here, and why the value of 30000?
>
> **Author:** CUDA and CPU don't hit the limit, but with ROCm the build fails because too many files are opened. I can try lowering the limit if it concerns you; I just picked something high enough to continue and never went back.
>
> **Reviewer:** Doesn't matter much, since this only applies during the build; it's just really weird that it's needed at all.
> **Author:** I could conditionalize this logic to use the uv.lock for CUDA and the UV_INDEX for CPU and ROCm, to reduce the risk of this change, but I went with this approach for consistency.
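A rough sketch of the conditional the author describes, assuming the uv.lock bind mount can stay in place on all paths (the `uv pip` interface simply ignores it):

```dockerfile
# Sketch only: lockfile-faithful install for CUDA, index-based install for CPU/ROCm.
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    if [ "$TARGETPLATFORM" = "linux/arm64" ] || [ "$GPU_DRIVER" = "cpu" ]; then \
        export UV_INDEX="$CPU_TORCH" && uv pip install -e .; \
    elif [ "$GPU_DRIVER" = "rocm" ]; then \
        export UV_INDEX="$ROCM_TORCH" && uv pip install -e .; \
    else \
        uv sync --frozen; \
    fi
```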
> **Reviewer:** It would be preferable to continue using uv.lock for the CUDA images, if possible, to keep them consistent with the installations produced by the official installer. Ideally - if you're willing to work on this - we should find a way to support both CUDA and ROCm dependencies in a single uv.lock/pyproject.toml, perhaps by leveraging uv dependency groups: https://docs.astral.sh/uv/concepts/projects/config/#conflicting-dependencies
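For reference, the pattern from the linked uv docs looks roughly like the following pyproject.toml sketch; the extra names, torch constraint, and index pins below are illustrative, not taken from this repository:

```toml
# Sketch of uv's conflicting-extras pattern (see the uv docs linked above).
[project.optional-dependencies]
cu124 = ["torch>=2.4"]
rocm = ["torch>=2.4"]

[tool.uv]
# Declare the extras as mutually exclusive so uv can lock both.
conflicts = [[{ extra = "cu124" }, { extra = "rocm" }]]

[tool.uv.sources]
torch = [
  { index = "pytorch-cu124", extra = "cu124" },
  { index = "pytorch-rocm", extra = "rocm" },
]

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true

[[tool.uv.index]]
name = "pytorch-rocm"
url = "https://download.pytorch.org/whl/rocm6.2.4"
explicit = true
```

With something like this in place, a single uv.lock could cover both stacks, and the Dockerfile could select one with `uv sync --frozen --extra rocm` (or `--extra cu124`) instead of overriding UV_INDEX.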
> **Author:** Updated the uv.lock; there are some notes about things in the pyproject.toml that I would like your input on.