
(feat) Add TransformerEngine build. #76


Merged: 55 commits, Sep 29, 2024

Commits
5f6ed4c
(feat) Add `TransformerEngine` build.
wbrown Aug 21, 2024
431fcf1
Fix typo in URI
wbrown Aug 21, 2024
c40675a
Compensate for weird release tagging.
wbrown Aug 21, 2024
bdbf84d
Change how we get the branch. Correct version tagging.
wbrown Aug 21, 2024
bd0bb04
Apparently they changed the release version tagging scheme.
wbrown Aug 21, 2024
ff2c28e
Their release tagging uses x.x.x, but the branches use x.x.
wbrown Aug 21, 2024
1045646
Do we need the double build?
wbrown Aug 21, 2024
42a67b8
Add `WORKDIR` back for TransformerEngine build.
wbrown Aug 21, 2024
2fa7f5d
We need cudnn9.
wbrown Aug 21, 2024
125f56f
Forgot the `fi`
wbrown Aug 21, 2024
f137049
Try adding CUDNN path.
wbrown Aug 21, 2024
55c9e59
We may need to install CUDNN.
wbrown Aug 21, 2024
9592467
Try again with package name.
wbrown Aug 21, 2024
71e6b2c
Names are hard.
wbrown Aug 21, 2024
f5d0bf1
Add `nsight-systems-cli`
wbrown Aug 21, 2024
fd534ed
Remove the `--depth` argument to the git submodule for `TransformerEn…
wbrown Aug 21, 2024
e0e3160
Remove `--depth` in another place.
wbrown Aug 21, 2024
d3829e6
Add `CUDA_*` env vars.
wbrown Aug 21, 2024
a38df66
Add `cuda-nvtx` package.
wbrown Aug 22, 2024
1ad2d97
Add `cuda-nvrtc`.
wbrown Aug 22, 2024
a13c609
`nvrtc-dev`
wbrown Aug 22, 2024
a155b06
Try TE 1.8
wbrown Aug 22, 2024
7b5bc5c
Python <3.9 doesn't have cache.
wbrown Aug 22, 2024
07433a8
Move the `sed`
wbrown Aug 22, 2024
05610f3
Disable CUDA 11.8
wbrown Aug 22, 2024
5c89026
Remove extraneous `;`.
wbrown Aug 22, 2024
3aa32cb
Disable ancient versions for now.
wbrown Aug 22, 2024
2d65aed
Provide both major and minor versions to cudnn.
wbrown Aug 22, 2024
d862c5e
Try installing cudnn before everything else.
wbrown Aug 22, 2024
adf06da
Fix typo in apt argument.
wbrown Aug 22, 2024
d274a8f
Install separate clauses
wbrown Aug 22, 2024
00e84b1
Remove apt-update.
wbrown Aug 22, 2024
7061b1a
Restore updating package version.
wbrown Aug 22, 2024
b63a883
Try no updates and flagging on cudnn major version.
wbrown Aug 22, 2024
98f1960
Move `transformerengine` build into `torch`
wbrown Aug 28, 2024
7151fc4
Remove errant `FROM`
wbrown Aug 28, 2024
5e397d1
Align `TransformerEngine` with the build pattern of the rest of the `…
wbrown Aug 28, 2024
4de958f
Add nvtx.
wbrown Aug 28, 2024
6283825
Remove cudnn.
wbrown Aug 28, 2024
422dba3
Add `cudnn` to build.
wbrown Aug 28, 2024
0976c2d
Remove `CC=./compiler`
wbrown Aug 29, 2024
12a20c7
Disable 12.0.1
wbrown Aug 29, 2024
fd9fe70
Switch to specific CUDNN runtime.
wbrown Aug 30, 2024
40f58ba
Changed the wrong Dockerfile.
wbrown Aug 30, 2024
aa91f8e
May not need CUDNN for `torch-extras` build.
wbrown Aug 30, 2024
2a478d7
build(torch): Install cuDNN when not present
Eta0 Aug 30, 2024
8db89f8
ci(torch): Disable remaining torch-nccl build with CUDA 12.0.1
Eta0 Aug 30, 2024
12e0613
fix(torch): Fix version check logic when cuDNN is not present
Eta0 Aug 30, 2024
732ef40
build(torch): Suppress a few less-important PyTorch build warnings
Eta0 Aug 30, 2024
e979736
refactor(torch): Redo Python version-checking logic
Eta0 Aug 30, 2024
b5e2ed7
feat(torch): Update TransformerEngine to unreleased ver. 1.11, `458c7de`
Eta0 Sep 27, 2024
1087fdd
fix(torch): Install `packaging` in the final `torch` image
Eta0 Sep 27, 2024
da54c0f
feat(torch): Move `flash-attn` from `torch-extras` to `torch`
Eta0 Sep 27, 2024
52fb3c4
fix(torch): Include `compiler_wrapper.f95` in the `torch` build context
Eta0 Sep 27, 2024
9184858
Merge pull request #78 from coreweave/es/te-base-flash-attn
wbrown Sep 27, 2024
6 changes: 6 additions & 0 deletions .github/configurations/torch-base.yml
@@ -4,6 +4,12 @@ exclude:
# Not a supported combination
- cuda: 11.8.0
os: ubuntu22.04
- cuda: 11.8.0
os: ubuntu20.04
- cuda: 12.0.1
os: ubuntu20.04
- cuda: 12.0.1
os: ubuntu22.04
include:
- torch: 2.4.0
vision: 0.19.0
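
With the entries added above, every CUDA 11.8.0 and 12.0.1 combination is excluded from the torch-base build matrix. A hypothetical shell sketch of what a matrix exclude accomplishes: enumerate the (cuda, os) pairs and skip the excluded ones. The real expansion is done by CI, not by a script like this.

    # Hypothetical illustration only: CI expands the matrix itself.
    for cuda in 11.8.0 12.0.1 12.4.1; do
      for os in ubuntu20.04 ubuntu22.04; do
        case "$cuda" in
          11.8.0|12.0.1) continue ;;  # the excluded combinations
        esac
        echo "build: cuda=$cuda os=$os"
      done
    done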
30 changes: 15 additions & 15 deletions .github/configurations/torch-nccl.yml
@@ -15,11 +15,11 @@ image:
os: ubuntu22.04
nccl: 2.19.3-1
nccl-tests-hash: 85f9143
- cuda: 12.0.1
cudnn: cudnn8
os: ubuntu22.04
nccl: 2.18.5-1
nccl-tests-hash: 85f9143
# - cuda: 12.0.1
# cudnn: cudnn8
# os: ubuntu22.04
# nccl: 2.18.5-1
# nccl-tests-hash: 85f9143
# Ubuntu 20.04
- cuda: 12.4.1
cudnn: cudnn
@@ -36,16 +36,16 @@ image:
os: ubuntu20.04
nccl: 2.21.5-1
nccl-tests-hash: 85f9143
- cuda: 12.0.1
cudnn: cudnn8
os: ubuntu20.04
nccl: 2.19.3-1
nccl-tests-hash: 85f9143
- cuda: 11.8.0
cudnn: cudnn8
os: ubuntu20.04
nccl: 2.16.5-1
nccl-tests-hash: 868dc3d
# - cuda: 12.0.1
# cudnn: cudnn8
# os: ubuntu20.04
# nccl: 2.19.3-1
# nccl-tests-hash: 85f9143
# - cuda: 11.8.0
# cudnn: cudnn8
# os: ubuntu20.04
# nccl: 2.16.5-1
# nccl-tests-hash: 868dc3d
include:
- torch: 2.4.0
vision: 0.19.0
67 changes: 19 additions & 48 deletions torch-extras/Dockerfile
@@ -2,17 +2,9 @@

ARG BASE_IMAGE
ARG DEEPSPEED_VERSION="0.14.4"
ARG FLASH_ATTN_VERSION="2.6.3"
ARG APEX_COMMIT="23c1f86520e22b505e8fdfcf6298273dff2d93d8"
ARG XFORMERS_VERSION="0.0.27.post2"

FROM alpine/git:2.36.3 as flash-attn-downloader
WORKDIR /git
ARG FLASH_ATTN_VERSION
RUN git clone --recurse-submodules --shallow-submodules -j8 --depth 1 \
--filter=blob:none --also-filter-submodules \
https://github.com/Dao-AILab/flash-attention -b v${FLASH_ATTN_VERSION}

FROM alpine/git:2.36.3 as apex-downloader
WORKDIR /git
ARG APEX_COMMIT
@@ -24,7 +16,6 @@ RUN git clone --filter=blob:none --depth 1 --no-single-branch --no-checkout \
--depth 1 --filter=blob:none && \
find -type d -name docs -prune -exec rm -r '{}' ';'


# Dependencies requiring NVCC are built ahead of time in a separate stage
# so that the ~2 GiB dev library installations don't have to be included
# in the final image.
@@ -34,7 +25,8 @@ RUN export \
CUDA_MINOR_VERSION=$(echo $CUDA_VERSION | cut -d. -f2) && \
export \
CUDA_PACKAGE_VERSION="${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}" && \
apt-get -qq update && apt-get install -y --no-install-recommends \
#apt-get install -y --no-install-recommends \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_PACKAGE_VERSION} \
cuda-nvml-dev-${CUDA_PACKAGE_VERSION} \
libcurand-dev-${CUDA_PACKAGE_VERSION} \
@@ -43,6 +35,8 @@ RUN export \
libcusolver-dev-${CUDA_PACKAGE_VERSION} \
cuda-nvprof-${CUDA_PACKAGE_VERSION} \
cuda-profiler-api-${CUDA_PACKAGE_VERSION} \
cuda-nvtx-${CUDA_PACKAGE_VERSION} \
cuda-nvrtc-dev-${CUDA_PACKAGE_VERSION} \
libaio-dev \
ninja-build && \
apt-get clean
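
The stage above derives apt package suffixes from the CUDA_VERSION provided by the base image: a release string like 12.4.1 becomes the 12-4 suffix used in package names such as cuda-nvcc-12-4. A minimal standalone sketch of that derivation, with an assumed example value:

    # Sketch of the suffix derivation used above: "12.4.1" -> "12-4".
    CUDA_VERSION="12.4.1"                                      # assumed example
    CUDA_MAJOR_VERSION="$(echo "$CUDA_VERSION" | cut -d. -f1)"
    CUDA_MINOR_VERSION="$(echo "$CUDA_VERSION" | cut -d. -f2)"
    CUDA_PACKAGE_VERSION="${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}"
    echo "cuda-nvcc-${CUDA_PACKAGE_VERSION}"                   # cuda-nvcc-12-4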
@@ -153,40 +147,6 @@ SHELL ["/bin/sh", "-c"]
WORKDIR /wheels


FROM builder-base as flash-attn-builder

SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN --mount=type=bind,from=flash-attn-downloader,source=/git/flash-attention,target=flash-attention/,rw \
python3 -m pip install -U --no-cache-dir \
packaging setuptools wheel pip && \
export CC=$(realpath -e ./compiler) \
MAX_JOBS="$(./scale.sh "$(./effective_cpu_count.sh)" 8 12)" \
NVCC_APPEND_FLAGS='-diag-suppress 186,177' \
PYTHONUNBUFFERED=1 \
FLASH_ATTENTION_FORCE_BUILD='TRUE' && \
cd flash-attention && \
( \
for EXT_DIR in $(realpath -s -e \
. \
csrc/ft_attention \
csrc/fused_dense_lib \
csrc/fused_softmax \
csrc/layer_norm \
csrc/rotary \
csrc/xentropy); \
do \
cd $EXT_DIR && \
python3 setup.py bdist_wheel --dist-dir /wheels && \
cd - || \
exit 1; \
done; \
) | \
grep -Ev --line-buffered 'ptxas info\s*:|bytes spill stores'
SHELL ["/bin/sh", "-c"]

WORKDIR /wheels


FROM builder-base as apex-builder

RUN LIBNCCL2_VERSION=$(dpkg-query --showformat='${Version}' --show libnccl2) && \
@@ -199,6 +159,17 @@ RUN LIBNCCL2_VERSION=$(dpkg-query --showformat='${Version}' --show libnccl2) &&
RUN --mount=type=bind,from=apex-downloader,source=/git/apex,target=apex/,rw \
python3 -m pip install -U --no-cache-dir \
packaging setuptools wheel pip && \
CUDA_MAJOR_VERSION=$(echo "${CUDA_VERSION}" | cut -d. -f1) && \
CHECK_VERSION() { \
dpkg-query --status "$1" 2>/dev/null \
| sed -ne 's/Version: //p' \
| grep .; \
} && \
LIBCUDNN_VER="$( \
CHECK_VERSION libcudnn8-dev || \
CHECK_VERSION "libcudnn9-dev-cuda-${CUDA_MAJOR_VERSION}" || \
:; \
)" && \
export CC=$(realpath -e ./compiler) && \
export MAX_JOBS="$(./scale.sh "$(./effective_cpu_count.sh)" 8 24)" && \
export NVCC_APPEND_FLAGS='-diag-suppress 186,177' && \
@@ -222,7 +193,7 @@ RUN --mount=type=bind,from=apex-downloader,source=/git/apex,target=apex/,rw \
--peer_memory \
--nccl_p2p \
--fast_bottleneck && \
if dpkg-query --status libcudnn8-dev > /dev/null 2> /dev/null; then \
if [ -n "$LIBCUDNN_VER" ]; then \
echo \
--bnp \
--cudnn_gbn \
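
The CHECK_VERSION helper introduced above probes for either the cuDNN 8 or cuDNN 9 dev package, so the cuDNN-dependent apex extensions (--bnp and --cudnn_gbn) are only requested when headers are actually installed. A standalone sketch of the same detection pattern, assuming a Debian/Ubuntu image and an example CUDA_MAJOR_VERSION:

    # Print the installed cuDNN dev-package version, or nothing if absent.
    CUDA_MAJOR_VERSION=12                       # assumed example
    CHECK_VERSION() {
        dpkg-query --status "$1" 2>/dev/null \
            | sed -ne 's/Version: //p' \
            | grep .;                           # fail when output is empty
    }
    LIBCUDNN_VER="$(
        CHECK_VERSION libcudnn8-dev ||
        CHECK_VERSION "libcudnn9-dev-cuda-${CUDA_MAJOR_VERSION}" ||
        :
    )"
    if [ -n "$LIBCUDNN_VER" ]; then
        echo "cuDNN dev headers found: $LIBCUDNN_VER"
    else
        echo "no cuDNN dev package installed"
    fi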
@@ -235,7 +206,6 @@ RUN --mount=type=bind,from=apex-downloader,source=/git/apex,target=apex/,rw \

WORKDIR /wheels


FROM builder-base as xformers-builder

ARG XFORMERS_VERSION
@@ -253,19 +223,20 @@ RUN python3 -m pip install -U --no-cache-dir \
--no-binary=xformers \
xformers==${XFORMERS_VERSION} 2> \
>(grep -Ev --line-buffered 'ptxas info\s*:|bytes spill stores' >&2)

SHELL ["/bin/sh", "-c"]

WORKDIR /build
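
The xformers build above pipes stderr through grep to drop the extremely noisy ptxas register-spill messages while letting real errors through, then resets SHELL to /bin/sh; the `2> >(...)` form is bash process substitution, which plain /bin/sh lacks. A minimal bash sketch of the pattern with a stand-in command:

    #!/bin/bash
    # Process substitution: stderr flows through grep, which drops ptxas
    # spill chatter and forwards everything else back to stderr.
    noisy_build() {                        # stand-in for the real compile
        echo 'ptxas info    : used 40 registers' >&2
        echo '16 bytes spill stores' >&2
        echo 'error: a real problem' >&2   # this line must survive
    }
    noisy_build 2> \
        >(grep -Ev --line-buffered 'ptxas info\s*:|bytes spill stores' >&2)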

FROM ${BASE_IMAGE}

RUN apt-get -qq update && \
apt-get install -y --no-install-recommends libaio-dev && \
apt-get clean


RUN --mount=type=bind,from=deepspeed-builder,source=/wheels,target=/tmp/wheels \
python3 -m pip install --no-cache-dir /tmp/wheels/*.whl
RUN --mount=type=bind,from=flash-attn-builder,source=/wheels,target=/tmp/wheels \
python3 -m pip install --no-cache-dir /tmp/wheels/*.whl
RUN --mount=type=bind,from=apex-builder,source=/wheels,target=/tmp/wheels \
python3 -m pip install --no-cache-dir /tmp/wheels/*.whl
RUN --mount=type=bind,from=xformers-builder,source=/wheels,target=/tmp/wheels \