Commit f2cdc77

Update Colab Base image to colab_20250219-060225_RC01 (#1475)
We are upgrading the base image to the latest Colab release image, colab_20250219-060225_RC01, which includes the following upgrades: TF 2.18, Python 3.11, and CUDA 12.5.

This PR includes a handful of fixes to resolve conflicts related to these upgrades, notably issues between torch and cuDNN. We also bumped the lightgbm version and included a fix for the tune CLI package conflict.
1 parent a0906f3 commit f2cdc77
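
The upgrades named in the commit message could be sanity-checked from inside a built image. The following is a minimal sketch, not part of this commit: the `EXPECTED` table and the helper names `parse_version` and `meets_minimum` are hypothetical, and the minimum versions are simply copied from the message above.

```python
import sys

# Minimum versions copied from the commit message. The table and the helper
# names below are hypothetical and are not part of the commit itself.
EXPECTED = {"python": (3, 11), "tensorflow": (2, 18), "cuda": (12, 5)}


def parse_version(text):
    """Turn a string such as '2.18.0' into (2, 18); only major.minor is kept."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(name, found):
    """Return True if the found version string is at least the expected one."""
    return parse_version(found) >= EXPECTED[name]


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print("python", current, meets_minimum("python", current))
```

Version strings for TensorFlow or CUDA would have to be obtained separately, for example from the installed packages, and passed in as plain strings.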

4 files changed: 34 additions & 31 deletions

Dockerfile.tmpl

Lines changed: 23 additions & 8 deletions
@@ -31,29 +31,41 @@ RUN uv pip uninstall --system google-cloud-bigquery-storage
 # b/394382016: sigstore (dependency of kagglehub) requires a prerelease packages, installing separate.
 RUN uv pip install --system --force-reinstall --prerelease=allow kagglehub[pandas-datasets,hf-datasets,signing]>=0.3.9

+# b/408284143: google-cloud-automl 2.0.0 introduced incompatible API changes, need to pin to 1.0.1
+
+# b/408284435: Keras 3.6 broke test_keras.py > test_train > keras.datasets.mnist.load_data()
+# See https://github.com/keras-team/keras/commit/dcefb139863505d166dd1325066f329b3033d45a
+# Colab base is on Keras 3.8, we have to install the package separately
+RUN uv pip install --system google-cloud-automl==1.0.1 google-cloud-aiplatform google-cloud-translate==3.12.1 \
+    google-cloud-videointelligence google-cloud-vision google-genai "keras<3.6"
+
 # uv cannot install this in requirements.txt without --no-build-isolation
 # to avoid affecting the larger build, we'll post-install it.
 RUN uv pip install --no-build-isolation --system "git+https://github.com/Kaggle/learntools"

-# b/385161357 Latest Colab uses tf 2.17.1, but tf decision forests only has a version for 2.17.0.
-# Instead, we'll install tfdf with its deps and hope that 2.17.0 compat tfdf works with tf 2.17.1.
-RUN uv pip install --system --no-deps tensorflow-decision-forests==1.10.0 wurlitzer==3.1.1 ydf==0.9.0
+# b/408281617: Torch is adamant that it can not install cudnn 9.3.x, only 9.1.x, but Tensorflow can only support 9.3.x.
+# This conflict causes a number of package downgrades, which are handled in this command
+RUN uv pip install --system --force-reinstall --extra-index-url https://pypi.nvidia.com pynvjitlink-cu12 cuml-cu12==25.2.1 \
+    nvidia-cudnn-cu12==9.3.0.75 scipy tsfresh
+RUN uv pip install --system --force-reinstall pynvjitlink-cu12==0.5.2

 # b/385145217 Latest Colab lacks mkl numpy, install it.
 RUN uv pip install --system --force-reinstall -i https://pypi.anaconda.org/intel/simple numpy

-# b/328788268 We install an incompatible pair of libs (shapely<2, libpysal==4.9.2) so we can't put this one in the requirements.txt
 # newer daal4py requires tbb>=2022, but libpysal is downgrading it for some reason
 RUN uv pip install --system "tbb>=2022" "libpysal==4.9.2"

+# b/404590350: Ray and torchtune have conflicting tune cli, we will prioritize torchtune.
+RUN uv pip install --system --force-reinstall --no-deps torchtune
+
 # Adding non-package dependencies:

 ADD clean-layer.sh /tmp/clean-layer.sh
 ADD patches/nbconvert-extensions.tpl /opt/kaggle/nbconvert-extensions.tpl
 ADD patches/template_conf.json /opt/kaggle/conf.json

-# /opt/conda/lib/python3.10/site-packages
-ARG PACKAGE_PATH=/usr/local/lib/python3.10/dist-packages
+# /opt/conda/lib/python3.11/site-packages
+ARG PACKAGE_PATH=/usr/local/lib/python3.11/dist-packages

 # Install GPU-specific non-pip packages.
 {{ if eq .Accelerator "gpu" }}
@@ -108,6 +120,9 @@ RUN apt-get install -y libfreetype6-dev && \
     apt-get install -y libglib2.0-0 libxext6 libsm6 libxrender1 libfontconfig1 --fix-missing

 # NLTK Project datasets
+# b/408298750: We currently reinstall the package, because we get the following error:
+# `AttributeError: module 'inspect' has no attribute 'formatargspec'. Did you mean: 'formatargvalues'?`
+RUN uv pip install --system --force-reinstall "nltk>=3.9.1"
 RUN mkdir -p /usr/share/nltk_data && \
     # NLTK Downloader no longer continues smoothly after an error, so we explicitly list
     # the corpuses that work
@@ -120,7 +135,7 @@ RUN mkdir -p /usr/share/nltk_data && \
     masc_tagged maxent_ne_chunker maxent_treebank_pos_tagger moses_sample movie_reviews \
     mte_teip5 names nps_chat omw opinion_lexicon paradigms \
     pil pl196x porter_test ppattach problem_reports product_reviews_1 product_reviews_2 propbank \
-    pros_cons ptb punkt qc reuters rslp rte sample_grammars semcor senseval sentence_polarity \
+    pros_cons ptb punkt punkt_tab qc reuters rslp rte sample_grammars semcor senseval sentence_polarity \
     sentiwordnet shakespeare sinica_treebank smultron snowball_data spanish_grammars \
     state_union stopwords subjectivity swadesh switchboard tagsets timit toolbox treebank \
     twitter_samples udhr2 udhr unicode_samples universal_tagset universal_treebanks_v20 \
@@ -198,7 +213,7 @@ ADD patches/kaggle_gcp.py \

 # Figure out why this is in a different place?
 # Found by doing a export PYTHONVERBOSE=1 and then running python and checking for where it looked for it.
-ADD patches/sitecustomize.py /usr/lib/python3.10/sitecustomize.py
+ADD patches/sitecustomize.py /usr/lib/python3.11/sitecustomize.py

 ARG GIT_COMMIT=unknown \
     BUILD_DATE=unknown
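
The Dockerfile diff above replaces hard-coded `python3.10` paths with `python3.11` in two places. As an illustration only, the path could be derived from the interpreter version instead of being edited by hand on each upgrade. The helper name `dist_packages_path` is hypothetical, and the sketch assumes the `/usr/local/lib/pythonX.Y/dist-packages` layout seen in `PACKAGE_PATH`.

```python
import sys


def dist_packages_path(version=None):
    """Build the dist-packages path for a (major, minor) Python version.

    Assumes the /usr/local/lib/pythonX.Y/dist-packages layout used by
    PACKAGE_PATH in the Dockerfile; other images may differ.
    """
    major, minor = version if version is not None else sys.version_info[:2]
    return f"/usr/local/lib/python{major}.{minor}/dist-packages"


if __name__ == "__main__":
    # Path for the interpreter running this script.
    print(dist_packages_path())
```

Calling `dist_packages_path((3, 11))` reproduces the value this commit sets for `PACKAGE_PATH`.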

config.txt

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 BASE_IMAGE=us-docker.pkg.dev/colab-images/public/runtime
-BASE_IMAGE_TAG=release-colab_20241217-060132_RC00
-LIGHTGBM_VERSION=4.5.0
+BASE_IMAGE_TAG=release-colab_20250219-060225_RC01
+LIGHTGBM_VERSION=4.6.0
 CUDA_MAJOR_VERSION=12
-CUDA_MINOR_VERSION=2
+CUDA_MINOR_VERSION=5
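
config.txt is a flat list of KEY=VALUE pairs. A minimal sketch of reading it and assembling the CUDA version is shown below. The function names `parse_config` and `cuda_version` are hypothetical, and how the repository's own build scripts consume this file is not shown in the diff, so this is an assumption about usage rather than a description of it.

```python
def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def cuda_version(config):
    """Join the major and minor CUDA entries into a 'major.minor' string."""
    return f"{config['CUDA_MAJOR_VERSION']}.{config['CUDA_MINOR_VERSION']}"


# The new side of the config.txt diff, reproduced as sample input.
SAMPLE = """\
BASE_IMAGE=us-docker.pkg.dev/colab-images/public/runtime
BASE_IMAGE_TAG=release-colab_20250219-060225_RC01
LIGHTGBM_VERSION=4.6.0
CUDA_MAJOR_VERSION=12
CUDA_MINOR_VERSION=5
"""

if __name__ == "__main__":
    print(cuda_version(parse_config(SAMPLE)))
```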

kaggle_requirements.txt

Lines changed: 3 additions & 16 deletions
@@ -1,5 +1,4 @@
 # Please keep this in alphabetical order
---extra-index-url https://pypi.nvidia.com
 Altair>=5.4.0
 Babel
 Boruta
@@ -24,7 +23,6 @@ catboost
 category-encoders
 cesium
 comm
-cuml-cu12
 cytoolz
 dask-expr
 # Older versions of datasets fail with "Loading a dataset cached in a LocalFileSystem is not supported"
@@ -46,14 +44,6 @@ fuzzywuzzy
 geojson
 # geopandas > v0.14.4 breaks learn tools
 geopandas==v0.14.4
-google-cloud-aiplatform
-# google-cloud-automl 2.0.0 introduced incompatible API changes, need to pin to 1.0.1
-google-cloud-automl==1.0.1
-# b/315753846: Unpin translate package.
-google-cloud-translate==3.12.1
-google-cloud-videointelligence
-google-cloud-vision
-google-genai
 gpxpy
 h2o
 haversine
@@ -70,15 +60,11 @@ jupyter_server==2.12.5
 jupyterlab
 jupyterlab-lsp
 kaggle-environments
-# Keras 3.6 broke test_keras.py > test_train > keras.datasets.mnist.load_data():
-# See https://github.com/keras-team/keras/commit/dcefb139863505d166dd1325066f329b3033d45a
-keras<3.6
 keras-cv
 keras-nlp
 keras-tuner
 kornia
 langid
-leven
 # b/328788268: libpysal 4.10 seems to fail with "module 'shapely' has no attribute 'Geometry'. Did you mean: 'geometry'"
 libpysal<=4.9.2
 lime
@@ -142,12 +128,13 @@ squarify
 tensorflow-cloud
 tensorflow-io
 tensorflow-text
-# b/385161357: tf 2.17.1 does not have matching tensorflow_decision_forests release
-# tensorflow_decision_forests
+tensorflow_decision_forests
 timm
+torchao
 torchinfo
 torchmetrics
 torchtune
+triton
 tsfresh
 vtk
 wandb
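
The requirements file opens with a request to keep entries in alphabetical order. A check for that could look like the sketch below. It is not part of this commit: the helper names are hypothetical, and the comparison rule (case-insensitive, plain string comparison of the package name) is an assumption, since the file does not state one.

```python
import re


def requirement_name(line):
    """Return the lowercased package name of a requirement line, or None.

    Blank lines, '#' comments, and pip options such as --extra-index-url
    are skipped.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or stripped.startswith("--"):
        return None
    # The name ends at the first version operator, extras bracket, marker, or space.
    return re.split(r"[<>=!~\[;\s]", stripped, maxsplit=1)[0].lower()


def out_of_order(lines):
    """Return adjacent (previous, current) name pairs that break the ordering."""
    names = [name for name in map(requirement_name, lines) if name is not None]
    return [(a, b) for a, b in zip(names, names[1:]) if a > b]


if __name__ == "__main__":
    sample = ["# Please keep this in alphabetical order", "Altair>=5.4.0", "Babel", "Boruta"]
    print(out_of_order(sample))
```

An empty result means no adjacent pair is out of order under the assumed rule.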

tests/test_torchtune.py

Lines changed: 5 additions & 4 deletions
@@ -1,9 +1,10 @@
 import unittest
-
 import subprocess

 class TestTorchtune(unittest.TestCase):
     def test_help(self):
-        ret_code = subprocess.run(["tune", "--help"])
-        self.assertEqual(0, ret_code.returncode)
-        self.assertIsNone(ret_code.stderr)
+        result = subprocess.run(["tune", "--help"], stdout=subprocess.PIPE)
+
+        self.assertEqual(0, result.returncode)
+        self.assertIsNone(result.stderr)
+        self.assertIn("Download a model from the Hugging Face Hub or Kaggle Model Hub.", result.stdout.decode("utf-8"))
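
The updated test confirms that `tune --help` prints torchtune's help text. According to the Dockerfile comment, Ray and torchtune both provide a `tune` CLI. One way to see which installed distributions declare a console script with a given name is sketched below using the standard library's `importlib.metadata`. The function name `console_script_providers` is hypothetical and the sketch is not part of this commit.

```python
from importlib import metadata


def console_script_providers(script_name):
    """List installed distributions that declare a console script with this name."""
    providers = set()
    for dist in metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group == "console_scripts" and entry_point.name == script_name:
                name = dist.metadata.get("Name")
                if name:
                    providers.add(name)
    return sorted(providers)


if __name__ == "__main__":
    # More than one name in the output would indicate competing `tune` scripts.
    print(console_script_providers("tune"))
```

This only reports which packages declare the script. Which one ends up on the PATH depends on install order, which is what the forced reinstall of torchtune in the Dockerfile addresses.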
