Commit f6db354

Fix torch tune, keras, tensorflow tests (#1489)
Looks like torchtune changed the output of the --help command, which caused issues with our smoke tests. Keras, along with other packages, had existing issues with cuDNN being downgraded due to torch requirements, so we pinned the relevant packages.
1 parent 987863d commit f6db354

3 files changed: +16 -8 lines changed


Dockerfile.tmpl

Lines changed: 5 additions & 2 deletions
@@ -35,7 +35,10 @@ RUN uv pip install --no-build-isolation --system "git+https://github.com/Kaggle/
 # b/408281617: Torch is adamant that it can not install cudnn 9.3.x, only 9.1.x, but Tensorflow can only support 9.3.x.
 # This conflict causes a number of package downgrades, which are handled in this command
 RUN uv pip install --system --force-reinstall --extra-index-url https://pypi.nvidia.com "cuml-cu12==25.2.1" \
-    "nvidia-cudnn-cu12==9.3.0.75"
+    "nvidia-cudnn-cu12==9.3.0.75" "nvidia-cublas-cu12==12.5.3.2" "nvidia-cusolver-cu12==11.6.3.83" \
+    "nvidia-cuda-cupti-cu12==12.5.82" "nvidia-cuda-nvrtc-cu12==12.5.82" "nvidia-cuda-runtime-cu12==12.5.82" \
+    "nvidia-cufft-cu12==11.2.3.61" "nvidia-curand-cu12==10.3.6.82" "nvidia-cusparse-cu12==12.5.1.3" \
+    "nvidia-nvjitlink-cu12==12.5.82"
 RUN uv pip install --system --force-reinstall "pynvjitlink-cu12==0.5.2"

 # b/385145217 Latest Colab lacks mkl numpy, install it.
@@ -46,7 +49,7 @@ RUN uv pip install --system "tbb>=2022" "libpysal==4.9.2"

 # b/404590350: Ray and torchtune have conflicting tune cli, we will prioritize torchtune.
 # b/415358158: Gensim removed from Colab image to upgrade scipy
-RUN uv pip install --system --force-reinstall --no-deps torchtune gensim
+RUN uv pip install --system --force-reinstall --no-deps torchtune gensim "scipy<=1.15.3"

 # Adding non-package dependencies:
 ADD clean-layer.sh /tmp/clean-layer.sh
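
The pins added in Dockerfile.tmpl could be checked inside the built image with a small script. The following is only a sketch, not part of this commit: the helper names `parse_pin` and `find_mismatches` are hypothetical, and it assumes exact `name==version` pins like the ones in the hunk above.

```python
from importlib import metadata

# Exact pins copied from the Dockerfile.tmpl hunk in this commit.
PINS = [
    "nvidia-cudnn-cu12==9.3.0.75",
    "nvidia-cublas-cu12==12.5.3.2",
    "nvidia-cusolver-cu12==11.6.3.83",
    "nvidia-cuda-cupti-cu12==12.5.82",
    "nvidia-cuda-nvrtc-cu12==12.5.82",
    "nvidia-cuda-runtime-cu12==12.5.82",
    "nvidia-cufft-cu12==11.2.3.61",
    "nvidia-curand-cu12==10.3.6.82",
    "nvidia-cusparse-cu12==12.5.1.3",
    "nvidia-nvjitlink-cu12==12.5.82",
]


def parse_pin(spec):
    """Split an exact pin such as 'pkg==1.2.3' into (name, version)."""
    name, sep, version = spec.partition("==")
    name, version = name.strip(), version.strip()
    if not sep or not name or not version:
        raise ValueError(f"not an exact pin: {spec!r}")
    return name, version


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (wanted, installed)} for every pin that is not satisfied.

    `installed` is None when the distribution is not installed at all.
    `get_version` is injectable so the check can be exercised without
    the real packages being present.
    """
    mismatches = {}
    for spec in pins:
        name, wanted = parse_pin(spec)
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_mismatches(PINS)` in the image would return an empty dict when every pinned version is the one actually installed, which makes a later silent downgrade visible.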

kaggle_requirements.txt

Lines changed: 2 additions & 3 deletions
@@ -121,19 +121,18 @@ qtconsole
 ray
 rgf-python
 s3fs
+# b/302136621: Fix eli5 import for learntools
 scikit-learn==1.2.2
 # Scikit-learn accelerated library for x86
 scikit-learn-intelex>=2023.0.1
 scikit-multilearn
 scikit-optimize
 scikit-plot
 scikit-surprise
-# b/415358158: Gensim removed from Colab image to upgrade scipy to 1.14.1
-scipy==1.15.1
 # Also pinning seaborn for learntools
 seaborn==0.12.2
 git+https://github.com/facebookresearch/segment-anything.git
-# b/329869023 shap 0.45.0 breaks learntools
+# b/329869023: shap 0.45.0 breaks learntools
 shap==0.44.1
 squarify
 tensorflow-cloud

tests/test_torchtune.py

Lines changed: 9 additions & 3 deletions
@@ -3,8 +3,14 @@

 class TestTorchtune(unittest.TestCase):
     def test_help(self):
-        result = subprocess.run(["tune", "--help"], stdout=subprocess.PIPE)
+        result = subprocess.run(
+            ["tune", "--help"],
+            capture_output=True,
+            text=True
+        )

         self.assertEqual(0, result.returncode)
-        self.assertIsNone(result.stderr)
-        self.assertIn("Download a model from the Hugging Face Hub or Kaggle Model Hub.", result.stdout.decode("utf-8"))
+        self.assertIn(
+            "Download a model from the Hugging Face Hub or Kaggle",
+            result.stdout
+        )
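
The behavioral difference between the old and new `subprocess.run` calls in this test can be reproduced without torchtune installed. This is a minimal sketch: it uses the current Python interpreter as a stand-in for the `tune` CLI, and the help text it prints is made up for illustration, not torchtune's real output.

```python
import subprocess
import sys

# Stand-in for `tune --help`: a child process that prints a help-like line.
# The wording after "Kaggle" is hypothetical, not real torchtune output.
CHILD = [
    sys.executable,
    "-c",
    "print('Download a model from the Hugging Face Hub or Kaggle (example wording).')",
]

# Old style: only stdout is piped, so stderr is not captured (None)
# and stdout comes back as bytes that must be decoded.
old = subprocess.run(CHILD, stdout=subprocess.PIPE)
assert old.stderr is None
assert isinstance(old.stdout, bytes)

# New style: capture_output=True pipes both streams, and text=True
# decodes them, so stdout and stderr are both str.
new = subprocess.run(CHILD, capture_output=True, text=True)
assert isinstance(new.stdout, str)
assert new.stderr is not None  # an assertIsNone(stderr) check would now fail

# Matching a stable prefix of the sentence tolerates changes to its tail.
assert "Download a model from the Hugging Face Hub or Kaggle" in new.stdout
```

With `capture_output=True` the `stderr` attribute is always a string, which is presumably why the `assertIsNone(result.stderr)` check was dropped rather than kept alongside the new call.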
