Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions .github/workflows/test-e2e.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
name: E2E Test

on:
pull_request

jobs:
e2e-test:
name: E2E Test
runs-on: oracle-vm-16cpu-64gb-x86-64

strategy:
fail-fast: false
matrix:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we start adding the Trainer version in the matrix right now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the sdk is compatible with v2 of the trainer and it's not yet released, I'm not sure what we would put here or how it would work. I think we should address it in a future PR, wdyt? Related maybe, we should finalise the version control and compatibility strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we could either put master there right now, but addressing it in a future PR is perfectly fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added master ref

kubernetes-version: ["1.29.14", "1.30.0", "1.31.0", "1.32.3"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop support for Kubernetes v1.29 and add v1.33 ?

trainer-ref: ["master"]

steps:
- name: Checkout Kubeflow SDK repository
uses: actions/checkout@v4

- name: Checkout Kubeflow Trainer repository
uses: actions/checkout@v4
with:
repository: kubeflow/trainer
ref: ${{ matrix.trainer-ref }}
path: trainer

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: 3.11

- name: Install Python dependencies
run: |
echo "Installing Papermill and Jupyter"
pip install papermill==2.6.0 jupyter==1.1.1 ipykernel==6.29.5
echo "Installing Kubeflow SDK locally from ./python"
# This path (./python) is relative to the *main* repository checkout (kubeflow/sdk)
pip install ./python
working-directory: . # Ensure pip runs from the SDK repo root

- name: Setup cluster
run: |
cd ./trainer
make test-e2e-setup-cluster \
K8S_VERSION=${{ matrix.kubernetes-version }} \
working-directory: . # Execute make from the root of the SDK repo

- name: Run e2e test for example Notebooks
run: |
mkdir -p artifacts/notebooks # Create the output directory
cd ./trainer
# Execute make commands, passing notebook paths and output locations
make test-e2e-notebook \
NOTEBOOK_INPUT=./examples/pytorch/image-classification/mnist.ipynb \
NOTEBOOK_OUTPUT=../artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
PAPERMILL_TIMEOUT=900
make test-e2e-notebook \
NOTEBOOK_INPUT=./examples/pytorch/question-answering/fine-tune-distilbert.ipynb \
NOTEBOOK_OUTPUT=../artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \
PAPERMILL_TIMEOUT=900
working-directory: . # Execute make from the root of the SDK repo

- name: Upload Artifacts to GitHub
uses: actions/upload-artifact@v4
if: always() # Ensure artifacts are uploaded even if previous steps fail
with:
name: ${{ matrix.kubernetes-version }}
path: ./artifacts/notebooks/* # Path relative to the workspace root
retention-days: 1 #