Skip to content

Conversation

@wassimbensalem
Copy link
Contributor

What this PR does

Implements support for multiple pip index URLs in CustomTrainer as requested in #72.

Changes Made

• Types: Changed pip_index_url to pip_index_urls: list[str] in CustomTrainer
• Constants: Added DEFAULT_PIP_INDEX_URLS with environment variable support
• Utils: Implemented logic for --index-url and --extra-index-url in pip install commands
• Tests: Updated test cases to use new field names
• Backward Compatibility: Maintained DEFAULT_PIP_INDEX_URL for existing code

Usage Examples

Single URL (backward compatible)

trainer = CustomTrainer(
func=training_function,
pip_index_urls=["https://pypi.org/simple"]
)

Multiple URLs

trainer = CustomTrainer(
func=training_function,
pip_index_urls=[
"https://pypi.org/simple",
"https://private.repo.com/simple"
]
)

Environment Variable Support

export DEFAULT_PIP_INDEX_URLS="https://pypi.org/simple,https://private.repo.com/simple"

Technical Details

• Uses field(default_factory=...) to prevent mutable default issues
• First URL becomes --index-url, subsequent URLs become --extra-index-url
• Automatically includes PyPI if not in the provided URLs
• Follows the same pattern as KFP for consistency

Closes #72

@wassimbensalem
Copy link
Contributor Author

@andreyvelich @kramaranya Sorry I had to create a new PR because I had some issues with signing the commits ( I am new to this please bare with me :) ). I hope this PR includes all the feedback. If anything is missing let me know,

@wassimbensalem
Copy link
Contributor Author

/assign

@kramaranya
Copy link
Contributor

/ok-to-test

Copy link
Contributor

@kramaranya kramaranya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wassimbensalem!
Could you please sign your commits? You can follow this guide https://github.com/kubeflow/community/tree/master/dco-signoff-hook#signing-off-commits

# Handle environment variable for multiple URLs (comma-separated)
DEFAULT_PIP_INDEX_URLS = os.getenv("DEFAULT_PIP_INDEX_URLS", DEFAULT_PYPI_URL).split(",")
# Keep backward compatibility
DEFAULT_PIP_INDEX_URL = DEFAULT_PYPI_URL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this, since it's not exported anywhere?

@kramaranya
Copy link
Contributor

@wassimbensalem you'll also need to update unit tests

return models.TrainerV1alpha1Trainer(
command=[
"bash",
"-c",
'\nif ! [ -x "$(command -v pip)" ]; then\n python -m ensurepip '
"|| python -m ensurepip --user || apt-get install python-pip"
"\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python -m pip install --quiet"
" --no-warn-script-location --index-url https://pypi.org/simple "
"torch numpy \n\nread -r -d '' SCRIPT << EOM\n\nfunc=lambda: "
'print("Hello World"),\n\n<lambda>('
"{'learning_rate': 0.001, 'batch_size': 32})\n\nEOM\nprintf \"%s\" "
'"$SCRIPT" > "backend_test.py"\ntorchrun "backend_test.py"',

And it would be great to add a new test case for multiple pip index urls!

@wassimbensalem wassimbensalem force-pushed the feature/support-multiple-pip-index-urls-clean branch 2 times, most recently from 4e08634 to 00daf6c Compare August 31, 2025 12:42
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 31, 2025
@wassimbensalem
Copy link
Contributor Author

@kramaranya Is there anything still missing?

@andreyvelich
Copy link
Member

@wassimbensalem Please can you rebase your PR ?

- Add pip_index_urls parameter to CustomTrainer (list of URLs)
- Update constants to support multiple URLs with environment variable
- Modify utils to handle multiple index URLs in pip installation
- Update tests to use new parameter
- Fix mutable default issue with dataclasses.field(default_factory=...)

Closes kubeflow#72

Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
…ation

Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
@wassimbensalem wassimbensalem force-pushed the feature/support-multiple-pip-index-urls-clean branch from 4aab345 to 5eb42b1 Compare September 1, 2025 20:05
Comment on lines 231 to 232
if packages_to_install is None:
packages_to_install = ["torch", "numpy"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just explicitly set the packages from the input args here.

Suggested change
if packages_to_install is None:
packages_to_install = ["torch", "numpy"]

Comment on lines 1127 to 1142

def test_get_script_for_python_packages_multiple_urls():
"""Test get_script_for_python_packages with multiple pip index URLs."""
from kubeflow.trainer.utils import utils

# Test with multiple pip index URLs
packages = ["torch", "numpy", "custom-package"]
pip_index_urls = [
"https://pypi.org/simple",
"https://private.repo.com/simple",
"https://internal.company.com/simple"
]

script = utils.get_script_for_python_packages(
packages_to_install=packages,
pip_index_urls=pip_index_urls,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move those test cases to the utils_test.py, and refactor them similar to other tests, e.g.

@pytest.mark.parametrize(
    "test_case",
    [
        TestCase(
            name="....",
            config={....},
            expected_output={....}
        ),
    ],
)

expected_error=ValueError,
),
TestCase(
name="valid flow with multiple pip index URLs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this test case, since we verify it in the utils_test.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wassimbensalem Did you address this comment ?


# The default PIP index URL to download Python packages.
DEFAULT_PIP_INDEX_URL = os.getenv("DEFAULT_PIP_INDEX_URL", "https://pypi.org/simple")
# Handle environment variable for multiple URLs (comma-separated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Handle environment variable for multiple URLs (comma-separated)
# Handle environment variable for multiple URLs (comma-separated).
# The first URL will be the index-url, and remaining ones are extra-index-urls.

Comment on lines 258 to 259
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
is_mpi: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not set the defaults here, since it should be controlled by the public APIs.

Suggested change
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
is_mpi: bool = False,
pip_index_urls: list[str],
is_mpi: bool,

import textwrap
import threading
from typing import Any, Callable, Optional
from typing import Any, Callable, Dict, Optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from typing import Any, Callable, Dict, Optional
from typing import Any, Callable, Optional

train_func: Callable,
train_func_parameters: Optional[dict[str, Any]],
pip_index_url: str,
train_func_parameters: Optional[Dict[str, Any]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
train_func_parameters: Optional[Dict[str, Any]],
train_func_parameters: Optional[dict[str, Any]],

train_func_parameters: Optional[dict[str, Any]],
pip_index_url: str,
train_func_parameters: Optional[Dict[str, Any]],
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment.

Suggested change
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
pip_index_urls: list[str],

@coveralls
Copy link

coveralls commented Sep 1, 2025

Pull Request Test Coverage Report for Build 17472441226

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 4 of 5 (80.0%) changed or added relevant lines in 1 file are covered.
  • 73 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+5.2%) to 70.192%

Changes Missing Coverage Covered Lines Changed/Added Lines %
kubeflow/trainer/utils/utils.py 4 5 80.0%
Files with Coverage Reduction New Missed Lines %
kubeflow/trainer/backends/kubernetes/backend.py 17 78.1%
kubeflow/trainer/utils/utils.py 56 62.14%
Totals Coverage Status
Change from base Build 17382047367: 5.2%
Covered Lines: 292
Relevant Lines: 416

💛 - Coveralls

@kramaranya
Copy link
Contributor

/milestone v0.1

@google-oss-prow google-oss-prow bot added this to the v0.1 milestone Sep 1, 2025
…d format

Signed-off-by: wassimbensalem <bswassim@gmail.com>
expected_error=ValueError,
),
TestCase(
name="valid flow with multiple pip index URLs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wassimbensalem Did you address this comment ?

Comment on lines 223 to 230
pip_index_urls: Optional[list[str]] = None,
packages_to_install: list[str] = ["torch", "numpy"],
) -> models.TrainerV1alpha1Trainer:
"""
Get the custom trainer for the TrainJob.
"""
if pip_index_urls is None:
pip_index_urls = constants.DEFAULT_PIP_INDEX_URLS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pip_index_urls: Optional[list[str]] = None,
packages_to_install: list[str] = ["torch", "numpy"],
) -> models.TrainerV1alpha1Trainer:
"""
Get the custom trainer for the TrainJob.
"""
if pip_index_urls is None:
pip_index_urls = constants.DEFAULT_PIP_INDEX_URLS
pip_index_urls: Optional[list[str]] = constants.DEFAULT_PIP_INDEX_URLS,
packages_to_install: list[str] = ["torch", "numpy"],
) -> models.TrainerV1alpha1Trainer:
"""
Get the custom trainer for the TrainJob.
"""

Comment on lines 122 to 131
for expected_text in test_case.expected_output["contains"]:
assert expected_text in script, (
f"Expected '{expected_text}' to be in script"
)

# Verify the script does not contain unexpected elements
for unexpected_text in test_case.expected_output["not_contains"]:
assert unexpected_text not in script, (
f"Expected '{unexpected_text}' to NOT be in script"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having contains and not_contains check, can you assert the entire script that get_script_for_python_packages returns?
e.g.

assert test_case.expected_output == script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich All comments should be addressed now. Please let me know if anything is missing.

…ontains/not_contains checks

- Changed TestCase expected_output from Dict[str, Any] to str
- Replaced contains/not_contains logic with direct string equality
- Updated all test cases to specify complete expected script output
- Tests now assert the entire script returned by get_script_for_python_packages

Signed-off-by: wassimbensalem <bswassim@gmail.com>
- Add support for multiple pip index URLs in CustomTrainer
- Update constants to handle comma-separated environment variables
- Add integration test for multiple pip index URLs flow
- Update utils to generate correct pip install commands with --index-url and --extra-index-url
- Ensure backward compatibility with single URL usage

Signed-off-by: wassimbensalem <bswassim@gmail.com>
- Remove redundant test case since comprehensive tests already exist in utils test
- Keep integration tests focused on core functionality rather than implementation details
- Maintain test coverage while reducing duplication

Signed-off-by: wassimbensalem <bswassim@gmail.com>
…backend"

This reverts commit 0ec003d.

Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this great contribution @wassimbensalem!
/lgtm
/assign @kubeflow/kubeflow-sdk-team

train_job_trainer=get_custom_trainer(),
train_job_trainer=get_custom_trainer(
pip_index_urls=constants.DEFAULT_PIP_INDEX_URLS,
packages_to_install=["torch", "numpy"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: could be other packages since those are already in the runtime image.

@astefanutti
Copy link
Contributor

/lgtm

Thanks @wassimbensalem!

Copy link
Contributor

@kramaranya kramaranya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @wassimbensalem!
/lgtm

@astefanutti
Copy link
Contributor

/approve

Great work!

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 427b35d into kubeflow:main Sep 5, 2025
10 checks passed
accorvin pushed a commit to opendatahub-io/kubeflow-sdk that referenced this pull request Oct 8, 2025
* feat: Support multiple pip index URLs in CustomTrainer

- Add pip_index_urls parameter to CustomTrainer (list of URLs)
- Update constants to support multiple URLs with environment variable
- Modify utils to handle multiple index URLs in pip installation
- Update tests to use new parameter
- Fix mutable default issue with dataclasses.field(default_factory=...)

Closes kubeflow#72

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: add comprehensive test cases for multiple pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: simplify constants for pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: remove Optional from pip_index_urls docstring to match implementation

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: adjust unit test for get_custom_trainer

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: add missing Dict import in utils.py

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: move test cases to utils_test.py and convert to parametrized format

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: update utils test to use exact string matching instead of contains/not_contains checks

- Changed TestCase expected_output from Dict[str, Any] to str
- Replaced contains/not_contains logic with direct string equality
- Updated all test cases to specify complete expected script output
- Tests now assert the entire script returned by get_script_for_python_packages

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* feat: add multiple pip index URLs support with comprehensive tests

- Add support for multiple pip index URLs in CustomTrainer
- Update constants to handle comma-separated environment variables
- Add integration test for multiple pip index URLs flow
- Update utils to generate correct pip install commands with --index-url and --extra-index-url
- Ensure backward compatibility with single URL usage

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove redundant multiple pip index URLs test from backend

- Remove redundant test case since comprehensive tests already exist in utils test
- Keep integration tests focused on core functionality rather than implementation details
- Maintain test coverage while reducing duplication

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* Revert "refactor: remove redundant multiple pip index URLs test from backend"

This reverts commit 0ec003d.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove duplicated test case from the backend.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

---------

Signed-off-by: wassimbensalem <bswassim@gmail.com>
MStokluska pushed a commit to opendatahub-io/kubeflow-sdk that referenced this pull request Oct 15, 2025
* feat: Support multiple pip index URLs in CustomTrainer

- Add pip_index_urls parameter to CustomTrainer (list of URLs)
- Update constants to support multiple URLs with environment variable
- Modify utils to handle multiple index URLs in pip installation
- Update tests to use new parameter
- Fix mutable default issue with dataclasses.field(default_factory=...)

Closes kubeflow#72

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: add comprehensive test cases for multiple pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: simplify constants for pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: remove Optional from pip_index_urls docstring to match implementation

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: adjust unit test for get_custom_trainer

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: add missing Dict import in utils.py

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: move test cases to utils_test.py and convert to parametrized format

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: update utils test to use exact string matching instead of contains/not_contains checks

- Changed TestCase expected_output from Dict[str, Any] to str
- Replaced contains/not_contains logic with direct string equality
- Updated all test cases to specify complete expected script output
- Tests now assert the entire script returned by get_script_for_python_packages

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* feat: add multiple pip index URLs support with comprehensive tests

- Add support for multiple pip index URLs in CustomTrainer
- Update constants to handle comma-separated environment variables
- Add integration test for multiple pip index URLs flow
- Update utils to generate correct pip install commands with --index-url and --extra-index-url
- Ensure backward compatibility with single URL usage

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove redundant multiple pip index URLs test from backend

- Remove redundant test case since comprehensive tests already exist in utils test
- Keep integration tests focused on core functionality rather than implementation details
- Maintain test coverage while reducing duplication

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* Revert "refactor: remove redundant multiple pip index URLs test from backend"

This reverts commit 0ec003d.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove duplicated test case from the backend.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

---------

Signed-off-by: wassimbensalem <bswassim@gmail.com>
MStokluska pushed a commit to opendatahub-io/kubeflow-sdk that referenced this pull request Oct 15, 2025
* feat: Support multiple pip index URLs in CustomTrainer

- Add pip_index_urls parameter to CustomTrainer (list of URLs)
- Update constants to support multiple URLs with environment variable
- Modify utils to handle multiple index URLs in pip installation
- Update tests to use new parameter
- Fix mutable default issue with dataclasses.field(default_factory=...)

Closes kubeflow#72

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: add comprehensive test cases for multiple pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: simplify constants for pip index URLs

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: remove Optional from pip_index_urls docstring to match implementation

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* test: adjust unit test for get_custom_trainer

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* fix: add missing Dict import in utils.py

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: move test cases to utils_test.py and convert to parametrized format

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: update utils test to use exact string matching instead of contains/not_contains checks

- Changed TestCase expected_output from Dict[str, Any] to str
- Replaced contains/not_contains logic with direct string equality
- Updated all test cases to specify complete expected script output
- Tests now assert the entire script returned by get_script_for_python_packages

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* feat: add multiple pip index URLs support with comprehensive tests

- Add support for multiple pip index URLs in CustomTrainer
- Update constants to handle comma-separated environment variables
- Add integration test for multiple pip index URLs flow
- Update utils to generate correct pip install commands with --index-url and --extra-index-url
- Ensure backward compatibility with single URL usage

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove redundant multiple pip index URLs test from backend

- Remove redundant test case since comprehensive tests already exist in utils test
- Keep integration tests focused on core functionality rather than implementation details
- Maintain test coverage while reducing duplication

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* Revert "refactor: remove redundant multiple pip index URLs test from backend"

This reverts commit 0ec003d.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

* refactor: remove duplicated test case from the backend.

Signed-off-by: wassimbensalem <bswassim@gmail.com>

---------

Signed-off-by: wassimbensalem <bswassim@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Multiple Pip Index URLs

5 participants