-
Couldn't load subscription status.
- Fork 43
feat: Support multiple pip index URLs in CustomTrainer #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support multiple pip index URLs in CustomTrainer #79
Conversation
|
@andreyvelich @kramaranya Sorry I had to create a new PR because I had some issues with signing the commits ( I am new to this please bare with me :) ). I hope this PR includes all the feedback. If anything is missing let me know, |
|
/assign |
|
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wassimbensalem!
Could you please sign your commits? You can follow this guide https://github.com/kubeflow/community/tree/master/dco-signoff-hook#signing-off-commits
| # Handle environment variable for multiple URLs (comma-separated) | ||
| DEFAULT_PIP_INDEX_URLS = os.getenv("DEFAULT_PIP_INDEX_URLS", DEFAULT_PYPI_URL).split(",") | ||
| # Keep backward compatibility | ||
| DEFAULT_PIP_INDEX_URL = DEFAULT_PYPI_URL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this, since it's not exported anywhere?
|
@wassimbensalem you'll also need to update unit tests sdk/kubeflow/trainer/backends/kubernetes/backend_test.py Lines 228 to 239 in 06d77f9
And it would be great to add a new test case for multiple pip index urls! |
4e08634 to
00daf6c
Compare
|
@kramaranya Is there anything still missing? |
|
@wassimbensalem Please can you rebase your PR ? |
- Add pip_index_urls parameter to CustomTrainer (list of URLs) - Update constants to support multiple URLs with environment variable - Modify utils to handle multiple index URLs in pip installation - Update tests to use new parameter - Fix mutable default issue with dataclasses.field(default_factory=...) Closes kubeflow#72 Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
…ation Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
4aab345 to
5eb42b1
Compare
| if packages_to_install is None: | ||
| packages_to_install = ["torch", "numpy"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just explicitly set the packages from the input args here.
| if packages_to_install is None: | |
| packages_to_install = ["torch", "numpy"] |
|
|
||
| def test_get_script_for_python_packages_multiple_urls(): | ||
| """Test get_script_for_python_packages with multiple pip index URLs.""" | ||
| from kubeflow.trainer.utils import utils | ||
|
|
||
| # Test with multiple pip index URLs | ||
| packages = ["torch", "numpy", "custom-package"] | ||
| pip_index_urls = [ | ||
| "https://pypi.org/simple", | ||
| "https://private.repo.com/simple", | ||
| "https://internal.company.com/simple" | ||
| ] | ||
|
|
||
| script = utils.get_script_for_python_packages( | ||
| packages_to_install=packages, | ||
| pip_index_urls=pip_index_urls, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move those test cases to the utils_test.py, and refactor them similar to other tests, e.g.
@pytest.mark.parametrize(
"test_case",
[
TestCase(
name="....",
config={....},
expected_output={....}
),
],
)| expected_error=ValueError, | ||
| ), | ||
| TestCase( | ||
| name="valid flow with multiple pip index URLs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this test case, since we verify it in the utils_test.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wassimbensalem Did you address this comment ?
|
|
||
| # The default PIP index URL to download Python packages. | ||
| DEFAULT_PIP_INDEX_URL = os.getenv("DEFAULT_PIP_INDEX_URL", "https://pypi.org/simple") | ||
| # Handle environment variable for multiple URLs (comma-separated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Handle environment variable for multiple URLs (comma-separated) | |
| # Handle environment variable for multiple URLs (comma-separated). | |
| # The first URL will be the index-url, and remaining ones are extra-index-urls. |
kubeflow/trainer/utils/utils.py
Outdated
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, | ||
| is_mpi: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not set the defaults here, since it should be controlled by the public APIs.
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, | |
| is_mpi: bool = False, | |
| pip_index_urls: list[str], | |
| is_mpi: bool, |
kubeflow/trainer/utils/utils.py
Outdated
| import textwrap | ||
| import threading | ||
| from typing import Any, Callable, Optional | ||
| from typing import Any, Callable, Dict, Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from typing import Any, Callable, Dict, Optional | |
| from typing import Any, Callable, Optional |
kubeflow/trainer/utils/utils.py
Outdated
| train_func: Callable, | ||
| train_func_parameters: Optional[dict[str, Any]], | ||
| pip_index_url: str, | ||
| train_func_parameters: Optional[Dict[str, Any]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| train_func_parameters: Optional[Dict[str, Any]], | |
| train_func_parameters: Optional[dict[str, Any]], |
kubeflow/trainer/utils/utils.py
Outdated
| train_func_parameters: Optional[dict[str, Any]], | ||
| pip_index_url: str, | ||
| train_func_parameters: Optional[Dict[str, Any]], | ||
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment.
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, | |
| pip_index_urls: list[str], |
Pull Request Test Coverage Report for Build 17472441226Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
|
/milestone v0.1 |
…d format Signed-off-by: wassimbensalem <bswassim@gmail.com>
| expected_error=ValueError, | ||
| ), | ||
| TestCase( | ||
| name="valid flow with multiple pip index URLs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wassimbensalem Did you address this comment ?
| pip_index_urls: Optional[list[str]] = None, | ||
| packages_to_install: list[str] = ["torch", "numpy"], | ||
| ) -> models.TrainerV1alpha1Trainer: | ||
| """ | ||
| Get the custom trainer for the TrainJob. | ||
| """ | ||
| if pip_index_urls is None: | ||
| pip_index_urls = constants.DEFAULT_PIP_INDEX_URLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pip_index_urls: Optional[list[str]] = None, | |
| packages_to_install: list[str] = ["torch", "numpy"], | |
| ) -> models.TrainerV1alpha1Trainer: | |
| """ | |
| Get the custom trainer for the TrainJob. | |
| """ | |
| if pip_index_urls is None: | |
| pip_index_urls = constants.DEFAULT_PIP_INDEX_URLS | |
| pip_index_urls: Optional[list[str]] = constants.DEFAULT_PIP_INDEX_URLS, | |
| packages_to_install: list[str] = ["torch", "numpy"], | |
| ) -> models.TrainerV1alpha1Trainer: | |
| """ | |
| Get the custom trainer for the TrainJob. | |
| """ |
kubeflow/trainer/utils/utils_test.py
Outdated
| for expected_text in test_case.expected_output["contains"]: | ||
| assert expected_text in script, ( | ||
| f"Expected '{expected_text}' to be in script" | ||
| ) | ||
|
|
||
| # Verify the script does not contain unexpected elements | ||
| for unexpected_text in test_case.expected_output["not_contains"]: | ||
| assert unexpected_text not in script, ( | ||
| f"Expected '{unexpected_text}' to NOT be in script" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having contains and not_contains check, can you assert the entire script that get_script_for_python_packages returns?
e.g.
assert test_case.expected_output == scriptThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich All comments should be addressed now. Please let me know if anything is missing.
…ontains/not_contains checks - Changed TestCase expected_output from Dict[str, Any] to str - Replaced contains/not_contains logic with direct string equality - Updated all test cases to specify complete expected script output - Tests now assert the entire script returned by get_script_for_python_packages Signed-off-by: wassimbensalem <bswassim@gmail.com>
- Add support for multiple pip index URLs in CustomTrainer - Update constants to handle comma-separated environment variables - Add integration test for multiple pip index URLs flow - Update utils to generate correct pip install commands with --index-url and --extra-index-url - Ensure backward compatibility with single URL usage Signed-off-by: wassimbensalem <bswassim@gmail.com>
- Remove redundant test case since comprehensive tests already exist in utils test - Keep integration tests focused on core functionality rather than implementation details - Maintain test coverage while reducing duplication Signed-off-by: wassimbensalem <bswassim@gmail.com>
…backend" This reverts commit 0ec003d. Signed-off-by: wassimbensalem <bswassim@gmail.com>
Signed-off-by: wassimbensalem <bswassim@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this great contribution @wassimbensalem!
/lgtm
/assign @kubeflow/kubeflow-sdk-team
| train_job_trainer=get_custom_trainer(), | ||
| train_job_trainer=get_custom_trainer( | ||
| pip_index_urls=constants.DEFAULT_PIP_INDEX_URLS, | ||
| packages_to_install=["torch", "numpy"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: could be other packages since those are already in the runtime image.
|
/lgtm Thanks @wassimbensalem! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution @wassimbensalem!
/lgtm
|
/approve Great work! |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: astefanutti The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* feat: Support multiple pip index URLs in CustomTrainer - Add pip_index_urls parameter to CustomTrainer (list of URLs) - Update constants to support multiple URLs with environment variable - Modify utils to handle multiple index URLs in pip installation - Update tests to use new parameter - Fix mutable default issue with dataclasses.field(default_factory=...) Closes kubeflow#72 Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: add comprehensive test cases for multiple pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: simplify constants for pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: remove Optional from pip_index_urls docstring to match implementation Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: adjust unit test for get_custom_trainer Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: add missing Dict import in utils.py Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: move test cases to utils_test.py and convert to parametrized format Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: update utils test to use exact string matching instead of contains/not_contains checks - Changed TestCase expected_output from Dict[str, Any] to str - Replaced contains/not_contains logic with direct string equality - Updated all test cases to specify complete expected script output - Tests now assert the entire script returned by get_script_for_python_packages Signed-off-by: wassimbensalem <bswassim@gmail.com> * feat: add multiple pip index URLs support with comprehensive tests - Add support for multiple pip index URLs in CustomTrainer - Update constants to handle comma-separated environment variables - Add integration test for multiple pip index URLs flow - Update utils to generate correct pip install commands with --index-url and --extra-index-url - Ensure backward compatibility with single URL usage Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove redundant multiple pip index URLs test from backend - Remove redundant test case since comprehensive tests already exist in utils test - Keep integration tests focused on core functionality rather than implementation details - Maintain test coverage while reducing duplication Signed-off-by: wassimbensalem <bswassim@gmail.com> * Revert "refactor: remove redundant multiple pip index URLs test from backend" This reverts commit 0ec003d. Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove duplicated test case from the backend. Signed-off-by: wassimbensalem <bswassim@gmail.com> --------- Signed-off-by: wassimbensalem <bswassim@gmail.com>
* feat: Support multiple pip index URLs in CustomTrainer - Add pip_index_urls parameter to CustomTrainer (list of URLs) - Update constants to support multiple URLs with environment variable - Modify utils to handle multiple index URLs in pip installation - Update tests to use new parameter - Fix mutable default issue with dataclasses.field(default_factory=...) Closes kubeflow#72 Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: add comprehensive test cases for multiple pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: simplify constants for pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: remove Optional from pip_index_urls docstring to match implementation Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: adjust unit test for get_custom_trainer Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: add missing Dict import in utils.py Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: move test cases to utils_test.py and convert to parametrized format Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: update utils test to use exact string matching instead of contains/not_contains checks - Changed TestCase expected_output from Dict[str, Any] to str - Replaced contains/not_contains logic with direct string equality - Updated all test cases to specify complete expected script output - Tests now assert the entire script returned by get_script_for_python_packages Signed-off-by: wassimbensalem <bswassim@gmail.com> * feat: add multiple pip index URLs support with comprehensive tests - Add support for multiple pip index URLs in CustomTrainer - Update constants to handle comma-separated environment variables - Add integration test for multiple pip index URLs flow - Update utils to generate correct pip install commands with --index-url and --extra-index-url - Ensure backward compatibility with single URL usage Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove redundant multiple pip index URLs test from backend - Remove redundant test case since comprehensive tests already exist in utils test - Keep integration tests focused on core functionality rather than implementation details - Maintain test coverage while reducing duplication Signed-off-by: wassimbensalem <bswassim@gmail.com> * Revert "refactor: remove redundant multiple pip index URLs test from backend" This reverts commit 0ec003d. Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove duplicated test case from the backend. Signed-off-by: wassimbensalem <bswassim@gmail.com> --------- Signed-off-by: wassimbensalem <bswassim@gmail.com>
* feat: Support multiple pip index URLs in CustomTrainer - Add pip_index_urls parameter to CustomTrainer (list of URLs) - Update constants to support multiple URLs with environment variable - Modify utils to handle multiple index URLs in pip installation - Update tests to use new parameter - Fix mutable default issue with dataclasses.field(default_factory=...) Closes kubeflow#72 Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: add comprehensive test cases for multiple pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: simplify constants for pip index URLs Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: remove Optional from pip_index_urls docstring to match implementation Signed-off-by: wassimbensalem <bswassim@gmail.com> * test: adjust unit test for get_custom_trainer Signed-off-by: wassimbensalem <bswassim@gmail.com> * fix: add missing Dict import in utils.py Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: move test cases to utils_test.py and convert to parametrized format Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: update utils test to use exact string matching instead of contains/not_contains checks - Changed TestCase expected_output from Dict[str, Any] to str - Replaced contains/not_contains logic with direct string equality - Updated all test cases to specify complete expected script output - Tests now assert the entire script returned by get_script_for_python_packages Signed-off-by: wassimbensalem <bswassim@gmail.com> * feat: add multiple pip index URLs support with comprehensive tests - Add support for multiple pip index URLs in CustomTrainer - Update constants to handle comma-separated environment variables - Add integration test for multiple pip index URLs flow - Update utils to generate correct pip install commands with --index-url and --extra-index-url - Ensure backward compatibility with single URL usage Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove redundant multiple pip index URLs test from backend - Remove redundant test case since comprehensive tests already exist in utils test - Keep integration tests focused on core functionality rather than implementation details - Maintain test coverage while reducing duplication Signed-off-by: wassimbensalem <bswassim@gmail.com> * Revert "refactor: remove redundant multiple pip index URLs test from backend" This reverts commit 0ec003d. Signed-off-by: wassimbensalem <bswassim@gmail.com> * refactor: remove duplicated test case from the backend. Signed-off-by: wassimbensalem <bswassim@gmail.com> --------- Signed-off-by: wassimbensalem <bswassim@gmail.com>
What this PR does
Implements support for multiple pip index URLs in CustomTrainer as requested in #72.
Changes Made
• Types: Changed pip_index_url to pip_index_urls: list[str] in CustomTrainer
• Constants: Added DEFAULT_PIP_INDEX_URLS with environment variable support
• Utils: Implemented logic for --index-url and --extra-index-url in pip install commands
• Tests: Updated test cases to use new field names
• Backward Compatibility: Maintained DEFAULT_PIP_INDEX_URL for existing code
Usage Examples
Single URL (backward compatible)
trainer = CustomTrainer(
func=training_function,
pip_index_urls=["https://pypi.org/simple"]
)
Multiple URLs
trainer = CustomTrainer(
func=training_function,
pip_index_urls=[
"https://pypi.org/simple",
"https://private.repo.com/simple"
]
)
Environment Variable Support
export DEFAULT_PIP_INDEX_URLS="https://pypi.org/simple,https://private.repo.com/simple"
Technical Details
• Uses field(default_factory=...) to prevent mutable default issues
• First URL becomes --index-url, subsequent URLs become --extra-index-url
• Automatically includes PyPI if not in the provided URLs
• Follows the same pattern as KFP for consistency
Closes #72