-
Couldn't load subscription status.
- Fork 43
feat: Support multiple pip index URLs in CustomTrainer #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support multiple pip index URLs in CustomTrainer #74
Conversation
- Change pip_index_url to pip_index_urls: list[str] in CustomTrainer - Add DEFAULT_PIP_INDEX_URLS constant with environment variable support - Implement --index-url and --extra-index-url logic in pip install commands - Maintain backward compatibility with DEFAULT_PIP_INDEX_URL - Update tests to use new field names - Fix mutable default issue using default_factory Closes kubeflow#72
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution @wassimbensalem!
kubeflow/trainer/types/types.py
Outdated
| packages_to_install (`Optional[List[str]]`): | ||
| A list of Python packages to install before running the function. | ||
| pip_index_url (`Optional[str]`): The PyPI URL from which to install Python packages. | ||
| pip_index_urls (`Optional[list[str]]`): The PyPI URLs from which to install Python packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the comment that the first URL will be used as the default index, and the rest will be used as additional indexes.
kubeflow/trainer/types/types.py
Outdated
| func_args: Optional[Dict] = None | ||
| packages_to_install: Optional[list[str]] = None | ||
| pip_index_url: str = constants.DEFAULT_PIP_INDEX_URL | ||
| pip_index_urls: list[str] = field(default_factory=lambda: constants.DEFAULT_PIP_INDEX_URLS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to use field here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use simply the constants.DEFAULT_PIP_INDEX_URLS, in any instance that we create if someone modifies constants.DEFAULT_PIP_INDEX_ULRLS it will affect all created instance. Sothe field with default factory will assure that we have a fresh copy of it each time. This is optional, I can remove it as well !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we probably need to do this to return new copy all the time object is created ?
pip_index_urls: list[str] = field(default_factory=lambda: list(constants.DEFAULT_PIP_INDEX_URLS))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly
kubeflow/trainer/utils/utils.py
Outdated
| is_mpi: bool, | ||
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, | ||
| is_mpi: bool = False, | ||
| include_pypi: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to introduce this flag ?
I would let user explicitly sets the PyPI index in the list if they need it:
pip_index_urls=["pypi.custom.com/simple", "https://pypi.org/simple"]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Just was thinking if the user forgot to set the default pypi. For instance he chose pip_index_urls = ["repoA", "repoB"], but the package wasnt there, so we have a fallback solution which is the default pypi. This is also optional and I can remove it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have it in the future, if we get any complains from users.
I would just define single constant that can be overridden by env:
DEFAULT_PIP_INDEX_URLS = os.getenv(
"DEFAULT_PIP_INDEX_URLS", "https://pypi.org/simple"
).split(",")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @andreyvelich
kubeflow/trainer/utils/utils.py
Outdated
| --no-warn-script-location {} {} | ||
| """.format( | ||
| pip_index_url, | ||
| options_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can directly set it here.
| options_str, | |
| " ".join(options), |
|
@wassimbensalem could you please sign the commits? |
|
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wassimbensalem!
Could you also sign your commits and fix the code formatting -- https://github.com/kubeflow/sdk/blob/main/CONTRIBUTING.md#development-workflow?
| DEFAULT_PIP_INDEX_URLS = ( | ||
| os.getenv("DEFAULT_PIP_INDEX_URLS", DEFAULT_PYPI_URL).split(",") | ||
| if os.getenv("DEFAULT_PIP_INDEX_URLS") | ||
| else [DEFAULT_PYPI_URL] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify this to not call getenv twice
kubeflow/trainer/utils/utils.py
Outdated
| is_mpi: bool, | ||
| pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS, | ||
| is_mpi: bool = False, | ||
| include_pypi: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @andreyvelich
- Simplify DEFAULT_PIP_INDEX_URLS logic - Remove include_pypi parameter and automatic PyPI fallback - Clean up function signatures
0f794c2 to
872285b
Compare
What this PR does
Implements support for multiple pip index URLs in CustomTrainer as requested in #72.
Changes Made
pip_index_urltopip_index_urls: list[str]in CustomTrainerDEFAULT_PIP_INDEX_URLSwith environment variable support--index-urland--extra-index-urlin pip install commandsDEFAULT_PIP_INDEX_URLfor existing codeUsage Examples
Environment Variable Support
Technical Details
field(default_factory=...)to prevent mutable default issues--index-url, subsequent URLs become--extra-index-urlCloses #72