Skip to content

Conversation

@wassimbensalem
Copy link
Contributor

What this PR does

Implements support for multiple pip index URLs in CustomTrainer as requested in #72.

Changes Made

  • Types: Changed pip_index_url to pip_index_urls: list[str] in CustomTrainer
  • Constants: Added DEFAULT_PIP_INDEX_URLS with environment variable support
  • Utils: Implemented logic for --index-url and --extra-index-url in pip install commands
  • Tests: Updated test cases to use new field names
  • Backward Compatibility: Maintained DEFAULT_PIP_INDEX_URL for existing code

Usage Examples

# Single URL (backward compatible)
trainer = CustomTrainer(
    func=training_function,
    pip_index_urls=["https://pypi.org/simple"]
)

# Multiple URLs
trainer = CustomTrainer(
    func=training_function,
    pip_index_urls=[
        "https://pypi.org/simple",
        "https://private.repo.com/simple"
    ]
)

Environment Variable Support

export DEFAULT_PIP_INDEX_URLS="https://pypi.org/simple,https://private.repo.com/simple"

Technical Details

  • Uses field(default_factory=...) to prevent mutable default issues
  • First URL becomes --index-url, subsequent URLs become --extra-index-url
  • Automatically includes PyPI if not in the provided URLs
  • Follows the same pattern as KFP for consistency

Closes #72

- Change pip_index_url to pip_index_urls: list[str] in CustomTrainer
- Add DEFAULT_PIP_INDEX_URLS constant with environment variable support
- Implement --index-url and --extra-index-url logic in pip install commands
- Maintain backward compatibility with DEFAULT_PIP_INDEX_URL
- Update tests to use new field names
- Fix mutable default issue using default_factory

Closes kubeflow#72
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @wassimbensalem!

packages_to_install (`Optional[List[str]]`):
A list of Python packages to install before running the function.
pip_index_url (`Optional[str]`): The PyPI URL from which to install Python packages.
pip_index_urls (`Optional[list[str]]`): The PyPI URLs from which to install Python packages.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the comment that the first URL will be used as the default index, and the rest will be used as additional indexes.

func_args: Optional[Dict] = None
packages_to_install: Optional[list[str]] = None
pip_index_url: str = constants.DEFAULT_PIP_INDEX_URL
pip_index_urls: list[str] = field(default_factory=lambda: constants.DEFAULT_PIP_INDEX_URLS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to use field here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use simply the constants.DEFAULT_PIP_INDEX_URLS, in any instance that we create if someone modifies constants.DEFAULT_PIP_INDEX_ULRLS it will affect all created instance. Sothe field with default factory will assure that we have a fresh copy of it each time. This is optional, I can remove it as well !

Copy link
Member

@andreyvelich andreyvelich Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we probably need to do this to return new copy all the time object is created ?

pip_index_urls: list[str] = field(default_factory=lambda: list(constants.DEFAULT_PIP_INDEX_URLS))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly

is_mpi: bool,
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
is_mpi: bool = False,
include_pypi: bool = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to introduce this flag ?
I would let user explicitly sets the PyPI index in the list if they need it:

pip_index_urls=["pypi.custom.com/simple", "https://pypi.org/simple"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Just was thinking if the user forgot to set the default pypi. For instance he chose pip_index_urls = ["repoA", "repoB"], but the package wasnt there, so we have a fallback solution which is the default pypi. This is also optional and I can remove it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have it in the future, if we get any complains from users.
I would just define single constant that can be overridden by env:

DEFAULT_PIP_INDEX_URLS = os.getenv(
    "DEFAULT_PIP_INDEX_URLS", "https://pypi.org/simple"
).split(",")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @andreyvelich

--no-warn-script-location {} {}
""".format(
pip_index_url,
options_str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can directly set it here.

Suggested change
options_str,
" ".join(options),

@astefanutti
Copy link
Contributor

@wassimbensalem could you please sign the commits?

@astefanutti
Copy link
Contributor

/ok-to-test

Copy link
Contributor

@kramaranya kramaranya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wassimbensalem!
Could you also sign your commits and fix the code formatting -- https://github.com/kubeflow/sdk/blob/main/CONTRIBUTING.md#development-workflow?

Comment on lines 131 to 135
DEFAULT_PIP_INDEX_URLS = (
os.getenv("DEFAULT_PIP_INDEX_URLS", DEFAULT_PYPI_URL).split(",")
if os.getenv("DEFAULT_PIP_INDEX_URLS")
else [DEFAULT_PYPI_URL]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simplify this to not call getenv twice

is_mpi: bool,
pip_index_urls: list[str] = constants.DEFAULT_PIP_INDEX_URLS,
is_mpi: bool = False,
include_pypi: bool = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @andreyvelich

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 30, 2025
- Simplify DEFAULT_PIP_INDEX_URLS logic
- Remove include_pypi parameter and automatic PyPI fallback
- Clean up function signatures
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Multiple Pip Index URLs

4 participants