Commit 6412808

[Data] Persist unresolved paths in FileBasedDataSource. (#51424)
## Why are these changes needed?

This PR adds support for storing the original, unresolved paths in the `FileBasedDatasource` class. This allows components to access the raw path pattern provided by the user before any path resolution or expansion occurs.

This also adds consistency with [`ParquetDatasource`](https://github.com/ray-project/ray/blob/936539ce15bffa8dfa0a519fee2c7c8c0ee341b8/python/ray/data/_internal/datasource/parquet_datasource.py#L202) and [`_FileDatasink`](https://github.com/ray-project/ray/blob/936539ce15bffa8dfa0a519fee2c7c8c0ee341b8/python/ray/data/datasource/file_datasink.py#L66).

## Related issue number

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
1 parent 761b418 commit 6412808

1 file changed, +1 -0 lines changed

python/ray/data/datasource/file_based_datasource.py

Lines changed: 1 addition & 0 deletions
@@ -140,6 +140,7 @@ def __init__(
         self._partitioning = partitioning
         self._ignore_missing_paths = ignore_missing_paths
         self._include_paths = include_paths
+        self._unresolved_paths = paths
         paths, self._filesystem = _resolve_paths_and_filesystem(paths, filesystem)
         self._filesystem = RetryingPyFileSystem.wrap(
             self._filesystem, retryable_errors=self._data_context.retried_io_errors
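
The ordering in the diff matters: the attribute is assigned before `paths` is rebound by the resolver, so it keeps the caller's input as given. The pattern can be sketched in isolation as follows. This is a minimal illustration, not Ray's implementation: `resolve_paths`, `SketchFileDatasource`, and `_paths` are hypothetical names, and the stand-in resolver only wraps a single string in a list and makes each path absolute, which may differ from what `_resolve_paths_and_filesystem` does.

```python
import os
from typing import List, Union


def resolve_paths(paths: Union[str, List[str]]) -> List[str]:
    """Hypothetical stand-in for Ray's internal resolver: wrap a single
    string in a list and normalize each entry to an absolute path."""
    if isinstance(paths, str):
        paths = [paths]
    return [os.path.abspath(os.path.expanduser(p)) for p in paths]


class SketchFileDatasource:
    """Illustrative class only; not Ray's FileBasedDatasource."""

    def __init__(self, paths: Union[str, List[str]]):
        # Keep the caller's input exactly as given, before `paths` is rebound.
        self._unresolved_paths = paths
        # From here on, `paths` refers to the normalized list.
        paths = resolve_paths(paths)
        self._paths = paths


source = SketchFileDatasource("data/part-0.csv")
print(source._unresolved_paths)  # the original string, unchanged
print(source._paths)             # a one-element list holding an absolute path
```

Because the assignment stores a reference rather than a copy, a caller who passes a list and mutates it afterwards would also change what the attribute reports; copying the list would avoid that, at the cost of diverging from the one-line change shown in the diff.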
