Align HfFileSystem and HfApi for the expand argument when listing files in repos #3195

Merged: lhoestq merged 5 commits into main from align-hffs-with-hfapi-for-expand on Jul 4, 2025

@lhoestq lhoestq commented Jul 2, 2025

HfApi has expand set to False by default, and so should HfFileSystem. In this PR I set it to False by default for HfFileSystem.

This will be particularly useful for pyarrow 21.0 when it comes out. Currently having expand=True by default causes issues because it's an expensive operation:

In [1]: import pyarrow.dataset as ds
   ...: 
   ...: uri = "hf://datasets/HuggingFaceFW/fineweb-2"
   ...: dataset = ds.dataset(uri)
HTTP Error 429 thrown while requesting GET https://huggingface.co/api/datasets/HuggingFaceFW/fineweb-2/tree/main?recursive=True&expand=True&cursor=ZXlKbWFXeGxYMjVo...
Retrying in 1s [Retry 1/20].
HTTP Error 429 thrown while requesting GET https://huggingface.co/api/datasets/HuggingFaceFW/fineweb-2/tree/main?recursive=True&expand=True&cursor=ZXlKbWFXeGxYMjVo...
Retrying in 1s [Retry 1/20].
...

This will make HfFileSystem more efficient overall :) (kinda related to #3177)
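
To make the difference concrete, here is a minimal sketch of the two request shapes, modeled only on the URL visible in the log above. The helper `tree_url` and the constant `HUB_ENDPOINT` are hypothetical names for illustration, not part of huggingface_hub, and whether the library sends `expand=False` explicitly or simply omits the parameter is an assumption here.

```python
from urllib.parse import urlencode

# Assumption: endpoint and path layout copied from the 429 log above.
HUB_ENDPOINT = "https://huggingface.co"


def tree_url(repo_id, revision="main", *, repo_type="dataset", recursive=False, expand=False):
    """Hypothetical helper: build a tree-listing URL shaped like the one in the log."""
    # urlencode() stringifies booleans as "True"/"False", matching the log format.
    params = {"recursive": recursive, "expand": expand}
    return f"{HUB_ENDPOINT}/api/{repo_type}s/{repo_id}/tree/{revision}?{urlencode(params)}"


# New default after this PR: no expanded per-file metadata requested.
cheap = tree_url("HuggingFaceFW/fineweb-2", recursive=True)
# Old default: every page of the listing asks for expanded metadata.
costly = tree_url("HuggingFaceFW/fineweb-2", recursive=True, expand=True)

print(cheap)
print(costly)
```

Only the second URL matches the requests that were being rate-limited in the log, which is why flipping the default should reduce load for callers that just need file paths.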

@lhoestq lhoestq requested review from Wauplin and hanouticelina July 2, 2025 13:07
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin Wauplin left a comment

Changes make sense and look good, thanks! 👍 (hope it doesn't break things in the wild though 😬)

Co-authored-by: Lucain <lucain@huggingface.co>

lhoestq commented Jul 4, 2025

> (hope it doesn't break things in the wild though 😬)

It's fine for all the other cases I know of: pandas, dask, spark, datasets.

@lhoestq lhoestq merged commit 798ea8a into main Jul 4, 2025
23 of 25 checks passed
@lhoestq lhoestq deleted the align-hffs-with-hfapi-for-expand branch July 4, 2025 11:02
mintyleaf pushed a commit to Swarmind/huggingface_hub that referenced this pull request Jul 11, 2025
…iles in repos (huggingface#3195)

* align hffs and hfapi for expand

* update tests

* fix tests

* again

* Update src/huggingface_hub/hf_file_system.py

Co-authored-by: Lucain <lucain@huggingface.co>

---------

Co-authored-by: Lucain <lucain@huggingface.co>