-
Notifications
You must be signed in to change notification settings - Fork 2
Support huggingface source #343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Setting back to draft, to simplify a bit (less checks, so that we support models and spaces, and to reduce the maintenance burden). It will help as a basis for GitHub URLs later |
85e9e98 to
7f72e17
Compare
|
Now supports spaces and models: Screencast.From.2025-10-21.00-02-59.mp4Note that, at the end of the video, the text "Branches" turns black when it has focus, but is no longer hovered over. I fixed the color in commit 0e93ebb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally. Super slick! ❤️
Code looks good. My only comments would be:
- The
huggingFaceSource.test.tstests are written in awkward style (lots of arrays and loops and a confusing higher-order function at the end). - Don't love adding a dependency to make one fetch call? Is the library is dealing with pagination or anything? But I do like HF so its also fine.
But I'll remove the dependency, you're right, it's not that hard. In the space, we also had to handle the authentication which is a bit more tricky edit: done, I vendored it |
yes, I don't love them either. I wanted to have a matrix for all the cases, in particular, to be sure that model URLs work the same as the dataset ones, since the construction is slightly different (dataset and space URLs start with datasets/ and spaces/, while model ones start with /, not /models, which is a potential source of bugs). What do you recommend? test less things? list all the URLs explicitly instead of generating them with code? or something different? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reducing dependency count :-)
I don't have a specific idea how to cleanup the tests, just observing. I gave a suggestion here on one section that doesn't really need test.for. The other ones are multi-line expectations and so I would probably leave the test.for on the other ones. Its the very last set of tests that got really ugly with the flatMaps, but I don't have a specific suggestion.
Co-authored-by: Kenny Daniel <platypii@gmail.com>
|
OK, thanks for the details and the suggestion. Let's keep it that way for now, and maybe improve later. |
see #327
Note that the code supports passing a token to access private or gated datasets, but it's not integrated in the UI (see additional required code: https://github.com/hyparam/space/blob/master/src/lib/auth.ts and https://github.com/hyparam/space/blob/65d4947f47facb7ff576a05a473375ee7af036b6/src/components/Home/Home.tsx#L52).
I think that supporting public datasets only is a good first step. The interface might become complex if we have to add authentication, no?
Screencast.From.2025-10-20.17-28-53.mp4
Also: it only supports HF datasets, not the models and spaces. I'm not sure if we want to remove this restriction or not. For spaces, it's easy. Models are slightly more complex because they don't contain a "models/" subpath, so potentially any URL such as https://huggingface.co/settings/profile could be a valid model URL... We could create a blocklist of some URLs, or just don't care and consider that any HF that is not a space or dataset is a model.We will have the same issue with GitHub, btw. I would say: better remove the restriction, and assume the user will put a valid URL.Done: we now support the models and spaces