Hi! You can use Git tags to mark the releases (versions) of a dataset:

import huggingface_hub

# version 1.0
dataset.push_to_hub("user_name/dataset_name")
huggingface_hub.create_tag("user_name/dataset_name", tag="1.0", repo_type="dataset")

# version 2.0
dataset.push_to_hub("user_name/dataset_name")
huggingface_hub.create_tag("user_name/dataset_name", tag="2.0", repo_type="dataset")

And then reference them when loading as follows:

from datasets import load_dataset

# version 1.0
load_dataset("user_name/dataset_name", revision="1.0")
# version 2.0
load_dataset("user_name/dataset_name", revision="2.0")
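The push-then-tag steps above could be wrapped in a small helper so that every release goes through the same two calls. This is only a sketch: `push_and_tag` and its `create_tag` parameter are hypothetical names introduced here for illustration, and the helper assumes `dataset` is any object with a `push_to_hub(repo_id)` method (such as a `datasets.Dataset`).

```python
from typing import Any, Callable, Optional


def push_and_tag(
    dataset: Any,
    repo_id: str,
    version: str,
    create_tag: Optional[Callable[..., Any]] = None,
) -> None:
    """Push `dataset` to `repo_id`, then tag the resulting commit as `version`.

    `create_tag` is a hypothetical injection point: when left as None, the real
    `huggingface_hub.create_tag` is imported and used.
    """
    if create_tag is None:
        # Imported lazily so the helper can be exercised without network access.
        from huggingface_hub import create_tag as hub_create_tag

        create_tag = hub_create_tag

    # Upload the current state of the dataset first ...
    dataset.push_to_hub(repo_id)
    # ... then tag the repository so this state can be loaded by name later.
    create_tag(repo_id, tag=version, repo_type="dataset")
```

With a real dataset object, `push_and_tag(dataset, "user_name/dataset_name", "1.0")` would perform the same two calls as the "version 1.0" snippet, after which `load_dataset("user_name/dataset_name", revision="1.0")` should resolve to that tag.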
Currently, when uploading a dataset, one would call:
So I thought one could create a new version with:
But then I get the following error:
How to then create a new version?
I've tried cloning, tagging the last commit, pushing it, then calling push_to_hub again. However, I now see duplicate shards, see below:
How do I then get only one version in the commit of v1?