Dataset Metadata
The Kaggle API follows the Data Package specification for specifying metadata when creating new Datasets and Dataset versions. For each new Dataset (or Dataset version), you have to put a special dataset-metadata.json file in your upload folder alongside the files.
Here's a basic example for dataset-metadata.json:
{
  "title": "My Awesome Dataset",
  "id": "timoboz/my-awesome-dataset",
  "licenses": [{"name": "CC0-1.0"}]
}
You can also use the API command kaggle datasets init -p /path/to/dataset to have the API create this file for you.
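If you would rather produce the file from a script, the basic example can be written with Python's standard library alone. This is only a minimal sketch: the helper name write_minimal_metadata is hypothetical and not part of the Kaggle API, and the keys simply mirror the basic example.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_metadata(folder, title, dataset_id, license_name="CC0-1.0"):
    """Write a basic dataset-metadata.json into `folder` and return its path.

    Hypothetical helper for illustration; it is not part of the Kaggle API.
    """
    metadata = {
        "title": title,
        "id": dataset_id,
        "licenses": [{"name": license_name}],
    }
    target = Path(folder) / "dataset-metadata.json"
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


# Example run against a throwaway folder instead of a real upload folder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_minimal_metadata(
        tmp, "My Awesome Dataset", "timoboz/my-awesome-dataset"
    )
    written = json.loads(path.read_text(encoding="utf-8"))
```

In practice you would point the helper at the folder that holds your data files rather than a temporary directory.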
Here's an example containing file metadata:
{
  "title": "My Awesome Dataset",
  "subtitle": "My awesomer subtitle",
  "description": "My awesomest description",
  "id": "timoboz/my-awesome-dataset",
  "licenses": [{"name": "CC0-1.0"}],
  "resources": [
    {
      "path": "my-awesome-data.csv",
      "description": "This is my awesome data!",
      "schema": {
        "fields": [
          {
            "name": "StringField",
            "description": "String field description",
            "type": "string"
          },
          {
            "name": "NumberField",
            "description": "Number field description",
            "type": "number"
          },
          {
            "name": "DateTimeField",
            "description": "Date time field description",
            "type": "datetime"
          }
        ]
      }
    },
    {
      "path": "my-awesome-extra-file.txt",
      "description": "This is my awesome extra file!"
    }
  ],
  "keywords": [
    "beginner",
    "tutorial"
  ]
}
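Because the fields array has to list every column in the same order as the data file, it can help to derive it from the CSV header instead of typing it by hand. The sketch below does that with the standard csv module. The helper name fields_from_header, the inline sample data, and the choice to default unlisted columns to "string" are assumptions made for illustration, not behaviour of the Kaggle API.

```python
import csv
import io


def fields_from_header(csv_text, types=None):
    """Return schema fields in the same order as the CSV header columns.

    `types` optionally maps a column name to a Table Schema type.
    Columns without an entry default to "string" (an assumption of this sketch).
    """
    types = types or {}
    header = next(csv.reader(io.StringIO(csv_text)))
    return [{"name": name, "type": types.get(name, "string")} for name in header]


# Inline sample standing in for my-awesome-data.csv.
sample = "StringField,NumberField,DateTimeField\nabc,1.5,2018-01-01T00:00:00Z\n"

resource = {
    "path": "my-awesome-data.csv",
    "description": "This is my awesome data!",
    "schema": {
        "fields": fields_from_header(
            sample, {"NumberField": "number", "DateTimeField": "datetime"}
        )
    },
}
```

The resulting resource dictionary can be placed in the resources array of the metadata before it is serialized to JSON.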
Currently, only a small subset of metadata is supported for the following commands:
- kaggle datasets create (create a new Dataset):
  - title: Title of the dataset, must be between 6 and 50 characters in length.
  - subtitle: Subtitle of the dataset, must be between 20 and 80 characters in length.
  - description: Description of the dataset.
  - id: The URL slug of your new dataset, a combination of:
    - Your username or organization slug (if you are a member of an organization).
    - A unique Dataset slug, must be between 3 and 50 characters in length.
  - licenses: Must have exactly one entry that specifies the license. Only name is evaluated; all other information is ignored. See below for options.
  - resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, does not need to list all of the files to be uploaded.)
    - path: File path.
    - description: File description.
    - schema: File schema:
      - fields: Array of fields in the dataset. Please note that this needs to include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
        - name: Field name
        - title: Field description
        - type: Field type. Valid types are defined in the Frictionless Data Table Schema
  - keywords: Contains an array of strings that correspond to existing tags on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
- kaggle datasets version (create a new version for an existing Dataset):
  - subtitle: Subtitle of the dataset, must be between 20 and 80 characters in length.
  - description: Description of the dataset.
  - id: The URL slug of the dataset you want to update (see above). You must be the owner or otherwise have edit rights for this dataset.
  - resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, does not need to list all of the files to be uploaded.)
    - path: File path.
    - description: File description.
    - schema: File schema:
      - fields: Array of fields in the dataset. Please note that this needs to include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
        - name: Field name
        - title: Field description
        - type: Field type. Valid types are defined in the Frictionless Data Table Schema
  - keywords: Contains an array of strings that correspond to existing tags on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
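The length and count constraints listed for kaggle datasets create can be checked locally before an upload. The following is a rough sketch under stated assumptions: check_create_metadata is a hypothetical helper, not something the Kaggle API provides, and it treats subtitle as optional because the basic example omits it. The server may apply checks that are not reproduced here.

```python
def check_create_metadata(metadata):
    """Return a list of problems found in metadata for a new Dataset.

    Hypothetical local check; it only mirrors the limits stated in the text.
    """
    problems = []

    if not 6 <= len(metadata.get("title", "")) <= 50:
        problems.append("title must be between 6 and 50 characters")

    # Assumption: subtitle is optional, so it is only checked when present.
    subtitle = metadata.get("subtitle")
    if subtitle is not None and not 20 <= len(subtitle) <= 80:
        problems.append("subtitle must be between 20 and 80 characters")

    # id is "<username or organization slug>/<dataset slug>".
    owner, separator, slug = metadata.get("id", "").partition("/")
    if not owner or not separator or not 3 <= len(slug) <= 50:
        problems.append("id must be owner/slug with a slug of 3 to 50 characters")

    if len(metadata.get("licenses", [])) != 1:
        problems.append("licenses must have exactly one entry")

    return problems


basic = {
    "title": "My Awesome Dataset",
    "id": "timoboz/my-awesome-dataset",
    "licenses": [{"name": "CC0-1.0"}],
}
basic_problems = check_create_metadata(basic)
```

An empty list means none of the listed limits were violated; any other result names the fields to fix before running the command.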
We will add further metadata processing in upcoming versions of the API.
You can specify the following licenses for your datasets:
- CC0-1.0: CC0: Public Domain
- CC-BY-SA-3.0: CC BY-SA 3.0
- CC-BY-SA-4.0: CC BY-SA 4.0
- CC-BY-NC-SA-4.0: CC BY-NC-SA 4.0
- GPL-2.0: GPL 2
- ODbL-1.0: Database: Open Database, Contents: © Original Authors
- DbCL-1.0: Database: Open Database, Contents: Database Contents
- copyright-authors: Data files © Original Authors
- other: Other (specified in description)
- unknown: Unknown
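A small lookup can catch a mistyped license name before the metadata file is uploaded. This is a sketch only: SUPPORTED_LICENSES copies the names from the list above, is_supported_license is a hypothetical helper, and exact, case-sensitive matching is an assumption rather than documented API behaviour.

```python
# License names copied from the list above.
SUPPORTED_LICENSES = {
    "CC0-1.0",
    "CC-BY-SA-3.0",
    "CC-BY-SA-4.0",
    "CC-BY-NC-SA-4.0",
    "GPL-2.0",
    "ODbL-1.0",
    "DbCL-1.0",
    "copyright-authors",
    "other",
    "unknown",
}


def is_supported_license(name):
    """Return True if `name` is one of the listed license names.

    Assumption: names are compared exactly, including letter case.
    """
    return name in SUPPORTED_LICENSES


licenses = [{"name": "CC0-1.0"}]
license_ok = len(licenses) == 1 and is_supported_license(licenses[0]["name"])
```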