Dataset Metadata
Rebecca Ysteboe edited this page Jul 19, 2018 · 3 revisions
The Kaggle API follows the Data Package specification for specifying metadata when creating new Datasets and Dataset versions. For each new Dataset (version), you have to put a special dataset-metadata.json file in your upload folder alongside the files you want to upload.
Here's a basic example for dataset-metadata.json:
{
  "title": "My Awesome Dataset",
  "id": "timoboz/my-awesome-dataset",
  "licenses": [{"name": "CC0-1.0"}]
}
You can also use the API command kaggle datasets init -p /path/to/dataset to have the API create this file for you.
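If you would rather generate the file from a script, a minimal sketch in Python could look like the following. The helper name `write_minimal_metadata` and its parameters are hypothetical and not part of the Kaggle API; only the file name and the three keys are taken from the basic example above.

```python
import json
from pathlib import Path


def write_minimal_metadata(folder, title, dataset_id, license_name="CC0-1.0"):
    """Write a basic dataset-metadata.json into the upload folder.

    Hypothetical helper for illustration; it only mirrors the basic example.
    """
    metadata = {
        "title": title,
        "id": dataset_id,
        # Exactly one license entry, identified by its name.
        "licenses": [{"name": license_name}],
    }
    target = Path(folder) / "dataset-metadata.json"
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target
```

Calling `write_minimal_metadata("/path/to/dataset", "My Awesome Dataset", "timoboz/my-awesome-dataset")` would place the file next to the data files in that folder.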
Here's an example containing file metadata:
{
  "title": "My Awesome Dataset",
  "subtitle": "My awesomer subtitle",
  "description": "My awesomest description",
  "id": "timoboz/my-awesome-dataset",
  "licenses": [{"name": "CC0-1.0"}],
  "resources": [
    {
      "path": "my-awesome-data.csv",
      "description": "This is my awesome data!",
      "schema": {
        "fields": [
          {
            "name": "StringField",
            "description": "String field description",
            "type": "string"
          },
          {
            "name": "NumberField",
            "description": "Number field description",
            "type": "number"
          },
          {
            "name": "DateTimeField",
            "description": "Date time field description",
            "type": "datetime"
          }
        ]
      }
    },
    {
      "path": "my-awesome-extra-file.txt",
      "description": "This is my awesome extra file!"
    }
  ],
  "keywords": [
    "beginner",
    "tutorial"
  ]
}
Currently, we're only supporting a small subset of metadata for the following commands:
- kaggle datasets create (create a new Dataset):
  - title: Title of the dataset, must be between 6 and 50 characters in length.
  - subtitle: Subtitle of the dataset, must be between 20 and 80 characters in length.
  - description: Description of the dataset.
  - id: The URL slug of your new dataset, a combination of:
    - Your username or organization slug (if you are a member of an organization).
    - A unique Dataset slug, must be between 3 and 50 characters in length.
  - licenses: Must have exactly one entry that specifies the license. Only name is evaluated, all other information is ignored. See below for options.
  - resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, it does not need to list all of the files to be uploaded.)
    - path: File path.
    - description: File description.
    - schema: File schema (definition below):
      - fields: Array of fields in the dataset. Please note that this needs to include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
        - name: Field name
        - title: Field description
        - type: Field type. Valid types are defined in the Frictionless Data Table Schema
  - keywords: Contains an array of strings that correspond to an existing tag on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
- kaggle datasets version (create a new version for an existing Dataset):
  - subtitle: Subtitle of the dataset, must be between 20 and 80 characters in length.
  - description: Description of the dataset.
  - id: The URL slug of the dataset you want to update (see above). You must be the owner or otherwise have edit rights for this dataset.
  - resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, it does not need to list all of the files to be uploaded.)
    - path: File path.
    - description: File description.
    - schema: File schema (definition below):
      - fields: Array of fields in the dataset. Please note that this needs to include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
        - name: Field name
        - title: Field description
        - type: Field type. Valid types are defined in the Frictionless Data Table Schema
  - keywords: Contains an array of strings that correspond to an existing tag on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
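The length and count rules listed for kaggle datasets create can be checked locally before uploading. The following is a rough sketch, not the validation the API itself performs: the function name `check_create_metadata` is hypothetical, and treating subtitle as optional is an assumption based on the basic example, which omits it.

```python
def check_create_metadata(metadata):
    """Return a list of problems; an empty list means none were detected."""
    problems = []

    title = metadata.get("title", "")
    if not 6 <= len(title) <= 50:
        problems.append("title must be between 6 and 50 characters")

    # Assumption: subtitle is optional, so it is only checked when present.
    subtitle = metadata.get("subtitle")
    if subtitle is not None and not 20 <= len(subtitle) <= 80:
        problems.append("subtitle must be between 20 and 80 characters")

    # id is "<username or organization slug>/<dataset slug>".
    owner, sep, slug = metadata.get("id", "").partition("/")
    if not owner or not sep:
        problems.append("id must look like owner/dataset-slug")
    elif not 3 <= len(slug) <= 50:
        problems.append("dataset slug must be between 3 and 50 characters")

    # Exactly one license entry is expected, and only its name is evaluated.
    licenses = metadata.get("licenses", [])
    if len(licenses) != 1 or "name" not in licenses[0]:
        problems.append("licenses must have exactly one entry with a name")

    return problems
```

Running it on the parsed contents of dataset-metadata.json and printing the returned list gives a quick pre-upload check.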
We will add further metadata processing in upcoming versions of the API.
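Because the entries under fields have to cover every column in order, it can help to compare them with the data file before uploading. The sketch below assumes the file is a CSV with a header row and that each field name equals the corresponding column header; both are assumptions, and `fields_match_header` is a hypothetical helper rather than part of the Kaggle API.

```python
import csv


def fields_match_header(csv_path, resource):
    """Return True if the declared field names equal the CSV header, in order."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    fields = resource.get("schema", {}).get("fields", [])
    declared = [field["name"] for field in fields]
    return header == declared
```

Here `resource` is one entry of the resources array, for example the entry for my-awesome-data.csv in the file metadata example.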
You can specify the following licenses for your datasets:
- CC0-1.0: CC0: Public Domain
- CC-BY-SA-3.0: CC BY-SA 3.0
- CC-BY-SA-4.0: CC BY-SA 4.0
- CC-BY-NC-SA-4.0: CC BY-NC-SA 4.0
- GPL-2.0: GPL 2
- ODbL-1.0: Database: Open Database, Contents: © Original Authors
- DbCL-1.0: Database: Open Database, Contents: Database Contents
- copyright-authors: Data files © Original Authors
- other: Other (specified in description)
- unknown: Unknown
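To catch a mistyped license name early, the identifiers above can be kept in a set and compared with the metadata. This is only an illustrative sketch: `ALLOWED_LICENSES` and `license_name_is_listed` are hypothetical names, and the exact, case-sensitive comparison is an assumption about how the API matches names.

```python
# License identifiers copied from the list above.
ALLOWED_LICENSES = {
    "CC0-1.0",
    "CC-BY-SA-3.0",
    "CC-BY-SA-4.0",
    "CC-BY-NC-SA-4.0",
    "GPL-2.0",
    "ODbL-1.0",
    "DbCL-1.0",
    "copyright-authors",
    "other",
    "unknown",
}


def license_name_is_listed(metadata):
    """Return True if there is exactly one license entry and its name is listed."""
    licenses = metadata.get("licenses", [])
    return len(licenses) == 1 and licenses[0].get("name") in ALLOWED_LICENSES
```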