
Dataset Metadata

Rebecca Ysteboe edited this page Jul 19, 2018 · 3 revisions

The Kaggle API follows the Data Package specification for specifying metadata when creating new Datasets and Dataset versions. For each new Dataset or Dataset version, put a special dataset-metadata.json file in your upload folder alongside the files you are uploading.

Here's a basic example for dataset-metadata.json:

{
  "title": "My Awesome Dataset", 
  "id": "timoboz/my-awesome-dataset", 
  "licenses": [{"name": "CC0-1.0"}]
}

You can also use the API command kaggle datasets init -p /path/to/dataset to have the API create this file for you.
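If you would rather generate the file from a script, the basic example above can be written out with a few lines of Python. This is only a sketch: the helper name write_metadata and the temporary folder are illustrative assumptions, not part of the Kaggle API.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(folder, title, dataset_id, license_name):
    """Write a minimal dataset-metadata.json into the upload folder.

    Hypothetical helper for illustration; not part of the Kaggle API.
    """
    metadata = {
        "title": title,
        "id": dataset_id,
        # Exactly one license entry; only "name" is evaluated.
        "licenses": [{"name": license_name}],
    }
    path = Path(folder) / "dataset-metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


# A throwaway directory stands in for the real upload folder.
demo_dir = tempfile.mkdtemp()
written = write_metadata(
    demo_dir, "My Awesome Dataset", "timoboz/my-awesome-dataset", "CC0-1.0"
)
print(written.read_text(encoding="utf-8"))
```

In practice you would pass the folder that already holds your data files instead of a temporary directory.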

Here's an example containing file metadata:

{
  "title": "My Awesome Dataset", 
  "subtitle": "My awesomer subtitle",
  "description": "My awesomest description",
  "id": "timoboz/my-awesome-dataset", 
  "licenses": [{"name": "CC0-1.0"}],
  "resources": [
    {
      "path": "my-awesome-data.csv",
      "description": "This is my awesome data!",
      "schema": {
        "fields": [
          {
            "name": "StringField",
            "description": "String field description",
            "type": "string"
          },
          {
            "name": "NumberField",
            "description": "Number field description",
            "type": "number"
          },
          {
            "name": "DateTimeField",
            "description": "Date time field description",
            "type": "datetime"
          }
        ]
      }
    },
    {
      "path": "my-awesome-extra-file.txt",
      "description": "This is my awesome extra file!"
    }
  ],
  "keywords": [
    "beginner",
    "tutorial"
  ]
}
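Because the schema has to list every field of a file in order, it can be convenient to derive the fields array from the CSV header rather than typing it by hand. The following is a minimal sketch under stated assumptions: build_resource is a hypothetical helper, the type defaults to "string" unless you supply one, and the input is assumed to have a header row.

```python
import csv
import io


def build_resource(path, csv_text, description="", types=None):
    """Build one "resources" entry whose fields follow the CSV header order.

    Hypothetical helper; `types` maps a column name to a schema type and
    any column not listed falls back to "string" (an assumption).
    """
    types = types or {}
    # The first CSV row is taken to be the header.
    header = next(csv.reader(io.StringIO(csv_text)))
    fields = [
        {"name": name, "description": "", "type": types.get(name, "string")}
        for name in header
    ]
    return {"path": path, "description": description, "schema": {"fields": fields}}


sample = "StringField,NumberField,DateTimeField\nabc,1.5,2018-07-19T00:00:00\n"
resource = build_resource(
    "my-awesome-data.csv",
    sample,
    description="This is my awesome data!",
    types={"NumberField": "number", "DateTimeField": "datetime"},
)
print([field["name"] for field in resource["schema"]["fields"]])
```

The returned dictionary can be appended to the resources array before the metadata file is written; field descriptions are left empty for you to fill in.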

Contents

Currently, the API supports only a small subset of metadata, for the following commands:

  • kaggle datasets create (create a new Dataset):
    • title: Title of the dataset. Must be between 6 and 50 characters in length.
    • subtitle: Subtitle of the dataset. Must be between 20 and 80 characters in length.
    • description: Description of the dataset.
    • id: The URL slug of your new dataset, a combination of:
      1. Your username or organization slug (if you are a member of an organization).
      2. A unique Dataset slug. Must be between 3 and 50 characters in length.
    • licenses: Must have exactly one entry that specifies the license. Only name is evaluated; all other information is ignored. See below for options.
    • resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, it does not need to list all of the files being uploaded.)
      • path: File path.
      • description: File description.
      • schema: File schema (definition below):
        • fields: Array of fields in the dataset. This must include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
    • keywords: Contains an array of strings, each corresponding to an existing tag on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
  • kaggle datasets version (create a new version for an existing Dataset):
    • subtitle: Subtitle of the dataset. Must be between 20 and 80 characters in length.
    • description: Description of the dataset.
    • id: The URL slug of the dataset you want to update (see above). You must be the owner or otherwise have edit rights for this dataset.
    • resources: Contains an array of files that are being uploaded. (Note: this is not required and, if included, it does not need to list all of the files being uploaded.)
      • path: File path.
      • description: File description.
      • schema: File schema (definition below):
        • fields: Array of fields in the dataset. This must include ALL of the fields in the data, in order, or they will not be matched up correctly. A later version of the API will fix this bug.
    • keywords: Contains an array of strings, each corresponding to an existing tag on Kaggle. If a specified tag doesn't exist, the upload will continue, but that specific tag won't be added.
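
The length and count rules listed above can be checked locally before you upload. The sketch below is not how the API itself validates metadata; validate_metadata is a hypothetical helper, the bounds are assumed to be inclusive, and fields other than id are only checked when present.

```python
def validate_metadata(metadata, new_dataset=True):
    """Return a list of problems; an empty list means no listed rule was broken.

    Hypothetical helper. Length bounds are assumed to be inclusive.
    """
    problems = []

    def check_length(key, low, high):
        value = metadata.get(key)
        if value is not None and not low <= len(value) <= high:
            problems.append(f"{key} must be between {low} and {high} characters")

    if new_dataset:
        # title and licenses are only listed for `kaggle datasets create`.
        check_length("title", 6, 50)
        if len(metadata.get("licenses", [])) != 1:
            problems.append("licenses must have exactly one entry")
    check_length("subtitle", 20, 80)

    # id is "<username or organization slug>/<dataset slug>".
    owner, sep, slug = metadata.get("id", "").partition("/")
    if not (owner and sep and slug):
        problems.append("id must look like owner/dataset-slug")
    elif not 3 <= len(slug) <= 50:
        problems.append("dataset slug must be between 3 and 50 characters")
    return problems


good = {
    "title": "My Awesome Dataset",
    "id": "timoboz/my-awesome-dataset",
    "licenses": [{"name": "CC0-1.0"}],
}
bad = {"title": "Tiny", "subtitle": "too short", "id": "timoboz/ab", "licenses": []}
print(validate_metadata(good))
print(validate_metadata(bad))
```

Pass new_dataset=False when preparing metadata for kaggle datasets version, since title and licenses are not among the fields listed for that command.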

We will add further metadata processing in upcoming versions of the API.

Licenses

You can specify the following licenses for your datasets:
