Newly added domains and entry points need unique IDs distinct from entry #1

@KlausGPaul

Description

An entry point needs an ID that is separate from the ID of the domain document it belongs to. At the moment, the script reuses the same ID for _id and _source.seed_urls[:].id, which causes a crawler error.

{
  "id": "604109b536067b6bad0570d8",
  "created_at": "2021-03-04T16:24:21Z",
  "updated_at": "2021-03-04T16:24:30Z",
  "engine_oid": "6040f4be36067b1ee89528bc",
  "name": "https://some.domain",
  "crawl_rules": [],
  "seed_urls": [
    {
      "created_at": "2021-03-04T16:24:21Z",
      "id": "604109b536067b6bad0570d9",
      "url": "https://some.domain/"
    },
    {
      "created_at": "2021-03-04T16:24:30Z",
      "id": "604109be36067b6bad0570da",
      "url": "https://some.domain/datasets"
    }
  ]
}
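
A minimal sketch of one way to give the domain document and each entry point its own ID. This is not the script's actual code: the helper names (`new_id`, `build_domain_document`, `duplicate_ids`) are hypothetical, and generating the IDs as random 24-character hex strings is an assumption based only on the shape of the IDs in the document above. The `engine_oid` field is left out because it comes from the engine, not from this step.

```python
import secrets
from datetime import datetime, timezone


def new_id() -> str:
    """Return a random 24-character hex string (assumed ID shape)."""
    return secrets.token_hex(12)


def build_domain_document(name: str, seed_urls: list) -> dict:
    """Build a domain document whose own ID and entry point IDs are all distinct."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    used = set()

    def fresh_id() -> str:
        # Draw again in the (very unlikely) case of a collision within this document.
        candidate = new_id()
        while candidate in used:
            candidate = new_id()
        used.add(candidate)
        return candidate

    return {
        "id": fresh_id(),
        "created_at": now,
        "updated_at": now,
        "name": name,
        "crawl_rules": [],
        "seed_urls": [
            {"created_at": now, "id": fresh_id(), "url": url}
            for url in seed_urls
        ],
    }


def duplicate_ids(doc: dict) -> set:
    """Return every ID that appears more than once in the domain and its entry points."""
    seen, dupes = set(), set()
    for value in [doc["id"], *(entry["id"] for entry in doc["seed_urls"])]:
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes
```

Running `duplicate_ids` on a document before indexing it would flag the case described here, where the domain ID is reused for an entry point.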

Labels

bug: Something isn't working
