Possible limit in xloader submit all: only processes the first 1000 datasets? #251

@dsanchezmatilla

Description

When I run:
ckan xloader submit all

It seems to only submit xloader jobs for the first 1000 datasets in the CKAN instance. Any datasets beyond that number don’t appear to be processed.

Expected behavior (unless this is intentional)

I expected this command to submit jobs for all datasets, but it currently seems limited to the first 1000.

What might be happening

It looks like this behavior comes from calling the package_search API once, without pagination. A single call returns at most 1000 results (the default upper limit for rows, which a site can change with the ckan.search.rows_max setting).

Here’s the line that seems to be causing it:

{'ignore_auth': True}, {'include_private': True, 'rows': 1000})

Since there’s no loop or pagination logic around this, only the first 1000 datasets are processed.

Suggestion (if not intentional)

If this limit is by design, maybe it would be helpful to document it.
Otherwise, a possible fix would be to wrap the API call in a loop that pages through all datasets in batches of 1000, advancing the start parameter by the number of results returned until every dataset has been seen.
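
To illustrate the pagination idea, here is a minimal sketch, not the actual xloader code. It assumes the search call accepts start and rows and returns a dict with count and results, as package_search does. The names iter_all_datasets and fake_search are hypothetical; fake_search only stands in for a CKAN instance with 2500 datasets so the loop can be run on its own.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_datasets(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every dataset by paging through a package_search-style call."""
    start = 0
    while True:
        page = search(
            {'include_private': True, 'rows': page_size, 'start': start}
        )
        results = page['results']
        if not results:
            # Empty page: nothing left to fetch.
            break
        for dataset in results:
            yield dataset
        # Advance by what was actually returned, in case the server
        # caps rows below the requested page_size.
        start += len(results)
        if start >= page['count']:
            break


def fake_search(data_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for package_search over 2500 fake datasets."""
    total = 2500
    rows = min(data_dict['rows'], 1000)  # mimic the 1000-row upper limit
    start = data_dict['start']
    ids = range(start, min(start + rows, total))
    return {'count': total, 'results': [{'id': i} for i in ids]}


if __name__ == '__main__':
    print(len(list(iter_all_datasets(fake_search))))
```

Inside CKAN, the search argument could presumably be something like lambda d: toolkit.get_action('package_search')({'ignore_auth': True}, d), with a job submitted for each yielded dataset. That wiring is an assumption and is untested here.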

Steps to reproduce

  1. Set up a CKAN instance with more than 1000 datasets.
  2. Run ckan xloader submit all.
  3. Observe that jobs are submitted for only the first 1000 datasets.

Additional notes

Just wanted to raise this in case it's unintentional — happy to help if needed!
