Description
When I run:
ckan xloader submit all
It seems to only submit xloader jobs for the first 1000 datasets in the CKAN instance. Any datasets beyond that number don’t appear to be processed.
Expected behavior (unless this is intentional)
I expected this command to submit jobs for all datasets, but it currently seems limited to the first 1000.
What might be happening
It looks like this behavior comes from the use of the package_search API without pagination. That API returns at most 1000 results per call (the default upper limit on rows).
Here’s the line that seems to be causing it:
{'ignore_auth': True}, {'include_private': True, 'rows': 1000})
Since there’s no loop or pagination logic around this, only the first 1000 datasets are processed.
Suggestion (if not intentional)
If this limit is by design, maybe it would be helpful to document it.
Otherwise, a possible fix would be to add a loop that paginates through all datasets in batches of 1000 using the start and rows parameters in the API call.
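A rough sketch of what such a pagination loop could look like is below. It is not the actual xloader code: iter_all_datasets and fake_search are hypothetical names, and fake_search is only a stand-in for package_search so the example runs without a CKAN instance. In the real command the search callable would presumably wrap the existing package_search call with {'ignore_auth': True} as the context.

```python
from typing import Any, Callable, Dict, Iterator

# A callable that takes a package_search-style data_dict and returns
# a dict with at least a "results" list (hypothetical type alias).
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_all_datasets(search: SearchFn, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every dataset by paging through a package_search-style callable."""
    start = 0
    while True:
        response = search({"include_private": True, "rows": page_size, "start": start})
        results = response["results"]
        if not results:
            # An empty page means we have walked past the last dataset.
            break
        yield from results
        # Advance by what was actually returned, in case the server
        # caps rows below the requested page size.
        start += len(results)


def fake_search(data_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for package_search: 2500 datasets, at most 1000 rows per call."""
    total = 2500
    rows = min(data_dict["rows"], 1000)
    start = data_dict["start"]
    stop = min(start + rows, total)
    return {"count": total, "results": [{"name": f"dataset-{i}"} for i in range(start, stop)]}


datasets = list(iter_all_datasets(fake_search))
print(len(datasets))  # 2500
```

Stopping on an empty page rather than comparing against count keeps the loop correct even if datasets are added or removed while it runs, at the cost of one extra request at the end.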
Steps to reproduce
- Set up a CKAN instance with more than 1000 datasets.
- Run ckan xloader submit all.
- Only 1000 datasets will have jobs submitted.
Additional notes
Just wanted to raise this in case it's unintentional — happy to help if needed!