Releases: dathere/datapusher-plus
2.0.0
[2.0.0] - 2025-04-25
🎉 Data Resource Upload First (DRUF) Workflow is finally here! 🎉
A workflow that flips traditional CKAN data ingestion on its head.
- Instead of filling out the metadata first and then uploading the data, users upload data resources first
- In a few seconds, even for very large datasets, the data is analyzed and validated while statistical metadata is precompiled
- This precompiled metadata is then used by Metadata Formulae defined in the scheming yaml files to either precompute other metadata fields (at both the package & resource levels) or to offer metadata suggestions
- Metadata Formulae use the same powerful Jinja2 template engine that powers CKAN's templating system.
- It comes with an extensible library of Jinja2 filters/functions that can be used in Metadata Formulae, à la Excel (see the sketch after this list).
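As an illustration, here is a minimal sketch of what Metadata Formulae might look like in a scheming yaml file. The `formula` and `suggestion_formula` keys are the two formula types described below; the field names and the `dpp.*` variables standing in for the precompiled statistics are hypothetical placeholders, not the actual DP+ interface:

```yaml
dataset_fields:
  - field_name: record_count          # hypothetical field name
    label: Record Count
    # set the field directly from a precompiled statistic (illustrative variable)
    formula: "{{ dpp.record_count }}"
  - field_name: temporal_coverage     # hypothetical field name
    label: Temporal Coverage
    # offer a suggestion computed from precompiled date-range statistics
    suggestion_formula: "{{ dpp.date_min }} to {{ dpp.date_max }}"
```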
The DRUF reinvents CKAN data ingestion by automatically calculating/suggesting "Automagical Metadata": high-quality, high-resolution metadata that reflects and describes what's INSIDE the dataset (e.g. summary stats, frequency tables, spatial extent, date range, outliers, etc., calculated with Metadata Formulae) in addition to metadata about the dataset FILE (e.g. last updated, file size, owner, format, license, etc., i.e. what's normally found in traditional data catalogs).
Future improvements planned:
- "entry-time" Metadata Formulae
In addition to the two formula types (formula
to set a metadata field directly during creation/update; andsuggestion_formula
to suggest values using the Bootstap Popover UI), we'll add the ability to allow Data Publishers to enter formulas while they're entering metadata - fully embracing the Excel formula UI/UX aesthetic. - DCAT3-optimized reference profiles
Following implementation guidance for both DCAT-US v3 and DCAT-AP 3 scheming profiles with Metadata Formulae to compute recommended and optional properties that allow publishers to more fully take advantage of DCAT3 features and improvements - metadata properties that are often too laborious to manually compile. - Co-Curator AI
"Automagical metadata" is the perfect context for AI engines - as it summarizes even very large datasets in just a few kilobytes. It allows the Co-Curator1 to suggest tags, descriptions, links to related data sets and chat about the corpus WHILE the Data Publisher is curating the data. - Inline Data Validation
Optional ability to infer an initial JSON Schema validation file, and then validate future updates to the dataset using it, leveraging the same blazing-fast qsv engine (validating up to 340,000 records/per second2). - Customizable DRUF Data ingestion pipeline
Currently, there are numerous configuration settings to fine-tune the DRUF data-ingestion pipeline. However, the built-in default pipeline can only be customized to a limit without customizing the code. We will expose hooks that CKAN operators can take advantage of to tailor their DRUF pipelines to meet their requirements, while preserving the ability to access the precompiled statistical metadata that DP+ maintains. - Dynamic loading of Formula filters/functions
So users can share custom Jinja2 filters and functions they developed for their Metadata Formulae. - Inline Data Enrichment
Data can be optionally enriched while it's being ingested from other reference datasets within the same CKAN instance or external sources (e.g. enriched against high value curated sources like the Census; geocoding, etc.) - and more!
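To make the Inline Data Validation item concrete, here is a rough sketch of the kind of qsv commands involved (file names are illustrative, and how DP+ will wire this into the DRUF pipeline is not yet defined):

```sh
# infer an initial JSON Schema from the uploaded CSV
# (qsv writes it alongside the input as data.csv.schema.json)
qsv schema data.csv

# validate a later update of the dataset against that schema
qsv validate updated-data.csv data.csv.schema.json
```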
It took a while for us to bake 2.0.0, but we look forward to picking up the pace and co-innovating with the CKAN ecosystem.
NOTE: To fully experience the DRUF workflow, you'll need to use scheming dataset form pages and apply some CKAN core changes. A detailed installation procedure will be published on the Wiki shortly.
Added
- Data Resource Upload First (DRUF) Workflow
- Enhanced resource validation for DRUF workflow
- Metadata Formulae for precomputing metadata/metadata suggestions
- Spatial file support (GeoJSON and Shapefiles)
- Support for CKAN 2.9 compatibility in CLI operations
- Enhanced error handling and logging for resource uploads
Changed
- Updated CLI interface to work with CKAN 2.9
- Refactored resource upload process to support DRUF workflow
- Improved error messages and user feedback
- Enhanced configuration handling
Fixed
- Various bug fixes and improvements for CKAN 2.9 compatibility
- Resource upload process reliability improvements
Full Changelog: 1.0.4...2.0.0
[^1]: Inspired by the Curator in Ready Player One
[^2]: validate_index benchmark: https://qsv.dathere.com/benchmarks
1.0.4
Full Changelog: 1.0.3...1.0.4
1.0.3
What's Changed
- Ensure we are always using the same token setting for datapusher by @avdata99 in #168
- Fix iconv by @avdata99 in #169
- Fix the api_token config variable and fix for default views creation by @tino097 in #170
- Migration added by @tino097 in #171
- Migration added by @tino097 in #172
Full Changelog: 1.0.2...1.0.3
1.0.2
What's Changed
- Update README file for DP+ as extension by @tino097 in #143
- Fix MANIFEST.in by @pdelboca in #148
- Migrate cli commands by @tino097 in #150
- [dev-v1.0] Fix init db command by @pdelboca in #151
- Config part by @tino097 in #153
- Database migrations by @tino097 in #154
- Update readme by @tino097 in #156
- Fix yaml extension in MANIFEST.in by @pdelboca in #157
- Fix datefmt compatability with qsv in dev-v1.0 by @Zharktas in #161
- Remove obsolete assets by @pdelboca in #159
Full Changelog: 1.0.1...1.0.2
1.0.1
1.0.0
What's Changed
- Convert the datapusher to work as plugin by @tino097 in #73
- Code cleanup by @tino097 in #89
- [72]Rewrite resource url by @TomeCirun in #109
- Feature db models by @tino097 in #120
- Add migration script by @tino097 in #121
- Code cleanup part two by @tino097 in #123
- Rewrite resource URL if it differs from the defined ckan_url by @jhbruhn in #136
- Fixing issues by @tino097 in #137
- Sync with master by @tino097 in #138
Full Changelog: 0.16.4...1.0.0
0.16.4
What's Changed
- sync read buffer with buffer size of copyexpert by @jqnatividad in #128
Full Changelog: 0.16.3...0.16.4
0.16.3
What's Changed
- make COPY_READBUFFER_SIZE a configurable parameter by @jqnatividad in #127
Full Changelog: 0.16.2...0.16.3
0.16.2
CHANGED
- explicitly create a large read buffer when reading CSVs while COPYing files to the datastore (see the sketch below)
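For context, a minimal sketch of the buffered COPY approach, assuming psycopg2 (whose `copy_expert` the earlier "copyexpert" entries refer to); the buffer-size constant, table, and file names are illustrative, not DP+'s actual code:

```python
# Sketch: stream a CSV into the datastore with an explicit large read buffer.
import psycopg2

COPY_READBUFFER_SIZE = 1024 * 1024  # 1 MiB read buffer (illustrative value)

conn = psycopg2.connect("dbname=datastore_default")
with conn, conn.cursor() as cur, open(
    "resource.csv", "rb", buffering=COPY_READBUFFER_SIZE
) as f:
    # copy_expert streams the file into the datastore table; keeping its
    # read size in sync with the file's buffer avoids redundant small reads.
    cur.copy_expert(
        'COPY "resource_table" FROM STDIN WITH (FORMAT CSV, HEADER TRUE)',
        f,
        size=COPY_READBUFFER_SIZE,
    )
```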
Full Changelog: 0.16.1...0.16.2
0.16.1
Fixed:
- fix utf8 encoding check, replacing the NamedTemporaryFile approach with the TemporaryDirectory approach introduced in https://github.com/dathere/datapusher-plus/pull/117/files

NOTE: you'll need to install `uchardet` for the encoding check (`apt-get install uchardet`)
Full Changelog: 0.16.0...0.16.1