Add support for DuckDB #214

@volcan01010

Summary

As a data analyst, I want support for DuckDB so that I can easily move data between DuckDB and other databases and also benefit from DuckDB's fast CSV loading capabilities.

Description

DuckDB is an in-process, column-oriented analytical database designed for working with large files. The column-oriented design suits analytical queries, e.g. aggregations over whole columns or large joins.

https://duckdb.org/why_duckdb

DuckDB runs without a separate database server, in a similar way to SQLite. It comes with tools for reading CSV files that can, for example, auto-detect data types. These could be useful in ETL workflows.

https://duckdb.org/docs/data/csv/overview

Adding DuckDB support to ETL Helper would allow users to call etl.copy_rows to pull data from PostgreSQL, Oracle, etc. directly into DuckDB for analysis.

Implementation

The DuckDB Python library is compatible with the DB API 2.0 specification that ETL Helper uses.

https://duckdb.org/docs/api/python/dbapi
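
To illustrate why DB API 2.0 compatibility matters here, the following is a minimal sketch of a chunked row copy that relies only on the standard `cursor()`, `execute()`, `fetchmany()`, `executemany()` and `commit()` calls. It is not ETL Helper's implementation: `iter_chunks` and `copy_rows_sketch` are hypothetical names, and `sqlite3` stands in for both databases so that the example runs with the standard library alone. The assumption is that a connection from `duckdb.connect()` could be passed as `dest_conn` in the same way.

```python
import sqlite3
from typing import Any, Iterator, List


def iter_chunks(cursor: Any, chunk_size: int) -> Iterator[List[tuple]]:
    """Yield lists of rows from a DB API 2.0 cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def copy_rows_sketch(select_sql: str, src_conn: Any, insert_sql: str,
                     dest_conn: Any, chunk_size: int = 1000) -> int:
    """Copy rows between two DB API 2.0 connections; return the row count.

    Hypothetical helper for illustration only, not ETL Helper's copy_rows.
    """
    src_cursor = src_conn.cursor()
    dest_cursor = dest_conn.cursor()
    src_cursor.execute(select_sql)

    copied = 0
    for rows in iter_chunks(src_cursor, chunk_size):
        # executemany inserts one chunk at a time to limit memory use
        dest_cursor.executemany(insert_sql, rows)
        copied += len(rows)

    dest_conn.commit()
    return copied


# Stand-in source database (sqlite3 in place of PostgreSQL/Oracle).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
src.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(1, "basalt"), (2, "granite"), (3, "tuff")],
)
src.commit()

# Stand-in destination database (sqlite3 in place of DuckDB).
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE samples (id INTEGER, name TEXT)")

n_copied = copy_rows_sketch(
    "SELECT id, name FROM samples ORDER BY id",
    src,
    "INSERT INTO samples VALUES (?, ?)",
    dest,
    chunk_size=2,
)
```

The `?` placeholders follow the `qmark` parameter style used by `sqlite3`; a real DbHelper would need to declare whichever parameter style its driver expects.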

This should make it easy to add to ETL Helper. It is just a case of adding a DuckDbHelper and the appropriate tests: https://github.com/BritishGeologicalSurvey/etlhelper/blob/main/CONTRIBUTING.md#support-for-more-database-types

Acceptance criteria

  • DbHelper for DuckDB is added
  • Integration test suite is added and runs in GitHub Actions
  • Documentation is updated
