Add skip_conflicts option to load function #143

@volcan01010

Note: this might be a bit complicated - maybe users should just use executemany and copy_rows and write their own INSERT statements to ensure that they get exactly what they want.

Summary

As an ETL Helper user I want to use skip_conflicts so that I can handle primary key violations at the database level without requiring ETL Helper's error handling.

Description

A common error in a large insert is that records already exist for a given primary key. Some databases have built-in ways of handling this. For example, SQLite has the ON CONFLICT clause, which is written as INSERT OR IGNORE on an INSERT statement:
https://www.sqlite.org/lang_conflict.html. PostgreSQL has ON CONFLICT DO NOTHING: https://www.postgresql.org/docs/current/sql-insert.html#:~:text=ON%20CONFLICT%20DO%20NOTHING%20simply,can%20perform%20unique%20index%20inference.

There should be implementations for each database. The specific way in which they are applied will need to be configured in the DbHelper classes. The SQL statement generated for the load function may need changes around the INSERT keyword, or as a suffix.
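To illustrate the "around the INSERT keyword, or as a suffix" idea, here is a minimal sketch of how a per-database setting might feed into statement generation. The names `CONFLICT_SYNTAX` and `build_insert` are hypothetical and are not part of ETL Helper's API; in practice the two strings would presumably live as attributes on each DbHelper class.

```python
# Hypothetical sketch: these names are illustrative, not ETL Helper's API.
# Each dialect maps to (text placed after the INSERT keyword, statement suffix).
CONFLICT_SYNTAX = {
    "sqlite": (" OR IGNORE", ""),
    "postgresql": ("", " ON CONFLICT DO NOTHING"),
}


def build_insert(table, columns, dialect, skip_conflicts=False, placeholder="?"):
    """Build a parameterised INSERT, optionally skipping conflicting rows."""
    if skip_conflicts:
        after_insert, suffix = CONFLICT_SYNTAX[dialect]
    else:
        after_insert, suffix = "", ""
    cols = ", ".join(columns)
    params = ", ".join([placeholder] * len(columns))
    return f"INSERT{after_insert} INTO {table} ({cols}) VALUES ({params}){suffix}"


sqlite_sql = build_insert("places", ["id", "name"], "sqlite", skip_conflicts=True)
postgres_sql = build_insert(
    "places", ["id", "name"], "postgresql", skip_conflicts=True, placeholder="%s"
)
```

A dialect with no entry in the mapping raises `KeyError` here, which is one way of surfacing "not implemented for this database" rather than silently inserting without conflict handling.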

Acceptance criteria

  • Calling load with skip_conflicts=True means that primary key violations pass silently and the existing rows are left unchanged
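
The expected behaviour can be checked at the database level without ETL Helper at all. The sketch below uses only Python's standard-library `sqlite3` module and a made-up `places` table; it shows what the criterion would look like for SQLite, not how `load` would implement it.

```python
import sqlite3

# In-memory database with one existing row (table and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO places (id, name) VALUES (1, 'original')")

# id=1 clashes with the existing primary key; id=2 is new.
rows = [(1, "duplicate"), (2, "new")]
conn.executemany("INSERT OR IGNORE INTO places (id, name) VALUES (?, ?)", rows)
conn.commit()

result = conn.execute("SELECT id, name FROM places ORDER BY id").fetchall()
conn.close()
```

No exception is raised, the row with id 1 keeps its original name, and the row with id 2 is inserted.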
