GitHub - yoRyuuuuu/mydatasyncer: Utility for synchronizing data from various file formats to a relational database

The mydatasyncer is a utility for synchronizing data from various file formats to a relational database. It provides flexible synchronization options including full overwrite and differential update modes.

Features

Two sync modes:
- Overwrite mode: Completely replaces all existing data in the target table
- Differential mode: Updates only changed records, adds new records, and optionally deletes records not in the source file
Multi-table synchronization: Supports synchronizing multiple tables with dependency-aware ordering
Parent-child relationships: Automatic sync order determination based on foreign key dependencies
Bulk operations: Efficiently handles large datasets using bulk insert/update/delete operations
Transaction support: All operations are wrapped in database transactions to ensure data integrity
Simple configuration: Easy to define target tables, columns, and primary keys
Multiple format support: Supports CSV and JSON formats with automatic type detection

Installation

Prerequisites

Go 1.16 or later
MySQL or compatible database (configurable for other database systems)

Installing with go install

The simplest way to install mydatasyncer is to use go install:

go install github.com/yoRyuuuuu/mydatasyncer@latest

This will install the latest version of mydatasyncer in your Go bin directory, which should be in your PATH.

Building from Source

Alternatively, you can build from source:

git clone https://github.com/yoRyuuuuu/mydatasyncer.git
cd mydatasyncer
go build

Usage

Basic Usage

mydatasyncer --config config.yml

Dry Run Mode

The Dry Run mode allows you to preview synchronization changes without actually modifying the database.

Overview

Key benefits of the Dry Run feature:

Safe Change Preview: Review synchronization results without affecting the actual database
Detailed Execution Plan: Check the number of records to be inserted, updated, or deleted, and which columns will be affected
Transaction Protection: Changes are automatically rolled back in Dry Run mode, ensuring data integrity
Dual Mode Support: Available in both overwrite and diff modes

Usage

Command Line Execution:

# Basic usage
mydatasyncer --config config.yml --dry-run

# Without config path (uses default mydatasyncer.yml)
mydatasyncer --dry-run

Configuration File:

# mydatasyncer.yml
db:
  dsn: "user:password@tcp(127.0.0.1:3306)/testdb?parseTime=true"
sync:
  filePath: "./testdata.csv"
  tableName: "products"
  columns:
    - id
    - name
    - price
  primaryKey: "id"
  syncMode: "diff"
  deleteNotInFile: true

Output Format

The Dry Run mode displays an execution plan in the following format:

[DRY-RUN MODE] Execution Plan for Table {table_name}
----------------------------------------------------
Summary:
- Sync Mode: {overwrite|diff}
- Records in File: X
- Records in Database: Y

Planned Operations:
1. Delete Operations
   - Records to Delete: N
   - Affected Primary Keys: [key list]

2. Insert Operations
   - Records to Insert: M
   - Affected Columns: [column1, column2, ...]

3. Update Operations
   - Records to Update: P
   - Affected Columns: [column1, column2, ...]

Timestamps:
- Timestamp Columns to Update: [created_at, updated_at]

Important Notes

Transaction Control
- All operations in Dry Run mode are automatically rolled back
- No actual database changes are made during execution
Timestamp Columns
- Timestamp columns (created_at, updated_at) are included in the execution plan
- Actual values will be set automatically during real execution
Performance
- Minimal performance overhead even with large datasets
- Efficient execution plan generation with minimal database impact
Error Handling
- Configuration and connection errors are detected before execution
- Error messages are displayed in a clear, understandable format

Basic Usage

Run the synchronization with:

mydatasyncer

By default, it will use the built-in configuration. You can customize the behavior through a configuration file.

Configuration

The configuration file uses YAML format. There are two configuration styles: single-table and multi-table.

Single-Table Configuration

# Database connection settings
db:
  dsn: "user:password@tcp(127.0.0.1:3306)/testdb?parseTime=true"

# Synchronization settings
sync:
  # Input file path (CSV or JSON format)
  filePath: "./testdata.csv"

  # Target table name
  tableName: "products"

  # Columns to synchronize
  columns:
    - id          # Primary key
    - name        # Regular column
    - price       # Regular column
    - created_at  # Timestamp column
    - updated_at  # Timestamp column

  # Primary key specification (required)
  primaryKey: "id"

  # Synchronization mode
  # - "diff": Differential sync mode. Updates existing records, inserts new ones
  # - "overwrite": Overwrite mode. Deletes all existing data and replaces with file contents
  syncMode: "diff"

  # Delete records not in file (only effective in diff mode)
  deleteNotInFile: true

  # Timestamp column auto-update settings
  timestamps:
    createdAt: "created_at"  # Updated only when creating new records
    updatedAt: "updated_at"  # Updated every time records are modified

  # Immutable columns (excluded from synchronization)
  immutableColumns:
    - "created_at"  # Creation timestamp cannot be modified

Multi-Table Configuration

# Database connection settings
db:
  dsn: "user:password@tcp(127.0.0.1:3306)/testdb?parseTime=true"

# Multi-table synchronization settings
tables:
  - tableName: "categories"
    filePath: "./categories.json"
    primaryKey: "category_id"
    syncMode: "diff"
    dependencies: []  # No dependencies (parent table)
    
  - tableName: "products"
    filePath: "./products.csv"
    primaryKey: "product_id"
    syncMode: "diff"
    dependencies: ["categories"]  # Depends on categories table
    columns:
      - product_id
      - name
      - category_id
      - price
    timestamps:
      updatedAt: "updated_at"

Transaction Boundaries

Single-Table Synchronization:

Each table sync runs in its own dedicated transaction
If the sync fails, only that table's changes are rolled back

Multi-Table Synchronization:

All tables are synchronized within a single global transaction
All-or-nothing approach: If ANY table sync fails, the ENTIRE multi-table operation is rolled back
Ensures ACID properties and referential integrity across related tables with foreign key relationships
Dependency-aware processing order:
- Delete operations: Child tables → Parent tables (avoids foreign key violations)
- Insert/Update operations: Parent tables → Child tables (satisfies foreign key constraints)

Sync Mode Details

Differential Mode (diff)

Matches existing records using primary key
Updates only changed records
Inserts new records
Deletes records not in file if deleteNotInFile: true
Executes within a transaction, rolls back all changes on error

Overwrite Mode (overwrite)

Deletes all existing data in the table
Completely replaces with file contents
Faster but existing data is completely lost
Ignores deleteNotInFile setting

Using Timestamp and Immutable Columns

Timestamp Columns
- created_at: Updated with current timestamp only for new records
- updated_at: Updated with current timestamp on every record modification
- Timestamp columns must be included in the columns list
Immutable Columns
- Specified in immutableColumns
- Values once set will not be modified during synchronization
- Typically used to protect metadata or system management fields

Usage Examples

Product Master Differential Sync

sync:
  filePath: "./products.csv"
  tableName: "products"
  columns:
    - product_id
    - name
    - price
    - category
    - created_at
    - updated_at
  primaryKey: "product_id"
  syncMode: "diff"
  deleteNotInFile: false  # Don't delete products
  timestamps:
    createdAt: "created_at"
    updatedAt: "updated_at"

User Data Complete Sync

sync:
  filePath: "./users.csv"
  tableName: "users"
  columns:
    - user_id
    - email
    - name
    - status
    - created_at
  primaryKey: "user_id"
  syncMode: "overwrite"  # Complete replacement
  immutableColumns:
    - "created_at"  # Preserve creation timestamp

Inventory Differential Update (with deletions)

sync:
  filePath: "./inventory.csv"
  tableName: "inventory"
  columns:
    - item_id
    - quantity
    - last_checked
    - updated_at
  primaryKey: "item_id"
  syncMode: "diff"
  deleteNotInFile: true  # Remove out-of-stock items
  timestamps:
    updatedAt: "updated_at"

Using with Docker

A Docker Compose setup is included for easy development and testing:

# Start the MySQL database
docker compose up -d

# Build and run the application
go build
./mydatasyncer

The provided compose.yml sets up a MySQL database and initializes it with sample tables.

Project Structure

main.go: Entry point and CSV file loading logic
dbsync.go: Database synchronization operations
config.go: Configuration definitions and loading
init-sql/: SQL files for database initialization
testdata.csv: Sample data file for testing

License

This project is licensed under the MIT License - see the LICENSE file for details.

Future Enhancements

Support for additional file formats (Excel, JSON, XML)
Configuration through external files (YAML, TOML, JSON)
Command-line parameters for overriding configuration
Enhanced logging and monitoring features
Support for additional database systems

Name		Name	Last commit message	Last commit date
Latest commit History 199 Commits
.github		.github
testdata		testdata
.gitignore		.gitignore
.golangci.yaml		.golangci.yaml
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
compose.yml		compose.yml
config.go		config.go
config_test.go		config_test.go
dbsync.go		dbsync.go
dbsync_test.go		dbsync_test.go
go.mod		go.mod
go.sum		go.sum
loader.go		loader.go
loader_test.go		loader_test.go
main.go		main.go
main_test.go		main_test.go
mydatasyncer.yml		mydatasyncer.yml
primary_key_validator.go		primary_key_validator.go
primary_key_validator_test.go		primary_key_validator_test.go
業務データ同期レポート.md		業務データ同期レポート.md
業務運用における懸念事項レポート.md		業務運用における懸念事項レポート.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Features

Installation

Prerequisites

Installing with go install

Building from Source

Usage

Basic Usage

Dry Run Mode

Overview

Usage

Output Format

Important Notes

Basic Usage

Configuration

Single-Table Configuration

Multi-Table Configuration

Transaction Boundaries

Sync Mode Details

Differential Mode (diff)

Overwrite Mode (overwrite)

Using Timestamp and Immutable Columns

Usage Examples

Using with Docker

Project Structure

License

Future Enhancements

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

yoRyuuuuu/mydatasyncer

Folders and files

Latest commit

History

Repository files navigation

Features

Installation

Prerequisites

Installing with go install

Building from Source

Usage

Basic Usage

Dry Run Mode

Overview

Usage

Output Format

Important Notes

Basic Usage

Configuration

Single-Table Configuration

Multi-Table Configuration

Transaction Boundaries

Sync Mode Details

Differential Mode (diff)

Overwrite Mode (overwrite)

Using Timestamp and Immutable Columns

Usage Examples

Using with Docker

Project Structure

License

Future Enhancements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages