CSViper

CSViper is a command-line tool that analyzes CSV files and generates SQL scripts and Python programs to load the data into relational databases. It supports both MySQL and PostgreSQL backends and is designed for scenarios where the database is hosted remotely while the CSV file resides on the local machine.

Features

  • CSV Analysis: Automatically detects CSV format (delimiter, quote character) and analyzes column structure
  • Column Normalization: Converts column names to SQL-safe identifiers with intelligent duplicate handling (illustrated in the sketch after this list)
  • Multi-Database Support: Generates scripts for both MySQL and PostgreSQL
  • Modular Design: Four-phase approach allows for flexible workflow management
  • Standalone Scripts: Generates self-contained Python import scripts for easy deployment
  • Intelligent File Discovery: Invoker system automatically finds and processes the latest matching data files
  • Pattern-Based Matching: Uses glob patterns to handle timestamped or versioned file naming conventions
  • Full Compilation Workflow: Single command to process CSV files from analysis to ready-to-run import scripts
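
CSViper's real normalization rules live in column_normalizer.py. The sketch below only illustrates the general technique; the function name and exact rules are assumptions, not the tool's actual behavior:

import re

def normalize_column_name(name, seen):
    # Lowercase, trim, and replace anything that is not a letter,
    # digit, or underscore with a single underscore.
    safe = re.sub(r'[^a-z0-9_]+', '_', name.strip().lower()).strip('_')
    # SQL identifiers cannot start with a digit.
    if not safe or safe[0].isdigit():
        safe = 'col_' + safe
    # Deduplicate by appending a counter: order_id, order_id_2, ...
    base, counter = safe, 1
    while safe in seen:
        counter += 1
        safe = f"{base}_{counter}"
    seen.add(safe)
    return safe

seen = set()
print([normalize_column_name(c, seen) for c in ["Order ID", "order id", "2024 Total"]])
# ['order_id', 'order_id_2', 'col_2024_total']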

Installation

From Source (Development)

  1. Clone the repository:
git clone https://github.com/ftrotter/csviper.git
cd csviper
  2. Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Install in development mode:
pip install -e .

From PyPI (Coming Soon)

pip install csviper

Usage

CSViper operates in four phases that can be run together or separately:

Phase 1: Extract Metadata

Analyze a CSV file and extract column information:

python -m csviper extract_metadata --from_csv=data.csv --output_dir=./output/

Options:

  • --from_csv: Path to the CSV file to analyze (required)
  • --output_dir: Output directory (defaults to CSV filename without extension)
  • --overwrite_previous: Overwrite existing output files
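
The command writes a {filename}.metadata.json file (see Phase 1 Output below). Its exact schema is defined by CSViper; the sketch below, written as a Python dict, only illustrates the kind of information it records, and every field name here is an assumption:

# Hypothetical shape of the metadata; field names are illustrative only.
metadata = {
    "delimiter": ",",
    "quote_character": '"',
    "file_glob": "data_*.csv",  # pattern later used by the invoker
    "columns": [
        {"original_name": "Order ID", "normalized_name": "order_id", "max_width": 12},
        {"original_name": "Total", "normalized_name": "total", "max_width": 9},
    ],
}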

Phase 2: Generate SQL Scripts

Generate CREATE TABLE and data import SQL scripts:

python -m csviper build_sql --from_metadata_json=output/data.metadata.json --output_dir=./output/
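
Conceptually, build_sql turns the column list and observed widths from the metadata into CREATE TABLE and import statements. A minimal sketch of the idea (not CSViper's actual generator; it assumes the hypothetical metadata fields shown above and stages every column as text):

def build_create_table(table_name, columns):
    # Size each column to the widest value observed in the CSV;
    # loading everything as VARCHAR keeps the initial import simple.
    body = ",\n".join(
        f"  {col['normalized_name']} VARCHAR({col['max_width']})" for col in columns
    )
    return f"CREATE TABLE {table_name} (\n{body}\n);"

columns = [
    {"normalized_name": "order_id", "max_width": 12},
    {"normalized_name": "total", "max_width": 9},
]
print(build_create_table("sales_data", columns))
# CREATE TABLE sales_data (
#   order_id VARCHAR(12),
#   total VARCHAR(9)
# );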

Phase 3: Generate Import Script

Create a standalone Python script for data import:

python -m csviper build_import_script --from_resource_dir=./output/ --output_dir=./output/

Phase 4: Invoke Compiled Scripts (New!)

Execute compiled import scripts with automatic file discovery:

python -m csviper invoke-compiled-script --run_import_from=./output/ --import_data_from_dir=./data_directory/ --database_type=postgresql

Options:

  • --run_import_from: Directory containing compiled CSViper scripts and metadata (required)
  • --import_data_from_dir: Directory to search for data files (required)
  • --database_type: Database type - either 'mysql' or 'postgresql' (required)

Running All Phases Together

python -m csviper full_compile --from_csv=data.csv --output_dir=./output/ --overwrite_previous

Development Setup

Prerequisites

  • Python 3.8 or higher
  • Virtual environment (recommended)

Setting up the Development Environment

  1. Clone the repository and navigate to the project directory
  2. Source the virtual environment setup script:
source source_me_to_get_venv.sh
  3. Install development dependencies:
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting

black src/

Linting

flake8 src/

Project Structure

csviper/
├── src/csviper/
│   ├── __init__.py              # Package initialization
│   ├── __main__.py              # CLI entry point
│   ├── column_normalizer.py     # Column name normalization utilities
│   ├── metadata_extractor.py    # CSV analysis and metadata extraction
│   ├── mysql_generator.py       # MySQL SQL generation (coming soon)
│   ├── postgresql_generator.py  # PostgreSQL SQL generation (coming soon)
│   └── script_generators/       # Python script generation (coming soon)
├── tests/                       # Test files
├── AI_Instructions/             # Development documentation
├── setup.py                     # Package setup configuration
├── requirements.txt             # Project dependencies
└── README.md                    # This file

Output Files

CSViper generates several files during processing:

Phase 1 Output

  • {filename}.metadata.json: Contains CSV structure analysis, normalized column names, and column width information

Phase 2 Output

  • {filename}.create_table_mysql.sql: MySQL CREATE TABLE script
  • {filename}.create_table_postgres.sql: PostgreSQL CREATE TABLE script
  • {filename}.import_data_mysql.sql: MySQL data import script
  • {filename}.import_data_postgres.sql: PostgreSQL data import script

Phase 3 Output

  • go.mysql.py: Standalone Python script for MySQL database import
  • go.postgresql.py: Standalone Python script for PostgreSQL database import
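
What these scripts actually contain is determined by the Phase 3 generator. As a rough, assumption-laden sketch, a standalone PostgreSQL import script typically reduces to connecting, creating the table, and streaming the local file to the remote server with COPY (psycopg2 is shown here; the connection details, file names, and even the choice of driver are assumptions, not values produced by CSViper):

import psycopg2

# Hypothetical values; the generated scripts supply their own.
conn = psycopg2.connect(host="db.example.com", dbname="staging",
                        user="loader", password="secret")
with conn, conn.cursor() as cur:
    with open("sales_data.create_table_postgres.sql") as ddl:
        cur.execute(ddl.read())
    with open("sales_data_2024-03.csv") as f:
        # COPY ... FROM STDIN streams the local CSV to the remote
        # database, matching the remote-DB / local-file scenario
        # described in the introduction.
        cur.copy_expert(
            "COPY sales_data FROM STDIN WITH (FORMAT csv, HEADER true)", f
        )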

Example Workflow

Option 1: Full Compilation (Recommended)

Run all phases at once:

python -m csviper full_compile --from_csv=sales_data.csv --output_dir=./sales_data/

Option 2: Step-by-Step Process

  1. Analyze your CSV file:
python -m csviper extract_metadata --from_csv=sales_data.csv
  2. Review the generated metadata in sales_data/sales_data.metadata.json
  3. Generate SQL scripts:
python -m csviper build_sql --from_metadata_json=sales_data/sales_data.metadata.json
  4. Create import scripts:
python -m csviper build_import_script --from_resource_dir=sales_data/
  5. Use the invoker system to import data:
python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql

The Invoker System

The invoker system is CSViper's intelligent file discovery and execution engine. It automatically finds the most recent data file matching your original CSV pattern and executes the appropriate import script.

How It Works

  1. File Pattern Matching: CSViper stores a glob pattern in the metadata file (e.g., sales_data_*.csv) that matches files with similar naming conventions
  2. Automatic Discovery: The invoker searches your data directory for files matching this pattern
  3. Latest File Selection: It automatically selects the most recently modified file (see the sketch after this list)
  4. User Confirmation: Shows you the selected file and asks for confirmation before proceeding
  5. Script Execution: Runs the appropriate database import script (go.mysql.py or go.postgresql.py)
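
Steps 1 through 3 amount to a glob expansion followed by a most-recently-modified selection. A minimal sketch of that logic (an illustration of the approach, not the invoker's actual code):

import glob
import os

def find_latest_match(data_dir, pattern):
    # Expand the glob stored in the metadata (e.g. sales_data_*.csv).
    matches = glob.glob(os.path.join(data_dir, pattern))
    if not matches:
        raise FileNotFoundError(f"No files match {pattern} in {data_dir}")
    # "Latest" means most recently modified.
    return max(matches, key=os.path.getmtime)

print(find_latest_match("./data", "sales_data_*.csv"))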

Example Invoker Usage

If you have a directory with multiple data files:

data/
├── sales_data_2024-01.csv
├── sales_data_2024-02.csv
├── sales_data_2024-03.csv
└── other_data.csv

And compiled scripts in:

sales_data/
├── sales_data.metadata.json
├── go.mysql.py
├── go.postgresql.py
└── ...

Running the invoker:

python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql

The invoker will automatically:

  • Find all files matching sales_data_*.csv
  • Select sales_data_2024-03.csv (most recent)
  • Ask for your confirmation
  • Execute go.postgresql.py with the selected file

Benefits

  • No manual file specification: Automatically finds the latest data file
  • Pattern-based matching: Works with timestamped or versioned files
  • Safety confirmation: Always asks before proceeding
  • Flexible search: Supports both recursive and non-recursive directory searching
  • Database agnostic: Works with both MySQL and PostgreSQL scripts

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Roadmap

  • Phase 1: CSV metadata extraction and column normalization
  • Phase 2: SQL script generation for MySQL and PostgreSQL
  • Phase 3: Python import script generation
  • Phase 4: Invoker system with automatic file discovery
  • Full compilation workflow
  • Enhanced error handling and validation
  • Progress bars for large file processing
  • Configuration file support
  • Additional database backend support
  • Web interface for easier usage
  • Docker containerization

Support

For questions, issues, or contributions, please visit the GitHub repository.
