CSViper is a command-line tool that automates the process of analyzing CSV files and generating SQL scripts and Python programs to load the data into relational databases. It supports both MySQL and PostgreSQL backends and is designed for scenarios where the database is hosted remotely while the CSV file resides on the local machine.
- CSV Analysis: Automatically detects CSV format (delimiter, quote character) and analyzes column structure
- Column Normalization: Converts column names to SQL-safe identifiers with intelligent duplicate handling (see the sketch after this list)
- Multi-Database Support: Generates scripts for both MySQL and PostgreSQL
- Modular Design: Four-phase approach allows for flexible workflow management
- Standalone Scripts: Generates self-contained Python import scripts for easy deployment
- Intelligent File Discovery: Invoker system automatically finds and processes the latest matching data files
- Pattern-Based Matching: Uses glob patterns to handle timestamped or versioned file naming conventions
- Full Compilation Workflow: Single command to process CSV files from analysis to ready-to-run import scripts
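As a concrete illustration of the column normalization step, here is a minimal sketch of how SQL-safe renaming with duplicate handling can work. This is an assumption about the general approach, not CSViper's actual `column_normalizer` code:

```python
import re

def normalize_columns(raw_names):
    """Sketch of SQL-safe column normalization with duplicate handling."""
    seen = {}
    normalized = []
    for raw in raw_names:
        # Lowercase and collapse runs of non-alphanumeric characters to underscores.
        name = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_") or "column"
        # SQL identifiers should not start with a digit.
        if name[0].isdigit():
            name = "col_" + name
        # Deduplicate repeated names by appending a counter.
        seen[name] = seen.get(name, 0) + 1
        normalized.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return normalized

print(normalize_columns(["Order ID", "order id", "2024 Sales"]))
# ['order_id', 'order_id_2', 'col_2024_sales']
```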
- Clone the repository:
git clone https://github.com/ftrotter/csviper.git
cd csviper
- Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install in development mode:
pip install -e .
Alternatively, install from PyPI:
pip install csviper
CSViper operates in four phases that can be run together or separately:
Analyze a CSV file and extract column information:
python -m csviper extract_metadata --from_csv=data.csv --output_dir=./output/
Options:
- `--from_csv`: Path to the CSV file to analyze (required)
- `--output_dir`: Output directory (defaults to the CSV filename without extension)
- `--overwrite_previous`: Overwrite existing output files
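Delimiter and quote-character detection can be done with Python's standard `csv.Sniffer`; the snippet below is a hypothetical illustration of this phase, not CSViper's actual implementation:

```python
import csv

with open("data.csv", newline="", encoding="utf-8") as f:
    sample = f.read(64 * 1024)  # sniff a bounded sample, not the whole file
    dialect = csv.Sniffer().sniff(sample)
    f.seek(0)
    header = next(csv.reader(f, dialect))

print(f"delimiter={dialect.delimiter!r}, quotechar={dialect.quotechar!r}")
print("columns:", header)
```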
Generate CREATE TABLE and data import SQL scripts:
python -m csviper build_sql --from_metadata_json=output/data.metadata.json --output_dir=./output/
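To make the relationship between the metadata and the generated SQL concrete, here is a hedged sketch of deriving a CREATE TABLE statement from the metadata file. The JSON field names (`normalized_columns`, `column_widths`, `table_name`) are assumptions for illustration, not CSViper's actual schema:

```python
import json

with open("output/data.metadata.json", encoding="utf-8") as f:
    meta = json.load(f)

# Size each VARCHAR column from the observed maximum width.
columns = [
    f"    {name} VARCHAR({max(int(width), 1)})"
    for name, width in zip(meta["normalized_columns"], meta["column_widths"])
]
ddl = f"CREATE TABLE {meta['table_name']} (\n" + ",\n".join(columns) + "\n);"
print(ddl)
```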
Create a standalone Python script for data import:
python -m csviper build_import_script --from_resource_dir=./output/ --output_dir=./output/
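The generated scripts are self-contained. As a rough idea of what a PostgreSQL import script does, here is a stripped-down sketch using `psycopg2` and COPY; the connection handling, environment variables, and table name are placeholders, not the actual generated code:

```python
import os
import psycopg2

conn = psycopg2.connect(
    host=os.environ.get("DB_HOST", "localhost"),
    dbname=os.environ.get("DB_NAME", "mydb"),
    user=os.environ.get("DB_USER", "postgres"),
    password=os.environ.get("DB_PASSWORD", ""),
)
with conn, conn.cursor() as cur, open("data.csv", encoding="utf-8") as f:
    # Stream the local CSV into the target table; COPY ... FROM STDIN
    # works even when the database is remote and the file is local.
    cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", f)
conn.close()
```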
Execute compiled import scripts with automatic file discovery:
python -m csviper invoke-compiled-script --run_import_from=./output/ --import_data_from_dir=./data_directory/ --database_type=postgresql
Options:
- `--run_import_from`: Directory containing compiled CSViper scripts and metadata (required)
- `--import_data_from_dir`: Directory to search for data files (required)
- `--database_type`: Database type, either `mysql` or `postgresql` (required)
Run all four phases with a single command:
python -m csviper full_compile --from_csv=data.csv --output_dir=./output/ --overwrite_previous
Requirements:
- Python 3.8 or higher
- Virtual environment (recommended)
- Clone the repository and navigate to the project directory
- Source the virtual environment setup script:
source source_me_to_get_venv.sh
- Install development dependencies:
pip install -e ".[dev]"
Run the test suite:
pytest
Format and lint the code:
black src/
flake8 src/
csviper/
├── src/csviper/
│ ├── __init__.py # Package initialization
│ ├── __main__.py # CLI entry point
│ ├── column_normalizer.py # Column name normalization utilities
│ ├── metadata_extractor.py # CSV analysis and metadata extraction
│ ├── mysql_generator.py # MySQL SQL generation (coming soon)
│ ├── postgresql_generator.py # PostgreSQL SQL generation (coming soon)
│ └── script_generators/ # Python script generation (coming soon)
├── tests/ # Test files
├── AI_Instructions/ # Development documentation
├── setup.py # Package setup configuration
├── requirements.txt # Project dependencies
└── README.md # This file
CSViper generates several files during processing:
- `{filename}.metadata.json`: Contains CSV structure analysis, normalized column names, and column width information
- `{filename}.create_table_mysql.sql`: MySQL CREATE TABLE script
- `{filename}.create_table_postgres.sql`: PostgreSQL CREATE TABLE script
- `{filename}.import_data_mysql.sql`: MySQL data import script
- `{filename}.import_data_postgres.sql`: PostgreSQL data import script
- `go.mysql.py`: Standalone Python script for MySQL database import
- `go.postgresql.py`: Standalone Python script for PostgreSQL database import
Run all phases at once:
python -m csviper full_compile --from_csv=sales_data.csv --output_dir=./sales_data/
- Analyze your CSV file:
python -m csviper extract_metadata --from_csv=sales_data.csv
- Review the generated metadata in `sales_data/sales_data.metadata.json`
- Generate SQL scripts:
python -m csviper build_sql --from_metadata_json=sales_data/sales_data.metadata.json
- Create import scripts:
python -m csviper build_import_script --from_resource_dir=sales_data/
- Use the invoker system to import data:
python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql
The invoker system is CSViper's intelligent file discovery and execution engine. It automatically finds the most recent data file matching your original CSV pattern and executes the appropriate import script.
- File Pattern Matching: CSViper stores a glob pattern in the metadata file (e.g., `sales_data_*.csv`) that matches files with similar naming conventions
- Automatic Discovery: The invoker searches your data directory for files matching this pattern
- Latest File Selection: It automatically selects the most recently modified file
- User Confirmation: Shows you the selected file and asks for confirmation before proceeding
- Script Execution: Runs the appropriate database import script (`go.mysql.py` or `go.postgresql.py`)
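The discovery step boils down to a glob match plus a modification-time comparison. Here is a minimal sketch (the paths and pattern are illustrative, and the invoker's actual implementation may differ):

```python
import glob
import os

candidates = glob.glob(os.path.join("./data", "sales_data_*.csv"))
if not candidates:
    raise SystemExit("No files match the stored pattern")

# Pick the most recently modified match.
latest = max(candidates, key=os.path.getmtime)
print("Selected for import:", latest)
```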
If you have a directory with multiple data files:
data/
├── sales_data_2024-01.csv
├── sales_data_2024-02.csv
├── sales_data_2024-03.csv
└── other_data.csv
And compiled scripts in:
sales_data/
├── sales_data.metadata.json
├── go.mysql.py
├── go.postgresql.py
└── ...
Running the invoker:
python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql
The invoker will automatically:
- Find all files matching `sales_data_*.csv`
- Select `sales_data_2024-03.csv` (the most recent)
- Ask for your confirmation
- Execute `go.postgresql.py` with the selected file
- No manual file specification: Automatically finds the latest data file
- Pattern-based matching: Works with timestamped or versioned files
- Safety confirmation: Always asks before proceeding
- Flexible search: Supports both recursive and non-recursive directory searching
- Database agnostic: Works with both MySQL and PostgreSQL scripts
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Completed:
- Phase 1: CSV metadata extraction and column normalization
- Phase 2: SQL script generation for MySQL and PostgreSQL
- Phase 3: Python import script generation
- Phase 4: Invoker system with automatic file discovery
- Full compilation workflow

Planned:
- Enhanced error handling and validation
- Progress bars for large file processing
- Configuration file support
- Additional database backend support
- Web interface for easier usage
- Docker containerization
For questions, issues, or contributions, please visit the GitHub repository.