Dumper

Cross-platform Excel ETL pre-processor that flattens Excel files into a predictable CSV format, preserving key source metadata for easy ingestion into data pipelines. Exports values or formulas, simplifying audit processes. Available in both Python and PowerShell versions.

Features

Auto-discovery: Automatically finds and processes the newest Excel file in the specified directory
Multi-sheet support: Extracts data from all worksheets in an Excel file
Hidden sheet control: Option to include or skip hidden worksheets
Smart filtering: Only exports rows containing non-null data
Formula extraction: Option to show formulas instead of calculated values (.xlsx/.xlsm only)
Worksheet identification: Prepends worksheet name as the first column in output
Row number tracking: Option to include Excel row numbers for data traceability
Collision-safe naming: Automatically adds incremental numbers if output files already exist
Custom output directory: Specify where to save the exported CSV files
Flexible input sources: Search for files in any directory or process specific files
Multiple Excel formats: Supports .xlsx, .xls, .xlsm, and .xlsb files
Timestamped output: Uses file modification time for consistent naming (ISO 8601 format)

Quick Start

Python Version

# Install dependencies
pip install pandas openpyxl xlrd

# Basic usage
python dumper.py
python dumper.py -file "data.xlsx" -rownumbers

PowerShell Version

# Install dependencies
Install-Module ImportExcel -Force

# Basic usage
.\dumper.ps1
.\dumper.ps1 -File "data.xlsx" -RowNumbers

Installation

Python Version

Prerequisites: Python 3.6 or higher

Install Dependencies:

pip install pandas openpyxl xlrd

PowerShell Version

Prerequisites: PowerShell 5.1 or PowerShell Core 6+

Install Dependencies:

with admin rights

Install-Module ImportExcel -Force

without admin rights

Install-Module ImportExcel -Force -Scope CurrentUser

Command Line Options

Feature	Python Version	PowerShell Version	Description
Specify file	`-file FILE`	`-File FILE`	Process specific Excel file
Input directory	`-input DIR`	`-InputDir DIR`	Directory to search for Excel files
Output directory	`-output DIR`	`-OutputDir DIR`	Directory for CSV output
Skip hidden sheets	`-no-hide`	`-NoHide`	Exclude hidden worksheets
Include row numbers	`-rownumbers`	`-RowNumbers`	Add Excel row numbers to output
Show formulas	`-formulas`	`-Formulas`	Show formulas instead of values (.xlsx/.xlsm only)
Help	`-help`	`-Help`	Display detailed help

Usage Examples

Basic Usage

Python:

# Process newest Excel file from current directory
python dumper.py

# Process specific file
python dumper.py -file "data.xlsx"

# Include row numbers and formulas
python dumper.py -file "data.xlsx" -rownumbers -formulas

PowerShell:

# Process newest Excel file from current directory
.\dumper.ps1

# Process specific file
.\dumper.ps1 -File "data.xlsx"

# Include row numbers and formulas
.\dumper.ps1 -File "data.xlsx" -RowNumbers -Formulas

Advanced Usage

Python:

# Comprehensive processing with all options
python dumper.py -input "./source" -output "./exports" -file "report.xlsx" -no-hide -rownumbers -formulas

# Process from specific directory
python dumper.py -input "C:\Reports" -output "C:\Processed"

PowerShell:

# Comprehensive processing with all options
.\dumper.ps1 -InputDir "./source" -OutputDir "./exports" -File "report.xlsx" -NoHide -RowNumbers -Formulas

# Process from specific directory
.\dumper.ps1 -InputDir "C:\Reports" -OutputDir "C:\Processed"

Get Help

Python:

python dumper.py -help

PowerShell:

.\dumper.ps1 -Help

Output Format

File Naming Convention

Both versions create files with the pattern:

dumper[py|ps]_[original_filename]_[timestamp].csv

Python version: dumperpy_Sales_Report_2025-07-21T14-30-52-0500.csv
PowerShell version: dumperps_Sales_Report_2025-07-21T14-30-52-0500.csv
Timestamp format: ISO 8601 with colons replaced by hyphens
Collision handling: Adds (1), (2), etc. if file exists

CSV Structure

The output CSV contains:

First column: Worksheet name
Second column: Excel row number (if row numbers option used)
Remaining columns: Original data from Excel worksheets (Column_1, Column_2, etc.)

Without row numbers:

Worksheet,Column_1,Column_2,Column_3
Sheet1,John Doe,Sales Manager,50000
Sheet1,Jane Smith,Developer,65000
Summary,Total Employees,,2

With row numbers:

Worksheet,Row_Number,Column_1,Column_2,Column_3
Sheet1,2,John Doe,Sales Manager,50000
Sheet1,3,Jane Smith,Developer,65000
Summary,5,Total Employees,,2

With formulas (when -formulas option used):

Worksheet,Column_1,Column_2,Column_3
Sheet1,John Doe,Sales Manager,50000
Summary,Total Employees,"FORMULA: =COUNTA(A:A)-1",2
Summary,Average Salary,"FORMULA: =AVERAGE(C2:C3)",57500

Supported Excel Formats

Both versions support:

.xlsx - Excel 2007+ (Open XML Format)
.xls - Excel 97-2003 (Binary Format)
.xlsm - Excel Macro-Enabled Workbook
.xlsb - Excel Binary Workbook

Note: Formula extraction (-formulas/-Formulas) only works with .xlsx and .xlsm files.

Dependencies

Python Version

pandas - Data manipulation and Excel reading
openpyxl - Excel 2007+ file support and formula extraction
xlrd - Legacy Excel (.xls) file support
Standard library modules: argparse, csv, glob, os, sys, datetime, pathlib

PowerShell Version

ImportExcel module - Excel file processing
PowerShell 5.1 or PowerShell Core 6+

Platform Compatibility

Python Version

Windows, macOS, Linux - Full cross-platform support
Uses pathlib for platform-independent path handling

PowerShell Version

Windows - Native PowerShell 5.1+
macOS, Linux - PowerShell Core 6+
Cross-platform path handling with PowerShell cmdlets

Error Handling

Both versions provide clear error messages for:

File not found: When specified file doesn't exist
No Excel files: When no Excel files found in directory
Missing dependencies: When required libraries/modules aren't installed
Read errors: When Excel file is corrupted or inaccessible
Write errors: When output location is not writable
Directory creation: Automatically creates output directories

Example Workflows

Daily Report Processing

Python:

# Process daily reports
python dumper.py -input "/data/daily_reports" -output "./exports/$(date +%Y-%m-%d)"

PowerShell:

# Process daily reports
.\dumper.ps1 -InputDir "C:\Data\DailyReports" -OutputDir "C:\Exports\$(Get-Date -Format 'yyyy-MM-dd')"

Batch Processing Multiple Directories

Python:

# Process multiple directories
for dir in data/*/; do 
    python dumper.py -input "$dir" -output "./processed/$(basename $dir)"
done

PowerShell:

# Process multiple directories
Get-ChildItem "C:\Data" -Directory | ForEach-Object {
    .\dumper.ps1 -InputDir $_.FullName -OutputDir "C:\Processed\$($_.Name)"
}

Troubleshooting

Common Issues

Python - "Error: Required libraries not found"

pip install pandas openpyxl xlrd
python -c "import pandas, openpyxl; print('Dependencies OK')"

PowerShell - "ImportExcel module not found"

Install-Module ImportExcel -Force -Scope CurrentUser
Import-Module ImportExcel

"No Excel files found in directory"

Verify directory path with -input/-InputDir
Check file extensions (.xlsx, .xls, .xlsm, .xlsb)
Use -file/-File to specify exact filename

"Permission denied" errors

Check write permissions for output directory
Close Excel files that might be locking files
Run with appropriate privileges

"Error reading Excel file"

Verify file isn't corrupted by opening in Excel
Check file isn't password-protected
Ensure file isn't in use by another application

Choosing Between Versions

Use Python Version When:

You need maximum cross-platform compatibility
You're integrating with existing Python data pipelines
You prefer pip-based dependency management
You need the most mature Excel processing capabilities

Use PowerShell Version When:

You're working primarily in Windows environments
You prefer PowerShell-based automation workflows
You want native integration with Windows administration tasks
You're already using PowerShell for other data processing

Advanced Configuration

Large File Processing

Both versions handle large files efficiently by:

Processing sheets one at a time to manage memory
Showing progress for each sheet
Using streaming approaches where possible

Integration with CI/CD

Both scripts can be integrated into automated workflows:

Python in GitHub Actions:

- name: Process Excel files
  run: |
    pip install pandas openpyxl xlrd
    python dumper.py -input "./data" -output "./processed"

PowerShell in Azure DevOps:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      Install-Module ImportExcel -Force
      .\dumper.ps1 -InputDir ".\data" -OutputDir ".\processed"

Performance Considerations

Hidden sheets: Use -no-hide/-NoHide to skip unnecessary processing
Formula extraction: Only enable when needed as it requires additional processing
Large workbooks: Both versions stream data to minimize memory usage
Network drives: Local processing is faster than network-mounted drives

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github/workflows		.github/workflows
dist		dist
excel_dumper.egg-info		excel_dumper.egg-info
excel_dumper		excel_dumper
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
conftest.py		conftest.py
coverage.xml		coverage.xml
debug.py		debug.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py
test.json		test.json
test_basic_ci.py		test_basic_ci.py

License

pgaljan/excel-dumper

Folders and files

Latest commit

History

Repository files navigation

Dumper

Features

Quick Start

Python Version

PowerShell Version

Installation

Python Version

PowerShell Version

Command Line Options

Usage Examples

Basic Usage

Advanced Usage

Get Help

Output Format

File Naming Convention

CSV Structure

Supported Excel Formats

Dependencies

Python Version

PowerShell Version

Platform Compatibility

Python Version

PowerShell Version

Error Handling

Example Workflows

Daily Report Processing

Batch Processing Multiple Directories

Troubleshooting

Common Issues

Choosing Between Versions

Use Python Version When:

Use PowerShell Version When:

Advanced Configuration

Large File Processing

Integration with CI/CD

Performance Considerations

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages