Media Batch Manager

A toolkit for organizing and deduplicating media files and documents.

Overview

media-batch-manager is a collection of Python utilities designed to help you organize large collections of files. It includes two main tools:

ImageSort: Organizes and deduplicates images and videos using perceptual hashing
DocumentSort: Organizes and deduplicates documents using content-based hashing

Both tools can process large collections of files, identify and remove duplicates, and organize the remaining files into manageable batches.

Quick Start

Installation

# Clone the repository
git clone https://github.com/akora/media-batch-manager.git
cd media-batch-manager

# Install dependencies
pip install -r requirements.txt

Running the Tools

Image Sorter

python sort_image.py

Document Sorter

python sort_document.py

Features

Common Features

Intelligent deduplication: Identifies and removes duplicate files
Batch organization: Groups files into folders with a configurable maximum number of files per folder (default limit: 500 files per folder)
Progress tracking: Shows detailed progress bars for long-running operations
Source cleanup: Removes processed files and empty directories after successful processing
Detailed statistics: Provides summary statistics after processing
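
The batch organization feature can be pictured with a short sketch. It only plans which file goes into which folder and does not move anything. The function name plan_batches and the batch_001 folder naming are illustrative assumptions, not names taken from the scripts; only the 500-file default comes from this README.

```python
def plan_batches(filenames, max_per_folder=500, prefix="batch"):
    """Assign file names to numbered batch folders.

    Hypothetical helper: returns a dict mapping a folder name to the list of
    file names it should hold, with at most max_per_folder names per folder.
    """
    if max_per_folder < 1:
        raise ValueError("max_per_folder must be at least 1")
    plan = {}
    # Sort first so the same input always produces the same layout.
    for index, name in enumerate(sorted(filenames)):
        batch_number = index // max_per_folder + 1
        folder = f"{prefix}_{batch_number:03d}"
        plan.setdefault(folder, []).append(name)
    return plan
```

With 1,200 file names and the default limit, this yields three folders holding 500, 500, and 200 files.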

ImageSort-Specific Features

Perceptual hashing: Uses image hashing algorithms to identify visually similar images
Support for HEIC format: Handles Apple's HEIC image format
Video file support: Processes common video formats
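
This README does not say which perceptual hashing algorithm ImageSort uses. As a rough illustration of the idea, here is a minimal sketch of average hashing, one of the simplest perceptual hashes. It assumes the image has already been decoded, converted to grayscale, and downscaled to a small grid (8x8 is common), which a real tool would normally do with an imaging library. All function names and the max_distance threshold are hypothetical.

```python
def average_hash(pixels):
    """Return an integer hash with one bit per pixel.

    pixels is a small 2D list of grayscale values (0-255). A bit is 1 when
    the pixel is brighter than the mean of the grid; the first pixel becomes
    the most significant bit.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_similar(pixels_a, pixels_b, max_distance=5):
    """Treat two grids as near-duplicates when their hashes differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Unlike a cryptographic hash, small changes in brightness or compression flip only a few bits, so near-identical images end up a small Hamming distance apart.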

DocumentSort-Specific Features

Content-based deduplication: Compares normalized document content to find duplicates
Automatic encoding detection: Handles various text encodings correctly
Smart categorization: Organizes files into categories based on file type
PDF processing: Extracts and analyzes text content from PDF files
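
To make content-based deduplication concrete, here is a minimal sketch for plain-text files. It is not the code from sort_document.py. The normalization rules (lowercasing and collapsing whitespace) and the fixed fallback chain of encodings are assumptions made for illustration; the real tool may use a dedicated detection library and different rules.

```python
import hashlib
import re


def decode_text(raw):
    """Decode bytes by trying a short list of encodings in order.

    This is a simple fallback chain, not real encoding detection.
    latin-1 is the last resort because it accepts every byte sequence.
    """
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def normalized_hash(raw):
    """Hash text content so that case and whitespace differences are ignored."""
    text = decode_text(raw).lower()
    text = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two files with the same words but different line endings, spacing, or text encoding produce the same hash, so they are treated as duplicates.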

How It Works

ImageSort Process

Scans the source directory for supported image and video files
Computes perceptual hashes for images and content hashes for other files
Identifies and removes duplicate files
Organizes unique files into batch folders
Cleans up the source directory
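
The scan and exact-duplicate steps above can be sketched as follows for files that get a plain content hash, such as videos. This is a hedged illustration rather than the code in sort_image.py: the extension set is a small subset of the supported formats, and the function names are hypothetical. It reports duplicates without deleting anything.

```python
import hashlib
from pathlib import Path

# Illustrative subset of the supported formats, not the full list.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4", ".mov"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_unique_and_duplicates(source_dir):
    """Walk source_dir and separate first occurrences from later copies."""
    seen = set()
    unique, duplicates = [], []
    for path in sorted(Path(source_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in MEDIA_EXTENSIONS:
            continue
        key = file_digest(path)
        if key in seen:
            duplicates.append(path)
        else:
            seen.add(key)
            unique.append(path)
    return unique, duplicates
```

Returning the duplicates instead of removing them right away makes it easy to review the list before any file is deleted.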

DocumentSort Process

Scans the source directory for document files
Analyzes document content with appropriate encoding detection
Computes normalized content hashes to identify duplicates
Categorizes files by type (documents, spreadsheets, presentations, etc.)
Organizes files into category-specific batch folders
Cleans up the source directory
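
The categorization step can be sketched as a lookup from file extension to category. The mapping below is an assumption built from the category names mentioned above and a few of the supported Office formats; the actual table in sort_document.py may differ, and the "other" fallback is hypothetical.

```python
from pathlib import PurePath

# Hypothetical mapping; only a few of the supported extensions are shown.
CATEGORY_BY_EXTENSION = {
    ".doc": "documents",
    ".docx": "documents",
    ".odt": "documents",
    ".xls": "spreadsheets",
    ".xlsx": "spreadsheets",
    ".ods": "spreadsheets",
    ".ppt": "presentations",
    ".pptx": "presentations",
    ".odp": "presentations",
}


def categorize(filename):
    """Return the category for a file name, ignoring extension case."""
    suffix = PurePath(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "other")


def group_by_category(filenames):
    """Group file names by category, keeping each group sorted."""
    groups = {}
    for name in sorted(filenames):
        groups.setdefault(categorize(name), []).append(name)
    return groups
```

Each group can then be split into batch folders under its own category directory.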

Configuration

Both tools use default source and destination directories that can be customized:

# In sort_image.py
SOURCE_DIR = "./source_images"  # Change this to your source directory
DEST_DIR = "./sorted_images"    # Change this to your destination directory

# In sort_document.py
SOURCE_DIR = "./source_documents"  # Change this to your source directory
DEST_DIR = "./sorted_documents"    # Change this to your destination directory

For the document sorter, you can also set these directories using environment variables:

export DOCUMENT_SORT_SOURCE="./source_documents"
export DOCUMENT_SORT_DEST="./sorted_documents"
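
One way a script might read these variables is sketched below. The variable names and default paths come from this README; the helper name resolve_directories and the assumption that an environment variable takes precedence over the built-in default are illustrative only.

```python
import os


def resolve_directories(environ=None):
    """Return (source, destination), preferring environment variables.

    Hypothetical helper: pass a dict as environ to test it without touching
    the real process environment.
    """
    env = os.environ if environ is None else environ
    source = env.get("DOCUMENT_SORT_SOURCE", "./source_documents")
    dest = env.get("DOCUMENT_SORT_DEST", "./sorted_documents")
    return source, dest
```

When neither variable is set, the function falls back to the defaults shown in the configuration above.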

Supported File Formats

Images

JPEG/JPG, PNG, GIF, BMP, WebP, SVG, TIFF/TIF, HEIC

Videos

MP4, MOV, AVI, MKV, WMV, FLV, WebM, MPG/MPEG, M4V

Documents

Office: DOC, DOCX, XLS, XLSX, PPT, PPTX, ODT, ODS, ODP
Text: TXT, MD, RTF, CSV, JSON, XML, YAML, LOG
Web: HTML, HTM, CSS, JS
Code: Various programming language files
Other: PDF, Archives (ZIP, RAR, etc.)

Repository: akora/media-batch-manager
