P4wnda/file-compression-tool

Huffman File Compression Tool

Overview

This project is a command-line tool for compressing and decompressing files using the Huffman coding algorithm. It is designed for educational purposes (e.g., the Introduction to C module at HSLU) and demonstrates key concepts such as file handling, frequency analysis, tree structures, and bit manipulation in C.

Features

  • Binary File Support: Handles any file type, not just text files
  • Progress Visualization: Real-time progress bar during compression/decompression
  • File Extension Management: Automatic .pda extension handling
  • Smart Warnings: Detects and warns about inefficient compression scenarios
  • File Handling: Reads input files and generates compressed output files
  • Frequency Analysis: Counts byte frequencies to build optimal Huffman codes
  • Huffman Tree Construction: Uses a priority queue to assign shorter codes to more frequent bytes
  • Encoding & Decoding: Compresses and decompresses files using the Huffman algorithm
  • Error Handling: Detects and reports invalid input, file errors, and inefficiencies for small files
  • Command Line Interface: Simple CLI for compression and decompression
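
The frequency analysis step can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `freq_reset` and `freq_add_chunk` are hypothetical, and the sketch assumes the input is processed in chunks so that binary files of any size can be counted byte by byte.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTE_VALUES 256  /* one counter per possible byte value */

/* Set all 256 counters to zero before counting starts. */
void freq_reset(uint64_t freq[BYTE_VALUES])
{
    for (int i = 0; i < BYTE_VALUES; i++) {
        freq[i] = 0;
    }
}

/* Add the bytes of one chunk to the running counts.
 * Treating the data as unsigned char makes this work for binary input,
 * not just text. Call it once per chunk read from the file. */
void freq_add_chunk(uint64_t freq[BYTE_VALUES],
                    const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        freq[data[i]]++;
    }
}
```

Counting bytes rather than characters is what allows the same table to serve text and binary input alike.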

Visual Feedback

The tool provides visual feedback during operation:

  • Progress bar showing compression/decompression status
  • The bar updates after every 16 KB of processed data
  • Compression ratio reporting after completion
  • Warning messages for inefficient compression scenarios
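
A progress bar with a 16 KB update interval could look roughly like the sketch below. The function names, the bar width, and the output format are assumptions made for illustration; the real `progress_bar.c` may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define UPDATE_INTERVAL (16u * 1024u)  /* redraw every 16 KB */
#define BAR_WIDTH 20                   /* assumed width in characters */

/* Format a bar such as "[#####...............]  25%" into out.
 * Returns the percentage that was rendered (0-100). */
int format_progress(char *out, size_t out_size, uint64_t done, uint64_t total)
{
    int percent = total == 0 ? 100 : (int)(done * 100 / total);
    if (percent > 100) {
        percent = 100;
    }
    int filled = percent * BAR_WIDTH / 100;

    char bar[BAR_WIDTH + 1];
    for (int i = 0; i < BAR_WIDTH; i++) {
        bar[i] = i < filled ? '#' : '.';
    }
    bar[BAR_WIDTH] = '\0';

    snprintf(out, out_size, "[%s] %3d%%", bar, percent);
    return percent;
}

/* Return 1 when the byte counter crossed a 16 KB boundary between
 * prev and now, or when processing has finished; 0 otherwise. */
int should_redraw(uint64_t prev, uint64_t now, uint64_t total)
{
    return now >= total || prev / UPDATE_INTERVAL != now / UPDATE_INTERVAL;
}
```

Redrawing only when a 16 KB boundary is crossed keeps terminal output from slowing down the compression loop.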

Compression Effectiveness

The effectiveness of compression varies significantly depending on the file type:

Effective Compression (Recommended):

  • Text files (.txt)
  • CSV files (.csv)
  • Log files (.log)
  • Source code files (.c, .h, etc.)
  • Raw/uncompressed image files (.bmp)
  • Raw data files

Poor Compression (Not Recommended):

  • Word documents (.docx, .doc)
  • PDF files (.pdf)
  • Compressed images (.jpg, .png, .gif)
  • Audio/video files (.mp3, .mp4, .avi)
  • Archive files (.zip, .rar, .7z)
  • Executables (.exe, .dll)

The tool warns you when you attempt to compress one of these file types, because such files typically do not benefit from additional Huffman compression.
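
One way such a warning could be triggered is a simple extension check. The sketch below is an assumption about the approach, not the tool's actual logic: `is_poorly_compressible` is a hypothetical name, and the extension table simply mirrors the "Poor Compression" list above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare two strings without regard to ASCII case. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Return 1 if the path ends in an extension from the
 * "Poor Compression" list, 0 otherwise. */
int is_poorly_compressible(const char *path)
{
    static const char *const exts[] = {
        ".docx", ".doc", ".pdf", ".jpg", ".png", ".gif", ".mp3",
        ".mp4", ".avi", ".zip", ".rar", ".7z", ".exe", ".dll"
    };
    const char *dot = strrchr(path, '.');  /* last dot in the path */
    if (dot == NULL) {
        return 0;
    }
    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        if (equals_ignore_case(dot, exts[i])) {
            return 1;
        }
    }
    return 0;
}
```

A check like this only looks at the file name; it says nothing about the actual content, so it can justify a warning but not a refusal to compress.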

Build Instructions

Linux / macOS

  1. Make sure gcc is installed.
  2. Build with the Makefile:
    make
  3. The program will be built as huffman.

Windows

  1. Install MinGW or TDM-GCC and ensure gcc is in your PATH.
  2. Open a Bash shell (e.g., Git Bash, MSYS2).
  3. Run the build script:
    ./build_windows.sh
  4. The program will be built as huffman.exe.

Usage

Compress:

./huffman -c <input_file> <output_file>

The .pda extension is appended to the output file name automatically.

Decompress:

./huffman -d <input_file> <output_file>

The input file must have the .pda extension for decompression.

Example:

# Compress (will create example.pda)
./huffman -c example.txt example

# Decompress (will restore original file)
./huffman -d example.pda example.txt
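
The extension rules described above (append .pda on compression, require it on decompression) can be sketched as two small helpers. The names `has_pda_extension` and `with_pda_extension` are hypothetical, and the choice to leave a name that already ends in .pda unchanged is an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PDA_EXT ".pda"

/* Return 1 if path ends in ".pda" and has a non-empty base name. */
int has_pda_extension(const char *path)
{
    size_t len = strlen(path);
    size_t ext = strlen(PDA_EXT);
    return len > ext && strcmp(path + len - ext, PDA_EXT) == 0;
}

/* Write path to out, appending ".pda" unless it is already present.
 * Returns 0 on success, -1 if out is too small for the result. */
int with_pda_extension(const char *path, char *out, size_t out_size)
{
    int n = has_pda_extension(path)
          ? snprintf(out, out_size, "%s", path)
          : snprintf(out, out_size, "%s%s", path, PDA_EXT);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With these helpers, `example` becomes `example.pda` for compression, and decompression can reject any input for which `has_pda_extension` returns 0.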

Notes

  • For very small files (<100 bytes), the compressed file may be larger than the original because the stored Huffman tree adds overhead
  • The maximum supported file size is 1 GB
  • The tool is cross-platform and works on Linux, macOS, and Windows (32/64 bit)
  • Compression ratio depends heavily on file content and type
  • The progress bar provides real-time feedback during operation
  • Warnings are issued automatically for inefficient compression scenarios
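
The size limits in these notes could be checked along the following lines. All names here are hypothetical, the 1 GB limit is read as 2^30 bytes (an assumption), and the savings formula is one common way to express a compression ratio rather than the one the tool necessarily reports.

```c
#include <stdint.h>

#define MAX_FILE_SIZE (1024ull * 1024ull * 1024ull) /* assumed: 1 GB = 2^30 bytes */
#define SMALL_FILE_THRESHOLD 100ull                 /* bytes, from the note above */

typedef enum { SIZE_OK, SIZE_SMALL_WARNING, SIZE_TOO_LARGE } SizeCheck;

/* Classify an input size: too large to process, small enough that the
 * output may grow, or fine. */
SizeCheck check_input_size(uint64_t size)
{
    if (size > MAX_FILE_SIZE) {
        return SIZE_TOO_LARGE;
    }
    if (size < SMALL_FILE_THRESHOLD) {
        return SIZE_SMALL_WARNING;
    }
    return SIZE_OK;
}

/* Space saved as a percentage of the original size.
 * The result is negative when the compressed file is larger. */
double space_saved_percent(uint64_t original, uint64_t compressed)
{
    if (original == 0) {
        return 0.0;
    }
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

A negative result from `space_saved_percent` is the situation the first note describes: the tree overhead outweighs the savings from shorter codes.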

Project Objectives

  1. Binary Support: Handle any file type
  2. Visual Feedback: Show progress during operation
  3. File Handling: Read and write files
  4. Frequency Analysis: Analyze byte frequencies
  5. Huffman Tree Construction: Build the Huffman tree
  6. Encoding Process: Encode and decode files
  7. Error Handling: Robust error handling
  8. CLI: Command-line interface only
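
The tree construction objective can be illustrated with a short sketch that derives a code length for each byte value from its frequency. This is not the project's implementation: the names are hypothetical, and for brevity a linear scan for the minimum stands in for the priority queue mentioned under Features. The merging logic is the same either way.

```c
#include <stdint.h>

#define SYMBOLS 256
#define MAX_NODES (2 * SYMBOLS - 1)  /* leaves plus internal nodes */

typedef struct {
    uint64_t weight;  /* total frequency below this node */
    int parent;       /* index of the parent node, -1 for the root */
    int active;       /* 1 while the node still waits to be merged */
} Node;

/* Remove and return the lightest active node (linear scan, not a heap). */
static int pop_min(Node *nodes, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (nodes[i].active &&
            (best < 0 || nodes[i].weight < nodes[best].weight)) {
            best = i;
        }
    }
    if (best >= 0) {
        nodes[best].active = 0;
    }
    return best;
}

/* Fill lengths[s] with the code length in bits for byte value s
 * (0 if the byte never occurs). Returns the number of distinct bytes. */
int huffman_code_lengths(const uint64_t freq[SYMBOLS], int lengths[SYMBOLS])
{
    Node nodes[MAX_NODES];
    int leaf_of[SYMBOLS];
    int count = 0;

    /* Create one leaf per byte value that actually occurs. */
    for (int s = 0; s < SYMBOLS; s++) {
        lengths[s] = 0;
        leaf_of[s] = -1;
        if (freq[s] > 0) {
            nodes[count] = (Node){ freq[s], -1, 1 };
            leaf_of[s] = count++;
        }
    }
    int distinct = count;

    /* Repeatedly merge the two lightest nodes until one root remains. */
    for (int remaining = distinct; remaining > 1; remaining--) {
        int a = pop_min(nodes, count);
        int b = pop_min(nodes, count);
        nodes[count] = (Node){ nodes[a].weight + nodes[b].weight, -1, 1 };
        nodes[a].parent = count;
        nodes[b].parent = count;
        count++;
    }

    /* A symbol's code length is the depth of its leaf in the tree. */
    for (int s = 0; s < SYMBOLS; s++) {
        if (leaf_of[s] < 0) {
            continue;
        }
        int depth = 0;
        for (int n = leaf_of[s]; nodes[n].parent >= 0; n = nodes[n].parent) {
            depth++;
        }
        lengths[s] = depth > 0 ? depth : 1;  /* a lone symbol still needs 1 bit */
    }
    return distinct;
}
```

Frequent bytes end up near the root and receive short codes, while rare bytes sit deeper and receive longer ones. Replacing `pop_min` with a binary heap would reduce the cost of each merge from linear to logarithmic without changing the result's total encoded size.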

Folder Structure

src/
├── core/             # Core Huffman algorithm
│   ├── huffman.c
│   └── huffman.h
├── io/               # Input/Output
│   ├── bit_io.c
│   ├── bit_io.h
│   ├── file_io.c
│   └── file_io.h
├── compression/      # Compression specific
│   ├── encode.c
│   ├── encode.h
│   ├── decode.c
│   └── decode.h
├── utils/            # Utility functions
│   ├── frequency.c
│   ├── frequency.h
│   ├── file_extension.c    # Extension handling
│   ├── file_extension.h
│   ├── progress_bar.c      # Progress visualization
│   └── progress_bar.h
└── main.c            # Entry point
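
Because Huffman codes are not byte-aligned, a module such as io/bit_io has to pack individual bits into bytes. The sketch below shows one way to do that in memory; the `BitWriter` type and its functions are hypothetical and make no claim about the actual interface in `bit_io.h`. It assumes bits are written most significant bit first and that the final byte is padded with zeros.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf;     /* destination buffer */
    size_t capacity;  /* size of buf in bytes */
    size_t byte_pos;  /* number of complete bytes written so far */
    uint8_t current;  /* bits collected for the next byte */
    int bit_count;    /* how many bits current holds (0-7) */
} BitWriter;

void bw_init(BitWriter *w, uint8_t *buf, size_t capacity)
{
    w->buf = buf;
    w->capacity = capacity;
    w->byte_pos = 0;
    w->current = 0;
    w->bit_count = 0;
}

/* Append the lowest `length` bits of code, most significant bit first.
 * Returns 0 on success, -1 if the buffer is full. */
int bw_write(BitWriter *w, uint32_t code, int length)
{
    for (int i = length - 1; i >= 0; i--) {
        w->current = (uint8_t)((w->current << 1) | ((code >> i) & 1u));
        if (++w->bit_count == 8) {
            if (w->byte_pos >= w->capacity) {
                return -1;
            }
            w->buf[w->byte_pos++] = w->current;
            w->current = 0;
            w->bit_count = 0;
        }
    }
    return 0;
}

/* Pad a partial final byte with zero bits and store it.
 * Returns the total number of bytes written, or (size_t)-1 if full. */
size_t bw_flush(BitWriter *w)
{
    if (w->bit_count > 0) {
        if (w->byte_pos >= w->capacity) {
            return (size_t)-1;
        }
        w->buf[w->byte_pos++] = (uint8_t)(w->current << (8 - w->bit_count));
        w->current = 0;
        w->bit_count = 0;
    }
    return w->byte_pos;
}
```

Since the last byte may contain padding, a decoder needs some way to know where the real data ends, for example by storing the original file size in the header.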
