This project is a command-line tool for compressing and decompressing files using the Huffman coding algorithm. It is designed for educational purposes (e.g., the Introduction to C module at HSLU) and demonstrates key concepts such as file handling, frequency analysis, tree structures, and bit manipulation in C.
- Binary File Support: Handles any file type, not just text files
- Progress Visualization: Real-time progress bar during compression/decompression
- File Extension Management: Automatic .pda extension handling
- Smart Warnings: Detects and warns about inefficient compression scenarios
- File Handling: Reads input files and generates compressed output files
- Frequency Analysis: Analyzes byte frequencies to build optimal Huffman codes
- Huffman Tree Construction: Uses a priority queue to assign shorter codes to more frequent bytes
- Encoding & Decoding: Compresses and decompresses files using the Huffman algorithm
- Error Handling: Detects and reports invalid input, file errors, and inefficiencies for small files
- Command Line Interface: Simple CLI for compression and decompression
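The frequency analysis and tree construction listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `count_frequencies` and `huffman_code_lengths` are hypothetical, and a simple linear scan stands in for the priority queue to keep the example short. It only computes the code length of each byte value, which is enough to see that frequent bytes end up with shorter codes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYMBOLS 256

/* Count how often each byte value occurs in buf (hypothetical helper). */
void count_frequencies(const unsigned char *buf, size_t len,
                       uint64_t freq[SYMBOLS])
{
    memset(freq, 0, SYMBOLS * sizeof freq[0]);
    for (size_t i = 0; i < len; i++)
        freq[buf[i]]++;
}

/* Compute the Huffman code length (in bits) of every byte value.
 * Nodes 0..255 are leaves; internal nodes are appended after them.
 * Returns the number of distinct byte values seen. */
int huffman_code_lengths(const uint64_t freq[SYMBOLS],
                         unsigned char lengths[SYMBOLS])
{
    uint64_t weight[2 * SYMBOLS];
    int parent[2 * SYMBOLS];
    int active[2 * SYMBOLS];      /* 1 while a node still awaits merging */
    int nodes = SYMBOLS, remaining = 0;

    for (int i = 0; i < SYMBOLS; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
        active[i] = freq[i] > 0;
        remaining += active[i];
        lengths[i] = 0;
    }
    int distinct = remaining;

    while (remaining > 1) {
        int a = -1, b = -1;       /* the two lightest active nodes */
        for (int i = 0; i < nodes; i++) {
            if (!active[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        /* Merge a and b under a new internal node. */
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        active[nodes] = 1;
        parent[a] = parent[b] = nodes;
        active[a] = active[b] = 0;
        nodes++;
        remaining--;
    }

    /* A leaf's code length equals its depth in the tree. */
    for (int i = 0; i < SYMBOLS; i++) {
        if (freq[i] == 0) continue;
        int depth = 0;
        for (int n = i; parent[n] >= 0; n = parent[n]) depth++;
        lengths[i] = (unsigned char)(depth > 0 ? depth : 1); /* lone symbol: 1 bit */
    }
    return distinct;
}
```

For the input `aaaabbc`, the byte `a` receives a 1-bit code while `b` and `c` receive 2-bit codes. A real implementation would replace the linear scan with a binary heap.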
The tool now provides visual feedback during operation:
- Progress bar showing compression/decompression status
- Updates every 16 KB of processed data
- Compression ratio reporting after completion
- Warning messages for inefficient compression scenarios
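A progress bar with a 16 KB update interval might look roughly like this. The function names, the bar width, and the bar characters are assumptions made for illustration; only the 16 KB step comes from the description above.

```c
#include <stdio.h>
#include <stddef.h>

#define PROGRESS_STEP (16 * 1024)  /* redraw interval in bytes (16 KB) */
#define BAR_WIDTH 40               /* assumed bar width in characters */

/* Format a bar such as "[####----]  50%" into out and return the
 * percentage. Hypothetical helper, not the project's API. */
int format_progress(char *out, size_t out_size,
                    unsigned long long done, unsigned long long total)
{
    int percent = total ? (int)(done * 100 / total) : 100;
    if (percent > 100) percent = 100;

    int filled = percent * BAR_WIDTH / 100;
    char bar[BAR_WIDTH + 1];
    for (int i = 0; i < BAR_WIDTH; i++)
        bar[i] = i < filled ? '#' : '-';
    bar[BAR_WIDTH] = '\0';

    snprintf(out, out_size, "[%s] %3d%%", bar, percent);
    return percent;
}

/* Return 1 if the bar should be redrawn: either another 16 KB has been
 * processed since the last redraw, or the operation has finished. */
int progress_due(unsigned long long done, unsigned long long last_drawn,
                 unsigned long long total)
{
    return done >= total || done - last_drawn >= PROGRESS_STEP;
}
```

In a read loop, the caller would check `progress_due` after each chunk and, when it returns 1, print the formatted bar with a leading `\r` followed by `fflush(stdout)` so the line is redrawn in place.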
The effectiveness of compression varies significantly depending on the file type.
File types that typically compress well:
- Text files (.txt)
- CSV files (.csv)
- Log files (.log)
- Source code files (.c, .h, etc.)
- Raw/uncompressed image files (.bmp)
- Raw data files
File types that typically gain little or nothing:
- Word documents (.docx, .doc)
- PDF files (.pdf)
- Compressed images (.jpg, .png, .gif)
- Audio/video files (.mp3, .mp4, .avi)
- Archive files (.zip, .rar, .7z)
- Executables (.exe, .dll)
The tool will now warn you when attempting to compress already-compressed file types, as these typically won't benefit from additional Huffman compression.
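One plausible way to implement such a warning is a case-insensitive check of the file extension. The sketch below is an assumption about how this could work, not the tool's actual logic: the function name `compresses_poorly` is hypothetical, and the extension table simply mirrors the file types listed above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return 1 if the path ends in an extension that usually gains little
 * from Huffman coding. Hypothetical helper; the list is illustrative. */
int compresses_poorly(const char *path)
{
    static const char *const known[] = {
        ".docx", ".doc", ".pdf", ".jpg", ".png", ".gif", ".mp3",
        ".mp4", ".avi", ".zip", ".rar", ".7z", ".exe", ".dll"
    };
    const char *dot = strrchr(path, '.');
    if (!dot)
        return 0;
    /* Ignore dots that belong to a directory name, not the file name. */
    if (strchr(dot, '/') || strchr(dot, '\\'))
        return 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (equals_ignore_case(dot, known[i]))
            return 1;
    return 0;
}
```

An extension check is only a heuristic. Inspecting the byte frequency distribution of the input would be more reliable, at the cost of reading the file first.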
- Make sure gcc is installed.
- Build with the Makefile:
make
- The program will be built as huffman.
- Install MinGW or TDM-GCC and ensure gcc is in your PATH.
- Open a Bash shell (e.g., Git Bash, MSYS2).
- Run the build script:
./build_windows.sh
- The program will be built as huffman.exe.
Compress:
./huffman -c <input_file> <output_file>
The compressed file will automatically get the .pda extension.
Decompress:
./huffman -d <input_file> <output_file>
The input file must have the .pda extension for decompression.
Example:
# Compress (will create example.pda)
./huffman -c example.txt example
# Decompress (will restore original file)
./huffman -d example.pda example.txt
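The extension rules above (append .pda when compressing, require it when decompressing) could be implemented along these lines. The function names are hypothetical, and the case-sensitive comparison is an assumption; the real tool may behave differently.

```c
#include <stdio.h>
#include <string.h>

#define PDA_EXT ".pda"

/* Return 1 if path ends in ".pda" (case-sensitive, by assumption). */
int has_pda_extension(const char *path)
{
    size_t n = strlen(path), e = strlen(PDA_EXT);
    return n > e && strcmp(path + n - e, PDA_EXT) == 0;
}

/* Copy path into out, appending ".pda" unless it is already present.
 * Returns 0 on success, -1 if out is too small. */
int with_pda_extension(const char *path, char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "%s%s", path,
                           has_pda_extension(path) ? "" : PDA_EXT);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With these helpers, the output name `example` becomes `example.pda`, matching the usage example, and a decompression request could be rejected early when `has_pda_extension` returns 0 for the input file.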
- For very small files (<100 bytes), the compressed file may be larger than the original due to Huffman tree overhead
- The maximum supported file size is 1 GB
- The tool is cross-platform and works on Linux, macOS, and Windows (32/64 bit)
- Compression ratio depends heavily on file content and type
- Progress bar provides real-time feedback during operation
- Automatic warnings for inefficient compression scenarios
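The size-related notes above can be expressed as a small validation step plus a ratio calculation. This is a sketch under stated assumptions: the names are hypothetical, 1 GB is taken to mean 2^30 bytes, and the limit is treated as inclusive. None of these details are confirmed by the project.

```c
#include <stdint.h>

#define MAX_INPUT_SIZE (1024ULL * 1024ULL * 1024ULL) /* assumed: 1 GB = 2^30 bytes */
#define SMALL_FILE_THRESHOLD 100ULL                  /* bytes, from the note above */

typedef enum { SIZE_OK, SIZE_SMALL_WARNING, SIZE_TOO_LARGE } size_check;

/* Classify an input size before compression starts. */
size_check check_input_size(uint64_t bytes)
{
    if (bytes > MAX_INPUT_SIZE)
        return SIZE_TOO_LARGE;
    if (bytes < SMALL_FILE_THRESHOLD)
        return SIZE_SMALL_WARNING;   /* tree overhead may outweigh savings */
    return SIZE_OK;
}

/* Space saved in percent; negative when the output grew. */
double space_saved_percent(uint64_t original, uint64_t compressed)
{
    if (original == 0)
        return 0.0;
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

A negative result from `space_saved_percent` is exactly the case the small-file warning anticipates: the stored tree and header cost more than the encoding saves.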
- Binary Support: Handle any file type
- Visual Feedback: Show progress during operation
- File Handling: Read and write files
- Frequency Analysis: Analyze byte frequencies
- Huffman Tree Construction: Build the Huffman tree
- Encoding Process: Encode and decode files
- Error Handling: Robust error handling
- CLI: Command-line interface only
src/
├── core/                   # Core Huffman algorithm
│   ├── huffman.c
│   └── huffman.h
├── io/                     # Input/Output
│   ├── bit_io.c
│   ├── bit_io.h
│   ├── file_io.c
│   └── file_io.h
├── compression/            # Compression specific
│   ├── encode.c
│   ├── encode.h
│   ├── decode.c
│   └── decode.h
├── utils/                  # Utility functions
│   ├── frequency.c
│   ├── frequency.h
│   ├── file_extension.c    # Extension handling
│   ├── file_extension.h
│   ├── progress_bar.c      # Progress visualization
│   └── progress_bar.h
└── main.c                  # Entry point
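Huffman codes have variable bit lengths, so a module such as bit_io has to pack them into whole bytes. The sketch below shows one common approach, writing bits most-significant first into an in-memory buffer. The `bit_writer` type and the `bw_*` functions are hypothetical and are not taken from the project's bit_io.h.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *buf;   /* destination buffer */
    size_t cap;           /* capacity of buf in bytes */
    size_t len;           /* complete bytes written so far */
    unsigned char acc;    /* partially filled byte */
    int nbits;            /* number of bits currently held in acc */
} bit_writer;

void bw_init(bit_writer *w, unsigned char *buf, size_t cap)
{
    w->buf = buf;
    w->cap = cap;
    w->len = 0;
    w->acc = 0;
    w->nbits = 0;
}

/* Append the lowest `count` bits of `code` (1..32), most significant
 * bit first. Returns 0 on success, -1 if the buffer is full. */
int bw_put(bit_writer *w, uint32_t code, int count)
{
    for (int i = count - 1; i >= 0; i--) {
        w->acc = (unsigned char)((w->acc << 1) | ((code >> i) & 1u));
        if (++w->nbits == 8) {
            if (w->len == w->cap)
                return -1;
            w->buf[w->len++] = w->acc;
            w->acc = 0;
            w->nbits = 0;
        }
    }
    return 0;
}

/* Pad the last partial byte with zero bits and write it out. */
int bw_flush(bit_writer *w)
{
    if (w->nbits == 0)
        return 0;
    if (w->len == w->cap)
        return -1;
    w->buf[w->len++] = (unsigned char)(w->acc << (8 - w->nbits));
    w->acc = 0;
    w->nbits = 0;
    return 0;
}
```

Because the final byte is padded, a decoder needs some way to know where the real data ends, for example by storing the original file size in the header. Whether this project does so is not stated here.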