This project implements a file compression and decompression tool using the Huffman coding algorithm. It provides an efficient, lossless method for reducing file sizes, making it useful for storage optimization and data transmission.
- Lossless compression of text files
- Decompression of files compressed with this tool
- Command-line interface for easy use
- Efficient implementation of the Huffman coding algorithm
- C++ compiler (C++11 or later)
- Standard C++ libraries
- Python with its standard library (used by the test file generator, create_test_file.py)
- Clone the repository:
git clone https://github.com/Zahidmohd/File-Zipper.git
- Navigate to the project directory (the folder you cloned the repository into):
cd foldername
- Compile the project:
g++ -std=c++11 main.cpp huffman.cpp compress.cpp decompress.cpp -o huffman_compressor
- To compress a file:
./huffman_compressor c input.txt compressed.bin
- To decompress a file:
./huffman_compressor d compressed.bin decompressed.txt
We provide a shell script, test_huffman.sh, that automates the testing process. This script generates a test file, compresses it, decompresses it, and verifies the result.
Usage:
./test_huffman.sh <file_size_in_kb>
Example:
./test_huffman.sh 100
This will:
- Generate a 100 KB test file
- Compress the file using the Huffman algorithm
- Decompress the compressed file
- Compare the original and decompressed files
- Report whether the files are identical or different
- Clean up temporary files
The script provides a quick and easy way to verify the correctness of the Huffman coding implementation.
If you prefer to test the process step by step:
- Generate a test file of any size using the provided Python script:
python create_test_file.py <filename> <size_in_kb>
Use input.txt as the filename; the author's code is written for that name.
- Example:
python create_test_file.py input.txt 100
This will create a 100 KB file named input.txt.
- Compress the test file:
./huffman_compressor c input.txt compressed.bin
- Decompress the file:
./huffman_compressor d compressed.bin decompressed.txt
- Compare the original and decompressed files to verify the process:
cmp -s input.txt decompressed.txt && echo "Files are identical" || echo "Files are different"
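On systems where cmp is not available, the same byte-for-byte check can be done with a few lines of C++. The following is a minimal sketch, not part of this repository: the function name filesIdentical is hypothetical, and it treats a file that cannot be opened as "different".

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of this project): returns true only when
// both files can be opened and contain exactly the same bytes.
bool filesIdentical(const std::string& pathA, const std::string& pathB) {
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b) {
        return false;  // a missing or unreadable file counts as a mismatch
    }
    std::istreambuf_iterator<char> itA(a), itB(b), end;
    while (itA != end && itB != end) {
        if (*itA != *itB) {
            return false;  // first differing byte
        }
        ++itA;
        ++itB;
    }
    // Identical only if both files ended at the same position.
    return itA == end && itB == end;
}
```

For example, calling filesIdentical("input.txt", "decompressed.txt") after a compress/decompress round trip should return true if the tool is lossless.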
For more advanced file comparison and patch generation, you can use the bsdiff4 algorithm. Check out my bsdiff4 repository for implementation details and usage instructions.
Using bsdiff4, you can:
- Generate a patch between the original and compressed files.
- Apply the patch to verify the integrity of the compression/decompression process.
This provides an additional layer of verification for the Huffman coding implementation.
- The compressor reads the input file and counts the frequency of each character.
- It builds a Huffman tree based on these frequencies.
- The tree is used to generate unique binary codes for each character.
- The text is encoded using these codes, and the compressed data is written to a file along with the frequency table.
- For decompression, the process is reversed using the stored frequency table.
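The steps above can be sketched in a few dozen lines of C++11. This is an illustrative sketch only, not the code in huffman.cpp: the names Node, FrequencyTable, CodeTable, countFrequencies, buildTree, assignCodes, encode, and decode are all hypothetical, and the encoded output is kept as a string of '0' and '1' characters for readability, whereas a real compressor would pack those bits into bytes.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch (not taken from huffman.h).
struct Node {
    std::uint64_t freq;
    int symbol;  // byte value for leaves, -1 for internal nodes
    int left;    // child indices into the node vector, -1 for leaves
    int right;
};
using FrequencyTable = std::map<unsigned char, std::uint64_t>;
using CodeTable = std::map<unsigned char, std::string>;

// Step 1: count how often each byte occurs.
FrequencyTable countFrequencies(const std::string& data) {
    FrequencyTable freq;
    for (unsigned char c : data) ++freq[c];
    return freq;
}

// Step 2: repeatedly merge the two least frequent nodes.
// Returns the root index, or -1 for an empty table.
int buildTree(const FrequencyTable& freq, std::vector<Node>& nodes) {
    using Entry = std::pair<std::uint64_t, int>;  // (frequency, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& kv : freq) {
        nodes.push_back(Node{kv.second, kv.first, -1, -1});
        heap.push(Entry(kv.second, static_cast<int>(nodes.size()) - 1));
    }
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();
        Entry b = heap.top(); heap.pop();
        nodes.push_back(Node{a.first + b.first, -1, a.second, b.second});
        heap.push(Entry(a.first + b.first, static_cast<int>(nodes.size()) - 1));
    }
    return heap.empty() ? -1 : heap.top().second;
}

// Step 3: walk the tree; left edges append '0', right edges append '1'.
void assignCodes(const std::vector<Node>& nodes, int index,
                 const std::string& prefix, CodeTable& codes) {
    const Node& n = nodes[index];
    if (n.left == -1) {
        // A lone leaf (single distinct symbol) still needs a one-bit code.
        codes[static_cast<unsigned char>(n.symbol)] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(nodes, n.left, prefix + "0", codes);
    assignCodes(nodes, n.right, prefix + "1", codes);
}

// Step 4: replace every byte with its code.
std::string encode(const std::string& data, const CodeTable& codes) {
    std::string bits;
    for (unsigned char c : data) bits += codes.at(c);
    return bits;
}

// Step 5: rebuild the same tree from the stored frequency table and
// follow the bits from the root until a leaf is reached.
std::string decode(const std::string& bits, const FrequencyTable& freq) {
    std::vector<Node> nodes;
    int root = buildTree(freq, nodes);
    std::string out;
    if (root == -1) return out;
    if (nodes[root].left == -1) {  // single distinct symbol: one bit each
        out.assign(bits.size(), static_cast<char>(nodes[root].symbol));
        return out;
    }
    int cur = root;
    for (char b : bits) {
        cur = (b == '0') ? nodes[cur].left : nodes[cur].right;
        if (nodes[cur].left == -1) {
            out += static_cast<char>(nodes[cur].symbol);
            cur = root;
        }
    }
    return out;
}
```

Because buildTree is deterministic for a given frequency table, the decoder reconstructs exactly the tree the encoder used, which is why storing the frequency table alongside the compressed data is enough to reverse the process.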
- main.cpp: Entry point of the program
- huffman.h: Header file with Huffman coding class and structure declarations
- huffman.cpp: Implementation of the Huffman coding algorithm
- compress.cpp: File compression function
- decompress.cpp: File decompression function
- test_huffman.sh: Shell script that tests whether the project works
- create_test_file.py: Script that creates test files, if required
Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.