Skip to content

A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。

License

Notifications You must be signed in to change notification settings

qfcy/cxx-huffman-simple

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The English introduction is placed below the Chinese version.

一个简单的C++霍夫曼编码实现,将输入二进制文件数据压缩成.compressed格式,并支持解压。

命令行

编译:g++ huffman.cpp -o bin\huffman -O2 -s -Wall
压缩:huffman file.txt,程序会生成file.txt.compressed作为压缩的结果。
解压:huffman file.txt.compressed,程序会生成file.txt。如果file.txt已存在,则会提示替换。

文件结构与格式

文件头部分:

  • 解压后数据大小n (8字节无符号整数)
  • 霍夫曼树位图占用的字节数m (2字节ushort)
  • 霍夫曼树位图,用二进制位表示 (m字节,0表示内部节点,1表示叶子结点,详见下文)
  • 霍夫曼树每个节点代表的字符信息,越前面的字符位于树的越浅层,出现的频率越高 (m字节,例如:<空格>eonsa\nritpd)(;clfug*<>+bz/,_m ...

文件体部分:

  • 霍夫曼编码的原始文件数据 (任意大小,但解压后的数据需要达到n字节,否则被视为文件截断)

霍夫曼树位图的存储

这个项目优化了霍夫曼树位图的存储空间,也就是遇到叶子结点之后不再往下存储。
传统的存储(a[2*n+1]和a[2*n+2]表示a[n]的子节点):

0
10
0011

如果霍夫曼树有十几层深,霍夫曼树的大小就会达到KB级别,这是不可接受的。
由于叶子结点1下面不可能存在新的叶子结点,这里的实现省略了多余的底部内部节点:

0
10
11

第三层的11连接到第二层的0。这里省略了两个00,优化了空间。

A lightweight C++ Huffman encoding implementation that compresses input binary file data into the .compressed format and supports decompression.

Command Line

  • Compiling: g++ huffman.cpp -o huffman -O2 -s -Wall.
  • Compression: huffman file.txt
    The program generates file.txt.compressed as the compressed result.
  • Decompression: huffman file.txt.compressed
    The program generates file.txt. If file.txt already exists, the program will prompt for replacement.

File Structure and Format

File Header:

  • Size of decompressed data n (8-byte unsigned long long)
  • Size of the Huffman tree bitmap in bytes m (2-byte unsigned short)
  • Huffman tree bitmap represented in binary bits (m bytes, where 0 represents an internal node and 1 represents a leaf node, see details below)
  • Character information for each node in the Huffman tree. Characters appearing earlier in the list are closer to the root of the tree and have higher frequencies (m bytes, e.g., <space>eonsa\nritpd)(;clfug*<>+bz/,_m ...)

File Body:

  • Huffman-encoded original file data (arbitrary size, but the decompressed data must match n bytes; otherwise, the file is considered truncated)

Storage of the Huffman Tree Bitmap

This project optimizes the storage space for the Huffman tree bitmap by omitting unnecessary internal nodes after encountering leaf nodes.

Traditional Storage (where a[2*n+1] and a[2*n+2] represent the child node of a[n]):

0
10
0011

If the Huffman tree has a depth of more than 12 levels, its size could reach kilobytes, which is unacceptable.

Optimized Storage:
Since no new leaf nodes can exist below a leaf node (1), the code omits redundant internal nodes at the bottom:

0
10
11

In this example, the third level's 11 connects to the second level's 0. Two 00 nodes are omitted, optimizing space usage.

About

A lightweight C++ implementation of Huffman encoding compression. 一个简单的C++霍夫曼编码压缩实现。

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages