
Best compression for SQL Dump file #26

@manticore-projects


Based on the benchmarks, the following seems to work best for a SQL dump file:

    kanzi -tTEXT+RLT+LZP+PACK+RLT -eTpaqx

It compressed the 1.1GB dump to 24MB (2.15% of the original size) in 27.65s, at 38.83 MB/s.
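One way to make "seems to work best" precise is to keep only the runs that no other run beats on both TIME and RATIO, i.e. the Pareto front of the two. Below is a minimal Python sketch. The `Result` type and `pareto_front` helper are hypothetical names, and the rows are a hand-picked subset copied from the benchmark table in this issue, not the full result set.

```python
from typing import NamedTuple


class Result(NamedTuple):
    command: str
    time_s: float     # TIME column, in seconds
    ratio_pct: float  # RATIO column: compressed size as % of the input


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results not dominated on (time, ratio), sorted by time.

    A result is dominated if another one is at least as fast and compresses
    at least as well, and is strictly better in at least one of the two.
    """
    def dominates(a: Result, b: Result) -> bool:
        return (a.time_s <= b.time_s and a.ratio_pct <= b.ratio_pct
                and (a.time_s < b.time_s or a.ratio_pct < b.ratio_pct))

    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.time_s)


# A subset of rows copied from the benchmark table in this issue.
RESULTS = [
    Result("bzip3 -b256", 10.41, 2.41),
    Result("kanzi -b64m -l7", 9.00, 2.50),
    Result("kanzi -l9", 39.21, 2.45),
    Result("kanzi -tRLT+LZP+PACK+RLT -eTpaqx", 27.59, 2.18),
    Result("kanzi -tTEXT+RLT+LZP -eTpaqx", 27.40, 2.41),
    Result("kanzi -tTEXT+RLT+LZP+PACK -eTpaqx", 28.79, 2.15),
    Result("kanzi -tTEXT+RLT+LZP+PACK+RLT -eTpaqx", 27.65, 2.15),
]

if __name__ == "__main__":
    for r in pareto_front(RESULTS):
        print(f"{r.time_s:6.2f}s {r.ratio_pct:5.2f}%  {r.command}")
```

On this subset, the recommended chain stays on the front, while -tTEXT+RLT+LZP+PACK (same 2.15%, but slower) and kanzi -l9 drop out. It also shows that bzip3 -b256 reaches the same 2.41% as -tTEXT+RLT+LZP in 10.41s instead of 27.40s.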

A few more interesting observations:

  • larger block sizes (above 32M) do not seem to help
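  • level 8 seems to have issues with larger block sizes: with -b64m it produces 49MB (4.54%), versus 30MB (2.71%) with the default block size and 27MB (2.50%) for -b64m -l7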
  • the Java program is so much slower than the C code that I wonder whether wrapping the native code via FFI might offer benefits (though I would first need to look into optimizing the C code and its compiler switches)
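The block-size observation can be checked mechanically by parsing the result rows of the benchmark output and ranking the level-9 sweep by ratio. This is a minimal Python sketch that assumes the row layout printed by kanzi_benchmark.sh stays as shown in the log (size in MB, TIME such as 39.11s or 1m6s, RATIO in %, then SPEED and the command, with an optional "<--" note). The `ROW` pattern and the `parse_time`/`parse_row` helpers are hypothetical and not part of the testing scripts.

```python
import re

# Hypothetical pattern for one result row of kanzi_benchmark.sh output, e.g.
#   "        27MB     39.11s     2.45%      27.45 kanzi -b32m -l9 <-- sweet spot?"
# Any trailing "<-- ..." annotation is dropped from the command.
ROW = re.compile(
    r"^\s*(?P<size>\d+)MB\s+(?P<time>\S+)\s+(?P<ratio>[\d.]+)%\s+"
    r"(?P<speed>[\d.]+)\s+(?P<cmd>.+?)\s*(?:<--.*)?$"
)


def parse_time(text: str) -> float:
    """Convert TIME values such as '39.11s', '0.782s' or '1m6s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", text)
    if m is None:
        raise ValueError(f"unrecognized time: {text!r}")
    minutes = int(m.group(1)) if m.group(1) else 0
    return minutes * 60 + float(m.group(2))


def parse_row(line: str):
    """Return (command, seconds, ratio_pct) for a result row, else None."""
    m = ROW.match(line)
    if m is None:
        return None
    return m.group("cmd"), parse_time(m.group("time")), float(m.group("ratio"))


# The level-9 block-size sweep, copied from the log in this issue.
SWEEP = """\
        33MB       1m6s     2.98%      16.14 kanzi -b1m -l9
        29MB     47.90s     2.60%      22.42 kanzi -b4m -l9
        28MB     41.61s     2.56%      25.81 kanzi -b8m -l9
        27MB     41.84s     2.47%      25.66 kanzi -b16m -l9
        27MB     39.11s     2.45%      27.45 kanzi -b32m -l9 <-- sweet spot?
        27MB     40.74s     2.50%      26.35 kanzi -b64m -l9
        30MB     40.63s     2.79%      26.43 kanzi -b96m -l9
        34MB     49.30s     3.11%      21.78 kanzi -b128m -l9
        46MB     54.01s     4.27%      19.88 kanzi -b256m -l9
"""

rows = [row for row in map(parse_row, SWEEP.splitlines()) if row is not None]
best = min(rows, key=lambda row: row[2])  # smallest RATIO wins

if __name__ == "__main__":
    for cmd, seconds, ratio in rows:
        print(f"{ratio:5.2f}%  {seconds:6.2f}s  {cmd}")
    print("best ratio:", best[0])
```

On this sweep the smallest ratio is -b32m (2.45%); both smaller and larger blocks compress worse, which matches the observation above.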

Feel free to close this issue if it does not yield any useful information.

are@ryzen ~/D/s/kanzi-testing-scripts (main)> ./kanzi_benchmark.sh ~/Downloads/ifrsbox_bak.h2.sql 
[INFO] Benchmarking compression algorithms
[INFO] Input file: /home/are/Downloads/ifrsbox_bak.h2.sql (1.1GB)
[INFO] Parallel jobs: 6

  COMPRESSED       TIME     RATIO      SPEED ALGORITHM
------------ ---------- --------- ---------- ----------

# BZIP3 Variants
        34MB      6.25s     3.13%     171.84 bzip3
        32MB      7.18s     2.90%     149.53 bzip3 -b32
        29MB      8.32s     2.69%     129.02 bzip3 -b64
        28MB     10.06s     2.56%     106.73 bzip3 -b128
        26MB     10.41s     2.41%     103.12 bzip3 -b256

# KANZI Level Presets (Default Block Size)
       105MB     0.782s     9.75%    1374.22 kanzi -l1
        75MB     0.574s     6.96%    1871.19 kanzi -l2
        73MB      1.45s     6.70%     741.21 kanzi -l3
        67MB      1.93s     6.18%     556.34 kanzi -l4
        53MB      8.43s     4.85%     127.47 kanzi -l5
        47MB      9.86s     4.30%     108.96 kanzi -l6
        32MB      6.86s     2.95%     156.61 kanzi -l7
        30MB     30.26s     2.71%      35.49 kanzi -l8
        27MB     39.21s     2.45%      27.38 kanzi -l9

# KANZI Level Presets (64MB Block Size)
       102MB     0.663s     9.45%    1620.50 kanzi -b64m -l1
        73MB     0.623s     6.78%    1725.10 kanzi -b64m -l2
        64MB      1.36s     5.95%     789.80 kanzi -b64m -l3
        57MB      1.91s     5.23%     563.57 kanzi -b64m -l4
        41MB     19.48s     3.73%      55.14 kanzi -b64m -l5
        37MB     16.13s     3.37%      66.60 kanzi -b64m -l6
        27MB      9.00s     2.50%     119.37 kanzi -b64m -l7
        49MB     30.24s     4.54%      35.51 kanzi -b64m -l8 <-- What happens here?
        27MB     39.91s     2.50%      26.90 kanzi -b64m -l9

# KANZI Large Block Sizes (Level 9)
        33MB       1m6s     2.98%      16.14 kanzi -b1m -l9
        29MB     47.90s     2.60%      22.42 kanzi -b4m -l9
        28MB     41.61s     2.56%      25.81 kanzi -b8m -l9
        27MB     41.84s     2.47%      25.66 kanzi -b16m -l9
        27MB     39.11s     2.45%      27.45 kanzi -b32m -l9 <-- sweet spot?
        27MB     40.74s     2.50%      26.35 kanzi -b64m -l9
        30MB     40.63s     2.79%      26.43 kanzi -b96m -l9
        34MB     49.30s     3.11%      21.78 kanzi -b128m -l9
        46MB     54.01s     4.27%      19.88 kanzi -b256m -l9

# KANZI Specialized Transform Chains (64MB blocks)
        28MB     43.72s     2.60%      24.56 kanzi -tRLT -eTpaqx
        24MB     34.58s     2.19%      31.05 kanzi -tPACK -eTpaqx
        24MB     35.73s     2.19%      30.05 kanzi -tPACK+ZRLT+PACK -eTpaqx
        24MB     36.00s     2.19%      29.83 kanzi -tPACK+RLT -eTpaqx
        24MB     35.38s     2.16%      30.36 kanzi -tRLT+PACK -eTpaqx
        24MB     34.16s     2.17%      31.44 kanzi -tRLT+TEXT+PACK -eTpaqx
        24MB     35.86s     2.16%      29.94 kanzi -tRLT+PACK+LZP -eTpaqx
        24MB     35.35s     2.16%      30.38 kanzi -tRLT+PACK+LZP+RLT -eTpaqx
        24MB     32.87s     2.19%      32.66 kanzi -tTEXT+ZRLT+PACK -eTpaqx
        24MB     27.59s     2.18%      38.93 kanzi -tRLT+LZP+PACK+RLT -eTpaqx
        24MB     32.22s     2.20%      33.32 kanzi -tTEXT+ZRLT+PACK+LZP -eTpaqx
        24MB     32.77s     2.19%      32.77 kanzi -tTEXT+RLT+PACK -eTpaqx
        26MB     27.40s     2.41%      39.20 kanzi -tTEXT+RLT+LZP -eTpaqx <-- sweet spot?
        24MB     33.64s     2.20%      31.93 kanzi -tTEXT+RLT+PACK+LZP -eTpaqx
        26MB     27.49s     2.41%      39.07 kanzi -tTEXT+RLT+LZP+RLT -eTpaqx
        24MB     33.86s     2.20%      31.71 kanzi -tTEXT+RLT+PACK+LZP+RLT -eTpaqx
        24MB     28.79s     2.15%      37.30 kanzi -tTEXT+RLT+LZP+PACK -eTpaqx
        24MB     33.54s     2.20%      32.01 kanzi -tTEXT+RLT+PACK+RLT+LZP -eTpaqx
        24MB     27.65s     2.15%      38.83 kanzi -tTEXT+RLT+LZP+PACK+RLT -eTpaqx <-- sweet spot?
        24MB     33.42s     2.20%      32.13 kanzi -tTEXT+PACK+RLT -eTpaqx
        24MB     34.28s     2.19%      31.33 kanzi -tEXE+TEXT+RLT+UTF+PACK -eTpaqx
        27MB     39.39s     2.47%      27.26 kanzi -tEXE+TEXT+RLT+UTF+DNA -eTpaqx
        27MB     39.92s     2.47%      26.90 kanzi -tEXE+TEXT+RLT -eTpaqx
        27MB     40.96s     2.47%      26.22 kanzi -tEXE+TEXT -eTpaqx
        34MB     37.67s     3.15%      28.51 kanzi -tTEXT+BWTS+SRT+ZRLT -eTpaqx
        34MB     42.37s     3.15%      25.35 kanzi -tBWTS+SRT+ZRLT -eTpaqx
        34MB     36.54s     3.16%      29.39 kanzi -tTEXT+BWTS+MTFT+RLT -eTpaqx
        34MB     42.93s     3.16%      25.01 kanzi -tBWTS+MTFT+RLT -eTpaqx
        34MB     29.62s     3.16%      36.25 kanzi -tTEXT+BWT+MTFT+RLT -eTpaqx
        34MB     59.46s     3.09%      18.06 kanzi -tBWT+MTFT+RLT -eTpaqx
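The SPEED column carries no unit in the output. Multiplying TIME by SPEED gives roughly the same value (about 1065 to 1075) for every row, which suggests SPEED is input throughput in MB/s over the ~1.1GB file. A small Python sketch of that check follows; the helper name and the MB/s interpretation are assumptions, not documented behaviour of kanzi_benchmark.sh.

```python
# Hypothetical check: if SPEED is compression throughput in MB/s, then
# TIME * SPEED should give roughly the same input size for every row.
def implied_input_mb(time_s: float, speed: float) -> float:
    """Input size (MB) implied by a row, assuming SPEED = input MB / TIME."""
    return time_s * speed


# (TIME in seconds, SPEED) pairs copied from the table above.
SAMPLES = {
    "bzip3": (6.25, 171.84),
    "kanzi -l1": (0.782, 1374.22),
    "kanzi -l9": (39.21, 27.38),
    "kanzi -b1m -l9": (66.0, 16.14),  # TIME printed as 1m6s
    "kanzi -tTEXT+RLT+LZP+PACK+RLT -eTpaqx": (27.65, 38.83),
}

if __name__ == "__main__":
    for cmd, (t, s) in SAMPLES.items():
        print(f"{implied_input_mb(t, s):8.1f} MB  {cmd}")
```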
