Add microbenchmark for spilling with compression #16512

ding-young · 2025-06-23T11:36:22Z

Which issue does this PR close?

Related to Investigate performance tradeoff in compressing spill files #16367

What changes are included in this PR?

This PR adds microbenchmarks that compare the performance characteristics of different compression codecs. Each case generates 50 RecordBatches and runs both the write and the read path. After each run it prints the compression ratio (mem_bytes / disk_bytes).
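
For reference, the ratio printed after each run can be sketched as follows. This is a minimal illustration and not the code from the benchmark itself: the function name `compression_ratio` and the byte counts in `main` are hypothetical, and how the benchmark actually measures `mem_bytes` and `disk_bytes` is not shown here.

```rust
/// Ratio of in-memory size to on-disk size (mem_bytes / disk_bytes).
/// Values above 1.0 mean the spill file is smaller than the batches in memory.
/// Returns None when nothing was written, to avoid dividing by zero.
fn compression_ratio(mem_bytes: u64, disk_bytes: u64) -> Option<f64> {
    if disk_bytes == 0 {
        None
    } else {
        Some(mem_bytes as f64 / disk_bytes as f64)
    }
}

fn main() {
    // Hypothetical byte counts, for illustration only.
    let mem_bytes: u64 = 12_000_000;
    let disk_bytes: u64 = 3_000_000;

    match compression_ratio(mem_bytes, disk_bytes) {
        Some(ratio) => println!("compression ratio: {ratio:.3}"),
        None => println!("no bytes written to disk"),
    }
}
```

A ratio slightly below 1.0 is possible for uncompressed output, since the file format can add framing and metadata on top of the raw buffers.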

To make the benchmark more realistic, this PR generates RecordBatches that resemble the data spilled by AggregateExec (tpc-h) and SortExec (sort-tpch). It covers both thin batches consisting of primitive arrays and wide batches with complex data types.

Rationale for this change

Benchmark Case Overview

Below are the RecordBatch schema and the original query for each benchmark case.

Q2 [Int64(partkey), Decimal128(min(ps_supplycost))]

select
...
    p_partkey,
...
where
        p_partkey = ps_partkey
...
  and ps_supplycost = (
    select
        min(ps_supplycost)

Q16 [Utf8(p_brand), Utf8(p_type), Int32(p_size), Int64(supplier_cnt)]

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
...
group by
    p_brand, p_type, p_size
...

Q20 [Int64(suppkey), Int64(partkey), Decimal128(sum(l_quantity))]

... select
            ps_suppkey
        from
            partsupp
        where
                ps_partkey in (
                select
                    p_partkey
                    ...
            )
          and ps_availqty > (
            select
                    0.5 * sum(l_quantity)

Sort-tpch Q10 wide [Int32, Int64 * 3, Decimal128 * 4, Date * 3, Utf8 * 4]

SELECT l_orderkey, l_suppkey, l_linenumber, l_comment,
         l_partkey, l_quantity, l_extendedprice, l_discount, l_tax,
         l_returnflag, l_linestatus, l_shipdate, l_commitdate,
      l_receiptdate, l_shipinstruct, l_shipmode
FROM lineitem
ORDER BY l_orderkey, l_suppkey, l_linenumber, l_comment

Are these changes tested?

Are there any user-facing changes?

ding-young · 2025-06-23T11:43:22Z

To run the benchmark: cargo bench --bench spill_io

Results

Case	Compression	Time (ms)	Memory (MB)	Disk (MB)	Compression Ratio
Q2	Uncompressed	51.521	9.4	9.5	0.990
Q2	Zstd	147.360	9.4	1.5461	6.215
Q2	Lz4Frame	97.942	9.4	3.2	2.922
Q16	Uncompressed	78.053	23.5	19.5	1.209
Q16	Zstd	236.480	23.5	4.4	5.373
Q16	Lz4Frame	145.180	23.5	7.8	3.007
Q20	Uncompressed	64.233	12.5	12.7	0.989
Q20	Zstd	190.570	12.5	2.4	5.282
Q20	Lz4Frame	123.430	12.5	4.8	2.629
Wide (Q10)	Uncompressed	215.220	56.4	54.2	1.041
Wide (Q10)	Zstd	443.190	56.4	11.6	4.857
Wide (Q10)	Lz4Frame	255.530	56.4	19.9	2.834
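
To make the time side of the tradeoff easier to read, the sketch below divides each compressed run's time by the uncompressed time of the same case. It only re-derives numbers from the Time (ms) column above; the `Row` struct and the function names are hypothetical helpers for this illustration and are not part of the benchmark.

```rust
/// One row of the results table: benchmark case, codec, and time in ms.
struct Row {
    case: &'static str,
    codec: &'static str,
    time_ms: f64,
}

/// Time (ms) column copied from the results table above.
fn results() -> Vec<Row> {
    vec![
        Row { case: "Q2", codec: "Uncompressed", time_ms: 51.521 },
        Row { case: "Q2", codec: "Zstd", time_ms: 147.360 },
        Row { case: "Q2", codec: "Lz4Frame", time_ms: 97.942 },
        Row { case: "Q16", codec: "Uncompressed", time_ms: 78.053 },
        Row { case: "Q16", codec: "Zstd", time_ms: 236.480 },
        Row { case: "Q16", codec: "Lz4Frame", time_ms: 145.180 },
        Row { case: "Q20", codec: "Uncompressed", time_ms: 64.233 },
        Row { case: "Q20", codec: "Zstd", time_ms: 190.570 },
        Row { case: "Q20", codec: "Lz4Frame", time_ms: 123.430 },
        Row { case: "Wide (Q10)", codec: "Uncompressed", time_ms: 215.220 },
        Row { case: "Wide (Q10)", codec: "Zstd", time_ms: 443.190 },
        Row { case: "Wide (Q10)", codec: "Lz4Frame", time_ms: 255.530 },
    ]
}

/// For every compressed row, return (case, codec, time / uncompressed time
/// of the same case). Rows without an uncompressed baseline are skipped.
fn slowdown_vs_uncompressed(rows: &[Row]) -> Vec<(&'static str, &'static str, f64)> {
    rows.iter()
        .filter(|r| r.codec != "Uncompressed")
        .filter_map(|r| {
            let base = rows
                .iter()
                .find(|b| b.case == r.case && b.codec == "Uncompressed")?;
            Some((r.case, r.codec, r.time_ms / base.time_ms))
        })
        .collect()
}

fn main() {
    for (case, codec, factor) in slowdown_vs_uncompressed(&results()) {
        println!("{case:<10} {codec:<9} {factor:.2}x");
    }
}
```

Read together with the Compression Ratio column, these factors show what each codec costs in time for the disk space it saves in a given case.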

Add microbenchmark for spilling with compression

99d237a

github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 23, 2025

add wide batch

68f6b7a
