Optimize allocation rate for `int64` array in `hex` function #16483

Fly-Style · 2025-06-20T14:45:37Z

Which issue does this PR close?

Comment from #15947 regarding potentially optimized memory usage #15947 (comment)

Rationale for this change

iter().map(...).collect() introduces intermediate allocations and this patch optimizes this behavior by reducing buffer allocation only once specifically of hex computation for int64.

Dandandan · 2025-06-20T17:00:43Z

I think this is probably faster! Do we have any benchmarks for this to show it?

Fly-Style · 2025-06-24T09:48:36Z

@Dandandan hi, Daniël! On my way to benchmark it :)

Fly-Style · 2025-06-24T10:33:00Z

Hm, interestingly, the speedup is not higher than I expected...
I assume there may be some inconsistencies in benchmark code, I have experience only with JMH, so, I may miss something here. /cc @Dandandan

Benchmark results:

Benchmarking compute_hex_new: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, or reduce sample count to 90.
compute_hex_new         time:   [49.225 ms 49.905 ms 50.779 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Benchmarking compute_hex_old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.0s, or reduce sample count to 90.
compute_hex_old         time:   [50.322 ms 50.565 ms 50.820 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Hardware: MBP m3 pro

Benchmark code:

use arrow::array::PrimitiveDictionaryBuilder;
use arrow::datatypes::{Int32Type, Int64Type};
use criterion::{criterion_group, criterion_main, Criterion};
use datafusion_expr::ColumnarValue;
use datafusion_spark::function::math::hex::{compute_hex, compute_hex_old};
use std::sync::Arc;

const DICT_SIZE: i64 = 1024 * 1024;

fn criterion_benchmark(c: &mut Criterion) {
    let mut input_builder = PrimitiveDictionaryBuilder::<Int32Type, Int64Type>::new();
    for i in 0..DICT_SIZE {
        input_builder.append_value((DICT_SIZE - i) * 1024);
    }
    let input = input_builder.finish();
    let columnar_value = ColumnarValue::Array(Arc::new(input));

    c.bench_function("compute_hex_new", |b| {
        b.iter(|| compute_hex(&[columnar_value.clone()], false).unwrap());
    });

    c.bench_function("compute_hex_old", |b| {
        b.iter(|| compute_hex_old(&[columnar_value.clone()], false).unwrap());
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Optimize allocation rate for int64 array in hex function

35a7a38

github-actions bot added the spark label Jun 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize allocation rate for `int64` array in `hex` function #16483

Optimize allocation rate for `int64` array in `hex` function #16483

Uh oh!

Fly-Style commented Jun 20, 2025

Uh oh!

Dandandan commented Jun 20, 2025

Uh oh!

Fly-Style commented Jun 24, 2025

Uh oh!

Fly-Style commented Jun 24, 2025 •

edited

Loading

Uh oh!

Uh oh!

Optimize allocation rate for int64 array in hex function #16483

Are you sure you want to change the base?

Optimize allocation rate for int64 array in hex function #16483

Uh oh!

Conversation

Fly-Style commented Jun 20, 2025

Which issue does this PR close?

Rationale for this change

Uh oh!

Dandandan commented Jun 20, 2025

Uh oh!

Fly-Style commented Jun 24, 2025

Uh oh!

Fly-Style commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Optimize allocation rate for `int64` array in `hex` function #16483

Optimize allocation rate for `int64` array in `hex` function #16483

Fly-Style commented Jun 24, 2025 •

edited

Loading