Some group by query is 6~7x slower than DuckDB

### Describe the bug

Hi team, I have encountered a performance issue when I run same query on a big table with datafusion comparing with DuckDB.

I will try to simplify my case and replicate the issue in my following codes.


### To Reproduce

```python
import timeit
import numpy as np
import pyarrow as pa
import datafusion
from datafusion import SessionContext
import duckdb

print(duckdb.__version__)
print(datafusion.__version__)

# prepare data

batches = 100000

names = list("abcdefghijklmnopqrstuvwxyz")
names = [n + m for n in names for m in names]

names_array = pa.concat_arrays([pa.array(names)] * batches)
values_array = pa.concat_arrays([pa.array(np.random.randint(1, 100, len(names))) for _ in range(batches)])

pa_table = pa.Table.from_arrays([names_array, values_array], names=["name", "value"])

# prepare query
sql = "select name, sum(value) as value FROM pa_table group by name;"
n_round = 10

# duckb
elapsed = timeit.timeit('duckdb.sql(sql).to_arrow_table()', number=n_round , globals=globals())
duckdb_per_round = elapsed  / n_round

# datafusion
ctx = SessionContext()
_ = ctx.from_arrow(pa_table, "pa_table")
elapsed  = timeit.timeit('ctx.sql(sql).to_arrow_table()', number=n_round , globals=globals())
datafusion_per_round= elapsed  / n_round

# result
print(f"{'duckdb':<12}: {duckdb_per_round * 1000:.2f}ms")
print(f"{'datafusion':<12}: {datafusion_per_round * 1000:.2f}ms")
```

the output will look like:
```bash
1.3.1
47.0.0
duckdb      : 152.15ms
datafusion  : 1002.04ms
```

### Expected behavior

_No response_

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some group by query is 6~7x slower than DuckDB #1186

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Some group by query is 6~7x slower than DuckDB #1186

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions