Status: Open
Labels: bug (Something isn't working)
Description
Describe the bug
Memory does not get freed after executing multiple COPY ... TO ... PARTITIONED BY ... queries. I have not been able to identify what is causing this behavior.
To Reproduce
The behavior can be observed using datafusion-cli. I have been monitoring the memory usage through Activity Monitor.
- Download test parquet file (120MB): https://file.io/eKiHwu4waHVN
- Run datafusion-cli.
- Create an external table:
CREATE EXTERNAL TABLE my_table
(
col1 VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
col2 VARCHAR NOT NULL,
col3 VARCHAR NOT NULL,
col4 VARCHAR NOT NULL,
col5 VARCHAR NOT NULL,
col6 VARCHAR NOT NULL,
col7 VARCHAR NOT NULL,
col8 VARCHAR NOT NULL,
col9 VARCHAR NOT NULL,
col10 VARCHAR NOT NULL,
col11 VARCHAR NOT NULL,
col12 DOUBLE
)
WITH ORDER (col1 ASC, timestamp ASC) STORED AS PARQUET LOCATION 'test_file.parquet';
- Execute a COPY ... PARTITIONED BY query:
COPY (SELECT col1, timestamp, col10, col12 FROM my_table ORDER BY col1 ASC, timestamp ASC)
TO './output' STORED AS PARQUET PARTITIONED BY (col1) OPTIONS (compression 'uncompressed');
- Monitor memory usage.
- Repeat execution of the COPY ... PARTITIONED BY query and continue monitoring memory usage.
- Observation: memory does not get released.
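For anyone who wants numbers rather than eyeballing Activity Monitor, the monitoring step could be scripted roughly as below. This is only a sketch and not part of the original reproduction: the helper names `rss_kib` and `grows_every_run` are hypothetical, and it assumes a `ps` that accepts `-o rss= -p <pid>` (as on macOS and most Linux systems). Take one sample of the datafusion-cli process after each COPY run and compare the samples.

```python
import os
import subprocess
from typing import List


def rss_kib(pid: int) -> int:
    """Return the resident set size of process `pid` in KiB, as reported by `ps`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return int(out.strip())


def grows_every_run(samples: List[int]) -> bool:
    """True if every sample is strictly larger than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Demo on the current process; for the repro, pass the PID of the
    # running datafusion-cli process instead and sample after each COPY.
    print(rss_kib(os.getpid()), "KiB")
```

If `grows_every_run` returns True for samples taken after several identical COPY runs, that matches the behavior described here. RSS alone does not prove a leak, since an allocator may hold on to freed pages, so treat the result as a hint.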
Expected behavior
I expect to be able to run the COPY command multiple times without memory usage increasing on every run.
Additional context
There is more context of what I am trying to do in Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1253419900043526236
I am also experiencing the same behavior when running my application in Kubernetes. K8s terminates my pod once it exceeds the pod memory limits:
