I ran the TPC-H queries locally (MacBook M3 with 16 GB of RAM) at scale factor 40, and Sail's performance looks great!

Sail works well for all scale factors up to 40. At SF80, q4 fails on my machine with this error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/querybench/run_tpch.py", line 70, in <module>
    sail_res = querybench.sail.tpch_queries.run_benchmarks(spark).rename(columns={"duration": "sail"})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/querybench/sail/tpch_queries.py", line 105, in run_benchmarks
    benchmark(q4, spark, benchmarks=benchmarks, name="q4")
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/querybench/helpers.py", line 24, in benchmark
    ret = f(dfs, **kwargs)
          ^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/querybench/sail/tpch_queries.py", line 18, in q4
    return spark.sql(querybench.queries.tpch.q4()).collect()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/dataframe.py", line 1778, in collect
    table, schema = self._to_table()
                    ^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/dataframe.py", line 1791, in _to_table
    table, schema, self._execution_info = self._session.client.to_table(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/client/core.py", line 925, in to_table
    table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(req, observations)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/client/core.py", line 1560, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/client/core.py", line 1537, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/client/core.py", line 1811, in _handle_error
    self._handle_rpc_error(error)
  File "/Users/matthewpowers/Documents/code/my_apps/querybench/.venv/lib/python3.12/site-packages/pyspark/sql/connect/client/core.py", line 1882, in _handle_rpc_error
    raise convert_exception(
pyspark.errors.exceptions.connect.IllegalArgumentException: invalid argument: operation not found: 6f6f15ff-e59d-42d8-8236-09230cc2462b

Here's the benchmark code in case you'd like to reproduce on your end: https://github.com/MrPowers/querybench