Description
Context
We are interested in:
- Using numpy to deserialize the data directly into numpy objects.
- Using a columnar format, so that we retrieve a list of column values instead of a list of rows.
We already specify use_numpy as True.
Our observations when stepping through the clickhouse-driver package:
- For the result set to be returned in columnar format, it seems like we have to call Client.execute() directly.
- Using a cursor object to execute a query does not allow columnar to be specified. See: response = execute(
- We are using https://github.com/jayvynl/django-clickhouse-backend, which uses mymarilyn/clickhouse-driver under the hood. In our application code, we execute queries by calling cursor.execute().
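
For reference, the direct route from the first bullet looks roughly like the sketch below. It assumes a reachable ClickHouse server and the clickhouse-driver package with its numpy extras installed; fetch_columns and its parameters are illustrative names, not part of either library.

```python
def fetch_columns(host, query):
    """Run `query` through the driver client directly and return one array per column."""
    # Imported lazily so this module can be loaded without the driver installed.
    from clickhouse_driver import Client

    # use_numpy is passed as a client setting; columnar is a per-call argument.
    client = Client(host, settings={"use_numpy": True})
    return client.execute(query, columnar=True)
```

A call such as fetch_columns("localhost", "SELECT number FROM system.numbers LIMIT 3") bypasses the Django cursor entirely, which is the limitation this issue is about.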
In addition, when columnar=False, we observe the following:
- The results are deserialized per column, and we do see them deserialized into specific np objects (e.g. np.datetime64).
- However, when the results are finally returned as a list of rows, we get a numpy array of generic objects, where each element is a plain Python object.
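
The effect can be reproduced with numpy alone. The sketch below is not the driver's code; it only illustrates, under the assumption that rows are built by transposing typed columns, why a row-oriented result ends up with dtype object while the columns keep their specific dtypes.

```python
import datetime

import numpy as np

# Typed columns, similar to what is seen during per-column deserialization.
timestamps = np.array(["2022-01-01T01:00:05"] * 3, dtype="datetime64[s]")
numbers = np.arange(3, dtype=np.uint64)
scaled = numbers * 2.5  # uint64 array times a Python float gives float64

columns = [timestamps, numbers, scaled]

# A row mixes a timestamp, an integer and a float, so a single array of rows
# has no common specific dtype and falls back to `object`.
rows = np.array(list(zip(*(col.tolist() for col in columns))), dtype=object)

print([col.dtype for col in columns])
print(rows.dtype, type(rows[0, 0]))
```

Here the first element of each row is a datetime.datetime rather than an np.datetime64, which matches the behaviour described above.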
Proposed
Allow a cursor object to specify whether to return results in columns (columnar=True) and/or to deserialize data to numpy objects (use_numpy=True). I've confirmed that when columnar=True is passed, the column values keep the specific np types they were deserialized into (e.g. np.datetime64).
The intention of doing this at the cursor level is that other queries (INSERT queries, database inspection queries) are not affected.
The API can be like this:
import numpy as np

sql = """
SELECT toDateTime32('2022-01-01 01:00:05', 'UTC'), number, number*2.5
FROM system.numbers
LIMIT 3
"""

# `cursor` is an existing database cursor; set_query_execution_args is the proposed API.
with cursor.set_query_execution_args(
    columnar=True, use_numpy=True
) as cursor:
    cursor.execute(sql)
    np.testing.assert_equal(
        cursor.fetchall(),
        [
            np.array(
                [
                    np.datetime64("2022-01-01T01:00:05"),
                    np.datetime64("2022-01-01T01:00:05"),
                    np.datetime64("2022-01-01T01:00:05"),
                ],
                dtype="datetime64[s]",
            ),
            np.array([0, 1, 2], dtype=np.uint64),
            np.array([0, 2.5, 5.0], dtype=np.float64),
        ],
    )

    cursor.execute(sql)
    np.testing.assert_equal(
        cursor.fetchmany(2),
        [
            np.array(
                [
                    np.datetime64("2022-01-01T01:00:05"),
                    np.datetime64("2022-01-01T01:00:05"),
                    np.datetime64("2022-01-01T01:00:05"),
                ],
                dtype="datetime64[s]",
            ),
            np.array([0, 1, 2], dtype=np.uint64),
        ],
    )

I can open a PR if the maintainers find that the PR would be worth doing (and reviewing).
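
To make the scoping idea concrete, here is a rough sketch of how cursor-level arguments could be set and restored. Cursor and StubClient are hypothetical stand-ins written for this illustration, not classes from django-clickhouse-backend or clickhouse-driver, and how each argument would map onto the real driver (for example, whether use_numpy travels as a client setting or a per-call argument) is left open.

```python
from contextlib import contextmanager


class StubClient:
    """Stand-in for the driver client; it only records what it is asked to run."""

    def __init__(self):
        self.calls = []

    def execute(self, query, **kwargs):
        self.calls.append((query, kwargs))
        return []


class Cursor:
    """Hypothetical cursor that forwards scoped keyword arguments to the client."""

    def __init__(self, client):
        self._client = client
        self._execute_kwargs = {}
        self._result = []

    @contextmanager
    def set_query_execution_args(self, **kwargs):
        # Apply the arguments only inside the `with` block, then restore the old ones.
        previous = self._execute_kwargs
        self._execute_kwargs = {**previous, **kwargs}
        try:
            yield self
        finally:
            self._execute_kwargs = previous

    def execute(self, query):
        self._result = self._client.execute(query, **self._execute_kwargs)

    def fetchall(self):
        return self._result


client = StubClient()
cursor = Cursor(client)

with cursor.set_query_execution_args(columnar=True, use_numpy=True) as scoped:
    scoped.execute("SELECT 1")

# Outside the block the cursor behaves as before.
cursor.execute("SELECT 2")
print(client.calls)
```

Because the previous arguments are restored in the finally clause, INSERT and inspection queries issued on the same cursor after the block would not see columnar or use_numpy, even if a query inside the block raises.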

