Feedback Needed: Allow columnar and use_numpy parameters to be configured at the Cursor level #482

@paxcodes

Context

We are interested in:

  • Using numpy to deserialize the data directly into numpy objects.
  • Using the columnar format, so that we retrieve a list of column values instead of a list of rows.

We already specify use_numpy as True.

Our observations when stepping through the clickhouse-driver package:

In addition, when columnar=False, we observe the following:

  • The results are deserialized per column, and we do see each column deserialized into specific numpy types (e.g. np.datetime64).

  • However, when the results are finally returned as a list of rows, we get a numpy array of generic objects, where each element is a plain Python object rather than the numpy type.
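
The effect described above can be reproduced with plain numpy. The following is only an illustrative sketch and does not reproduce the clickhouse-driver internals: it assumes two typed columns and shows that combining them into a single row-oriented array forces a generic object dtype, while the columnar layout keeps each column's dtype.

```python
import datetime

import numpy as np

# Typed columns, as they might look right after per-column deserialization.
dates = np.array(["2022-01-01T01:00:05"] * 3, dtype="datetime64[s]")
numbers = np.arange(3, dtype=np.uint64)

# Columnar layout: each column keeps its own dtype.
columns = [dates, numbers]

# Row layout: one 2-D array can hold only a single dtype, so mixed columns
# become generic objects. The explicit astype(object) makes that step visible.
rows = np.column_stack([dates.astype(object), numbers.astype(object)])

print([str(c.dtype) for c in columns])
print(rows.dtype, type(rows[0, 0]).__name__, type(rows[0, 1]).__name__)
```

In the row layout the elements are ordinary Python values (a datetime and an int here), which matches the loss of np.datetime64 reported in this issue.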

Proposed

Allow a cursor object to specify whether to return results in columns (columnar=True) and/or to deserialize data to numpy objects (use_numpy=True). I've confirmed that when columnar=True is passed, the column values keep the specific numpy types they were deserialized into (e.g. np.datetime64).

The intention of doing this at the cursor-level is so that other queries are not affected (INSERT queries, database inspection queries).

The API could look like this:

    import numpy as np

    sql = """
        SELECT toDateTime32('2022-01-01 01:00:05', 'UTC'), number, number*2.5
        FROM system.numbers
        LIMIT 3
    """
    with cursor.set_query_execution_args(
        columnar=True, use_numpy=True
    ) as cursor:
        cursor.execute(sql)
        np.testing.assert_equal(
            cursor.fetchall(),
            [
                np.array(
                    [
                        np.datetime64("2022-01-01T01:00:05"),
                        np.datetime64("2022-01-01T01:00:05"),
                        np.datetime64("2022-01-01T01:00:05"),
                    ],
                    dtype="datetime64[s]",
                ),
                np.array([0, 1, 2], dtype=np.uint64),
                np.array([0, 2.5, 5.0], dtype=np.float64),
            ],
        )

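        # With columnar=True each fetched item is a column, so fetchmany(2)
        # is expected to return the first two columns.
        cursor.execute(sql)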
        np.testing.assert_equal(
            cursor.fetchmany(2),
            [
                np.array(
                    [
                        np.datetime64("2022-01-01T01:00:05"),
                        np.datetime64("2022-01-01T01:00:05"),
                        np.datetime64("2022-01-01T01:00:05"),
                    ],
                    dtype="datetime64[s]",
                ),
                np.array([0, 1, 2], dtype=np.uint64),
            ],
        )
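
As a rough idea of how such a scoped setting could behave, here is a minimal sketch of the context manager. Everything in it is hypothetical: SketchCursor and its execution_args attribute are stand-ins invented for illustration, not the clickhouse-driver cursor, and only the method name set_query_execution_args comes from the proposal above.

```python
from contextlib import contextmanager


class SketchCursor:
    """Hypothetical stand-in for a DB-API cursor (not the driver's class)."""

    def __init__(self):
        # Keyword arguments the cursor would forward to the client on execute().
        self.execution_args = {"columnar": False, "use_numpy": False}

    @contextmanager
    def set_query_execution_args(self, **overrides):
        # Remember the current values so they can be restored afterwards.
        previous = dict(self.execution_args)
        self.execution_args.update(overrides)
        try:
            yield self
        finally:
            # Restore even if the body raised, so later queries are unaffected.
            self.execution_args = previous


cursor = SketchCursor()
with cursor.set_query_execution_args(columnar=True, use_numpy=True) as scoped:
    inside = dict(scoped.execution_args)
after = dict(cursor.execution_args)

print(inside)
print(after)
```

Restoring the previous values in the finally clause is what would keep INSERT and database inspection queries on the same cursor unaffected once the with block exits.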

I can open a PR if the maintainers think the change is worth doing (and reviewing).
