Pykx pandas conversion offers no speedup with short vs float columns #36

@antipisa

Python: 3.11.8
Pandas: 2.2.1
Numpy: 1.26.4

Converting kdb tables to pandas with the .pd() method seems to be only marginally faster when the columns are short ints rather than floats. Although the memory usage of the table drops accordingly, the time spent converting to a DataFrame does not improve much.

N:50000000;
dat1:([] date:2000.01.01; sym:`A; q1:N?100h; q2:N?5000h; q3:N?50h);
dat2:([] date:2000.01.01; sym:`A; q1:N?100f; q2:N?5000f; q3:N?50f);

Indeed, dat1 is 40% of the size of dat2, and yet in Python:


import pykx

handle = pykx.SyncQConnection(host, port)  # q process holding dat1 and dat2
%timeit df1 = handle('dat1').pd()  # short columns: 3.48 s per loop
%timeit df2 = handle('dat2').pd()  # float columns: 5.76 s per loop

The two timings get closer the more short/float columns you add.
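For reference, the 40% size figure can be reproduced with per-row arithmetic. The sketch below is only one possible reading: it assumes the figure refers to serialized size, with 4-byte dates, 2-byte shorts, 8-byte floats, and the symbol `A taking 2 bytes on the wire (one character plus a null terminator). The 8-byte in-memory symbol size is likewise an assumption, included for comparison.

```python
# Assumed per-value sizes in bytes (not stated in the issue itself).
DATE, SHORT, FLOAT = 4, 2, 8
SYM_SERIALIZED = 2  # `A as a null-terminated string when serialized
SYM_IN_MEMORY = 8   # assumed pointer-sized symbol reference on 64-bit


def row_bytes(value_size, sym_size, n_value_cols=3):
    """Bytes per row for a table with date, sym and n value columns."""
    return DATE + sym_size + n_value_cols * value_size


def short_to_float_ratio(sym_size, n_value_cols=3):
    """Size of the short-column table relative to the float-column table."""
    short_row = row_bytes(SHORT, sym_size, n_value_cols)
    float_row = row_bytes(FLOAT, sym_size, n_value_cols)
    return short_row / float_row


print(f"serialized: {short_to_float_ratio(SYM_SERIALIZED):.0%}")  # 12 / 30
print(f"in memory:  {short_to_float_ratio(SYM_IN_MEMORY):.0%}")   # 18 / 36
```

Under these assumptions the serialized ratio matches the 40% quoted above, while the measured timing ratio (3.48 s vs 5.76 s) is roughly 60%.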

Is there a way to optimize the call to .pd() for very large tables whose columns are mostly shorts? Otherwise the conversion can take a very long time.
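Note that the timings above cover both the IPC fetch and the .pd() conversion in a single expression. A minimal sketch for timing the two stages separately is given below; `time_stages` is a hypothetical helper, not part of PyKX, and the usage line assumes the `handle` connection from the snippet above.

```python
import time


def time_stages(fetch, convert, repeats=3):
    """Return the best wall-clock seconds for fetch() and convert(obj) separately."""
    fetch_times, convert_times = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        obj = fetch()            # e.g. pull the table over IPC
        t1 = time.perf_counter()
        convert(obj)             # e.g. convert the fetched table to pandas
        t2 = time.perf_counter()
        fetch_times.append(t1 - t0)
        convert_times.append(t2 - t1)
    return min(fetch_times), min(convert_times)


# Hypothetical usage with the connection from the issue:
# fetch_s, convert_s = time_stages(lambda: handle('dat1'), lambda t: t.pd())
```

If most of the time turns out to be in the fetch stage, the column type affects transfer size rather than conversion cost, and vice versa.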
