Python: 3.11.8
Pandas: 2.2.1
Numpy: 1.26.4
There seems to be only a marginal speed benefit when converting kdb+ tables to pandas via the `.pd()` method with short ints instead of floats. Although the table's memory usage drops accordingly, the time spent converting to a DataFrame does not improve much.
```q
N:50000000;
dat1:([] date:2000.01.01; sym:`A; q1:N?100h; q2:N?5000h; q3:N?50h);
dat2:([] date:2000.01.01; sym:`A; q1:N?100f; q2:N?5000f; q3:N?50f);
```
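For context on the memory ratio, kdb+ shorts map to 2-byte `int16` and kdb+ floats to 8-byte `float64`, so the three numeric columns of `dat1` need a quarter of the memory of `dat2`'s. A minimal NumPy sketch of that ratio (scaled down from the 50M rows above; the column contents here are local stand-ins, not data fetched from q):

```python
import numpy as np

N = 1_000_000  # scaled down from the 50M rows above, for illustration

# Stand-in for a kdb+ short column (q1:N?100h) and its float equivalent
shorts = np.random.randint(0, 100, N).astype(np.int16)
floats = shorts.astype(np.float64)

# 2 bytes vs 8 bytes per element: the short column is 4x smaller
print(shorts.nbytes, floats.nbytes)
```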
Indeed, the size of `dat1` is about 40% of the size of `dat2`, and yet in Python:
```python
import pykx

handle = pykx.SyncQConnection(host, port)
%timeit df1 = handle('dat1').pd()  # shorts
%timeit df2 = handle('dat2').pd()  # floats
```

3.48 s per loop (shorts)
5.76 s per loop (floats)
The gap between the two timings narrows further the more short/float columns you add.
Is there a way to optimize the `.pd()` call when dealing with very large tables whose column values are mostly shorts? Otherwise the conversion takes prohibitively long.
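One direction worth measuring (a sketch, not a confirmed PyKX fast path): assemble the DataFrame column-by-column from NumPy arrays with `copy=False`, which preserves the narrow `int16` dtype and avoids widening copies on the pandas side. The PyKX-specific calls (`handle`, per-column `.np()` conversion) are assumptions and are stood in for below by locally generated arrays with the same dtypes as `dat1`'s numeric columns:

```python
import numpy as np
import pandas as pd

N = 1_000_000  # smaller than the 50M rows in the report, for illustration

# Local stand-ins for the q columns q1/q2/q3 (kdb+ shorts map to int16);
# in practice these would come from the PyKX result, one array per column.
cols = {
    "q1": np.random.randint(0, 100, N).astype(np.int16),
    "q2": np.random.randint(0, 5000, N).astype(np.int16),
    "q3": np.random.randint(0, 50, N).astype(np.int16),
}

# copy=False keeps the existing int16 buffers instead of copying
# (and never widens them to float64), so construction is cheap.
df = pd.DataFrame(cols, copy=False)
print(df.dtypes)
```

If the pandas-side assembly is fast while `.pd()` remains slow, that would point at the conversion path inside PyKX rather than DataFrame construction itself.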