https://github.com/sfu-db/connector-x/discussions/156 Oh, I see. In MSSQL the query is executed twice. Do you plan to optimize metadata fetching for MSSQL?
Hi @cufewxy, thanks for the example and the logs! In our benchmark, connectorx took 5x less time than pandas. We directly select the TPC-H lineitem table, which has 60M rows and 16 columns, using 4 partitions. We also tested connectorx with 1 partition, and it outperforms pandas by >3x.
connectorx targets the scenario of fetching large query results. It speeds up the process by optimizing client-side execution and by saturating both network and machine resources through parallelism. However, when query execution on the database server is the bottleneck (e.g. the query takes a long time to run but the result is relatively small), the improvement connectorx can offer becomes minor, and sometimes it can even be slower than pandas because of the overhead of fetching metadata.
It is interesting that in your case the performance is worse than pandas even though the query is not very complex. We can try to reproduce your case in our environment and see whether we can improve it further. It would be great if you could share more detailed information about the query with us:
We are currently trying to find a more efficient way to fetch the schema in order to speed up the process for MSSQL. But I think that is not the main issue in your case (in figure 2 the schema fetching procedure only takes 4 seconds).
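To illustrate the partitioning idea mentioned above, here is a minimal sketch of how one query could be split into range-based subqueries on a numeric column. This is only an illustration under assumptions, not ConnectorX's actual implementation: the helper names `split_range` and `partition_queries` are hypothetical, and `l_orderkey` is used in the usage note simply because it is a column of the TPC-H lineitem table.

```python
def split_range(lo, hi, partition_num):
    """Split the inclusive integer range [lo, hi] into half-open chunks.

    Returns a list of (start, end) pairs covering [lo, hi + 1).
    """
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1")
    total = hi - lo + 1
    size, extra = divmod(total, partition_num)
    bounds = []
    start = lo
    for i in range(partition_num):
        # The first `extra` chunks get one additional value each.
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def partition_queries(query, column, lo, hi, partition_num):
    """Wrap `query` in one range-filtered subquery per partition (hypothetical helper)."""
    return [
        f"SELECT * FROM ({query}) AS t WHERE t.{column} >= {a} AND t.{column} < {b}"
        for a, b in split_range(lo, hi, partition_num)
    ]
```

For example, `partition_queries("SELECT * FROM lineitem", "l_orderkey", 1, 10, 4)` produces four subqueries that could each be fetched over a separate connection. With connectorx itself, partitioning is requested through the `partition_on` and `partition_num` arguments of `cx.read_sql`.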
Hey, I have been trying to find a better module than pandas.read_sql and feel that connectorx is almost there.
When I tested with MySQL, connectorx performed better than pandas. But when I tested with MSSQL, connectorx took about 2x as long as pandas (although connectorx's memory consumption is still better).
The query is very simple: it filters a table on one column being greater than a value. In pandas, I use pd.read_sql(xxx, engine), where engine is defined as sqlalchemy.create_engine('mssql+pymssql:xxx'). I also tried pyodbc, and connectorx is again slower than pandas.
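For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The helper `best_of`, the two loader functions, and the connection URLs are placeholders made up for illustration, not the exact code behind the numbers above; `pd.read_sql` and `cx.read_sql` are the real entry points being compared.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def load_with_pandas(query, sqlalchemy_url):
    # Imported lazily so the timing helper works without these packages installed.
    import pandas as pd
    import sqlalchemy

    engine = sqlalchemy.create_engine(sqlalchemy_url)
    return pd.read_sql(query, engine)


def load_with_connectorx(query, connectorx_url):
    import connectorx as cx

    return cx.read_sql(connectorx_url, query)


# Usage sketch; QUERY, PANDAS_URL and CX_URL are placeholders to fill in:
# t_pd, _ = best_of(lambda: load_with_pandas(QUERY, PANDAS_URL))
# t_cx, _ = best_of(lambda: load_with_connectorx(QUERY, CX_URL))
# print(f"pandas: {t_pd:.2f}s, connectorx: {t_cx:.2f}s")
```

Taking the fastest of several runs reduces noise from server-side caching and connection setup, which matters when the two timings are within a factor of two of each other.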