This project benchmarks several Python connection methods for SQL Server.
Spoiler Alert:
Polars Native is the clear winner, with the ConnectorX → Polars Direct and ConnectorX → Arrow → Polars methods following closely.
The pyodbc → Pandas and SQLAlchemy → Pandas methods are the slowest, though still very fast.
The ConnectorX → Pandas method is the fastest of the traditional (pandas-based) methods, but is still 2-4x slower than the Polars methods.
How you want to work with your data still matters. If your downstream code needs pandas or Arrow objects, the cost of that conversion is a factor in which method suits you.
View Benchmark Results - Complete performance analysis across dataset sizes
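Comparisons like the ones above come down to timing each read path against the same query and comparing the results. The sketch below is one minimal way to do that using only the standard library. It is not the project's actual harness: `time_method`, `summarize`, and the stand-in loaders are hypothetical names used for illustration.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def time_method(load: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run `load` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()  # the result is discarded; only the elapsed time is kept
        durations.append(time.perf_counter() - start)
    return durations


def summarize(results: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Return (name, median_seconds, slowdown_vs_fastest) rows, fastest first."""
    medians = {name: statistics.median(times) for name, times in results.items()}
    fastest = min(medians.values())
    rows = [(name, med, med / fastest) for name, med in medians.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Stand-in loaders so the sketch runs without a database. In a real run
    # each entry would be a function that executes the same SQL query
    # through one connection method.
    loaders = {
        "fast_stand_in": lambda: sum(range(10_000)),
        "slow_stand_in": lambda: sum(range(1_000_000)),
    }
    results = {name: time_method(fn) for name, fn in loaders.items()}
    for name, median, ratio in summarize(results):
        print(f"{name}: {median:.6f}s ({ratio:.1f}x the fastest)")
```

Reporting the median rather than a single run reduces the influence of a cold cache or a one-off network stall on the ranking.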
- Install dependencies:
    pip install -r requirements.txt
- Configure environment: Create a .env file with your SQL Server connection details:
    MSSQL_SERVER=your_server
    MSSQL_DB=your_database
    MSSQL_USER=your_username
    MSSQL_PWD=your_password
    MSSQL_PORT=1433
    MSSQL_DRIVER=ODBC Driver 17 for SQL Server
    SQL_BENCHMARK_TABLE=your_benchmark_table
- Run the examples:
    python main.py
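The variables in the .env file have to be turned into connection strings before any library can use them. The following sketch shows one plausible way to do that. The helper names `connectorx_uri` and `odbc_connection_string` are hypothetical and may not match what main.py does. It assumes the `mssql://user:password@host:port/database` URI form accepted by ConnectorX and a standard ODBC connection string for pyodbc.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def connectorx_uri(env: Mapping[str, str] = os.environ) -> str:
    """Build an mssql:// URI from the MSSQL_* variables (hypothetical helper)."""
    user = quote_plus(env["MSSQL_USER"])
    password = quote_plus(env["MSSQL_PWD"])  # escape characters such as '@' or '/'
    port = env.get("MSSQL_PORT", "1433")
    return f"mssql://{user}:{password}@{env['MSSQL_SERVER']}:{port}/{env['MSSQL_DB']}"


def odbc_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Build an ODBC connection string from the same variables (hypothetical helper)."""
    driver = env.get("MSSQL_DRIVER", "ODBC Driver 17 for SQL Server")
    port = env.get("MSSQL_PORT", "1433")
    return (
        f"DRIVER={{{driver}}};"  # the driver name is wrapped in literal braces
        f"SERVER={env['MSSQL_SERVER']},{port};"
        f"DATABASE={env['MSSQL_DB']};"
        f"UID={env['MSSQL_USER']};"
        f"PWD={env['MSSQL_PWD']}"
    )


if __name__ == "__main__":
    # With python-dotenv installed, calling dotenv.load_dotenv() first would
    # copy the .env values into os.environ. Placeholder values are used here
    # so the sketch runs on its own.
    example_env = {
        "MSSQL_SERVER": "your_server",
        "MSSQL_DB": "your_database",
        "MSSQL_USER": "your_username",
        "MSSQL_PWD": "your_password",
        "MSSQL_PORT": "1433",
        "MSSQL_DRIVER": "ODBC Driver 17 for SQL Server",
    }
    print(connectorx_uri(example_env))
    print(odbc_connection_string(example_env))
```

Passing the environment as a mapping keeps the helpers easy to test without touching real credentials.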
- connectorx: Rust-based database connectivity
- polars: High-performance DataFrame library
- pyarrow: Apache Arrow Python bindings
- arrow-odbc: ODBC connectivity for Arrow
- pandas: Traditional DataFrame library (for comparison)
- python-dotenv: Environment variable management
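As a rough illustration of how these libraries combine into the read paths named in the results, here is a sketch of four of them. The mapping from each label to a specific call is an assumption, not taken from the project's code. In particular, "Polars Native" is assumed here to mean `polars.read_database_uri`, and "Polars Direct" is assumed to mean ConnectorX with `return_type="polars"`. The function names and the `METHODS` registry are hypothetical.

```python
from typing import Callable, Dict

# Third-party imports are done inside each function so that a missing
# optional library only breaks its own read path, not the whole script.


def read_polars_native(uri: str, query: str):
    """Assumed meaning of 'Polars Native': Polars reads from the URI itself."""
    import polars as pl
    return pl.read_database_uri(query, uri, engine="connectorx")


def read_cx_polars_direct(uri: str, query: str):
    """ConnectorX returns a Polars DataFrame directly."""
    import connectorx as cx
    return cx.read_sql(uri, query, return_type="polars")


def read_cx_arrow_polars(uri: str, query: str):
    """ConnectorX returns an Arrow table, which Polars then wraps."""
    import connectorx as cx
    import polars as pl
    table = cx.read_sql(uri, query, return_type="arrow")
    return pl.from_arrow(table)


def read_cx_pandas(uri: str, query: str):
    """ConnectorX returns a pandas DataFrame (its default return type)."""
    import connectorx as cx
    return cx.read_sql(uri, query)


# Hypothetical registry keyed by the labels used in the results summary.
METHODS: Dict[str, Callable[[str, str], object]] = {
    "Polars Native": read_polars_native,
    "ConnectorX → Polars Direct": read_cx_polars_direct,
    "ConnectorX → Arrow → Polars": read_cx_arrow_polars,
    "ConnectorX → Pandas": read_cx_pandas,
}
```

Each function takes an `mssql://` URI and a SQL string, so a benchmark loop can call every entry in `METHODS` with the same query, for example a `SELECT` against the table named in `SQL_BENCHMARK_TABLE`.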