Phrendo/python_sql_connector_benchmark

High-Performance Python SQL Server benchmark

This project benchmarks several Python methods for connecting to SQL Server and loading query results into a DataFrame.

Spoiler Alert:

Polars Native is the clear winner, with the ConnectorX → Polars Direct and ConnectorX → Arrow → Polars methods close behind.

The pyodbc → Pandas and SQLAlchemy → Pandas methods are the slowest of the group, though still fast in absolute terms.

The ConnectorX → Pandas method is the fastest of the traditional (pandas-based) methods, but it is still 2-4x slower than Polars.

How you intend to work with the data still matters: if you need it in pandas or Arrow, the conversion remains a factor.

View Benchmark Results - Complete performance analysis across dataset sizes
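As a rough illustration of how a comparison like this can be measured, the sketch below times a loader callable several times and expresses each result as a multiple of the fastest one. It is not taken from this repository; `time_loader`, `relative_slowdown`, and the stand-in loaders are hypothetical names used only for illustration.

```python
import statistics
import time
from typing import Callable, Dict


def time_loader(load: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock seconds over `repeats` calls to `load`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()  # in a real run this would execute the query and build the frame
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_slowdown(timings: Dict[str, float]) -> Dict[str, float]:
    """Express each timing as a multiple of the fastest one (fastest == 1.0)."""
    fastest = min(timings.values())
    return {name: seconds / fastest for name, seconds in timings.items()}


if __name__ == "__main__":
    # Stand-in loaders (hypothetical): replace with real query functions.
    loaders = {
        "loader_a": lambda: sum(range(10_000)),
        "loader_b": lambda: sum(range(100_000)),
    }
    timings = {name: time_loader(fn) for name, fn in loaders.items()}
    for name, factor in relative_slowdown(timings).items():
        print(f"{name}: {factor:.2f}x the fastest")
```

Using the median rather than the mean keeps a single slow run (for example a cold cache on the first query) from dominating the result.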

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Configure environment: Create a .env file with your SQL Server connection details:

    MSSQL_SERVER=your_server
    MSSQL_DB=your_database
    MSSQL_USER=your_username
    MSSQL_PWD=your_password
    MSSQL_PORT=1433
    MSSQL_DRIVER=ODBC Driver 17 for SQL Server
    
    
    SQL_BENCHMARK_TABLE=your_benchmark_table
  3. Run the examples:

    python main.py
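The `.env` values above typically end up in one of two connection formats: an ODBC connection string (for pyodbc or arrow-odbc) or an `mssql://` URI (for ConnectorX). The sketch below shows one way to assemble both from the variables listed in step 2. It is an assumption about how the values might be combined, not the code in `main.py`; the helper names are hypothetical, and it reads from any mapping so it works with `os.environ` once the `.env` file has been loaded (for example with python-dotenv).

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def odbc_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Build an ODBC connection string from the MSSQL_* variables."""
    driver = env.get("MSSQL_DRIVER", "ODBC Driver 17 for SQL Server")
    port = env.get("MSSQL_PORT", "1433")
    return (
        f"DRIVER={{{driver}}};"  # driver name is wrapped in literal braces
        f"SERVER={env['MSSQL_SERVER']},{port};"  # SQL Server uses host,port
        f"DATABASE={env['MSSQL_DB']};"
        f"UID={env['MSSQL_USER']};"
        f"PWD={env['MSSQL_PWD']}"
    )


def mssql_uri(env: Mapping[str, str] = os.environ) -> str:
    """Build an mssql:// URI; user and password are URL-encoded."""
    user = quote_plus(env["MSSQL_USER"])
    pwd = quote_plus(env["MSSQL_PWD"])
    port = env.get("MSSQL_PORT", "1433")
    return f"mssql://{user}:{pwd}@{env['MSSQL_SERVER']}:{port}/{env['MSSQL_DB']}"
```

URL-encoding the credentials matters when a password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URI structure.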

Dependencies

  • connectorx: Rust-based database connectivity
  • polars: High-performance DataFrame library
  • pyarrow: Apache Arrow Python bindings
  • arrow-odbc: ODBC connectivity for Arrow
  • pandas: Traditional DataFrame library (for comparison)
  • python-dotenv: Environment variable management
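To make the method names in the summary concrete, here is a minimal sketch of what three of the paths might look like with these libraries. The function names and the `LOADERS` registry are hypothetical, and the repository's own implementations may differ. "Polars Native" is left out because this README does not say which Polars call it refers to. Note that pyodbc is used here although it is not in the list above.

```python
from typing import Callable, Dict


def connectorx_to_pandas(uri: str, query: str):
    """ConnectorX → Pandas: ConnectorX builds the pandas DataFrame itself."""
    import connectorx as cx  # imported lazily so unused paths need not be installed
    return cx.read_sql(uri, query)  # return_type defaults to "pandas"


def connectorx_arrow_to_polars(uri: str, query: str):
    """ConnectorX → Arrow → Polars: fetch an Arrow table, then wrap it in Polars."""
    import connectorx as cx
    import polars as pl
    table = cx.read_sql(uri, query, return_type="arrow")
    return pl.from_arrow(table)


def pyodbc_to_pandas(odbc_conn_str: str, query: str):
    """pyodbc → Pandas: row-based DBAPI fetch, materialized by pandas."""
    import pandas as pd
    import pyodbc
    conn = pyodbc.connect(odbc_conn_str)
    try:
        # pandas accepts a raw DBAPI connection here but may emit a warning
        # recommending SQLAlchemy.
        return pd.read_sql(query, conn)
    finally:
        conn.close()


# Hypothetical registry mapping a display name to its loader function.
LOADERS: Dict[str, Callable[[str, str], object]] = {
    "ConnectorX → Pandas": connectorx_to_pandas,
    "ConnectorX → Arrow → Polars": connectorx_arrow_to_polars,
    "pyodbc → Pandas": pyodbc_to_pandas,
}
```

The first two functions expect an `mssql://` URI, while the pyodbc path expects an ODBC connection string.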

About

Benchmark comparison of connectorx, polars, pyodbc, sqlalchemy - hint: Polars native crushes them all.