Commit d7bf374

turbodbc: Hello, World!
1 parent 53503fa commit d7bf374

12 files changed: +383, -0 lines changed
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.venv*
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
###########################
Using CrateDB with turbodbc
###########################


*****
About
*****

This section of the documentation describes how to connect to `CrateDB`_
with `turbodbc`_, by providing a few example programs.

The examples use the `unixODBC`_ implementation of `ODBC`_, and the `PostgreSQL
ODBC driver`_, for connecting to the `PostgreSQL wire protocol`_ interface of
`CrateDB`_.

This folder also contains ``Dockerfile`` files providing environments to
exercise the examples on different operating systems, like Arch Linux,
Red Hat (CentOS), Debian, and SUSE Linux.


************
Introduction
************

`Turbodbc`_ is a Python module to access relational databases via the `Open
Database Connectivity (ODBC)`_ interface. In addition to complying with
the `Python Database API Specification 2.0`_, turbodbc offers built-in `NumPy`_
and `Apache Arrow`_ support for improved performance. Their slogan is:

    Don’t wait minutes for your results, just blink.

*Note: The description texts have been taken from turbodbc's documentation 1:1.*

Description
===========

Its primary target audience are data scientists that use databases for which no
efficient native Python drivers are available.

For maximum compatibility, turbodbc complies with the `Python Database API
Specification 2.0`_ (PEP 249). For maximum performance, turbodbc internally
relies on batched data transfer instead of single-record communication as
other popular ODBC modules do.

Why should I use turbodbc instead of other ODBC modules?
========================================================

- Short answer: turbodbc is faster.
- Slightly longer answer: turbodbc is faster, *much* faster if you want to
  work with NumPy.
- Medium-length answer: The author has tested turbodbc and pyodbc (probably
  the most popular Python ODBC module) with various databases (Exasol,
  PostgreSQL, MySQL) and corresponding ODBC drivers. He found turbodbc to be
  consistently faster.

Smooth. What is the trick?
==========================

Turbodbc exploits buffering.

- Turbodbc implements both sending parameters and retrieving result sets using
  buffers of multiple rows/parameter sets. This avoids round trips to the ODBC
  driver and (depending how well the ODBC driver is written) to the database.
- Multiple buffers are used for asynchronous I/O. This allows to interleave
  Python object conversion and direct database interaction (see performance
  options below).
- Buffers contain binary representations of data. NumPy arrays contain binary
  representations of data. Good thing they are often the same, so instead of
  converting, the driver can just copy data.
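
The effect of buffering on round trips can be illustrated with a small,
self-contained sketch. It is not turbodbc's implementation, which lives in
C++ on top of the ODBC API. The helper names ``batches`` and
``count_round_trips`` are hypothetical, and the code only counts how many
transfers are needed when rows are grouped into buffers of a given size.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, str]


def batches(rows: Sequence[Row], buffer_rows: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `buffer_rows` rows (hypothetical helper)."""
    if buffer_rows < 1:
        raise ValueError("buffer_rows must be at least 1")
    for start in range(0, len(rows), buffer_rows):
        yield list(rows[start:start + buffer_rows])


def count_round_trips(rows: Sequence[Row], buffer_rows: int) -> int:
    """Count the transfers needed when each transfer carries one buffer."""
    return sum(1 for _ in batches(rows, buffer_rows))


rows = [(i, f"value-{i}") for i in range(10)]

# Single-record communication: one transfer per row.
print(count_round_trips(rows, 1))  # prints 10

# Buffered communication: rows travel in groups of four.
print(count_round_trips(rows, 4))  # prints 3
```

With ten rows, a buffer of four rows reduces ten transfers to three. Real
savings depend on the driver and the network, so treat the numbers as an
illustration of the principle only.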


*****
Setup
*****

Install prerequisites
=====================

Arch Linux::

    # See `dockerfiles/archlinux.Dockerfile`.

CentOS Stream::

    dnf install --enablerepo=crb -y boost-devel g++ postgresql-odbc python3 python3-devel python3-pip unixODBC-devel

Debian::

    apt-get install --yes build-essential libboost-dev odbc-postgresql unixodbc-dev

macOS/Homebrew::

    brew install psqlodbc unixodbc

SUSE Linux Enterprise Server::

    # See `dockerfiles/sles.Dockerfile`.

Install Python sandbox
======================
::

    # Create Python virtualenv and install dependency packages.
    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade --requirement=requirements-prereq.txt
    pip install --upgrade --requirement=requirements.txt --verbose

.. note::

    The `turbodbc pip installation documentation`_ says:
    Please ``pip install numpy`` before installing turbodbc, because turbodbc
    will search for the ``numpy`` Python package at installation/compile time.
    If NumPy is not installed, turbodbc will not compile the `NumPy
    support`_ features. Similarly, please ``pip install pyarrow`` before
    installing turbodbc if you would like to use the `Apache Arrow
    support`_.
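
Because turbodbc looks for NumPy and PyArrow at build time, it can help to
check for them before running the second ``pip install``. The following is a
minimal sketch using only the standard library; the function name
``missing_modules`` is hypothetical and not part of turbodbc or this patch.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# NumPy and PyArrow are the optional build-time dependencies named in the note.
missing = missing_modules(["numpy", "pyarrow"])
if missing:
    print("Install before building turbodbc:", ", ".join(missing))
else:
    print("NumPy and PyArrow found.")
```

The check only tells whether the packages are importable in the active
virtualenv. It does not verify that turbodbc was actually compiled with the
corresponding features.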


*****
Usage
*****

Run CrateDB::

    docker run --rm -it --publish=4200:4200 --publish=5432:5432 crate \
        -Cdiscovery.type=single-node -Ccluster.routing.allocation.disk.threshold_enabled=false

Invoke demo program on workstation::

    python demo.py

Exercise demo program using Docker, on different operating systems::

    docker build --progress=plain --tag local/python-turbodbc-demo --file=dockerfiles/archlinux.Dockerfile .
    docker build --progress=plain --tag local/python-turbodbc-demo --file=dockerfiles/centos.Dockerfile .
    docker build --progress=plain --tag local/python-turbodbc-demo --file=dockerfiles/debian.Dockerfile .
    docker build --progress=plain --tag local/python-turbodbc-demo --file=dockerfiles/sles.Dockerfile .

    docker run --rm -it --volume=$(pwd):/src --network=host local/python-turbodbc-demo python3 /src/demo.py


*******
Backlog
*******

The patch just contains a basic example within ``demo.py``. Advanced usage
examples to be exercised are tracked within the `backlog`_.



.. _Apache Arrow: https://en.wikipedia.org/wiki/Apache_Arrow
.. _Apache Arrow support: https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#advanced-usage-arrow
.. _backlog: https://github.com/crate/cratedb-examples/blob/main/python-turbodbc/backlog.rst
.. _CrateDB: https://github.com/crate/crate
.. _NumPy: https://en.wikipedia.org/wiki/NumPy
.. _NumPy support: https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#advanced-usage-numpy
.. _ODBC: https://en.wikipedia.org/wiki/Open_Database_Connectivity
.. _Open Database Connectivity (ODBC): https://en.wikipedia.org/wiki/Open_Database_Connectivity
.. _PostgreSQL ODBC driver: https://odbc.postgresql.org/
.. _PostgreSQL wire protocol: https://crate.io/docs/crate/reference/en/latest/interfaces/postgres.html
.. _Python Database API Specification 2.0: https://peps.python.org/pep-0249/
.. _turbodbc: https://turbodbc.readthedocs.io/
.. _turbodbc pip installation documentation: https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html#pip
.. _unixODBC: https://www.unixodbc.org/
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
#######################
python-turbodbc backlog
#######################

Ideas for improving this little code example.

- [x] Provide basic example
- [x] Insert multiple records using parameters
- [x] Docs: Add installation on SUSE
- [x] Provide example(s) for different operating systems (Linux, macOS)
- [o] Docs: Drop a note about connecting with driver file vs. connecting via DSN
- [o] Evaluate different ODBC drivers
- [o] Provide an example scenario for running it on Windows
- [o] Exercise advanced NumPy and PyArrow options
- [o] Exchange advanced CrateDB data types like ``OBJECT``, ``ARRAY``, and friends
- [o] Use ``SSLmode = Yes`` to connect to CrateDB Cloud
- [o] Explore other driver options at `Zabbix » Recommended UnixODBC settings for PostgreSQL`_
- [o] Check out https://github.com/dirkjonker/sqlalchemy-turbodbc
- [o] Check out https://docs.devart.com/odbc/postgresql/centos.htm


.. _Zabbix » Recommended UnixODBC settings for PostgreSQL: https://www.zabbix.com/documentation/current/en/manual/config/items/itemtypes/odbc_checks/unixodbc_postgresql

by-language/python-turbodbc/demo.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
import os
import sys

from turbodbc import connect


def demo_pg():
    # Connect to database.
    # https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html#establish-a-connection-with-your-database

    # Either connect via a data source name (DSN) defined within the ODBC configuration,
    # connection = connect(dsn="postgresql", server="localhost", database="testdrive", uid="crate", pwd=None)

    # or connect via a connection string, referencing a driver file directly.
    if sys.platform == "linux":
        candidates = [
            # archlinux
            "/usr/lib/psqlodbcw.so",
            # Debian
            "/usr/lib/x86_64-linux-gnu/odbc/psqlodbcw.so",
            # Red Hat
            "/usr/lib64/psqlodbcw.so",
        ]
        driver_file = find_program(candidates)
        if driver_file is None:
            raise ValueError(f"Unable to detect driver file at {candidates}")
    elif sys.platform == "darwin":
        driver_file = "/usr/local/lib/psqlodbcw.so"
    else:
        raise NotImplementedError(f"Platform {sys.platform} not supported yet")

    connection_string = f"Driver={driver_file};Server=localhost;Port=5432;Database=testdrive;Uid=crate;Pwd=;"
    print(f"INFO: Connecting to '{connection_string}'")
    connection = connect(connection_string=connection_string)

    # Insert data.
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE IF NOT EXISTS testdrive (id INT PRIMARY KEY, data TEXT);")
    cursor.execute("DELETE FROM testdrive;")
    cursor.execute("INSERT INTO testdrive VALUES (0, 'zero'), (1, 'one'), (2, 'two');")
    cursor.executemany("INSERT INTO testdrive VALUES (?, ?);", [(3, "three"), (4, "four"), (5, "five")])
    cursor.execute("REFRESH TABLE testdrive;")
    cursor.close()

    # Query data.
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM testdrive ORDER BY id")

    print("Column metadata:")
    print(cursor.description)

    print("Results by row:")
    for row in cursor:
        print(row)

    cursor.close()

    # Terminate database connection.
    connection.close()


def find_program(candidates):
    # Return the first candidate path that exists, or None if none does.
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    demo_pg()
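
The connection string in ``demo.py`` is assembled in a single f-string. As a
possible variation, the sketch below builds the same string from a mapping,
which can make individual options easier to override. The helper name
``build_connection_string`` is hypothetical and not part of turbodbc; the
driver path is the Arch Linux candidate from the demo and is used here only
as sample input.

```python
from typing import Mapping


def build_connection_string(options: Mapping[str, object]) -> str:
    """Join key/value pairs into an ODBC-style `Key=Value;` string (hypothetical helper)."""
    return "".join(f"{key}={value};" for key, value in options.items())


connection_string = build_connection_string(
    {
        "Driver": "/usr/lib/psqlodbcw.so",
        "Server": "localhost",
        "Port": 5432,
        "Database": "testdrive",
        "Uid": "crate",
        "Pwd": "",
    }
)
print(connection_string)
```

The result can be passed to ``connect(connection_string=...)`` as in the
demo. The sketch does no escaping, so values that contain ``;`` or braces
would need additional handling.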
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# ---------------------------
# Setup archlinux environment
# ---------------------------

# Include `yay` for easily installing AUR packages.

FROM archlinux:base-20230205.0.123931 as archlinux-build

# Allow building packages using `makepkg` within Docker container.
# https://blog.ganssle.io/tag/arch-linux.html
RUN pacman -Sy --noconfirm --needed base-devel binutils fakeroot git sudo
RUN useradd --create-home build
RUN echo 'build ALL=NOPASSWD: ALL' >> /etc/sudoers

# Install AUR package helper program `yay`.
# https://aur.archlinux.org/packages/yay
RUN mkdir /yay-bin; chmod ugo+rwX /yay-bin
USER build
RUN \
    git clone https://aur.archlinux.org/yay-bin.git && \
    cd yay-bin && \
    makepkg -si --noconfirm
USER root


# --------------------------
# Setup turbodbc environment
# --------------------------

# Install Python, unixODBC, PostgreSQL ODBC driver, and turbodbc.

FROM archlinux-build

# Install unixODBC.
# https://archlinux.org/packages/core/x86_64/unixodbc/
RUN pacman -Sy --noconfirm --needed unixodbc

# Install PostgreSQL ODBC driver.
# https://aur.archlinux.org/packages/psqlodbc
USER build
RUN yay -S --noconfirm psqlodbc
USER root

# Install NumPy, PyArrow, and turbodbc.
RUN pacman -Sy --noconfirm --needed boost python python-pip python-setuptools
ADD requirements*.txt .
RUN pip install --upgrade --requirement=requirements-prereq.txt
RUN MAKEFLAGS="-j$(nproc)" pip install --upgrade --requirement=requirements.txt --verbose
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
FROM quay.io/centos/centos:stream9

# Install Python, unixODBC, the PostgreSQL ODBC driver, and development libraries.
RUN dnf install --enablerepo=crb -y boost-devel g++ postgresql-odbc python3 python3-devel python3-pip unixODBC-devel

# Install Python, NumPy, PyArrow, and turbodbc.
ADD requirements*.txt .
RUN pip install --upgrade --requirement=requirements-prereq.txt
RUN MAKEFLAGS="-j$(nproc)" pip install --upgrade --requirement=requirements.txt --verbose
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
FROM python:3.11-slim-bullseye

ENV DEBIAN_FRONTEND=noninteractive

# Install prerequisites.
RUN apt-get update
RUN apt-get install --yes build-essential libboost-dev odbc-postgresql unixodbc-dev

# Install NumPy, PyArrow, and turbodbc.
ADD requirements*.txt .
RUN pip install --upgrade --requirement=requirements-prereq.txt
RUN MAKEFLAGS="-j$(nproc)" pip install --upgrade --requirement=requirements.txt --verbose
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
FROM registry.suse.com/suse/sle15

# Add package repository for acquiring `boost-devel`.
# https://software.opensuse.org//download.html?project=home%3Afsirl%3Aboost1651&package=boost
RUN zypper addrepo https://download.opensuse.org/repositories/home:fsirl:boost1651/15.4/home:fsirl:boost1651.repo

# Add package repository for acquiring `python310`.
# https://download.opensuse.org/repositories/devel:/languages:/python:/backports/15.4/
RUN zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/python:/backports/15.4/devel:languages:python:backports.repo

# Activate package repositories.
RUN zypper --gpg-auto-import-keys refresh

# Install Python, unixODBC, the PostgreSQL ODBC driver, and development libraries.
RUN zypper install -y boost-devel gcc-c++ psqlODBC python310 python310-devel python310-pip unixODBC-devel update-alternatives

# Make Python 3.10 the default Python 3, and add an alias `python3`.
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 0
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 0

# Install Python, NumPy, PyArrow, and turbodbc.
ADD requirements*.txt .
RUN pip install --upgrade --requirement=requirements-prereq.txt
RUN MAKEFLAGS="-j$(nproc)" pip install --upgrade --requirement=requirements.txt --verbose

by-language/python-turbodbc/odbc.ini

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# More options:
# https://www.zabbix.com/documentation/current/en/manual/config/items/itemtypes/odbc_checks/unixodbc_postgresql

[postgresql]
Description = General ODBC for PostgreSQL

# General
FileUsage = 1

# If the driver manager was built with thread support, this entry
# alters the default thread serialization level (available since 1.6).
Threading = 2

# Linux
#Driver = /usr/lib64/libodbcpsql.so
#Setup = /usr/lib64/libodbcpsqlS.so

# macOS
Driver = /usr/local/lib/psqlodbcw.so
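
To see which driver file a data source name resolves to, the file above can
be inspected from Python. The sketch below uses the standard library's
``configparser`` on an inline copy of the relevant entries. This is an
assumption for illustration: unixODBC uses its own INI parser, which may
treat edge cases differently, and the function name ``read_dsn_drivers`` is
hypothetical.

```python
import configparser
from typing import Dict, Optional

# Inline copy of the relevant entries, so the sketch runs without a file.
ODBC_INI = """
[postgresql]
Description = General ODBC for PostgreSQL
FileUsage = 1
Threading = 2

# Linux
#Driver = /usr/lib64/libodbcpsql.so

# macOS
Driver = /usr/local/lib/psqlodbcw.so
"""


def read_dsn_drivers(text: str) -> Dict[str, Optional[str]]:
    """Map each data source name to its `Driver` entry, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: parser.get(section, "Driver", fallback=None)
        for section in parser.sections()
    }


print(read_dsn_drivers(ODBC_INI))
```

Lines starting with ``#`` are treated as comments by ``configparser``, so
only the active macOS ``Driver`` entry is reported for the ``postgresql``
data source.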
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
[tool.black]
line-length = 120

[tool.isort]
profile = "black"
skip_glob = "**/site-packages/**"
skip_gitignore = false
