# tinysql

`tinysql` is a lightweight Object-Relational Mapping (ORM) layer designed to facilitate the management of tabular data, for instance data from domains such as computational neuroscience, data science, analytics, time series analysis, machine learning, and artificial intelligence.
It provides a minimalistic approach to map such data onto an SQLite database without obscuring the underlying SQL.
Binary data can be stored outside the database, because plenty of people need to share data with their colleagues, but nobody wants to send or copy a terabyte-sized database.
In fact, it most likely shouldn't even be called an ORM.
The primary goal of `tinysql` is to offer a barebones ORM that maintains the expressive power of SQL.
By not abstracting away SQL, `tinysql` ensures that users can fully leverage SQL while benefiting from a simplified interface to map tabular/struct data onto a database.
While more powerful alternatives like SQLAlchemy or DataJoint are available, they often come with additional complexity.
`tinysql` addresses the need for a straightforward, minimalistic solution by focusing on:
- Simplicity: Avoids the overhead of complex ORM frameworks.
- Direct SQL Access: Retains the ability to execute custom SQL queries and perform explicit data manipulations.
- Local and Portable Data Storage: binary data (e.g. numpy arrays, BLOBs, etc.) can be stored outside the database in the local file system, e.g. on disk, allowing exchange of such data with colleagues.

## Features

- Minimalistic Design: `tinysql` offers a simple and intuitive interface for database interactions.
- Direct SQL Execution: `tinysql` allows for writing and executing custom SQL scripts without restrictions.
- Flexible Data Handling: `tinysql` supports storing numpy arrays and other large data objects as BLOBs on disk, making data exchange easier.
## Use Cases

`tinysql` is particularly useful for scenarios such as:
- Local Data Analysis: You need a lightweight tool for working with data locally without the overhead of larger frameworks.
- Data Exchange: You want to share specific data files (e.g. numpy arrays) without transferring entire databases.
- Custom SQL: You want to write and execute custom SQL queries while still benefiting from ORM features.
## Installation

At the moment, `tinysql` is not yet available on PyPI. To install it, you therefore need to download or clone this repository and then use `pip`.
Example:

```sh
$ git clone https://github.com/nwaniek/tinysql.git
$ cd tinysql
$ pip install .
```

As soon as `tinysql` is available on PyPI, you will be able to install it by simply running

```sh
$ pip install tinysql
```
## Usage

To use `tinysql`, you define your tables and interact with your database using minimalistic ORM methods.
A brief example could look as follows:

```python
import numpy as np
from typing import NamedTuple
from tinysql import setup_db, Equals, Or, In, select, db_table

# this is an example class that is derived from NamedTuple and mapped to the
# database. It contains spiking data from neural recordings for a particular
# animal
@db_table('SpikeData', primary_keys=['id'])
class SpikeData(NamedTuple):
    id: str            # this could be a SHA1
    animal_name: str
    neuron_id: int
    spike_times: np.ndarray
    comment: str

# The following will open an existing database, or create one if it does not
# exist yet. If a second argument is given to setup_db, then tinysql will
# assume storage of BLOBs and ndarrays should happen outside the database,
# i.e. on disk or wherever the path points to.
with setup_db('database.db', '/path/to/external/storage') as context:
    # load some data, preprocess, etc...
    # once you have SpikeData with your data, we can insert it
    the_data = np.load('original_data_file_n123.npy')
    spikes = SpikeData(get_sha1('original_data_file_n123.npy'), 'Fievel', 123,
                       the_data, "Data from Fievel's 123rd neuron")

    # we can either use the free function "insert", or the context method:
    context.insert(spikes)
    # which is equivalent to: insert(context, spikes)

    # do something else, and now we want to analyse the data from Fievel and
    # Tanya. We can do so by using some basic Conditionals (Equals, Or, ...)
    # to restrict results
    results = select(context, SpikeData, Or(Equals('animal_name', 'Fievel'),
                                            Equals('animal_name', 'Tanya')))
    for result in results:
        print(result)

    # tinysql supports most SQL WHERE conditionals, so instead of combining
    # an Or and two Equals, we could also write
    results = select(context, SpikeData, In('animal_name', ['Fievel', 'Tanya']))
    for result in results:
        print(result)
```
## Enums

Of course, we also often use all kinds of enums to identify or flag things.
And, obviously, you should map your enums to the database, too.
This is why `tinysql` supports all standard Python enum types.

```python
from enum import Enum, IntEnum, auto
from tinysql import db_enum

# for instance, we might want to use an enum to identify the brain region
# in which the spike data was recorded
@db_enum("RecordingArea", descriptions={'PPC': 'Posterior Parietal Cortex',
                                        'EC': 'Entorhinal Cortex',
                                        'CA1': 'Cornu Ammonis 1',
                                        'CA3': 'Cornu Ammonis 3'})
class RecordingArea(Enum):
    PPC = "PPC"
    EC = "EC"
    CA1 = "CA1"
    CA3 = "CA3"

# db_enum doesn't care about the enum type, and you can also omit the
# descriptions if you don't want to document things in the database
@db_enum('MyIntEnum')
class MyIntEnum(IntEnum):
    One = auto()
    Two = auto()
    Three = auto()
```
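Independent of how `tinysql` actually serializes enums (the assumption here is that an enum's value is what ends up in the database), the benefit of mapping enums can be illustrated with plain `sqlite3` from the standard library:

```python
import sqlite3
from enum import Enum

class RecordingArea(Enum):
    PPC = "PPC"
    EC = "EC"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE recordings (area TEXT)")
# store the enum's value, not the member itself
con.execute("INSERT INTO recordings VALUES (?)", (RecordingArea.PPC.value,))

# reading back: reconstruct the enum member from the stored value
row = con.execute("SELECT area FROM recordings").fetchone()
area = RecordingArea(row[0])
assert area is RecordingArea.PPC
```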
## Conditionals

Despite not really being a full-fledged ORM, `tinysql` provides a means to write conditionals that are translated to SQL. In the spirit of `tinysql`, they are kept as minimalistic as possible and as close to SQL as it gets:

```python
from tinysql import select, Not, GreaterThan

results = select(context, AmazingValues, Not(GreaterThan('value1', 70.0)))
for obj in results:
    print(obj)
```
`tinysql` currently provides Equals, NotEquals, GreaterThan, LessThan, Between, Like, In, And, Or, and Not. You can nest them arbitrarily and thereby build complex expressions, but then again you might just drop into SQL to achieve this, as will be shown next.
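For orientation, the nested conditional above corresponds to SQL roughly like this (the exact text that `tinysql` generates may differ):

```sql
SELECT * FROM AmazingValues WHERE NOT (value1 > 70.0);
```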
## Direct SQL

`tinysql` does not hide the connection to the sqlite database it is connected to (after using it as a context manager or running init_tables). It also provides methods such as `select` to which you can pass a raw SQL expression, and which will fill a particular class with the results:

```python
results = select(context, AmazingValues, "WHERE value1 >= ? AND value1 < ?", (70.0, 120.0))
for obj in results:
    print(obj)
```

If you use `select`, or any other SQL passthrough method, it is up to you to make sure that the result from the database can be accepted by the constructor of the class that you pass in. That is, under the hood, `tinysql` merely forwards the results via `cls(*row)`.
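The `cls(*row)` forwarding is easy to picture with the standard library alone. The following sketch mimics the mechanism using plain `sqlite3` and a hypothetical `AmazingValues` class; it is not `tinysql`'s actual code:

```python
import sqlite3
from typing import NamedTuple

class AmazingValues(NamedTuple):
    id: int
    value1: float

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AmazingValues (id INTEGER, value1 REAL)")
con.executemany("INSERT INTO AmazingValues VALUES (?, ?)", [(1, 50.0), (2, 80.0)])

# each row tuple must line up with the constructor's positional arguments,
# which is exactly the contract cls(*row) imposes
rows = con.execute("SELECT * FROM AmazingValues WHERE value1 >= ? AND value1 < ?",
                   (70.0, 120.0))
results = [AmazingValues(*row) for row in rows]
assert results == [AmazingValues(id=2, value1=80.0)]
```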
It is also possible to directly write SQL statements and execute them as you usually would with sqlite:

```python
with setup_db('mydatabase.sqlite') as context:
    cur = context.con.cursor()
    rows = cur.execute("SELECT * FROM AmazingValues")
    for row in rows:
        print(row)
```
Moreover, `tinysql` provides methods like `execute` and `executemany` that pass directly through to the connection and commit the statement, to save you a few keystrokes:

```python
with setup_db('mydatabase.sqlite') as context:
    context.executemany("INSERT INTO MyTable VALUES (?)", [("one",), ("two",)])
```

which is equivalent to

```python
with setup_db('mydatabase.sqlite') as context:
    cur = context.con.cursor()
    cur.executemany("INSERT INTO MyTable VALUES (?)", [("one",), ("two",)])
    context.con.commit()
```

Does it save much? No. Is it convenient? Yes.
## Autoincrement Fields

Sometimes there's a need for an autoincrement field. `tinysql` supports this, but be aware that sqlite treats autoincrement specially: an autoinc field must be a primary key, and there can be only one primary key in the table. If you attempt to create tinysql-mapped tables with autoinc fields and more than one primary key, `tinysql` will raise an exception! Read more about sqlite's autoincrement in the sqlite documentation.
```python
from typing import NamedTuple
from tinysql import autoinc, db_table

# to create an autoinc field, simply use tinysql's autoinc type
@db_table('FancyData', primary_keys=['id'])
class FancyData(NamedTuple):
    id: autoinc
    stuff: str

# when creating a new instance of FancyData, you need to pass an instance of
# autoinc to FancyData. tinysql will filter out autoinc fields when
# inserting data into the database. when loading data, you'll get a regular
# integer back.
my_data = FancyData(autoinc(), 'really amazing data!')
```
There's another subtle issue with autoinc, namely when using `tinysql` with external storage for BLOBs (see further below). At the time an entry is written into the database, or more precisely before the data is written to the table, the value of the autoinc field might not yet be determined. Yet, the primary key(s) of a mapped/registered class are used to produce the filename under which an ndarray is stored.
As a general recommendation: don't mix autoinc fields with BLOB fields in one class. Rather, use another form of primary key, something that can be determined at runtime before writing things to the database, such as a SHA1 over your data or a time-based UUID.
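Both recommended key styles can be produced with the standard library alone. The `get_sha1` helper used in the usage example above is not part of `tinysql`; a plausible implementation, shown purely as a sketch, might look like this:

```python
import hashlib
import uuid

def get_sha1(path: str) -> str:
    # hash the file's content; the digest is stable and known before any
    # insert happens, so it can serve as a primary key for external storage
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# a time-based UUID is another key that is fixed before the insert
key = uuid.uuid1().hex
assert len(key) == 32
```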
## UUIDs

For convenience, and to ameliorate the situation regarding autoincrement and external storage, `tinysql` provides a specific class `uuid`. Well, it really is just a wrapper around `str` and the function `gen_uuid()`, which in turn simply calls `uuid4().hex` from Python's `uuid` module... The reason `tinysql.uuid` exists is to make this type somewhat explicit, with the goal to improve the self-documentation level of code.
Here's how to use it:

```python
from dataclasses import dataclass
from tinysql import db_table, uuid

@db_table("UUIDTest", primary_keys=["id"])
@dataclass
class UUIDTest:
    id: uuid

def test_uuid(context):
    context.insert(UUIDTest(uuid()))
    for obj in context.select(UUIDTest):
        print(obj)
```
As with anything else in `tinysql`, it is kept as barebones as it gets. That means that you have to specify the value itself during construction (see the `context.insert(...)` line). You could also move this into a custom constructor or use `id: uuid = field(default_factory=lambda: uuid())`, but this would likely break `tinysql`'s `select` statement, which merely passes each result row from a database query to the constructor of a class.
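Since the text above states that `gen_uuid()` simply calls `uuid4().hex`, the stdlib equivalent is:

```python
from uuid import uuid4

def gen_uuid() -> str:
    # what tinysql.gen_uuid() reportedly does: a random UUID as a hex string
    return uuid4().hex

value = gen_uuid()
assert isinstance(value, str) and len(value) == 32
```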
## Working with Multiple Databases

Other times, you might want to work with several databases at the same time, and of course this is possible with `tinysql`.
By default, every call to `db_table` registers a table in a global registry, and also by default, every context inherits this registry. This is useful when working with several databases that have an identical layout. Other databases are not identical, and thus a `DatabaseContext` should map only those tables that actually reside in the database; or maybe you wish to use different contexts for different views into a database, effectively restricting the tables they can handle.
The default variant, meaning every context inherits all tables, looks like this:

```python
from dataclasses import dataclass
from tinysql import db_table, DatabaseContext

# register one table
@db_table("StringData")
@dataclass
class StringData:
    data: str

# register another table
@db_table("FloatData")
@dataclass
class FloatData:
    data: float

# declare two contexts, inheriting the global registry
context1 = DatabaseContext("db1.sqlite", None)
context2 = DatabaseContext("db2.sqlite", None)

with context1:
    context1.insert(StringData("wow!"))
    context1.insert(FloatData(1.23))
# the same works for context2...
```
Next follows an example in which the two databases have different tables:

```python
from dataclasses import dataclass
from tinysql import db_table, DatabaseContext, TableNotMappedError

# register one table
@db_table("StringData")
@dataclass
class StringData:
    data: str

# register another table
@db_table("FloatData")
@dataclass
class FloatData:
    data: float

# declare two contexts, each having a specific registry defined by the
# classes argument that is passed in
context1 = DatabaseContext("db1.sqlite", None, [StringData])
context2 = DatabaseContext("db2.sqlite", None, [FloatData])

with context1:
    context1.insert(StringData("wow!"))
    # the next line will raise a TableNotMappedError, because context1
    # doesn't know about FloatData
    try:
        context1.insert(FloatData(1.23))
    except TableNotMappedError:
        pass

# context2 only knows about FloatData. Instead of using a context manager,
# we can also use manual calls to open and initialize the database:
context2.open()

# if necessary, finer control can be achieved by calling connect() and
# init_tables() on the context instead of open():
#context2.connect()
#context2.init_tables()

# now we can use the context
context2.insert(FloatData(42.0))

# make sure to close the context when you're done. This will close the
# connection to the database
context2.close()
```
You can also completely disable the global registry after importing `tinysql`:

```python
import tinysql
tinysql.configure(use_global_registry=False)
```
## External Data Storage

If you have read the examples above, then you might wonder what the second argument, `None`, in the constructor of `DatabaseContext` is about.
`tinysql` allows storing BLOBs, such as numpy arrays or other large binary objects, outside of the database itself. This can be useful when sharing such files with colleagues without having to share the entire sqlite database. To enable external data storage, simply pass the path to a folder in which the external data should be stored to a `DatabaseContext`:

```python
# database (context) with external data storage
context = tinysql.DatabaseContext('test.sqlite', 'test_storage')
```
In this example, `tinysql` will store BLOB types within the directory `test_storage`. More precisely, it will create sub-directories based on the table/class name, and filenames based on the primary keys of the table and the name of the field that is stored.
`tinysql` stores numpy arrays as `npz` files, and uses `pickle` for all objects of type `bytes`, `bytearray`, or `memoryview`. Further types can be handled by the external storage approach by adding them to `tinysql`'s `TYPE_MAPPING` variable and setting the flag to `TypeFlags.BLOB` (see `tinysql.py` for examples).
## Extending tinysql with Custom Types

If you wish to extend `tinysql` with types other than the standard types it already supports (autoinc, np.ndarray, and other BLOBs), then best have a look at `tinysql`'s `TYPE_MAPPING` variable. This is simply a dict which maps a type that you want to use in a type annotation to the sqlite database type and some additional flag. You can either inject your own type mappings into `TYPE_MAPPING`, or change it directly there (remember, `tinysql` is as basic as it gets, and a 'single file package').
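As a sketch only: the snippet below uses stand-ins for `TYPE_MAPPING` and `TypeFlags`, and assumes each entry maps a Python type to a (sqlite type, flags) pair; check `tinysql.py` for the actual structure before copying this.

```python
from enum import Flag, auto

class TypeFlags(Flag):
    # stand-in for tinysql.TypeFlags; the real flags live in tinysql.py
    NONE = auto()
    BLOB = auto()

# stand-in for tinysql.TYPE_MAPPING: annotation type -> (sqlite type, flags).
# The actual entry shape in tinysql.py may differ.
TYPE_MAPPING = {
    str:   ("TEXT", TypeFlags.NONE),
    bytes: ("BLOB", TypeFlags.BLOB),
}

class WaveletCoeffs:
    # hypothetical user type that should be stored externally as a BLOB
    pass

# injecting the custom type: store it as a BLOB
TYPE_MAPPING[WaveletCoeffs] = ("BLOB", TypeFlags.BLOB)
assert TYPE_MAPPING[WaveletCoeffs] == ("BLOB", TypeFlags.BLOB)
```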
## Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to contribute code, please open an issue or submit a pull request on GitHub.

## License

`tinysql` is licensed under the MIT License.
See the LICENSE file for details.