
tinysql - A minimalistic object-relational mapper

Introduction

tinysql is a lightweight Object-Relational Mapping (ORM) layer designed to facilitate the management of tabular data, for instance data from domains such as computational neuroscience, data science, analytics, time series analysis, machine learning, and artificial intelligence. It provides a minimalistic approach to map such data onto an SQLite database without obscuring the underlying SQL. Binary data can be stored outside the database, because sharing data with colleagues shouldn't mean sending or copying a terabyte-sized database. In fact, tinysql most likely shouldn't even be called an ORM.

Goal

The primary goal of tinysql is to offer a barebones ORM that maintains the expressive power of SQL. By not abstracting away SQL, tinysql ensures that users can fully leverage SQL while benefiting from a simplified interface to map tabular/struct data onto a database.

Why tinysql?

While more powerful alternatives like SQLAlchemy or DataJoint are available, they often come with additional complexity. tinysql addresses the need for a straightforward, minimalistic solution by focusing on:

  • Simplicity: Avoids the overhead of complex ORM frameworks.
  • Direct SQL Access: Retains the ability to execute custom SQL queries and perform explicit data manipulations.
  • Local and Portable Data Storage: binary data (e.g. numpy arrays or other BLOBs) can be stored outside the database in the local file system, making it easy to exchange such data with colleagues.

Features

  • Minimalistic Design: tinysql offers a simple and intuitive interface for database interactions.
  • Direct SQL Execution: tinysql allows for writing and executing custom SQL scripts without restrictions.
  • Flexible Data Handling: tinysql supports storing numpy arrays and other large data objects as BLOBs on disk, making data exchange easier.

Use Cases

tinysql is particularly useful for scenarios such as:

  • Local Data Analysis: You need a lightweight tool for working with data locally without the overhead of larger frameworks.
  • Data Exchange: You want to share specific data files (e.g. numpy arrays) without transferring entire databases.
  • Custom SQL: You want to write and execute custom SQL queries while still benefiting from ORM features.

Installation

At the moment, tinysql is not yet available on PyPI. To install it, you therefore need to download or clone this repository and then use pip. Example:

$ git clone https://github.com/nwaniek/tinysql.git
$ cd tinysql
$ pip install .

Once tinysql is available on PyPI, you will be able to install it by simply running

$ pip install tinysql

Usage

To use tinysql, you define your tables and interact with your database using minimalistic ORM methods. A brief example could look as follows:

import hashlib
from typing import NamedTuple
import numpy as np
from tinysql import setup_db, Equals, Or, In, select, db_table
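
# small helper (not part of tinysql) that computes a SHA1 over a file's
# contents, which serves as the primary key below
def get_sha1(filename: str) -> str:
    with open(filename, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest()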

# this is an example class that is derived from NamedTuple and mapped to the
# database. It contains spiking data from neural recordings for a particular
# animal
@db_table('SpikeData', primary_keys=['id'])
class SpikeData(NamedTuple):
    id: str # this could be a SHA1
    animal_name: str
    neuron_id: int
    spike_times: np.ndarray
    comment: str

# The following will open an existing database, or create one if it does not
# exist yet. If a second argument is given to setup_db, then tinysql will
# assume storage of BLOBs and ndarrays should happen outside the database,
# i.e. on disk or wherever the path points to.
with setup_db('database.db', '/path/to/external/storage') as context:

    # load some data from disk, preprocess, etc...
    # once you have a SpikeData instance filled with your data, we can insert it
    the_data = np.load('original_data_file_n123.npy')
    spikes = SpikeData(get_sha1('original_data_file_n123.npy'), 'Fievel', 123, the_data, "Data from Fievel's 123rd neuron")
    # we can either use the free function "insert", or the context method:
    context.insert(spikes)
    # which is equivalent to: insert(context, spikes)

    # do something else, and now we want to analyse the data from Fievel and
    # Tanya. We can do so by using some basic conditionals (Equals, Or, ...)
    # to restrict results
    results = select(context, SpikeData, Or(Equals('animal_name', 'Fievel'), Equals('animal_name', 'Tanya')))
    for result in results:
        print(result)

    # tinysql supports most SQL WHERE conditionals, so instead of combining
    # an Or and two Equals, we could also write the previous query as
    results = select(context, SpikeData, In('animal_name', ['Fievel', 'Tanya']))
    for result in results:
        print(result)

Enums

Of course, we often use all kinds of enums to identify or flag things, and naturally you should be able to map your enums to the database, too. This is why tinysql supports all standard Python enum types.

from enum import Enum, IntEnum, auto
from tinysql import db_enum

# for instance, we might want to use an enum to identify the brain region
# in which the spike data was recorded
@db_enum("RecordingArea", descriptions={'PPC': 'Posterior Parietal Cortex', 'EC': 'Entorhinal Cortex', 'CA1': 'Cornu Ammonis 1', 'CA3': 'Cornu Ammonus 3'})
class RecordingArea(Enum):
    PPC = "PPC"
    EC  = "EC"
    CA1 = "CA1"
    CA3 = "CA3"

# db_enum doesn't care about the enum type, and you can also omit the
# description if you don't want to document things in the database
@db_enum('MyIntEnum')
class MyIntEnum(IntEnum):
    One = auto()
    Two = auto()
    Three = auto()

Conditions

Despite not really being a full-fledged ORM, tinysql provides a means to write conditionals that are translated to SQL. In the spirit of tinysql, they are kept as minimalistic as possible and as close to SQL as it gets:

from tinysql import select, Not, GreaterThan

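# AmazingValues is assumed to be a class previously mapped via db_table,
# and context an open DatabaseContext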
results = select(context, AmazingValues, Not(GreaterThan('value1', 70.0)))
for obj in results:
    print(obj)

tinysql currently provides Equals, NotEquals, GreaterThan, LessThan, Between, Like, In, And, Or, and Not. You can nest them arbitrarily and thereby build complex expressions, as sketched below, but for truly complex queries you might simply drop into SQL, as shown in the next section.
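As a hypothetical sketch of such nesting, reusing the SpikeData class and open context from the usage example (the (field, low, high) signature of Between is an assumption based on the pattern of the other conditionals):

from tinysql import select, And, In, Not, Between

# Fievel's and Tanya's recordings, excluding a range of neuron ids;
# Between's (field, low, high) signature is an assumption
cond = And(In('animal_name', ['Fievel', 'Tanya']),
           Not(Between('neuron_id', 100, 200)))
results = select(context, SpikeData, cond)
for result in results:
    print(result)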

Direct SQL passthrough

tinysql does not hide the connection to the sqlite database it is connected to (after using it as a context manager or running init_tables). It also provides some passthrough methods that fill specific objects for you: with select, for instance, you can pass a raw SQL WHERE expression, and tinysql will fill a particular class with the results:

results = select(context, AmazingValues, "WHERE value1 >= ? AND value1 < ?", (70.0, 120.0, ))
for obj in results:
    print(obj)

If you use select, or any other SQL passthrough method, it is up to you to make sure that the result from the database can be accepted by the constructor of the class that you pass in. That is, under the hood, tinysql merely forwards the results via cls(*row).

It is also possible to directly write SQL statements and execute them as you usually would with sqlite:

with setup_db('mydatabase.sqlite') as context:
    cur = context.con.cursor()
    rows = cur.execute("SELECT * FROM AmazingValues")
    for row in rows:
        print(row)

Moreover, tinysql provides some methods like execute and executemany that pass directly through to the connection and commit the statement, saving you a few keystrokes:

with setup_db('mydatabase.sqlite') as context:
    context.executemany("INSERT INTO MyTable VALUES (?)", [("one",), ("two",)])

which is equivalent to

with setup_db('mydatabase.sqlite') as context:
    cur = context.con.cursor()
    cur.executemany("INSERT INTO MyTable VALUES (?)", [("one",), ("two",)])
    context.con.commit()

Does it save much? No. Is it convenient? Yes.

Autoincrement

Sometimes there's a need for an autoincrement field. tinysql supports this, but be aware that sqlite treats autoincrement specially: an autoinc field must be a primary key, and it must be the only primary key in the table. If you attempt to create tinysql-mapped tables with autoinc fields and more than one primary key, tinysql will raise an exception! Read more about autoincrement in the sqlite documentation.

from typing import NamedTuple
from tinysql import autoinc, db_table

# to create an autoinc field, simply use tinysql's autoinc type
@db_table('FancyData', primary_keys=['id'])
class FancyData(NamedTuple):
    id : autoinc
    stuff: str

# when creating a new instance of FancyData, you need to pass an instance of
# autoinc to FancyData. tinysql will filter out autoinc fields when
# inserting data into the database. when loading data, you'll get a regular
# integer back.
my_data = FancyData(autoinc(), 'really amazing data!')

There's another subtle issue with autoinc, namely when using tinysql with external storage for BLOBs (see further below). At the time an entry is written to the database, or more precisely before the data is written to the table, the value of the autoinc field might not yet be determined. Yet the primary key(s) of a mapped/registered class are used to produce the filename under which an ndarray will be stored.

As a general recommendation: don't mix autoinc fields with BLOB fields in one class. Rather, use another form of primary key, something that can be determined at runtime before writing things to the database, such as a SHA1 over your data, or a time-based UUID.
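A minimal sketch of this pattern, assuming an open context as in the usage example and a SHA1 computed with Python's hashlib (HashedData is a hypothetical class):

import hashlib
import numpy as np
from typing import NamedTuple
from tinysql import db_table

@db_table('HashedData', primary_keys=['id'])
class HashedData(NamedTuple):
    id: str           # SHA1 over the raw data, known before insertion
    values: np.ndarray

data = np.arange(10, dtype=np.float64)
# the key is fully determined before writing to the database, so any
# externally stored BLOB filename derived from it is stable
key = hashlib.sha1(data.tobytes()).hexdigest()
context.insert(HashedData(key, data))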

UUIDs

For convenience, and to ameliorate the situation regarding autoincrement and external storage, tinysql provides a specific class uuid. Well, it really is just a wrapper around str and the function gen_uuid(), which in turn simply calls uuid4().hex from Python's uuid module... The reason tinysql.uuid exists is to make this type somewhat explicit, with the goal of improving the self-documentation level of code.

Here's how to use it:

from dataclasses import dataclass
from tinysql import db_table, uuid

@db_table("UUIDTest", primary_keys=["id"])
@dataclass
class UUIDTest:
    id: uuid

def test_uuid(context):
    context.insert(UUIDTest(uuid()))
    for obj in context.select(UUIDTest):
        print(obj)

As with anything else in tinysql, this is kept as barebones as it gets. That means you have to specify the value yourself during construction (see the context.insert(...) line). You could also move this into a custom constructor or use id: uuid = field(default_factory=lambda: uuid()), but this would likely break tinysql's select statement, which merely passes each result row from a database query to the constructor of a class.
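For illustration, such a default_factory variant could look like the following sketch (UUIDTestAuto is a hypothetical class name); keep the caveat above in mind:

from dataclasses import dataclass, field
from tinysql import db_table, uuid

@db_table("UUIDTestAuto", primary_keys=["id"])
@dataclass
class UUIDTestAuto:
    # generates a fresh uuid when no id is passed explicitly; note that
    # select() constructs instances via cls(*row), so defaulted fields
    # may interact badly with positional construction
    id: uuid = field(default_factory=uuid)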

Working with several databases

Other times, you might want to work with several databases at the same time, and of course this is possible with tinysql.

By default, every call to db_table registers a table in a global registry, and also by default, every context inherits this registry. This is useful when working with several databases that have an identical layout. When databases are not identical, however, a DatabaseContext should map only those tables that actually reside in its database; alternatively, you might wish to use different contexts for different views into one database, effectively restricting the tables they can handle.

The default variant, meaning every context inherits all tables, looks like this:

from dataclasses import dataclass
from tinysql import db_table, DatabaseContext

# register one table
@db_table("StringData")
@dataclass
class StringData:
    data: str

# register another table
@db_table("FloatData")
@dataclass
class FloatData:
    data: float

# declare two contexts, inheriting the global registry
context1 = DatabaseContext("db1.sqlite", None)
context2 = DatabaseContext("db2.sqlite", None)

with context1:
    context1.insert(StringData("wow!"))
    context1.insert(FloatData(1.23))
# the same works for context2...

Next follows an example in which the two databases have different tables:

from dataclasses import dataclass
from tinysql import db_table, DatabaseContext, TableNotMappedError

# register one table
@db_table("StringData")
@dataclass
class StringData:
    data: str

# register another table
@db_table("FloatData")
@dataclass
class FloatData:
    data: float

# declare two contexts, each having a specific registry defined by the
# classes argument that is passed in
context1 = DatabaseContext("db1.sqlite", None, [StringData])
context2 = DatabaseContext("db2.sqlite", None, [FloatData])

with context1:
    context1.insert(StringData("wow!"))
    # the next line will raise a TableNotMappedError, because context1
    # doesn't know about FloatData
    try:
        context1.insert(FloatData(1.23))
    except TableNotMappedError:
        pass

# context2 only knows about FloatData. Instead of using a context manager,
# we can also use manual calls to open and initialize the database:
context2.open()
# if necessary, finer control can be achieved by calling connect() and init_tables() on
# the context instead of open():
#context2.connect()
#context2.init_tables()
# now we can use the context
context2.insert(FloatData(42.0))
# make sure to close the context when you're done. This will close the
# connection to the database
context2.close()

You can also completely disable the global registry after importing tinysql:

import tinysql
tinysql.configure(use_global_registry=False)

External Table Storage

If you have read the examples above, you might wonder what the second argument, None, in the constructor of DatabaseContext is about.

tinysql allows you to store BLOBs, such as numpy arrays or other large binary objects, outside of the database itself. This can be useful for sharing such files with colleagues without having to share the entire sqlite database. To enable external data storage, simply pass the path of the folder in which the external data should be stored to a DatabaseContext.

# database (context) with external data storage
context = tinysql.DatabaseContext('test.sqlite', 'test_storage')

In this example, tinysql will store BLOB types within the directory test_storage. More precisely, it will create sub-directories based on the table/class name, and filenames based on the primary keys of the table and the name of the field being stored.
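As an illustration, storing an object of the SpikeData class from the usage example in the context above might produce a layout along the following lines (the exact naming scheme is determined by tinysql, so treat this as a hypothetical sketch; <primary-key> stands for the value of the id field):

test_storage/
    SpikeData/
        <primary-key>_spike_times.npz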

tinysql will store numpy arrays as npz files, and use pickle for all objects of type bytes, bytearray, or memoryview. Further types can be treated under the external table storage approach by adding them to tinysql's TYPE_MAPPING variable and setting the flag to TypeFlags.BLOB (see tinysql.py for examples).

Extending tinysql with other types

If you wish to extend tinysql with types beyond the standard ones it already supports (autoinc, np.ndarray, and other BLOBs), the best starting point is tinysql's TYPE_MAPPING variable. This is simply a dict that maps a type you want to use in a type annotation to the corresponding sqlite database type plus an additional flag. You can either inject your own type mappings into TYPE_MAPPING from your code, or change it directly in the source (remember, tinysql is as basic as it gets, and a 'single file package').
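As a rough sketch, injecting a custom mapping could look as follows. The exact structure of the TYPE_MAPPING entries is defined in tinysql.py, so the entry format below (an sqlite storage type paired with a TypeFlags value) is an assumption that should be checked against the source:

import tinysql

# some custom type whose instances should be stored as external BLOBs
class MyBinaryType:
    pass

# hypothetical entry format; compare with the existing entries in
# tinysql.py before relying on this
tinysql.TYPE_MAPPING[MyBinaryType] = ('BLOB', tinysql.TypeFlags.BLOB)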

Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to contribute code, please open an issue or submit a pull request on GitHub.

License

tinysql is licensed under the MIT License. See the LICENSE file for details.
