Tabledancer is a simple application that lets you manage table lifecycles from YAML inside CI/CD.

Tabledancer is a lightweight tool aimed specifically at managing the lifecycle of tables defined by DDLs. It is by no means meant to be a comprehensive model management system such as dbt.
As shown in the conceptual diagram above, tabledancer expects to find two things in your git repository: the DDL of the table, expressed as YAML, and a lifecycle specification, also a YAML description, which tells tabledancer how to react when the DDL changes. These two are typically kept in a single YAML file, which we call a choreograph.
Let's have a look at a simple choreograph file below.
```yaml
backend: databricks
life_cycle_policy:
  policy: DropCreateOnSchemaChange
  properties: null
table_spec:
  name: simple_table
  database: tdtest
  comment: This is a simple table
  columns:
    - featureOne:
        type: int
        comment: It's a feature
    - featureTwo:
        type: string
        comment: It's another feature
  using: DELTA
```
The `backend` tag specifies which backend the target table resides in. This tells tabledancer which dancer to use; a dancer is simply the implementation for a particular backend.

The `life_cycle_policy` field defines what to do when the DDL spec in git differs from what is deployed in the database. The available policies and their properties are defined by the dancer. For example, the `DropCreateOnSchemaChange` policy shown above drops and recreates the table whenever the schema changes.

Finally, `table_spec` is simply a YAML representation of the DDL. This can be arbitrarily complex, as stipulated by the dancer.
Tabledancer isn't distributed on PyPI yet, as it's still in early development, so you'll have to install it from source. Simply clone down the repo using your favourite clone command.
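For example, with git (the URL below is a placeholder; substitute the actual repository location):

```bash
# Clone the tabledancer source and move into it.
# NOTE: the URL is a placeholder, not the real repository address.
git clone https://github.com/<org>/tabledancer.git
cd tabledancer
```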
Since this project was born out of work on Databricks, and working remotely with Databricks / Spark is a bit finicky (different versions of databricks-connect are required for different cluster runtimes), we have provided a special `--style` installation option to pull in the correct requirements.
It's highly recommended that you do this inside a virtualenv or a container, since databricks-connect doesn't play nicely with pyspark.
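For example, a minimal virtualenv setup might look like this:

```bash
# Create and activate an isolated environment before installing.
python -m venv .venv
source .venv/bin/activate
```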
Install as shown below, using the `--style` flag to specify the backend:
```bash
python setup.py install --user --style=databricks8.1
```