tabledancer
Tabledancer is a simple application that lets you manage table lifecycles from YAML inside your CI/CD pipeline.

Concept

Tabledancer is a lightweight application aimed specifically at managing the lifecycle of tables defined by DDLs. It is by no means meant to be a comprehensive model-management system such as dbt.

(Conceptual diagram)

How does it work?

As shown in the conceptual diagram above, tabledancer expects to find two things in your git repository: the DDL of the table, written as YAML, and a lifecycle specification, also YAML, which tells tabledancer how to react when the DDL changes. The two are typically combined in a single YAML file, called a choreograph.

Let's have a look at a sample choreograph file below.

```yaml
backend: databricks
life_cycle_policy:
  policy: DropCreateOnSchemaChange
  properties: null
table_spec:
  name: simple_table
  database: tdtest
  comment: This is a simple table
  columns:
    - featureOne:
        type: int
        comment: It's a feature
    - featureTwo:
        type: string
        comment: It's another feature
  using: DELTA
```

The backend tag specifies which backend the target table resides in. This tells tabledancer to use the correct dancer, which is the backend-specific implementation.
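To illustrate the idea, backend-to-dancer dispatch could be sketched roughly as below. The class and function names here are hypothetical, not tabledancer's actual API:

```python
# Hypothetical sketch of backend -> dancer dispatch; not tabledancer's real API.


class Dancer:
    """Base class for a backend implementation (a "dancer")."""

    def reconcile(self, choreograph: dict) -> str:
        raise NotImplementedError


class DatabricksDancer(Dancer):
    def reconcile(self, choreograph: dict) -> str:
        # A real dancer would compare the spec against the live table here.
        return f"reconciling {choreograph['table_spec']['name']} on databricks"


# Registry mapping the `backend` tag in the choreograph to a dancer class.
DANCERS = {"databricks": DatabricksDancer}


def get_dancer(choreograph: dict) -> Dancer:
    backend = choreograph["backend"]
    try:
        return DANCERS[backend]()
    except KeyError:
        raise ValueError(f"No dancer implemented for backend: {backend}")
```

Adding support for a new backend would then amount to registering a new dancer class under its backend tag.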

The life_cycle_policy field defines what to do when there is a difference between the DDL spec in git and what's deployed in the database. The available options and fields are implemented by the dancer.
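For instance, a DropCreateOnSchemaChange policy could be sketched as follows. This is an illustrative sketch only; the function names and the exact comparison logic are assumptions, not tabledancer's implementation:

```python
# Hypothetical sketch of a DropCreateOnSchemaChange decision: if the column
# schema in git differs from the deployed schema, drop and recreate the table.


def schema_changed(spec_columns: list, live_columns: list) -> bool:
    """Compare column definitions from the choreograph against the live table."""
    return spec_columns != live_columns


def plan_actions(spec_columns: list, live_columns: list) -> list:
    if schema_changed(spec_columns, live_columns):
        return ["DROP TABLE", "CREATE TABLE"]
    return []  # schemas match: nothing to do
```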

Finally, the table_spec is simply a YAML representation of the DDL. It can be arbitrarily complex, as stipulated by the dancer.
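To make that concrete, here is a rough sketch of how a table_spec like the one above might be rendered into a DDL statement. The rendering function is hypothetical; the actual SQL a dancer emits is up to that dancer:

```python
# Hypothetical rendering of a table_spec into a CREATE TABLE statement.
# This only illustrates the mapping from YAML fields to DDL clauses.


def render_ddl(table_spec: dict) -> str:
    cols = []
    for col in table_spec["columns"]:
        # Each list item is a single-key mapping: {name: {type, comment}}
        (name, props), = col.items()
        cols.append(f"{name} {props['type'].upper()} COMMENT '{props['comment']}'")
    return (
        f"CREATE TABLE {table_spec['database']}.{table_spec['name']} "
        f"({', '.join(cols)}) USING {table_spec['using']} "
        f"COMMENT '{table_spec['comment']}'"
    )
```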

Installation

Tabledancer is not yet distributed on PyPI, as it's still in early development, so you'll have to install from source. Simply clone the repo using your favourite clone command.

Since this project was born out of work in Databricks, and remote work with Databricks/Spark is a bit finicky (different versions of databricks-connect are required for different cluster runtimes), we provide a special --style installation option to pull in the correct requirements.

It's highly recommended that you do this inside a virtualenv or a container, since databricks-connect does not play well with an existing pyspark installation.

Install as below, using the --style flag to select the backend.

```shell
python setup.py install --user --style=databricks8.1
```
