Skip to content

javigoncompte/datastream-platform

Repository files navigation

🚀 DataStream Platform

�� Databricks Bundles Deploy Monorepo

Sync the una dependecy graph

uv run una sync

🔧 Prerequisites

  1. Install Databricks CLI 0.238 or later. See Install or update the Databricks CLI.

  2. Install uv. See Installing uv. We use uv to create a virtual environment and install the required dependencies.

  3. Authenticate to your Databricks workspace if you have not done so already:

    $ databricks configure
  4. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from https://docs.databricks.com/dev-tools/vscode-ext.html. Or read the "getting started" documentation for Databricks Connect for instructions on running the included Python code from a different IDE.

  5. For documentation on the Databricks Asset Bundles format used for this project, and for CI/CD configuration, see https://docs.databricks.com/dev-tools/bundles/index.html.

🚀 Deploy and run jobs

  1. Create a new virtual environment and install the required dependencies:

    $ uv sync --all-packages
  2. To deploy a specific package/app bundle to the development target:

    $ databricks bundle deploy --target dev --var="type_of_deployment=packages" --var="deployment_name=datastream"

    Note: "dev" is the default target, so the --target parameter is optional here.

    This deploys everything that's defined for this project. For example, the default template would deploy a job called [dev yourname] datastream_job to your workspace. You can find that job by opening your workspace and clicking on Workflows.

  3. Similarly, to deploy a production copy:

    $ databricks bundle deploy --target prod --var="type_of_deployment=packages" --var="deployment_name=datastream"
  4. To run a job:

    $ databricks bundle run

📚 MonoRepo Structure

📁 Structure

  • �� libs: importable packages, never run independently, do not have entry points
  • 🌐 apps: Applications that can be deployed to web servers, never imported
  • 📦 packages: Are just packages that you want to run either have entry points or just scripts calling custom packages never imported

🌊 Docker

You can use Docker or just normal wheel builds

🔄 Syncing

To make life easier while you're working across the workspace, you should run:

uv sync --all-packages

uv's sync behaviour is as follows:

  • If you're in the workspace root and you run uv sync, it will sync only the dependencies of the root workspace, which for this kind of monorepo should be bare. This is not very useful.
  • If you're in eg apps/myserver and run uv sync, it will sync only that package.
  • You can run uv sync --package=postmodern-server to sync only that package.
  • You can run uv sync --all-packages to sync all packages.

You can add an alias to your .bashrc/.zshrc if you like:

alias uvs="uv sync --all-packages"

📋 Dependencies

You'll notice that apps/mycli has urllib3 as a dependency. Because of this, every package in the workspace is able to import urllib3 in local development, even though they don't include it as a direct or transitive dependency.

This can make it possible to import stuff and write passing tests, only to have stuff fail in production, where presumably you've only got the necessary dependencies installed.

There are two ways to guard against this:

  1. If you're working only on eg libs/server, you can sync only that package (see above). This will make your LSP shout and your tests fail, if you try to import something that isn't available to that package.

  2. Your CI (see example in .github/workflows) should do the same.

Dev dependencies

This repo has all the global dev dependencies (basedpyright, pytest, ruff etc) in the root pyproject.toml, and then repeated again in each package without their version specifiers.

It's annoying to have to duplicate them, but at least excluding the versions makes it easier to keep things in sync.

The reason they have to be repeated, is that if you follow the example above and install only a specific package, it won't include anything specified in the root package.

⚡ Tasks/scripts

🧪 Testing

This repo includes a simple pytest test for each package.

make fmt
make lint
make check
make test

make all

Build

make build

✅ basedpyright

Note its a dev dependency so if you want to remove it and use ty or something else go ahead.

This repo actually uses basedpyright.

The following needs to be included with every package pyproject.toml:

[tool.basedpyright]
venvPath = "../.."       # point to the workspace root where the venv is
venv = ".venv"
strict = ["**/*.py"]
pythonVersion = "3.13"

Then you can run uv run poe check as for tests.

$ uvx --from build pyproject-build --installer=uv --outdir=dist --wheel apps/printer

🔨 Building Wheels

# Builds Evalgen a library
$ uvx --from build pyproject-build --installer=uv --outdir=dist  --wheel libs/evalgen

# Builds a package named datastream
$ uvx --with uvx-dynamic-versioning --from build pyproject-build --installer uv --wheel packages/datastream

About

OpenSource dataplatform Integrating frameworks

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages