This project automates the download, standardization, database import, and representation of Swiss health insurance data from open sources such as opendata.swiss. It processes all available data up to the current year and prepares it for use in analytics or dashboards.
- 🐍 Python 3.10 + Pipenv
- 🐬 MariaDB 11.3
- 🐳 Docker & Docker Compose
- 📦 CSV-based datasets from opendata.swiss
The idea is to keep complete, self-contained pipelines inside the "datasets" directory. Each pipeline directory contains all the instructions needed for:
- Downloading the needed files / databases for building a dataset
- Cooking the dataset, mixing and transforming data
- Loading the dataset into a database
Available pipelines:
- Lamal
- Tarmed (in progress)
The easiest way to run everything is using Docker Compose. It handles dataset generation and database provisioning automatically.
docker compose --profile PIPELINE_NAME up -d
For example: docker compose --profile Lamal up -d
What this does:
- Downloads all raw data files
- Unzips and cleans them
- Standardizes everything into .csv format. You can find the results (the raw CSV files) in build/export of the pipeline directory
- Starts a MariaDB container
- Imports the data using
All important variables are declared in the .dataset.env file of your pipeline directory. Example:
# Dataset archive URLs
export DATASET_ARCHIVES="Archiv_Praemien_2011.zip|https://...;Archiv_Praemien_2012.zip|https://...;..."
export DATASET_LAST_YEAR="2025"
export DATASET_LAST_URL_CH="https://..."
export DATASET_LAST_URL_EU="https://..."
In the .env file of the main directory, you can also find environment variables for configuring Docker Compose:
# DB Credentials
MYSQL_ROOT_PASSWORD=root
MYSQL_DATABASE=lamal
MYSQL_USER=lamal
MYSQL_PASSWORD=lamal
If you want to build the dataset without Docker, follow these instructions:
sudo apt-get install python3 python3-pip unzip pipenv
# Go to your build dataset directory (cd datasets/Lamal/build)
pip install pipenv  # not necessary if pipenv is already installed
pipenv install
# Go to your build dataset directory (cd datasets/Lamal/build)
pipenv run bash utils/generate_dataset.sh
- Start a local MariaDB/MySQL instance.
- Modify the paths inside CreateAndImportData.sql so that they point to the export directory. Example: '/app/export/assurances.csv'
- Run the import script manually:
mysql -u lamal -p lamal < CreateAndImportData.sql
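When running the steps by hand, it can help to inspect the archive list from .dataset.env. Judging from the example value, DATASET_ARCHIVES looks like a `;`-separated list of `filename|url` pairs. The sketch below parses that shape in Python; the function name `parse_archives` and the error handling are assumptions for illustration, not part of the project.

```python
def parse_archives(value: str) -> dict[str, str]:
    """Split 'name|url;name|url;...' into a {filename: url} mapping.

    Assumes ';' separates entries and '|' separates filename from URL,
    as suggested by the DATASET_ARCHIVES example value.
    """
    archives: dict[str, str] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or empty segments
        name, sep, url = entry.partition("|")
        name, url = name.strip(), url.strip()
        if not sep or not name or not url:
            raise ValueError(f"malformed archive entry: {entry!r}")
        archives[name] = url
    return archives
```

After sourcing .dataset.env, the value could be read with `os.environ["DATASET_ARCHIVES"]` and passed to this function to list which archives a pipeline would download.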
Swiss federal health data is inconsistent:
- Different encodings (UTF-8, latin-1)
- Column names change over time
- Values and enums are not standardized
This pipeline ensures all years conform to a unified schema for further processing.
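To illustrate the kind of normalization involved, here is a minimal Python sketch that decodes a file as UTF-8 with a latin-1 fallback and maps year-specific column names to unified ones. It is not the project's actual code: the alias table, the unified column names, and the `;` delimiter are hypothetical placeholders.

```python
import csv
import io

# Hypothetical aliases: header variants across years -> unified column name.
COLUMN_ALIASES = {
    "versicherer": "insurer_id",
    "insurer": "insurer_id",
    "prämie": "premium",
    "praemie": "premium",
}


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 first and fall back to latin-1 when that fails."""
    try:
        return raw.decode("utf-8-sig")  # also drops a leading BOM if present
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def normalize_header(name: str) -> str:
    """Trim and lowercase a header, then map it to its unified name if known."""
    key = name.strip().lower()
    return COLUMN_ALIASES.get(key, key)


def standardize_csv(raw: bytes, delimiter: str = ";") -> list[dict[str, str]]:
    """Return rows keyed by unified column names, with values stripped."""
    text = decode_bytes(raw)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = [normalize_header(h) for h in next(reader)]
    return [dict(zip(header, (v.strip() for v in row))) for row in reader]
```

With this approach, two files from different years that use different encodings and header spellings yield rows with the same keys, which is what a single import schema needs.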