
cripsisxyz/swisshealth

 
 


🇨🇭 Lamal|Tarmed – Swiss Health Insurance Data Analysis

This project automates the download, standardization, database import, and presentation of Swiss health insurance data from open sources such as opendata.swiss. It processes all available data up to the current year and prepares it for use in analytics or dashboards.


⚙️ Stack

  • 🐍 Python 3.10 + Pipenv
  • 🐬 MariaDB 11.3
  • 🐳 Docker & Docker Compose
  • 📦 CSV-based datasets from opendata.swiss

🏗️ Project structure

The idea is to keep complete, self-contained pipelines inside the "datasets" directory. Each pipeline directory contains everything needed for:

  1. Downloading the needed files / databases for building a dataset
  2. Cooking the dataset, mixing and transforming data
  3. Loading the dataset into a database

🔁 Current pipelines

  1. Lamal
  2. Tarmed (in progress)

🚀 Quickstart (Dockerized)

The easiest way to run everything is using Docker Compose. It handles dataset generation and database provisioning automatically.

1. Build and start the system:

Launch the full stack (dataset building and database provisioning):

docker compose --profile PIPELINE_NAME up -d

For example: docker compose --profile Lamal up -d

What this does:

  • Downloads all raw data files
  • Unzips and cleans them
  • Standardizes everything into .csv format. The resulting CSV files are written to build/export inside the pipeline directory
  • Starts a MariaDB container
  • Imports the data into the database
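
As an illustration of the unzip step listed above, here is a minimal Python sketch. It is not the project's actual code: the function name `extract_archive` and the choice to flatten nested folders are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path: Path, target_dir: Path) -> list[Path]:
    """Extract every regular file from a zip archive, flattening sub-folders.

    Hypothetical helper: the real pipeline may unzip differently.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            # Keep only the file name so nested folders and unsafe paths are dropped.
            destination = target_dir / Path(member.filename).name
            destination.write_bytes(archive.read(member))
            extracted.append(destination)
    return extracted
```

Flattening keeps every raw file in a single directory, which makes the later standardization step easier to write, at the cost of overwriting files that share a name across folders.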

⚗️ Environment Configuration

All important variables are declared in the .dataset.env file of your pipeline directory. Example:

# Dataset archive URLs
export DATASET_ARCHIVES="Archiv_Praemien_2011.zip|https://...;Archiv_Praemien_2012.zip|https://...;..."
export DATASET_LAST_YEAR="2025"
export DATASET_LAST_URL_CH="https://..."
export DATASET_LAST_URL_EU="https://..."
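
The `DATASET_ARCHIVES` value packs several `filename|url` pairs into one string, separated by `;`. The sketch below shows one way such a value could be split into a mapping. The function name `parse_archives` is hypothetical, the URLs are placeholders, and the project's own scripts may parse the variable differently.

```python
def parse_archives(value: str) -> dict[str, str]:
    """Split 'name|url;name|url;...' into a {filename: url} mapping."""
    archives = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or empty segments
        name, sep, url = entry.partition("|")
        if not sep or not name.strip() or not url.strip():
            raise ValueError(f"malformed archive entry: {entry!r}")
        archives[name.strip()] = url.strip()
    return archives


if __name__ == "__main__":
    # Placeholder URLs for illustration only; the real ones live in .dataset.env.
    sample = (
        "Archiv_Praemien_2011.zip|https://example.org/2011.zip;"
        "Archiv_Praemien_2012.zip|https://example.org/2012.zip;"
    )
    for filename, url in parse_archives(sample).items():
        print(filename, "->", url)
```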

The .env file in the main directory holds the environment variables used by Docker Compose:

# DB Credentials
MYSQL_ROOT_PASSWORD=root
MYSQL_DATABASE=lamal
MYSQL_USER=lamal
MYSQL_PASSWORD=lamal
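
If you want to read these files from your own Python tooling, a small parser is enough for the simple `KEY=value` and `export KEY="value"` lines shown above. This is a sketch under that assumption; `parse_env` is a hypothetical helper and does not handle multi-line values or shell expansion.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring comments and an optional 'export ' prefix."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        # Drop surrounding double quotes, if any.
        values[key.strip()] = value.strip().strip('"')
    return values
```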

🔧 Manual Mode (without Docker)

If you want to build the dataset without Docker, follow these instructions:

1. Install dependencies

sudo apt-get install python3 python3-pip unzip pipenv
# Go to your build dataset directory
cd datasets/Lamal/build
# Only needed if pipenv is not already installed
pip install pipenv
pipenv install

2. Run the pipeline

# Go to your build dataset directory (cd datasets/Lamal/build)
pipenv run bash utils/generate_dataset.sh

3. Launch MariaDB locally and import the data

  • Start a local MariaDB/MySQL instance.
  • Modify the paths inside CreateAndImportData.sql so that they point to your export directory
    Example: /app/export/assurances.csv
  • Run the import script manually:
    mysql -u lamal -p lamal < CreateAndImportData.sql
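
Editing the paths by hand can be error-prone, so the adjustment could also be scripted. The sketch below assumes that every path in CreateAndImportData.sql starts with `/app/export/`, as in the example above. The helper name `repoint_export_paths` and the local directory are hypothetical.

```python
from pathlib import Path


def repoint_export_paths(sql_text: str, new_export_dir: str,
                         old_export_dir: str = "/app/export") -> str:
    """Replace the container export prefix with a local export directory."""
    old_prefix = old_export_dir.rstrip("/") + "/"
    new_prefix = new_export_dir.rstrip("/") + "/"
    return sql_text.replace(old_prefix, new_prefix)


if __name__ == "__main__":
    script = Path("CreateAndImportData.sql")
    # Assumed local location of the exported CSV files; adjust to your setup.
    local_export = str(Path("export").resolve())
    patched = repoint_export_paths(script.read_text(encoding="utf-8"), local_export)
    # Write to a new file so the original script stays untouched.
    Path("CreateAndImportData.local.sql").write_text(patched, encoding="utf-8")
```

The patched copy can then be passed to the `mysql` command in place of the original script.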

🧠 Why preprocess the data?

Swiss federal health data is inconsistent:

  • Different encodings (UTF-8, latin-1)
  • Column names change over time
  • Values and enums are not standardized

This pipeline ensures all years conform to a unified schema for further processing.
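
To make the idea concrete, here is a minimal sketch of two of the fixes listed above: decoding with an encoding fallback and mapping changing column names to one schema. The alias table, the `;` delimiter, and the function names are assumptions for illustration, not the project's actual mapping.

```python
import csv
import io

# Hypothetical mapping from year-specific headers to unified column names.
COLUMN_ALIASES = {
    "versicherer": "insurer",
    "assureur": "insurer",
    "prämie": "premium",
    "praemie": "premium",
}


def decode_with_fallback(raw: bytes, encodings=("utf-8-sig", "latin-1")) -> str:
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


def normalize_rows(raw: bytes, delimiter: str = ";") -> list[dict[str, str]]:
    """Decode a CSV payload and rename its columns to the unified schema."""
    reader = csv.DictReader(io.StringIO(decode_with_fallback(raw)), delimiter=delimiter)
    rows = []
    for row in reader:
        normalized = {}
        for key, value in row.items():
            if key is None:
                continue  # extra cells beyond the header are ignored
            name = key.strip().lower()
            normalized[COLUMN_ALIASES.get(name, name)] = (value or "").strip()
        rows.append(normalized)
    return rows
```

Trying UTF-8 first works because latin-1 accepts any byte sequence, so it has to come last in the fallback order.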

About

Build, create, and run a database from different Swiss health datasets, in particular for LAMAL
