This service creates new pseudonyms for notifications, based on a Bloom filter.
See ReleaseNotes for all information regarding the (newest) releases.
To start the Pseudonymization-Service locally, a running Redis instance that is preconfigured with some secrets is required.
For convenience, the docker-compose.yml in the root folder of this project starts and sets up a Redis container and imports two secrets into it, so you can start immediately.
- A JDK 21 distribution (e.g. Eclipse Temurin, Amazon Corretto, etc.)
- Maven 3.8+
- Docker (optional)
- A running PostgreSQL Instance, configured with initial Secrets.
mvn clean verify
The Project can be built with the following command:
mvn -e clean verify -DskipTests=true
Build the Docker image:
docker build -t pseudonymization-service:latest .
Alternatively, the Docker image for the service can be built with the extra profile docker:
mvn -e clean verify -Pdocker
The application can be started as Docker container with the following commands:
docker compose -f docker-compose.yml up -d
docker run --rm --name pseudonymization-service -p 8080:8080 pseudonymization-service:latest
It can also be started as a Spring Boot application, either directly from IntelliJ or by running the following commands:
mvn clean verify
java -jar target/pseudonymization-service.jar
Start the Spring Boot server with: mvn clean spring-boot:run
Check the server with: curl -v localhost:8080/actuator/health
The service can be deployed to Kubernetes by using the available Helm Chart in the repository:
helm upgrade --install pseudonymization-service deployment/helm/pseudonymization-service/ --namespace MY_NAMESPACE
Important: PostgreSQL must be deployed and configured separately.
Endpoint | Description |
---|---|
/pseudonymization | POST endpoint for creating new pseudonyms based on the given input data. |
/actuator/health/ | Standard endpoint from Actuator. |
/actuator/health/liveness | Standard endpoint from Actuator. |
/actuator/health/readiness | Standard endpoint from Actuator. |
The service provides one endpoint, /pseudonymization, for the HTTP POST method. It expects a JSON request body and is protected by an API key.
Example HTTP request:
POST http://localhost:8080/pseudonymization
Content-Type: application/json

{
  "type": "demisPseudonymizationRequest",
  "familyName": "Schmidt",
  "firstName": "Anna",
  "dateOfBirth": "23.02.2012",
  "diseaseCode": "covid19"
}
Note:
- All five properties need to be non-empty.
- The property dateOfBirth needs to be in the format dd.MM.yyyy.
- The value of the property type is always "demisPseudonymisationRequest".
For legacy compatibility:
- The content type application/vnd.demis_pseudonymization+json is supported. The response then also uses the content type application/vnd.demis_pseudonymization+json.
- The value "demisPseudonymizationRequest" is also allowed for the type property.
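The rules from the notes above can be checked on the client side before a request is sent. The following is a minimal sketch, not the service's own validation: the class and method names are hypothetical, and rejecting impossible calendar dates (strict parsing) is an assumption that goes beyond the stated format rule.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;

// Hypothetical client-side check; the service's own validation may differ.
final class PseudonymizationRequestCheck {

    // dd.MM.yyyy as stated in the notes; "uuuu" plus STRICT resolution is an
    // assumption that additionally rejects impossible dates such as 31.02.2012.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("dd.MM.uuuu").withResolverStyle(ResolverStyle.STRICT);

    // The regular value and the legacy spelling of the type property.
    private static final List<String> ALLOWED_TYPES =
            List.of("demisPseudonymisationRequest", "demisPseudonymizationRequest");

    static boolean isValid(String type, String familyName, String firstName,
                           String dateOfBirth, String diseaseCode) {
        // All five properties need to be non-empty.
        for (String value : new String[] {type, familyName, firstName, dateOfBirth, diseaseCode}) {
            if (value == null || value.isBlank()) {
                return false;
            }
        }
        if (!ALLOWED_TYPES.contains(type)) {
            return false;
        }
        try {
            LocalDate.parse(dateOfBirth, DATE_FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

A request that passes this check can still be rejected by the service, for example because of a missing API key.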
The following flags can be used to configure the generation of common Secrets (§6.1/§7.1/§7.4 Notifications - or secret "one"):
secrets.one.generation.enabled=true
secrets.one.generation.init-on-missing=false
secrets.one.generation.days-of-validity=45
secrets.one.generation.secret-length=50
secrets.one.generation.supported-symbols="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.;!?$%&/()[]-_"
# run every day by default at 0:00 - the Generation Schedule should run before the reloading one.
secrets.one.generation.cron-schedule=0 0 0 * * *
# secret reloading configuration
secrets.one.reloading.enabled=true
# run every day by default at 0:10 - the Reloading Schedule should run after the generation one.
secrets.one.reloading.cron-schedule=0 10 0 * * *
The following flags can be used to configure the generation of anonymous Secrets (§7.3 Notifications - or secret "two"):
secrets.two.generation.enabled=true
secrets.two.generation.init-on-missing=false
secrets.two.generation.days-of-validity=1095
secrets.two.generation.secret-length=50
secrets.two.generation.supported-symbols="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.;!?$%&/()[]-_"
# run every day by default at 0:00 - the Generation Schedule should run before the reloading one.
secrets.two.generation.cron-schedule=0 0 0 * * *
# secret reloading configuration
secrets.two.reloading.enabled=true
# run every day by default at 0:10 - the Reloading Schedule should run after the generation one.
secrets.two.reloading.cron-schedule=0 10 0 * * *
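To make the generation properties more concrete, the sketch below shows how secret-length, supported-symbols and days-of-validity could interact. It is an illustration under assumptions, not the service's generator: the class and method names are hypothetical, and treating a secret as expired once days-of-validity have passed since its creation is an assumed reading of that property.

```java
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; not the service's actual secret generator.
final class SecretGeneratorSketch {

    // Same value as secrets.*.generation.supported-symbols in the properties above.
    static final String SUPPORTED_SYMBOLS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.;!?$%&/()[]-_";

    private static final SecureRandom RANDOM = new SecureRandom();

    // Draws secretLength symbols uniformly at random from supportedSymbols.
    static String generate(int secretLength, String supportedSymbols) {
        StringBuilder secret = new StringBuilder(secretLength);
        for (int i = 0; i < secretLength; i++) {
            int index = RANDOM.nextInt(supportedSymbols.length());
            secret.append(supportedSymbols.charAt(index));
        }
        return secret.toString();
    }

    // Assumption: a secret expires daysOfValidity days after it was created.
    static boolean isExpired(Instant createdAt, int daysOfValidity, Instant now) {
        return !now.isBefore(createdAt.plus(Duration.ofDays(daysOfValidity)));
    }
}
```

With the defaults shown above, generate(50, SUPPORTED_SYMBOLS) would produce one 50-character passphrase per call.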
Pseudonym generation uses hash functions, which require secrets. Different secrets used by the same hash function lead to different pseudonyms for the same input. This helps protect end-user privacy.
The current implementation uses two hash functions. Each hash function requires a secret. Pseudonym generation is implemented for names and birthdays. A secret consists of four passphrases:
- nameFunctionFirst: used for the first hash function when pseudonymizing names
- nameFunctionSecond: used for the second hash function when pseudonymizing names
- dateFunctionFirst: used for the first hash function when pseudonymizing dates
- dateFunctionSecond: used for the second hash function when pseudonymizing dates
Each secret is salted with the disease code before pseudonym generation. This way, the same input yields different pseudonyms for different diseases, so pseudonyms cannot be matched across diseases. Example:
pseudonym_for_name("Foobar", nameFunctionFirst, nameFunctionSecond, "cvpd") = "oewiruw2312"
pseudonym_for_name("Foobar", nameFunctionFirst, nameFunctionSecond, "evdp") = "jhiuhu12uih"
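The effect of salting can be illustrated with a small sketch. It is not the service's Bloom filter algorithm: it assumes HMAC-SHA256 as a stand-in keyed hash and assumes that the disease code is simply appended to the passphrase. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustration only: HMAC-SHA256 stands in for the service's hash functions.
final class DiseaseSaltSketch {

    static String keyedHash(String input, String passphrase, String diseaseCode) {
        // Assumption: salting means appending the disease code to the passphrase.
        byte[] key = (passphrase + diseaseCode).getBytes(StandardCharsets.UTF_8);
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
            // Render the digest as lowercase hex, two characters per byte.
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }
}
```

Calling keyedHash with the same input and passphrase but with the disease codes "cvpd" and "evdp" gives two unrelated values, while repeating a call with identical arguments gives the same value again.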
Due to legal requirements we have configured two secrets. Which secret is used depends on the disease code of a notification. The secrets differ in their lifespan. The two secrets are:
- "secrets one": common notifications (§6.1/§7.1/§7.4 notifications) have a lifespan of weeks
- "secrets two": anonymous notifications (§7.3, e.g. HIV) have a lifespan of years
We chose these enumerated names because the lifespan isn't implied by the disease code and the configuration for disease codes might change in the future. Names such as short term secret or long term secret would therefore not be practical.
Secrets are grouped into secret pairs. A secret pair consists of:
- an outdated secret
- an active secret
To improve end-user privacy, a secret is regenerated at the end of its lifespan. To allow downstream institutes to analyse long-running cases with multiple notifications, we return the pseudonyms based on both the outdated secret and the active secret.
This way a chain of pseudonyms is generated. Cases can only be connected if these pseudonym pairs are stored. Example:
form: [outdated, active]
[pseudonymN-3, pseudonymN-2] <- [pseudonymN-2, pseudonymN-1] <- [pseudonymN-1, pseudonymN]
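How a downstream consumer might connect stored pseudonym pairs can be sketched as follows. The record and method names are hypothetical, and the linking rule (the active pseudonym of the earlier pair equals the outdated pseudonym of the later pair, or both pairs are identical) is an assumed reading of the chain shown above.

```java
import java.util.List;

// Hypothetical sketch of chaining pseudonym pairs; not part of the service.
final class PseudonymChainSketch {

    // A pair in the form [outdated, active] as shown above.
    record PseudonymPair(String outdated, String active) {}

    // Two pairs are linked if they are identical (no rotation in between)
    // or if exactly one secret rotation happened between them.
    static boolean isLinked(PseudonymPair earlier, PseudonymPair later) {
        return earlier.equals(later) || earlier.active().equals(later.outdated());
    }

    // True if every pair in the ordered list is linked to its predecessor.
    static boolean isChain(List<PseudonymPair> pairsInOrder) {
        for (int i = 1; i < pairsInOrder.size(); i++) {
            if (!isLinked(pairsInOrder.get(i - 1), pairsInOrder.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

If a pair in the middle of the chain was not stored, the remaining pairs share no pseudonym and the link is lost, which is why the pairs have to be stored.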
Each secret is stored in its own database table.
Look at the created_at column to identify the active and the outdated secret: the active secret is the one with the largest created_at timestamp, and the outdated secret is the one with the second largest created_at timestamp.
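The selection rule for created_at can be sketched in a few lines. The row type, its id field and the class name are hypothetical. Only the created_at ordering is taken from the description above, and the real table layout may differ.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of picking the secret pair from the rows of one table.
final class SecretPairSelectionSketch {

    // Assumed row shape; only created_at is taken from the description.
    record SecretRow(String id, Instant createdAt) {}

    record SecretPair(SecretRow outdated, SecretRow active) {}

    static SecretPair select(List<SecretRow> rows) {
        // Sort newest first and keep the two most recent rows.
        List<SecretRow> newestFirst = rows.stream()
                .sorted(Comparator.comparing(SecretRow::createdAt).reversed())
                .limit(2)
                .toList();
        if (newestFirst.size() < 2) {
            throw new IllegalArgumentException("at least two secrets are required");
        }
        // Largest created_at is active, second largest is outdated.
        return new SecretPair(newestFirst.get(1), newestFirst.get(0));
    }
}
```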
Application properties to control the regeneration and reloading schedules are available.
If you want to see the security policy, please check our SECURITY.md.
If you want to contribute, please check our CONTRIBUTING.md.
EUROPEAN UNION PUBLIC LICENCE v. 1.2
EUPL © the European Union 2007, 2016
- Copyright notice: Each published work result is accompanied by an explicit statement of the license conditions for use. These are regularly typical conditions in connection with open source or free software. Programs described/provided/linked here are free software, unless otherwise stated.
- Permission notice: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
- The copyright notice (Item 1) and the permission notice (Item 2) shall be included in all copies or substantial portions of the Software.
- The software is provided "as is" without warranty of any kind, either express or implied, including, but not limited to, the warranties of fitness for a particular purpose, merchantability, and/or non-infringement. The authors or copyright holders shall not be liable in any manner whatsoever for any damages or other claims arising from, out of or in connection with the software or the use or other dealings with the software, whether in an action of contract, tort, or otherwise.
- The software is the result of research and development activities, therefore not necessarily quality assured and without the character of a liable product. For this reason, gematik does not provide any support or other user assistance (unless otherwise stated in individual cases and without justification of a legal obligation). Furthermore, there is no claim to further development and adaptation of the results to a more current state of the art.
- Gematik may remove published results temporarily or permanently from the place of publication at any time without prior notice or justification.
- Please note: Parts of this code may have been generated using AI-supported technology. Please take this into account, especially when troubleshooting, for security analyses and possible adjustments.
See LICENSE.
E-Mail to DEMIS Entwicklung