Export: support partial depseudonymization when deducing pseudo rules from dataset metadata #4

@kschulst

Description

Right now, if we specify that we should depseudonymize during export, the depseudonymization is applied for all pseudo rules that are provided. This is done using the pseudoRules parameter, which accepts a list of pseudo rules (name, pattern, func), each of which potentially matches multiple fields.

In the export endpoint, if we don't explicitly specify which pseudo rules to use, we try to retrieve these rules from the dataset metadata. Deducing pseudo rules from the dataset metadata is presumably going to be the main use case. However, for these cases:

  • we don't have any mechanism to specify only a subset of rules to be applied
  • we don't have any mechanism to specify only a subset of fields to be depseudonymized

Thus, the suggestion is to introduce two new parameters: pseudoRulesFilter and pseudoFieldsFilter.
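
A minimal sketch of how the two proposed filters could narrow down rules deduced from metadata. Everything here is an assumption made for illustration: the PseudoRule class, the helper names, treating a rule's pattern as a glob over field paths, and the placeholder func values. None of it is taken from the actual export implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PseudoRule:
    """Hypothetical stand-in for a pseudo rule (name, pattern, func)."""
    name: str
    pattern: str  # assumption: a glob matched against field paths
    func: str     # placeholder value, not a real function name


def filter_rules(rules, rules_filter=None):
    """Keep only the rules named in rules_filter (all rules if no filter)."""
    if not rules_filter:
        return list(rules)
    wanted = set(rules_filter)
    return [rule for rule in rules if rule.name in wanted]


def fields_to_depseudonymize(fields, rules, fields_filter=None):
    """Map each selected field to the first rule whose pattern matches it.

    A field is selected if no fields_filter is given, or if it matches
    at least one glob in fields_filter.
    """
    selected = {}
    for field in fields:
        if fields_filter and not any(
            fnmatchcase(field, glob) for glob in fields_filter
        ):
            continue
        for rule in rules:
            if fnmatchcase(field, rule.pattern):
                selected[field] = rule
                break
    return selected


# Rules as they might be deduced from dataset metadata (made-up values).
deduced = [
    PseudoRule("id-rule", "*/fnr", "func-a"),
    PseudoRule("name-rule", "*/name", "func-b"),
]
fields = ["person/fnr", "spouse/fnr", "person/name"]

rules = filter_rules(deduced, rules_filter=["id-rule"])
targets = fields_to_depseudonymize(fields, rules, fields_filter=["person/*"])
```

With these made-up inputs, only person/fnr is selected: the rules filter drops name-rule, and the fields filter excludes spouse/fnr even though id-rule matches it.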

To summarize, depseudonymization during export would be specified by the following parameters:

  • pseudoRules - if not present, then deduce these from the dataset path
  • pseudoRulesPath - optional explicit path to deduce pseudo rules from (Export: support retrieving pseudo rules from another dataset path #2)
  • pseudoRulesFilter - a list of named pseudo rules that should be considered
  • pseudoFieldsFilter - a list of globs that address the fields that should be considered. Allows the user to have more control over which fields get depseudonymized, since a pseudo rule might match multiple fields
  • depseudo - whether or not the export should depseudonymize. Only required if pseudo rules should be deduced from the dataset path and no pseudo filters have been specified. If either of the above parameters is present, then the export should assume this property to be true.
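
The defaulting behaviour of depseudo described above could be sketched roughly as follows. This is one reading of the proposal, not a confirmed design: the function name and signature are hypothetical, and it assumes that explicit pseudoRules, as well as either filter, imply depseudonymization. How pseudoRulesPath should interact with the flag is not stated in the proposal, so it is left out.

```python
from typing import Optional, Sequence


def should_depseudonymize(
    depseudo: Optional[bool] = None,
    pseudo_rules: Optional[Sequence[object]] = None,
    pseudo_rules_filter: Optional[Sequence[str]] = None,
    pseudo_fields_filter: Optional[Sequence[str]] = None,
) -> bool:
    """Resolve the effective depseudo flag (hypothetical helper).

    Assumption: a non-empty pseudoRules, pseudoRulesFilter or
    pseudoFieldsFilter implies depseudo=true. Otherwise the explicit
    flag decides, defaulting to False when it is absent.
    """
    implied = bool(pseudo_rules or pseudo_rules_filter or pseudo_fields_filter)
    return implied or bool(depseudo)


# Rules deduced from the dataset path, no filters: the flag is required.
plain_export = should_depseudonymize()
flagged_export = should_depseudonymize(depseudo=True)

# A filter alone is enough to turn depseudonymization on.
filtered_export = should_depseudonymize(pseudo_fields_filter=["person/*"])
```

Whether an explicit depseudo=false should override a filter is an open question that the proposal does not settle. This sketch lets the filter win.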
