GDPR-Obfuscator

GDPR Obfuscation tool that can be integrated as a library module into a Python codebase.

About

This is a general-purpose Python tool that processes data being ingested into AWS and intercepts personally identifiable information (PII). All information stored by a company's data projects should be for bulk data analysis only. Consequently, GDPR requires that any data that can be used to identify an individual be anonymised.

This obfuscation tool can be integrated as a library module into a Python codebase.

It is expected that the tool will be deployed within the AWS account.

It is expected that the code will use the AWS SDK for Python (boto3).

It is expected that the code will use PyArrow when handling Parquet data.

The library is suitable for deployment on a platform within the AWS ecosystem, such as EC2, ECS, or Lambda.

Back to top

Requirements

Ensure you have the latest Python version installed.

Local run

pip install -r ./requirements.txt

or clone the repo and run

make requirements

Back to top

Tests_and_Coverage

Code is tested with pytest, with 100% test coverage.
See tests for more details.

Back to top

PEP8_and_security

Code is written in Python and is PEP8 compliant, checked with flake8.
It is also checked for security vulnerabilities:
safety for dependency vulnerabilities and bandit for security issues in the code.

Back to top

Assumptions_and_Prerequisites

Data is stored in CSV, JSON, or Parquet format in S3.
This tool uses external Python libraries:
:Boto3 for managing AWS resources
:Botocore for error handling (available within the AWS environment)
:PyArrow for Parquet data handling
Fields containing GDPR-sensitive data are known and will be supplied in advance; see Usage.
Data records will be supplied with a primary key.
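
To make the idea of obfuscating known fields concrete, here is a minimal sketch for CSV data using only the standard library. It is an illustration, not this library's implementation: the function name mask_csv_fields, the "***" mask value, and the sample columns are all assumptions made for the example.

```python
import csv
import io


def mask_csv_fields(csv_text, pii_fields, mask="***"):
    """Return csv_text with every value in pii_fields replaced by mask.

    Hypothetical helper for illustration; the mask string is an assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for field in pii_fields:
            # Only mask columns that exist; other columns, including the
            # primary key, pass through unchanged.
            if field in row:
                row[field] = mask
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data; "id" plays the role of the primary key.
sample = "id,name,surname,course\n1,Ada,Lovelace,Maths\n2,Alan,Turing,Logic\n"
masked = mask_csv_fields(sample, ["name", "surname"])
print(masked)
```

Because the primary key is left untouched, masked records can still be joined or counted in bulk analysis without exposing the identifying fields.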

Back to top

Usage

Install with pip from the pip branch:

pip install "git+https://github.com/mirkovicUK/GDPR-Obfuscator.git@pip"

Import:

from gdpr.obfuscator import gdpr_obfuscator

Alternatively, clone the repo:

git clone https://github.com/mirkovicUK/GDPR-Obfuscator.git

Import:

from src.gdpr_obfuscator import gdpr_obfuscator

The tool is invoked with a JSON string containing:
the S3 location of the CSV, JSON, or Parquet file to obfuscate
the names of the fields to obfuscate

JSON string format:
{
"file_to_obfuscate": "s3://bucket_name/path_to_data/file.csv",
"pii_fields": ["name", "surname", "other_fields_to_mask"]
}

masked_data = gdpr_obfuscator(json_string)  # json_string: str in the format above
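
One way to produce the JSON string is with json.dumps, which avoids hand-written quoting mistakes. The sketch below is an assumption about how a caller might prepare the input: build_request is a hypothetical helper, not part of this library, and the S3 URI check reflects only the s3://bucket/key shape shown in the format above.

```python
import json
from urllib.parse import urlparse


def build_request(s3_uri, pii_fields):
    """Build the JSON string described above (hypothetical helper)."""
    parsed = urlparse(s3_uri)
    # Expect s3://<bucket>/<key>; reject anything else early.
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {s3_uri!r}")
    return json.dumps(
        {"file_to_obfuscate": s3_uri, "pii_fields": list(pii_fields)}
    )


request = build_request(
    "s3://bucket_name/path_to_data/file.csv", ["name", "surname"]
)
print(request)

# With the library installed and AWS access configured, the string would
# then be passed on as shown above:
# masked_data = gdpr_obfuscator(request)
```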

Example:

The following example creates an S3 bucket and uploads some data for testing. It is designed to clean up all resources after execution and to work within the AWS Free Tier.

The example expects AWS credentials in a .env file, as follows:

bucket='unique bucket name' : mandatory
aws_access_key_id='Your account access key' :optional
aws_secret_access_key= 'Your account secret access key' :optional
region_name = 'region_name' :mandatory
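
The mandatory and optional keys above could be checked before any AWS call is made. The following is a minimal sketch, assuming the .env values have already been loaded into a mapping such as os.environ; how the bundled example actually loads the file is not shown here, and read_example_settings is a hypothetical name.

```python
def read_example_settings(env):
    """Validate the settings listed above (hypothetical helper).

    env is any mapping of setting names to values, for example os.environ
    once the .env file has been loaded into the environment.
    """
    mandatory = ("bucket", "region_name")
    optional = ("aws_access_key_id", "aws_secret_access_key")

    missing = [key for key in mandatory if not env.get(key)]
    if missing:
        raise ValueError(f"missing mandatory settings: {', '.join(missing)}")

    settings = {key: env[key] for key in mandatory}
    # Optional keys are included only when present and non-empty.
    for key in optional:
        if env.get(key):
            settings[key] = env[key]
    return settings


# Placeholder values for illustration only.
settings = read_example_settings(
    {"bucket": "my-unique-bucket", "region_name": "eu-west-2"}
)
print(settings)
```

When the optional keys are omitted, boto3 normally falls back to its default credential lookup (environment variables, shared credentials file, or an attached role), which is presumably why they are marked optional.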

Back to top
