Benchmarking Differential Privacy Python Tools

This repository contains the code used to benchmark differential privacy Python libraries and frameworks. Each tool is evaluated on utility and execution time.

Sample Experimental Results

Mean relative error of the sum query for experiments with synthetic datasets.
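
As a rough illustration of the utility metric named above, the sketch below computes a mean relative error over repeated runs of a query. It assumes the common definition, the average of |noisy - true| / |true| across runs; the exact formula used in the benchmark is not stated in this README, and the function name and sample values are hypothetical.

```python
from statistics import mean


def mean_relative_error(true_value, noisy_values):
    """Average of |noisy - true| / |true| over repeated runs (assumed definition)."""
    if true_value == 0:
        raise ValueError("relative error is undefined when the true value is 0")
    return mean(abs(v - true_value) / abs(true_value) for v in noisy_values)


# Hypothetical example: the true sum is 100.0 and three differentially
# private runs returned 90.0, 110.0 and 100.0.
mre = mean_relative_error(100.0, [90.0, 110.0, 100.0])
print(round(mre, 4))  # 0.0667
```

A lower value means the differentially private answers stay closer to the true query result.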

Execution time of the tools in the Spark environment on synthetic datasets of varying sizes, up to 1 billion data points.

How to Execute Queries Using the Libraries/Frameworks on Synthetic Data

# Sample command to run a query on a dataset of a given size using a library/framework
python3 run_tool.py --size 100k --query VARIANCE --tool opendp

| Argument | Description | Type | Default |
|---|---|---|---|
| `size` | Dataset size to run the query on. Valid values are {`1k`, `10k`, `100k`} | str | `10k` |
| `query` | Query to run. Valid values are {`count`, `sum`, `mean`, `variance`} | str | `count` |
| `tool` | Library/framework to use. Valid values are {`diffprivlib`, `opendp`, `tmlt_ana`, `pipelinedp_local`, `pipelinedp_spark`} | str | `diffprivlib` |
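
The arguments above could be parsed along the following lines. This is only a sketch built from the documented names, valid values, and defaults; it is not the actual contents of `run_tool.py`, and `build_parser` is a hypothetical helper. The sample command passes `VARIANCE` in upper case while the valid values are listed in lower case, so the sketch lower-cases the query name before validating it. Whether the real script does the same is an assumption.

```python
import argparse

# Valid values and defaults as documented in the arguments list above.
SIZES = ["1k", "10k", "100k"]
QUERIES = ["count", "sum", "mean", "variance"]
TOOLS = ["diffprivlib", "opendp", "tmlt_ana", "pipelinedp_local", "pipelinedp_spark"]


def build_parser():
    """Hypothetical parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Run a differentially private query on a synthetic dataset."
    )
    parser.add_argument("--size", type=str, default="10k", choices=SIZES,
                        help="Dataset size to run the query on.")
    # Assumption: lower-case the value so that VARIANCE and variance both validate.
    parser.add_argument("--query", type=str.lower, default="count", choices=QUERIES,
                        help="Query to run.")
    parser.add_argument("--tool", type=str, default="diffprivlib", choices=TOOLS,
                        help="Library/framework to use.")
    return parser


# Parse the sample command's arguments explicitly instead of reading sys.argv.
args = build_parser().parse_args(
    ["--size", "100k", "--query", "VARIANCE", "--tool", "opendp"]
)
print(args.size, args.query, args.tool)  # 100k variance opendp
```

Using `choices` makes the parser reject unsupported values with a usage message before any tool is invoked.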

Medium Articles

This repository is also referenced by our four-part series of differential privacy articles on DSAID Medium:

Part 1: Sharing Data with Differential Privacy: A Primer — A beginner’s guide to understanding the fundamental concepts of differential privacy with simplified mathematical interpretation.
Part 2: Practitioners’ Guide to Accessing Emerging Differential Privacy Tools — Explore the emerging differential privacy tools developed by prominent researchers and institutions, with practical guidance on their adoption for real-world use cases.
Part 3: Evaluating Differential Privacy Tools’ Performance — A comparative analysis of the accuracy and execution time of differential privacy tools in both standalone and distributed environments, with a focus on common analytical queries.
Part 4: Getting Started with Scalable Differential Privacy Tools on the Cloud — A step-by-step guide to deploying differential privacy tools in a distributed environment on AWS services, specifically AWS Glue and Amazon EMR, to support the analysis of large datasets.

The following folders contain the examples referenced in Part 4:
```
|- glue/ - Glue differential privacy examples
|- emr/  - EMR differential privacy examples
```

Contributors

@anshu-gt (anshu@dsaid.gov.sg)
@syah-ri (syahri@dsaid.gov.sg)

💪 by DSAID Data Privacy Protection Capability Centre (DPPCC) of GovTech

