This is a separate repository for ARX examples. ARX is open source data anonymization software. These examples can serve as a developer tutorial for getting into the ARX code and the topic of data anonymization.
To compile the examples that come with ARX:
- Clone or download this repository
- Open Eclipse, create a new Java project, then point it to the downloaded folder
- Note: you need to uncheck "Use default location"
- In Eclipse, open src > arxExample > Example01.java and run it as a Java application
- You may need to right-click on the project, then click Run As > Java Application
- If you get a "class Example01 not found" error, you may need to add the ARX jar file:
- Right-click on the arxExamples project > Build Path > Configure Build Path > Libraries
- Click on Classpath, then Add External JARs, then select arx-3.9-main.jar, which can be found in the libs folder.
You should get this output:
- Input data:
[age, gender, zipcode]
[34, male, 81667]
[45, female, 81675]
[66, male, 81925]
[70, female, 81931]
[34, female, 81931]
[70, male, 81931]
[45, male, 81931]
- Time needed: 0.03s
- Information loss: 0.5874010519681996 / 0.5874010519681996
- Optimal generalization
* zipcode: 3/5
* gender : 0/1
* age : 2/2
- Statistics
EquivalenceClassStatistics {
- Average equivalence class size = 3.5
- Maximal equivalence class size = 4
- Minimal equivalence class size = 3
- Number of equivalence classes = 2
- Number of records = 7
- Number of suppressed records = 0
}
- Transformed data:
[age, gender, zipcode]
[*, male, 81***]
[*, female, 81***]
[*, male, 81***]
[*, female, 81***]
[*, female, 81***]
[*, male, 81***]
[*, male, 81***]
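The statistics block in this output can be checked by hand: records that share the same transformed values form one equivalence class. The following is a minimal sketch in plain Java, independent of the ARX API. The names `EquivalenceClassSketch`, `sampleOutput` and `classSizes` are hypothetical, and the sketch assumes that all three columns are treated as quasi-identifiers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (not ARX API code): groups transformed records into
// equivalence classes and derives the statistics listed in the output above.
final class EquivalenceClassSketch {

    // Transformed records copied from the sample output above.
    static List<String[]> sampleOutput() {
        return Arrays.asList(
            new String[] {"*", "male", "81***"},
            new String[] {"*", "female", "81***"},
            new String[] {"*", "male", "81***"},
            new String[] {"*", "female", "81***"},
            new String[] {"*", "female", "81***"},
            new String[] {"*", "male", "81***"},
            new String[] {"*", "male", "81***"});
    }

    // Counts how many records share each combination of values.
    // Assumption: every column is a quasi-identifier.
    static Map<String, Integer> classSizes(List<String[]> rows) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (String[] row : rows) {
            sizes.merge(String.join("|", row), 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String[]> rows = sampleOutput();
        Map<String, Integer> sizes = classSizes(rows);
        // 7 records in 2 classes (4 male, 3 female) give an average of 3.5
        System.out.println("Average equivalence class size = "
                + (double) rows.size() / sizes.size());
        System.out.println("Maximal equivalence class size = "
                + Collections.max(sizes.values()));
        System.out.println("Minimal equivalence class size = "
                + Collections.min(sizes.values()));
        System.out.println("Number of equivalence classes = " + sizes.size());
    }
}
```

The smallest class has 3 records, so every record is indistinguishable from at least two others on these columns.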
Use these command lines (tested in a Bash terminal on Windows):
# First build the class files. Note that the order of the Java files is important
javac -cp libs/arx-3.9-main.jar arxExamples/ExampleUtils.java arxExamples/Example01.java
# Then run the class file. Note that in a Windows terminal you should use ; instead of : as the classpath separator
java -cp libs/arx-3.9-main.jar:"/c/<fullPath>/arxExamples" Example01
# Note: the path is important, e.g. if the three files are in one folder, run these commands from inside that folder
javac -cp arx-3.9-main.jar ExampleUtils.java Example01.java
java -cp arx-3.9-main.jar Example01
TODO: organize the examples into groups, e.g. I/O, anonymization models, hierarchies, statistics, metrics, etc.
-
Example01.java: this example explains the following concepts:
-
creating data and hierarchy manually
DefaultData data = Data.create();
data.add("age", "gender", "zipcode");
data.add("34", "male", "81667");
data.add("45", "female", "81675");
. . .
// Define hierarchies
DefaultHierarchy age = Hierarchy.create();
age.add("34", "<50", "*");
age.add("45", "<50", "*");
-
Connecting the hierarchies to the data, applying anonymization, and printing the input data and the results.
The output should be similar to the above output.
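To make the `n/m` notation in the "Optimal generalization" block concrete: each hierarchy row lists an original value followed by increasingly general replacements, and `zipcode: 3/5` means that level 3 out of a maximum of 5 was applied. The following is a minimal sketch in plain Java, not ARX API code. `HierarchySketch`, `generalize` and `maskingRow` are hypothetical names, and the zipcode rows are an assumption (one more trailing digit masked per level), chosen so that level 3 yields the `81***` values shown in the output.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (not ARX API code) of a value generalization hierarchy.
final class HierarchySketch {

    // Maps each original value to its row: {original, level 1, level 2, ...}.
    private final Map<String, String[]> rows = new HashMap<>();

    // Same argument shape as the age.add(...) calls in Example01.java.
    void add(String... row) {
        rows.put(row[0], row);
    }

    // Returns the replacement of a value at the given level (0 = unchanged).
    String generalize(String value, int level) {
        String[] row = rows.get(value);
        if (row == null || level < 0 || level >= row.length) {
            throw new IllegalArgumentException(
                    "No level " + level + " for value " + value);
        }
        return row[level];
    }

    // Assumed zipcode hierarchy: mask one more trailing digit per level.
    static String[] maskingRow(String zipcode) {
        String[] row = new String[zipcode.length() + 1];
        for (int level = 0; level < row.length; level++) {
            StringBuilder sb = new StringBuilder(
                    zipcode.substring(0, zipcode.length() - level));
            for (int i = 0; i < level; i++) {
                sb.append('*');
            }
            row[level] = sb.toString();
        }
        return row;
    }

    public static void main(String[] args) {
        HierarchySketch age = new HierarchySketch();
        age.add("34", "<50", "*");
        age.add("45", "<50", "*");

        HierarchySketch zipcode = new HierarchySketch();
        zipcode.add(maskingRow("81667"));

        System.out.println(age.generalize("34", 2));        // age at level 2 of 2
        System.out.println(zipcode.generalize("81667", 3)); // zipcode at level 3 of 5
    }
}
```

Under these assumptions, `age: 2/2` replaces every age with `*` and `zipcode: 3/5` keeps only the first two digits, which matches the transformed data in the sample output.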
-
-
Example02.java: this example explains the following concepts:
-
Same as Example01.java; it also shows how to read data from and write data to CSV files. The output should look like this:
Reading data from data/test.csv ....!!!
- Time needed: 0.02s
- Information loss: 0.30055597162146275 / 0.30055597162146275
- Optimal generalization
* zipcode: 3/5
* gender : 0/1
* age : 2/2
- Statistics
EquivalenceClassStatistics {
- Average equivalence class size = 3.5
- Maximal equivalence class size = 4
- Minimal equivalence class size = 3
- Number of equivalence classes = 2
- Number of records = 7
- Number of suppressed records = 0
}
- Writing data...
Result is saved in data/test_result.csv
Done!
-
-
Example03.java: Same as Example01.java.
- Time needed: 0.02s
- Information loss: 0.3333333332999999 / 0.3333333332999999
- Optimal generalization
* zipcode: 2/5
- Statistics
EquivalenceClassStatistics {
- Average equivalence class size = 3.5
- Maximal equivalence class size = 5
- Minimal equivalence class size = 2
- Number of equivalence classes = 2
- Number of records = 7
- Number of suppressed records = 0
}
- Transformed data:
[age, gender, zipcode]
[*, male, 816**]
[*, female, 816**]
[*, male, 819**]
[*, female, 819**]
[*, female, 819**]
[*, male, 819**]
[*, male, 819**]
-
Example04.java: Same as Example01.java. It also shows how to get information about the data and how to define a hierarchy using AttributeType. The output should look like this:
- inHandle.getNumRows() :7
- inHandle.getNumColumns() :3
- inHandle.getAttributeName(0):age
- inHandle.getValue(0,0) :34
- Time needed: 0.02s
- Information loss: 0.3333333332999999 / 0.3333333332999999
- Optimal generalization
* zipcode: 2/5
- Statistics
EquivalenceClassStatistics {
- Average equivalence class size = 3.5
- Maximal equivalence class size = 5
- Minimal equivalence class size = 2
- Number of equivalence classes = 2
- Number of records = 7
- Number of suppressed records = 0
}
- Transformed data:
[age, gender, zipcode]
[*, male, 819**]
[*, female, 819**]
[*, female, 819**]
[*, male, 819**]
[*, male, 819**]
[*, male, 816**]
[*, female, 816**]
-
Example05.java: Same as Example01.java. It also shows how to use two privacy models at the same time. The output should look like this:
- Time needed: 0.02s
- Information loss: 0.41421356237309515 / 0.41421356237309515
- Optimal generalization
* zipcode: 3/5
* gender : 0/1
- Statistics
EquivalenceClassStatistics {
- Average equivalence class size = 3.5
- Maximal equivalence class size = 4
- Minimal equivalence class size = 3
- Number of equivalence classes = 2
- Number of records = 7
- Number of suppressed records = 0
}
- Transformed data:
[age, gender, zipcode]
[34, male, 81***]
[45, female, 81***]
[66, male, 81***]
[70, female, 81***]
[34, female, 81***]
[70, male, 81***]
[45, male, 81***]
-
Example06.java: similar to Example05.java
-
Example07.java: similar to Example05.java
-
Example08.java: t-closeness criterion.
-
Example09.java: d-presence criterion.
-
Example10.java: data subsets
-
Example11.java: data selector
-
Example12.java: complex data selector
-
Example13.java: multiple sensitive attributes and different privacy models.
-
Example14.java: loss metrics
-
Example16.java: statistics e.g. frequencies
-
Example17.java: data types
-
Example18.java: hierarchy builders.
-
Example19.java: lattice, creating different representation of the results.
-
Example20.java: aggregate functions.
-
Example21.java: import data from different sources.
-
Example22.java: l-diversity privacy model without protecting sensitive associations.
-
Example23.java: multiple instances of l-diversity without protecting sensitive associations.
-
Example24.java: directly using empty and functional hierarchies.
-
Example25.java: generalized loss metric with different types of generalization hierarchies.
-
Example26.java: an interval-based hierarchy builder with high precision.
-
Example27.java: data cleansing capabilities
-
Example28.java: data cleansing using the DataSource functionality.
-
Example29.java: risk analysis
-
Example30.java: summary statistics
-
Example31.java: microaggregation
-
Example32.java: microaggregation with generalization
-
Example33.java: microaggregation: attribute types and transformation methods should be specified separately.
-
Example34.java: heuristic search algorithm
-
Example35.java: HIPAA identifiers
-
Example36.java: utility-based microaggregation
-
Example37.java: (ε, δ)-differential privacy
-
Example38.java: local recoding
-
Example39.java: compare data mining performance
-
Example40.java: compare data mining performance
-
Example41.java: k-map model
-
Example42.java: k-map and d-presence models combined
-
Example43.java: evaluate combined risk metrics
-
Example44.java: k-map privacy model with a statistical estimator.
-
Example45.java: mixed risk model
-
Example46.java: distribution of risks
-
Example47.java: evaluating distinction and separation of attributes, ref: Motwani et al. "Efficient algorithms for masking and finding quasi-identifiers" Proc. VLDB Conf., 2007.
-
Example48.java: ordered distance t-closeness, ref: Li et al. "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity"
-
Example49.java: no-attack, game-theoretic, a monetary cost/benefit analysis using prosecutor risk.
-
Example50.java: no-attack, game-theoretic, a monetary cost/benefit analysis using journalist risk.
-
Example51.java: game-theoretic, a monetary cost/benefit analysis using prosecutor risk.
-
Example52.java: game-theoretic, a monetary cost/benefit analysis using journalist risk.
-
Example53.java: generate pdf reports
-
Example54.java: access quality statistics
-
Example55.java: fast algorithm for local recoding with ARX
-
Example56.java: evaluate risk with wildcard matching
-
Example57.java: analyze risks with wildcards for data transformed with cell suppression
-
Example58.java: consistent handling of suppressed records in input and output
-
Example59.java: handling of suppressed values and records in input data
-
Example60.java: processing high-dimensional data
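As a rough illustration of the distinction and separation measures listed for Example47.java, the sketch below computes both for the age column of the sample input data. It is plain Java, not ARX API code, and it assumes the usual definitions: distinction is the fraction of distinct values among all records, and separation is the fraction of record pairs whose values differ. `DistinctionSeparationSketch` and its methods are hypothetical names, and the sketch handles a single column only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Plain-Java sketch (not ARX API code) of distinction and separation
// for a single column, under the definitions assumed in the text above.
final class DistinctionSeparationSketch {

    // Fraction of distinct values among all records.
    static double distinction(List<String> values) {
        return (double) new HashSet<>(values).size() / values.size();
    }

    // Fraction of record pairs whose values differ.
    static double separation(List<String> values) {
        int pairs = 0;
        int separated = 0;
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                pairs++;
                if (!values.get(i).equals(values.get(j))) {
                    separated++;
                }
            }
        }
        return (double) separated / pairs;
    }

    public static void main(String[] args) {
        // Age column of the sample input data used by Example01.java.
        List<String> ages = Arrays.asList("34", "45", "66", "70", "34", "70", "45");
        System.out.println("Distinction: " + distinction(ages)); // 4 distinct values / 7 records
        System.out.println("Separation:  " + separation(ages));  // 18 of 21 pairs differ
    }
}
```

A column with high values on both measures tells records apart well, which is what makes it a candidate quasi-identifier.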