
iaBIH/arx_examples


ARX API Tutorials

This is a separate repository for ARX examples. ARX is open-source data anonymization software. The examples are intended as a developer tutorial: they introduce the ARX API and, along the way, the basic concepts of data anonymization.

Getting started:

To compile the examples that come with ARX:

  1. Clone or download this repository.
  2. Open Eclipse, create a new Java project, and point it to the downloaded folder.
  • Note: you need to uncheck "Use default location".
  3. In Eclipse, open src > arxExample > Example01.java and run it as a Java application.
  • You may need to right-click the project and choose Run As > Java Application.
  • If you get a "class Example01 not found" error, you may need to add the ARX JAR file:
    • Right-click the arxExamples project > Build Path > Configure Build Path > Libraries.
    • Click Classpath, then Add External JARs, and select arx-3.9-main.jar from the libs folder.

You should get this output:

        - Input data:
            [age, gender, zipcode]
            [34, male, 81667]
            [45, female, 81675]
            [66, male, 81925]
            [70, female, 81931]
            [34, female, 81931]
            [70, male, 81931]
            [45, male, 81931]
          - Time needed: 0.03s
          - Information loss: 0.5874010519681996 / 0.5874010519681996
          - Optimal generalization
            * zipcode: 3/5
            * gender : 0/1
            * age    : 2/2
          - Statistics
         EquivalenceClassStatistics {
         - Average equivalence class size = 3.5
         - Maximal equivalence class size = 4
         - Minimal equivalence class size = 3
         - Number of equivalence classes = 2
         - Number of records = 7
         - Number of suppressed records = 0
         }
          - Transformed data:
            [age, gender, zipcode]
            [*, male, 81***]
            [*, female, 81***]
            [*, male, 81***]
            [*, female, 81***]
            [*, female, 81***]
            [*, male, 81***]
            [*, male, 81***]                  
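Records with identical quasi-identifier values form an equivalence class, and the statistics above can be derived directly from those classes. The following plain-Java sketch has no ARX dependency, and its class and method names are hypothetical. It applies the reported generalization to the seven input records: age fully generalized, gender kept, and zipcode with its last three digits masked. It then recomputes the class sizes. The smallest class has 3 records, so 3 is the largest k for which this output is k-anonymous.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EquivalenceClassSketch {

    // The seven input records from the output above: {age, gender, zipcode}
    static final String[][] INPUT = {
        {"34", "male", "81667"},
        {"45", "female", "81675"},
        {"66", "male", "81925"},
        {"70", "female", "81931"},
        {"34", "female", "81931"},
        {"70", "male", "81931"},
        {"45", "male", "81931"},
    };

    // Replaces the last `level` digits of a zip code with '*' (level 3: 81667 becomes 81***)
    static String maskZip(String zip, int level) {
        StringBuilder sb = new StringBuilder(zip.substring(0, zip.length() - level));
        for (int i = 0; i < level; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    // Reported generalization: age fully generalized (2/2), gender kept (0/1), zipcode at level 3/5
    static String[] generalize(String[] record) {
        return new String[] {"*", record[1], maskZip(record[2], 3)};
    }

    // Groups records by their generalized values; returns class sizes in order of first appearance
    static List<Integer> classSizes(String[][] records) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] record : records) {
            counts.merge(String.join("|", generalize(record)), 1, Integer::sum);
        }
        return new ArrayList<>(counts.values());
    }

    public static void main(String[] args) {
        List<Integer> sizes = classSizes(INPUT);
        System.out.println("classes = " + sizes.size()
                + ", min = " + Collections.min(sizes)
                + ", max = " + Collections.max(sizes)
                + ", average = " + (double) INPUT.length / sizes.size());
    }
}
```

Running `main` prints `classes = 2, min = 3, max = 4, average = 3.5`, which matches the EquivalenceClassStatistics block above.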

Compiling as a standalone program

Use these commands (tested in a Bash terminal on Windows):

              # First build the class files; note that the order of the Java files is important
              javac -cp libs/arx-3.9-main.jar arxExamples/ExampleUtils.java arxExamples/Example01.java
              
              # Then run the class file. In a native Windows shell such as cmd.exe, use ; instead of : as the classpath separator
              java -cp libs/arx-3.9-main.jar:"/c/<fullPath>/arxExamples" Example01

              # Note that paths matter: if the JAR and the two source files are all in one folder, run these commands from inside that folder.
              # The classpath must also list the current directory ".", otherwise java cannot find Example01.
              javac -cp arx-3.9-main.jar ExampleUtils.java Example01.java
              java -cp "arx-3.9-main.jar:." Example01

List of examples:

TODO: organize them into groups, e.g. I/O, anonymization models, hierarchies, statistics, metrics, etc.

  1. Example01.java: in this example, the following concepts are explained:

    • creating data and hierarchies manually

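       import org.deidentifier.arx.AttributeType.Hierarchy;
       import org.deidentifier.arx.AttributeType.Hierarchy.DefaultHierarchy;
       import org.deidentifier.arx.Data;
       import org.deidentifier.arx.Data.DefaultData;

       // Create the data set manually; the first row holds the attribute names
       DefaultData data = Data.create();
       data.add("age", "gender", "zipcode");
       data.add("34", "male", "81667");
       data.add("45", "female", "81675");
       // ... (remaining records)

       // Define hierarchies: each row lists a value followed by its increasingly general replacements
       DefaultHierarchy age = Hierarchy.create();
       age.add("34", "<50", "*");
       age.add("45", "<50", "*");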
      
    • Connecting the hierarchies to the data, applying anonymization, and printing the input data and the results.
      The output should be similar to the output shown above.

  2. Example02.java: in this example, the following concepts are explained:

    • The same as Example01.java; it also shows how to read data from and write data to CSV files. The output should look like this:

                Reading data from  data/test.csv ....!!!
             - Time needed: 0.02s
             - Information loss: 0.30055597162146275 / 0.30055597162146275
             - Optimal generalization
               * zipcode: 3/5
               * gender : 0/1
               * age    : 2/2
             - Statistics
             EquivalenceClassStatistics {
             - Average equivalence class size = 3.5
             - Maximal equivalence class size = 4
             - Minimal equivalence class size = 3
             - Number of equivalence classes = 2
             - Number of records = 7
             - Number of suppressed records = 0
             }
             - Writing data...
             Result is saved in data/test_result.csv
             Done!
      
  3. Example03.java: same as Example01.java. The output should look like this:

         - Time needed: 0.02s
         - Information loss: 0.3333333332999999 / 0.3333333332999999
         - Optimal generalization
           * zipcode: 2/5
         - Statistics
         EquivalenceClassStatistics {
         - Average equivalence class size = 3.5
         - Maximal equivalence class size = 5
         - Minimal equivalence class size = 2
         - Number of equivalence classes = 2
         - Number of records = 7
         - Number of suppressed records = 0
         }
         - Transformed data:
           [age, gender, zipcode]
           [*, male, 816**]
           [*, female, 816**]
           [*, male, 819**]
           [*, female, 819**]
           [*, female, 819**]
           [*, male, 819**]
           [*, male, 819**]
    
  4. Example04.java: same as Example01.java. It also shows how to get information about the data and how to define hierarchies using AttributeType. The output should look like this:

         - inHandle.getNumRows()       :7
         - inHandle.getNumColumns()    :3
         - inHandle.getAttributeName(0):age
         - inHandle.getValue(0,0)      :34
         - Time needed: 0.02s
         - Information loss: 0.3333333332999999 / 0.3333333332999999
         - Optimal generalization
           * zipcode: 2/5
         - Statistics
         EquivalenceClassStatistics {
         - Average equivalence class size = 3.5
         - Maximal equivalence class size = 5
         - Minimal equivalence class size = 2
         - Number of equivalence classes = 2
         - Number of records = 7
         - Number of suppressed records = 0
         }
         - Transformed data:
           [age, gender, zipcode]
           [*, male, 819**]
           [*, female, 819**]
           [*, female, 819**]
           [*, male, 819**]
           [*, male, 819**]
           [*, male, 816**]
           [*, female, 816**]
    
  5. Example05.java: same as Example01.java. It also shows how to use two privacy models at the same time. The output should look like this:

         - Time needed: 0.02s
         - Information loss: 0.41421356237309515 / 0.41421356237309515
         - Optimal generalization
           * zipcode: 3/5
           * gender : 0/1
         - Statistics
         EquivalenceClassStatistics {
         - Average equivalence class size = 3.5
         - Maximal equivalence class size = 4
         - Minimal equivalence class size = 3
         - Number of equivalence classes = 2
         - Number of records = 7
         - Number of suppressed records = 0
         }
         - Transformed data:
           [age, gender, zipcode]
           [34, male, 81***]
           [45, female, 81***]
           [66, male, 81***]
           [70, female, 81***]
           [34, female, 81***]
           [70, male, 81***]
           [45, male, 81***]
    
  6. Example06.java: Similar to 5

  7. Example07.java: Similar to 5

  8. Example08.java: t-closeness criterion.

  9. Example09.java: d-presence criterion.

  10. Example10.java: data subsets

  11. Example11.java: data selector

  12. Example12.java: complex data selector

  13. Example13.java: multiple sensitive attributes and different privacy models.

  14. Example14.java: loss metrics

  15. Example15.java: not found in this repository.

  16. Example16.java: statistics e.g. frequencies

  17. Example17.java: data types

  18. Example18.java: hierarchy builders.

  19. Example19.java: lattice, creating different representations of the results.

  20. Example20.java: aggregate functions.

  21. Example21.java: import data from different sources.

  22. Example22.java: l-diversity privacy model without protecting sensitive associations.

  23. Example23.java: multiple instances of l-diversity without protecting sensitive associations.

  24. Example24.java: directly using empty and functional hierarchies.

  25. Example25.java: generalized loss metric with different types of generalization hierarchies.

  26. Example26.java: an interval-based hierarchy builder with high precision.

  27. Example27.java: data cleansing capabilities

  28. Example28.java: data cleansing using the DataSource functionality.

  29. Example29.java: risk analysis

  30. Example30.java: summary statistics

  31. Example31.java: microaggregation

  32. Example32.java: microaggregation with generalization

  33. Example33.java: microaggregation: attribute types and transformation methods should be specified separately.

  34. Example34.java: heuristic search algorithm

  35. Example35.java: HIPAA identifiers

  36. Example36.java: utility-based microaggregation

  37. Example37.java: (ε, δ)-differential privacy

  38. Example38.java: local recoding

  39. Example39.java: compare data mining performance

  40. Example40.java: compare data mining performance

  41. Example41.java: k-map model

  42. Example42.java: k-map and d-presence models combined

  43. Example43.java: evaluate combined risk metrics

  44. Example44.java: k-map privacy model with a statistical estimator.

  45. Example45.java: mixed risk model

  46. Example46.java: distribution of risks

  47. Example47.java: evaluating distinction and separation of attributes, ref: Motwani et al. "Efficient algorithms for masking and finding quasi-identifiers" Proc. VLDB Conf., 2007.

  48. Example48.java: ordered distance t-closeness, ref: Li et al. "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity" Proc. ICDE Conf., 2007.

  49. Example49.java: game-theoretic monetary cost/benefit analysis (no-attack variant) using prosecutor risk.

  50. Example50.java: game-theoretic monetary cost/benefit analysis (no-attack variant) using journalist risk.

  51. Example51.java: game-theoretic monetary cost/benefit analysis using prosecutor risk.

  52. Example52.java: game-theoretic monetary cost/benefit analysis using journalist risk.

  53. Example53.java: generate pdf reports

  54. Example54.java: access quality statistics

  55. Example55.java: fast algorithm for local recoding with ARX

  56. Example56.java: evaluate risk with wildcard matching

  57. Example57.java: analyze risks with wildcards for data transformed with cell suppression

  58. Example58.java: consistent handling of suppressed records in input and output

  59. Example59.java: handling of suppressed values and records in input data

  60. Example60.java: processing high-dimensional data
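Example03 reports an optimal generalization of zipcode 2/5, with two classes of 2 and 5 records. This suggests that the zip code is the only quasi-identifier there. For a single attribute, the idea behind "optimal generalization" can be illustrated with a brute-force search. The search tries zip-code masking levels from 0 (original) to 5 (fully masked) and returns the lowest level at which every equivalence class has at least k records. This is a simplified plain-Java sketch, not ARX's search algorithm, which explores a multi-attribute generalization lattice and weighs information loss. The class name and the choice of k = 2 are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class ZipLevelSearch {

    // Zip codes of the seven records, in input order
    static final String[] ZIPS = {"81667", "81675", "81925", "81931", "81931", "81931", "81931"};

    // Level 0 keeps the zip code; level n replaces its last n digits with '*'
    static String maskZip(String zip, int level) {
        StringBuilder sb = new StringBuilder(zip.substring(0, zip.length() - level));
        for (int i = 0; i < level; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    // Size of the smallest equivalence class when every zip code is masked at `level`
    static int minClassSize(String[] zips, int level) {
        Map<String, Integer> counts = new HashMap<>();
        for (String zip : zips) {
            counts.merge(maskZip(zip, level), 1, Integer::sum);
        }
        int min = Integer.MAX_VALUE;
        for (int count : counts.values()) {
            min = Math.min(min, count);
        }
        return min;
    }

    // Lowest level (0..maxLevel) at which the zip column alone is k-anonymous, or -1 if none is
    static int lowestKAnonymousLevel(String[] zips, int k, int maxLevel) {
        for (int level = 0; level <= maxLevel; level++) {
            if (minClassSize(zips, level) >= k) {
                return level;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int level = lowestKAnonymousLevel(ZIPS, 2, 5);
        if (level < 0) {
            System.out.println("no level satisfies k = 2");
            return;
        }
        System.out.println("lowest level = " + level + ", smallest class = " + minClassSize(ZIPS, level));
    }
}
```

Level 1 still leaves 81667 and 81675 alone in their classes. Level 2 groups the records into 816** (2 records) and 819** (5 records), so `main` prints `lowest level = 2, smallest class = 2`. This is consistent with the Example03 output.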
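Example05 combines two privacy models, but this README does not say which ones. The following sketch therefore rests on an assumption: the models are k-anonymity and distinct l-diversity, with age as the sensitive attribute. Age is left untouched in the Example05 output, which fits this reading. Under that assumption, the plain-Java sketch below checks both properties on the transformed table shown for Example05. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class KAndLCheck {

    // Transformed rows from the Example05 output: {age, gender, zipcode}
    static final String[][] OUTPUT = {
        {"34", "male", "81***"}, {"45", "female", "81***"}, {"66", "male", "81***"},
        {"70", "female", "81***"}, {"34", "female", "81***"}, {"70", "male", "81***"},
        {"45", "male", "81***"},
    };

    // Assumption: gender and zipcode are the quasi-identifiers, age is the sensitive attribute.
    // Returns the sensitive values (ages) of each equivalence class.
    static Map<String, List<String>> classes(String[][] rows) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[1] + "|" + row[2], key -> new ArrayList<>()).add(row[0]);
        }
        return groups;
    }

    // Largest k for which the table is k-anonymous: the size of the smallest class
    static int largestK(String[][] rows) {
        int min = Integer.MAX_VALUE;
        for (List<String> ages : classes(rows).values()) {
            min = Math.min(min, ages.size());
        }
        return min;
    }

    // Largest l for which the table is distinct l-diverse: fewest distinct sensitive values in a class
    static int largestDistinctL(String[][] rows) {
        int min = Integer.MAX_VALUE;
        for (List<String> ages : classes(rows).values()) {
            min = Math.min(min, new HashSet<>(ages).size());
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println("k = " + largestK(OUTPUT) + ", distinct l = " + largestDistinctL(OUTPUT));
    }
}
```

The male class holds the ages 34, 66, 70 and 45, and the female class holds 45, 70 and 34. The program therefore prints `k = 3, distinct l = 3`.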
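Examples 08 and 48 use t-closeness. An equivalence class satisfies it when the distance between the class's distribution of the sensitive attribute and that attribute's distribution in the whole table is at most t. Li et al. (cited for Example48) measure this distance with the Earth Mover's Distance (EMD), which has two common forms:

* **Equal ground distance:** the EMD is half the sum of absolute differences.
* **m ordered values:** the EMD is the sum of absolute cumulative differences divided by m - 1.

The following minimal plain-Java sketch implements both formulas. The class and method names are hypothetical, and this is not ARX's implementation.

```java
final class TClosenessSketch {

    // Equal ground distance: EMD = 1/2 * sum_i |p_i - q_i|
    static double equalDistanceEmd(double[] p, double[] q) {
        checkSameLength(p, q);
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return sum / 2.0;
    }

    // Ordered distance over m ordered values: EMD = 1/(m-1) * sum_i |sum_{j<=i} (p_j - q_j)|
    static double orderedDistanceEmd(double[] p, double[] q) {
        checkSameLength(p, q);
        if (p.length < 2) {
            return 0.0; // a single value: there is nothing to move
        }
        double cumulative = 0.0;
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i] - q[i];
            sum += Math.abs(cumulative);
        }
        return sum / (p.length - 1);
    }

    private static void checkSameLength(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] classDist = {1.0, 0.0, 0.0};
        double[] neighbour = {0.0, 1.0, 0.0};
        double[] farEnd = {0.0, 0.0, 1.0};
        System.out.println(equalDistanceEmd(classDist, neighbour) + " " + orderedDistanceEmd(classDist, neighbour));
        System.out.println(equalDistanceEmd(classDist, farEnd) + " " + orderedDistanceEmd(classDist, farEnd));
    }
}
```

The program prints `1.0 0.5` and then `1.0 1.0`. The equal-distance variant treats every pair of values as equally far apart. The ordered variant charges less for moving probability mass to an adjacent value, which is why Example48 uses it for ordinal attributes.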
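Example29 covers risk analysis. Under the prosecutor attacker model, re-identification risk is commonly derived from equivalence class sizes:

* A record in a class of f records has a risk of 1/f.
* The highest risk is 1 divided by the size of the smallest class.
* The average risk over all records equals the number of classes divided by the number of records.

The following plain-Java sketch applies these formulas to the class sizes reported for Example01 (4 and 3 records). It is an illustration with hypothetical names, not ARX's risk estimator.

```java
final class ProsecutorRiskSketch {

    // Highest per-record risk: 1 / size of the smallest equivalence class
    static double highestRisk(int[] classSizes) {
        int min = Integer.MAX_VALUE;
        for (int size : classSizes) {
            min = Math.min(min, size);
        }
        return 1.0 / min;
    }

    // Average per-record risk: each of the f records in a class has risk 1/f,
    // so every class contributes exactly 1 and the mean over n records is (#classes) / n
    static double averageRisk(int[] classSizes) {
        int records = 0;
        for (int size : classSizes) {
            records += size;
        }
        return (double) classSizes.length / records;
    }

    public static void main(String[] args) {
        int[] sizes = {4, 3}; // class sizes reported in the Example01 statistics
        System.out.printf("highest = %.4f, average = %.4f%n", highestRisk(sizes), averageRisk(sizes));
    }
}
```

For the Example01 classes, this prints `highest = 0.3333, average = 0.2857`, that is, 1/3 and 2/7.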
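Examples 31 to 33 and 36 use microaggregation, which replaces an attribute's values within a group of records by an aggregate such as the arithmetic mean. The following plain-Java sketch assumes that the groups are the equivalence classes formed by the generalized gender and zipcode values of Example01. It replaces each age by the mean age of its group. The grouping choice and all names are assumptions for illustration, and ARX supports other aggregate functions as well.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class MicroaggregationSketch {

    // Replaces each value by the arithmetic mean of all values sharing its group key
    static double[] aggregateByMean(String[] groupKeys, double[] values) {
        if (groupKeys.length != values.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        // Per group: acc[0] = running sum, acc[1] = record count
        Map<String, double[]> sumAndCount = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            double[] acc = sumAndCount.computeIfAbsent(groupKeys[i], key -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double[] acc = sumAndCount.get(groupKeys[i]);
            result[i] = acc[0] / acc[1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed groups: generalized {gender, zipcode} of the seven Example01 records
        String[] keys = {"male|81***", "female|81***", "male|81***", "female|81***",
                         "female|81***", "male|81***", "male|81***"};
        double[] ages = {34, 45, 66, 70, 34, 70, 45};
        System.out.println(Arrays.toString(aggregateByMean(keys, ages)));
    }
}
```

With this grouping, every male age becomes 53.75 (215 / 4) and every female age becomes about 49.67 (149 / 3).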
