stacs-srg/crptr-fork

Crptr

Crptr is software that modifies data according to given probabilities and methods in order to simulate the errors and variations that occur in real-world data.

Crptr was originally developed by Ahmad Alsadeeqi (see the original Crptr repository). This fork was developed by Tom Dalton to extend the existing application for PhD research evaluating record linkage methods and algorithms.

The software is written in Python 3.0 and consists of two main packages:

  • crptr - the original Crptr application (updated to Python 3.0)
  • populations_crptr - the extensions for use with population data.
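
To make the idea of probability-driven corruption concrete, here is a minimal standalone sketch. It does not use the Crptr API: the function names `swap_adjacent_chars` and `corrupt_record`, the field names, and the probabilities are all hypothetical and chosen only for illustration.

```python
import random


def swap_adjacent_chars(value, rng):
    """Hypothetical corruption method: transpose two neighbouring characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def corrupt_record(record, field_probabilities, method, rng):
    """Return a copy of record in which each listed field is corrupted
    with its own probability; the input record is left unchanged."""
    corrupted = dict(record)
    for field, probability in field_probabilities.items():
        if field in corrupted and rng.random() < probability:
            corrupted[field] = method(corrupted[field], rng)
    return corrupted


# Illustrative record; the field names are assumptions, not the TD format.
rng = random.Random(42)
record = {"forename": "Margaret", "surname": "Campbell"}
corrupted = corrupt_record(
    record,
    {"forename": 1.0, "surname": 0.0},  # always corrupt forename, never surname
    swap_adjacent_chars,
    rng,
)
print(corrupted)
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is useful when corrupted data sets need to be regenerated for comparison.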

Basic usage

The Crptr software can be configured for many different applications, not just synthetic population corruption, but this guide covers only the basics of population corruption.

The populations_crptr package contains corruptor definitions and example configurations for synthetic population CSVs in the TD format (these can be generated using the Valipop application).

An example runner for these, population_corruptor.py, is included in the package. It takes the path to a records directory (containing "birth_records.csv", "marriage_records.csv" and "death_records.csv") as a command-line argument. To run it, use the following commands, which clone the repository and then work from its root:

# Clone the repository and cd into it
git clone https://github.com/stacs-srg/crptr-fork.git
cd crptr-fork

# Create a virtual environment for running the application, install the
# requirements, and install the Crptr packages in editable mode.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .

# Run the corruptor on the example input directory
python -m populations_crptr.population_corruptor src/main/resources/example-inputs/TD_300

This will produce output similar to:

Running crptr for src/main/resources/example-inputs/TD_300
2025/06/25 12-04-24.811 :: Corrupting src/main/resources/example-inputs/TD_300/birth_records.csv...
Elapsed time: 00-00-00-160
2025/06/25 12-04-24.971 :: Corrupting src/main/resources/example-inputs/TD_300/marriage_records.csv...
Elapsed time: 00-00-00-120
2025/06/25 12-04-25.091 :: Corrupting src/main/resources/example-inputs/TD_300/death_records.csv...
Elapsed time: 00-00-00-154
Results output to results/default/2025-06-25T12-04-24-809

The corrupted records, together with a log file detailing the corruptions made, are written to the results directory named in the last line of the output (here, results/default/2025-06-25T12-04-24-809).
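
When scripting around the runner, it can help to check the input directory before a run and to find the newest results directory afterwards. The following is a rough sketch and not part of Crptr: the helpers `missing_record_files` and `latest_results_dir` are hypothetical, and the results layout (timestamp-named directories under results/default) is assumed from the sample output above and may differ if the config is changed.

```python
from pathlib import Path

# File names taken from the usage description above.
EXPECTED_FILES = ("birth_records.csv", "marriage_records.csv", "death_records.csv")


def missing_record_files(records_dir):
    """Return the expected record files that are absent from records_dir."""
    directory = Path(records_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


def latest_results_dir(results_root):
    """Return the most recent run directory under results_root, or None.

    Assumes run directories are named with a timestamp such as
    2025-06-25T12-04-24-809, which sorts chronologically as plain text.
    """
    root = Path(results_root)
    if not root.is_dir():
        return None
    runs = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    return runs[-1] if runs else None


if __name__ == "__main__":
    missing = missing_record_files("src/main/resources/example-inputs/TD_300")
    if missing:
        print("Missing input files:", ", ".join(missing))

    latest = latest_results_dir("results/default")  # assumed default location
    if latest is not None:
        print("Most recent results:", latest)
        for path in sorted(latest.iterdir()):
            print("  ", path.name)
```

Sorting by directory name rather than by modification time keeps the result stable even if older results are copied or touched later.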

Details such as the results output directory, corruption profiles, and corruptor types (OCR or standard) can be modified using the config module.

License

Crptr is published under the Mozilla Public License.
