
Data Scrubbing Actions

Daniel Hazelbaker edited this page Feb 21, 2019 · 7 revisions

Data scrubbing is a special animal and you probably won't need to use these actions under normal circumstances. Primarily, these are to be used when you want to attempt to scrub your database of all personal information about your members. This is a "best guess" scrub: there is no guarantee that nothing is left, and you should not send this database to anybody you don't trust implicitly.

Note that only core tables are scrubbed. Any tables added by plugins are not checked. Also, if a Rock update adds new columns, they may not be checked until an update to this tool has been released.

Empty Analytics Source Tables

Completely empties the contents of all AnalyticsSource... tables. These tables contain pre-existing data from your Rock instance, but stored in a manner that speeds up data queries. There is a Rock Job that re-populates these tables so they are safe to empty.

Empty Saved Account Tables

Empties the FinancialPersonBankAccount and FinancialPersonSavedAccount tables. These tables contain a person's financial bank accounts (as in, scanned check matching data) and links to their saved credit card data in your gateway (the actual CC data is not saved in Rock, but references to the cards are stored here).

Generate Organization and Campuses

Changes your global attribute values for OrganizationName, OrganizationAbbreviation and OrganizationWebsite to contain randomly generated data. Also changes the name of each campus to a randomly generated name.

If the campus has a URL value, then it gets a new generated web address. If it has a Description then the Description is replaced with Lorem Ipsum data.

Generate Random Email Addresses

Searches the database for any e-mail address and replaces it with a generated one (in the format of user####@fakeinbox.com). This searches Person records, other tables that contain known e-mail address fields and also many Attribute Value types that might contain e-mail addresses. In the case of the full-text fields a regular expression is used to find e-mail addresses.
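The regular expression pass over full-text fields can be pictured with a minimal Python sketch. This is not the tool's actual code: the pattern, the function names, and the choice of four digits for #### are all assumptions made for illustration.

```python
import random
import re

# Deliberately simple pattern; an assumption, not the tool's actual regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fake_email(rng: random.Random) -> str:
    """Return an address in the user####@fakeinbox.com format (four digits assumed)."""
    return f"user{rng.randint(0, 9999):04d}@fakeinbox.com"


def scrub_emails(text: str, rng: random.Random) -> str:
    """Replace every e-mail address found in free text with a generated one."""
    return EMAIL_RE.sub(lambda _match: fake_email(rng), text)


rng = random.Random(42)
scrubbed = scrub_emails("Contact jane.doe@example.org or bob@church.net today.", rng)
print(scrubbed)
```

A simple pattern like this one will miss unusual but valid addresses, which is one reason the scrub is only a best guess.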

Also replaces the global attributes EmailExceptionsList and OrganizationEmail with generated values.

Generate Random Location Addresses

This action requires you to have a developer account with here.com (free up to 250,000 lookups per month). This action has a number of sub-actions it runs depending on the types of addresses found.

  • Address Locations that are not geo-coded. These addresses are just a text string and may not even be valid. Since we don't have a geo-coded location to work with, we just generate a random address.
    • If the original address is outside the US then we generate a random address from somewhere in the world.
    • If the original address is inside the US but outside the state of your church, then a random address somewhere inside the US is generated.
    • Finally, if the original address is inside the US and also inside the same state as your church, then a random address in the Phoenix, AZ area is chosen. If this is where your church actually is, well, sorry you are out of luck.
  • Next we look for any address locations that are geo-coded and within a 35-mile radius of the address of your church - specifically, the OrganizationAddress global attribute. This will fail miserably if that address is not geocoded.
    • All matching addresses are shifted into the Phoenix, AZ area, centered around a specific point (configurable in the preferences if you need to change it). The addresses stay relative to their original location, meaning that if the address was 5.3 miles north of your church, it will be 5.3 miles north of the target center location. There is also a preference that adds a +/- 1 mile jitter to the shifted addresses so they cannot easily be reversed. Once the geo-point has been shifted, we use here.com to reverse lookup from the geo-point to get an actual address. If we cannot get one, then some random address will be used instead.
  • The last set of addresses are those that are geo-coded, but outside the 35-mile radius. These addresses all remain where they are but get a 1 mile jitter applied to them. Once the jitter is applied we again do a reverse geo-code lookup to get the address of the new point.
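The "shift and jitter" step for addresses near your church can be sketched in Python as follows. This is only an illustration under stated assumptions: the flat-earth approximation of 69 miles per degree of latitude, the per-axis jitter, the function names, and the Phoenix coordinates are all hypothetical, and the reverse lookup against here.com is not shown.

```python
import math
import random

MILES_PER_DEGREE_LAT = 69.0  # rough average, good enough for a sketch


def miles_per_degree_lon(lat: float) -> float:
    """Longitude degrees shrink toward the poles, so scale by cos(latitude)."""
    return MILES_PER_DEGREE_LAT * math.cos(math.radians(lat))


def shift_point(point, church, target, jitter_miles=1.0, rng=None):
    """Move `point` so its offset from `target` matches its offset from `church`.

    All three arguments are (latitude, longitude) tuples in degrees.
    """
    rng = rng or random.Random()
    # Offset of the original point from the church, in miles.
    north = (point[0] - church[0]) * MILES_PER_DEGREE_LAT
    east = (point[1] - church[1]) * miles_per_degree_lon(church[0])
    # Add up to +/- jitter_miles on each axis (assumed) so the shift is harder to reverse.
    north += rng.uniform(-jitter_miles, jitter_miles)
    east += rng.uniform(-jitter_miles, jitter_miles)
    # Re-apply the offset around the target center.
    new_lat = target[0] + north / MILES_PER_DEGREE_LAT
    new_lon = target[1] + east / miles_per_degree_lon(target[0])
    return new_lat, new_lon


church = (40.0, -75.0)            # hypothetical church location
target = (33.4484, -112.0740)     # illustrative point in Phoenix, AZ
home = (40.0 + 5.3 / MILES_PER_DEGREE_LAT, -75.0)  # 5.3 miles north of the church
print(shift_point(home, church, target, jitter_miles=0.0))
```

With the jitter set to zero, the shifted point lands 5.3 miles north of the target center, which matches the behavior described above. With the jitter enabled, the result drifts by up to a mile on each axis.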

A large database can easily have over 100,000 geo-coded addresses. Since you only get 250,000 free lookups per month, you may only get one or two runs a month before having to wait. The here.com website will show how many lookups you have performed, but it takes about 24 hours for it to update.

Generate Random Logins

Replaces all login usernames with "random" usernames in the format of fakeuser####. Remember, this replaces all usernames, so the username you normally use to log in to Rock will be changed as well.

Generate Random Names

The database is searched for any person names. These names are replaced with generated fake data.

First the Person table is processed. The logic is somewhat complex, but the gist of it is that if you have 4 people in your family and you all have the same last name, then the generated last name will also be the same for each of you. If 3 of you have the same last name but one has a different last name, then the 3 of you will have a shared generated last name and the last person will have a different generated last name. Also, the first names will be gender specific whenever possible (meaning if the record is marked as Male it will generate a Male name). If the original record had a nickname then it will be replaced with a generated name as well.
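The family last name rule can be illustrated with a small Python sketch. It is a simplified assumption of how such a mapping might work, not the tool's actual logic: the name pools, the record layout, and the `scrub_family` function are hypothetical, and nicknames are left out.

```python
import random

# Tiny illustrative name pools; a real run would need far larger lists.
LAST_NAMES = ["Archer", "Bennett", "Calloway", "Dempsey", "Ellison"]
FIRST_NAMES = {
    "Male": ["Aaron", "Blake", "Caleb"],
    "Female": ["Abigail", "Bianca", "Clara"],
}
ANY_FIRST_NAMES = FIRST_NAMES["Male"] + FIRST_NAMES["Female"]


def scrub_family(members, rng):
    """Return scrubbed copies of `members`, sharing generated last names."""
    last_name_map = {}  # original last name -> generated last name
    scrubbed = []
    for person in members:
        original_last = person["last_name"]
        if original_last not in last_name_map:
            # Prefer a generated name not already used within this family.
            unused = [n for n in LAST_NAMES if n not in last_name_map.values()]
            last_name_map[original_last] = rng.choice(unused or LAST_NAMES)
        # Fall back to the combined pool when the gender is unknown.
        pool = FIRST_NAMES.get(person.get("gender"), ANY_FIRST_NAMES)
        scrubbed.append({
            "first_name": rng.choice(pool),
            "last_name": last_name_map[original_last],
            "gender": person.get("gender"),
        })
    return scrubbed


family = [
    {"first_name": "John", "last_name": "Smith", "gender": "Male"},
    {"first_name": "Mary", "last_name": "Smith", "gender": "Female"},
    {"first_name": "Tim", "last_name": "Smith", "gender": "Male"},
    {"first_name": "Ann", "last_name": "Jones", "gender": "Female"},
]
for person in scrub_family(family, random.Random(7)):
    print(person["first_name"], person["last_name"])
```

The three Smiths end up sharing one generated last name while the fourth member gets a different one, mirroring the example in the paragraph above.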

After the Person table has been processed, we then process benevolence requests, prayer requests, event registrations, and previous last names. Whenever possible, if the record links to the person then we use the newly generated names from the Person table - otherwise a new generated name is used.

Finally, there are a few additional tables that just have a generic "name" field. We scrub these and always generate a random name for them.

Generate Random Phone Numbers

Any Person phone numbers are replaced with completely random phone numbers - hopefully none that actually work. After the Person Phone Number table is scrubbed, we search any attribute values that contain phone numbers using regular expression matching. Finally any generic "phone number" columns in various Rock tables are scrubbed.

The global attribute OrganizationPhone is also scrubbed.

Insert History Placeholders

Attempts to sanitize the History data with HIDDEN values. This should catch nearly all history records and hide the original values. For example, after the scrub you should only see things like Connection Status changed from HIDDEN to HIDDEN.

Sanitize Background Check Data

The BackgroundCheck table can sometimes contain sensitive data in the form of logged information on the requests. This action first blanks out all of those fields.

Next, all Attribute Values are searched for links to background check files.

Finally, we change the name of any existing background check workflow instances to match the generated name (this assumes you actually scrubbed the names too).

Sanitize Benevolence Request Data

Cleans up all benevolence requests by replacing any government Id values with generated values (GEN#####). Any Request Text and Response Summary is replaced with Lorem Ipsum content.

Sanitize Content Channel Items

This is a hard one, but basically what we attempt to do is replace every non-HTML word with lorem ipsum words instead. So if the original Content for the item was <p>Some sensitive data</p> you would end up with <p>Lorem ipsum negal</p> - hopefully. It's hard to be sure this one catches everything because content channel items are used for everything and there is really no telling what data is actually in here.
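One way to picture the word replacement is the Python sketch below. It is an assumption-laden illustration rather than the tool's implementation: the tag and word patterns, the word list, and the `scrub_html` name are hypothetical, and HTML entities, tag attributes, and script or style contents are not handled.

```python
import itertools
import re

LOREM_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]

# Capturing group so re.split keeps the tags in the result list.
TAG_RE = re.compile(r"(<[^>]*>)")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def scrub_html(content: str) -> str:
    """Replace every word outside of HTML tags with a lorem ipsum word."""
    words = itertools.cycle(LOREM_WORDS)
    parts = TAG_RE.split(content)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions are the captured tags; keep them untouched.
            out.append(part)
        else:
            # Even positions are the text between tags; swap each word.
            out.append(WORD_RE.sub(lambda _match: next(words), part))
    return "".join(out)


print(scrub_html("<p>Some sensitive data</p>"))
```

Because tags are copied through unchanged, anything sensitive stored in an attribute would survive a pass like this, which shows why it is hard to be sure everything is caught.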

Sanitize Devices

Check-in device IP addresses are replaced with random addresses (172.16.x.y). If the device is identified by a hostname instead, it is replaced with device-###.rocksolidchurchdemo.com. If the address was 127.0.0.1 or ::1, it is left as is.
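The three cases can be sketched in Python like this. The function name, the `device_number` parameter, the zero-padded three-digit format, and the ranges chosen for x and y are assumptions for illustration only.

```python
import ipaddress
import random


def scrub_device_address(address: str, device_number: int, rng: random.Random) -> str:
    """Return a sanitized replacement for a check-in device address."""
    # Loopback addresses carry no identifying information, so keep them.
    if address in ("127.0.0.1", "::1"):
        return address
    try:
        ipaddress.ip_address(address)
    except ValueError:
        # Not an IP address, so treat it as a hostname.
        return f"device-{device_number:03d}.rocksolidchurchdemo.com"
    # Any other IP address becomes a random 172.16.x.y address.
    return f"172.16.{rng.randint(0, 255)}.{rng.randint(1, 254)}"


rng = random.Random(3)
for number, original in enumerate(["127.0.0.1", "10.1.2.3", "kiosk-lobby.example.org"], start=1):
    print(original, "->", scrub_device_address(original, number, rng))
```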

Sanitize Interaction Data

This is another wild west. We simply clear the custom data from each table (ComponentData, ChannelData, and InteractionData).

Scrub Workflow Log

The Workflow Log can often include sensitive data. For example, when you set an attribute value it is logged (assuming you have logging turned on). This action attempts to scrub those messages and replace the actual action message with HIDDEN.
