Data Scrubbing Actions
Data scrubbing is a special animal, and you probably won't need to use these actions under normal circumstances. Primarily, these are to be used when you want to (attempt to) scrub your database of all personal information about your members. This is a "best guess" scrub: there is no guarantee that nothing is left, and you should not send this database to anybody you don't trust implicitly.
It should be noted that only core tables are scrubbed. Any tables added by plugins are not checked. Also, if Rock updates and adds new columns, they may not be checked until an update to this tool has been released.
Completely empties the contents of all AnalyticsSource... tables. These tables contain data that already exists elsewhere in your Rock instance, stored in a form that speeds up data queries. A Rock Job re-populates these tables, so they are safe to empty.
Empties the FinancialPersonBankAccount and FinancialPersonSavedAccount tables. These tables contain a person's bank account information (as in, scanned check matching data) and links to their saved credit card data in your gateway (the actual credit card data is not saved in Rock, but references to the cards are stored here).
Changes your global attribute values for OrganizationName, OrganizationAbbreviation and OrganizationWebsite to contain randomly generated data. Also changes the name of each campus to a randomly generated name.
If the campus has a URL value, then it gets a new generated web address. If it has a Description, then the Description is replaced with Lorem Ipsum data.
Searches the database for any e-mail address and replaces it with a generated one (in the format of user####@fakeinbox.com). This searches Person records, other tables that contain known e-mail address fields, and many Attribute Value types that might contain e-mail addresses. For full-text fields, a regular expression is used to find e-mail addresses.
Also replaces the global attributes EmailExceptionsList and OrganizationEmail with generated values.
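As a rough illustration of the full-text part of this action, the Python sketch below finds e-mail addresses with a regular expression and swaps each one for a user####@fakeinbox.com address. The pattern, the sequential four-digit numbering, and the rule that the same original address always gets the same replacement are all assumptions for illustration, not the tool's actual logic.

```python
import itertools
import re

# Hypothetical, deliberately loose pattern for spotting e-mail addresses in
# free text; the tool's real regular expression is not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def make_email_scrubber(start: int = 1):
    """Return a function that swaps every e-mail address in a string for a
    generated user####@fakeinbox.com address.

    Assumption: the same original address (compared case-insensitively)
    always maps to the same fake address, and numbering is sequential.
    """
    counter = itertools.count(start)
    mapping = {}

    def fake_for(address: str) -> str:
        key = address.lower()
        if key not in mapping:
            mapping[key] = f"user{next(counter):04d}@fakeinbox.com"
        return mapping[key]

    def scrub(text: str) -> str:
        return EMAIL_RE.sub(lambda m: fake_for(m.group(0)), text)

    return scrub


scrub = make_email_scrubber()
print(scrub("Contact ted@example.org or TED@example.org"))
# Contact user0001@fakeinbox.com or user0001@fakeinbox.com
```

Mapping repeated addresses to one fake value keeps free text internally consistent, so a note that mentions the same person twice still reads sensibly after the scrub.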
This action requires you to have a developer account with here.com (free up to 250,000 lookups per month). It runs a number of sub-actions, depending on the types of addresses found.
- Address Locations that are not geo-coded. These addresses are just a text string and may not even be valid. Since we don't have a geo-coded location to work with, we just generate a random address.
  - If the original address is outside the US, then we generate a random address from somewhere in the world.
  - If the original address is inside the US but outside the state of your church, then a random address somewhere inside the US is generated.
  - Finally, if the original address is inside the US and also inside the same state as your church, then a random address in the Phoenix, AZ area is chosen. If this is where your church actually is, well, sorry, you are out of luck.
- Next we look for any address locations that are geo-coded and within a 35-mile radius of the address of your church, specifically the OrganizationAddress global attribute. This will fail miserably if that address is not geo-coded.
  - All matching addresses are shifted into the Phoenix, AZ area, centered around a specific point (configurable in the preferences if you need to change it). The addresses stay relative to their original location, meaning that if the address was 5.3 miles north of your church, it will be 5.3 miles north of the target center location. There is also a preference that adds a +/- 1 mile jitter to the shifted addresses so they cannot easily be reversed. Once the geo-point has been shifted, we use here.com to do a reverse lookup from the geo-point to get an actual address. If we cannot get one, then a random address is used instead.
- The last set of addresses are those that are geo-coded but outside the 35-mile radius. These addresses all remain where they are but get a 1 mile jitter applied to them. Once the jitter is applied, we again do a reverse geo-code lookup to get the address of the new point.
A large database can easily have over 100,000 geo-coded addresses. Since you only get 250,000 free lookups per month, you may only get one or two runs a month before having to wait. The here.com website shows how many lookups you have performed, but it takes about 24 hours to update.
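The geo-coded address handling described above can be sketched in Python under some simplifying assumptions. The church and target center points are hypothetical (lat, lon) tuples, the north/east offset uses a flat-earth approximation that is reasonable within a 35-mile radius, and the jitter is modeled as an independent uniform +/- 1 mile shift on each axis. The here.com reverse lookup that follows each move is deliberately left out, and the hypothetical ungeocoded_region helper only shows which pool a non-geo-coded replacement would be drawn from.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8
MILES_PER_DEG_LAT = math.pi * EARTH_RADIUS_MILES / 180
SHIFT_RADIUS_MILES = 35.0  # from the text: addresses within 35 miles are shifted
JITTER_MILES = 1.0         # from the text: +/- 1 mile jitter


def distance_miles(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def offset_from(origin, point):
    """(north, east) offset of point from origin in miles, using a flat-earth
    approximation that is adequate over a few dozen miles."""
    north = (point[0] - origin[0]) * MILES_PER_DEG_LAT
    east = (point[1] - origin[1]) * MILES_PER_DEG_LAT * math.cos(math.radians(origin[0]))
    return north, east


def apply_offset(origin, north, east):
    """Point that lies `north` and `east` miles away from origin."""
    lat = origin[0] + north / MILES_PER_DEG_LAT
    lon = origin[1] + east / (MILES_PER_DEG_LAT * math.cos(math.radians(origin[0])))
    return lat, lon


def jitter(point, rng):
    """Move a point by up to JITTER_MILES north/south and east/west."""
    return apply_offset(point, rng.uniform(-JITTER_MILES, JITTER_MILES),
                        rng.uniform(-JITTER_MILES, JITTER_MILES))


def relocate(point, church, target, rng, jitter_shifted=True):
    """New (lat, lon) for a geo-coded address, before the reverse lookup."""
    if distance_miles(point, church) <= SHIFT_RADIUS_MILES:
        north, east = offset_from(church, point)
        moved = apply_offset(target, north, east)  # same offset, new center
        return jitter(moved, rng) if jitter_shifted else moved
    return jitter(point, rng)  # outside the radius: stay put, but jitter


def ungeocoded_region(country, state, church_state):
    """Where a random replacement for a non-geo-coded address is drawn from."""
    if country != "US":
        return "world"
    if state != church_state:
        return "US"
    return "Phoenix, AZ"
```

With jitter turned off, a point 5.3 miles north of the church lands 5.3 miles north of the target center, which is the "stay relative" behavior the list describes.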
Replaces all login usernames with "random" usernames in the format of fakeuser####. Remember, this replaces all usernames, so the username you normally use to log in to Rock will be changed as well.
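A minimal sketch of the username replacement, assuming usernames are numbered sequentially with four zero-padded digits (the text only gives the fakeuser#### format). Keeping the returned mapping around is one way to see which fake username replaced the account you normally log in with.

```python
import itertools


def fake_usernames(usernames, start=1):
    """Map each existing login username to a fakeuser#### replacement.

    Sequential, zero-padded numbering is an assumption for illustration.
    """
    counter = itertools.count(start)
    return {name: f"fakeuser{next(counter):04d}" for name in usernames}


mapping = fake_usernames(["admin", "ted.decker"])
print(mapping["admin"])  # fakeuser0001
```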
The database is searched for any person names. These names are replaced with generated fake data.
First the Person table is processed. The logic is somewhat complex, but the gist of it is that if you have 4 people in your family and you all have the same last name, then the generated last name will also be the same for each of you. If 3 of you share a last name but one has a different last name, then the 3 of you will have a shared generated last name and the last person will have a different generated last name. Also, the first names will be gender specific whenever possible (meaning that if the record is marked as Male, it will generate a Male name). If the original record had a nickname, then it will be replaced with a generated name as well.
After the Person table has been processed, we then process benevolence requests, prayer requests, event registrations, and previous last names. Whenever possible, if the record links to the person, then we use the newly generated names from the Person table; otherwise a new generated name is used.
Finally, a few additional tables just have a generic "name" field; we scrub these and always generate a random name for them.
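The family-aware last name rule for the Person table can be illustrated with the simplified Python sketch below. The Person class, the name pools, and the per-family lookup keyed on (family id, original last name) are hypothetical; the real tool's logic is described above as somewhat more complex.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical name pools; the real tool draws from its own generated data.
MALE_FIRST = ["James", "Robert", "Daniel", "Owen"]
FEMALE_FIRST = ["Mary", "Linda", "Sarah", "Grace"]
LAST_NAMES = ["Walker", "Hughes", "Bennett", "Foster", "Reyes"]


@dataclass
class Person:
    family_id: int
    first_name: str
    last_name: str
    gender: str                      # "Male", "Female" or "Unknown"
    nick_name: Optional[str] = None


def scrub_names(people, rng):
    """Replace names in place, keeping shared family last names shared."""
    generated = {}  # (family_id, original last name) -> generated last name
    for p in people:
        key = (p.family_id, p.last_name)
        if key not in generated:
            # Avoid reusing a generated last name already given to a
            # different original last name in the same family.
            taken = {name for (family, _), name in generated.items()
                     if family == p.family_id}
            choices = [n for n in LAST_NAMES if n not in taken] or LAST_NAMES
            generated[key] = rng.choice(choices)
        p.last_name = generated[key]

        # Gender-specific first names whenever the gender is known.
        pool = {"Male": MALE_FIRST, "Female": FEMALE_FIRST}.get(
            p.gender, MALE_FIRST + FEMALE_FIRST)
        p.first_name = rng.choice(pool)
        if p.nick_name:
            p.nick_name = rng.choice(pool)
    return people
```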
Person phone numbers are replaced with completely random phone numbers (hopefully none that actually work). After the Person Phone Number table is scrubbed, we search attribute values that contain phone numbers using regular expression matching. Finally, any generic "phone number" columns in various Rock tables are scrubbed.
The global attribute OrganizationPhone is also scrubbed.
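For the regular expression pass over attribute values, a sketch along these lines would work. The North American style pattern is an assumption, and this version replaces each digit with a random one while keeping the original punctuation so the value still looks like a phone number.

```python
import random
import re

# Hypothetical pattern for North American style numbers such as
# (602) 555-1234, 602.555.1234 or 6025551234 embedded in free text.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def scrub_phones(text: str, rng: random.Random) -> str:
    """Replace every digit of each matched phone number with a random digit,
    keeping the original punctuation so the formatting still looks right."""
    def randomize(match):
        return "".join(str(rng.randrange(10)) if ch.isdigit() else ch
                       for ch in match.group(0))
    return PHONE_RE.sub(randomize, text)
```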
Attempts to sanitize the History data by replacing values with HIDDEN. This should catch nearly all history records and hide the original values. For example, after the scrub you should only see things like Connection Status changed from HIDDEN to HIDDEN.
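A sketch of the HIDDEN substitution, assuming a hypothetical one-line history summary of the form "field changed from old to new". Rock's actual history storage may differ, so treat the pattern as illustrative only.

```python
import re

# Hypothetical pattern for a one-line history summary of the form
# "<field> changed from <old value> to <new value>".
CHANGE_RE = re.compile(r"^(?P<field>.+?) changed from (?P<old>.*?) to (?P<new>.*)$")


def hide_history_values(summary: str) -> str:
    """Keep the field name but replace both values with HIDDEN."""
    match = CHANGE_RE.match(summary)
    if match is None:
        return summary  # not a recognized change record; leave it alone
    return f"{match.group('field')} changed from HIDDEN to HIDDEN"
```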
The BackgroundCheck table can sometimes contain sensitive data in the form of logged information about the requests. This action first blanks out all of those fields.
Next, all Attribute Values are searched for links to background check files.
Finally, we change the name of any existing background check workflow instances to match the generated name (this assumes you actually scrubbed the names too).
Cleans up all benevolence requests by replacing any government ID values with generated values (GEN#####). Any Request Text and Response Summary is replaced with Lorem Ipsum content.
This is a hard one, but basically we attempt to replace every non-HTML word with lorem ipsum words. So if the original Content for the item was <p>Some sensitive data</p>, you would end up with <p>Lorem ipsum negal</p>, hopefully. It's hard to be sure this one catches everything, because content channel items are used for everything and there is really no telling what data is actually in here.
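One way to approach the word-for-word replacement is to split the content into HTML tags and text runs and rewrite only the words in the text runs, as in this Python sketch. The lorem word list, the cycling order, and the capitalization rule are assumptions. Attribute values inside tags (alt text, for example) are left untouched here, which is exactly the kind of gap the text warns about.

```python
import itertools
import re

TAG_RE = re.compile(r"(<[^>]*>)")                # capturing group keeps tags in split output
WORD_RE = re.compile(r"&#?\w+;|[^\W\d_]+")       # HTML entities, or runs of letters
LOREM = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def lorem_html(html: str) -> str:
    """Replace every word outside HTML tags with a lorem ipsum word."""
    words = itertools.cycle(LOREM)

    def replace_word(match):
        token = match.group(0)
        if token.startswith("&"):
            return token  # leave HTML entities such as &nbsp; intact
        new = next(words)
        return new.capitalize() if token[0].isupper() else new

    parts = TAG_RE.split(html)
    # re.split with one capturing group alternates text, tag, text, tag, ...
    return "".join(part if i % 2 else WORD_RE.sub(replace_word, part)
                   for i, part in enumerate(parts))
```

With this fixed word list, <p>Some sensitive data</p> becomes <p>Lorem ipsum dolor</p>.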
Check-in device IP addresses are replaced with random addresses (172.16.x.y), or, if the device address is a hostname, with device-###.rocksolidchurchdemo.com. If the address was 127.0.0.1 or ::1, it is left as is.
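A sketch of the device address rule using Python's ipaddress module. The ranges chosen for x and y and the hypothetical index parameter that fills the ### in the hostname are assumptions; only 127.0.0.1 and ::1 are preserved, matching the text.

```python
import ipaddress
import random

# From the text: these two loopback addresses are left untouched.
KEEP = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}


def scrub_device_address(value: str, index: int, rng: random.Random) -> str:
    """Return the scrubbed IP address or hostname for one check-in device.

    `index` is a hypothetical per-device counter used to fill in the ###
    part of the generated hostname.
    """
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        return f"device-{index:03d}.rocksolidchurchdemo.com"
    if address in KEEP:
        return value
    # Assumed ranges: x in 0-255, y in 1-254 (skipping .0 and .255).
    return f"172.16.{rng.randrange(256)}.{rng.randrange(1, 255)}"
```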
This is another wild west. We simply clear the custom data from each table (ComponentData, ChannelData, and InteractionData).
The Workflow Log can often include sensitive data. For example, when you set an attribute value, it is logged (assuming you have logging turned on). This action attempts to scrub those messages and replace the actual action message with HIDDEN.