Data Anonymization through Dimensionality Reduction #79
-
Hey everyone, after watching the part on how anonymization is flawed (great course so far, btw!), I was wondering how this applies to data that was anonymized through dimensionality reduction techniques (PCA, t-SNE, LDA, etc.).
Thanks :)
Replies: 3 comments 3 replies
-
Hi Merlin! Great question! Let me get you more specialized help. 🙂 @iamtrask or @mcleonard, can you help with this?
-
I'm also a bit skeptical when I hear that anonymization is broken 😊 If I told that to anyone in cybersecurity, they would tell me I'm wrong. From the cybersecurity perspective, no system is ever considered "completely secure". When we talk about risks, we talk about reducing them, not about bringing them to zero. I see anonymization the same way. Of course, you might find a way to connect the data to real people if you get hold of some other piece of information. But what is the probability of you getting that data, especially when we are talking about finances? If we quantify it, we will find that the probability is so low that the risk can be accepted. And this is what any business would do, instead of redesigning the whole information flow and replacing best practices. For this reason I find it problematic to answer some of the questions on the videos, because the right answers feel more like a personal point of view than a single truth, in other words, a fact. If we say that anonymized data can be connected to real people in 2% of cases, it doesn't mean that anonymization can't protect personal data.
-
@MerlinSchaefer - great question! While every case is probably a little different, dimensionality reduction combines the dimensions of each individual's data into "latent features". So perhaps the simplest answer is that, with machine learning, there's very little structure you need to know in order to de-anonymize. The most straightforward way would be to find the true records for a small percentage of the data and then learn a linear classifier (for linear techniques) or a non-linear classifier (for more advanced compression techniques) that maps the latent features back to the original records. Worth mentioning: many dimensionality reduction techniques have fantastic differentially private alternatives.
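To make the linear case concrete, here is a rough sketch of that attack (not a definitive implementation). Everything in it is an assumption made up for illustration: the synthetic table, the sizes `n`, `d`, `k`, the 5% of rows the attacker is assumed to know, and the helper name `reconstruction_error`. It "anonymizes" the data by releasing only PCA scores, then fits an ordinary least-squares map from those scores back to the original attributes using the known rows.

```python
import numpy as np

def reconstruction_error(seed=0, n=1000, d=6, k=3):
    """Relative error of an attacker's reconstruction from released PCA scores.

    All data here is synthetic and hypothetical.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical "private" table: d attributes driven by k hidden factors
    # plus a little noise, so a k-dimensional PCA keeps most of the signal.
    factors = rng.normal(size=(n, k))
    mixing = rng.normal(size=(k, d))
    X = factors @ mixing + 0.05 * rng.normal(size=(n, d))

    # "Anonymization": release only the first k principal-component scores.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    Z = (X - mean) @ Vt[:k].T

    # Assumption: the attacker has the true records for 5% of the rows.
    known = rng.choice(n, size=n // 20, replace=False)
    unknown = np.setdiff1d(np.arange(n), known)

    # Fit a linear map (with intercept) from latent features to true records.
    A_known = np.hstack([Z[known], np.ones((len(known), 1))])
    W, *_ = np.linalg.lstsq(A_known, X[known], rcond=None)

    # Apply the learned map to everyone else's released latent features.
    A_unknown = np.hstack([Z[unknown], np.ones((len(unknown), 1))])
    X_hat = A_unknown @ W

    return np.linalg.norm(X_hat - X[unknown]) / np.linalg.norm(X[unknown])

if __name__ == "__main__":
    print(f"relative reconstruction error: {reconstruction_error():.3f}")
```

How well this works on real data depends on how much information the reduction throws away, so treat it as an illustration of the idea only. For a non-linear technique, the least-squares step would be swapped for a non-linear model.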