Differential Privacy Technology
Suppose that, before sharing the data, we inject some noise into it or create a synthetic dataset with the same statistical properties as the original dataset.
There is then a good chance that privacy can be protected.
Differential privacy techniques protect personal privacy by injecting carefully calibrated random noise, so that the released data no longer exactly reflects any individual's record.
The ingenuity of differential privacy is that it allows meaningful analysis to be extracted from a dataset while still protecting personal privacy.
The flip side is that an analyst without direct access to the dataset can learn very little about any specific individual.
In typical differential privacy settings, the data steward (the administrator) is assumed to be trustworthy and acts as the central party that holds the personal data making up the dataset.
With a trusted administrator, differential privacy can operate in one of two modes: online (interactive) or offline (non-interactive).
In the online interactive mode, the data analyst adaptively queries the dataset.
A query is a function applied to the dataset, and each query returns a noisy response, thus preserving privacy.
In the offline non-interactive mode, the administrator uses a differential privacy mechanism to generate a synthetic database with the same statistical properties as the original dataset.
After the data is published, the administrator no longer plays any role, and the original data may even be destroyed.
With synthetic databases, re-identifying individuals therefore becomes difficult.
Furthermore, such synthetic data can be shared for high-quality analysis.
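One simple way to picture the offline mode is a noisy histogram from which synthetic records are regenerated. The sketch below is an illustration under stated assumptions, not the method of any particular system: the function names, the category list, and the sample data are hypothetical, the category list is assumed to be fixed in advance rather than derived from the data, and neighbouring datasets are assumed to differ by adding or removing one record.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def synthesize(records, categories, epsilon, rng):
    """Hypothetical helper: sample synthetic records from a noisy histogram."""
    true_counts = Counter(records)
    synthetic = []
    for category in categories:
        # Adding or removing one record changes one bin by 1,
        # so noise of scale 1/epsilon is added to every bin.
        noisy = true_counts[category] + laplace_noise(1.0 / epsilon, rng)
        # Clamp and round so that the bin holds a valid record count.
        synthetic.extend([category] * max(0, round(noisy)))
    rng.shuffle(synthetic)
    return synthetic

rng = random.Random(0)
original = ["flu"] * 120 + ["cancer"] * 30 + ["diabetes"] * 50
synthetic = synthesize(original, ["flu", "cancer", "diabetes"], epsilon=1.0, rng=rng)
print(Counter(synthetic))
```

Once the synthetic list has been produced, only the noisy counts matter; the original records are not consulted again, which matches the idea that the original data could be destroyed after publication.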
4.2.1 The Principle of Differential Privacy Technology
Consider an algorithm that analyzes a dataset and computes statistical properties such as the mean, variance, median, and mode.
If, by looking at the output, one cannot tell whether any particular individual's data was included in the original dataset, the algorithm is called differentially private.
In other words, a differentially private algorithm guarantees that its behavior hardly changes whether or not any one individual is present in the dataset.
Most notably, this guarantee applies to any individual and any dataset.
Therefore, no matter how unique an individual's details may be, and regardless of the details of anyone else in the dataset, the guarantee of differential privacy still holds.
Mathematically, differential privacy can be defined as follows. A randomized function M gives ε-differential privacy if, for all datasets D1 and D2 that differ in at most one element, and for all S ⊆ Range(M):
Pr[M(D1) ∈ S] ≤ exp(ε) × Pr[M(D2) ∈ S]
That is, the distribution of the output M(D1) that the administrator releases for dataset D1 is almost the same as that of M(D2) for dataset D2, where D1 and D2 differ in only one individual's record and M is a randomized algorithm that guarantees ε-differential privacy.
The parameter ε determines how indistinguishable the two datasets D1 and D2 are; that is, it bounds how much the query responses for the two datasets can differ.
This provides an assurance that very little personal information about the participants in the dataset can leak.
Differential privacy techniques resist linkage attacks that combine the output with related auxiliary data, and they also make other disclosure risks less likely.
A key feature of differential privacy is that it defines privacy as a quantifiable measure, through the parameter ε, rather than as a binary property such as whether or not personal data has been leaked.
Essentially, ε determines how much noise is added to the computation, so it can be seen as a tuning knob that balances privacy and utility.
Each differentially private analysis can be tuned to provide more or less privacy.
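The inequality in the definition can be checked numerically for a concrete case. The sketch below assumes a counting query answered with Laplace noise of scale 1/ε (the mechanism introduced in section 4.2.2) and two neighbouring datasets whose true counts are 30 and 29; the function names and the grid of candidate outputs are illustrative choices, not part of the original text.

```python
import math

def laplace_pdf(x, loc, scale):
    """Density of the Laplace distribution with the given centre and scale."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)

def worst_case_ratio(answer_d1, answer_d2, epsilon, outputs):
    """Largest ratio of output densities between two neighbouring datasets."""
    scale = 1.0 / epsilon
    return max(
        laplace_pdf(x, answer_d1, scale) / laplace_pdf(x, answer_d2, scale)
        for x in outputs
    )

epsilon = 0.5
outputs = [i / 10.0 for i in range(0, 601)]  # candidate outputs from 0 to 60
ratio = worst_case_ratio(30, 29, epsilon, outputs)

# The ratio never exceeds exp(epsilon), as the definition requires.
print(ratio, math.exp(epsilon))
```

Swapping the two true counts gives the same bound, which reflects the fact that the definition must hold in both directions.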
4.2.2 Implementation of Differential Privacy Technology
Differentially private algorithms are randomized algorithms that add noise at key points.
In a concrete implementation, the Laplace mechanism can make aggregate queries (e.g., count, sum, and mean) differentially private.
For a query whose result changes by at most 1 when one person is added or removed, such as a count, this method samples random noise from a Laplace distribution centered at 0 with scale 1/ε.
The noise is added to the true result of the query, so the response that is returned masks the actual value.
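A minimal sketch of the Laplace mechanism for a counting query follows. It uses only the Python standard library; the helper names and the patient records are hypothetical, and the sampler is a plain floating-point one that is not hardened against the known precision-based attacks on naive Laplace sampling.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng):
    """Hypothetical helper: COUNT query with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
# Illustrative data: 100 patients, the first 30 of whom have cancer.
patients = [{"name": f"patient{i}", "cancer": i < 30} for i in range(100)]

answers = [
    noisy_count(patients, lambda record: record["cancer"], 0.5, rng)
    for _ in range(5000)
]
mean_answer = sum(answers) / len(answers)
print(answers[:3], mean_answer)
```

Individual answers scatter around the true count, while their average stays close to it. This is why aggregate analysis remains useful, and also why repeated queries gradually erode the protection.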
Still using the hospital scenario as an example, suppose the hospital holds data on cancer patients collected through a medical application.
If a doctor wants to know whether Xiao Ming is a cancer patient, he can find out by crafting multiple queries.
For example, if a first COUNT query returns 30 and a second COUNT query that excludes Xiao Ming returns 29, it can be concluded that Xiao Ming is a cancer patient.
If the second COUNT query also returns 30, the opposite conclusion is drawn.
With Laplace noise added to each response, the two answers no longer differ by exactly Xiao Ming's contribution, so this comparison becomes unreliable.
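The two-query comparison in this scenario can be sketched as follows, first on exact counts and then on counts perturbed with Laplace noise. The record layout, the helper names, and the choice ε = 0.5 per query are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def count_cancer(records, exclude=None):
    """COUNT of cancer patients, optionally leaving one named person out."""
    return sum(1 for r in records if r["cancer"] and r["name"] != exclude)

rng = random.Random(7)
records = [{"name": "Xiao Ming", "cancer": True}]
records += [{"name": f"patient{i}", "cancer": True} for i in range(29)]
records += [{"name": f"other{i}", "cancer": False} for i in range(70)]

# With exact answers, the difference of the two counts reveals the record.
exact_gap = count_cancer(records) - count_cancer(records, exclude="Xiao Ming")

def noisy_gap(epsilon):
    # Each query gets its own independent noise of scale 1/epsilon.
    scale = 1.0 / epsilon
    first = count_cancer(records) + laplace_noise(scale, rng)
    second = count_cancer(records, exclude="Xiao Ming") + laplace_noise(scale, rng)
    return first - second

trials = 2000
hits = sum(1 for _ in range(trials) if round(noisy_gap(0.5)) == 1)
print(exact_gap, hits / trials)
```

With exact counts the gap is always 1. With noisy counts the rounded gap equals 1 in only a minority of trials, so a single pair of queries no longer settles the question.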
Many related mechanisms can be used in place of the Laplace mechanism, for example the exponential mechanism, the private multiplicative weights algorithm, and the multiplicative weights exponential mechanism.
With such mechanisms, it is possible to build software systems based on differential privacy, but practical challenges remain.
For example, if the same query must always receive the same noisy response, the system needs to look up a log of historical responses.
Since the answer remains the same, no additional information leaks, but log lookups can be costly in space and time.
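A response log of this kind can be sketched as a dictionary keyed by the query text. The class name and query strings below are hypothetical. The lookup only recognizes queries that are textually identical, so two differently worded but equivalent queries would each receive fresh noise.

```python
import random

class CachedLaplaceAnswers:
    """Hypothetical sketch: replay the logged noisy answer for a repeated query."""

    def __init__(self, epsilon, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.log = {}  # query text -> noisy answer already released

    def answer(self, query_text, true_value):
        if query_text not in self.log:
            scale = 1.0 / self.epsilon
            # Laplace(0, scale) noise as a difference of two exponentials.
            noise = (self.rng.expovariate(1.0 / scale)
                     - self.rng.expovariate(1.0 / scale))
            self.log[query_text] = true_value + noise
        return self.log[query_text]

server = CachedLaplaceAnswers(epsilon=0.5, seed=1)
first = server.answer("COUNT cancer", 30)
repeat = server.answer("COUNT cancer", 30)
print(first == repeat, len(server.log))
```

The log grows with every distinct query, which is where the space and time cost mentioned above comes from.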
4.2.3 Limitations of Differential Privacy Technology
Establishing whether two queries are equivalent is notoriously hard computationally.
Therefore, although differential privacy has some advantages over traditional privacy protection methods, it also has certain limitations.
First, it remains a challenge to determine an ideal privacy loss parameter ε that gives high utility while preserving privacy.
Second, the privacy guarantee only holds for a limited number of queries, a number that depends on the amount of distinct data represented in the dataset.
Therefore, designing a privacy-preserving mechanism that can handle an arbitrary number of queries is also a challenge.
Additionally, differential privacy techniques are vulnerable to side-channel attacks, in which an adversary learns facts about the data by monitoring a side channel.
A typical example is a timing attack: if the query computation takes 51 µs when a person has cancer and 49 µs otherwise, it is possible to tell whether a person has cancer just by looking at the time taken.
Finally, it is still possible for sensitive data to be exposed; for example, bad actors can build classifiers on private datasets to predict sensitive information.
The discussion above assumes that the data administrator is trusted.
If the data administrator is not trusted, local differential privacy is required.
That is, noise is injected locally, at the level of each individual data subject, so that privacy control is left to the data subject.
Additionally, under privacy regulations such as GDPR, large organizations use local differential privacy to avoid liability arising from storing and misusing sensitive user data.
Therefore, in terms of trust assumptions, local differential privacy is more attractive.
However, the utility of statistics published using local differential privacy is worse than that of statistics published using standard differential privacy:
since the perturbation occurs at each individual's end, more noise is added overall.
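Randomized response is a classic way to inject noise on the data subject's side, and it is used here as an illustrative stand-in for local differential privacy in general. In the sketch below each person reports their true bit with probability exp(ε) / (1 + exp(ε)) and the flipped bit otherwise, and the collector corrects for the known flip rate. The function names and the population are hypothetical.

```python
import math
import random

def randomize_bit(true_bit, epsilon, rng):
    """Run by each data subject: report the truth with probability e^eps/(1+e^eps)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit

def estimate_fraction(reports, epsilon):
    """Run by the collector: remove the bias introduced by the random flips."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + fraction * (2 * p_truth - 1)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = random.Random(3)
true_bits = [1] * 6000 + [0] * 14000  # 30% of the population hold the sensitive bit
reports = [randomize_bit(bit, 1.0, rng) for bit in true_bits]
estimate = estimate_fraction(reports, 1.0)
print(estimate)
```

The collector never sees a true bit, yet the estimate lands near the real fraction. The price is variance: every report carries its own noise, so far more respondents are needed than in the central setting for the same accuracy.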
4.2.4 Application of Differential Privacy Technology
Differential privacy has a broad role in many application areas, including cyber-physical systems such as smart grids, healthcare systems, IoT, and autonomous vehicle systems.
In a smart grid system, electricity providers use smart meters to record and maintain household energy consumption information.
This information can reveal a family’s lifestyle and other details, and misuse could infringe on consumers’ privacy.
Therefore, it is necessary to incorporate privacy-preserving technologies into such systems.
Similarly, in healthcare and medical systems, data collected by IoT devices, such as blood pressure, blood sugar levels, and sometimes even location, needs to be obtained in a privacy-preserving way.
Among various application services, Microsoft uses local differential privacy to protect user privacy in Windows applications.
Apple also uses this technology to protect the privacy of user activity over a given period of time while still obtaining data that helps make features like QuickType smarter and more usable.
In Google's Chrome, data about how software hijacks user settings is collected in a privacy-preserving manner.
Additionally, both IBM and Google provide libraries for performing machine learning tasks in a differentially private manner.
With differential privacy, is private data adequately protected?
It depends on ε. When ε ≤ 1, the utility of the data output by differential privacy techniques may be poor.
One way to address this is to use a much larger value of ε.
Apple reportedly uses ε = 6 in macOS and even ε = 43 in the iOS 10 beta, while Google uses ε = 9 in Chrome.
This shows that applying differential privacy in practice is still a challenge, since a value as large as ε = 9 greatly weakens the privacy guarantee.
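To see why large values weaken the guarantee, recall that the definition bounds the ratio of output probabilities on neighbouring datasets by exp(ε). The short sketch below evaluates that bound for ε = 1 and for the values quoted above; the function name is an illustrative choice.

```python
import math

def probability_ratio_bound(epsilon):
    """Upper bound exp(epsilon) on Pr[M(D1) in S] / Pr[M(D2) in S]."""
    return math.exp(epsilon)

for epsilon in (1, 6, 9, 43):
    print(f"epsilon = {epsilon}: ratio bound = {probability_ratio_bound(epsilon):.3g}")
```

At ε = 1 the bound is about 2.7, at ε = 6 it is about 403, and at ε = 9 it is about 8100. A bound in the thousands allows one person's presence to change the likelihood of an output by that factor, which is a very weak form of indistinguishability.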
The need for data privacy has expanded from standard use cases for data publishing to privacy-driven analytics.
Here, differential privacy (DP) has gained significant attention because it provides mathematical guarantees.
However, there are some challenges in mapping the theory of DP to practice.
4.2.5 Challenges in Practice
An ideal differential privacy technique should mitigate the threats and risks of exposing sensitive data while maintaining high data utility.
The privacy requirement always depends on the specific scenario.
When the data controller is a trusted entity, standard differential privacy can be used; if the data controller is not trusted, local differential privacy should be used.
In both cases, different mechanisms prevent malicious data analysts from extracting sensitive information.
Therefore, depending on the use case and its requirements for privacy and applications, an appropriate differential privacy technology setup can be chosen.
No one-size-fits-all mechanism works for all use cases.
The Laplace mechanism can only be used for numerical queries, while the exponential mechanism can handle both numerical and categorical data in the query.
Therefore, the suitability of a mechanism varies according to the use case and data type.
In other words, many differential privacy algorithms are only suitable for specific use cases.
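As a contrast with numeric noise, the sketch below shows the exponential mechanism choosing a categorical answer, here the most common diagnosis. Each candidate is sampled with probability proportional to exp(ε × utility / (2 × sensitivity)). The utility function (the raw count), the sensitivity of 1, the counts themselves, and the function name are assumptions made for illustration.

```python
import math
import random

def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    """Pick one candidate with probability proportional to exp(eps * u / (2 * sens))."""
    candidates = list(utilities)
    best = max(utilities.values())
    # Subtracting the best utility keeps the exponentials in a safe numeric range
    # without changing the relative weights.
    weights = [
        math.exp(epsilon * (utilities[c] - best) / (2.0 * sensitivity))
        for c in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(11)
diagnosis_counts = {"flu": 120, "diabetes": 50, "cancer": 30}  # illustrative data
picks = [exponential_mechanism(diagnosis_counts, 0.1, 1.0, rng) for _ in range(200)]
print({name: picks.count(name) for name in diagnosis_counts})
```

The highest-count category is returned most of the time, but the other categories keep a nonzero chance of being chosen, and that randomness is what provides the privacy.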
The value of ε determines the privacy level.
The smaller the value of ε, the better the privacy, but the accuracy of the results may suffer.
From a privacy perspective, an ε greater than 6 may not be good.
While keeping ε small is indeed a good goal, it is often impossible to achieve given the nuances of use cases.
Furthermore, the choice of ε may vary from application to application, depending on the need for privacy in that scenario.
In general, questions like “what is the appropriate value for ε” and “how much privacy is enough” have no single answer.
Privacy loss is cumulative: with each new query, privacy protection decreases as additional information about the sensitive data is released.
This means that after a certain number of queries, the application may no longer provide privacy protection.
Ideally, for strong privacy guarantees, the privacy loss should be small.
Therefore, to limit the growing privacy loss, a maximum privacy loss, called the privacy budget, can be enforced.
Every query consumes part of this budget and so increases the accumulated privacy loss.
If the queries exhaust the privacy budget, the system can stop answering queries, which halts the differential privacy mechanism.
Therefore, differential privacy techniques may not be suitable for long-running systems because of these privacy and utility concerns.
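Budget enforcement can be sketched as a small accountant that adds up the ε spent by each query and refuses further queries once the total would exceed the cap. The sketch assumes basic sequential composition, in which the ε values of successive queries simply add up; the class names and the numbers are hypothetical.

```python
class PrivacyBudgetExceeded(Exception):
    """Raised when a query would push the total loss past the budget."""

class PrivacyBudget:
    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        # Sequential composition: losses of successive queries are summed.
        if self.spent + epsilon > self.total_epsilon:
            raise PrivacyBudgetExceeded(
                f"spent {self.spent}, requested {epsilon}, cap {self.total_epsilon}"
            )
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
answered = 0
refused = 0
for _ in range(6):
    try:
        budget.charge(0.25)  # each query is assumed to cost epsilon = 0.25
        answered += 1        # a noisy answer would be released here
    except PrivacyBudgetExceeded:
        refused += 1         # the system stops responding

print(answered, refused, budget.spent)
```

With a cap of 1.0 and a cost of 0.25 per query, four queries are answered and the rest are refused, which illustrates why a system that must keep answering indefinitely is hard to reconcile with a fixed budget.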