Techniques that are based on deterministic encryption are usually susceptible to frequency attacks. The party receiving the “so thought” secure encrypted data can perform a frequency analysis on selected fields in order to uncover the original data. For example, if the attacker has access to the demographic or census data of a particular population, he could calculate the frequency of a selected field and try to map it to the encrypted data frequencies to deduce a relationship between the two.
To illustrate the feasibility of a frequency attack, consider a town where “Smith” is the most popular last name. If a database administrator owns a database with encrypted last names of patients of that town’s hospital, he can …show more content…
Some of the techniques that rely on phonetic encoding include Karakasidis and Verykios [68][72]. Exact matching only
Real-world data sets are dirty: two records in different databases that are referring to the same individual could contain typing variations in different fields. For example, an individual named John could also be named Jon (missing the h) in another database. If only exact matching is used to perform record linkage, the process will produce additional false negative results.
The techniques that rely on exact matching are: Berman [10], O’Keefe et al. [86], Dusserre et al. [40], Bouzelat et al. [13], Quantin et al. [97][98][99], El Emam et al. [41], and Schadow et al. [107]. Reference strings
Protocols relying on embedding strings as vectors based on distances from a set of reference strings or using other methodologies based on reference strings do not have a consistent performance or accuracy measurement. The performance and accuracy of the protocol will largely depend on the chosen reference strings. It is rather risky for an organization to attempt to use such protocols unless they have been tested and proven to work with their datasets (which might be difficult if privacy needs to be maintained at all