Essay on Delta Encoding
Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta compression, particularly where archival histories of changes are required (e.g., in software projects).
The differences are recorded in discrete files called "deltas" or "diffs", after the Unix file comparison utility, diff. Because changes are often small – for example, changing a few words in a large document, or changing a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded …show more content…
Δ(v1, v2) = (v1 \ v2) U (v2 \ v1), where v1 and v2 represent two successive versions.
A directed delta, also called a change, is a sequence of (elementary) change operations which, when applied to one version v1, yields another version v2 (note the correspondence to transaction logs in databases).
A variation of delta encoding which encodes differences between the prefixes or suffixes of strings is called incremental encoding. It is particularly effective for sorted lists with small differences between strings, such as a list of words from a dictionary.
The nature of the data to be encoded influences the effectiveness of a particular compression algorithm. For example, in sending updates to the Google Chrome browser, a specialized algorithm is used based on knowledge of the archive and executable format.
Delta encoding performs best when data has small or constant variation; for an unsorted data set, there may be little to no compression possible with this method.
In delta encoded transmission over a network where only a single copy of the file is available at each end of the communication channel, special error control codes are used to detect which parts of the file have changed since its previous version. For example, rsync uses a rolling checksum algorithm based on Mark Adler's adler-32 checksum.
Sample C code
The following C code performs a