Delta Coding
This paper presents the Delta Coding, a simple encoding scheme
to represent arbitrary data using a limited alphabet of symbols.
PresentationThe main principle behind the Delta Coding is that arbitrary data are represented as a succession of bytes. These bytes can be seen as a list of numbers. (b1, b2 ... bn) Another way to represent these numbers is to store the difference of each pair of numbers. Each byte is then repesented by its difference with the previous byte. (d1, d2 ... dn) di = bi - bi-1 The numbers of this new list are called the deltas. The Delta Coding therefore is the operation that transforms a stream of bytes to a stream of deltas, hence its name. AlgorithmsThe encoding and decoding algorithms are very similar and share the same basic structure. Both read the input stream byte after byte, only needing to keep in memory the value of the previous byte in order to compute the new delta. Encoding AlgorithmThe encoding algorithm trivially consists in applying the definition to each byte. ∀i di = bi - bi-1 Though not necessary, it is more natural to calculate the bytes following the input order, thus reducing the operations to a read, a substraction and a write. Here is an implementation of the ASCII encoder (see below for the description of the format) using the Perl language.
#!/usr/bin/perl
# Delta Coding Perl Encoder
# Copyright (C)2002 Sebastien Aperghis-Tramoni
use strict;
use Fcntl;
die "usage: denc file" unless @ARGV;
my $file = shift;
open(FILE, $file) or die "can't open '$file': $! ";
my($buf,$prevbyte) = (0,0);
while(sysread(FILE, $buf, 1024)) {
my @data = unpack("C*", $buf);
for my $byte (@data) {
my $delta = $byte - $prevbyte;
my $tok = '';
if($delta == 0) { $tok = "=" }
elsif($delta > 0) { $tok = "+$delta" }
elsif($delta < 0) { $tok = $delta }
print $tok;
$prevbyte = $byte;
}
print "\n"
}
Decoding AlgorithmThe decoding algorithm is trivially obtained by reversing the encoding algorithm. ∀i bi = bi-1 - di In this case however, the decoder must follow the input order because each byte is decoded using the previous decoded byte. For the first byte, the value of bi is zero. Here is an implementation of the ASCII decoder using the Perl language.
#!/usr/bin/perl -W
# Delta Coding Perl Decoder
# Copyright (C)2002 Sebastien Aperghis-Tramoni
use strict;
die "usage: ddec file" unless @ARGV;
my $file = shift;
open(FILE, $file) or die "cannot read '$file': $! ";
my $prev = 0;
while(<FILE>) {
my @data = ();
while(s/^([=+-])(\d*)//o) {
my $byte;
if($1 eq "=") { $byte = $prev }
elsif($1 eq "+") { $byte = $prev + $2 }
elsif($1 eq "-") { $byte = $prev - $2 }
push @data, $byte;
$prev = $byte;
}
print pack "C*", @data;
}
ASCII FormatDelta-encoded data can be stored as a file that uses a subset of the ASCII characters. GrammarThe Delta Coding can be represented in ASCII format using the following BNF grammar.
Decimal notation is assumed in this representation, but one can use another base in order to use more or less characters in the grammar. Examples... Binary FormatIn order to avoid using ASCII characters (which cost 8 bits per character), we introduce the binary format, which is a tokenization of the ASCII representation. We reuse the grammar introduced in the ASCII format. With decimal numbers, the ASCII format has an alphabet of 13 characters (plus, minus, equal and the ten digits). The lowest power of two greater than 13 is 24 = 16, which means that the characters can be encoded using 4 bits; but this leaves 3 unused characters. We can use a base 13 notation in order to make the format more efficient. Sébastien Aperghis-Tramoni
2002.09.12
This document validates as XHTML 1.0 Strict and CSS.
Copyright ©2002 Sébastien Aperghis-Tramoni, "Maddingue"
|