Protocols — Compression Algorithm Specification
Version 1.10 12/01/02 17-7
17.2.3.2 Block Body
The Block Body is a mixture of Original Characters and Pointers. Each Pointer has two
elements: a String Length followed by a String Position. All these data units are tightly packed
together.
[Figure: Block Body layout. Orig Char units are interleaved with Pointer units; each Pointer is a StrLen followed by a StrPos.]
Figure 17-4. Block Body
The Original Characters, String Lengths and String Positions are all Huffman coded using the
Huffman trees presented in the Block Header, with some additional variations. The exact format is
described below:
An Original Character is a byte in the source data. A String Length is a value from 3 to 256
inclusive (this range should be ensured by the compressor). Calculating “(String Length – 3) |
0x100” yields a value set that ranges from 256 to 509. Combining this value set with the value
set of Original Characters (0 to 255) gives the Char&Len Set (ranging from 0 to 509), which is
used for Huffman Coding.
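The mapping into the Char&Len Set can be sketched as follows. This is a minimal Python illustration and not part of the specification: the function and constant names are hypothetical, and it assumes String Lengths of 3 through 256 inclusive, since that is the range for which the formula produces exactly 256 to 509.

```python
# Hypothetical names; assumed String Length bounds that yield symbols 256..509.
MIN_LEN = 3
MAX_LEN = 256


def literal_symbol(byte):
    """An Original Character maps to itself (0..255)."""
    if not 0 <= byte <= 255:
        raise ValueError("Original Character must be a byte")
    return byte


def length_symbol(length):
    """A String Length maps to (length - 3) | 0x100, i.e. 256..509."""
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError("String Length out of range")
    return (length - 3) | 0x100


def classify_symbol(symbol):
    """Reverse the mapping: return ('char', byte) or ('len', length)."""
    if 0 <= symbol <= 255:
        return ("char", symbol)
    if 256 <= symbol <= 509:
        # Clear the 0x100 flag bit, then undo the subtraction of 3.
        return ("len", (symbol & 0xFF) + 3)
    raise ValueError("symbol outside the Char&Len Set")
```

Because the two ranges do not overlap, a decoder can tell from a single decoded symbol whether it is reading an Original Character or the start of a Pointer.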
A String Position is a value that indicates the distance between the current position and the target
string. The String Position value is defined as “Current Position – Starting Position of the target
string – 1.” The String Position value ranges from 0 to 8190 (so 8192 is the “sliding window”
size, and this range should be ensured by the compressor). The lengths of the String Position
values (in binary form) form a value set ranging from 0 to 13 (the value 0 is assumed to have a
length of 0). This value set is the Position Set for Huffman Coding. The full representation of a
String Position value is composed of two consecutive parts: the Huffman code for the value
length, followed by the low “length – 1” bits of the actual String Position value (the highest bit is
omitted because it is always “1”). For example, String Position value 18 is 10010 in binary and
has length 5, so it is represented as the Huffman code for “5” followed by “0010.” If the value
length is 0 or 1, then no value is appended to the Huffman code. This representation favors small
String Position values, which is a hint for compressor design.
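The two-part String Position representation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the extra bits are shown as a text string of "0" and "1" rather than packed into a bitstream, and the Huffman coding of the length symbol itself is left out.

```python
MAX_POSITION = 8190  # largest String Position value allowed by the text


def string_position(current_pos, target_start):
    """Current Position - Starting Position of the target string - 1."""
    pos = current_pos - target_start - 1
    if not 0 <= pos <= MAX_POSITION:
        raise ValueError("String Position out of range")
    return pos


def split_position(pos):
    """Return (length symbol, extra bits) for a String Position value."""
    nbits = pos.bit_length()  # 0 for value 0, up to 13 for 8190
    if nbits <= 1:
        return nbits, ""      # values 0 and 1 carry no extra bits
    # Drop the leading "1"; only the remaining nbits - 1 bits are stored.
    return nbits, format(pos, "b")[1:]


def join_position(nbits, extra):
    """Rebuild the String Position value from the two parts."""
    if nbits <= 1:
        return nbits          # length 0 means value 0, length 1 means value 1
    return (1 << (nbits - 1)) | int(extra, 2)
```

For instance, `split_position(18)` gives `(5, "0010")`, matching the example in the text, and `join_position(5, "0010")` gives back 18. Small distances need both a small length symbol and few extra bits, which is why short back-references are cheaper to encode.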