Intel Extensible Firmware Interface Network Router User Manual


  Open as PDF
of 1084
 
Extensible Firmware Interface Specification
17-2 12/01/02 Version 1.10
To improve the compression ratio further, Huffman Coding is utilized as the second step.
The intermediate data (consisting of original characters and pointers) is divided into Blocks so
that the compressor can perform Huffman Coding on a Block immediately after it is generated;
eliminating the need for a second pass from the beginning after the intermediate data has been
generated. Also, since symbol frequency distribution may differ in different parts of the
intermediate data, Huffman Coding can be optimized for each specific Block. The compressor
determines Block Size for each Block according to the specifications defined in section 17.2,
Data Format.
In each Block, two symbol sets are defined for Huffman Coding. The Char&Len Set consists
of the Original Characters plus the String Lengths and the Position Set consists of String
Positions (Note that the two elements of a Pointer belong to separate symbol sets). The
Huffman Coding schemes applied on these two symbol sets are independent.
The algorithm uses canonical Huffman Coding so a Huffman tree can be represented as an
array of code lengths in the order of the symbols in the symbol set. This code length array
represents the Huffman Coding scheme for the symbol set. Both the Char&Len Set code length
array and the Position Set code length array appear in the Block Header.
Huffman coding is used on the code length array of the Char&Len Set to define a third symbol
set. The Extra Set is defined based on the code length values in the Char&Len Set code length
array. The code length array for the Huffman Coding of Extra Set also appears in the Block
Header together with the other two code length arrays. For exact format of the Block Header,
see section 17.2.3.1, Block Header.
The decompression process is straightforward given that the compression process is known. The
decompressor scans the compressed data and decodes the symbols one by one, according to the
Huffman code mapping tables generated from code length arrays. Along the process, if it
encounters an original character, it outputs it; if it encounters a pointer, it looks it up in the already
decompressed data and outputs the associated string.