Version 1.10 12/01/02 17-1
17
Protocols - Compression Algorithm
Specification
In EFI firmware storage, binary codes/data are often compressed to save storage space. These
compressed codes/data are extracted into memory for execution at boot time. This demands an
efficient lossless compression/decompression algorithm. The compressor must produce small
compressed images, and the decompressor must operate fast enough to avoid delays at boot time.
This chapter describes in detail the EFI compression/decompression algorithm, as well as the EFI
Decompress Protocol. The EFI Decompress Protocol provides a standard decompression interface
for use at boot time.
17.1 Algorithm Overview
In this chapter the term “character” denotes a single byte and the term “string” denotes a series of
concatenated characters.
The compression/decompression algorithm used in EFI firmware storage is a combination of the
LZ77 algorithm and Huffman Coding. The LZ77 algorithm replaces a repeated string with a
pointer to the previous occurrence of the string. Huffman Coding encodes symbols in a way that
the more frequently a symbol appears in a text, the shorter the code that is assigned to it.
The compression process contains two steps:
• The first step is to find repeated strings (using LZ77 algorithm) and produce intermediate data.
Beginning with the first character, the compressor scans the source data and determines if the
characters starting at the current position can form a string previously appearing in the text. If
a long enough matching string is found, the compressor will output a pointer to the string. If
the pointer occupies more space than the string itself, the compressor will output the original
character at the current position in the source data. Then the compressor advances to the next
position and repeats the process. To speed up the compression process, the compressor
dynamically maintains a String Info Log to record the positions and lengths of strings
encountered, so that string comparisons are performed quickly by looking up the String
Info Log.
Because a compressor cannot have unlimited resources, as the compression continues the
compressor removes “old” string information. This prevents the String Info Log from
becoming too large. As a result, the algorithm can only look up repeated strings within the
range of a fixed-sized “sliding window” behind the current position.
In this way, a stream of intermediate data is produced which contains two types of symbols:
the Original Characters (to be preserved in the decompressed data), and the Pointers
(representing a previous string). A Pointer consists of two elements: the String Position and
the String Length, representing the location and the length of the target string, respectively.