diff options
| author | Mark Adler <madler@alumni.caltech.edu> | 2011-09-09 23:17:02 -0700 |
|---|---|---|
| committer | Mark Adler <madler@alumni.caltech.edu> | 2011-09-09 23:17:02 -0700 |
| commit | ff11b0a61f7345572ff2e413173d3179486162f2 (patch) | |
| tree | f3c9e2563c4f0ac6684a0012ad48423d4c6aa798 /algorithm.doc | |
| parent | e26a448e9673d67dc2866e11a48d24fc352e5f80 (diff) | |
| download | zlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.gz zlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.bz2 zlib-ff11b0a61f7345572ff2e413173d3179486162f2.zip | |
zlib 1.0.4v1.0.4
Diffstat (limited to '')
| -rw-r--r-- | algorithm.doc | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/algorithm.doc b/algorithm.doc index 156058a..01902af 100644 --- a/algorithm.doc +++ b/algorithm.doc | |||
| @@ -7,7 +7,7 @@ pointer to the previous string, in the form of a pair (distance, | |||
| 7 | length). Distances are limited to 32K bytes, and lengths are limited | 7 | length). Distances are limited to 32K bytes, and lengths are limited |
| 8 | to 258 bytes. When a string does not occur anywhere in the previous | 8 | to 258 bytes. When a string does not occur anywhere in the previous |
| 9 | 32K bytes, it is emitted as a sequence of literal bytes. (In this | 9 | 32K bytes, it is emitted as a sequence of literal bytes. (In this |
| 10 | description, 'string' must be taken as an arbitrary sequence of bytes, | 10 | description, `string' must be taken as an arbitrary sequence of bytes, |
| 11 | and is not restricted to printable characters.) | 11 | and is not restricted to printable characters.) |
| 12 | 12 | ||
| 13 | Literals or match lengths are compressed with one Huffman tree, and | 13 | Literals or match lengths are compressed with one Huffman tree, and |
| @@ -16,7 +16,7 @@ in a compact form at the start of each block. The blocks can have any | |||
| 16 | size (except that the compressed data for one block must fit in | 16 | size (except that the compressed data for one block must fit in |
| 17 | available memory). A block is terminated when deflate() determines that | 17 | available memory). A block is terminated when deflate() determines that |
| 18 | it would be useful to start another block with fresh trees. (This is | 18 | it would be useful to start another block with fresh trees. (This is |
| 19 | somewhat similar to compress.) | 19 | somewhat similar to the behavior of LZW-based _compress_.) |
| 20 | 20 | ||
| 21 | Duplicated strings are found using a hash table. All input strings of | 21 | Duplicated strings are found using a hash table. All input strings of |
| 22 | length 3 are inserted in the hash table. A hash index is computed for | 22 | length 3 are inserted in the hash table. A hash index is computed for |
| @@ -57,7 +57,7 @@ but saves time since there are both fewer insertions and fewer searches. | |||
| 57 | 57 | ||
| 58 | 2. Decompression algorithm (inflate) | 58 | 2. Decompression algorithm (inflate) |
| 59 | 59 | ||
| 60 | The real question is given a Huffman tree, how to decode fast. The most | 60 | The real question is, given a Huffman tree, how to decode fast. The most |
| 61 | important realization is that shorter codes are much more common than | 61 | important realization is that shorter codes are much more common than |
| 62 | longer codes, so pay attention to decoding the short codes fast, and let | 62 | longer codes, so pay attention to decoding the short codes fast, and let |
| 63 | the long codes take longer to decode. | 63 | the long codes take longer to decode. |
| @@ -84,7 +84,7 @@ all the data. For inflate, which has 286 possible codes for the | |||
| 84 | literal/length tree, the size of the first table is nine bits. Also the | 84 | literal/length tree, the size of the first table is nine bits. Also the |
| 85 | distance trees have 30 possible values, and the size of the first table is | 85 | distance trees have 30 possible values, and the size of the first table is |
| 86 | six bits. Note that for each of those cases, the table ended up one bit | 86 | six bits. Note that for each of those cases, the table ended up one bit |
| 87 | longer than the "average" code length, i.e. the code length of an | 87 | longer than the ``average'' code length, i.e. the code length of an |
| 88 | approximately flat code which would be a little more than eight bits for | 88 | approximately flat code which would be a little more than eight bits for |
| 89 | 286 symbols and a little less than five bits for 30 symbols. It would be | 89 | 286 symbols and a little less than five bits for 30 symbols. It would be |
| 90 | interesting to see if optimizing the first level table for other | 90 | interesting to see if optimizing the first level table for other |
| @@ -97,9 +97,9 @@ gzip@prep.ai.mit.edu madler@alumni.caltech.edu | |||
| 97 | 97 | ||
| 98 | References: | 98 | References: |
| 99 | 99 | ||
| 100 | [LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data | 100 | [LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data |
| 101 | Compression", IEEE Transactions on Information Theory", Vol. 23, No. 3, | 101 | Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3, |
| 102 | pp. 337-343. | 102 | pp. 337-343. |
| 103 | 103 | ||
| 104 | "DEFLATE Compressed Data Format Specification" available in | 104 | ``DEFLATE Compressed Data Format Specification'' available in |
| 105 | ftp://ds.internic.net/rfc/rfc1951.txt | 105 | ftp://ds.internic.net/rfc/rfc1951.txt |
