diff options
author | Mark Adler <madler@alumni.caltech.edu> | 2011-09-09 23:17:02 -0700 |
---|---|---|
committer | Mark Adler <madler@alumni.caltech.edu> | 2011-09-09 23:17:02 -0700 |
commit | ff11b0a61f7345572ff2e413173d3179486162f2 (patch) | |
tree | f3c9e2563c4f0ac6684a0012ad48423d4c6aa798 /algorithm.doc | |
parent | e26a448e9673d67dc2866e11a48d24fc352e5f80 (diff) | |
download | zlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.gz zlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.bz2 zlib-ff11b0a61f7345572ff2e413173d3179486162f2.zip |
zlib 1.0.4v1.0.4
Diffstat (limited to 'algorithm.doc')
-rw-r--r-- | algorithm.doc | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/algorithm.doc b/algorithm.doc index 156058a..01902af 100644 --- a/algorithm.doc +++ b/algorithm.doc | |||
@@ -7,7 +7,7 @@ pointer to the previous string, in the form of a pair (distance, | |||
7 | length). Distances are limited to 32K bytes, and lengths are limited | 7 | length). Distances are limited to 32K bytes, and lengths are limited |
8 | to 258 bytes. When a string does not occur anywhere in the previous | 8 | to 258 bytes. When a string does not occur anywhere in the previous |
9 | 32K bytes, it is emitted as a sequence of literal bytes. (In this | 9 | 32K bytes, it is emitted as a sequence of literal bytes. (In this |
10 | description, 'string' must be taken as an arbitrary sequence of bytes, | 10 | description, `string' must be taken as an arbitrary sequence of bytes, |
11 | and is not restricted to printable characters.) | 11 | and is not restricted to printable characters.) |
12 | 12 | ||
13 | Literals or match lengths are compressed with one Huffman tree, and | 13 | Literals or match lengths are compressed with one Huffman tree, and |
@@ -16,7 +16,7 @@ in a compact form at the start of each block. The blocks can have any | |||
16 | size (except that the compressed data for one block must fit in | 16 | size (except that the compressed data for one block must fit in |
17 | available memory). A block is terminated when deflate() determines that | 17 | available memory). A block is terminated when deflate() determines that |
18 | it would be useful to start another block with fresh trees. (This is | 18 | it would be useful to start another block with fresh trees. (This is |
19 | somewhat similar to compress.) | 19 | somewhat similar to the behavior of LZW-based _compress_.) |
20 | 20 | ||
21 | Duplicated strings are found using a hash table. All input strings of | 21 | Duplicated strings are found using a hash table. All input strings of |
22 | length 3 are inserted in the hash table. A hash index is computed for | 22 | length 3 are inserted in the hash table. A hash index is computed for |
@@ -57,7 +57,7 @@ but saves time since there are both fewer insertions and fewer searches. | |||
57 | 57 | ||
58 | 2. Decompression algorithm (inflate) | 58 | 2. Decompression algorithm (inflate) |
59 | 59 | ||
60 | The real question is given a Huffman tree, how to decode fast. The most | 60 | The real question is, given a Huffman tree, how to decode fast. The most |
61 | important realization is that shorter codes are much more common than | 61 | important realization is that shorter codes are much more common than |
62 | longer codes, so pay attention to decoding the short codes fast, and let | 62 | longer codes, so pay attention to decoding the short codes fast, and let |
63 | the long codes take longer to decode. | 63 | the long codes take longer to decode. |
@@ -84,7 +84,7 @@ all the data. For inflate, which has 286 possible codes for the | |||
84 | literal/length tree, the size of the first table is nine bits. Also the | 84 | literal/length tree, the size of the first table is nine bits. Also the |
85 | distance trees have 30 possible values, and the size of the first table is | 85 | distance trees have 30 possible values, and the size of the first table is |
86 | six bits. Note that for each of those cases, the table ended up one bit | 86 | six bits. Note that for each of those cases, the table ended up one bit |
87 | longer than the "average" code length, i.e. the code length of an | 87 | longer than the ``average'' code length, i.e. the code length of an |
88 | approximately flat code which would be a little more than eight bits for | 88 | approximately flat code which would be a little more than eight bits for |
89 | 286 symbols and a little less than five bits for 30 symbols. It would be | 89 | 286 symbols and a little less than five bits for 30 symbols. It would be |
90 | interesting to see if optimizing the first level table for other | 90 | interesting to see if optimizing the first level table for other |
@@ -97,9 +97,9 @@ gzip@prep.ai.mit.edu madler@alumni.caltech.edu | |||
97 | 97 | ||
98 | References: | 98 | References: |
99 | 99 | ||
100 | [LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data | 100 | [LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data |
101 | Compression", IEEE Transactions on Information Theory", Vol. 23, No. 3, | 101 | Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3, |
102 | pp. 337-343. | 102 | pp. 337-343. |
103 | 103 | ||
104 | "DEFLATE Compressed Data Format Specification" available in | 104 | ``DEFLATE Compressed Data Format Specification'' available in |
105 | ftp://ds.internic.net/rfc/rfc1951.txt | 105 | ftp://ds.internic.net/rfc/rfc1951.txt |