aboutsummaryrefslogtreecommitdiff
path: root/algorithm.doc
diff options
context:
space:
mode:
authorMark Adler <madler@alumni.caltech.edu>2011-09-09 23:17:02 -0700
committerMark Adler <madler@alumni.caltech.edu>2011-09-09 23:17:02 -0700
commitff11b0a61f7345572ff2e413173d3179486162f2 (patch)
treef3c9e2563c4f0ac6684a0012ad48423d4c6aa798 /algorithm.doc
parente26a448e9673d67dc2866e11a48d24fc352e5f80 (diff)
downloadzlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.gz
zlib-ff11b0a61f7345572ff2e413173d3179486162f2.tar.bz2
zlib-ff11b0a61f7345572ff2e413173d3179486162f2.zip
zlib 1.0.4v1.0.4
Diffstat (limited to 'algorithm.doc')
-rw-r--r--algorithm.doc14
1 files changed, 7 insertions, 7 deletions
diff --git a/algorithm.doc b/algorithm.doc
index 156058a..01902af 100644
--- a/algorithm.doc
+++ b/algorithm.doc
@@ -7,7 +7,7 @@ pointer to the previous string, in the form of a pair (distance,
7length). Distances are limited to 32K bytes, and lengths are limited 7length). Distances are limited to 32K bytes, and lengths are limited
8to 258 bytes. When a string does not occur anywhere in the previous 8to 258 bytes. When a string does not occur anywhere in the previous
932K bytes, it is emitted as a sequence of literal bytes. (In this 932K bytes, it is emitted as a sequence of literal bytes. (In this
10description, 'string' must be taken as an arbitrary sequence of bytes, 10description, `string' must be taken as an arbitrary sequence of bytes,
11and is not restricted to printable characters.) 11and is not restricted to printable characters.)
12 12
13Literals or match lengths are compressed with one Huffman tree, and 13Literals or match lengths are compressed with one Huffman tree, and
@@ -16,7 +16,7 @@ in a compact form at the start of each block. The blocks can have any
16size (except that the compressed data for one block must fit in 16size (except that the compressed data for one block must fit in
17available memory). A block is terminated when deflate() determines that 17available memory). A block is terminated when deflate() determines that
18it would be useful to start another block with fresh trees. (This is 18it would be useful to start another block with fresh trees. (This is
19somewhat similar to compress.) 19somewhat similar to the behavior of LZW-based _compress_.)
20 20
21Duplicated strings are found using a hash table. All input strings of 21Duplicated strings are found using a hash table. All input strings of
22length 3 are inserted in the hash table. A hash index is computed for 22length 3 are inserted in the hash table. A hash index is computed for
@@ -57,7 +57,7 @@ but saves time since there are both fewer insertions and fewer searches.
57 57
582. Decompression algorithm (inflate) 582. Decompression algorithm (inflate)
59 59
60The real question is given a Huffman tree, how to decode fast. The most 60The real question is, given a Huffman tree, how to decode fast. The most
61important realization is that shorter codes are much more common than 61important realization is that shorter codes are much more common than
62longer codes, so pay attention to decoding the short codes fast, and let 62longer codes, so pay attention to decoding the short codes fast, and let
63the long codes take longer to decode. 63the long codes take longer to decode.
@@ -84,7 +84,7 @@ all the data. For inflate, which has 286 possible codes for the
84literal/length tree, the size of the first table is nine bits. Also the 84literal/length tree, the size of the first table is nine bits. Also the
85distance trees have 30 possible values, and the size of the first table is 85distance trees have 30 possible values, and the size of the first table is
86six bits. Note that for each of those cases, the table ended up one bit 86six bits. Note that for each of those cases, the table ended up one bit
87longer than the "average" code length, i.e. the code length of an 87longer than the ``average'' code length, i.e. the code length of an
88approximately flat code which would be a little more than eight bits for 88approximately flat code which would be a little more than eight bits for
89286 symbols and a little less than five bits for 30 symbols. It would be 89286 symbols and a little less than five bits for 30 symbols. It would be
90interesting to see if optimizing the first level table for other 90interesting to see if optimizing the first level table for other
@@ -97,9 +97,9 @@ gzip@prep.ai.mit.edu madler@alumni.caltech.edu
97 97
98References: 98References:
99 99
100[LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data 100[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
101Compression", IEEE Transactions on Information Theory", Vol. 23, No. 3, 101Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
102pp. 337-343. 102pp. 337-343.
103 103
104"DEFLATE Compressed Data Format Specification" available in 104``DEFLATE Compressed Data Format Specification'' available in
105ftp://ds.internic.net/rfc/rfc1951.txt 105ftp://ds.internic.net/rfc/rfc1951.txt