zlib 1.2.0.5v1.2.0.5

author: Mark Adler <madler@alumni.caltech.edu> 2011-09-09 23:22:37 -0700
committer: Mark Adler <madler@alumni.caltech.edu> 2011-09-09 23:22:37 -0700
commit: 4b5a43a219d51066c01ff2ab86af18b967f2d0dd (patch)
tree: 4dcaf0cd18751d04cf638a9a6ec521990d4f2e90 /algorithm.txt
parent: 086e982175da84b3db958191031380794315f95f (diff)
download: zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.tar.gz
zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.tar.bz2
zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.zip
1 files changed, 36 insertions, 36 deletions
diff --git a/algorithm.txt b/algorithm.txt
index a0497654..b022dde3 100644
--- a/algorithm.txt
+++ b/algorithm.txt
@@ -91,40 +91,40 @@ for 30 symbols.
 2.2 More details on the inflate table lookup
-Ok, you want to know what this cleverly obfuscated inflate tree actually  
+Ok, you want to know what this cleverly obfuscated inflate tree actually
-looks like.  You are correct that it's not a Huffman tree.  It is simply a  
+looks like.  You are correct that it's not a Huffman tree.  It is simply a
-lookup table for the first, let's say, nine bits of a Huffman symbol.  The  
+lookup table for the first, let's say, nine bits of a Huffman symbol.  The
-symbol could be as short as one bit or as long as 15 bits.  If a particular  
+symbol could be as short as one bit or as long as 15 bits.  If a particular
 symbol is shorter than nine bits, then that symbol's translation is duplicated
-in all those entries that start with that symbol's bits.  For example, if the  
+in all those entries that start with that symbol's bits.  For example, if the
-symbol is four bits, then it's duplicated 32 times in a nine-bit table.  If a  
+symbol is four bits, then it's duplicated 32 times in a nine-bit table.  If a
 symbol is nine bits long, it appears in the table once.
-If the symbol is longer than nine bits, then that entry in the table points  
+If the symbol is longer than nine bits, then that entry in the table points
-to another similar table for the remaining bits.  Again, there are duplicated  
+to another similar table for the remaining bits.  Again, there are duplicated
 entries as needed.  The idea is that most of the time the symbol will be short
-and there will only be one table look up.  (That's whole idea behind data  
+and there will only be one table look up.  (That's whole idea behind data
-compression in the first place.)  For the less frequent long symbols, there  
+compression in the first place.)  For the less frequent long symbols, there
-will be two lookups.  If you had a compression method with really long  
+will be two lookups.  If you had a compression method with really long
-symbols, you could have as many levels of lookups as is efficient.  For  
+symbols, you could have as many levels of lookups as is efficient.  For
 inflate, two is enough.
-So a table entry either points to another table (in which case nine bits in  
+So a table entry either points to another table (in which case nine bits in
-the above example are gobbled), or it contains the translation for the symbol  
+the above example are gobbled), or it contains the translation for the symbol
-and the number of bits to gobble.  Then you start again with the next  
+and the number of bits to gobble.  Then you start again with the next
 ungobbled bit.
-You may wonder: why not just have one lookup table for how ever many bits the  
+You may wonder: why not just have one lookup table for how ever many bits the
-longest symbol is?  The reason is that if you do that, you end up spending  
+longest symbol is?  The reason is that if you do that, you end up spending
-more time filling in duplicate symbol entries than you do actually decoding.   
+more time filling in duplicate symbol entries than you do actually decoding.
-At least for deflate's output that generates new trees every several 10's of  
+At least for deflate's output that generates new trees every several 10's of
-kbytes.  You can imagine that filling in a 2^15 entry table for a 15-bit code  
+kbytes.  You can imagine that filling in a 2^15 entry table for a 15-bit code
-would take too long if you're only decoding several thousand symbols.  At the  
+would take too long if you're only decoding several thousand symbols.  At the
 other extreme, you could make a new table for every bit in the code.  In fact,
-that's essentially a Huffman tree.  But then you spend two much time  
+that's essentially a Huffman tree.  But then you spend two much time
 traversing the tree while decoding, even for short symbols.
-So the number of bits for the first lookup table is a trade of the time to  
+So the number of bits for the first lookup table is a trade of the time to
 fill out the table vs. the time spent looking at the second level and above of
 the table.
@@ -154,7 +154,7 @@ Let's make the first table three bits long (eight entries):
 110: -> table X (gobble 3 bits)
 111: -> table Y (gobble 3 bits)
-Each entry is what the bits decode as and how many bits that is, i.e. how  
+Each entry is what the bits decode as and how many bits that is, i.e. how
 many bits to gobble.  Or the entry points to another table, with the number of
 bits to gobble implicit in the size of the table.
@@ -166,7 +166,7 @@ long:
 10: D,2
 11: E,2
-Table Y is three bits long since the longest code starting with 111 is six  
+Table Y is three bits long since the longest code starting with 111 is six
 bits long:
 000: F,2
@@ -178,20 +178,20 @@ bits long:
 110: I,3
 111: J,3
-So what we have here are three tables with a total of 20 entries that had to  
+So what we have here are three tables with a total of 20 entries that had to
-be constructed.  That's compared to 64 entries for a single table.  Or  
+be constructed.  That's compared to 64 entries for a single table.  Or
-compared to 16 entries for a Huffman tree (six two entry tables and one four  
+compared to 16 entries for a Huffman tree (six two entry tables and one four
-entry table).  Assuming that the code ideally represents the probability of  
+entry table).  Assuming that the code ideally represents the probability of
 the symbols, it takes on the average 1.25 lookups per symbol.  That's compared
-to one lookup for the single table, or 1.66 lookups per symbol for the  
+to one lookup for the single table, or 1.66 lookups per symbol for the
 Huffman tree.
-There, I think that gives you a picture of what's going on.  For inflate, the  
+There, I think that gives you a picture of what's going on.  For inflate, the
-meaning of a particular symbol is often more than just a letter.  It can be a  
+meaning of a particular symbol is often more than just a letter.  It can be a
-byte (a "literal"), or it can be either a length or a distance which  
+byte (a "literal"), or it can be either a length or a distance which
-indicates a base value and a number of bits to fetch after the code that is  
+indicates a base value and a number of bits to fetch after the code that is
-added to the base value.  Or it might be the special end-of-block code.  The  
+added to the base value.  Or it might be the special end-of-block code.  The
-data structures created in inftrees.c try to encode all that information  
+data structures created in inftrees.c try to encode all that information
 compactly in the tables.
author	Mark Adler <madler@alumni.caltech.edu>	2011-09-09 23:22:37 -0700
committer	Mark Adler <madler@alumni.caltech.edu>	2011-09-09 23:22:37 -0700
commit	4b5a43a219d51066c01ff2ab86af18b967f2d0dd (patch)
tree	4dcaf0cd18751d04cf638a9a6ec521990d4f2e90 /algorithm.txt
parent	086e982175da84b3db958191031380794315f95f (diff)
download	zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.tar.gz zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.tar.bz2 zlib-4b5a43a219d51066c01ff2ab86af18b967f2d0dd.zip

diff --git a/algorithm.txt b/algorithm.txt index a0497654..b022dde3 100644 --- a/algorithm.txt +++ b/algorithm.txt
@@ -91,40 +91,40 @@ for 30 symbols.
91		91
92	2.2 More details on the inflate table lookup	92	2.2 More details on the inflate table lookup
93		93
94	Ok, you want to know what this cleverly obfuscated inflate tree actually	94	Ok, you want to know what this cleverly obfuscated inflate tree actually
95	looks like. You are correct that it's not a Huffman tree. It is simply a	95	looks like. You are correct that it's not a Huffman tree. It is simply a
96	lookup table for the first, let's say, nine bits of a Huffman symbol. The	96	lookup table for the first, let's say, nine bits of a Huffman symbol. The
97	symbol could be as short as one bit or as long as 15 bits. If a particular	97	symbol could be as short as one bit or as long as 15 bits. If a particular
98	symbol is shorter than nine bits, then that symbol's translation is duplicated	98	symbol is shorter than nine bits, then that symbol's translation is duplicated
99	in all those entries that start with that symbol's bits. For example, if the	99	in all those entries that start with that symbol's bits. For example, if the
100	symbol is four bits, then it's duplicated 32 times in a nine-bit table. If a	100	symbol is four bits, then it's duplicated 32 times in a nine-bit table. If a
101	symbol is nine bits long, it appears in the table once.	101	symbol is nine bits long, it appears in the table once.
102		102
103	If the symbol is longer than nine bits, then that entry in the table points	103	If the symbol is longer than nine bits, then that entry in the table points
104	to another similar table for the remaining bits. Again, there are duplicated	104	to another similar table for the remaining bits. Again, there are duplicated
105	entries as needed. The idea is that most of the time the symbol will be short	105	entries as needed. The idea is that most of the time the symbol will be short
106	and there will only be one table look up. (That's whole idea behind data	106	and there will only be one table look up. (That's whole idea behind data
107	compression in the first place.) For the less frequent long symbols, there	107	compression in the first place.) For the less frequent long symbols, there
108	will be two lookups. If you had a compression method with really long	108	will be two lookups. If you had a compression method with really long
109	symbols, you could have as many levels of lookups as is efficient. For	109	symbols, you could have as many levels of lookups as is efficient. For
110	inflate, two is enough.	110	inflate, two is enough.
111		111
112	So a table entry either points to another table (in which case nine bits in	112	So a table entry either points to another table (in which case nine bits in
113	the above example are gobbled), or it contains the translation for the symbol	113	the above example are gobbled), or it contains the translation for the symbol
114	and the number of bits to gobble. Then you start again with the next	114	and the number of bits to gobble. Then you start again with the next
115	ungobbled bit.	115	ungobbled bit.
116		116
117	You may wonder: why not just have one lookup table for how ever many bits the	117	You may wonder: why not just have one lookup table for how ever many bits the
118	longest symbol is? The reason is that if you do that, you end up spending	118	longest symbol is? The reason is that if you do that, you end up spending
119	more time filling in duplicate symbol entries than you do actually decoding.	119	more time filling in duplicate symbol entries than you do actually decoding.
120	At least for deflate's output that generates new trees every several 10's of	120	At least for deflate's output that generates new trees every several 10's of
121	kbytes. You can imagine that filling in a 2^15 entry table for a 15-bit code	121	kbytes. You can imagine that filling in a 2^15 entry table for a 15-bit code
122	would take too long if you're only decoding several thousand symbols. At the	122	would take too long if you're only decoding several thousand symbols. At the
123	other extreme, you could make a new table for every bit in the code. In fact,	123	other extreme, you could make a new table for every bit in the code. In fact,
124	that's essentially a Huffman tree. But then you spend two much time	124	that's essentially a Huffman tree. But then you spend two much time
125	traversing the tree while decoding, even for short symbols.	125	traversing the tree while decoding, even for short symbols.
126		126
127	So the number of bits for the first lookup table is a trade of the time to	127	So the number of bits for the first lookup table is a trade of the time to
128	fill out the table vs. the time spent looking at the second level and above of	128	fill out the table vs. the time spent looking at the second level and above of
129	the table.	129	the table.
130		130
@@ -154,7 +154,7 @@ Let's make the first table three bits long (eight entries):
154	110: -> table X (gobble 3 bits)	154	110: -> table X (gobble 3 bits)
155	111: -> table Y (gobble 3 bits)	155	111: -> table Y (gobble 3 bits)
156		156
157	Each entry is what the bits decode as and how many bits that is, i.e. how	157	Each entry is what the bits decode as and how many bits that is, i.e. how
158	many bits to gobble. Or the entry points to another table, with the number of	158	many bits to gobble. Or the entry points to another table, with the number of
159	bits to gobble implicit in the size of the table.	159	bits to gobble implicit in the size of the table.
160		160
@@ -166,7 +166,7 @@ long:
166	10: D,2	166	10: D,2
167	11: E,2	167	11: E,2
168		168
169	Table Y is three bits long since the longest code starting with 111 is six	169	Table Y is three bits long since the longest code starting with 111 is six
170	bits long:	170	bits long:
171		171
172	000: F,2	172	000: F,2
@@ -178,20 +178,20 @@ bits long:
178	110: I,3	178	110: I,3
179	111: J,3	179	111: J,3
180		180
181	So what we have here are three tables with a total of 20 entries that had to	181	So what we have here are three tables with a total of 20 entries that had to
182	be constructed. That's compared to 64 entries for a single table. Or	182	be constructed. That's compared to 64 entries for a single table. Or
183	compared to 16 entries for a Huffman tree (six two entry tables and one four	183	compared to 16 entries for a Huffman tree (six two entry tables and one four
184	entry table). Assuming that the code ideally represents the probability of	184	entry table). Assuming that the code ideally represents the probability of
185	the symbols, it takes on the average 1.25 lookups per symbol. That's compared	185	the symbols, it takes on the average 1.25 lookups per symbol. That's compared
186	to one lookup for the single table, or 1.66 lookups per symbol for the	186	to one lookup for the single table, or 1.66 lookups per symbol for the
187	Huffman tree.	187	Huffman tree.
188		188
189	There, I think that gives you a picture of what's going on. For inflate, the	189	There, I think that gives you a picture of what's going on. For inflate, the
190	meaning of a particular symbol is often more than just a letter. It can be a	190	meaning of a particular symbol is often more than just a letter. It can be a
191	byte (a "literal"), or it can be either a length or a distance which	191	byte (a "literal"), or it can be either a length or a distance which
192	indicates a base value and a number of bits to fetch after the code that is	192	indicates a base value and a number of bits to fetch after the code that is
193	added to the base value. Or it might be the special end-of-block code. The	193	added to the base value. Or it might be the special end-of-block code. The
194	data structures created in inftrees.c try to encode all that information	194	data structures created in inftrees.c try to encode all that information
195	compactly in the tables.	195	compactly in the tables.
196		196
197		197