author | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
---|---|---|
committer | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
commit | 977101ad5f833f5c0a574bfeea408e5301a6b052 (patch) | |
tree | fc1e8fed202869c116cbf6b8c362456042494a0a /bzip2.1.preformatted | |
parent | 1eb67a9d8f7f05ae310bc9ef297d176f3a3f8a37 (diff) | |
bzip2-0.9.0c
Diffstat (limited to 'bzip2.1.preformatted')
-rw-r--r-- | bzip2.1.preformatted | 318 |
1 files changed, 158 insertions, 160 deletions
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted index 5206e05..8c4fab1 100644 --- a/bzip2.1.preformatted +++ b/bzip2.1.preformatted | |||
@@ -5,18 +5,20 @@ bzip2(1) bzip2(1) | |||
5 | 5 | ||
6 | 6 | ||
7 | NNAAMMEE | 7 | NNAAMMEE |
8 | bzip2, bunzip2 - a block-sorting file compressor, v0.1 | 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 |
9 | bzcat - decompresses files to stdout | ||
9 | bzip2recover - recovers data from damaged bzip2 files | 10 | bzip2recover - recovers data from damaged bzip2 files |
10 | 11 | ||
11 | 12 | ||
12 | SSYYNNOOPPSSIISS | 13 | SSYYNNOOPPSSIISS |
13 | bbzziipp22 [ --ccddffkkssttvvVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] | 14 | bbzziipp22 [ --ccddffkkssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] |
14 | bbuunnzziipp22 [ --kkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] | 15 | bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] |
16 | bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ] | ||
15 | bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e | 17 | bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e |
16 | 18 | ||
17 | 19 | ||
18 | DDEESSCCRRIIPPTTIIOONN | 20 | DDEESSCCRRIIPPTTIIOONN |
19 | _B_z_i_p_2 compresses files using the Burrows-Wheeler block- | 21 | _b_z_i_p_2 compresses files using the Burrows-Wheeler block- |
20 | sorting text compression algorithm, and Huffman coding. | 22 | sorting text compression algorithm, and Huffman coding. |
21 | Compression is generally considerably better than that | 23 | Compression is generally considerably better than that |
22 | achieved by more conventional LZ77/LZ78-based compressors, | 24 | achieved by more conventional LZ77/LZ78-based compressors, |
@@ -26,7 +28,7 @@ DDEESSCCRRIIPPTTIIOONN | |||
26 | The command-line options are deliberately very similar to | 28 | The command-line options are deliberately very similar to |
27 | those of _G_N_U _G_z_i_p_, but they are not identical. | 29 | those of _G_N_U _G_z_i_p_, but they are not identical. |
28 | 30 | ||
29 | _B_z_i_p_2 expects a list of file names to accompany the com- | 31 | _b_z_i_p_2 expects a list of file names to accompany the com- |
30 | mand-line flags. Each file is replaced by a compressed | 32 | mand-line flags. Each file is replaced by a compressed |
31 | version of itself, with the name "original_name.bz2". | 33 | version of itself, with the name "original_name.bz2". |
32 | Each compressed file has the same modification date and | 34 | Each compressed file has the same modification date and |
@@ -38,8 +40,8 @@ DDEESSCCRRIIPPTTIIOONN | |||
38 | cepts, or have serious file name length restrictions, such | 40 | cepts, or have serious file name length restrictions, such |
39 | as MS-DOS. | 41 | as MS-DOS. |
40 | 42 | ||
41 | _B_z_i_p_2 and _b_u_n_z_i_p_2 will not overwrite existing files; if | 43 | _b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing |
42 | you want this to happen, you should delete them first. | 44 | files; if you want this to happen, specify the -f flag. |
43 | 45 | ||
44 | If no file names are specified, _b_z_i_p_2 compresses from | 46 | If no file names are specified, _b_z_i_p_2 compresses from |
45 | standard input to standard output. In this case, _b_z_i_p_2 | 47 | standard input to standard output. In this case, _b_z_i_p_2 |
@@ -47,17 +49,15 @@ DDEESSCCRRIIPPTTIIOONN | |||
47 | this would be entirely incomprehensible and therefore | 49 | this would be entirely incomprehensible and therefore |
48 | pointless. | 50 | pointless. |
49 | 51 | ||
50 | _B_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- | 52 | _b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- |
51 | ified files whose names end in ".bz2". Files without this | 53 | ified files whose names end in ".bz2". Files without this |
52 | suffix are ignored. Again, supplying no filenames causes | 54 | suffix are ignored. Again, supplying no filenames causes |
53 | decompression from standard input to standard output. | 55 | decompression from standard input to standard output. |
54 | 56 | ||
55 | You can also compress or decompress files to the standard | 57 | _b_u_n_z_i_p_2 will correctly decompress a file which is the con- |
56 | output by giving the -c flag. You can decompress multiple | 58 | catenation of two or more compressed files. The result is |
57 | files like this, but you may only compress a single file | 59 | the concatenation of the corresponding uncompressed files. |
58 | this way, since it would otherwise be difficult to sepa- | 60 | Integrity testing (-t) of concatenated compressed files is |
59 | rate out the compressed representations of the original | ||
60 | files. | ||
61 | 61 | ||
62 | 62 | ||
63 | 63 | ||
@@ -70,6 +70,21 @@ DDEESSCCRRIIPPTTIIOONN | |||
70 | bzip2(1) bzip2(1) | 70 | bzip2(1) bzip2(1) |
71 | 71 | ||
72 | 72 | ||
73 | also supported. | ||
74 | |||
75 | You can also compress or decompress files to the standard | ||
76 | output by giving the -c flag. Multiple files may be com- | ||
77 | pressed and decompressed like this. The resulting outputs | ||
78 | are fed sequentially to stdout. Compression of multiple | ||
79 | files in this manner generates a stream containing multi- | ||
80 | ple compressed file representations. Such a stream can be | ||
81 | decompressed correctly only by _b_z_i_p_2 version 0.9.0 or | ||
82 | later. Earlier versions of _b_z_i_p_2 will stop after decom- | ||
83 | pressing the first file in the stream. | ||
84 | |||
85 | _b_z_c_a_t (or _b_z_i_p_2 _-_d_c ) decompresses all specified files to | ||
86 | the standard output. | ||
87 | |||
73 | Compression is always performed, even if the compressed | 88 | Compression is always performed, even if the compressed |
74 | file is slightly larger than the original. Files of less | 89 | file is slightly larger than the original. Files of less |
75 | than about one hundred bytes tend to get larger, since the | 90 | than about one hundred bytes tend to get larger, since the |
@@ -108,36 +123,37 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT | |||
108 | file, and _b_u_n_z_i_p_2 then allocates itself just enough memory | 123 | file, and _b_u_n_z_i_p_2 then allocates itself just enough memory |
109 | to decompress the file. Since block sizes are stored in | 124 | to decompress the file. Since block sizes are stored in |
110 | compressed files, it follows that the flags -1 to -9 are | 125 | compressed files, it follows that the flags -1 to -9 are |
111 | irrelevant to and so ignored during decompression. Com- | 126 | irrelevant to and so ignored during decompression. |
112 | pression and decompression requirements, in bytes, can be | ||
113 | estimated as: | ||
114 | 127 | ||
115 | Compression: 400k + ( 7 x block size ) | ||
116 | 128 | ||
117 | Decompression: 100k + ( 5 x block size ), or | ||
118 | 100k + ( 2.5 x block size ) | ||
119 | 129 | ||
120 | Larger block sizes give rapidly diminishing marginal | 130 | 2 |
121 | returns; most of the compression comes from the first two | ||
122 | or three hundred k of block size, a fact worth bearing in | ||
123 | mind when using _b_z_i_p_2 on small machines. It is also | ||
124 | important to appreciate that the decompression memory | ||
125 | requirement is set at compression-time by the choice of | ||
126 | block size. | ||
127 | 131 | ||
128 | 132 | ||
129 | 133 | ||
130 | 2 | ||
131 | 134 | ||
132 | 135 | ||
136 | bzip2(1) bzip2(1) | ||
133 | 137 | ||
134 | 138 | ||
139 | Compression and decompression requirements, in bytes, can | ||
140 | be estimated as: | ||
135 | 141 | ||
136 | bzip2(1) bzip2(1) | 142 | Compression: 400k + ( 7 x block size ) |
137 | 143 | ||
144 | Decompression: 100k + ( 4 x block size ), or | ||
145 | 100k + ( 2.5 x block size ) | ||
146 | |||
147 | Larger block sizes give rapidly diminishing marginal | ||
148 | returns; most of the compression comes from the first two | ||
149 | or three hundred k of block size, a fact worth bearing in | ||
150 | mind when using _b_z_i_p_2 on small machines. It is also | ||
151 | important to appreciate that the decompression memory | ||
152 | requirement is set at compression-time by the choice of | ||
153 | block size. | ||
138 | 154 | ||
139 | For files compressed with the default 900k block size, | 155 | For files compressed with the default 900k block size, |
140 | _b_u_n_z_i_p_2 will require about 4600 kbytes to decompress. To | 156 | _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To |
141 | support decompression of any file on a 4 megabyte machine, | 157 | support decompression of any file on a 4 megabyte machine, |
142 | _b_u_n_z_i_p_2 has an option to decompress using approximately | 158 | _b_u_n_z_i_p_2 has an option to decompress using approximately |
143 | half this amount of memory, about 2300 kbytes. Decompres- | 159 | half this amount of memory, about 2300 kbytes. Decompres- |
@@ -157,8 +173,8 @@ bzip2(1) bzip2(1) | |||
157 | file 20,000 bytes long with the flag -9 will cause the | 173 | file 20,000 bytes long with the flag -9 will cause the |
158 | compressor to allocate around 6700k of memory, but only | 174 | compressor to allocate around 6700k of memory, but only |
159 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the | 175 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the |
160 | decompressor will allocate 4600k but only touch 100k + | 176 | decompressor will allocate 3700k but only touch 100k + |
161 | 20000 * 5 = 200 kbytes. | 177 | 20000 * 4 = 180 kbytes. |
162 | 178 | ||
163 | Here is a table which summarises the maximum memory usage | 179 | Here is a table which summarises the maximum memory usage |
164 | for different block sizes. Also recorded is the total | 180 | for different block sizes. Also recorded is the total |
@@ -172,64 +188,66 @@ bzip2(1) bzip2(1) | |||
172 | Compress Decompress Decompress Corpus | 188 | Compress Decompress Decompress Corpus |
173 | Flag usage usage -s usage Size | 189 | Flag usage usage -s usage Size |
174 | 190 | ||
175 | -1 1100k 600k 350k 914704 | 191 | -1 1100k 500k 350k 914704 |
176 | -2 1800k 1100k 600k 877703 | 192 | -2 1800k 900k 600k 877703 |
177 | -3 2500k 1600k 850k 860338 | ||
178 | -4 3200k 2100k 1100k 846899 | ||
179 | -5 3900k 2600k 1350k 845160 | ||
180 | -6 4600k 3100k 1600k 838626 | ||
181 | -7 5400k 3600k 1850k 834096 | ||
182 | -8 6000k 4100k 2100k 828642 | ||
183 | -9 6700k 4600k 2350k 828642 | ||
184 | 193 | ||
185 | 194 | ||
186 | OOPPTTIIOONNSS | ||
187 | --cc ----ssttddoouutt | ||
188 | Compress or decompress to standard output. -c will | ||
189 | decompress multiple files to stdout, but will only | ||
190 | compress a single file to stdout. | ||
191 | |||
192 | 195 | ||
196 | 3 | ||
193 | 197 | ||
194 | 198 | ||
195 | 199 | ||
196 | 3 | ||
197 | 200 | ||
198 | 201 | ||
202 | bzip2(1) bzip2(1) | ||
199 | 203 | ||
200 | 204 | ||
205 | -3 2500k 1300k 850k 860338 | ||
206 | -4 3200k 1700k 1100k 846899 | ||
207 | -5 3900k 2100k 1350k 845160 | ||
208 | -6 4600k 2500k 1600k 838626 | ||
209 | -7 5400k 2900k 1850k 834096 | ||
210 | -8 6000k 3300k 2100k 828642 | ||
211 | -9 6700k 3700k 2350k 828642 | ||
201 | 212 | ||
202 | bzip2(1) bzip2(1) | ||
203 | 213 | ||
214 | OOPPTTIIOONNSS | ||
215 | --cc ----ssttddoouutt | ||
216 | Compress or decompress to standard output. -c will | ||
217 | decompress multiple files to stdout, but will only | ||
218 | compress a single file to stdout. | ||
204 | 219 | ||
205 | --dd ----ddeeccoommpprreessss | 220 | --dd ----ddeeccoommpprreessss |
206 | Force decompression. _B_z_i_p_2 and _b_u_n_z_i_p_2 are really | 221 | Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are |
207 | the same program, and the decision about whether to | 222 | really the same program, and the decision about |
208 | compress or decompress is done on the basis of | 223 | what actions to take is done on the basis of which |
209 | which name is used. This flag overrides that mech- | 224 | name is used. This flag overrides that mechanism, |
210 | anism, and forces _b_z_i_p_2 to decompress. | 225 | and forces _b_z_i_p_2 to decompress. |
211 | 226 | ||
212 | --ff ----ccoommpprreessss | 227 | --zz ----ccoommpprreessss |
213 | The complement to -d: forces compression, regard- | 228 | The complement to -d: forces compression, regard- |
214 | less of the invokation name. | 229 | less of the invokation name. |
215 | 230 | ||
216 | --tt ----tteesstt | 231 | --tt ----tteesstt |
217 | Check integrity of the specified file(s), but don't | 232 | Check integrity of the specified file(s), but don't |
218 | decompress them. This really performs a trial | 233 | decompress them. This really performs a trial |
219 | decompression and throws away the result, using the | 234 | decompression and throws away the result. |
220 | low-memory decompression algorithm (see -s). | 235 | |
236 | --ff ----ffoorrccee | ||
237 | Force overwrite of output files. Normally, _b_z_i_p_2 | ||
238 | will not overwrite existing output files. | ||
221 | 239 | ||
222 | --kk ----kkeeeepp | 240 | --kk ----kkeeeepp |
223 | Keep (don't delete) input files during compression | 241 | Keep (don't delete) input files during compression |
224 | or decompression. | 242 | or decompression. |
225 | 243 | ||
226 | --ss ----ssmmaallll | 244 | --ss ----ssmmaallll |
227 | Reduce memory usage, both for compression and | 245 | Reduce memory usage, for compression, decompression |
228 | decompression. Files are decompressed using a mod- | 246 | and testing. Files are decompressed and tested |
229 | ified algorithm which only requires 2.5 bytes per | 247 | using a modified algorithm which only requires 2.5 |
230 | block byte. This means any file can be decom- | 248 | bytes per block byte. This means any file can be |
231 | pressed in 2300k of memory, albeit somewhat more | 249 | decompressed in 2300k of memory, albeit at about |
232 | slowly than usual. | 250 | half the normal speed. |
233 | 251 | ||
234 | During compression, -s selects a block size of | 252 | During compression, -s selects a block size of |
235 | 200k, which limits memory use to around the same | 253 | 200k, which limits memory use to around the same |
@@ -239,35 +257,32 @@ bzip2(1) bzip2(1) | |||
239 | MEMORY MANAGEMENT above. | 257 | MEMORY MANAGEMENT above. |
240 | 258 | ||
241 | 259 | ||
260 | |||
261 | |||
262 | 4 | ||
263 | |||
264 | |||
265 | |||
266 | |||
267 | |||
268 | bzip2(1) bzip2(1) | ||
269 | |||
270 | |||
242 | --vv ----vveerrbboossee | 271 | --vv ----vveerrbboossee |
243 | Verbose mode -- show the compression ratio for each | 272 | Verbose mode -- show the compression ratio for each |
244 | file processed. Further -v's increase the ver- | 273 | file processed. Further -v's increase the ver- |
245 | bosity level, spewing out lots of information which | 274 | bosity level, spewing out lots of information which |
246 | is primarily of interest for diagnostic purposes. | 275 | is primarily of interest for diagnostic purposes. |
247 | 276 | ||
248 | --LL ----lliicceennssee | 277 | --LL ----lliicceennssee --VV ----vveerrssiioonn |
249 | Display the software version, license terms and | 278 | Display the software version, license terms and |
250 | conditions. | 279 | conditions. |
251 | 280 | ||
252 | --VV ----vveerrssiioonn | ||
253 | Same as -L. | ||
254 | |||
255 | --11 ttoo --99 | 281 | --11 ttoo --99 |
256 | Set the block size to 100 k, 200 k .. 900 k when | 282 | Set the block size to 100 k, 200 k .. 900 k when |
257 | compressing. Has no effect when decompressing. | 283 | compressing. Has no effect when decompressing. |
258 | See MEMORY MANAGEMENT above. | 284 | See MEMORY MANAGEMENT above. |
259 | 285 | ||
260 | |||
261 | |||
262 | 4 | ||
263 | |||
264 | |||
265 | |||
266 | |||
267 | |||
268 | bzip2(1) bzip2(1) | ||
269 | |||
270 | |||
271 | ----rreeppeettiittiivvee--ffaasstt | 286 | ----rreeppeettiittiivvee--ffaasstt |
272 | _b_z_i_p_2 injects some small pseudo-random variations | 287 | _b_z_i_p_2 injects some small pseudo-random variations |
273 | into very repetitive blocks to limit worst-case | 288 | into very repetitive blocks to limit worst-case |
@@ -306,34 +321,34 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F | |||
306 | _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- | 321 | _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- |
307 | aged file, and writes a number of files "rec0001file.bz2", | 322 | aged file, and writes a number of files "rec0001file.bz2", |
308 | "rec0002file.bz2", etc, containing the extracted blocks. | 323 | "rec0002file.bz2", etc, containing the extracted blocks. |
309 | The output filenames are designed so that the use of wild- | 324 | The output filenames are designed so that the use of |
310 | cards in subsequent processing -- for example, "bzip2 -dc | ||
311 | rec*file.bz2 > recovered_data" -- lists the files in the | ||
312 | "right" order. | ||
313 | 325 | ||
314 | _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 | ||
315 | files, as these will contain many blocks. It is clearly | ||
316 | futile to use it on damaged single-block files, since a | ||
317 | damaged block cannot be recovered. If you wish to min- | ||
318 | imise any potential data loss through media or transmis- | ||
319 | sion errors, you might consider compressing with a smaller | ||
320 | block size. | ||
321 | 326 | ||
322 | 327 | ||
323 | PPEERRFFOORRMMAANNCCEE NNOOTTEESS | 328 | 5 |
324 | The sorting phase of compression gathers together similar | ||
325 | 329 | ||
326 | 330 | ||
327 | 331 | ||
328 | 5 | ||
329 | 332 | ||
330 | 333 | ||
334 | bzip2(1) bzip2(1) | ||
331 | 335 | ||
332 | 336 | ||
337 | wildcards in subsequent processing -- for example, "bzip2 | ||
338 | -dc rec*file.bz2 > recovered_data" -- lists the files in | ||
339 | the "right" order. | ||
333 | 340 | ||
334 | bzip2(1) bzip2(1) | 341 | _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 |
342 | files, as these will contain many blocks. It is clearly | ||
343 | futile to use it on damaged single-block files, since a | ||
344 | damaged block cannot be recovered. If you wish to min- | ||
345 | imise any potential data loss through media or transmis- | ||
346 | sion errors, you might consider compressing with a smaller | ||
347 | block size. | ||
335 | 348 | ||
336 | 349 | ||
350 | PPEERRFFOORRMMAANNCCEE NNOOTTEESS | ||
351 | The sorting phase of compression gathers together similar | ||
337 | strings in the file. Because of this, files containing | 352 | strings in the file. Because of this, files containing |
338 | very long runs of repeated symbols, like "aabaabaabaab | 353 | very long runs of repeated symbols, like "aabaabaabaab |
339 | ..." (repeated several hundred times) may compress | 354 | ..." (repeated several hundred times) may compress |
@@ -348,10 +363,6 @@ bzip2(1) bzip2(1) | |||
348 | severe slowness in compression, try making the block size | 363 | severe slowness in compression, try making the block size |
349 | as small as possible, with flag -1. | 364 | as small as possible, with flag -1. |
350 | 365 | ||
351 | Incompressible or virtually-incompressible data may decom- | ||
352 | press rather more slowly than one would hope. This is due | ||
353 | to a naive implementation of the move-to-front coder. | ||
354 | |||
355 | _b_z_i_p_2 usually allocates several megabytes of memory to | 366 | _b_z_i_p_2 usually allocates several megabytes of memory to |
356 | operate in, and then charges all over it in a fairly ran- | 367 | operate in, and then charges all over it in a fairly ran- |
357 | dom fashion. This means that performance, both for com- | 368 | dom fashion. This means that performance, both for com- |
@@ -362,12 +373,6 @@ bzip2(1) bzip2(1) | |||
362 | large performance improvements. I imagine _b_z_i_p_2 will per- | 373 | large performance improvements. I imagine _b_z_i_p_2 will per- |
363 | form best on machines with very large caches. | 374 | form best on machines with very large caches. |
364 | 375 | ||
365 | Test mode (-t) uses the low-memory decompression algorithm | ||
366 | (-s). This means test mode does not run as fast as it | ||
367 | could; it could run as fast as the normal decompression | ||
368 | machinery. This could easily be fixed at the cost of some | ||
369 | code bloat. | ||
370 | |||
371 | 376 | ||
372 | CCAAVVEEAATTSS | 377 | CCAAVVEEAATTSS |
373 | I/O error messages are not as helpful as they could be. | 378 | I/O error messages are not as helpful as they could be. |
@@ -375,19 +380,14 @@ CCAAVVEEAATTSS | |||
375 | but the details of what the problem is sometimes seem | 380 | but the details of what the problem is sometimes seem |
376 | rather misleading. | 381 | rather misleading. |
377 | 382 | ||
378 | This manual page pertains to version 0.1 of _b_z_i_p_2_. It may | 383 | This manual page pertains to version 0.9.0 of _b_z_i_p_2_. Com- |
379 | well happen that some future version will use a different | 384 | pressed data created by this version is entirely forwards |
380 | compressed file format. If you try to decompress, using | 385 | and backwards compatible with the previous public release, |
381 | 0.1, a .bz2 file created with some future version which | 386 | version 0.1pl2, but with the following exception: 0.9.0 |
382 | uses a different compressed file format, 0.1 will complain | 387 | can correctly decompress multiple concatenated compressed |
383 | that your file "is not a bzip2 file". If that happens, | 388 | files. 0.1pl2 cannot do this; it will stop after decom- |
384 | you should obtain a more recent version of _b_z_i_p_2 and use | 389 | pressing just the first file in the stream. |
385 | that to decompress the file. | ||
386 | 390 | ||
387 | Wildcard expansion for Windows 95 and NT is flaky. | ||
388 | |||
389 | _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi- | ||
390 | tions in compressed files, so it cannot handle compressed | ||
391 | 391 | ||
392 | 392 | ||
393 | 393 | ||
@@ -400,61 +400,59 @@ CCAAVVEEAATTSS | |||
400 | bzip2(1) bzip2(1) | 400 | bzip2(1) bzip2(1) |
401 | 401 | ||
402 | 402 | ||
403 | files more than 512 megabytes long. This could easily be | 403 | Wildcard expansion for Windows 95 and NT is flaky. |
404 | |||
405 | _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi- | ||
406 | tions in compressed files, so it cannot handle compressed | ||
407 | files more than 512 megabytes long. This could easily be | ||
404 | fixed. | 408 | fixed. |
405 | 409 | ||
406 | _b_z_i_p_2_r_e_c_o_v_e_r sometimes reports a very small, incomplete | 410 | |
407 | final block. This is spurious and can be safely ignored. | 411 | AAUUTTHHOORR |
412 | Julian Seward, jseward@acm.org. | ||
413 | http://www.muraroa.demon.co.uk | ||
414 | |||
415 | The ideas embodied in _b_z_i_p_2 are due to (at least) the fol- | ||
416 | lowing people: Michael Burrows and David Wheeler (for the | ||
417 | block sorting transformation), David Wheeler (again, for | ||
418 | the Huffman coder), Peter Fenwick (for the structured cod- | ||
419 | ing model in the original _b_z_i_p_, and many refinements), and | ||
420 | Alistair Moffat, Radford Neal and Ian Witten (for the | ||
421 | arithmetic coder in the original _b_z_i_p_)_. I am much | ||
422 | indebted for their help, support and advice. See the man- | ||
423 | ual in the source distribution for pointers to sources of | ||
424 | documentation. Christian von Roques encouraged me to look | ||
425 | for faster sorting algorithms, so as to speed up compres- | ||
426 | sion. Bela Lubkin encouraged me to improve the worst-case | ||
427 | compression performance. Many people sent patches, helped | ||
428 | with portability problems, lent machines, gave advice and | ||
429 | were generally helpful. | ||
430 | |||
431 | |||
432 | |||
433 | |||
434 | |||
435 | |||
436 | |||
437 | |||
438 | |||
439 | |||
440 | |||
441 | |||
442 | |||
443 | |||
444 | |||
445 | |||
446 | |||
447 | |||
408 | 448 | ||
409 | 449 | ||
410 | RREELLAATTIIOONNSSHHIIPP TTOO bbzziipp--00..2211 | ||
411 | This program is a descendant of the _b_z_i_p program, version | ||
412 | 0.21, which I released in August 1996. The primary dif- | ||
413 | ference of _b_z_i_p_2 is its avoidance of the possibly patented | ||
414 | algorithms which were used in 0.21. _b_z_i_p_2 also brings | ||
415 | various useful refinements (-s, -t), uses less memory, | ||
416 | decompresses significantly faster, and has support for | ||
417 | recovering data from damaged files. | ||
418 | 450 | ||
419 | Because _b_z_i_p_2 uses Huffman coding to construct the com- | ||
420 | pressed bitstream, rather than the arithmetic coding used | ||
421 | in 0.21, the compressed representations generated by the | ||
422 | two programs are incompatible, and they will not interop- | ||
423 | erate. The change in suffix from .bz to .bz2 reflects | ||
424 | this. It would have been helpful to at least allow _b_z_i_p_2 | ||
425 | to decompress files created by 0.21, but this would defeat | ||
426 | the primary aim of having a patent-free compressor. | ||
427 | 451 | ||
428 | For a more precise statement about patent issues in bzip2, | ||
429 | please see the README file in the distribution. | ||
430 | 452 | ||
431 | Huffman coding necessarily involves some coding ineffi- | ||
432 | ciency compared to arithmetic coding. This means that | ||
433 | _b_z_i_p_2 compresses about 1% worse than 0.21, an unfortunate | ||
434 | but unavoidable fact-of-life. On the other hand, decom- | ||
435 | pression is approximately 50% faster for the same reason, | ||
436 | and the change in file format gave an opportunity to add | ||
437 | data-recovery features. So it is not all bad. | ||
438 | 453 | ||
439 | 454 | ||
440 | AAUUTTHHOORR | ||
441 | Julian Seward, jseward@acm.org. | ||
442 | 455 | ||
443 | The ideas embodied in _b_z_i_p and _b_z_i_p_2 are due to (at least) | ||
444 | the following people: Michael Burrows and David Wheeler | ||
445 | (for the block sorting transformation), David Wheeler | ||
446 | (again, for the Huffman coder), Peter Fenwick (for the | ||
447 | structured coding model in 0.21, and many refinements), | ||
448 | and Alistair Moffat, Radford Neal and Ian Witten (for the | ||
449 | arithmetic coder in 0.21). I am much indebted for their | ||
450 | help, support and advice. See the file ALGORITHMS in the | ||
451 | source distribution for pointers to sources of documenta- | ||
452 | tion. Christian von Roques encouraged me to look for | ||
453 | faster sorting algorithms, so as to speed up compression. | ||
454 | Bela Lubkin encouraged me to improve the worst-case com- | ||
455 | pression performance. Many people sent patches, helped | ||
456 | with portability problems, lent machines, gave advice and | ||
457 | were generally helpful. | ||
458 | 456 | ||
459 | 457 | ||
460 | 458 | ||
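
The memory estimates and the concatenation behaviour described on the new side of the diff can be checked with a short sketch. This is an illustration only, not code from the bzip2 sources: the helper names (`compress_bytes`, `decompress_bytes`, `flag_block_size`) are hypothetical, "k" is assumed to mean 1000 bytes, and Python's standard `bz2` module stands in for the command-line tools. The formulas are the ones quoted in the hunk: 400k + 7 x block size for compression, and 100k + 4 x block size (or 2.5 x with -s) for decompression.

```python
import bz2

K = 1000  # assumption: the man page's "k" is read as 1000 bytes


def flag_block_size(level):
    """Block size selected by flags -1 .. -9 (100k .. 900k)."""
    return level * 100 * K


def compress_bytes(block_size):
    """Compression estimate from the man page: 400k + 7 x block size."""
    return 400 * K + 7 * block_size


def decompress_bytes(block_size, small=False):
    """Decompression estimate: 100k + 4 x block size, or 2.5 x with -s."""
    if small:
        return 100 * K + (5 * block_size) // 2
    return 100 * K + 4 * block_size


# Print the estimates for each flag, in units of k, for comparison
# with the table in the diff.
for level in range(1, 10):
    size = flag_block_size(level)
    print(
        f"-{level}",
        compress_bytes(size) // K,
        decompress_bytes(size) // K,
        decompress_bytes(size, small=True) // K,
    )

# Concatenated streams: two independently compressed pieces, joined,
# decompress to the concatenation of the original data.
joined = bz2.compress(b"first file\n") + bz2.compress(b"second file\n")
restored = bz2.decompress(joined)
```

The decompression columns of the updated table follow the formulas for every flag. The compression column matches everywhere except -7, where the formula gives 5300k and the table lists 5400k.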