author | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
---|---|---|
committer | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
commit | 977101ad5f833f5c0a574bfeea408e5301a6b052 (patch) | |
tree | fc1e8fed202869c116cbf6b8c362456042494a0a /bzip2.1 | |
parent | 1eb67a9d8f7f05ae310bc9ef297d176f3a3f8a37 (diff) | |
bzip2-0.9.0c
Diffstat (limited to 'bzip2.1')
-rw-r--r-- | bzip2.1 | 191 |
1 files changed, 83 insertions, 108 deletions
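
The memory figures revised in this commit follow from the formulas quoted in the man page text: compression needs about 400k + 7 x block size, normal decompression 100k + 4 x block size (previously 5 x), and -s decompression 100k + 2.5 x block size. A minimal Python sketch of that arithmetic, assuming "k" means 1000 bytes as the page's 20,000-byte worked example implies; the function names are illustrative and are not part of bzip2:

```python
# Sketch of the memory estimates quoted in the bzip2.1 man page text.
# Assumption: "k" is 1000 bytes, consistent with "400k + 20000 * 7 = 540 kbytes".
K = 1000


def compress_bytes(block_size):
    """Estimated bytes touched when compressing one block of block_size bytes."""
    return 400 * K + 7 * block_size


def decompress_bytes(block_size, factor=4):
    """Estimated bytes for normal decompression.

    factor=4 is the value after this commit; factor=5 reproduces the old text.
    """
    return 100 * K + factor * block_size


def decompress_small_bytes(block_size):
    """Estimated bytes for low-memory (-s) decompression: 2.5 bytes per block byte."""
    return 100 * K + (5 * block_size) // 2


if __name__ == "__main__":
    block = 900 * K  # default block size (flag -9)
    print("compress      :", compress_bytes(block) // K, "k")
    print("decompress new:", decompress_bytes(block) // K, "k")
    print("decompress old:", decompress_bytes(block, factor=5) // K, "k")
    print("decompress -s :", decompress_small_bytes(block) // K, "k")
```

For the default 900k block size this gives 3700k for normal decompression, against 4600k under the old factor of 5, which matches the revised -9 row of the table in the diff.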
@@ -1,21 +1,29 @@ | |||
1 | .PU | 1 | .PU |
2 | .TH bzip2 1 | 2 | .TH bzip2 1 |
3 | .SH NAME | 3 | .SH NAME |
4 | bzip2, bunzip2 \- a block-sorting file compressor, v0.1 | 4 | bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0 |
5 | .br | ||
6 | bzcat \- decompresses files to stdout | ||
5 | .br | 7 | .br |
6 | bzip2recover \- recovers data from damaged bzip2 files | 8 | bzip2recover \- recovers data from damaged bzip2 files |
7 | 9 | ||
8 | .SH SYNOPSIS | 10 | .SH SYNOPSIS |
9 | .ll +8 | 11 | .ll +8 |
10 | .B bzip2 | 12 | .B bzip2 |
11 | .RB [ " \-cdfkstvVL123456789 " ] | 13 | .RB [ " \-cdfkstvzVL123456789 " ] |
12 | [ | 14 | [ |
13 | .I "filenames \&..." | 15 | .I "filenames \&..." |
14 | ] | 16 | ] |
15 | .ll -8 | 17 | .ll -8 |
16 | .br | 18 | .br |
17 | .B bunzip2 | 19 | .B bunzip2 |
18 | .RB [ " \-kvsVL " ] | 20 | .RB [ " \-fkvsVL " ] |
21 | [ | ||
22 | .I "filenames \&..." | ||
23 | ] | ||
24 | .br | ||
25 | .B bzcat | ||
26 | .RB [ " \-s " ] | ||
19 | [ | 27 | [ |
20 | .I "filenames \&..." | 28 | .I "filenames \&..." |
21 | ] | 29 | ] |
@@ -24,7 +32,7 @@ bzip2recover \- recovers data from damaged bzip2 files | |||
24 | .I "filename" | 32 | .I "filename" |
25 | 33 | ||
26 | .SH DESCRIPTION | 34 | .SH DESCRIPTION |
27 | .I Bzip2 | 35 | .I bzip2 |
28 | compresses files using the Burrows-Wheeler block-sorting | 36 | compresses files using the Burrows-Wheeler block-sorting |
29 | text compression algorithm, and Huffman coding. | 37 | text compression algorithm, and Huffman coding. |
30 | Compression is generally considerably | 38 | Compression is generally considerably |
@@ -38,7 +46,7 @@ those of | |||
38 | .I GNU Gzip, | 46 | .I GNU Gzip, |
39 | but they are not identical. | 47 | but they are not identical. |
40 | 48 | ||
41 | .I Bzip2 | 49 | .I bzip2 |
42 | expects a list of file names to accompany the command-line flags. | 50 | expects a list of file names to accompany the command-line flags. |
43 | Each file is replaced by a compressed version of itself, | 51 | Each file is replaced by a compressed version of itself, |
44 | with the name "original_name.bz2". | 52 | with the name "original_name.bz2". |
@@ -50,11 +58,11 @@ original file names, permissions and dates in filesystems | |||
50 | which lack these concepts, or have serious file name length | 58 | which lack these concepts, or have serious file name length |
51 | restrictions, such as MS-DOS. | 59 | restrictions, such as MS-DOS. |
52 | 60 | ||
53 | .I Bzip2 | 61 | .I bzip2 |
54 | and | 62 | and |
55 | .I bunzip2 | 63 | .I bunzip2 |
56 | will not overwrite existing files; if you want this to happen, | 64 | will by default not overwrite existing files; |
57 | you should delete them first. | 65 | if you want this to happen, specify the \-f flag. |
58 | 66 | ||
59 | If no file names are specified, | 67 | If no file names are specified, |
60 | .I bzip2 | 68 | .I bzip2 |
@@ -64,7 +72,7 @@ In this case, | |||
64 | will decline to write compressed output to a terminal, as | 72 | will decline to write compressed output to a terminal, as |
65 | this would be entirely incomprehensible and therefore pointless. | 73 | this would be entirely incomprehensible and therefore pointless. |
66 | 74 | ||
67 | .I Bunzip2 | 75 | .I bunzip2 |
68 | (or | 76 | (or |
69 | .I bzip2 \-d | 77 | .I bzip2 \-d |
70 | ) decompresses and restores all specified files whose names | 78 | ) decompresses and restores all specified files whose names |
@@ -73,12 +81,28 @@ Files without this suffix are ignored. | |||
73 | Again, supplying no filenames | 81 | Again, supplying no filenames |
74 | causes decompression from standard input to standard output. | 82 | causes decompression from standard input to standard output. |
75 | 83 | ||
84 | .I bunzip2 | ||
85 | will correctly decompress a file which is the concatenation | ||
86 | of two or more compressed files. The result is the concatenation | ||
87 | of the corresponding uncompressed files. Integrity testing | ||
88 | (\-t) of concatenated compressed files is also supported. | ||
89 | |||
76 | You can also compress or decompress files to | 90 | You can also compress or decompress files to |
77 | the standard output by giving the \-c flag. | 91 | the standard output by giving the \-c flag. |
78 | You can decompress multiple files like this, but you may | 92 | Multiple files may be compressed and decompressed like this. |
79 | only compress a single file this way, since it would otherwise | 93 | The resulting outputs are fed sequentially to stdout. |
80 | be difficult to separate out the compressed representations of | 94 | Compression of multiple files in this manner generates |
81 | the original files. | 95 | a stream containing multiple compressed file representations. |
96 | Such a stream can be decompressed correctly only by | ||
97 | .I bzip2 | ||
98 | version 0.9.0 or later. Earlier versions of | ||
99 | .I bzip2 | ||
100 | will stop after decompressing the first file in the stream. | ||
101 | |||
102 | .I bzcat | ||
103 | (or | ||
104 | .I bzip2 \-dc | ||
105 | ) decompresses all specified files to the standard output. | ||
82 | 106 | ||
83 | Compression is always performed, even if the compressed file is | 107 | Compression is always performed, even if the compressed file is |
84 | slightly larger than the original. Files of less than about | 108 | slightly larger than the original. Files of less than about |
@@ -132,7 +156,7 @@ Compression and decompression requirements, in bytes, can be estimated as: | |||
132 | 156 | ||
133 | Compression: 400k + ( 7 x block size ) | 157 | Compression: 400k + ( 7 x block size ) |
134 | 158 | ||
135 | Decompression: 100k + ( 5 x block size ), or | 159 | Decompression: 100k + ( 4 x block size ), or |
136 | .br | 160 | .br |
137 | 100k + ( 2.5 x block size ) | 161 | 100k + ( 2.5 x block size ) |
138 | 162 | ||
@@ -147,7 +171,7 @@ choice of block size. | |||
147 | 171 | ||
148 | For files compressed with the default 900k block size, | 172 | For files compressed with the default 900k block size, |
149 | .I bunzip2 | 173 | .I bunzip2 |
150 | will require about 4600 kbytes to decompress. | 174 | will require about 3700 kbytes to decompress. |
151 | To support decompression of any file on a 4 megabyte machine, | 175 | To support decompression of any file on a 4 megabyte machine, |
152 | .I bunzip2 | 176 | .I bunzip2 |
153 | has an option to decompress using approximately half this | 177 | has an option to decompress using approximately half this |
@@ -168,8 +192,8 @@ For example, compressing a file 20,000 bytes long with the flag | |||
168 | \-9 | 192 | \-9 |
169 | will cause the compressor to allocate around | 193 | will cause the compressor to allocate around |
170 | 6700k of memory, but only touch 400k + 20000 * 7 = 540 | 194 | 6700k of memory, but only touch 400k + 20000 * 7 = 540 |
171 | kbytes of it. Similarly, the decompressor will allocate 4600k but | 195 | kbytes of it. Similarly, the decompressor will allocate 3700k but |
172 | only touch 100k + 20000 * 5 = 200 kbytes. | 196 | only touch 100k + 20000 * 4 = 180 kbytes. |
173 | 197 | ||
174 | Here is a table which summarises the maximum memory usage for | 198 | Here is a table which summarises the maximum memory usage for |
175 | different block sizes. Also recorded is the total compressed | 199 | different block sizes. Also recorded is the total compressed |
@@ -182,71 +206,73 @@ Corpus is dominated by smaller files. | |||
182 | Compress Decompress Decompress Corpus | 206 | Compress Decompress Decompress Corpus |
183 | Flag usage usage -s usage Size | 207 | Flag usage usage -s usage Size |
184 | 208 | ||
185 | -1 1100k 600k 350k 914704 | 209 | -1 1100k 500k 350k 914704 |
186 | -2 1800k 1100k 600k 877703 | 210 | -2 1800k 900k 600k 877703 |
187 | -3 2500k 1600k 850k 860338 | 211 | -3 2500k 1300k 850k 860338 |
188 | -4 3200k 2100k 1100k 846899 | 212 | -4 3200k 1700k 1100k 846899 |
189 | -5 3900k 2600k 1350k 845160 | 213 | -5 3900k 2100k 1350k 845160 |
190 | -6 4600k 3100k 1600k 838626 | 214 | -6 4600k 2500k 1600k 838626 |
191 | -7 5400k 3600k 1850k 834096 | 215 | -7 5400k 2900k 1850k 834096 |
192 | -8 6000k 4100k 2100k 828642 | 216 | -8 6000k 3300k 2100k 828642 |
193 | -9 6700k 4600k 2350k 828642 | 217 | -9 6700k 3700k 2350k 828642 |
194 | 218 | ||
195 | .SH OPTIONS | 219 | .SH OPTIONS |
196 | .TP | 220 | .TP |
197 | .B \-c --stdout | 221 | .B \-c --stdout |
198 | Compress or decompress to standard output. \-c will decompress | 222 | Compress or decompress to standard output. \-c will decompress |
199 | multiple files to stdout, but will only compress a single file to | 223 | multiple files to stdout, but will only compress a single file to |
200 | stdout. | 224 | stdout. |
201 | .TP | 225 | .TP |
202 | .B \-d --decompress | 226 | .B \-d --decompress |
203 | Force decompression. | 227 | Force decompression. |
204 | .I Bzip2 | 228 | .I bzip2, |
205 | and | ||
206 | .I bunzip2 | 229 | .I bunzip2 |
207 | are really the same program, and the decision about whether to | 230 | and |
208 | compress or decompress is done on the basis of which name is | 231 | .I bzcat |
232 | are really the same program, and the decision about what actions | ||
233 | to take is done on the basis of which name is | ||
209 | used. This flag overrides that mechanism, and forces | 234 | used. This flag overrides that mechanism, and forces |
210 | .I bzip2 | 235 | .I bzip2 |
211 | to decompress. | 236 | to decompress. |
212 | .TP | 237 | .TP |
213 | .B \-f --compress | 238 | .B \-z --compress |
214 | The complement to \-d: forces compression, regardless of the invokation | 239 | The complement to \-d: forces compression, regardless of the invokation |
215 | name. | 240 | name. |
216 | .TP | 241 | .TP |
217 | .B \-t --test | 242 | .B \-t --test |
218 | Check integrity of the specified file(s), but don't decompress them. | 243 | Check integrity of the specified file(s), but don't decompress them. |
219 | This really performs a trial decompression and throws away the result, | 244 | This really performs a trial decompression and throws away the result. |
220 | using the low-memory decompression algorithm (see \-s). | 245 | .TP |
246 | .B \-f --force | ||
247 | Force overwrite of output files. Normally, | ||
248 | .I bzip2 | ||
249 | will not overwrite existing output files. | ||
221 | .TP | 250 | .TP |
222 | .B \-k --keep | 251 | .B \-k --keep |
223 | Keep (don't delete) input files during compression or decompression. | 252 | Keep (don't delete) input files during compression or decompression. |
224 | .TP | 253 | .TP |
225 | .B \-s --small | 254 | .B \-s --small |
226 | Reduce memory usage, both for compression and decompression. | 255 | Reduce memory usage, for compression, decompression and |
227 | Files are decompressed using a modified algorithm which only | 256 | testing. |
257 | Files are decompressed and tested using a modified algorithm which only | ||
228 | requires 2.5 bytes per block byte. This means any file can be | 258 | requires 2.5 bytes per block byte. This means any file can be |
229 | decompressed in 2300k of memory, albeit somewhat more slowly than | 259 | decompressed in 2300k of memory, albeit at about half the normal |
230 | usual. | 260 | speed. |
231 | 261 | ||
232 | During compression, -s selects a block size of 200k, which limits | 262 | During compression, -s selects a block size of 200k, which limits |
233 | memory use to around the same figure, at the expense of your | 263 | memory use to around the same figure, at the expense of your |
234 | compression ratio. In short, if your machine is low on memory | 264 | compression ratio. In short, if your machine is low on memory |
235 | (8 megabytes or less), use -s for everything. See | 265 | (8 megabytes or less), use -s for everything. See |
236 | MEMORY MANAGEMENT above. | 266 | MEMORY MANAGEMENT above. |
237 | |||
238 | .TP | 267 | .TP |
239 | .B \-v --verbose | 268 | .B \-v --verbose |
240 | Verbose mode -- show the compression ratio for each file processed. | 269 | Verbose mode -- show the compression ratio for each file processed. |
241 | Further \-v's increase the verbosity level, spewing out lots of | 270 | Further \-v's increase the verbosity level, spewing out lots of |
242 | information which is primarily of interest for diagnostic purposes. | 271 | information which is primarily of interest for diagnostic purposes. |
243 | .TP | 272 | .TP |
244 | .B \-L --license | 273 | .B \-L --license -V --version |
245 | Display the software version, license terms and conditions. | 274 | Display the software version, license terms and conditions. |
246 | .TP | 275 | .TP |
247 | .B \-V --version | ||
248 | Same as \-L. | ||
249 | .TP | ||
250 | .B \-1 to \-9 | 276 | .B \-1 to \-9 |
251 | Set the block size to 100 k, 200 k .. 900 k when | 277 | Set the block size to 100 k, 200 k .. 900 k when |
252 | compressing. Has no effect when decompressing. | 278 | compressing. Has no effect when decompressing. |
@@ -329,10 +355,6 @@ to compress the latter. | |||
329 | If you do get a file which causes severe slowness in compression, | 355 | If you do get a file which causes severe slowness in compression, |
330 | try making the block size as small as possible, with flag \-1. | 356 | try making the block size as small as possible, with flag \-1. |
331 | 357 | ||
332 | Incompressible or virtually-incompressible data may decompress | ||
333 | rather more slowly than one would hope. This is due to | ||
334 | a naive implementation of the move-to-front coder. | ||
335 | |||
336 | .I bzip2 | 358 | .I bzip2 |
337 | usually allocates several megabytes of memory to operate in, | 359 | usually allocates several megabytes of memory to operate in, |
338 | and then charges all over it in a fairly random fashion. This | 360 | and then charges all over it in a fairly random fashion. This |
@@ -346,28 +368,19 @@ I imagine | |||
346 | .I bzip2 | 368 | .I bzip2 |
347 | will perform best on machines with very large caches. | 369 | will perform best on machines with very large caches. |
348 | 370 | ||
349 | Test mode (\-t) uses the low-memory decompression algorithm | ||
350 | (\-s). This means test mode does not run as fast as it could; | ||
351 | it could run as fast as the normal decompression machinery. | ||
352 | This could easily be fixed at the cost of some code bloat. | ||
353 | |||
354 | .SH CAVEATS | 371 | .SH CAVEATS |
355 | I/O error messages are not as helpful as they could be. | 372 | I/O error messages are not as helpful as they could be. |
356 | .I Bzip2 | 373 | .I Bzip2 |
357 | tries hard to detect I/O errors and exit cleanly, but the | 374 | tries hard to detect I/O errors and exit cleanly, but the |
358 | details of what the problem is sometimes seem rather misleading. | 375 | details of what the problem is sometimes seem rather misleading. |
359 | 376 | ||
360 | This manual page pertains to version 0.1 of | 377 | This manual page pertains to version 0.9.0 of |
361 | .I bzip2. | 378 | .I bzip2. |
362 | It may well happen that some future version will | 379 | Compressed data created by this version is entirely forwards and |
363 | use a different compressed file format. If you try to | 380 | backwards compatible with the previous public release, version 0.1pl2, |
364 | decompress, using 0.1, a .bz2 file created with some | 381 | but with the following exception: 0.9.0 can correctly decompress |
365 | future version which uses a different compressed file format, | 382 | multiple concatenated compressed files. 0.1pl2 cannot do this; it |
366 | 0.1 will complain that your file "is not a bzip2 file". | 383 | will stop after decompressing just the first file in the stream. |
367 | If that happens, you should obtain a more recent version | ||
368 | of | ||
369 | .I bzip2 | ||
370 | and use that to decompress the file. | ||
371 | 384 | ||
372 | Wildcard expansion for Windows 95 and NT | 385 | Wildcard expansion for Windows 95 and NT |
373 | is flaky. | 386 | is flaky. |
@@ -377,63 +390,25 @@ uses 32-bit integers to represent bit positions in | |||
377 | compressed files, so it cannot handle compressed files | 390 | compressed files, so it cannot handle compressed files |
378 | more than 512 megabytes long. This could easily be fixed. | 391 | more than 512 megabytes long. This could easily be fixed. |
379 | 392 | ||
380 | .I bzip2recover | ||
381 | sometimes reports a very small, incomplete final block. | ||
382 | This is spurious and can be safely ignored. | ||
383 | |||
384 | .SH RELATIONSHIP TO bzip-0.21 | ||
385 | This program is a descendant of the | ||
386 | .I bzip | ||
387 | program, version 0.21, which I released in August 1996. | ||
388 | The primary difference of | ||
389 | .I bzip2 | ||
390 | is its avoidance of the possibly patented algorithms | ||
391 | which were used in 0.21. | ||
392 | .I bzip2 | ||
393 | also brings various useful refinements (\-s, \-t), | ||
394 | uses less memory, decompresses significantly faster, and | ||
395 | has support for recovering data from damaged files. | ||
396 | |||
397 | Because | ||
398 | .I bzip2 | ||
399 | uses Huffman coding to construct the compressed bitstream, | ||
400 | rather than the arithmetic coding used in 0.21, | ||
401 | the compressed representations generated by the two programs | ||
402 | are incompatible, and they will not interoperate. The change | ||
403 | in suffix from .bz to .bz2 reflects this. It would have been | ||
404 | helpful to at least allow | ||
405 | .I bzip2 | ||
406 | to decompress files created by 0.21, but this would | ||
407 | defeat the primary aim of having a patent-free compressor. | ||
408 | |||
409 | For a more precise statement about patent issues in | ||
410 | bzip2, please see the README file in the distribution. | ||
411 | |||
412 | Huffman coding necessarily involves some coding inefficiency | ||
413 | compared to arithmetic coding. This means that | ||
414 | .I bzip2 | ||
415 | compresses about 1% worse than 0.21, an unfortunate but | ||
416 | unavoidable fact-of-life. On the other hand, decompression | ||
417 | is approximately 50% faster for the same reason, and the | ||
418 | change in file format gave an opportunity to add data-recovery | ||
419 | features. So it is not all bad. | ||
420 | |||
421 | .SH AUTHOR | 393 | .SH AUTHOR |
422 | Julian Seward, jseward@acm.org. | 394 | Julian Seward, jseward@acm.org. |
423 | 395 | ||
396 | http://www.muraroa.demon.co.uk | ||
397 | |||
424 | The ideas embodied in | 398 | The ideas embodied in |
425 | .I bzip | ||
426 | and | ||
427 | .I bzip2 | 399 | .I bzip2 |
428 | are due to (at least) the following people: | 400 | are due to (at least) the following people: |
429 | Michael Burrows and David Wheeler (for the block sorting | 401 | Michael Burrows and David Wheeler (for the block sorting |
430 | transformation), David Wheeler (again, for the Huffman coder), | 402 | transformation), David Wheeler (again, for the Huffman coder), |
431 | Peter Fenwick (for the structured coding model in 0.21, | 403 | Peter Fenwick (for the structured coding model in the original |
404 | .I bzip, | ||
432 | and many refinements), | 405 | and many refinements), |
433 | and | 406 | and |
434 | Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic | 407 | Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic |
435 | coder in 0.21). I am much indebted for their help, support and advice. | 408 | coder in the original |
436 | See the file ALGORITHMS in the source distribution for pointers to | 409 | .I bzip). |
410 | I am much indebted for their help, support and advice. | ||
411 | See the manual in the source distribution for pointers to | ||
437 | sources of documentation. | 412 | sources of documentation. |
438 | Christian von Roques encouraged me to look for faster | 413 | Christian von Roques encouraged me to look for faster |
439 | sorting algorithms, so as to speed up compression. | 414 | sorting algorithms, so as to speed up compression. |