Diffstat (limited to 'bzip2.txt')
-rw-r--r-- | bzip2.txt | 119 |
1 files changed, 60 insertions, 59 deletions
@@ -1,6 +1,6 @@ | |||
1 | 1 | ||
2 | NAME | 2 | NAME |
3 | bzip2, bunzip2 - a block-sorting file compressor, v1.0.2 | 3 | bzip2, bunzip2 - a block-sorting file compressor, v1.0.3 |
4 | bzcat - decompresses files to stdout | 4 | bzcat - decompresses files to stdout |
5 | bzip2recover - recovers data from damaged bzip2 files | 5 | bzip2recover - recovers data from damaged bzip2 files |
6 | 6 | ||
@@ -17,20 +17,20 @@ DESCRIPTION | |||
17 | sorting text compression algorithm, and Huffman coding. | 17 | sorting text compression algorithm, and Huffman coding. |
18 | Compression is generally considerably better than that | 18 | Compression is generally considerably better than that |
19 | achieved by more conventional LZ77/LZ78-based compressors, | 19 | achieved by more conventional LZ77/LZ78-based compressors, |
20 | and approaches the performance of the PPM family of sta | 20 | and approaches the performance of the PPM family of sta- |
21 | tistical compressors. | 21 | tistical compressors. |
22 | 22 | ||
23 | The command-line options are deliberately very similar to | 23 | The command-line options are deliberately very similar to |
24 | those of GNU gzip, but they are not identical. | 24 | those of GNU gzip, but they are not identical. |
25 | 25 | ||
26 | bzip2 expects a list of file names to accompany the com | 26 | bzip2 expects a list of file names to accompany the com- |
27 | mand-line flags. Each file is replaced by a compressed | 27 | mand-line flags. Each file is replaced by a compressed |
28 | version of itself, with the name "original_name.bz2". | 28 | version of itself, with the name "original_name.bz2". |
29 | Each compressed file has the same modification date, per | 29 | Each compressed file has the same modification date, per- |
30 | missions, and, when possible, ownership as the correspond | 30 | missions, and, when possible, ownership as the correspond- |
31 | ing original, so that these properties can be correctly | 31 | ing original, so that these properties can be correctly |
32 | restored at decompression time. File name handling is | 32 | restored at decompression time. File name handling is |
33 | naive in the sense that there is no mechanism for preserv | 33 | naive in the sense that there is no mechanism for preserv- |
34 | ing original file names, permissions, ownerships or dates | 34 | ing original file names, permissions, ownerships or dates |
35 | in filesystems which lack these concepts, or have serious | 35 | in filesystems which lack these concepts, or have serious |
36 | file name length restrictions, such as MS-DOS. | 36 | file name length restrictions, such as MS-DOS. |
@@ -61,23 +61,23 @@ DESCRIPTION | |||
61 | guess the name of the original file, and uses the original | 61 | guess the name of the original file, and uses the original |
62 | name with .out appended. | 62 | name with .out appended. |
63 | 63 | ||
64 | As with compression, supplying no filenames causes decom | 64 | As with compression, supplying no filenames causes decom- |
65 | pression from standard input to standard output. | 65 | pression from standard input to standard output. |
66 | 66 | ||
67 | bunzip2 will correctly decompress a file which is the con | 67 | bunzip2 will correctly decompress a file which is the con- |
68 | catenation of two or more compressed files. The result is | 68 | catenation of two or more compressed files. The result is |
69 | the concatenation of the corresponding uncompressed files. | 69 | the concatenation of the corresponding uncompressed files. |
70 | Integrity testing (-t) of concatenated compressed files is | 70 | Integrity testing (-t) of concatenated compressed files is |
71 | also supported. | 71 | also supported. |
72 | 72 | ||
73 | You can also compress or decompress files to the standard | 73 | You can also compress or decompress files to the standard |
74 | output by giving the -c flag. Multiple files may be com | 74 | output by giving the -c flag. Multiple files may be com- |
75 | pressed and decompressed like this. The resulting outputs | 75 | pressed and decompressed like this. The resulting outputs |
76 | are fed sequentially to stdout. Compression of multiple | 76 | are fed sequentially to stdout. Compression of multiple |
77 | files in this manner generates a stream containing multi | 77 | files in this manner generates a stream containing multi- |
78 | ple compressed file representations. Such a stream can be | 78 | ple compressed file representations. Such a stream can be |
79 | decompressed correctly only by bzip2 version 0.9.0 or | 79 | decompressed correctly only by bzip2 version 0.9.0 or |
80 | later. Earlier versions of bzip2 will stop after decom | 80 | later. Earlier versions of bzip2 will stop after decom- |
81 | pressing the first file in the stream. | 81 | pressing the first file in the stream. |
82 | 82 | ||
83 | bzcat (or bzip2 -dc) decompresses all specified files to | 83 | bzcat (or bzip2 -dc) decompresses all specified files to |
@@ -98,7 +98,7 @@ DESCRIPTION | |||
98 | 98 | ||
99 | As a self-check for your protection, bzip2 uses 32-bit | 99 | As a self-check for your protection, bzip2 uses 32-bit |
100 | CRCs to make sure that the decompressed version of a file | 100 | CRCs to make sure that the decompressed version of a file |
101 | is identical to the original. This guards against corrup | 101 | is identical to the original. This guards against corrup- |
102 | tion of the compressed data, and against undetected bugs | 102 | tion of the compressed data, and against undetected bugs |
103 | in bzip2 (hopefully very unlikely). The chances of data | 103 | in bzip2 (hopefully very unlikely). The chances of data |
104 | corruption going undetected is microscopic, about one | 104 | corruption going undetected is microscopic, about one |
@@ -171,7 +171,7 @@ OPTIONS | |||
171 | 171 | ||
172 | -v --verbose | 172 | -v --verbose |
173 | Verbose mode -- show the compression ratio for each | 173 | Verbose mode -- show the compression ratio for each |
174 | file processed. Further -v's increase the ver | 174 | file processed. Further -v's increase the ver- |
175 | bosity level, spewing out lots of information which | 175 | bosity level, spewing out lots of information which |
176 | is primarily of interest for diagnostic purposes. | 176 | is primarily of interest for diagnostic purposes. |
177 | 177 | ||
@@ -184,19 +184,19 @@ OPTIONS | |||
184 | compressing. Has no effect when decompressing. | 184 | compressing. Has no effect when decompressing. |
185 | See MEMORY MANAGEMENT below. The --fast and --best | 185 | See MEMORY MANAGEMENT below. The --fast and --best |
186 | aliases are primarily for GNU gzip compatibility. | 186 | aliases are primarily for GNU gzip compatibility. |
187 | In particular, --fast doesn't make things signifi | 187 | In particular, --fast doesn't make things signifi- |
188 | cantly faster. And --best merely selects the | 188 | cantly faster. And --best merely selects the |
189 | default behaviour. | 189 | default behaviour. |
190 | 190 | ||
191 | -- Treats all subsequent arguments as file names, even | 191 | -- Treats all subsequent arguments as file names, even |
192 | if they start with a dash. This is so you can han | 192 | if they start with a dash. This is so you can han- |
193 | dle files with names beginning with a dash, for | 193 | dle files with names beginning with a dash, for |
194 | example: bzip2 -- -myfilename. | 194 | example: bzip2 -- -myfilename. |
195 | 195 | ||
196 | --repetitive-fast --repetitive-best | 196 | --repetitive-fast --repetitive-best |
197 | These flags are redundant in versions 0.9.5 and | 197 | These flags are redundant in versions 0.9.5 and |
198 | above. They provided some coarse control over the | 198 | above. They provided some coarse control over the |
199 | behaviour of the sorting algorithm in earlier ver | 199 | behaviour of the sorting algorithm in earlier ver- |
200 | sions, which was sometimes useful. 0.9.5 and above | 200 | sions, which was sometimes useful. 0.9.5 and above |
201 | have an improved algorithm which renders these | 201 | have an improved algorithm which renders these |
202 | flags irrelevant. | 202 | flags irrelevant. |
@@ -207,7 +207,7 @@ MEMORY MANAGEMENT | |||
207 | affects both the compression ratio achieved, and the | 207 | affects both the compression ratio achieved, and the |
208 | amount of memory needed for compression and decompression. | 208 | amount of memory needed for compression and decompression. |
209 | The flags -1 through -9 specify the block size to be | 209 | The flags -1 through -9 specify the block size to be |
210 | 100,000 bytes through 900,000 bytes (the default) respec | 210 | 100,000 bytes through 900,000 bytes (the default) respec- |
211 | tively. At decompression time, the block size used for | 211 | tively. At decompression time, the block size used for |
212 | compression is read from the header of the compressed | 212 | compression is read from the header of the compressed |
213 | file, and bunzip2 then allocates itself just enough memory | 213 | file, and bunzip2 then allocates itself just enough memory |
@@ -235,13 +235,13 @@ MEMORY MANAGEMENT | |||
235 | bunzip2 will require about 3700 kbytes to decompress. To | 235 | bunzip2 will require about 3700 kbytes to decompress. To |
236 | support decompression of any file on a 4 megabyte machine, | 236 | support decompression of any file on a 4 megabyte machine, |
237 | bunzip2 has an option to decompress using approximately | 237 | bunzip2 has an option to decompress using approximately |
238 | half this amount of memory, about 2300 kbytes. Decompres | 238 | half this amount of memory, about 2300 kbytes. Decompres- |
239 | sion speed is also halved, so you should use this option | 239 | sion speed is also halved, so you should use this option |
240 | only where necessary. The relevant flag is -s. | 240 | only where necessary. The relevant flag is -s. |
241 | 241 | ||
242 | In general, try and use the largest block size memory con | 242 | In general, try and use the largest block size memory con- |
243 | straints allow, since that maximises the compression | 243 | straints allow, since that maximises the compression |
244 | achieved. Compression and decompression speed are virtu | 244 | achieved. Compression and decompression speed are virtu- |
245 | ally unaffected by block size. | 245 | ally unaffected by block size. |
246 | 246 | ||
247 | Another significant point applies to files which fit in a | 247 | Another significant point applies to files which fit in a |
@@ -257,11 +257,11 @@ MEMORY MANAGEMENT | |||
257 | 257 | ||
258 | Here is a table which summarises the maximum memory usage | 258 | Here is a table which summarises the maximum memory usage |
259 | for different block sizes. Also recorded is the total | 259 | for different block sizes. Also recorded is the total |
260 | compressed size for 14 files of the Calgary Text Compres | 260 | compressed size for 14 files of the Calgary Text Compres- |
261 | sion Corpus totalling 3,141,622 bytes. This column gives | 261 | sion Corpus totalling 3,141,622 bytes. This column gives |
262 | some feel for how compression varies with block size. | 262 | some feel for how compression varies with block size. |
263 | These figures tend to understate the advantage of larger | 263 | These figures tend to understate the advantage of larger |
264 | block sizes for larger files, since the Corpus is domi | 264 | block sizes for larger files, since the Corpus is domi- |
265 | nated by smaller files. | 265 | nated by smaller files. |
266 | 266 | ||
267 | Compress Decompress Decompress Corpus | 267 | Compress Decompress Decompress Corpus |
@@ -280,7 +280,7 @@ MEMORY MANAGEMENT | |||
280 | 280 | ||
281 | RECOVERING DATA FROM DAMAGED FILES | 281 | RECOVERING DATA FROM DAMAGED FILES |
282 | bzip2 compresses files in blocks, usually 900kbytes long. | 282 | bzip2 compresses files in blocks, usually 900kbytes long. |
283 | Each block is handled independently. If a media or trans | 283 | Each block is handled independently. If a media or trans- |
284 | mission error causes a multi-block .bz2 file to become | 284 | mission error causes a multi-block .bz2 file to become |
285 | damaged, it may be possible to recover data from the | 285 | damaged, it may be possible to recover data from the |
286 | undamaged blocks in the file. | 286 | undamaged blocks in the file. |
@@ -297,19 +297,19 @@ RECOVERING DATA FROM DAMAGED FILES | |||
297 | the integrity of the resulting files, and decompress those | 297 | the integrity of the resulting files, and decompress those |
298 | which are undamaged. | 298 | which are undamaged. |
299 | 299 | ||
300 | bzip2recover takes a single argument, the name of the dam | 300 | bzip2recover takes a single argument, the name of the dam- |
301 | aged file, and writes a number of files | 301 | aged file, and writes a number of files |
302 | "rec00001file.bz2", "rec00002file.bz2", etc, containing | 302 | "rec00001file.bz2", "rec00002file.bz2", etc, containing |
303 | the extracted blocks. The output filenames are | 303 | the extracted blocks. The output filenames are |
304 | designed so that the use of wildcards in subsequent pro | 304 | designed so that the use of wildcards in subsequent pro- |
305 | cessing -- for example, "bzip2 -dc rec*file.bz2 > recov | 305 | cessing -- for example, "bzip2 -dc rec*file.bz2 > recov- |
306 | ered_data" -- processes the files in the correct order. | 306 | ered_data" -- processes the files in the correct order. |
307 | 307 | ||
308 | bzip2recover should be of most use dealing with large .bz2 | 308 | bzip2recover should be of most use dealing with large .bz2 |
309 | files, as these will contain many blocks. It is clearly | 309 | files, as these will contain many blocks. It is clearly |
310 | futile to use it on damaged single-block files, since a | 310 | futile to use it on damaged single-block files, since a |
311 | damaged block cannot be recovered. If you wish to min | 311 | damaged block cannot be recovered. If you wish to min- |
312 | imise any potential data loss through media or transmis | 312 | imise any potential data loss through media or transmis- |
313 | sion errors, you might consider compressing with a smaller | 313 | sion errors, you might consider compressing with a smaller |
314 | block size. | 314 | block size. |
315 | 315 | ||
@@ -323,19 +323,19 @@ PERFORMANCE NOTES | |||
323 | better than previous versions in this respect. The ratio | 323 | better than previous versions in this respect. The ratio |
324 | between worst-case and average-case compression time is in | 324 | between worst-case and average-case compression time is in |
325 | the region of 10:1. For previous versions, this figure | 325 | the region of 10:1. For previous versions, this figure |
326 | was more like 100:1. You can use the -vvvv option to mon | 326 | was more like 100:1. You can use the -vvvv option to mon- |
327 | itor progress in great detail, if you want. | 327 | itor progress in great detail, if you want. |
328 | 328 | ||
329 | Decompression speed is unaffected by these phenomena. | 329 | Decompression speed is unaffected by these phenomena. |
330 | 330 | ||
331 | bzip2 usually allocates several megabytes of memory to | 331 | bzip2 usually allocates several megabytes of memory to |
332 | operate in, and then charges all over it in a fairly ran | 332 | operate in, and then charges all over it in a fairly ran- |
333 | dom fashion. This means that performance, both for com | 333 | dom fashion. This means that performance, both for com- |
334 | pressing and decompressing, is largely determined by the | 334 | pressing and decompressing, is largely determined by the |
335 | speed at which your machine can service cache misses. | 335 | speed at which your machine can service cache misses. |
336 | Because of this, small changes to the code to reduce the | 336 | Because of this, small changes to the code to reduce the |
337 | miss rate have been observed to give disproportionately | 337 | miss rate have been observed to give disproportionately |
338 | large performance improvements. I imagine bzip2 will per | 338 | large performance improvements. I imagine bzip2 will per- |
339 | form best on machines with very large caches. | 339 | form best on machines with very large caches. |
340 | 340 | ||
341 | 341 | ||
@@ -345,46 +345,47 @@ CAVEATS | |||
345 | but the details of what the problem is sometimes seem | 345 | but the details of what the problem is sometimes seem |
346 | rather misleading. | 346 | rather misleading. |
347 | 347 | ||
348 | This manual page pertains to version 1.0.2 of bzip2. Com | 348 | This manual page pertains to version 1.0.3 of bzip2. Com- |
349 | pressed data created by this version is entirely forwards | 349 | pressed data created by this version is entirely forwards |
350 | and backwards compatible with the previous public | 350 | and backwards compatible with the previous public |
351 | releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, | 351 | releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1 and |
352 | but with the following exception: 0.9.0 and above can cor | 352 | 1.0.2, but with the following exception: 0.9.0 and above |
353 | rectly decompress multiple concatenated compressed files. | 353 | can correctly decompress multiple concatenated compressed |
354 | 0.1pl2 cannot do this; it will stop after decompressing | 354 | files. 0.1pl2 cannot do this; it will stop after decom- |
355 | just the first file in the stream. | 355 | pressing just the first file in the stream. |
356 | 356 | ||
357 | bzip2recover versions prior to this one, 1.0.2, used | 357 | bzip2recover versions prior to 1.0.2 used 32-bit integers |
358 | 32-bit integers to represent bit positions in compressed | 358 | to represent bit positions in compressed files, so they |
359 | files, so it could not handle compressed files more than | 359 | could not handle compressed files more than 512 megabytes |
360 | 512 megabytes long. Version 1.0.2 and above uses 64-bit | 360 | long. Versions 1.0.2 and above use 64-bit ints on some |
361 | ints on some platforms which support them (GNU supported | 361 | platforms which support them (GNU supported targets, and |
362 | targets, and Windows). To establish whether or not | 362 | Windows). To establish whether or not bzip2recover was |
363 | bzip2recover was built with such a limitation, run it | 363 | built with such a limitation, run it without arguments. |
364 | without arguments. In any event you can build yourself an | 364 | In any event you can build yourself an unlimited version |
365 | unlimited version if you can recompile it with MaybeUInt64 | 365 | if you can recompile it with MaybeUInt64 set to be an |
366 | set to be an unsigned 64-bit integer. | 366 | unsigned 64-bit integer. |
367 | 367 | ||
368 | 368 | ||
369 | AUTHOR | 369 | AUTHOR |
370 | Julian Seward, jseward@acm.org. | 370 | Julian Seward, jsewardbzip.org. |
371 | 371 | ||
372 | http://sources.redhat.com/bzip2 | 372 | http://www.bzip.org |
373 | 373 | ||
374 | The ideas embodied in bzip2 are due to (at least) the fol | 374 | The ideas embodied in bzip2 are due to (at least) the fol- |
375 | lowing people: Michael Burrows and David Wheeler (for the | 375 | lowing people: Michael Burrows and David Wheeler (for the |
376 | block sorting transformation), David Wheeler (again, for | 376 | block sorting transformation), David Wheeler (again, for |
377 | the Huffman coder), Peter Fenwick (for the structured cod | 377 | the Huffman coder), Peter Fenwick (for the structured cod- |
378 | ing model in the original bzip, and many refinements), and | 378 | ing model in the original bzip, and many refinements), and |
379 | Alistair Moffat, Radford Neal and Ian Witten (for the | 379 | Alistair Moffat, Radford Neal and Ian Witten (for the |
380 | arithmetic coder in the original bzip). I am much | 380 | arithmetic coder in the original bzip). I am much |
381 | indebted for their help, support and advice. See the man | 381 | indebted for their help, support and advice. See the man- |
382 | ual in the source distribution for pointers to sources of | 382 | ual in the source distribution for pointers to sources of |
383 | documentation. Christian von Roques encouraged me to look | 383 | documentation. Christian von Roques encouraged me to look |
384 | for faster sorting algorithms, so as to speed up compres | 384 | for faster sorting algorithms, so as to speed up compres- |
385 | sion. Bela Lubkin encouraged me to improve the worst-case | 385 | sion. Bela Lubkin encouraged me to improve the worst-case |
386 | compression performance. The bz* scripts are derived from | 386 | compression performance. Donna Robinson XMLised the docu- |
387 | those of GNU gzip. Many people sent patches, helped with | 387 | mentation. The bz* scripts are derived from those of GNU |
388 | portability problems, lent machines, gave advice and were | 388 | gzip. Many people sent patches, helped with portability |
389 | generally helpful. | 389 | problems, lent machines, gave advice and were generally |
390 | helpful. | ||
390 | 391 | ||
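
Note on the behaviour described in the context lines above: the following is a minimal sketch, not part of the patch, of two points the manual makes. A concatenation of compressed streams decompresses to the concatenation of the originals, and the block size chosen by -1 through -9 is recorded in the header of the compressed data. It uses Python's standard bz2 module rather than the bzip2 command itself, and assumes that the module's compresslevel argument corresponds to the -1 .. -9 flags. The sample data and variable names are illustrative only.

```python
import bz2

# Two independently compressed streams, standing in for two .bz2 files.
first = bz2.compress(b"hello, ")
second = bz2.compress(b"world\n")

# Joining the compressed bytes mimics `cat a.bz2 b.bz2 > both.bz2`.
combined = first + second

# bz2.decompress accepts multi-stream input (Python 3.3 and later) and
# returns the concatenation of the corresponding uncompressed data.
restored = bz2.decompress(combined)

# Assumption: compresslevel 1..9 plays the role of the -1..-9 flags,
# i.e. block sizes of 100,000 through 900,000 bytes.
sample = bytes(range(256)) * 4000  # roughly 1 MB of illustrative data
small_blocks = bz2.compress(sample, compresslevel=1)
large_blocks = bz2.compress(sample, compresslevel=9)

# The stream header is "BZh" followed by the block-size digit, which is
# what the decompressor reads to decide how much memory to allocate.
header_small = small_blocks[:4]
header_large = large_blocks[:4]
```

Both levels decompress back to the same input; only the block size recorded in the header, and therefore the memory needed, differs.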