path: root/bzip2.txt
Diffstat (limited to 'bzip2.txt')
-rw-r--r-- bzip2.txt 119
1 files changed, 60 insertions, 59 deletions
diff --git a/bzip2.txt b/bzip2.txt
index 6afe358..bf895b6 100644
--- a/bzip2.txt
+++ b/bzip2.txt
@@ -1,6 +1,6 @@
1 1
2 NAME 2 NAME
3 bzip2, bunzip2 - a block-sorting file compressor, v1.0.2 3 bzip2, bunzip2 - a block-sorting file compressor, v1.0.3
4 bzcat - decompresses files to stdout 4 bzcat - decompresses files to stdout
5 bzip2recover - recovers data from damaged bzip2 files 5 bzip2recover - recovers data from damaged bzip2 files
6 6
@@ -17,20 +17,20 @@ DESCRIPTION
17 sorting text compression algorithm, and Huffman coding. 17 sorting text compression algorithm, and Huffman coding.
18 Compression is generally considerably better than that 18 Compression is generally considerably better than that
19 achieved by more conventional LZ77/LZ78-based compressors, 19 achieved by more conventional LZ77/LZ78-based compressors,
20 and approaches the performance of the PPM family of sta­ 20 and approaches the performance of the PPM family of sta-
21 tistical compressors. 21 tistical compressors.
22 22
23 The command-line options are deliberately very similar to 23 The command-line options are deliberately very similar to
24 those of GNU gzip, but they are not identical. 24 those of GNU gzip, but they are not identical.
25 25
26 bzip2 expects a list of file names to accompany the com­ 26 bzip2 expects a list of file names to accompany the com-
27 mand-line flags. Each file is replaced by a compressed 27 mand-line flags. Each file is replaced by a compressed
28 version of itself, with the name "original_name.bz2". 28 version of itself, with the name "original_name.bz2".
29 Each compressed file has the same modification date, per­ 29 Each compressed file has the same modification date, per-
30 missions, and, when possible, ownership as the correspond­ 30 missions, and, when possible, ownership as the correspond-
31 ing original, so that these properties can be correctly 31 ing original, so that these properties can be correctly
32 restored at decompression time. File name handling is 32 restored at decompression time. File name handling is
33 naive in the sense that there is no mechanism for preserv­ 33 naive in the sense that there is no mechanism for preserv-
34 ing original file names, permissions, ownerships or dates 34 ing original file names, permissions, ownerships or dates
35 in filesystems which lack these concepts, or have serious 35 in filesystems which lack these concepts, or have serious
36 file name length restrictions, such as MS-DOS. 36 file name length restrictions, such as MS-DOS.
@@ -61,23 +61,23 @@ DESCRIPTION
61 guess the name of the original file, and uses the original 61 guess the name of the original file, and uses the original
62 name with .out appended. 62 name with .out appended.
63 63
64 As with compression, supplying no filenames causes decom­ 64 As with compression, supplying no filenames causes decom-
65 pression from standard input to standard output. 65 pression from standard input to standard output.
66 66
67 bunzip2 will correctly decompress a file which is the con­ 67 bunzip2 will correctly decompress a file which is the con-
68 catenation of two or more compressed files. The result is 68 catenation of two or more compressed files. The result is
69 the concatenation of the corresponding uncompressed files. 69 the concatenation of the corresponding uncompressed files.
70 Integrity testing (-t) of concatenated compressed files is 70 Integrity testing (-t) of concatenated compressed files is
71 also supported. 71 also supported.
72 72
73 You can also compress or decompress files to the standard 73 You can also compress or decompress files to the standard
74 output by giving the -c flag. Multiple files may be com­ 74 output by giving the -c flag. Multiple files may be com-
75 pressed and decompressed like this. The resulting outputs 75 pressed and decompressed like this. The resulting outputs
76 are fed sequentially to stdout. Compression of multiple 76 are fed sequentially to stdout. Compression of multiple
77 files in this manner generates a stream containing multi­ 77 files in this manner generates a stream containing multi-
78 ple compressed file representations. Such a stream can be 78 ple compressed file representations. Such a stream can be
79 decompressed correctly only by bzip2 version 0.9.0 or 79 decompressed correctly only by bzip2 version 0.9.0 or
80 later. Earlier versions of bzip2 will stop after decom­ 80 later. Earlier versions of bzip2 will stop after decom-
81 pressing the first file in the stream. 81 pressing the first file in the stream.
82 82
83 bzcat (or bzip2 -dc) decompresses all specified files to 83 bzcat (or bzip2 -dc) decompresses all specified files to
@@ -98,7 +98,7 @@ DESCRIPTION
98 98
99 As a self-check for your protection, bzip2 uses 32-bit 99 As a self-check for your protection, bzip2 uses 32-bit
100 CRCs to make sure that the decompressed version of a file 100 CRCs to make sure that the decompressed version of a file
101 is identical to the original. This guards against corrup­ 101 is identical to the original. This guards against corrup-
102 tion of the compressed data, and against undetected bugs 102 tion of the compressed data, and against undetected bugs
103 in bzip2 (hopefully very unlikely). The chances of data 103 in bzip2 (hopefully very unlikely). The chances of data
104 corruption going undetected is microscopic, about one 104 corruption going undetected is microscopic, about one
@@ -171,7 +171,7 @@ OPTIONS
171 171
172 -v --verbose 172 -v --verbose
173 Verbose mode -- show the compression ratio for each 173 Verbose mode -- show the compression ratio for each
174 file processed. Further -v's increase the ver­ 174 file processed. Further -v's increase the ver-
175 bosity level, spewing out lots of information which 175 bosity level, spewing out lots of information which
176 is primarily of interest for diagnostic purposes. 176 is primarily of interest for diagnostic purposes.
177 177
@@ -184,19 +184,19 @@ OPTIONS
184 compressing. Has no effect when decompressing. 184 compressing. Has no effect when decompressing.
185 See MEMORY MANAGEMENT below. The --fast and --best 185 See MEMORY MANAGEMENT below. The --fast and --best
186 aliases are primarily for GNU gzip compatibility. 186 aliases are primarily for GNU gzip compatibility.
187 In particular, --fast doesn't make things signifi­ 187 In particular, --fast doesn't make things signifi-
188 cantly faster. And --best merely selects the 188 cantly faster. And --best merely selects the
189 default behaviour. 189 default behaviour.
190 190
191 -- Treats all subsequent arguments as file names, even 191 -- Treats all subsequent arguments as file names, even
192 if they start with a dash. This is so you can han­ 192 if they start with a dash. This is so you can han-
193 dle files with names beginning with a dash, for 193 dle files with names beginning with a dash, for
194 example: bzip2 -- -myfilename. 194 example: bzip2 -- -myfilename.
195 195
196 --repetitive-fast --repetitive-best 196 --repetitive-fast --repetitive-best
197 These flags are redundant in versions 0.9.5 and 197 These flags are redundant in versions 0.9.5 and
198 above. They provided some coarse control over the 198 above. They provided some coarse control over the
199 behaviour of the sorting algorithm in earlier ver­ 199 behaviour of the sorting algorithm in earlier ver-
200 sions, which was sometimes useful. 0.9.5 and above 200 sions, which was sometimes useful. 0.9.5 and above
201 have an improved algorithm which renders these 201 have an improved algorithm which renders these
202 flags irrelevant. 202 flags irrelevant.
@@ -207,7 +207,7 @@ MEMORY MANAGEMENT
207 affects both the compression ratio achieved, and the 207 affects both the compression ratio achieved, and the
208 amount of memory needed for compression and decompression. 208 amount of memory needed for compression and decompression.
209 The flags -1 through -9 specify the block size to be 209 The flags -1 through -9 specify the block size to be
210 100,000 bytes through 900,000 bytes (the default) respec­ 210 100,000 bytes through 900,000 bytes (the default) respec-
211 tively. At decompression time, the block size used for 211 tively. At decompression time, the block size used for
212 compression is read from the header of the compressed 212 compression is read from the header of the compressed
213 file, and bunzip2 then allocates itself just enough memory 213 file, and bunzip2 then allocates itself just enough memory
@@ -235,13 +235,13 @@ MEMORY MANAGEMENT
235 bunzip2 will require about 3700 kbytes to decompress. To 235 bunzip2 will require about 3700 kbytes to decompress. To
236 support decompression of any file on a 4 megabyte machine, 236 support decompression of any file on a 4 megabyte machine,
237 bunzip2 has an option to decompress using approximately 237 bunzip2 has an option to decompress using approximately
238 half this amount of memory, about 2300 kbytes. Decompres­ 238 half this amount of memory, about 2300 kbytes. Decompres-
239 sion speed is also halved, so you should use this option 239 sion speed is also halved, so you should use this option
240 only where necessary. The relevant flag is -s. 240 only where necessary. The relevant flag is -s.
241 241
242 In general, try and use the largest block size memory con­ 242 In general, try and use the largest block size memory con-
243 straints allow, since that maximises the compression 243 straints allow, since that maximises the compression
244 achieved. Compression and decompression speed are virtu­ 244 achieved. Compression and decompression speed are virtu-
245 ally unaffected by block size. 245 ally unaffected by block size.
246 246
247 Another significant point applies to files which fit in a 247 Another significant point applies to files which fit in a
@@ -257,11 +257,11 @@ MEMORY MANAGEMENT
257 257
258 Here is a table which summarises the maximum memory usage 258 Here is a table which summarises the maximum memory usage
259 for different block sizes. Also recorded is the total 259 for different block sizes. Also recorded is the total
260 compressed size for 14 files of the Calgary Text Compres­ 260 compressed size for 14 files of the Calgary Text Compres-
261 sion Corpus totalling 3,141,622 bytes. This column gives 261 sion Corpus totalling 3,141,622 bytes. This column gives
262 some feel for how compression varies with block size. 262 some feel for how compression varies with block size.
263 These figures tend to understate the advantage of larger 263 These figures tend to understate the advantage of larger
264 block sizes for larger files, since the Corpus is domi­ 264 block sizes for larger files, since the Corpus is domi-
265 nated by smaller files. 265 nated by smaller files.
266 266
267 Compress Decompress Decompress Corpus 267 Compress Decompress Decompress Corpus
@@ -280,7 +280,7 @@ MEMORY MANAGEMENT
280 280
281 RECOVERING DATA FROM DAMAGED FILES 281 RECOVERING DATA FROM DAMAGED FILES
282 bzip2 compresses files in blocks, usually 900kbytes long. 282 bzip2 compresses files in blocks, usually 900kbytes long.
283 Each block is handled independently. If a media or trans­ 283 Each block is handled independently. If a media or trans-
284 mission error causes a multi-block .bz2 file to become 284 mission error causes a multi-block .bz2 file to become
285 damaged, it may be possible to recover data from the 285 damaged, it may be possible to recover data from the
286 undamaged blocks in the file. 286 undamaged blocks in the file.
@@ -297,19 +297,19 @@ RECOVERING DATA FROM DAMAGED FILES
297 the integrity of the resulting files, and decompress those 297 the integrity of the resulting files, and decompress those
298 which are undamaged. 298 which are undamaged.
299 299
300 bzip2recover takes a single argument, the name of the dam­ 300 bzip2recover takes a single argument, the name of the dam-
301 aged file, and writes a number of files 301 aged file, and writes a number of files
302 "rec00001file.bz2", "rec00002file.bz2", etc, containing 302 "rec00001file.bz2", "rec00002file.bz2", etc, containing
303 the extracted blocks. The output filenames are 303 the extracted blocks. The output filenames are
304 designed so that the use of wildcards in subsequent pro­ 304 designed so that the use of wildcards in subsequent pro-
305 cessing -- for example, "bzip2 -dc rec*file.bz2 > recov­ 305 cessing -- for example, "bzip2 -dc rec*file.bz2 > recov-
306 ered_data" -- processes the files in the correct order. 306 ered_data" -- processes the files in the correct order.
307 307
308 bzip2recover should be of most use dealing with large .bz2 308 bzip2recover should be of most use dealing with large .bz2
309 files, as these will contain many blocks. It is clearly 309 files, as these will contain many blocks. It is clearly
310 futile to use it on damaged single-block files, since a 310 futile to use it on damaged single-block files, since a
311 damaged block cannot be recovered. If you wish to min­ 311 damaged block cannot be recovered. If you wish to min-
312 imise any potential data loss through media or transmis­ 312 imise any potential data loss through media or transmis-
313 sion errors, you might consider compressing with a smaller 313 sion errors, you might consider compressing with a smaller
314 block size. 314 block size.
315 315
@@ -323,19 +323,19 @@ PERFORMANCE NOTES
323 better than previous versions in this respect. The ratio 323 better than previous versions in this respect. The ratio
324 between worst-case and average-case compression time is in 324 between worst-case and average-case compression time is in
325 the region of 10:1. For previous versions, this figure 325 the region of 10:1. For previous versions, this figure
326 was more like 100:1. You can use the -vvvv option to mon­ 326 was more like 100:1. You can use the -vvvv option to mon-
327 itor progress in great detail, if you want. 327 itor progress in great detail, if you want.
328 328
329 Decompression speed is unaffected by these phenomena. 329 Decompression speed is unaffected by these phenomena.
330 330
331 bzip2 usually allocates several megabytes of memory to 331 bzip2 usually allocates several megabytes of memory to
332 operate in, and then charges all over it in a fairly ran­ 332 operate in, and then charges all over it in a fairly ran-
333 dom fashion. This means that performance, both for com­ 333 dom fashion. This means that performance, both for com-
334 pressing and decompressing, is largely determined by the 334 pressing and decompressing, is largely determined by the
335 speed at which your machine can service cache misses. 335 speed at which your machine can service cache misses.
336 Because of this, small changes to the code to reduce the 336 Because of this, small changes to the code to reduce the
337 miss rate have been observed to give disproportionately 337 miss rate have been observed to give disproportionately
338 large performance improvements. I imagine bzip2 will per­ 338 large performance improvements. I imagine bzip2 will per-
339 form best on machines with very large caches. 339 form best on machines with very large caches.
340 340
341 341
@@ -345,46 +345,47 @@ CAVEATS
345 but the details of what the problem is sometimes seem 345 but the details of what the problem is sometimes seem
346 rather misleading. 346 rather misleading.
347 347
348 This manual page pertains to version 1.0.2 of bzip2. Com­ 348 This manual page pertains to version 1.0.3 of bzip2. Com-
349 pressed data created by this version is entirely forwards 349 pressed data created by this version is entirely forwards
350 and backwards compatible with the previous public 350 and backwards compatible with the previous public
351 releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, 351 releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1 and
352 but with the following exception: 0.9.0 and above can cor­ 352 1.0.2, but with the following exception: 0.9.0 and above
353 rectly decompress multiple concatenated compressed files. 353 can correctly decompress multiple concatenated compressed
354 0.1pl2 cannot do this; it will stop after decompressing 354 files. 0.1pl2 cannot do this; it will stop after decom-
355 just the first file in the stream. 355 pressing just the first file in the stream.
356 356
357 bzip2recover versions prior to this one, 1.0.2, used 357 bzip2recover versions prior to 1.0.2 used 32-bit integers
358 32-bit integers to represent bit positions in compressed 358 to represent bit positions in compressed files, so they
359 files, so it could not handle compressed files more than 359 could not handle compressed files more than 512 megabytes
360 512 megabytes long. Version 1.0.2 and above uses 64-bit 360 long. Versions 1.0.2 and above use 64-bit ints on some
361 ints on some platforms which support them (GNU supported 361 platforms which support them (GNU supported targets, and
362 targets, and Windows). To establish whether or not 362 Windows). To establish whether or not bzip2recover was
363 bzip2recover was built with such a limitation, run it 363 built with such a limitation, run it without arguments.
364 without arguments. In any event you can build yourself an 364 In any event you can build yourself an unlimited version
365 unlimited version if you can recompile it with MaybeUInt64 365 if you can recompile it with MaybeUInt64 set to be an
366 set to be an unsigned 64-bit integer. 366 unsigned 64-bit integer.
367 367
368 368
369 AUTHOR 369 AUTHOR
370 Julian Seward, jseward@acm.org. 370 Julian Seward, jseward@bzip.org.
371 371
372 http://sources.redhat.com/bzip2 372 http://www.bzip.org
373 373
374 The ideas embodied in bzip2 are due to (at least) the fol­ 374 The ideas embodied in bzip2 are due to (at least) the fol-
375 lowing people: Michael Burrows and David Wheeler (for the 375 lowing people: Michael Burrows and David Wheeler (for the
376 block sorting transformation), David Wheeler (again, for 376 block sorting transformation), David Wheeler (again, for
377 the Huffman coder), Peter Fenwick (for the structured cod­ 377 the Huffman coder), Peter Fenwick (for the structured cod-
378 ing model in the original bzip, and many refinements), and 378 ing model in the original bzip, and many refinements), and
379 Alistair Moffat, Radford Neal and Ian Witten (for the 379 Alistair Moffat, Radford Neal and Ian Witten (for the
380 arithmetic coder in the original bzip). I am much 380 arithmetic coder in the original bzip). I am much
381 indebted for their help, support and advice. See the man­ 381 indebted for their help, support and advice. See the man-
382 ual in the source distribution for pointers to sources of 382 ual in the source distribution for pointers to sources of
383 documentation. Christian von Roques encouraged me to look 383 documentation. Christian von Roques encouraged me to look
384 for faster sorting algorithms, so as to speed up compres­ 384 for faster sorting algorithms, so as to speed up compres-
385 sion. Bela Lubkin encouraged me to improve the worst-case 385 sion. Bela Lubkin encouraged me to improve the worst-case
386 compression performance. The bz* scripts are derived from 386 compression performance. Donna Robinson XMLised the docu-
387 those of GNU gzip. Many people sent patches, helped with 387 mentation. The bz* scripts are derived from those of GNU
388 portability problems, lent machines, gave advice and were 388 gzip. Many people sent patches, helped with portability
389 generally helpful. 389 problems, lent machines, gave advice and were generally
390 helpful.
390 391
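
Note on the hunk covering decompression of concatenated files: the behaviour described there (a file made by concatenating two or more compressed files decompresses to the concatenation of the originals) can be checked with a small sketch. This is only an illustration and assumes Python's standard bz2 module, which is a separate binding and not the bzip2 command-line tool this page documents; the helper name roundtrip_concatenated is hypothetical.

```python
import bz2


def roundtrip_concatenated(parts, level=9):
    """Compress each part separately, join the streams, and decompress.

    `level` maps to the -1 .. -9 block-size flags (9 is the default).
    This helper is hypothetical and exists only for illustration.
    """
    # Each call produces one complete, independent bzip2 stream.
    streams = [bz2.compress(part, compresslevel=level) for part in parts]
    # Plain byte concatenation, like `cat a.bz2 b.bz2 > both.bz2`.
    joined = b"".join(streams)
    # bz2.decompress accepts multi-stream input (Python 3.3 and later).
    return bz2.decompress(joined)


first = b"hello, " * 1000
second = b"world\n" * 1000
restored = roundtrip_concatenated([first, second])
```

If the module behaves as the man page describes for bunzip2, restored equals first + second.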