author     Julian Seward <jseward@acm.org>  2001-12-30 22:13:13 +0100
committer  Julian Seward <jseward@acm.org>  2001-12-30 22:13:13 +0100
commit     099d844292f60f9d58914da29e5773204dc55e7a
tree       04bdb38dbcd894d6fdbbc3253e216d029cade5c6 /bzip2.1.preformatted
parent     795b859eee96c700e8f3c3fe68e6a9a39d95797c
bzip2-1.0.2
Diffstat (limited to 'bzip2.1.preformatted')
-rw-r--r-- bzip2.1.preformatted | 226
1 files changed, 81 insertions, 145 deletions
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
index 9f18339..0f20cb5 100644
--- a/bzip2.1.preformatted
+++ b/bzip2.1.preformatted
@@ -1,11 +1,9 @@
-
-
-
 bzip2(1)                                                 bzip2(1)
 
 
+
 NAME
-       bzip2, bunzip2 - a block-sorting file compressor, v1.0
+       bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
        bzcat - decompresses files to stdout
        bzip2recover - recovers data from damaged bzip2 files
 
@@ -22,20 +20,20 @@ DESCRIPTION
        sorting text compression algorithm, and Huffman coding.
        Compression is generally considerably better than that
        achieved by more conventional LZ77/LZ78-based compressors,
-       and approaches the performance of the PPM family of sta-
+       and approaches the performance of the PPM family of sta­
        tistical compressors.
 
        The command-line options are deliberately very similar to
        those of GNU gzip, but they are not identical.
 
-       bzip2 expects a list of file names to accompany the com-
+       bzip2 expects a list of file names to accompany the com­
        mand-line flags. Each file is replaced by a compressed
        version of itself, with the name "original_name.bz2".
-       Each compressed file has the same modification date, per-
-       missions, and, when possible, ownership as the correspond-
+       Each compressed file has the same modification date, per­
+       missions, and, when possible, ownership as the correspond­
        ing original, so that these properties can be correctly
        restored at decompression time. File name handling is
-       naive in the sense that there is no mechanism for preserv-
+       naive in the sense that there is no mechanism for preserv­
        ing original file names, permissions, ownerships or dates
        in filesystems which lack these concepts, or have serious
        file name length restrictions, such as MS-DOS.
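The DESCRIPTION text in this hunk says that each input file is replaced by "original_name.bz2", and that the compressed file keeps the original's modification date and permissions. The following is a minimal Python sketch of that replace-and-preserve step. It uses the standard `bz2` and `shutil` modules rather than bzip2's own C code. The function name `compress_in_place` is hypothetical, and the sketch does not handle ownership or clean up after errors.

```python
import bz2
import os
import shutil


def compress_in_place(path: str, level: int = 9) -> str:
    """Replace `path` with `path + ".bz2"`, keeping mode bits and timestamps.

    Hypothetical helper: ownership (chown) and error cleanup are omitted.
    """
    out_path = path + ".bz2"
    with open(path, "rb") as src, bz2.open(out_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    # Copy permission bits and access/modification times only after the
    # output is closed, so that no later write disturbs the copied mtime.
    shutil.copystat(path, out_path)
    # bzip2 likewise removes the input unless -k (--keep) is given.
    os.remove(path)
    return out_path
```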
@@ -58,18 +56,6 @@ DESCRIPTION
            filename.bz2    becomes   filename
            filename.bz     becomes   filename
            filename.tbz2   becomes   filename.tar
-
-
-
-                                   1
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
            filename.tbz    becomes   filename.tar
            anyothername    becomes   anyothername.out
 
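The filename table in this hunk lists how bunzip2 names its output. The following is a small Python sketch of that mapping. It is a lookup over the five rows shown, not bzip2's own suffix-handling code. The names `SUFFIX_RULES` and `decompressed_name` are hypothetical.

```python
# Suffix rewrites taken from the table above: (suffix, replacement).
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def decompressed_name(name: str) -> str:
    """Guess the output name bunzip2 would use for `name` (sketch)."""
    for suffix, replacement in SUFFIX_RULES:
        if name.endswith(suffix):
            return name[: -len(suffix)] + replacement
    # No recognised suffix: keep the name and append ".out".
    return name + ".out"


print(decompressed_name("archive.tbz2"))  # archive.tar
```

The suffixes cannot shadow one another. For example, "x.tbz2" does not end with ".bz2", because the character before "bz2" is "t", not ".". So the order of the rules does not affect the result.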
@@ -78,23 +64,23 @@ bzip2(1)                                                 bzip2(1)
        guess the name of the original file, and uses the original
        name with .out appended.
 
-       As with compression, supplying no filenames causes decom-
+       As with compression, supplying no filenames causes decom­
        pression from standard input to standard output.
 
-       bunzip2 will correctly decompress a file which is the con-
+       bunzip2 will correctly decompress a file which is the con­
        catenation of two or more compressed files. The result is
        the concatenation of the corresponding uncompressed files.
        Integrity testing (-t) of concatenated compressed files is
        also supported.
 
        You can also compress or decompress files to the standard
-       output by giving the -c flag. Multiple files may be com-
+       output by giving the -c flag. Multiple files may be com­
        pressed and decompressed like this. The resulting outputs
        are fed sequentially to stdout. Compression of multiple
-       files in this manner generates a stream containing multi-
+       files in this manner generates a stream containing multi­
        ple compressed file representations. Such a stream can be
        decompressed correctly only by bzip2 version 0.9.0 or
-       later. Earlier versions of bzip2 will stop after decom-
+       later. Earlier versions of bzip2 will stop after decom­
        pressing the first file in the stream.
 
        bzcat (or bzip2 -dc) decompresses all specified files to
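This hunk describes concatenated compressed files, which `bzip2 -c` produces when given several inputs. Python's standard `bz2` module can show both behaviours it mentions. This is only an illustration with a library that reads the same format; it does not run bzip2 itself.

```python
import bz2

# Two independently compressed inputs laid end to end, one bzip2 stream
# per file, as `bzip2 -c a b > ab.bz2` would write them.
part_a = bz2.compress(b"first file\n")
part_b = bz2.compress(b"second file\n")
stream = part_a + part_b

# bz2.decompress (Python 3.3+) continues past the first end-of-stream
# marker, so it returns the concatenation of both inputs, as bunzip2 does.
restored = bz2.decompress(stream)  # b"first file\nsecond file\n"

# A single BZ2Decompressor stops at the first end-of-stream marker and keeps
# the remaining bytes in unused_data. This resembles the pre-0.9.0
# behaviour of stopping after the first file in the stream.
single = bz2.BZ2Decompressor()
first_only = single.decompress(stream)  # b"first file\n"
leftover = single.unused_data           # the bytes of part_b
```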
@@ -115,7 +101,7 @@ bzip2(1)                                                 bzip2(1)
 
        As a self-check for your protection, bzip2 uses 32-bit
        CRCs to make sure that the decompressed version of a file
-       is identical to the original. This guards against corrup-
+       is identical to the original. This guards against corrup­
        tion of the compressed data, and against undetected bugs
        in bzip2 (hopefully very unlikely). The chances of data
        corruption going undetected is microscopic, about one
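The 32-bit CRC mentioned here is the variant commonly catalogued as CRC-32/BZIP2. It uses polynomial 0x04C11DB7, processes bits most-significant first with no reflection, starts from 0xFFFFFFFF, and applies a final XOR of 0xFFFFFFFF. The following is a bitwise Python sketch of that checksum, a slow reference version rather than bzip2's table-driven code. For a single-block stream, the block CRC sits in bytes 10 to 13: after the 4-byte "BZh" header and the 6-byte block magic. The sketch reads it from there to compare.

```python
import bz2


def bzip2_crc32(data: bytes) -> int:
    """Bitwise CRC-32/BZIP2: poly 0x04C11DB7, MSB-first, init/xorout 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24  # feed the next byte into the top 8 bits
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


stream = bz2.compress(b"123456789")
# Single block: "BZh9" (4 bytes) + block magic (6 bytes), then the block CRC.
stored_block_crc = int.from_bytes(stream[10:14], "big")
print(hex(bzip2_crc32(b"123456789")), hex(stored_block_crc))  # 0xfc891918 0xfc891918
```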
@@ -125,17 +111,6 @@ bzip2(1)                                                 bzip2(1)
        you recover the original uncompressed data. You can use
        bzip2recover to try to recover data from damaged files.
 
-
-
-                                   2
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        Return values: 0 for a normal exit, 1 for environmental
        problems (file not found, invalid flags, I/O errors, &c),
        2 to indicate a corrupt compressed file, 3 for an internal
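The return values listed here can be used by a script that calls bzip2. The following is a minimal Python sketch. The description of status 3 is cut off in this excerpt, so the sketch labels it loosely as "internal error". The names `describe_exit` and `check_archive` are hypothetical, and `check_archive` assumes a `bzip2` binary is on the PATH.

```python
import subprocess

# Exit statuses as listed in the man page text above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error, ...)",
    2: "corrupt compressed file",
    3: "internal error",
}


def describe_exit(status: int) -> str:
    """Map a bzip2 exit status to a short description."""
    return EXIT_MEANINGS.get(status, f"unexpected status {status}")


def check_archive(path: str) -> str:
    """Run `bzip2 -t` on `path` and describe the result (needs bzip2 installed)."""
    result = subprocess.run(["bzip2", "-t", path], capture_output=True)
    return describe_exit(result.returncode)
```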
@@ -154,8 +129,8 @@ OPTIONS
               and forces bzip2 to decompress.
 
        -z --compress
-              The complement to -d: forces compression, regard-
-              less of the invokation name.
+              The complement to -d: forces compression,
+              regardless of the invocation name.
 
        -t --test
               Check integrity of the specified file(s), but don't
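The `-t` option checks a file's integrity without writing any output. A rough analogue in Python is to decompress fully and discard the result, since decompression verifies each block's CRC along the way. The following is a sketch using the standard `bz2` module. The name `is_intact` is hypothetical. Note that Python's `bz2.decompress` silently ignores trailing bytes that do not form a valid stream, so this check is looser than bzip2's own.

```python
import bz2


def is_intact(data: bytes) -> bool:
    """Decompress `data` and report whether it is a complete, valid bzip2 stream."""
    try:
        bz2.decompress(data)
    except OSError:       # bad magic, CRC mismatch or other data error
        return False
    except ValueError:    # input ended before the end-of-stream marker
        return False
    return True
```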
@@ -168,6 +143,11 @@ OPTIONS
               forces bzip2 to break hard links to files, which it
               otherwise wouldn't do.
 
+              bzip2 normally declines to decompress files which
+              don't have the correct magic header bytes. If
+              forced (-f), however, it will pass such files
+              through unmodified. This is how GNU gzip behaves.
+
        -k --keep
               Keep (don't delete) input files during compression
               or decompression.
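The paragraph added in this hunk describes the magic-header check and the `-f` pass-through. A bzip2 stream starts with the bytes "BZh" followed by a block-size digit '1' to '9'. The following is a minimal Python sketch of the described behaviour. It checks only those four header bytes, and the function names are hypothetical, not taken from bzip2's source.

```python
import bz2


def looks_like_bzip2(data: bytes) -> bool:
    """True if `data` starts with b"BZh" and a block-size digit 1-9."""
    return len(data) >= 4 and data[:3] == b"BZh" and data[3:4] in b"123456789"


def decompress_or_pass_through(data: bytes, force: bool = False) -> bytes:
    """Sketch of `bzip2 -d` without and with -f on possibly non-bzip2 input."""
    if looks_like_bzip2(data):
        return bz2.decompress(data)
    if force:
        # With -f, input lacking the magic header is copied through unchanged.
        return data
    raise ValueError("not a bzip2 file (bad magic header)")
```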
@@ -190,23 +170,11 @@ OPTIONS
        -q --quiet
               Suppress non-essential warning messages. Messages
               pertaining to I/O errors and other critical events
-
-
-
-                                   3
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
               will not be suppressed.
 
        -v --verbose
               Verbose mode -- show the compression ratio for each
-              file processed. Further -v's increase the ver-
+              file processed. Further -v's increase the ver­
               bosity level, spewing out lots of information which
               is primarily of interest for diagnostic purposes.
 
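The `-v` text says bzip2 reports the compression ratio for each file. The sketch below computes three per-file figures from the input and output sizes: ratio, output bits per input byte, and percentage saved. It assumes these are the kind of figures bzip2 prints, but it does not reproduce bzip2's exact output format, and the name `verbose_figures` is hypothetical.

```python
def verbose_figures(bytes_in: int, bytes_out: int) -> tuple[float, float, float]:
    """Return (ratio, bits per input byte, percent saved) for one file."""
    ratio = bytes_in / bytes_out                      # e.g. 4.0 means 4:1
    bits_per_byte = 8 * bytes_out / bytes_in          # compressed bits per input byte
    percent_saved = 100 * (1 - bytes_out / bytes_in)  # share of the input removed
    return ratio, bits_per_byte, percent_saved


print(verbose_figures(1000, 250))  # (4.0, 2.0, 75.0)
```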
@@ -214,20 +182,24 @@ bzip2(1)                                                 bzip2(1)
               Display the software version, license terms and
               conditions.
 
-       -1 to -9
+       -1 (or --fast) to -9 (or --best)
               Set the block size to 100 k, 200 k .. 900 k when
               compressing. Has no effect when decompressing.
-              See MEMORY MANAGEMENT below.
+              See MEMORY MANAGEMENT below. The --fast and --best
+              aliases are primarily for GNU gzip compatibility.
+              In particular, --fast doesn't make things signifi­
+              cantly faster. And --best merely selects the
+              default behaviour.
 
        --     Treats all subsequent arguments as file names, even
-              if they start with a dash. This is so you can han-
+              if they start with a dash. This is so you can han­
               dle files with names beginning with a dash, for
               example: bzip2 -- -myfilename.
 
        --repetitive-fast --repetitive-best
               These flags are redundant in versions 0.9.5 and
               above. They provided some coarse control over the
-              behaviour of the sorting algorithm in earlier ver-
+              behaviour of the sorting algorithm in earlier ver­
               sions, which was sometimes useful. 0.9.5 and above
-              have an improved algorithm which renders these
+              have an improved algorithm which renders these
               flags irrelevant.
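The `-1` to `-9` entry maps each level to a block size of 100 k to 900 k, and names `--fast` and `--best` as aliases for the two ends of that range. Python's `bz2.compress` takes the same 1 to 9 level as `compresslevel`, and the chosen level appears as the fourth header byte. The following is a small sketch under those assumptions. `ALIASES` and `block_size_bytes` are hypothetical names.

```python
import bz2

# --fast and --best, as described above, are aliases for -1 and -9.
ALIASES = {"--fast": 1, "--best": 9}


def block_size_bytes(level: int) -> int:
    """Block size selected by -1 .. -9: 100,000 .. 900,000 bytes."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100_000


for flag, level in ALIASES.items():
    header = bz2.compress(b"example", compresslevel=level)[:4]
    print(flag, level, block_size_bytes(level), header)
# --fast 1 100000 b'BZh1'
# --best 9 900000 b'BZh9'
```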
@@ -238,7 +210,7 @@ MEMORY MANAGEMENT
        affects both the compression ratio achieved, and the
        amount of memory needed for compression and decompression.
        The flags -1 through -9 specify the block size to be
-       100,000 bytes through 900,000 bytes (the default) respec-
+       100,000 bytes through 900,000 bytes (the default) respec­
        tively. At decompression time, the block size used for
        compression is read from the header of the compressed
        file, and bunzip2 then allocates itself just enough memory
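As the text above notes, bunzip2 reads the block size from the header of the compressed file. The following Python sketch shows that read step: the fourth byte is an ASCII digit '1' to '9' giving the block size in units of 100,000 bytes. It covers only the read, not the subsequent memory allocation, and `header_block_size` is a hypothetical name.

```python
import bz2


def header_block_size(stream: bytes) -> int:
    """Block size in bytes recorded in a bzip2 stream header ("BZh" + digit)."""
    if len(stream) < 4 or stream[:3] != b"BZh" or not 0x31 <= stream[3] <= 0x39:
        raise ValueError("not a bzip2 stream header")
    return (stream[3] - 0x30) * 100_000  # ASCII '1'..'9' -> 100,000..900,000


data = bz2.compress(b"some text", compresslevel=3)
print(header_block_size(data))  # 300000
```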
@@ -256,18 +228,6 @@ MEMORY MANAGEMENT
 
        Larger block sizes give rapidly diminishing marginal
        returns. Most of the compression comes from the first two
-
-
-
-                                   4
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        or three hundred k of block size, a fact worth bearing in
        mind when using bzip2 on small machines. It is also
        important to appreciate that the decompression memory
@@ -278,13 +238,13 @@ bzip2(1)                                                 bzip2(1)
        bunzip2 will require about 3700 kbytes to decompress. To
        support decompression of any file on a 4 megabyte machine,
        bunzip2 has an option to decompress using approximately
-       half this amount of memory, about 2300 kbytes. Decompres-
+       half this amount of memory, about 2300 kbytes. Decompres­
        sion speed is also halved, so you should use this option
        only where necessary. The relevant flag is -s.
 
-       In general, try and use the largest block size memory con-
+       In general, try and use the largest block size memory con­
        straints allow, since that maximises the compression
-       achieved. Compression and decompression speed are virtu-
+       achieved. Compression and decompression speed are virtu­
        ally unaffected by block size.
 
        Another significant point applies to files which fit in a
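The figures of about 3700 kbytes, and about 2300 kbytes with `-s`, follow from the memory formulas in the MEMORY MANAGEMENT section of the full bzip2 manual. Those formulas are not among the lines shown in this diff:

- compression: 400k + 8 x block size
- decompression: 100k + 4 x block size
- decompression with -s: 100k + 2.5 x block size

The following is a small Python sketch of that arithmetic; `memory_kbytes` is a hypothetical name. For -9 it gives 7600k, 3700k and 2350k, which matches the "-9" row of the table later in this diff. The text above rounds the -s figure to "about 2300 kbytes".

```python
def memory_kbytes(level: int) -> dict[str, float]:
    """Approximate peak memory in kbytes for a block size of level x 100k."""
    block_k = 100 * level
    return {
        "compress": 400 + 8 * block_k,
        "decompress": 100 + 4 * block_k,
        "decompress_small": 100 + 2.5 * block_k,  # bunzip2 -s
    }


print(memory_kbytes(9))
# {'compress': 7600, 'decompress': 3700, 'decompress_small': 2350.0}
```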
@@ -300,11 +260,11 @@ bzip2(1)                                                 bzip2(1)
 
        Here is a table which summarises the maximum memory usage
        for different block sizes. Also recorded is the total
-       compressed size for 14 files of the Calgary Text Compres-
+       compressed size for 14 files of the Calgary Text Compres­
        sion Corpus totalling 3,141,622 bytes. This column gives
        some feel for how compression varies with block size.
        These figures tend to understate the advantage of larger
-       block sizes for larger files, since the Corpus is domi-
+       block sizes for larger files, since the Corpus is domi­
        nated by smaller files.
 
                   Compress   Decompress   Decompress   Corpus
@@ -321,22 +281,9 @@ bzip2(1)                                                 bzip2(1)
            -9        7600k        3700k        2350k   828642
 
 
-
-
-
-
-                                   5
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
 RECOVERING DATA FROM DAMAGED FILES
        bzip2 compresses files in blocks, usually 900kbytes long.
-       Each block is handled independently. If a media or trans-
+       Each block is handled independently. If a media or trans­
        mission error causes a multi-block .bz2 file to become
        damaged, it may be possible to recover data from the
        undamaged blocks in the file.
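Recovery is possible because blocks are handled independently, so their boundaries can be found without decoding the data. A compressed block begins with the 48-bit magic value 0x314159265359, and a stream ends with 0x177245385090. Blocks are not byte-aligned, so a scan for them has to shift one bit at a time. The following Python sketch does that scan and only reports bit offsets. It does not extract or re-wrap blocks the way bzip2recover does, and `find_marker_bits` is a hypothetical name.

```python
import bz2

# 48-bit markers from the bzip2 stream format.
BLOCK_MAGIC = 0x314159265359  # start of every compressed block (BCD digits of pi)
EOS_MAGIC = 0x177245385090    # end of stream (BCD digits of sqrt(pi))


def find_marker_bits(data: bytes, marker: int) -> list[int]:
    """Return the bit offsets at which the 48-bit `marker` starts in `data`."""
    mask = (1 << 48) - 1
    window = 0
    hits = []
    bits_read = 0
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_read += 1
            if bits_read >= 48 and window == marker:
                hits.append(bits_read - 48)
    return hits


two_streams = bz2.compress(b"a" * 10) + bz2.compress(b"b" * 10)
print(find_marker_bits(two_streams, BLOCK_MAGIC))  # one offset per block
```

The first block of a stream starts at bit 32, immediately after the 4-byte "BZh" header. Bit offsets grow eight times faster than byte offsets, which is why the width of the integer holding them limits the size of file that can be scanned.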
@@ -353,19 +300,19 @@ RECOVERING DATA FROM DAMAGED F
        the integrity of the resulting files, and decompress those
        which are undamaged.
 
-       bzip2recover takes a single argument, the name of the dam-
-       aged file, and writes a number of files "rec0001file.bz2",
-       "rec0002file.bz2", etc, containing the extracted blocks.
-       The output filenames are designed so that the use of
-       wildcards in subsequent processing -- for example, "bzip2
-       -dc rec*file.bz2 > recovered_data" -- lists the files in
-       the correct order.
+       bzip2recover takes a single argument, the name of the dam­
+       aged file, and writes a number of files
+       "rec00001file.bz2", "rec00002file.bz2", etc, containing
+       the extracted blocks. The output filenames are
+       designed so that the use of wildcards in subsequent pro­
+       cessing -- for example, "bzip2 -dc rec*file.bz2 > recov­
+       ered_data" -- processes the files in the correct order.
 
        bzip2recover should be of most use dealing with large .bz2
        files, as these will contain many blocks. It is clearly
        futile to use it on damaged single-block files, since a
-       damaged block cannot be recovered. If you wish to min-
-       imise any potential data loss through media or transmis-
+       damaged block cannot be recovered. If you wish to min­
+       imise any potential data loss through media or transmis­
        sion errors, you might consider compressing with a smaller
        block size.
 
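The reworded paragraph relies on zero-padded block numbers. With padding, the lexical order a shell glob such as `rec*file.bz2` expands to (in the C locale) matches block order. The following Python sketch shows the naming style; `recovery_names` is a hypothetical helper. It does not reproduce bzip2recover's exact name construction, for example for inputs that include a directory path.

```python
def recovery_names(input_name: str, count: int, width: int = 5) -> list[str]:
    """Names in the "rec00001file.bz2" style: zero-padded block number + input name."""
    return [f"rec{n:0{width}d}{input_name}" for n in range(1, count + 1)]


names = recovery_names("file.bz2", 12)
print(names[:2])  # ['rec00001file.bz2', 'rec00002file.bz2']

# Sorting the padded names leaves them in block order.
print(sorted(names) == names)  # True

# Without padding, "rec10file.bz2" would sort before "rec2file.bz2".
unpadded = [f"rec{n}file.bz2" for n in range(1, 13)]
print(sorted(unpadded) == unpadded)  # False
```

Padding preserves order only while block numbers fit in the width. With four digits, the 10,000th name ("rec10000...") would sort before "rec1000..."; five digits extend the safe range to 99,999 blocks.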
@@ -379,31 +326,19 @@ PERFORMANCE NOTES
        better than previous versions in this respect. The ratio
        between worst-case and average-case compression time is in
        the region of 10:1. For previous versions, this figure
-       was more like 100:1. You can use the -vvvv option to mon-
+       was more like 100:1. You can use the -vvvv option to mon­
        itor progress in great detail, if you want.
 
        Decompression speed is unaffected by these phenomena.
 
        bzip2 usually allocates several megabytes of memory to
-       operate in, and then charges all over it in a fairly ran-
-       dom fashion. This means that performance, both for com-
+       operate in, and then charges all over it in a fairly ran­
+       dom fashion. This means that performance, both for com­
        pressing and decompressing, is largely determined by the
-
-
-
-                                   6
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        speed at which your machine can service cache misses.
        Because of this, small changes to the code to reduce the
        miss rate have been observed to give disproportionately
-       large performance improvements. I imagine bzip2 will per-
+       large performance improvements. I imagine bzip2 will per­
        form best on machines with very large caches.
 
 
@@ -413,50 +348,51 @@ CAVEATS
        but the details of what the problem is sometimes seem
        rather misleading.
 
-       This manual page pertains to version 1.0 of bzip2. Com-
+       This manual page pertains to version 1.0.2 of bzip2. Com­
        pressed data created by this version is entirely forwards
        and backwards compatible with the previous public
-       releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
-       following exception: 0.9.0 and above can correctly decom-
-       press multiple concatenated compressed files. 0.1pl2 can-
-       not do this; it will stop after decompressing just the
-       first file in the stream.
+       releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
+       but with the following exception: 0.9.0 and above can cor­
+       rectly decompress multiple concatenated compressed files.
+       0.1pl2 cannot do this; it will stop after decompressing
+       just the first file in the stream.
+
+       bzip2recover versions prior to this one, 1.0.2, used
+       32-bit integers to represent bit positions in compressed
+       files, so it could not handle compressed files more than
+       512 megabytes long. Version 1.0.2 and above uses 64-bit
+       ints on some platforms which support them (GNU supported
+       targets, and Windows). To establish whether or not
+       bzip2recover was built with such a limitation, run it
+       without arguments. In any event you can build yourself an
+       unlimited version if you can recompile it with MaybeUInt64
+       set to be an unsigned 64-bit integer.
+
 
-       bzip2recover uses 32-bit integers to represent bit posi-
-       tions in compressed files, so it cannot handle compressed
-       files more than 512 megabytes long. This could easily be
-       fixed.
 
 
 AUTHOR
        Julian Seward, jseward@acm.org.
 
-       http://sourceware.cygnus.com/bzip2
-       http://www.muraroa.demon.co.uk
+       http://sources.redhat.com/bzip2
 
-       The ideas embodied in bzip2 are due to (at least) the fol-
+       The ideas embodied in bzip2 are due to (at least) the fol­
        lowing people: Michael Burrows and David Wheeler (for the
        block sorting transformation), David Wheeler (again, for
-       the Huffman coder), Peter Fenwick (for the structured cod-
+       the Huffman coder), Peter Fenwick (for the structured cod­
        ing model in the original bzip, and many refinements), and
        Alistair Moffat, Radford Neal and Ian Witten (for the
        arithmetic coder in the original bzip). I am much
-       indebted for their help, support and advice. See the man-
+       indebted for their help, support and advice. See the man­
        ual in the source distribution for pointers to sources of
        documentation. Christian von Roques encouraged me to look
-       for faster sorting algorithms, so as to speed up compres-
+       for faster sorting algorithms, so as to speed up compres­
        sion. Bela Lubkin encouraged me to improve the worst-case
-       compression performance. Many people sent patches, helped
-       with portability problems, lent machines, gave advice and
-       were generally helpful.
-
-
-
-
-
-
-
+       compression performance. The bz* scripts are derived from
+       those of GNU gzip. Many people sent patches, helped with
+       portability problems, lent machines, gave advice and were
+       generally helpful.
 
-                                   7
 
 
+                                                         bzip2(1)
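The new CAVEATS paragraph says that 32-bit bit positions limited earlier bzip2recover versions to 512-megabyte files. The sketch below shows the arithmetic behind that figure. It assumes an unsigned position counter and takes "megabyte" to mean 2**20 bytes. `max_file_bytes` is a hypothetical name.

```python
def max_file_bytes(position_bits: int) -> int:
    """Largest file (bytes) whose every bit offset fits in an unsigned int of this width."""
    return 2 ** position_bits // 8  # 2**width addressable bits, 8 bits per byte


print(max_file_bytes(32))              # 536870912
print(max_file_bytes(32) // 2 ** 20)   # 512 (megabytes)
print(max_file_bytes(64) // 2 ** 60)   # 2 (exbibytes, i.e. 2**61 bytes)
```

A 64-bit counter, as with MaybeUInt64 set to an unsigned 64-bit type, raises the limit far beyond any practical file size.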