Diffstat (limited to '')
-rw-r--r-- bzip2.1 191
-rw-r--r-- bzip2.1.preformatted 318
2 files changed, 241 insertions, 268 deletions
diff --git a/bzip2.1 b/bzip2.1
index 489668f..a6789a4 100644
--- a/bzip2.1
+++ b/bzip2.1
@@ -1,21 +1,29 @@
1.PU 1.PU
2.TH bzip2 1 2.TH bzip2 1
3.SH NAME 3.SH NAME
4bzip2, bunzip2 \- a block-sorting file compressor, v0.1 4bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0
5.br
6bzcat \- decompresses files to stdout
5.br 7.br
6bzip2recover \- recovers data from damaged bzip2 files 8bzip2recover \- recovers data from damaged bzip2 files
7 9
8.SH SYNOPSIS 10.SH SYNOPSIS
9.ll +8 11.ll +8
10.B bzip2 12.B bzip2
11.RB [ " \-cdfkstvVL123456789 " ] 13.RB [ " \-cdfkstvzVL123456789 " ]
12[ 14[
13.I "filenames \&..." 15.I "filenames \&..."
14] 16]
15.ll -8 17.ll -8
16.br 18.br
17.B bunzip2 19.B bunzip2
18.RB [ " \-kvsVL " ] 20.RB [ " \-fkvsVL " ]
21[
22.I "filenames \&..."
23]
24.br
25.B bzcat
26.RB [ " \-s " ]
19[ 27[
20.I "filenames \&..." 28.I "filenames \&..."
21] 29]
@@ -24,7 +32,7 @@ bzip2recover \- recovers data from damaged bzip2 files
24.I "filename" 32.I "filename"
25 33
26.SH DESCRIPTION 34.SH DESCRIPTION
27.I Bzip2 35.I bzip2
28compresses files using the Burrows-Wheeler block-sorting 36compresses files using the Burrows-Wheeler block-sorting
29text compression algorithm, and Huffman coding. 37text compression algorithm, and Huffman coding.
30Compression is generally considerably 38Compression is generally considerably
@@ -38,7 +46,7 @@ those of
38.I GNU Gzip, 46.I GNU Gzip,
39but they are not identical. 47but they are not identical.
40 48
41.I Bzip2 49.I bzip2
42expects a list of file names to accompany the command-line flags. 50expects a list of file names to accompany the command-line flags.
43Each file is replaced by a compressed version of itself, 51Each file is replaced by a compressed version of itself,
44with the name "original_name.bz2". 52with the name "original_name.bz2".
@@ -50,11 +58,11 @@ original file names, permissions and dates in filesystems
50which lack these concepts, or have serious file name length 58which lack these concepts, or have serious file name length
51restrictions, such as MS-DOS. 59restrictions, such as MS-DOS.
52 60
53.I Bzip2 61.I bzip2
54and 62and
55.I bunzip2 63.I bunzip2
56will not overwrite existing files; if you want this to happen, 64will by default not overwrite existing files;
57you should delete them first. 65if you want this to happen, specify the \-f flag.
58 66
59If no file names are specified, 67If no file names are specified,
60.I bzip2 68.I bzip2
@@ -64,7 +72,7 @@ In this case,
64will decline to write compressed output to a terminal, as 72will decline to write compressed output to a terminal, as
65this would be entirely incomprehensible and therefore pointless. 73this would be entirely incomprehensible and therefore pointless.
66 74
67.I Bunzip2 75.I bunzip2
68(or 76(or
69.I bzip2 \-d 77.I bzip2 \-d
70) decompresses and restores all specified files whose names 78) decompresses and restores all specified files whose names
@@ -73,12 +81,28 @@ Files without this suffix are ignored.
73Again, supplying no filenames 81Again, supplying no filenames
74causes decompression from standard input to standard output. 82causes decompression from standard input to standard output.
75 83
84.I bunzip2
85will correctly decompress a file which is the concatenation
86of two or more compressed files. The result is the concatenation
87of the corresponding uncompressed files. Integrity testing
88(\-t) of concatenated compressed files is also supported.
89
76You can also compress or decompress files to 90You can also compress or decompress files to
77the standard output by giving the \-c flag. 91the standard output by giving the \-c flag.
78You can decompress multiple files like this, but you may 92Multiple files may be compressed and decompressed like this.
79only compress a single file this way, since it would otherwise 93The resulting outputs are fed sequentially to stdout.
80be difficult to separate out the compressed representations of 94Compression of multiple files in this manner generates
81the original files. 95a stream containing multiple compressed file representations.
96Such a stream can be decompressed correctly only by
97.I bzip2
98version 0.9.0 or later. Earlier versions of
99.I bzip2
100will stop after decompressing the first file in the stream.
101
102.I bzcat
103(or
104.I bzip2 \-dc
105) decompresses all specified files to the standard output.
82 106
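Annotation (not part of the diff): the concatenation behaviour added in this hunk can be illustrated with a minimal sketch. It assumes Python's standard `bz2` module, which binds libbzip2, rather than the `bzip2` 0.9.0 command-line program itself. The payloads `first` and `second` are made up for the illustration.

```python
import bz2

# Hypothetical payloads standing in for two separate input files.
first = b"alpha\n" * 100
second = b"beta\n" * 100

# Two independently compressed streams joined end to end, i.e. a
# "stream containing multiple compressed file representations".
stream = bz2.compress(first) + bz2.compress(second)

# A multi-stream-aware decompressor returns both payloads in order.
restored = bz2.decompress(stream)

# A single-stream decoder stops at the end of the first stream, which
# mirrors the behaviour the text attributes to versions before 0.9.0.
single = bz2.BZ2Decompressor()
head = single.decompress(stream)
leftover = single.unused_data  # the second, still-compressed stream

print(len(restored), len(head), len(leftover))
```

The second half of the sketch shows why older decoders lose data: everything after the first stream is left unread rather than reported as an error.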
83Compression is always performed, even if the compressed file is 107Compression is always performed, even if the compressed file is
84slightly larger than the original. Files of less than about 108slightly larger than the original. Files of less than about
@@ -132,7 +156,7 @@ Compression and decompression requirements, in bytes, can be estimated as:
132 156
133 Compression: 400k + ( 7 x block size ) 157 Compression: 400k + ( 7 x block size )
134 158
135 Decompression: 100k + ( 5 x block size ), or 159 Decompression: 100k + ( 4 x block size ), or
136.br 160.br
137 100k + ( 2.5 x block size ) 161 100k + ( 2.5 x block size )
138 162
@@ -147,7 +171,7 @@ choice of block size.
147 171
148For files compressed with the default 900k block size, 172For files compressed with the default 900k block size,
149.I bunzip2 173.I bunzip2
150will require about 4600 kbytes to decompress. 174will require about 3700 kbytes to decompress.
151To support decompression of any file on a 4 megabyte machine, 175To support decompression of any file on a 4 megabyte machine,
152.I bunzip2 176.I bunzip2
153has an option to decompress using approximately half this 177has an option to decompress using approximately half this
@@ -168,8 +192,8 @@ For example, compressing a file 20,000 bytes long with the flag
168\-9 192\-9
169will cause the compressor to allocate around 193will cause the compressor to allocate around
1706700k of memory, but only touch 400k + 20000 * 7 = 540 1946700k of memory, but only touch 400k + 20000 * 7 = 540
171kbytes of it. Similarly, the decompressor will allocate 4600k but 195kbytes of it. Similarly, the decompressor will allocate 3700k but
172only touch 100k + 20000 * 5 = 200 kbytes. 196only touch 100k + 20000 * 4 = 180 kbytes.
173 197
174Here is a table which summarises the maximum memory usage for 198Here is a table which summarises the maximum memory usage for
175different block sizes. Also recorded is the total compressed 199different block sizes. Also recorded is the total compressed
@@ -182,71 +206,73 @@ Corpus is dominated by smaller files.
182 Compress Decompress Decompress Corpus 206 Compress Decompress Decompress Corpus
183 Flag usage usage -s usage Size 207 Flag usage usage -s usage Size
184 208
185 -1 1100k 600k 350k 914704 209 -1 1100k 500k 350k 914704
186 -2 1800k 1100k 600k 877703 210 -2 1800k 900k 600k 877703
187 -3 2500k 1600k 850k 860338 211 -3 2500k 1300k 850k 860338
188 -4 3200k 2100k 1100k 846899 212 -4 3200k 1700k 1100k 846899
189 -5 3900k 2600k 1350k 845160 213 -5 3900k 2100k 1350k 845160
190 -6 4600k 3100k 1600k 838626 214 -6 4600k 2500k 1600k 838626
191 -7 5400k 3600k 1850k 834096 215 -7 5400k 2900k 1850k 834096
192 -8 6000k 4100k 2100k 828642 216 -8 6000k 3300k 2100k 828642
193 -9 6700k 4600k 2350k 828642 217 -9 6700k 3700k 2350k 828642
194 218
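Annotation (not part of the diff): the revised estimates can be checked with a short sketch. The function names are hypothetical. It takes the formulas from the hunk above at face value (400k + 7 x block size, 100k + 4 x block size, 100k + 2.5 x block size with -s) and treats "k" as 1000 bytes, as the man page's own 20,000-byte example does.

```python
def estimate_kbytes(block_size_k, small=False):
    """Return (compress, decompress) memory estimates in kbytes.

    block_size_k is the block size in k (100 for -1 up to 900 for -9).
    small=True applies the 2.5 bytes-per-block-byte factor that -s uses
    for decompression; it does not model the 200k block size that -s
    selects during compression.
    """
    compress = 400 + 7 * block_size_k
    factor = 2.5 if small else 4
    decompress = 100 + factor * block_size_k
    return compress, decompress


def touched_kbytes(file_bytes):
    """Memory actually touched for a file smaller than one block."""
    compress = 400 + 7 * file_bytes / 1000
    decompress = 100 + 4 * file_bytes / 1000
    return compress, decompress


for flag in (1, 9):
    print(flag, estimate_kbytes(100 * flag), estimate_kbytes(100 * flag, small=True))
print(touched_kbytes(20000))
```

For -1 and -9 the results agree with the new table rows. The table appears to round some rows: the -7 compression figure is listed as 5400k, while the formula gives 5300k.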
195.SH OPTIONS 219.SH OPTIONS
196.TP 220.TP
197.B \-c --stdout 221.B \-c --stdout
198Compress or decompress to standard output. \-c will decompress 222Compress or decompress to standard output. \-c will decompress
199multiple files to stdout, but will only compress a single file to 223multiple files to stdout, but will only compress a single file to
200stdout. 224stdout.
201.TP 225.TP
202.B \-d --decompress 226.B \-d --decompress
203Force decompression. 227Force decompression.
204.I Bzip2 228.I bzip2,
205and
206.I bunzip2 229.I bunzip2
207are really the same program, and the decision about whether to 230and
208compress or decompress is done on the basis of which name is 231.I bzcat
232are really the same program, and the decision about what actions
233to take is done on the basis of which name is
209used. This flag overrides that mechanism, and forces 234used. This flag overrides that mechanism, and forces
210.I bzip2 235.I bzip2
211to decompress. 236to decompress.
212.TP 237.TP
213.B \-f --compress 238.B \-z --compress
214The complement to \-d: forces compression, regardless of the invokation 239The complement to \-d: forces compression, regardless of the invokation
215name. 240name.
216.TP 241.TP
217.B \-t --test 242.B \-t --test
218Check integrity of the specified file(s), but don't decompress them. 243Check integrity of the specified file(s), but don't decompress them.
219This really performs a trial decompression and throws away the result, 244This really performs a trial decompression and throws away the result.
220using the low-memory decompression algorithm (see \-s). 245.TP
246.B \-f --force
247Force overwrite of output files. Normally,
248.I bzip2
249will not overwrite existing output files.
221.TP 250.TP
222.B \-k --keep 251.B \-k --keep
223Keep (don't delete) input files during compression or decompression. 252Keep (don't delete) input files during compression or decompression.
224.TP 253.TP
225.B \-s --small 254.B \-s --small
226Reduce memory usage, both for compression and decompression. 255Reduce memory usage, for compression, decompression and
227Files are decompressed using a modified algorithm which only 256testing.
257Files are decompressed and tested using a modified algorithm which only
228requires 2.5 bytes per block byte. This means any file can be 258requires 2.5 bytes per block byte. This means any file can be
229decompressed in 2300k of memory, albeit somewhat more slowly than 259decompressed in 2300k of memory, albeit at about half the normal
230usual. 260speed.
231 261
232During compression, -s selects a block size of 200k, which limits 262During compression, -s selects a block size of 200k, which limits
233memory use to around the same figure, at the expense of your 263memory use to around the same figure, at the expense of your
234compression ratio. In short, if your machine is low on memory 264compression ratio. In short, if your machine is low on memory
235(8 megabytes or less), use -s for everything. See 265(8 megabytes or less), use -s for everything. See
236MEMORY MANAGEMENT above. 266MEMORY MANAGEMENT above.
237
238.TP 267.TP
239.B \-v --verbose 268.B \-v --verbose
240Verbose mode -- show the compression ratio for each file processed. 269Verbose mode -- show the compression ratio for each file processed.
241Further \-v's increase the verbosity level, spewing out lots of 270Further \-v's increase the verbosity level, spewing out lots of
242information which is primarily of interest for diagnostic purposes. 271information which is primarily of interest for diagnostic purposes.
243.TP 272.TP
244.B \-L --license 273.B \-L --license -V --version
245Display the software version, license terms and conditions. 274Display the software version, license terms and conditions.
246.TP 275.TP
247.B \-V --version
248Same as \-L.
249.TP
250.B \-1 to \-9 276.B \-1 to \-9
251Set the block size to 100 k, 200 k .. 900 k when 277Set the block size to 100 k, 200 k .. 900 k when
252compressing. Has no effect when decompressing. 278compressing. Has no effect when decompressing.
@@ -329,10 +355,6 @@ to compress the latter.
329If you do get a file which causes severe slowness in compression, 355If you do get a file which causes severe slowness in compression,
330try making the block size as small as possible, with flag \-1. 356try making the block size as small as possible, with flag \-1.
331 357
332Incompressible or virtually-incompressible data may decompress
333rather more slowly than one would hope. This is due to
334a naive implementation of the move-to-front coder.
335
336.I bzip2 358.I bzip2
337usually allocates several megabytes of memory to operate in, 359usually allocates several megabytes of memory to operate in,
338and then charges all over it in a fairly random fashion. This 360and then charges all over it in a fairly random fashion. This
@@ -346,28 +368,19 @@ I imagine
346.I bzip2 368.I bzip2
347will perform best on machines with very large caches. 369will perform best on machines with very large caches.
348 370
349Test mode (\-t) uses the low-memory decompression algorithm
350(\-s). This means test mode does not run as fast as it could;
351it could run as fast as the normal decompression machinery.
352This could easily be fixed at the cost of some code bloat.
353
354.SH CAVEATS 371.SH CAVEATS
355I/O error messages are not as helpful as they could be. 372I/O error messages are not as helpful as they could be.
356.I Bzip2 373.I Bzip2
357tries hard to detect I/O errors and exit cleanly, but the 374tries hard to detect I/O errors and exit cleanly, but the
358details of what the problem is sometimes seem rather misleading. 375details of what the problem is sometimes seem rather misleading.
359 376
360This manual page pertains to version 0.1 of 377This manual page pertains to version 0.9.0 of
361.I bzip2. 378.I bzip2.
362It may well happen that some future version will 379Compressed data created by this version is entirely forwards and
363use a different compressed file format. If you try to 380backwards compatible with the previous public release, version 0.1pl2,
364decompress, using 0.1, a .bz2 file created with some 381but with the following exception: 0.9.0 can correctly decompress
365future version which uses a different compressed file format, 382multiple concatenated compressed files. 0.1pl2 cannot do this; it
3660.1 will complain that your file "is not a bzip2 file". 383will stop after decompressing just the first file in the stream.
367If that happens, you should obtain a more recent version
368of
369.I bzip2
370and use that to decompress the file.
371 384
372Wildcard expansion for Windows 95 and NT 385Wildcard expansion for Windows 95 and NT
373is flaky. 386is flaky.
@@ -377,63 +390,25 @@ uses 32-bit integers to represent bit positions in
377compressed files, so it cannot handle compressed files 390compressed files, so it cannot handle compressed files
378more than 512 megabytes long. This could easily be fixed. 391more than 512 megabytes long. This could easily be fixed.
379 392
380.I bzip2recover
381sometimes reports a very small, incomplete final block.
382This is spurious and can be safely ignored.
383
384.SH RELATIONSHIP TO bzip-0.21
385This program is a descendant of the
386.I bzip
387program, version 0.21, which I released in August 1996.
388The primary difference of
389.I bzip2
390is its avoidance of the possibly patented algorithms
391which were used in 0.21.
392.I bzip2
393also brings various useful refinements (\-s, \-t),
394uses less memory, decompresses significantly faster, and
395has support for recovering data from damaged files.
396
397Because
398.I bzip2
399uses Huffman coding to construct the compressed bitstream,
400rather than the arithmetic coding used in 0.21,
401the compressed representations generated by the two programs
402are incompatible, and they will not interoperate. The change
403in suffix from .bz to .bz2 reflects this. It would have been
404helpful to at least allow
405.I bzip2
406to decompress files created by 0.21, but this would
407defeat the primary aim of having a patent-free compressor.
408
409For a more precise statement about patent issues in
410bzip2, please see the README file in the distribution.
411
412Huffman coding necessarily involves some coding inefficiency
413compared to arithmetic coding. This means that
414.I bzip2
415compresses about 1% worse than 0.21, an unfortunate but
416unavoidable fact-of-life. On the other hand, decompression
417is approximately 50% faster for the same reason, and the
418change in file format gave an opportunity to add data-recovery
419features. So it is not all bad.
420
421.SH AUTHOR 393.SH AUTHOR
422Julian Seward, jseward@acm.org. 394Julian Seward, jseward@acm.org.
423 395
396http://www.muraroa.demon.co.uk
397
424The ideas embodied in 398The ideas embodied in
425.I bzip
426and
427.I bzip2 399.I bzip2
428are due to (at least) the following people: 400are due to (at least) the following people:
429Michael Burrows and David Wheeler (for the block sorting 401Michael Burrows and David Wheeler (for the block sorting
430transformation), David Wheeler (again, for the Huffman coder), 402transformation), David Wheeler (again, for the Huffman coder),
431Peter Fenwick (for the structured coding model in 0.21, 403Peter Fenwick (for the structured coding model in the original
404.I bzip,
432and many refinements), 405and many refinements),
433and 406and
434Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic 407Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic
435coder in 0.21). I am much indebted for their help, support and advice. 408coder in the original
436See the file ALGORITHMS in the source distribution for pointers to 409.I bzip).
410I am much indebted for their help, support and advice.
411See the manual in the source distribution for pointers to
437sources of documentation. 412sources of documentation.
438Christian von Roques encouraged me to look for faster 413Christian von Roques encouraged me to look for faster
439sorting algorithms, so as to speed up compression. 414sorting algorithms, so as to speed up compression.
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
index 5206e05..8c4fab1 100644
--- a/bzip2.1.preformatted
+++ b/bzip2.1.preformatted
@@ -5,18 +5,20 @@ bzip2(1) bzip2(1)
5 5
6 6
7NNAAMMEE 7NNAAMMEE
8 bzip2, bunzip2 - a block-sorting file compressor, v0.1 8 bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
9 bzcat - decompresses files to stdout
9 bzip2recover - recovers data from damaged bzip2 files 10 bzip2recover - recovers data from damaged bzip2 files
10 11
11 12
12SSYYNNOOPPSSIISS 13SSYYNNOOPPSSIISS
13 bbzziipp22 [ --ccddffkkssttvvVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] 14 bbzziipp22 [ --ccddffkkssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ]
14 bbuunnzziipp22 [ --kkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] 15 bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ]
16 bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ]
15 bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e 17 bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e
16 18
17 19
18DDEESSCCRRIIPPTTIIOONN 20DDEESSCCRRIIPPTTIIOONN
19 _B_z_i_p_2 compresses files using the Burrows-Wheeler block- 21 _b_z_i_p_2 compresses files using the Burrows-Wheeler block-
20 sorting text compression algorithm, and Huffman coding. 22 sorting text compression algorithm, and Huffman coding.
21 Compression is generally considerably better than that 23 Compression is generally considerably better than that
22 achieved by more conventional LZ77/LZ78-based compressors, 24 achieved by more conventional LZ77/LZ78-based compressors,
@@ -26,7 +28,7 @@ DDEESSCCRRIIPPTTIIOONN
26 The command-line options are deliberately very similar to 28 The command-line options are deliberately very similar to
27 those of _G_N_U _G_z_i_p_, but they are not identical. 29 those of _G_N_U _G_z_i_p_, but they are not identical.
28 30
29 _B_z_i_p_2 expects a list of file names to accompany the com- 31 _b_z_i_p_2 expects a list of file names to accompany the com-
30 mand-line flags. Each file is replaced by a compressed 32 mand-line flags. Each file is replaced by a compressed
31 version of itself, with the name "original_name.bz2". 33 version of itself, with the name "original_name.bz2".
32 Each compressed file has the same modification date and 34 Each compressed file has the same modification date and
@@ -38,8 +40,8 @@ DDEESSCCRRIIPPTTIIOONN
38 cepts, or have serious file name length restrictions, such 40 cepts, or have serious file name length restrictions, such
39 as MS-DOS. 41 as MS-DOS.
40 42
41 _B_z_i_p_2 and _b_u_n_z_i_p_2 will not overwrite existing files; if 43 _b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing
42 you want this to happen, you should delete them first. 44 files; if you want this to happen, specify the -f flag.
43 45
44 If no file names are specified, _b_z_i_p_2 compresses from 46 If no file names are specified, _b_z_i_p_2 compresses from
45 standard input to standard output. In this case, _b_z_i_p_2 47 standard input to standard output. In this case, _b_z_i_p_2
@@ -47,17 +49,15 @@ DDEESSCCRRIIPPTTIIOONN
47 this would be entirely incomprehensible and therefore 49 this would be entirely incomprehensible and therefore
48 pointless. 50 pointless.
49 51
50 _B_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- 52 _b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec-
51 ified files whose names end in ".bz2". Files without this 53 ified files whose names end in ".bz2". Files without this
52 suffix are ignored. Again, supplying no filenames causes 54 suffix are ignored. Again, supplying no filenames causes
53 decompression from standard input to standard output. 55 decompression from standard input to standard output.
54 56
55 You can also compress or decompress files to the standard 57 _b_u_n_z_i_p_2 will correctly decompress a file which is the con-
56 output by giving the -c flag. You can decompress multiple 58 catenation of two or more compressed files. The result is
57 files like this, but you may only compress a single file 59 the concatenation of the corresponding uncompressed files.
58 this way, since it would otherwise be difficult to sepa- 60 Integrity testing (-t) of concatenated compressed files is
59 rate out the compressed representations of the original
60 files.
61 61
62 62
63 63
@@ -70,6 +70,21 @@ DDEESSCCRRIIPPTTIIOONN
70bzip2(1) bzip2(1) 70bzip2(1) bzip2(1)
71 71
72 72
73 also supported.
74
75 You can also compress or decompress files to the standard
76 output by giving the -c flag. Multiple files may be com-
77 pressed and decompressed like this. The resulting outputs
78 are fed sequentially to stdout. Compression of multiple
79 files in this manner generates a stream containing multi-
80 ple compressed file representations. Such a stream can be
81 decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
82 later. Earlier versions of _b_z_i_p_2 will stop after decom-
83 pressing the first file in the stream.
84
85 _b_z_c_a_t (or _b_z_i_p_2 _-_d_c ) decompresses all specified files to
86 the standard output.
87
73 Compression is always performed, even if the compressed 88 Compression is always performed, even if the compressed
74 file is slightly larger than the original. Files of less 89 file is slightly larger than the original. Files of less
75 than about one hundred bytes tend to get larger, since the 90 than about one hundred bytes tend to get larger, since the
@@ -108,36 +123,37 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
108 file, and _b_u_n_z_i_p_2 then allocates itself just enough memory 123 file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
109 to decompress the file. Since block sizes are stored in 124 to decompress the file. Since block sizes are stored in
110 compressed files, it follows that the flags -1 to -9 are 125 compressed files, it follows that the flags -1 to -9 are
111 irrelevant to and so ignored during decompression. Com- 126 irrelevant to and so ignored during decompression.
112 pression and decompression requirements, in bytes, can be
113 estimated as:
114 127
115 Compression: 400k + ( 7 x block size )
116 128
117 Decompression: 100k + ( 5 x block size ), or
118 100k + ( 2.5 x block size )
119 129
120 Larger block sizes give rapidly diminishing marginal 130 2
121 returns; most of the compression comes from the first two
122 or three hundred k of block size, a fact worth bearing in
123 mind when using _b_z_i_p_2 on small machines. It is also
124 important to appreciate that the decompression memory
125 requirement is set at compression-time by the choice of
126 block size.
127 131
128 132
129 133
130 2
131 134
132 135
136bzip2(1) bzip2(1)
133 137
134 138
139 Compression and decompression requirements, in bytes, can
140 be estimated as:
135 141
136bzip2(1) bzip2(1) 142 Compression: 400k + ( 7 x block size )
137 143
144 Decompression: 100k + ( 4 x block size ), or
145 100k + ( 2.5 x block size )
146
147 Larger block sizes give rapidly diminishing marginal
148 returns; most of the compression comes from the first two
149 or three hundred k of block size, a fact worth bearing in
150 mind when using _b_z_i_p_2 on small machines. It is also
151 important to appreciate that the decompression memory
152 requirement is set at compression-time by the choice of
153 block size.
138 154
139 For files compressed with the default 900k block size, 155 For files compressed with the default 900k block size,
140 _b_u_n_z_i_p_2 will require about 4600 kbytes to decompress. To 156 _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
141 support decompression of any file on a 4 megabyte machine, 157 support decompression of any file on a 4 megabyte machine,
142 _b_u_n_z_i_p_2 has an option to decompress using approximately 158 _b_u_n_z_i_p_2 has an option to decompress using approximately
143 half this amount of memory, about 2300 kbytes. Decompres- 159 half this amount of memory, about 2300 kbytes. Decompres-
@@ -157,8 +173,8 @@ bzip2(1) bzip2(1)
157 file 20,000 bytes long with the flag -9 will cause the 173 file 20,000 bytes long with the flag -9 will cause the
158 compressor to allocate around 6700k of memory, but only 174 compressor to allocate around 6700k of memory, but only
159 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the 175 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
160 decompressor will allocate 4600k but only touch 100k + 176 decompressor will allocate 3700k but only touch 100k +
161 20000 * 5 = 200 kbytes. 177 20000 * 4 = 180 kbytes.
162 178
163 Here is a table which summarises the maximum memory usage 179 Here is a table which summarises the maximum memory usage
164 for different block sizes. Also recorded is the total 180 for different block sizes. Also recorded is the total
@@ -172,64 +188,66 @@ bzip2(1) bzip2(1)
172 Compress Decompress Decompress Corpus 188 Compress Decompress Decompress Corpus
173 Flag usage usage -s usage Size 189 Flag usage usage -s usage Size
174 190
175 -1 1100k 600k 350k 914704 191 -1 1100k 500k 350k 914704
176 -2 1800k 1100k 600k 877703 192 -2 1800k 900k 600k 877703
177 -3 2500k 1600k 850k 860338
178 -4 3200k 2100k 1100k 846899
179 -5 3900k 2600k 1350k 845160
180 -6 4600k 3100k 1600k 838626
181 -7 5400k 3600k 1850k 834096
182 -8 6000k 4100k 2100k 828642
183 -9 6700k 4600k 2350k 828642
184 193
185 194
186OOPPTTIIOONNSS
187 --cc ----ssttddoouutt
188 Compress or decompress to standard output. -c will
189 decompress multiple files to stdout, but will only
190 compress a single file to stdout.
191
192 195
196 3
193 197
194 198
195 199
196 3
197 200
198 201
202bzip2(1) bzip2(1)
199 203
200 204
205 -3 2500k 1300k 850k 860338
206 -4 3200k 1700k 1100k 846899
207 -5 3900k 2100k 1350k 845160
208 -6 4600k 2500k 1600k 838626
209 -7 5400k 2900k 1850k 834096
210 -8 6000k 3300k 2100k 828642
211 -9 6700k 3700k 2350k 828642
201 212
202bzip2(1) bzip2(1)
203 213
214OOPPTTIIOONNSS
215 --cc ----ssttddoouutt
216 Compress or decompress to standard output. -c will
217 decompress multiple files to stdout, but will only
218 compress a single file to stdout.
204 219
205 --dd ----ddeeccoommpprreessss 220 --dd ----ddeeccoommpprreessss
206 Force decompression. _B_z_i_p_2 and _b_u_n_z_i_p_2 are really 221 Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are
207 the same program, and the decision about whether to 222 really the same program, and the decision about
208 compress or decompress is done on the basis of 223 what actions to take is done on the basis of which
209 which name is used. This flag overrides that mech- 224 name is used. This flag overrides that mechanism,
210 anism, and forces _b_z_i_p_2 to decompress. 225 and forces _b_z_i_p_2 to decompress.
211 226
212 --ff ----ccoommpprreessss 227 --zz ----ccoommpprreessss
213 The complement to -d: forces compression, regard- 228 The complement to -d: forces compression, regard-
214 less of the invokation name. 229 less of the invokation name.
215 230
216 --tt ----tteesstt 231 --tt ----tteesstt
217 Check integrity of the specified file(s), but don't 232 Check integrity of the specified file(s), but don't
218 decompress them. This really performs a trial 233 decompress them. This really performs a trial
219 decompression and throws away the result, using the 234 decompression and throws away the result.
220 low-memory decompression algorithm (see -s). 235
236 --ff ----ffoorrccee
237 Force overwrite of output files. Normally, _b_z_i_p_2
238 will not overwrite existing output files.
221 239
222 --kk ----kkeeeepp 240 --kk ----kkeeeepp
223 Keep (don't delete) input files during compression 241 Keep (don't delete) input files during compression
224 or decompression. 242 or decompression.
225 243
226 --ss ----ssmmaallll 244 --ss ----ssmmaallll
227 Reduce memory usage, both for compression and 245 Reduce memory usage, for compression, decompression
228 decompression. Files are decompressed using a mod- 246 and testing. Files are decompressed and tested
229 ified algorithm which only requires 2.5 bytes per 247 using a modified algorithm which only requires 2.5
230 block byte. This means any file can be decom- 248 bytes per block byte. This means any file can be
231 pressed in 2300k of memory, albeit somewhat more 249 decompressed in 2300k of memory, albeit at about
232 slowly than usual. 250 half the normal speed.
233 251
234 During compression, -s selects a block size of 252 During compression, -s selects a block size of
235 200k, which limits memory use to around the same 253 200k, which limits memory use to around the same
@@ -239,35 +257,32 @@ bzip2(1) bzip2(1)
239 MEMORY MANAGEMENT above. 257 MEMORY MANAGEMENT above.
240 258
241 259
260
261
262 4
263
264
265
266
267
268bzip2(1) bzip2(1)
269
270
242 --vv ----vveerrbboossee 271 --vv ----vveerrbboossee
243 Verbose mode -- show the compression ratio for each 272 Verbose mode -- show the compression ratio for each
244 file processed. Further -v's increase the ver- 273 file processed. Further -v's increase the ver-
245 bosity level, spewing out lots of information which 274 bosity level, spewing out lots of information which
246 is primarily of interest for diagnostic purposes. 275 is primarily of interest for diagnostic purposes.
247 276
248 --LL ----lliicceennssee 277 --LL ----lliicceennssee --VV ----vveerrssiioonn
249 Display the software version, license terms and 278 Display the software version, license terms and
250 conditions. 279 conditions.
251 280
252 --VV ----vveerrssiioonn
253 Same as -L.
254
255 --11 ttoo --99 281 --11 ttoo --99
256 Set the block size to 100 k, 200 k .. 900 k when 282 Set the block size to 100 k, 200 k .. 900 k when
257 compressing. Has no effect when decompressing. 283 compressing. Has no effect when decompressing.
258 See MEMORY MANAGEMENT above. 284 See MEMORY MANAGEMENT above.
259 285
260
261
262 4
263
264
265
266
267
268bzip2(1) bzip2(1)
269
270
271 ----rreeppeettiittiivvee--ffaasstt 286 ----rreeppeettiittiivvee--ffaasstt
272 _b_z_i_p_2 injects some small pseudo-random variations 287 _b_z_i_p_2 injects some small pseudo-random variations
273 into very repetitive blocks to limit worst-case 288 into very repetitive blocks to limit worst-case
@@ -306,34 +321,34 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F
306 _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- 321 _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam-
307 aged file, and writes a number of files "rec0001file.bz2", 322 aged file, and writes a number of files "rec0001file.bz2",
308 "rec0002file.bz2", etc, containing the extracted blocks. 323 "rec0002file.bz2", etc, containing the extracted blocks.
309 The output filenames are designed so that the use of wild- 324 The output filenames are designed so that the use of
310 cards in subsequent processing -- for example, "bzip2 -dc
311 rec*file.bz2 > recovered_data" -- lists the files in the
312 "right" order.
313 325
314 bzip2recover should be of most use dealing with large .bz2
315 files, as these will contain many blocks. It is clearly
316 futile to use it on damaged single-block files, since a
317 damaged block cannot be recovered. If you wish to min-
318 imise any potential data loss through media or transmis-
319 sion errors, you might consider compressing with a smaller
320 block size.
321 326
322 327
323PERFORMANCE NOTES 328 5
324 The sorting phase of compression gathers together similar
325 329
326 330
327 331
328 5
329 332
330 333
334bzip2(1) bzip2(1)
331 335
332 336
337 wildcards in subsequent processing -- for example, "bzip2
338 -dc rec*file.bz2 > recovered_data" -- lists the files in
339 the "right" order.
333 340
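Why the zero-padded names come out in the "right" order can be checked with a few lines of Python. This is a sketch only: the block numbers are invented, and plain string sorting stands in for shell wildcard expansion, which is an assumption about how the shell orders its matches.

```python
# Hypothetical output names following the rec0001file.bz2 pattern.
names = ["rec%04dfile.bz2" % n for n in (10, 2, 1, 9)]

# Sorting by name gives the same order as sorting by block number,
# because every number is padded to four digits.
by_name = sorted(names)
by_number = sorted(names, key=lambda s: int(s[3:7]))
assert by_name == by_number

# Without padding, name order and block order disagree:
# "rec10file.bz2" sorts before "rec1file.bz2".
unpadded = sorted("rec%dfile.bz2" % n for n in (10, 2, 1, 9))
print(by_name)
print(unpadded)
```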
334bzip2(1) bzip2(1) 341 bzip2recover should be of most use dealing with large .bz2
342 files, as these will contain many blocks. It is clearly
343 futile to use it on damaged single-block files, since a
344 damaged block cannot be recovered. If you wish to min-
345 imise any potential data loss through media or transmis-
346 sion errors, you might consider compressing with a smaller
347 block size.
335 348
336 349
350PERFORMANCE NOTES
351 The sorting phase of compression gathers together similar
337 strings in the file. Because of this, files containing 352 strings in the file. Because of this, files containing
338 very long runs of repeated symbols, like "aabaabaabaab 353 very long runs of repeated symbols, like "aabaabaabaab
339 ..." (repeated several hundred times) may compress 354 ..." (repeated several hundred times) may compress
@@ -348,10 +363,6 @@ bzip2(1) bzip2(1)
348 severe slowness in compression, try making the block size 363 severe slowness in compression, try making the block size
349 as small as possible, with flag -1. 364 as small as possible, with flag -1.
350 365
351 Incompressible or virtually-incompressible data may decom-
352 press rather more slowly than one would hope. This is due
353 to a naive implementation of the move-to-front coder.
354
355 bzip2 usually allocates several megabytes of memory to 366 bzip2 usually allocates several megabytes of memory to
356 operate in, and then charges all over it in a fairly ran- 367 operate in, and then charges all over it in a fairly ran-
357 dom fashion. This means that performance, both for com- 368 dom fashion. This means that performance, both for com-
@@ -362,12 +373,6 @@ bzip2(1) bzip2(1)
362 large performance improvements. I imagine bzip2 will per- 373 large performance improvements. I imagine bzip2 will per-
363 form best on machines with very large caches. 374 form best on machines with very large caches.
364 375
365 Test mode (-t) uses the low-memory decompression algorithm
366 (-s). This means test mode does not run as fast as it
367 could; it could run as fast as the normal decompression
368 machinery. This could easily be fixed at the cost of some
369 code bloat.
370
371 376
372CAVEATS 377CAVEATS
373 I/O error messages are not as helpful as they could be. 378 I/O error messages are not as helpful as they could be.
@@ -375,19 +380,14 @@ CAVEATS
375 but the details of what the problem is sometimes seem 380 but the details of what the problem is sometimes seem
376 rather misleading. 381 rather misleading.
377 382
378 This manual page pertains to version 0.1 of bzip2. It may 383 This manual page pertains to version 0.9.0 of bzip2. Com-
379 well happen that some future version will use a different 384 pressed data created by this version is entirely forwards
380 compressed file format. If you try to decompress, using 385 and backwards compatible with the previous public release,
381 0.1, a .bz2 file created with some future version which 386 version 0.1pl2, but with the following exception: 0.9.0
382 uses a different compressed file format, 0.1 will complain 387 can correctly decompress multiple concatenated compressed
383 that your file "is not a bzip2 file". If that happens, 388 files. 0.1pl2 cannot do this; it will stop after decom-
384 you should obtain a more recent version of bzip2 and use 389 pressing just the first file in the stream.
385 that to decompress the file.
386 390
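The concatenated-file behaviour can be illustrated with Python's standard bz2 module. This is an analogy under assumptions, not a test of bzip2 0.9.0 or 0.1pl2 themselves: bz2.decompress reads every stream in its input, while a single BZ2Decompressor object stops at the end of the first stream, much as the text says 0.1pl2 does.

```python
import bz2

# Two independently compressed "files" (made-up contents), concatenated.
first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")
stream = first + second

# Multi-stream decoding: both members are recovered.
assert bz2.decompress(stream) == b"first file\nsecond file\n"

# Single-stream decoding: output ends with the first member, and the
# remaining bytes are left untouched in unused_data.
decoder = bz2.BZ2Decompressor()
head = decoder.decompress(stream)
assert head == b"first file\n"
assert decoder.eof
assert decoder.unused_data == second
```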
387 Wildcard expansion for Windows 95 and NT is flaky.
388
389 bzip2recover uses 32-bit integers to represent bit posi-
390 tions in compressed files, so it cannot handle compressed
391 391
392 392
393 393
@@ -400,61 +400,59 @@ CAVEATS
400bzip2(1) bzip2(1) 400bzip2(1) bzip2(1)
401 401
402 402
403 files more than 512 megabytes long. This could easily be 403 Wildcard expansion for Windows 95 and NT is flaky.
404
405 bzip2recover uses 32-bit integers to represent bit posi-
406 tions in compressed files, so it cannot handle compressed
407 files more than 512 megabytes long. This could easily be
404 fixed. 408 fixed.
405 409
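The 512 megabyte figure follows from simple arithmetic, shown here as a small Python check. It assumes the bit position is held in an unsigned 32-bit integer, which the text does not state explicitly; the constant names are illustrative.

```python
# Number of distinct bit positions an unsigned 32-bit integer can address.
MAX_BIT_POSITIONS = 2 ** 32

# Eight bits per byte, so the largest addressable file is 2**29 bytes.
max_bytes = MAX_BIT_POSITIONS // 8
max_megabytes = max_bytes // (1024 * 1024)

assert max_bytes == 2 ** 29
print(max_megabytes)
```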
406 bzip2recover sometimes reports a very small, incomplete 410
407 final block. This is spurious and can be safely ignored. 411AUTHOR
412 Julian Seward, jseward@acm.org.
413 http://www.muraroa.demon.co.uk
414
415 The ideas embodied in bzip2 are due to (at least) the fol-
416 lowing people: Michael Burrows and David Wheeler (for the
417 block sorting transformation), David Wheeler (again, for
418 the Huffman coder), Peter Fenwick (for the structured cod-
419 ing model in the original bzip, and many refinements), and
420 Alistair Moffat, Radford Neal and Ian Witten (for the
421 arithmetic coder in the original bzip). I am much
422 indebted for their help, support and advice. See the man-
423 ual in the source distribution for pointers to sources of
424 documentation. Christian von Roques encouraged me to look
425 for faster sorting algorithms, so as to speed up compres-
426 sion. Bela Lubkin encouraged me to improve the worst-case
427 compression performance. Many people sent patches, helped
428 with portability problems, lent machines, gave advice and
429 were generally helpful.
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
408 448
409 449
410RELATIONSHIP TO bzip-0.21
411 This program is a descendant of the bzip program, version
412 0.21, which I released in August 1996. The primary dif-
413 ference of bzip2 is its avoidance of the possibly patented
414 algorithms which were used in 0.21. bzip2 also brings
415 various useful refinements (-s, -t), uses less memory,
416 decompresses significantly faster, and has support for
417 recovering data from damaged files.
418 450
419 Because bzip2 uses Huffman coding to construct the com-
420 pressed bitstream, rather than the arithmetic coding used
421 in 0.21, the compressed representations generated by the
422 two programs are incompatible, and they will not interop-
423 erate. The change in suffix from .bz to .bz2 reflects
424 this. It would have been helpful to at least allow bzip2
425 to decompress files created by 0.21, but this would defeat
426 the primary aim of having a patent-free compressor.
427 451
428 For a more precise statement about patent issues in bzip2,
429 please see the README file in the distribution.
430 452
431 Huffman coding necessarily involves some coding ineffi-
432 ciency compared to arithmetic coding. This means that
433 bzip2 compresses about 1% worse than 0.21, an unfortunate
434 but unavoidable fact-of-life. On the other hand, decom-
435 pression is approximately 50% faster for the same reason,
436 and the change in file format gave an opportunity to add
437 data-recovery features. So it is not all bad.
438 453
439 454
440AUTHOR
441 Julian Seward, jseward@acm.org.
442 455
443 The ideas embodied in bzip and bzip2 are due to (at least)
444 the following people: Michael Burrows and David Wheeler
445 (for the block sorting transformation), David Wheeler
446 (again, for the Huffman coder), Peter Fenwick (for the
447 structured coding model in 0.21, and many refinements),
448 and Alistair Moffat, Radford Neal and Ian Witten (for the
449 arithmetic coder in 0.21). I am much indebted for their
450 help, support and advice. See the file ALGORITHMS in the
451 source distribution for pointers to sources of documenta-
452 tion. Christian von Roques encouraged me to look for
453 faster sorting algorithms, so as to speed up compression.
454 Bela Lubkin encouraged me to improve the worst-case com-
455 pression performance. Many people sent patches, helped
456 with portability problems, lent machines, gave advice and
457 were generally helpful.
458 456
459 457
460 458