author Julian Seward <jseward@acm.org> 1998-08-23 22:13:13 +0200
committer Julian Seward <jseward@acm.org> 1998-08-23 22:13:13 +0200
commit 977101ad5f833f5c0a574bfeea408e5301a6b052
tree fc1e8fed202869c116cbf6b8c362456042494a0a /bzip2.1
parent 1eb67a9d8f7f05ae310bc9ef297d176f3a3f8a37
bzip2-0.9.0c
Diffstat (limited to 'bzip2.1')
-rw-r--r-- bzip2.1 | 191
1 files changed, 83 insertions, 108 deletions
diff --git a/bzip2.1 b/bzip2.1
index 489668f..a6789a4 100644
--- a/bzip2.1
+++ b/bzip2.1
@@ -1,21 +1,29 @@
1.PU 1.PU
2.TH bzip2 1 2.TH bzip2 1
3.SH NAME 3.SH NAME
4bzip2, bunzip2 \- a block-sorting file compressor, v0.1 4bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0
5.br
6bzcat \- decompresses files to stdout
5.br 7.br
6bzip2recover \- recovers data from damaged bzip2 files 8bzip2recover \- recovers data from damaged bzip2 files
7 9
8.SH SYNOPSIS 10.SH SYNOPSIS
9.ll +8 11.ll +8
10.B bzip2 12.B bzip2
11.RB [ " \-cdfkstvVL123456789 " ] 13.RB [ " \-cdfkstvzVL123456789 " ]
12[ 14[
13.I "filenames \&..." 15.I "filenames \&..."
14] 16]
15.ll -8 17.ll -8
16.br 18.br
17.B bunzip2 19.B bunzip2
18.RB [ " \-kvsVL " ] 20.RB [ " \-fkvsVL " ]
21[
22.I "filenames \&..."
23]
24.br
25.B bzcat
26.RB [ " \-s " ]
19[ 27[
20.I "filenames \&..." 28.I "filenames \&..."
21] 29]
@@ -24,7 +32,7 @@ bzip2recover \- recovers data from damaged bzip2 files
24.I "filename" 32.I "filename"
25 33
26.SH DESCRIPTION 34.SH DESCRIPTION
27.I Bzip2 35.I bzip2
28compresses files using the Burrows-Wheeler block-sorting 36compresses files using the Burrows-Wheeler block-sorting
29text compression algorithm, and Huffman coding. 37text compression algorithm, and Huffman coding.
30Compression is generally considerably 38Compression is generally considerably
@@ -38,7 +46,7 @@ those of
38.I GNU Gzip, 46.I GNU Gzip,
39but they are not identical. 47but they are not identical.
40 48
41.I Bzip2 49.I bzip2
42expects a list of file names to accompany the command-line flags. 50expects a list of file names to accompany the command-line flags.
43Each file is replaced by a compressed version of itself, 51Each file is replaced by a compressed version of itself,
44with the name "original_name.bz2". 52with the name "original_name.bz2".
@@ -50,11 +58,11 @@ original file names, permissions and dates in filesystems
50which lack these concepts, or have serious file name length 58which lack these concepts, or have serious file name length
51restrictions, such as MS-DOS. 59restrictions, such as MS-DOS.
52 60
53.I Bzip2 61.I bzip2
54and 62and
55.I bunzip2 63.I bunzip2
56will not overwrite existing files; if you want this to happen, 64will by default not overwrite existing files;
57you should delete them first. 65if you want this to happen, specify the \-f flag.
58 66
59If no file names are specified, 67If no file names are specified,
60.I bzip2 68.I bzip2
@@ -64,7 +72,7 @@ In this case,
64will decline to write compressed output to a terminal, as 72will decline to write compressed output to a terminal, as
65this would be entirely incomprehensible and therefore pointless. 73this would be entirely incomprehensible and therefore pointless.
66 74
67.I Bunzip2 75.I bunzip2
68(or 76(or
69.I bzip2 \-d 77.I bzip2 \-d
70) decompresses and restores all specified files whose names 78) decompresses and restores all specified files whose names
@@ -73,12 +81,28 @@ Files without this suffix are ignored.
73Again, supplying no filenames 81Again, supplying no filenames
74causes decompression from standard input to standard output. 82causes decompression from standard input to standard output.
75 83
84.I bunzip2
85will correctly decompress a file which is the concatenation
86of two or more compressed files. The result is the concatenation
87of the corresponding uncompressed files. Integrity testing
88(\-t) of concatenated compressed files is also supported.
89
76You can also compress or decompress files to 90You can also compress or decompress files to
77the standard output by giving the \-c flag. 91the standard output by giving the \-c flag.
78You can decompress multiple files like this, but you may 92Multiple files may be compressed and decompressed like this.
79only compress a single file this way, since it would otherwise 93The resulting outputs are fed sequentially to stdout.
80be difficult to separate out the compressed representations of 94Compression of multiple files in this manner generates
81the original files. 95a stream containing multiple compressed file representations.
96Such a stream can be decompressed correctly only by
97.I bzip2
98version 0.9.0 or later. Earlier versions of
99.I bzip2
100will stop after decompressing the first file in the stream.
101
102.I bzcat
103(or
104.I bzip2 \-dc
105) decompresses all specified files to the standard output.
82 106
83Compression is always performed, even if the compressed file is 107Compression is always performed, even if the compressed file is
84slightly larger than the original. Files of less than about 108slightly larger than the original. Files of less than about
@@ -132,7 +156,7 @@ Compression and decompression requirements, in bytes, can be estimated as:
132 156
133 Compression: 400k + ( 7 x block size ) 157 Compression: 400k + ( 7 x block size )
134 158
135 Decompression: 100k + ( 5 x block size ), or 159 Decompression: 100k + ( 4 x block size ), or
136.br 160.br
137 100k + ( 2.5 x block size ) 161 100k + ( 2.5 x block size )
138 162
@@ -147,7 +171,7 @@ choice of block size.
147 171
148For files compressed with the default 900k block size, 172For files compressed with the default 900k block size,
149.I bunzip2 173.I bunzip2
150will require about 4600 kbytes to decompress. 174will require about 3700 kbytes to decompress.
151To support decompression of any file on a 4 megabyte machine, 175To support decompression of any file on a 4 megabyte machine,
152.I bunzip2 176.I bunzip2
153has an option to decompress using approximately half this 177has an option to decompress using approximately half this
@@ -168,8 +192,8 @@ For example, compressing a file 20,000 bytes long with the flag
168\-9 192\-9
169will cause the compressor to allocate around 193will cause the compressor to allocate around
1706700k of memory, but only touch 400k + 20000 * 7 = 540 1946700k of memory, but only touch 400k + 20000 * 7 = 540
171kbytes of it. Similarly, the decompressor will allocate 4600k but 195kbytes of it. Similarly, the decompressor will allocate 3700k but
172only touch 100k + 20000 * 5 = 200 kbytes. 196only touch 100k + 20000 * 4 = 180 kbytes.
173 197
174Here is a table which summarises the maximum memory usage for 198Here is a table which summarises the maximum memory usage for
175different block sizes. Also recorded is the total compressed 199different block sizes. Also recorded is the total compressed
@@ -182,71 +206,73 @@ Corpus is dominated by smaller files.
182 Compress Decompress Decompress Corpus 206 Compress Decompress Decompress Corpus
183 Flag usage usage -s usage Size 207 Flag usage usage -s usage Size
184 208
185 -1 1100k 600k 350k 914704 209 -1 1100k 500k 350k 914704
186 -2 1800k 1100k 600k 877703 210 -2 1800k 900k 600k 877703
187 -3 2500k 1600k 850k 860338 211 -3 2500k 1300k 850k 860338
188 -4 3200k 2100k 1100k 846899 212 -4 3200k 1700k 1100k 846899
189 -5 3900k 2600k 1350k 845160 213 -5 3900k 2100k 1350k 845160
190 -6 4600k 3100k 1600k 838626 214 -6 4600k 2500k 1600k 838626
191 -7 5400k 3600k 1850k 834096 215 -7 5400k 2900k 1850k 834096
192 -8 6000k 4100k 2100k 828642 216 -8 6000k 3300k 2100k 828642
193 -9 6700k 4600k 2350k 828642 217 -9 6700k 3700k 2350k 828642
194 218
195.SH OPTIONS 219.SH OPTIONS
196.TP 220.TP
197.B \-c --stdout 221.B \-c --stdout
198Compress or decompress to standard output. \-c will decompress 222Compress or decompress to standard output. \-c will decompress
199multiple files to stdout, but will only compress a single file to 223multiple files to stdout, but will only compress a single file to
200stdout. 224stdout.
201.TP 225.TP
202.B \-d --decompress 226.B \-d --decompress
203Force decompression. 227Force decompression.
204.I Bzip2 228.I bzip2,
205and
206.I bunzip2 229.I bunzip2
207are really the same program, and the decision about whether to 230and
208compress or decompress is done on the basis of which name is 231.I bzcat
232are really the same program, and the decision about what actions
233to take is done on the basis of which name is
209used. This flag overrides that mechanism, and forces 234used. This flag overrides that mechanism, and forces
210.I bzip2 235.I bzip2
211to decompress. 236to decompress.
212.TP 237.TP
213.B \-f --compress 238.B \-z --compress
214The complement to \-d: forces compression, regardless of the invokation 239The complement to \-d: forces compression, regardless of the invokation
215name. 240name.
216.TP 241.TP
217.B \-t --test 242.B \-t --test
218Check integrity of the specified file(s), but don't decompress them. 243Check integrity of the specified file(s), but don't decompress them.
219This really performs a trial decompression and throws away the result, 244This really performs a trial decompression and throws away the result.
220using the low-memory decompression algorithm (see \-s). 245.TP
246.B \-f --force
247Force overwrite of output files. Normally,
248.I bzip2
249will not overwrite existing output files.
221.TP 250.TP
222.B \-k --keep 251.B \-k --keep
223Keep (don't delete) input files during compression or decompression. 252Keep (don't delete) input files during compression or decompression.
224.TP 253.TP
225.B \-s --small 254.B \-s --small
226Reduce memory usage, both for compression and decompression. 255Reduce memory usage, for compression, decompression and
227Files are decompressed using a modified algorithm which only 256testing.
257Files are decompressed and tested using a modified algorithm which only
228requires 2.5 bytes per block byte. This means any file can be 258requires 2.5 bytes per block byte. This means any file can be
229decompressed in 2300k of memory, albeit somewhat more slowly than 259decompressed in 2300k of memory, albeit at about half the normal
230usual. 260speed.
231 261
232During compression, -s selects a block size of 200k, which limits 262During compression, -s selects a block size of 200k, which limits
233memory use to around the same figure, at the expense of your 263memory use to around the same figure, at the expense of your
234compression ratio. In short, if your machine is low on memory 264compression ratio. In short, if your machine is low on memory
235(8 megabytes or less), use -s for everything. See 265(8 megabytes or less), use -s for everything. See
236MEMORY MANAGEMENT above. 266MEMORY MANAGEMENT above.
237
238.TP 267.TP
239.B \-v --verbose 268.B \-v --verbose
240Verbose mode -- show the compression ratio for each file processed. 269Verbose mode -- show the compression ratio for each file processed.
241Further \-v's increase the verbosity level, spewing out lots of 270Further \-v's increase the verbosity level, spewing out lots of
242information which is primarily of interest for diagnostic purposes. 271information which is primarily of interest for diagnostic purposes.
243.TP 272.TP
244.B \-L --license 273.B \-L --license -V --version
245Display the software version, license terms and conditions. 274Display the software version, license terms and conditions.
246.TP 275.TP
247.B \-V --version
248Same as \-L.
249.TP
250.B \-1 to \-9 276.B \-1 to \-9
251Set the block size to 100 k, 200 k .. 900 k when 277Set the block size to 100 k, 200 k .. 900 k when
252compressing. Has no effect when decompressing. 278compressing. Has no effect when decompressing.
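The revised memory figures in this hunk follow from the formulas quoted earlier in the man page (compression: 400k + 7 x block size; decompression: 100k + 4 x block size, or 100k + 2.5 x block size with \-s). The following is a minimal Python sketch of that arithmetic, not code from bzip2 itself. The function names are hypothetical, "k" is taken to mean 1000 bytes, and the sketch does not model the fact that \-s selects a 200k block size during compression. The published table is rounded, so an entry may differ slightly from the formula (the \-7 compression row reads 5400k, where the formula gives 5300k).

```python
def estimate_kbytes(level, small=False):
    """Return (compress, decompress) memory estimates in kbytes for -1 .. -9.

    Hypothetical helper: it only restates the man page formulas.
    """
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    block_k = 100 * level                 # -1 .. -9 select 100k .. 900k blocks
    compress = 400 + 7 * block_k          # 400k + ( 7 x block size )
    per_byte = 2.5 if small else 4        # -s: 2.5 bytes per block byte
    decompress = 100 + per_byte * block_k
    return compress, decompress


def touched_kbytes(file_bytes, level=9):
    """Return (compress, decompress) kbytes actually touched for one file.

    Assumes normal (non -s) decompression. A file shorter than the block
    size only exercises memory proportional to its own length.
    """
    block_bytes = min(file_bytes, 100_000 * level)
    return 400 + 7 * block_bytes / 1000, 100 + 4 * block_bytes / 1000
```

For the 20,000-byte example in the text, `touched_kbytes(20000)` reproduces the 540 and 180 kbyte figures, and `estimate_kbytes(9)` gives the 6700k and 3700k allocations for \-9.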
@@ -329,10 +355,6 @@ to compress the latter.
329If you do get a file which causes severe slowness in compression, 355If you do get a file which causes severe slowness in compression,
330try making the block size as small as possible, with flag \-1. 356try making the block size as small as possible, with flag \-1.
331 357
332Incompressible or virtually-incompressible data may decompress
333rather more slowly than one would hope. This is due to
334a naive implementation of the move-to-front coder.
335
336.I bzip2 358.I bzip2
337usually allocates several megabytes of memory to operate in, 359usually allocates several megabytes of memory to operate in,
338and then charges all over it in a fairly random fashion. This 360and then charges all over it in a fairly random fashion. This
@@ -346,28 +368,19 @@ I imagine
346.I bzip2 368.I bzip2
347will perform best on machines with very large caches. 369will perform best on machines with very large caches.
348 370
349Test mode (\-t) uses the low-memory decompression algorithm
350(\-s). This means test mode does not run as fast as it could;
351it could run as fast as the normal decompression machinery.
352This could easily be fixed at the cost of some code bloat.
353
354.SH CAVEATS 371.SH CAVEATS
355I/O error messages are not as helpful as they could be. 372I/O error messages are not as helpful as they could be.
356.I Bzip2 373.I Bzip2
357tries hard to detect I/O errors and exit cleanly, but the 374tries hard to detect I/O errors and exit cleanly, but the
358details of what the problem is sometimes seem rather misleading. 375details of what the problem is sometimes seem rather misleading.
359 376
360This manual page pertains to version 0.1 of 377This manual page pertains to version 0.9.0 of
361.I bzip2. 378.I bzip2.
362It may well happen that some future version will 379Compressed data created by this version is entirely forwards and
363use a different compressed file format. If you try to 380backwards compatible with the previous public release, version 0.1pl2,
364decompress, using 0.1, a .bz2 file created with some 381but with the following exception: 0.9.0 can correctly decompress
365future version which uses a different compressed file format, 382multiple concatenated compressed files. 0.1pl2 cannot do this; it
3660.1 will complain that your file "is not a bzip2 file". 383will stop after decompressing just the first file in the stream.
367If that happens, you should obtain a more recent version
368of
369.I bzip2
370and use that to decompress the file.
371 384
372Wildcard expansion for Windows 95 and NT 385Wildcard expansion for Windows 95 and NT
373is flaky. 386is flaky.
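The compatibility note in this hunk says that 0.9.0 decompresses a concatenation of compressed files as the concatenation of their contents, whereas 0.1pl2 stops after the first file in the stream. The difference can be illustrated with Python's standard `bz2` module, used here purely as a stand-in for the command-line tools. This is a sketch under that assumption: `bz2.decompress` accepts multi-stream input (Python 3.3 and later), while a single `BZ2Decompressor` object handles exactly one stream, which resembles the older behaviour.

```python
import bz2

# Two independently compressed "files", concatenated into one stream.
first = bz2.compress(b"hello ")
second = bz2.compress(b"world")
stream = first + second

# Multi-stream aware decompression: the result is the concatenation
# of the two original inputs.
whole = bz2.decompress(stream)

# A single-stream decompressor stops at the end of the first stream and
# leaves the remaining bytes untouched in `unused_data`.
single = bz2.BZ2Decompressor()
head = single.decompress(stream)
leftover = single.unused_data
```

Here `whole` holds both payloads, while `head` holds only the first and `leftover` is the still-compressed second stream.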
@@ -377,63 +390,25 @@ uses 32-bit integers to represent bit positions in
377compressed files, so it cannot handle compressed files 390compressed files, so it cannot handle compressed files
378more than 512 megabytes long. This could easily be fixed. 391more than 512 megabytes long. This could easily be fixed.
379 392
380.I bzip2recover
381sometimes reports a very small, incomplete final block.
382This is spurious and can be safely ignored.
383
384.SH RELATIONSHIP TO bzip-0.21
385This program is a descendant of the
386.I bzip
387program, version 0.21, which I released in August 1996.
388The primary difference of
389.I bzip2
390is its avoidance of the possibly patented algorithms
391which were used in 0.21.
392.I bzip2
393also brings various useful refinements (\-s, \-t),
394uses less memory, decompresses significantly faster, and
395has support for recovering data from damaged files.
396
397Because
398.I bzip2
399uses Huffman coding to construct the compressed bitstream,
400rather than the arithmetic coding used in 0.21,
401the compressed representations generated by the two programs
402are incompatible, and they will not interoperate. The change
403in suffix from .bz to .bz2 reflects this. It would have been
404helpful to at least allow
405.I bzip2
406to decompress files created by 0.21, but this would
407defeat the primary aim of having a patent-free compressor.
408
409For a more precise statement about patent issues in
410bzip2, please see the README file in the distribution.
411
412Huffman coding necessarily involves some coding inefficiency
413compared to arithmetic coding. This means that
414.I bzip2
415compresses about 1% worse than 0.21, an unfortunate but
416unavoidable fact-of-life. On the other hand, decompression
417is approximately 50% faster for the same reason, and the
418change in file format gave an opportunity to add data-recovery
419features. So it is not all bad.
420
421.SH AUTHOR 393.SH AUTHOR
422Julian Seward, jseward@acm.org. 394Julian Seward, jseward@acm.org.
423 395
396http://www.muraroa.demon.co.uk
397
424The ideas embodied in 398The ideas embodied in
425.I bzip
426and
427.I bzip2 399.I bzip2
428are due to (at least) the following people: 400are due to (at least) the following people:
429Michael Burrows and David Wheeler (for the block sorting 401Michael Burrows and David Wheeler (for the block sorting
430transformation), David Wheeler (again, for the Huffman coder), 402transformation), David Wheeler (again, for the Huffman coder),
431Peter Fenwick (for the structured coding model in 0.21, 403Peter Fenwick (for the structured coding model in the original
404.I bzip,
432and many refinements), 405and many refinements),
433and 406and
434Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic 407Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic
435coder in 0.21). I am much indebted for their help, support and advice. 408coder in the original
436See the file ALGORITHMS in the source distribution for pointers to 409.I bzip).
410I am much indebted for their help, support and advice.
411See the manual in the source distribution for pointers to
437sources of documentation. 412sources of documentation.
438Christian von Roques encouraged me to look for faster 413Christian von Roques encouraged me to look for faster
439sorting algorithms, so as to speed up compression. 414sorting algorithms, so as to speed up compression.