author Julian Seward <jseward@acm.org> 1999-09-04 22:13:13 +0200
committer Julian Seward <jseward@acm.org> 1999-09-04 22:13:13 +0200
commit f93cd82a9a7094ad90fd19bbc6ccf6f4627f8060
tree c95407df5665f5a7395683f07552f2b13f2e501f /bzip2.1
parent 977101ad5f833f5c0a574bfeea408e5301a6b052
bzip2-0.9.5d
Diffstat (limited to 'bzip2.1')
-rw-r--r-- bzip2.1 610
1 files changed, 314 insertions, 296 deletions
diff --git a/bzip2.1 b/bzip2.1
index a6789a4..99eda9b 100644
--- a/bzip2.1
+++ b/bzip2.1
@@ -1,7 +1,7 @@
1.PU 1.PU
2.TH bzip2 1 2.TH bzip2 1
3.SH NAME 3.SH NAME
4bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0 4bzip2, bunzip2 \- a block-sorting file compressor, v0.9.5
5.br 5.br
6bzcat \- decompresses files to stdout 6bzcat \- decompresses files to stdout
7.br 7.br
@@ -10,7 +10,7 @@ bzip2recover \- recovers data from damaged bzip2 files
10.SH SYNOPSIS 10.SH SYNOPSIS
11.ll +8 11.ll +8
12.B bzip2 12.B bzip2
13.RB [ " \-cdfkstvzVL123456789 " ] 13.RB [ " \-cdfkqstvzVL123456789 " ]
14[ 14[
15.I "filenames \&..." 15.I "filenames \&..."
16] 16]
@@ -18,13 +18,13 @@ bzip2recover \- recovers data from damaged bzip2 files
18.br 18.br
19.B bunzip2 19.B bunzip2
20.RB [ " \-fkvsVL " ] 20.RB [ " \-fkvsVL " ]
21[ 21[
22.I "filenames \&..." 22.I "filenames \&..."
23] 23]
24.br 24.br
25.B bzcat 25.B bzcat
26.RB [ " \-s " ] 26.RB [ " \-s " ]
27[ 27[
28.I "filenames \&..." 28.I "filenames \&..."
29] 29]
30.br 30.br
@@ -33,211 +33,171 @@ bzip2recover \- recovers data from damaged bzip2 files
33 33
34.SH DESCRIPTION 34.SH DESCRIPTION
35.I bzip2 35.I bzip2
36compresses files using the Burrows-Wheeler block-sorting 36compresses files using the Burrows-Wheeler block sorting
37text compression algorithm, and Huffman coding. 37text compression algorithm, and Huffman coding. Compression is
38Compression is generally considerably 38generally considerably better than that achieved by more conventional
39better than that 39LZ77/LZ78-based compressors, and approaches the performance of the PPM
40achieved by more conventional LZ77/LZ78-based compressors, 40family of statistical compressors.
41and approaches the performance of the PPM family of statistical
42compressors.
43 41
44The command-line options are deliberately very similar to 42The command-line options are deliberately very similar to
45those of 43those of
46.I GNU Gzip, 44.I GNU gzip,
47but they are not identical. 45but they are not identical.
48 46
49.I bzip2 47.I bzip2
50expects a list of file names to accompany the command-line flags. 48expects a list of file names to accompany the
51Each file is replaced by a compressed version of itself, 49command-line flags. Each file is replaced by a compressed version of
52with the name "original_name.bz2". 50itself, with the name "original_name.bz2".
53Each compressed file has the same modification date and permissions 51Each compressed file
54as the corresponding original, so that these properties can be 52has the same modification date, permissions, and, when possible,
55correctly restored at decompression time. File name handling is 53ownership as the corresponding original, so that these properties can
56naive in the sense that there is no mechanism for preserving 54be correctly restored at decompression time. File name handling is
57original file names, permissions and dates in filesystems 55naive in the sense that there is no mechanism for preserving original
58which lack these concepts, or have serious file name length 56file names, permissions, ownerships or dates in filesystems which lack
59restrictions, such as MS-DOS. 57these concepts, or have serious file name length restrictions, such as
58MS-DOS.
60 59
61.I bzip2 60.I bzip2
62and 61and
63.I bunzip2 62.I bunzip2
64will by default not overwrite existing files; 63will by default not overwrite existing
65if you want this to happen, specify the \-f flag. 64files. If you want this to happen, specify the \-f flag.
66 65
67If no file names are specified, 66If no file names are specified,
68.I bzip2 67.I bzip2
69compresses from standard input to standard output. 68compresses from standard
70In this case, 69input to standard output. In this case,
71.I bzip2 70.I bzip2
72will decline to write compressed output to a terminal, as 71will decline to
73this would be entirely incomprehensible and therefore pointless. 72write compressed output to a terminal, as this would be entirely
73incomprehensible and therefore pointless.
74 74
75.I bunzip2 75.I bunzip2
76(or 76(or
77.I bzip2 \-d 77.I bzip2 \-d)
78) decompresses and restores all specified files whose names 78decompresses all
79end in ".bz2". 79specified files. Files which were not created by
80Files without this suffix are ignored.
81Again, supplying no filenames
82causes decompression from standard input to standard output.
83
84.I bunzip2
85will correctly decompress a file which is the concatenation
86of two or more compressed files. The result is the concatenation
87of the corresponding uncompressed files. Integrity testing
88(\-t) of concatenated compressed files is also supported.
89
90You can also compress or decompress files to
91the standard output by giving the \-c flag.
92Multiple files may be compressed and decompressed like this.
93The resulting outputs are fed sequentially to stdout.
94Compression of multiple files in this manner generates
95a stream containing multiple compressed file representations.
96Such a stream can be decompressed correctly only by
97.I bzip2 80.I bzip2
98version 0.9.0 or later. Earlier versions of 81will be detected and ignored, and a warning issued.
99.I bzip2 82.I bzip2
100will stop after decompressing the first file in the stream. 83attempts to guess the filename for the decompressed file
84from that of the compressed file as follows:
85
86 filename.bz2 becomes filename
87 filename.bz becomes filename
88 filename.tbz2 becomes filename.tar
89 filename.tbz becomes filename.tar
90 anyothername becomes anyothername.out
91
92If the file does not end in one of the recognised endings,
93.I .bz2,
94.I .bz,
95.I .tbz2
96or
97.I .tbz,
98.I bzip2
99complains that it cannot
100guess the name of the original file, and uses the original name
101with
102.I .out
103appended.
104
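The filename guessing rules added on the new side of this hunk can be sketched as a small lookup. This is only an illustration of the table in the man page text, not the code in bzip2 itself; the names `SUFFIX_MAP` and `guess_output_name` are hypothetical.

```python
# Sketch of the decompression filename rules listed in the man page text.
# SUFFIX_MAP and guess_output_name are illustrative names, not bzip2 internals.
SUFFIX_MAP = [
    (".bz2", ""),       # filename.bz2  becomes filename
    (".bz", ""),        # filename.bz   becomes filename
    (".tbz2", ".tar"),  # filename.tbz2 becomes filename.tar
    (".tbz", ".tar"),   # filename.tbz  becomes filename.tar
]


def guess_output_name(name: str) -> str:
    """Return the output name the man page describes for a compressed file."""
    for suffix, replacement in SUFFIX_MAP:
        if name.endswith(suffix):
            return name[: -len(suffix)] + replacement
    # No recognised ending: the original name is used with .out appended.
    return name + ".out"


print(guess_output_name("backup.tbz2"))
print(guess_output_name("notes.txt"))
```

The warning that bzip2 prints when it cannot guess the name is not modelled here; only the resulting name is.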
105As with compression, supplying no
106filenames causes decompression from
107standard input to standard output.
108
109.I bunzip2
110will correctly decompress a file which is the
111concatenation of two or more compressed files. The result is the
112concatenation of the corresponding uncompressed files. Integrity
113testing (\-t)
114of concatenated
115compressed files is also supported.
116
117You can also compress or decompress files to the standard output by
118giving the \-c flag. Multiple files may be compressed and
119decompressed like this. The resulting outputs are fed sequentially to
120stdout. Compression of multiple files
121in this manner generates a stream
122containing multiple compressed file representations. Such a stream
123can be decompressed correctly only by
124.I bzip2
125version 0.9.0 or
126later. Earlier versions of
127.I bzip2
128will stop after decompressing
129the first file in the stream.
101 130
102.I bzcat 131.I bzcat
103(or 132(or
104.I bzip2 \-dc 133.I bzip2 -dc)
105) decompresses all specified files to the standard output. 134decompresses all specified files to
106 135the standard output.
107Compression is always performed, even if the compressed file is 136
108slightly larger than the original. Files of less than about
109one hundred bytes tend to get larger, since the compression
110mechanism has a constant overhead in the region of 50 bytes.
111Random data (including the output of most file compressors)
112is coded at about 8.05 bits per byte, giving an expansion of
113around 0.5%.
114
115As a self-check for your protection,
116.I bzip2 137.I bzip2
117uses 32-bit CRCs to make sure that the decompressed 138will read arguments from the environment variables
118version of a file is identical to the original. 139.I BZIP2
119This guards against corruption of the compressed data, 140and
120and against undetected bugs in 141.I BZIP,
142in that order, and will process them
143before any arguments read from the command line. This gives a
144convenient way to supply default arguments.
145
146Compression is always performed, even if the compressed
147file is slightly
148larger than the original. Files of less than about one hundred bytes
149tend to get larger, since the compression mechanism has a constant
150overhead in the region of 50 bytes. Random data (including the output
151of most file compressors) is coded at about 8.05 bits per byte, giving
152an expansion of around 0.5%.
153
154As a self-check for your protection,
155.I
156bzip2
157uses 32-bit CRCs to
158make sure that the decompressed version of a file is identical to the
159original. This guards against corruption of the compressed data, and
160against undetected bugs in
121.I bzip2 161.I bzip2
122(hopefully very unlikely). 162(hopefully very unlikely). The
123The chances of data corruption going undetected is 163chances of data corruption going undetected is microscopic, about one
124microscopic, about one chance in four billion 164chance in four billion for each file processed. Be aware, though, that
125for each file processed. Be aware, though, that the check 165the check occurs upon decompression, so it can only tell you that
126occurs upon decompression, so it can only tell you that 166something is wrong. It can't help you
127that something is wrong. It can't help you recover the 167recover the original uncompressed
128original uncompressed data. 168data. You can use
129You can use
130.I bzip2recover 169.I bzip2recover
131to try to recover data from damaged files. 170to try to recover data from
132 171damaged files.
133Return values:
1340 for a normal exit,
1351 for environmental
136problems (file not found, invalid flags, I/O errors, &c),
1372 to indicate a corrupt compressed file,
1383 for an internal consistency error (eg, bug) which caused
139.I bzip2
140to panic.
141 172
142.SH MEMORY MANAGEMENT 173Return values: 0 for a normal exit, 1 for environmental problems (file
143.I Bzip2 174not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
144compresses large files in blocks. The block size affects both the 175compressed file, 3 for an internal consistency error (eg, bug) which
145compression ratio achieved, and the amount of memory needed both for 176caused
146compression and decompression. The flags \-1 through \-9
147specify the block size to be 100,000 bytes through 900,000 bytes
148(the default) respectively. At decompression-time, the block size used for
149compression is read from the header of the compressed file, and
150.I bunzip2
151then allocates itself just enough memory to decompress the file.
152Since block sizes are stored in compressed files, it follows that the flags
153\-1 to \-9
154are irrelevant to and so ignored during decompression.
155Compression and decompression requirements, in bytes, can be estimated as:
156
157 Compression: 400k + ( 7 x block size )
158
159 Decompression: 100k + ( 4 x block size ), or
160.br
161 100k + ( 2.5 x block size )
162
163Larger block sizes give rapidly diminishing marginal returns; most
164of the
165compression comes from the first two or three hundred k of block size,
166a fact worth bearing in mind when using
167.I bzip2 177.I bzip2
168on small machines. It is also important to appreciate that the 178to panic.
169decompression memory requirement is set at compression-time by the
170choice of block size.
171
172For files compressed with the default 900k block size,
173.I bunzip2
174will require about 3700 kbytes to decompress.
175To support decompression of any file on a 4 megabyte machine,
176.I bunzip2
177has an option to decompress using approximately half this
178amount of memory, about 2300 kbytes. Decompression speed is
179also halved, so you should use this option only where necessary.
180The relevant flag is \-s.
181
182In general, try and use the largest block size
183memory constraints allow, since that maximises the compression
184achieved. Compression and decompression
185speed are virtually unaffected by block size.
186
187Another significant point applies to files which fit in a single
188block -- that means most files you'd encounter using a large
189block size. The amount of real memory touched is proportional
190to the size of the file, since the file is smaller than a block.
191For example, compressing a file 20,000 bytes long with the flag
192\-9
193will cause the compressor to allocate around
1946700k of memory, but only touch 400k + 20000 * 7 = 540
195kbytes of it. Similarly, the decompressor will allocate 3700k but
196only touch 100k + 20000 * 4 = 180 kbytes.
197
198Here is a table which summarises the maximum memory usage for
199different block sizes. Also recorded is the total compressed
200size for 14 files of the Calgary Text Compression Corpus
201totalling 3,141,622 bytes. This column gives some feel for how
202compression varies with block size. These figures tend to understate
203the advantage of larger block sizes for larger files, since the
204Corpus is dominated by smaller files.
205
206 Compress Decompress Decompress Corpus
207 Flag usage usage -s usage Size
208
209 -1 1100k 500k 350k 914704
210 -2 1800k 900k 600k 877703
211 -3 2500k 1300k 850k 860338
212 -4 3200k 1700k 1100k 846899
213 -5 3900k 2100k 1350k 845160
214 -6 4600k 2500k 1600k 838626
215 -7 5400k 2900k 1850k 834096
216 -8 6000k 3300k 2100k 828642
217 -9 6700k 3700k 2350k 828642
218 179
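The return values listed in the DESCRIPTION text (0 to 3) are the same on both sides of the diff. A wrapper script could translate them into messages along these lines; `EXIT_MEANINGS` and `describe_exit` are hypothetical names, and the wording of each message paraphrases the man page.

```python
# Sketch: map the exit statuses documented in the man page to short messages.
# EXIT_MEANINGS and describe_exit are illustrative names only.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O errors)",
    2: "corrupt compressed file",
    3: "internal consistency error (bug)",
}


def describe_exit(code: int) -> str:
    """Return the documented meaning of an exit status, if there is one."""
    return EXIT_MEANINGS.get(code, "unknown exit status")


print(describe_exit(2))
```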
219.SH OPTIONS 180.SH OPTIONS
220.TP 181.TP
221.B \-c --stdout 182.B \-c --stdout
222Compress or decompress to standard output. \-c will decompress 183Compress or decompress to standard output.
223multiple files to stdout, but will only compress a single file to
224stdout.
225.TP 184.TP
226.B \-d --decompress 185.B \-d --decompress
227Force decompression. 186Force decompression.
228.I bzip2, 187.I bzip2,
229.I bunzip2 188.I bunzip2
230and 189and
231.I bzcat 190.I bzcat
232are really the same program, and the decision about what actions 191are
233to take is done on the basis of which name is 192really the same program, and the decision about what actions to take is
234used. This flag overrides that mechanism, and forces 193done on the basis of which name is used. This flag overrides that
194mechanism, and forces
235.I bzip2 195.I bzip2
236to decompress. 196to decompress.
237.TP 197.TP
238.B \-z --compress 198.B \-z --compress
239The complement to \-d: forces compression, regardless of the invokation 199The complement to \-d: forces compression, regardless of the
240name. 200invokation name.
241.TP 201.TP
242.B \-t --test 202.B \-t --test
243Check integrity of the specified file(s), but don't decompress them. 203Check integrity of the specified file(s), but don't decompress them.
@@ -245,25 +205,31 @@ This really performs a trial decompression and throws away the result.
245.TP 205.TP
246.B \-f --force 206.B \-f --force
247Force overwrite of output files. Normally, 207Force overwrite of output files. Normally,
248.I bzip2 208.I bzip2
249will not overwrite existing output files. 209will not overwrite
210existing output files. Also forces
211.I bzip2
212to break hard links
213to files, which it otherwise wouldn't do.
250.TP 214.TP
251.B \-k --keep 215.B \-k --keep
252Keep (don't delete) input files during compression or decompression. 216Keep (don't delete) input files during compression
217or decompression.
253.TP 218.TP
254.B \-s --small 219.B \-s --small
255Reduce memory usage, for compression, decompression and 220Reduce memory usage, for compression, decompression and testing. Files
256testing. 221are decompressed and tested using a modified algorithm which only
257Files are decompressed and tested using a modified algorithm which only
258requires 2.5 bytes per block byte. This means any file can be 222requires 2.5 bytes per block byte. This means any file can be
259decompressed in 2300k of memory, albeit at about half the normal 223decompressed in 2300k of memory, albeit at about half the normal speed.
260speed. 224
261 225During compression, \-s selects a block size of 200k, which limits
262During compression, -s selects a block size of 200k, which limits 226memory use to around the same figure, at the expense of your compression
263memory use to around the same figure, at the expense of your 227ratio. In short, if your machine is low on memory (8 megabytes or
264compression ratio. In short, if your machine is low on memory 228less), use \-s for everything. See MEMORY MANAGEMENT below.
265(8 megabytes or less), use -s for everything. See 229.TP
266MEMORY MANAGEMENT above. 230.B \-q --quiet
231Suppress non-essential warning messages. Messages pertaining to
232I/O errors and other critical events will not be suppressed.
267.TP 233.TP
268.B \-v --verbose 234.B \-v --verbose
269Verbose mode -- show the compression ratio for each file processed. 235Verbose mode -- show the compression ratio for each file processed.
@@ -273,147 +239,199 @@ information which is primarily of interest for diagnostic purposes.
273.B \-L --license -V --version 239.B \-L --license -V --version
274Display the software version, license terms and conditions. 240Display the software version, license terms and conditions.
275.TP 241.TP
276.B \-1 to \-9 242.B \-1 to \-9
277Set the block size to 100 k, 200 k .. 900 k when 243Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
278compressing. Has no effect when decompressing. 244effect when decompressing. See MEMORY MANAGEMENT below.
279See MEMORY MANAGEMENT above.
280.TP 245.TP
281.B \--repetitive-fast 246.B \--
282.I bzip2 247Treats all subsequent arguments as file names, even if they start
283injects some small pseudo-random variations 248with a dash. This is so you can handle files with names beginning
284into very repetitive blocks to limit 249with a dash, for example: bzip2 \-- \-myfilename.
285worst-case performance during compression. 250.TP
286If sorting runs into difficulties, the block 251.B \--repetitive-fast --repetitive-best
287is randomised, and sorting is restarted. 252These flags are redundant in versions 0.9.5 and above. They provided
288Very roughly, 253some coarse control over the behaviour of the sorting algorithm in
254earlier versions, which was sometimes useful. 0.9.5 and above have an
255improved algorithm which renders these flags irrelevant.
256
257.SH MEMORY MANAGEMENT
258.I bzip2
259compresses large files in blocks. The block size affects
260both the compression ratio achieved, and the amount of memory needed for
261compression and decompression. The flags \-1 through \-9
262specify the block size to be 100,000 bytes through 900,000 bytes (the
263default) respectively. At decompression time, the block size used for
264compression is read from the header of the compressed file, and
265.I bunzip2
266then allocates itself just enough memory to decompress
267the file. Since block sizes are stored in compressed files, it follows
268that the flags \-1 to \-9 are irrelevant to and so ignored
269during decompression.
270
271Compression and decompression requirements,
272in bytes, can be estimated as:
273
274 Compression: 400k + ( 8 x block size )
275
276 Decompression: 100k + ( 4 x block size ), or
277 100k + ( 2.5 x block size )
278
279Larger block sizes give rapidly diminishing marginal returns. Most of
280the compression comes from the first two or three hundred k of block
281size, a fact worth bearing in mind when using
289.I bzip2 282.I bzip2
290persists for three times as long as a well-behaved input 283on small machines.
291would take before resorting to randomisation. 284It is also important to appreciate that the decompression memory
292This flag makes it give up much sooner. 285requirement is set at compression time by the choice of block size.
293 286
294.TP 287For files compressed with the default 900k block size,
295.B \--repetitive-best 288.I bunzip2
296Opposite of \--repetitive-fast; try a lot harder before 289will require about 3700 kbytes to decompress. To support decompression
297resorting to randomisation. 290of any file on a 4 megabyte machine,
291.I bunzip2
292has an option to
293decompress using approximately half this amount of memory, about 2300
294kbytes. Decompression speed is also halved, so you should use this
295option only where necessary. The relevant flag is -s.
296
297In general, try and use the largest block size memory constraints allow,
298since that maximises the compression achieved. Compression and
299decompression speed are virtually unaffected by block size.
300
301Another significant point applies to files which fit in a single block
302-- that means most files you'd encounter using a large block size. The
303amount of real memory touched is proportional to the size of the file,
304since the file is smaller than a block. For example, compressing a file
30520,000 bytes long with the flag -9 will cause the compressor to
306allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
307kbytes of it. Similarly, the decompressor will allocate 3700k but only
308touch 100k + 20000 * 4 = 180 kbytes.
309
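The memory estimates in the new MEMORY MANAGEMENT text can be checked with a few lines of arithmetic. The sketch below uses the 0.9.5 figure of 8 bytes per block byte for compression (the removed 0.9.0 text used 7) and assumes, as the man page's own numbers imply, that "k" means 1000 bytes. The function names are hypothetical.

```python
# Sketch of the memory estimates from the 0.9.5 side of the diff.
# Assumption: "k" is 1000 bytes, so flag -9 means a 900,000-byte block.
K = 1000


def compress_bytes(block_size: int) -> int:
    """Compression: 400k + (8 x block size)."""
    return 400 * K + 8 * block_size


def decompress_bytes(block_size: int, small: bool = False) -> int:
    """Decompression: 100k + (4 x block size), or 2.5 x with -s."""
    per_block = 5 * block_size // 2 if small else 4 * block_size
    return 100 * K + per_block


# Maximum usage with the default 900k block size.
print(compress_bytes(900 * K) // K, "k to compress")
print(decompress_bytes(900 * K) // K, "k to decompress")
print(decompress_bytes(900 * K, small=True) // K, "k to decompress with -s")

# Memory actually touched for the 20,000-byte file in the worked example.
print(compress_bytes(20_000) // K, "k touched when compressing")
print(decompress_bytes(20_000) // K, "k touched when decompressing")
```

These values agree with the worked example in the text (7600k, 3700k, 560k and 180k) and with the -9 row of the table that follows.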
310Here is a table which summarises the maximum memory usage for different
311block sizes. Also recorded is the total compressed size for 14 files of
312the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
313column gives some feel for how compression varies with block size.
314These figures tend to understate the advantage of larger block sizes for
315larger files, since the Corpus is dominated by smaller files.
316
317 Compress Decompress Decompress Corpus
318 Flag usage usage -s usage Size
319
320 -1 1200k 500k 350k 914704
321 -2 2000k 900k 600k 877703
322 -3 2800k 1300k 850k 860338
323 -4 3600k 1700k 1100k 846899
324 -5 4400k 2100k 1350k 845160
325 -6 5200k 2500k 1600k 838626
326 -7 6100k 2900k 1850k 834096
327 -8 6800k 3300k 2100k 828642
328 -9 7600k 3700k 2350k 828642
298 329
299.SH RECOVERING DATA FROM DAMAGED FILES 330.SH RECOVERING DATA FROM DAMAGED FILES
300.I bzip2 331.I bzip2
301compresses files in blocks, usually 900kbytes long. 332compresses files in blocks, usually 900kbytes long. Each
302Each block is handled independently. If a media or 333block is handled independently. If a media or transmission error causes
303transmission error causes a multi-block .bz2 334a multi-block .bz2
304file to become damaged, 335file to become damaged, it may be possible to
305it may be possible to recover data from the undamaged blocks 336recover data from the undamaged blocks in the file.
306in the file. 337
307 338The compressed representation of each block is delimited by a 48-bit
308The compressed representation of each block is delimited by 339pattern, which makes it possible to find the block boundaries with
309a 48-bit pattern, which makes it possible to find the block 340reasonable certainty. Each block also carries its own 32-bit CRC, so
310boundaries with reasonable certainty. Each block also carries 341damaged blocks can be distinguished from undamaged ones.
311its own 32-bit CRC, so damaged blocks can be
312distinguished from undamaged ones.
313 342
314.I bzip2recover 343.I bzip2recover
315is a simple program whose purpose is to search for 344is a simple program whose purpose is to search for
316blocks in .bz2 files, and write each block out into 345blocks in .bz2 files, and write each block out into its own .bz2
317its own .bz2 file. You can then use 346file. You can then use
318.I bzip2 -t 347.I bzip2
319to test the integrity of the resulting files, 348\-t
320and decompress those which are undamaged. 349to test the
350integrity of the resulting files, and decompress those which are
351undamaged.
321 352
322.I bzip2recover 353.I bzip2recover
323takes a single argument, the name of the damaged file, 354takes a single argument, the name of the damaged file,
324and writes a number of files "rec0001file.bz2", "rec0002file.bz2", 355and writes a number of files "rec0001file.bz2",
325etc, containing the extracted blocks. The output filenames 356"rec0002file.bz2", etc, containing the extracted blocks.
326are designed so that the use of wildcards in subsequent processing 357The output filenames are designed so that the use of
327-- for example, "bzip2 -dc rec*file.bz2 > recovered_data" -- 358wildcards in subsequent processing -- for example,
328lists the files in the "right" order. 359"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
360the correct order.
329 361
330.I bzip2recover 362.I bzip2recover
331should be of most use dealing with large .bz2 files, as 363should be of most use dealing with large .bz2
332these will contain many blocks. It is clearly futile to 364files, as these will contain many blocks. It is clearly
333use it on damaged single-block files, since a damaged 365futile to use it on damaged single-block files, since a
334block cannot be recovered. If you wish to minimise 366damaged block cannot be recovered. If you wish to minimise
335any potential data loss through media or transmission 367any potential data loss through media or transmission errors,
336errors, you might consider compressing with a smaller 368you might consider compressing with a smaller
337block size. 369block size.
338 370
339.SH PERFORMANCE NOTES 371.SH PERFORMANCE NOTES
340The sorting phase of compression gathers together similar strings 372The sorting phase of compression gathers together similar strings in the
341in the file. Because of this, files containing very long 373file. Because of this, files containing very long runs of repeated
342runs of repeated symbols, like "aabaabaabaab ..." (repeated 374symbols, like "aabaabaabaab ..." (repeated several hundred times) may
343several hundred times) may compress extraordinarily slowly. 375compress more slowly than normal. Versions 0.9.5 and above fare much
344You can use the 376better than previous versions in this respect. The ratio between
345\-vvvvv 377worst-case and average-case compression time is in the region of 10:1.
346option to monitor progress in great detail, if you want. 378For previous versions, this figure was more like 100:1. You can use the
347Decompression speed is unaffected. 379\-vvvv option to monitor progress in great detail, if you want.
348 380
349Such pathological cases 381Decompression speed is unaffected by these phenomena.
350seem rare in practice, appearing mostly in artificially-constructed
351test files, and in low-level disk images. It may be inadvisable to
352use
353.I bzip2
354to compress the latter.
355If you do get a file which causes severe slowness in compression,
356try making the block size as small as possible, with flag \-1.
357 382
358.I bzip2 383.I bzip2
359usually allocates several megabytes of memory to operate in, 384usually allocates several megabytes of memory to operate
360and then charges all over it in a fairly random fashion. This 385in, and then charges all over it in a fairly random fashion. This means
361means that performance, both for compressing and decompressing, 386that performance, both for compressing and decompressing, is largely
362is largely determined by the speed 387determined by the speed at which your machine can service cache misses.
363at which your machine can service cache misses. 388Because of this, small changes to the code to reduce the miss rate have
364Because of this, small changes 389been observed to give disproportionately large performance improvements.
365to the code to reduce the miss rate have been observed to give
366disproportionately large performance improvements.
367I imagine 390I imagine
368.I bzip2 391.I bzip2
369will perform best on machines with very large caches. 392will perform best on machines with very large caches.
370 393
371.SH CAVEATS 394.SH CAVEATS
372I/O error messages are not as helpful as they could be. 395I/O error messages are not as helpful as they could be.
373.I Bzip2 396.I bzip2
374tries hard to detect I/O errors and exit cleanly, but the 397tries hard to detect I/O errors and exit cleanly, but the details of
375details of what the problem is sometimes seem rather misleading. 398what the problem is sometimes seem rather misleading.
376 399
377This manual page pertains to version 0.9.0 of 400This manual page pertains to version 0.9.5 of
378.I bzip2. 401.I bzip2.
379Compressed data created by this version is entirely forwards and 402Compressed
380backwards compatible with the previous public release, version 0.1pl2, 403data created by this version is entirely forwards and backwards
381but with the following exception: 0.9.0 can correctly decompress 404compatible with the previous public releases, versions 0.1pl2 and 0.9.0,
382multiple concatenated compressed files. 0.1pl2 cannot do this; it 405but with the following exception: 0.9.0 and above can correctly
383will stop after decompressing just the first file in the stream. 406decompress multiple concatenated compressed files. 0.1pl2 cannot do
384 407this; it will stop after decompressing just the first file in the
385Wildcard expansion for Windows 95 and NT 408stream.
386is flaky.
387 409
388.I bzip2recover 410.I bzip2recover
389uses 32-bit integers to represent bit positions in 411uses 32-bit integers to represent bit positions in
390compressed files, so it cannot handle compressed files 412compressed files, so it cannot handle compressed files more than 512
391more than 512 megabytes long. This could easily be fixed. 413megabytes long. This could easily be fixed.
392 414
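The 512 megabyte limit follows directly from counting bit positions in a 32-bit integer. The sketch below assumes the integer is unsigned, which is what the 512 megabyte figure implies; the man page itself only says "32-bit integers". The function name is hypothetical.

```python
# Sketch: largest file whose bit positions fit in a 32-bit counter.
# Assumption: the counter is unsigned, giving 2**32 distinct bit positions.
def max_file_bytes(position_bits: int = 32) -> int:
    """Return the number of bytes addressable by bit position."""
    addressable_bits = 2 ** position_bits
    return addressable_bits // 8  # 8 bits per byte


limit = max_file_bytes()
print(limit, "bytes =", limit // (1024 * 1024), "megabytes")
```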
393.SH AUTHOR 415.SH AUTHOR
394Julian Seward, jseward@acm.org. 416Julian Seward, jseward@acm.org.
395 417
396http://www.muraroa.demon.co.uk 418http://www.muraroa.demon.co.uk
397 419
398The ideas embodied in 420The ideas embodied in
399.I bzip2 421.I bzip2
400are due to (at least) the following people: 422are due to (at least) the following
401Michael Burrows and David Wheeler (for the block sorting 423people: Michael Burrows and David Wheeler (for the block sorting
402transformation), David Wheeler (again, for the Huffman coder), 424transformation), David Wheeler (again, for the Huffman coder), Peter
403Peter Fenwick (for the structured coding model in the original 425Fenwick (for the structured coding model in the original
404.I bzip, 426.I bzip,
405and many refinements), 427and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
406and 428(for the arithmetic coder in the original
407Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic
408coder in the original
409.I bzip). 429.I bzip).
410I am much indebted for their help, support and advice. 430I am much
411See the manual in the source distribution for pointers to 431indebted for their help, support and advice. See the manual in the
412sources of documentation. 432source distribution for pointers to sources of documentation. Christian
413Christian von Roques encouraged me to look for faster 433von Roques encouraged me to look for faster sorting algorithms, so as to
414sorting algorithms, so as to speed up compression. 434speed up compression. Bela Lubkin encouraged me to improve the
415Bela Lubkin encouraged me to improve the worst-case 435worst-case compression performance. Many people sent patches, helped
416compression performance. 436with portability problems, lent machines, gave advice and were generally
417Many people sent patches, helped with portability problems, 437helpful.
418lent machines, gave advice and were generally helpful.
419