aboutsummaryrefslogtreecommitdiff
path: root/manual.texi
diff options
context:
space:
mode:
Diffstat (limited to 'manual.texi')
-rw-r--r--manual.texi869
1 files changed, 459 insertions, 410 deletions
diff --git a/manual.texi b/manual.texi
index 99ce661..e48e656 100644
--- a/manual.texi
+++ b/manual.texi
@@ -2,10 +2,10 @@
2@setfilename bzip2.info 2@setfilename bzip2.info
3 3
4@ignore 4@ignore
5This file documents bzip2 version 0.9.0c, and associated library 5This file documents bzip2 version 0.9.5, and associated library
6libbzip2, written by Julian Seward (jseward@acm.org). 6libbzip2, written by Julian Seward (jseward@acm.org).
7 7
8Copyright (C) 1996-1998 Julian R Seward 8Copyright (C) 1996-1999 Julian R Seward
9 9
10Permission is granted to make and distribute verbatim copies of 10Permission is granted to make and distribute verbatim copies of
11this manual provided the copyright notice and this permission notice 11this manual provided the copyright notice and this permission notice
@@ -30,21 +30,21 @@ END-INFO-DIR-ENTRY
30@titlepage 30@titlepage
31@title bzip2 and libbzip2 31@title bzip2 and libbzip2
32@subtitle a program and library for data compression 32@subtitle a program and library for data compression
33@subtitle copyright (C) 1996-1998 Julian Seward 33@subtitle copyright (C) 1996-1999 Julian Seward
34@subtitle version 0.9.0c of 18 October 1998 34@subtitle version 0.9.5d of 4 September 1999
35@author Julian Seward 35@author Julian Seward
36 36
37@end titlepage 37@end titlepage
38@end iftex
39
40 38
41@parindent 0mm 39@parindent 0mm
42@parskip 2mm 40@parskip 2mm
43 41
42@end iftex
43@node Top, Overview, (dir), (dir)
44 44
45This program, @code{bzip2}, 45This program, @code{bzip2},
46and associated library @code{libbzip2}, are 46and associated library @code{libbzip2}, are
47Copyright (C) 1996-1998 Julian R Seward. All rights reserved. 47Copyright (C) 1996-1999 Julian R Seward. All rights reserved.
48 48
49Redistribution and use in source and binary forms, with or without 49Redistribution and use in source and binary forms, with or without
50modification, are permitted provided that the following conditions 50modification, are permitted provided that the following conditions
@@ -78,13 +78,13 @@ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
78NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 78NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
79SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 79SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
80 80
81Julian Seward, Guildford, Surrey, UK. 81Julian Seward, Cambridge, UK.
82 82
83@code{jseward@@acm.org} 83@code{jseward@@acm.org}
84 84
85@code{http://www.muraroa.demon.co.uk} 85@code{http://www.muraroa.demon.co.uk}
86 86
87@code{bzip2}/@code{libbzip2} version 0.9.0c of 18 October 1998. 87@code{bzip2}/@code{libbzip2} version 0.9.5 of 24 May 1999.
88 88
89PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented 89PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
90algorithms. However, I do not have the resources available to carry out 90algorithms. However, I do not have the resources available to carry out
@@ -124,360 +124,345 @@ ought to be recorded somewhere.
124 124
125This chapter contains a copy of the @code{bzip2} man page, 125This chapter contains a copy of the @code{bzip2} man page,
126and nothing else. 126and nothing else.
127
128@quotation
129
130@unnumberedsubsubsec NAME
131@itemize
132@item @code{bzip2}, @code{bunzip2}
133- a block-sorting file compressor, v0.9.5
134@item @code{bzcat}
135- decompresses files to stdout
136@item @code{bzip2recover}
137- recovers data from damaged bzip2 files
138@end itemize
139
140@unnumberedsubsubsec SYNOPSIS
141@itemize
142@item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ... ]
143@item @code{bunzip2} [ -fkvsVL ] [ filenames ... ]
144@item @code{bzcat} [ -s ] [ filenames ... ]
145@item @code{bzip2recover} filename
146@end itemize
147
148@unnumberedsubsubsec DESCRIPTION
149
150@code{bzip2} compresses files using the Burrows-Wheeler block sorting
151text compression algorithm, and Huffman coding. Compression is
152generally considerably better than that achieved by more conventional
153LZ77/LZ78-based compressors, and approaches the performance of the PPM
154family of statistical compressors.
155
156The command-line options are deliberately very similar to those of GNU
157@code{gzip}, but they are not identical.
158
159@code{bzip2} expects a list of file names to accompany the command-line
160flags. Each file is replaced by a compressed version of itself, with
161the name @code{original_name.bz2}. Each compressed file has the same
162modification date, permissions, and, when possible, ownership as the
163corresponding original, so that these properties can be correctly
164restored at decompression time. File name handling is naive in the
165sense that there is no mechanism for preserving original file names,
166permissions, ownerships or dates in filesystems which lack these
167concepts, or have serious file name length restrictions, such as MS-DOS.
168
169@code{bzip2} and @code{bunzip2} will by default not overwrite existing
170files. If you want this to happen, specify the @code{-f} flag.
171
172If no file names are specified, @code{bzip2} compresses from standard
173input to standard output. In this case, @code{bzip2} will decline to
174write compressed output to a terminal, as this would be entirely
175incomprehensible and therefore pointless.
176
177@code{bunzip2} (or @code{bzip2 -d}) decompresses all
178specified files. Files which were not created by @code{bzip2}
179will be detected and ignored, and a warning issued.
180@code{bzip2} attempts to guess the filename for the decompressed file
181from that of the compressed file as follows:
182@itemize
183@item @code{filename.bz2 } becomes @code{filename}
184@item @code{filename.bz } becomes @code{filename}
185@item @code{filename.tbz2} becomes @code{filename.tar}
186@item @code{filename.tbz } becomes @code{filename.tar}
187@item @code{anyothername } becomes @code{anyothername.out}
188@end itemize
189If the file does not end in one of the recognised endings,
190@code{.bz2}, @code{.bz},
191@code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot
192guess the name of the original file, and uses the original name
193with @code{.out} appended.
194
195As with compression, supplying no
196filenames causes decompression from standard input to standard output.
197
198@code{bunzip2} will correctly decompress a file which is the
199concatenation of two or more compressed files. The result is the
200concatenation of the corresponding uncompressed files. Integrity
201testing (@code{-t}) of concatenated compressed files is also supported.
202
203You can also compress or decompress files to the standard output by
204giving the @code{-c} flag. Multiple files may be compressed and
205decompressed like this. The resulting outputs are fed sequentially to
206stdout. Compression of multiple files in this manner generates a stream
207containing multiple compressed file representations. Such a stream
208can be decompressed correctly only by @code{bzip2} version 0.9.0 or
209later. Earlier versions of @code{bzip2} will stop after decompressing
210the first file in the stream.
211
212@code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to
213the standard output.
214
215@code{bzip2} will read arguments from the environment variables
216@code{BZIP2} and @code{BZIP}, in that order, and will process them
217before any arguments read from the command line. This gives a
218convenient way to supply default arguments.
219
220Compression is always performed, even if the compressed file is slightly
221larger than the original. Files of less than about one hundred bytes
222tend to get larger, since the compression mechanism has a constant
223overhead in the region of 50 bytes. Random data (including the output
224of most file compressors) is coded at about 8.05 bits per byte, giving
225an expansion of around 0.5%.
226
227As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to
228make sure that the decompressed version of a file is identical to the
229original. This guards against corruption of the compressed data, and
230against undetected bugs in @code{bzip2} (hopefully very unlikely). The
231chances of data corruption going undetected is microscopic, about one
232chance in four billion for each file processed. Be aware, though, that
233the check occurs upon decompression, so it can only tell you that
234something is wrong. It can't help you recover the original uncompressed
235data. You can use @code{bzip2recover} to try to recover data from
236damaged files.
237
238Return values: 0 for a normal exit, 1 for environmental problems (file
239not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
240compressed file, 3 for an internal consistency error (eg, bug) which
241caused @code{bzip2} to panic.
242
243
244@unnumberedsubsubsec OPTIONS
245@table @code
246@item -c --stdout
247Compress or decompress to standard output.
248@item -d --decompress
249Force decompression. @code{bzip2}, @code{bunzip2} and @code{bzcat} are
250really the same program, and the decision about what actions to take is
251done on the basis of which name is used. This flag overrides that
252mechanism, and forces bzip2 to decompress.
253@item -z --compress
254The complement to @code{-d}: forces compression, regardless of the
255invokation name.
256@item -t --test
257Check integrity of the specified file(s), but don't decompress them.
258This really performs a trial decompression and throws away the result.
259@item -f --force
260Force overwrite of output files. Normally, @code{bzip2} will not overwrite
261existing output files. Also forces @code{bzip2} to break hard links
262to files, which it otherwise wouldn't do.
263@item -k --keep
264Keep (don't delete) input files during compression
265or decompression.
266@item -s --small
267Reduce memory usage, for compression, decompression and testing. Files
268are decompressed and tested using a modified algorithm which only
269requires 2.5 bytes per block byte. This means any file can be
270decompressed in 2300k of memory, albeit at about half the normal speed.
271
272During compression, @code{-s} selects a block size of 200k, which limits
273memory use to around the same figure, at the expense of your compression
274ratio. In short, if your machine is low on memory (8 megabytes or
275less), use -s for everything. See MEMORY MANAGEMENT below.
276@item -q --quiet
277Suppress non-essential warning messages. Messages pertaining to
278I/O errors and other critical events will not be suppressed.
279@item -v --verbose
280Verbose mode -- show the compression ratio for each file processed.
281Further @code{-v}'s increase the verbosity level, spewing out lots of
282information which is primarily of interest for diagnostic purposes.
283@item -L --license -V --version
284Display the software version, license terms and conditions.
285@item -1 to -9
286Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
287effect when decompressing. See MEMORY MANAGEMENT below.
288@item --
289Treats all subsequent arguments as file names, even if they start
290with a dash. This is so you can handle files with names beginning
291with a dash, for example: @code{bzip2 -- -myfilename}.
292@item --repetitive-fast
293@item --repetitive-best
294These flags are redundant in versions 0.9.5 and above. They provided
295some coarse control over the behaviour of the sorting algorithm in
296earlier versions, which was sometimes useful. 0.9.5 and above have an
297improved algorithm which renders these flags irrelevant.
298@end table
299
300
301@unnumberedsubsubsec MEMORY MANAGEMENT
302
303@code{bzip2} compresses large files in blocks. The block size affects
304both the compression ratio achieved, and the amount of memory needed for
305compression and decompression. The flags @code{-1} through @code{-9}
306specify the block size to be 100,000 bytes through 900,000 bytes (the
307default) respectively. At decompression time, the block size used for
308compression is read from the header of the compressed file, and
309@code{bunzip2} then allocates itself just enough memory to decompress
310the file. Since block sizes are stored in compressed files, it follows
311that the flags @code{-1} to @code{-9} are irrelevant to and so ignored
312during decompression.
313
314Compression and decompression requirements, in bytes, can be estimated
315as:
127@example 316@example
128NAME 317 Compression: 400k + ( 8 x block size )
129 bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 318
130 bzcat - decompresses files to stdout 319 Decompression: 100k + ( 4 x block size ), or
131 bzip2recover - recovers data from damaged bzip2 files 320 100k + ( 2.5 x block size )
132 321@end example
133 322Larger block sizes give rapidly diminishing marginal returns. Most of
134SYNOPSIS 323the compression comes from the first two or three hundred k of block
135 bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ] 324size, a fact worth bearing in mind when using @code{bzip2} on small machines.
136 bunzip2 [ -fkvsVL ] [ filenames ... ] 325It is also important to appreciate that the decompression memory
137 bzcat [ -s ] [ filenames ... ] 326requirement is set at compression time by the choice of block size.
138 bzip2recover filename 327
139 328For files compressed with the default 900k block size, @code{bunzip2}
140 329will require about 3700 kbytes to decompress. To support decompression
141DESCRIPTION 330of any file on a 4 megabyte machine, @code{bunzip2} has an option to
142 bzip2 compresses files using the Burrows-Wheeler block- 331decompress using approximately half this amount of memory, about 2300
143 sorting text compression algorithm, and Huffman coding. 332kbytes. Decompression speed is also halved, so you should use this
144 Compression is generally considerably better than that 333option only where necessary. The relevant flag is @code{-s}.
145 achieved by more conventional LZ77/LZ78-based compressors, 334
146 and approaches the performance of the PPM family of sta- 335In general, try and use the largest block size memory constraints allow,
147 tistical compressors. 336since that maximises the compression achieved. Compression and
148 337decompression speed are virtually unaffected by block size.
149 The command-line options are deliberately very similar to 338
150 those of GNU Gzip, but they are not identical. 339Another significant point applies to files which fit in a single block
151 340-- that means most files you'd encounter using a large block size. The
152 bzip2 expects a list of file names to accompany the com- 341amount of real memory touched is proportional to the size of the file,
153 mand-line flags. Each file is replaced by a compressed 342since the file is smaller than a block. For example, compressing a file
154 version of itself, with the name "original_name.bz2". 34320,000 bytes long with the flag @code{-9} will cause the compressor to
155 Each compressed file has the same modification date and 344allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
156 permissions as the corresponding original, so that these 345kbytes of it. Similarly, the decompressor will allocate 3700k but only
157 properties can be correctly restored at decompression 346touch 100k + 20000 * 4 = 180 kbytes.
158 time. File name handling is naive in the sense that there 347
159 is no mechanism for preserving original file names, per- 348Here is a table which summarises the maximum memory usage for different
160 missions and dates in filesystems which lack these con- 349block sizes. Also recorded is the total compressed size for 14 files of
161 cepts, or have serious file name length restrictions, such 350the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
162 as MS-DOS. 351column gives some feel for how compression varies with block size.
163 352These figures tend to understate the advantage of larger block sizes for
164 bzip2 and bunzip2 will by default not overwrite existing 353larger files, since the Corpus is dominated by smaller files.
165 files; if you want this to happen, specify the -f flag. 354@example
166 355 Compress Decompress Decompress Corpus
167 If no file names are specified, bzip2 compresses from 356 Flag usage usage -s usage Size
168 standard input to standard output. In this case, bzip2 357
169 will decline to write compressed output to a terminal, as 358 -1 1200k 500k 350k 914704
170 this would be entirely incomprehensible and therefore 359 -2 2000k 900k 600k 877703
171 pointless. 360 -3 2800k 1300k 850k 860338
172 361 -4 3600k 1700k 1100k 846899
173 bunzip2 (or bzip2 -d ) decompresses and restores all spec- 362 -5 4400k 2100k 1350k 845160
174 ified files whose names end in ".bz2". Files without this 363 -6 5200k 2500k 1600k 838626
175 suffix are ignored. Again, supplying no filenames causes 364 -7 6100k 2900k 1850k 834096
176 decompression from standard input to standard output. 365 -8 6800k 3300k 2100k 828642
177 366 -9 7600k 3700k 2350k 828642
178 bunzip2 will correctly decompress a file which is the con- 367@end example
179 catenation of two or more compressed files. The result is 368
180 the concatenation of the corresponding uncompressed files. 369@unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES
181 Integrity testing (-t) of concatenated compressed files is
182 also supported.
183
184 You can also compress or decompress files to the standard
185 output by giving the -c flag. Multiple files may be com-
186 pressed and decompressed like this. The resulting outputs
187 are fed sequentially to stdout. Compression of multiple
188 files in this manner generates a stream containing multi-
189 ple compressed file representations. Such a stream can be
190 decompressed correctly only by bzip2 version 0.9.0 or
191 later. Earlier versions of bzip2 will stop after decom-
192 pressing the first file in the stream.
193
194 bzcat (or bzip2 -dc ) decompresses all specified files to
195 the standard output.
196
197 Compression is always performed, even if the compressed
198 file is slightly larger than the original. Files of less
199 than about one hundred bytes tend to get larger, since the
200 compression mechanism has a constant overhead in the
201 region of 50 bytes. Random data (including the output of
202 most file compressors) is coded at about 8.05 bits per
203 byte, giving an expansion of around 0.5%.
204
205 As a self-check for your protection, bzip2 uses 32-bit
206 CRCs to make sure that the decompressed version of a file
207 is identical to the original. This guards against corrup-
208 tion of the compressed data, and against undetected bugs
209 in bzip2 (hopefully very unlikely). The chances of data
210 corruption going undetected is microscopic, about one
211 chance in four billion for each file processed. Be aware,
212 though, that the check occurs upon decompression, so it
213 can only tell you that that something is wrong. It can't
214 help you recover the original uncompressed data. You can
215 use bzip2recover to try to recover data from damaged
216 files.
217
218 Return values: 0 for a normal exit, 1 for environmental
219 problems (file not found, invalid flags, I/O errors, &c),
220 2 to indicate a corrupt compressed file, 3 for an internal
221 consistency error (eg, bug) which caused bzip2 to panic.
222
223
224MEMORY MANAGEMENT
225 Bzip2 compresses large files in blocks. The block size
226 affects both the compression ratio achieved, and the
227 amount of memory needed both for compression and decom-
228 pression. The flags -1 through -9 specify the block size
229 to be 100,000 bytes through 900,000 bytes (the default)
230 respectively. At decompression-time, the block size used
231 for compression is read from the header of the compressed
232 file, and bunzip2 then allocates itself just enough memory
233 to decompress the file. Since block sizes are stored in
234 compressed files, it follows that the flags -1 to -9 are
235 irrelevant to and so ignored during decompression.
236
237 Compression and decompression requirements, in bytes, can
238 be estimated as:
239
240 Compression: 400k + ( 7 x block size )
241
242 Decompression: 100k + ( 4 x block size ), or
243 100k + ( 2.5 x block size )
244
245 Larger block sizes give rapidly diminishing marginal
246 returns; most of the compression comes from the first two
247 or three hundred k of block size, a fact worth bearing in
248 mind when using bzip2 on small machines. It is also
249 important to appreciate that the decompression memory
250 requirement is set at compression-time by the choice of
251 block size.
252 370
253 For files compressed with the default 900k block size, 371@code{bzip2} compresses files in blocks, usually 900kbytes long. Each
254 bunzip2 will require about 3700 kbytes to decompress. To 372block is handled independently. If a media or transmission error causes
255 support decompression of any file on a 4 megabyte machine, 373a multi-block @code{.bz2} file to become damaged, it may be possible to
256 bunzip2 has an option to decompress using approximately 374recover data from the undamaged blocks in the file.
257 half this amount of memory, about 2300 kbytes. Decompres- 375
258 sion speed is also halved, so you should use this option 376The compressed representation of each block is delimited by a 48-bit
259 only where necessary. The relevant flag is -s. 377pattern, which makes it possible to find the block boundaries with
260 378reasonable certainty. Each block also carries its own 32-bit CRC, so
261 In general, try and use the largest block size memory con- 379damaged blocks can be distinguished from undamaged ones.
262 straints allow, since that maximises the compression 380
263 achieved. Compression and decompression speed are virtu- 381@code{bzip2recover} is a simple program whose purpose is to search for
264 ally unaffected by block size. 382blocks in @code{.bz2} files, and write each block out into its own
265 383@code{.bz2} file. You can then use @code{bzip2 -t} to test the
266 Another significant point applies to files which fit in a 384integrity of the resulting files, and decompress those which are
267 single block -- that means most files you'd encounter 385undamaged.
268 using a large block size. The amount of real memory 386
269 touched is proportional to the size of the file, since the 387@code{bzip2recover}
270 file is smaller than a block. For example, compressing a 388takes a single argument, the name of the damaged file,
271 file 20,000 bytes long with the flag -9 will cause the 389and writes a number of files @code{rec0001file.bz2},
272 compressor to allocate around 6700k of memory, but only 390 @code{rec0002file.bz2}, etc, containing the extracted blocks.
273 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
274 decompressor will allocate 3700k but only touch 100k +
275 20000 * 4 = 180 kbytes.
276
277 Here is a table which summarises the maximum memory usage
278 for different block sizes. Also recorded is the total
279 compressed size for 14 files of the Calgary Text Compres-
280 sion Corpus totalling 3,141,622 bytes. This column gives
281 some feel for how compression varies with block size.
282 These figures tend to understate the advantage of larger
283 block sizes for larger files, since the Corpus is domi-
284 nated by smaller files.
285
286 Compress Decompress Decompress Corpus
287 Flag usage usage -s usage Size
288
289 -1 1100k 500k 350k 914704
290 -2 1800k 900k 600k 877703
291 -3 2500k 1300k 850k 860338
292 -4 3200k 1700k 1100k 846899
293 -5 3900k 2100k 1350k 845160
294 -6 4600k 2500k 1600k 838626
295 -7 5400k 2900k 1850k 834096
296 -8 6000k 3300k 2100k 828642
297 -9 6700k 3700k 2350k 828642
298
299
300OPTIONS
301 -c --stdout
302 Compress or decompress to standard output. -c will
303 decompress multiple files to stdout, but will only
304 compress a single file to stdout.
305
306 -d --decompress
307 Force decompression. bzip2, bunzip2 and bzcat are
308 really the same program, and the decision about
309 what actions to take is done on the basis of which
310 name is used. This flag overrides that mechanism,
311 and forces bzip2 to decompress.
312
313 -z --compress
314 The complement to -d: forces compression, regard-
315 less of the invokation name.
316
317 -t --test
318 Check integrity of the specified file(s), but don't
319 decompress them. This really performs a trial
320 decompression and throws away the result.
321
322 -f --force
323 Force overwrite of output files. Normally, bzip2
324 will not overwrite existing output files.
325
326 -k --keep
327 Keep (don't delete) input files during compression
328 or decompression.
329
330 -s --small
331 Reduce memory usage, for compression, decompression
332 and testing. Files are decompressed and tested
333 using a modified algorithm which only requires 2.5
334 bytes per block byte. This means any file can be
335 decompressed in 2300k of memory, albeit at about
336 half the normal speed.
337
338 During compression, -s selects a block size of
339 200k, which limits memory use to around the same
340 figure, at the expense of your compression ratio.
341 In short, if your machine is low on memory (8
342 megabytes or less), use -s for everything. See
343 MEMORY MANAGEMENT above.
344
345 -v --verbose
346 Verbose mode -- show the compression ratio for each
347 file processed. Further -v's increase the ver-
348 bosity level, spewing out lots of information which
349 is primarily of interest for diagnostic purposes.
350
351 -L --license -V --version
352 Display the software version, license terms and
353 conditions.
354
355 -1 to -9
356 Set the block size to 100 k, 200 k .. 900 k when
357 compressing. Has no effect when decompressing.
358 See MEMORY MANAGEMENT above.
359
360 --repetitive-fast
361 bzip2 injects some small pseudo-random variations
362 into very repetitive blocks to limit worst-case
363 performance during compression. If sorting runs
364 into difficulties, the block is randomised, and
365 sorting is restarted. Very roughly, bzip2 persists
366 for three times as long as a well-behaved input
367 would take before resorting to randomisation. This
368 flag makes it give up much sooner.
369
370 --repetitive-best
371 Opposite of --repetitive-fast; try a lot harder
372 before resorting to randomisation.
373
374
375RECOVERING DATA FROM DAMAGED FILES
376 bzip2 compresses files in blocks, usually 900kbytes long.
377 Each block is handled independently. If a media or trans-
378 mission error causes a multi-block .bz2 file to become
379 damaged, it may be possible to recover data from the
380 undamaged blocks in the file.
381
382 The compressed representation of each block is delimited
383 by a 48-bit pattern, which makes it possible to find the
384 block boundaries with reasonable certainty. Each block
385 also carries its own 32-bit CRC, so damaged blocks can be
386 distinguished from undamaged ones.
387
388 bzip2recover is a simple program whose purpose is to
389 search for blocks in .bz2 files, and write each block out
390 into its own .bz2 file. You can then use bzip2 -t to test
391 the integrity of the resulting files, and decompress those
392 which are undamaged.
393
394 bzip2recover takes a single argument, the name of the dam-
395 aged file, and writes a number of files "rec0001file.bz2",
396 "rec0002file.bz2", etc, containing the extracted blocks.
397 The output filenames are designed so that the use of 391 The output filenames are designed so that the use of
398 wildcards in subsequent processing -- for example, "bzip2 392 wildcards in subsequent processing -- for example,
399 -dc rec*file.bz2 > recovered_data" -- lists the files in 393@code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in
400 the "right" order. 394 the correct order.
401 395
402 bzip2recover should be of most use dealing with large .bz2 396@code{bzip2recover} should be of most use dealing with large @code{.bz2}
403 files, as these will contain many blocks. It is clearly 397 files, as these will contain many blocks. It is clearly
404 futile to use it on damaged single-block files, since a 398 futile to use it on damaged single-block files, since a
405 damaged block cannot be recovered. If you wish to min- 399 damaged block cannot be recovered. If you wish to minimise
406 imise any potential data loss through media or transmis- 400any potential data loss through media or transmission errors,
407 sion errors, you might consider compressing with a smaller 401you might consider compressing with a smaller
408 block size. 402 block size.
409 403
410 404
411PERFORMANCE NOTES 405@unnumberedsubsubsec PERFORMANCE NOTES
412 The sorting phase of compression gathers together similar 406
413 strings in the file. Because of this, files containing 407The sorting phase of compression gathers together similar strings in the
414 very long runs of repeated symbols, like "aabaabaabaab 408file. Because of this, files containing very long runs of repeated
415 ..." (repeated several hundred times) may compress 409symbols, like "aabaabaabaab ..." (repeated several hundred times) may
416 extraordinarily slowly. You can use the -vvvvv option to 410compress more slowly than normal. Versions 0.9.5 and above fare much
417 monitor progress in great detail, if you want. Decompres- 411better than previous versions in this respect. The ratio between
418 sion speed is unaffected. 412worst-case and average-case compression time is in the region of 10:1.
419 413For previous versions, this figure was more like 100:1. You can use the
420 Such pathological cases seem rare in practice, appearing 414@code{-vvvv} option to monitor progress in great detail, if you want.
421 mostly in artificially-constructed test files, and in low- 415
422 level disk images. It may be inadvisable to use bzip2 to 416Decompression speed is unaffected by these phenomena.
423 compress the latter. If you do get a file which causes 417
424 severe slowness in compression, try making the block size 418@code{bzip2} usually allocates several megabytes of memory to operate
425 as small as possible, with flag -1. 419in, and then charges all over it in a fairly random fashion. This means
426 420that performance, both for compressing and decompressing, is largely
427 bzip2 usually allocates several megabytes of memory to 421determined by the speed at which your machine can service cache misses.
428 operate in, and then charges all over it in a fairly ran- 422Because of this, small changes to the code to reduce the miss rate have
429 dom fashion. This means that performance, both for com- 423been observed to give disproportionately large performance improvements.
430 pressing and decompressing, is largely determined by the 424I imagine @code{bzip2} will perform best on machines with very large
431 speed at which your machine can service cache misses. 425caches.
432 Because of this, small changes to the code to reduce the
433 miss rate have been observed to give disproportionately
434 large performance improvements. I imagine bzip2 will per-
435 form best on machines with very large caches.
436
437
438CAVEATS
439 I/O error messages are not as helpful as they could be.
440 Bzip2 tries hard to detect I/O errors and exit cleanly,
441 but the details of what the problem is sometimes seem
442 rather misleading.
443
444 This manual page pertains to version 0.9.0 of bzip2. Com-
445 pressed data created by this version is entirely forwards
446 and backwards compatible with the previous public release,
447 version 0.1pl2, but with the following exception: 0.9.0
448 can correctly decompress multiple concatenated compressed
449 files. 0.1pl2 cannot do this; it will stop after decom-
450 pressing just the first file in the stream.
451
452 Wildcard expansion for Windows 95 and NT is flaky.
453
454 bzip2recover uses 32-bit integers to represent bit posi-
455 tions in compressed files, so it cannot handle compressed
456 files more than 512 megabytes long. This could easily be
457 fixed.
458
459
460AUTHOR
461 Julian Seward, jseward@@acm.org.
462
463 The ideas embodied in bzip2 are due to (at least) the fol-
464 lowing people: Michael Burrows and David Wheeler (for the
465 block sorting transformation), David Wheeler (again, for
466 the Huffman coder), Peter Fenwick (for the structured cod-
467 ing model in the original bzip, and many refinements), and
468 Alistair Moffat, Radford Neal and Ian Witten (for the
469 arithmetic coder in the original bzip). I am much
470 indebted for their help, support and advice. See the man-
471 ual in the source distribution for pointers to sources of
472 documentation. Christian von Roques encouraged me to look
473 for faster sorting algorithms, so as to speed up compres-
474 sion. Bela Lubkin encouraged me to improve the worst-case
475 compression performance. Many people sent patches, helped
476 with portability problems, lent machines, gave advice and
477 were generally helpful.
478@end example
479 426
480 427
428@unnumberedsubsubsec CAVEATS
429
430I/O error messages are not as helpful as they could be. @code{bzip2}
431tries hard to detect I/O errors and exit cleanly, but the details of
432what the problem is sometimes seem rather misleading.
433
434This manual page pertains to version 0.9.5 of @code{bzip2}. Compressed
435data created by this version is entirely forwards and backwards
436compatible with the previous public releases, versions 0.1pl2 and 0.9.0,
437but with the following exception: 0.9.0 and above can correctly
438decompress multiple concatenated compressed files. 0.1pl2 cannot do
439this; it will stop after decompressing just the first file in the
440stream.
441
442@code{bzip2recover} uses 32-bit integers to represent bit positions in
443compressed files, so it cannot handle compressed files more than 512
444megabytes long. This could easily be fixed.
445
446
447@unnumberedsubsubsec AUTHOR
448Julian Seward, @code{jseward@@acm.org}.
449
450The ideas embodied in @code{bzip2} are due to (at least) the following
451people: Michael Burrows and David Wheeler (for the block sorting
452transformation), David Wheeler (again, for the Huffman coder), Peter
453Fenwick (for the structured coding model in the original @code{bzip},
454and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
455(for the arithmetic coder in the original @code{bzip}). I am much
456indebted for their help, support and advice. See the manual in the
457source distribution for pointers to sources of documentation. Christian
458von Roques encouraged me to look for faster sorting algorithms, so as to
459speed up compression. Bela Lubkin encouraged me to improve the
460worst-case compression performance. Many people sent patches, helped
461with portability problems, lent machines, gave advice and were generally
462helpful.
463
464@end quotation
465
481 466
482 467
483 468
@@ -579,7 +564,7 @@ give better @code{zlib} compatibility. These functions are
579@code{bzerror} and @code{bzlibVersion}. You may find these functions 564@code{bzerror} and @code{bzlibVersion}. You may find these functions
580more convenient for simple file reading and writing, than those in the 565more convenient for simple file reading and writing, than those in the
581high-level interface. These functions are not (yet) officially part of 566high-level interface. These functions are not (yet) officially part of
582the library, and are not further documented here. If they break, you 567the library, and are minimally documented here. If they break, you
583get to keep all the pieces. I hope to document them properly when time 568get to keep all the pieces. I hope to document them properly when time
584permits. 569permits.
585 570
@@ -748,25 +733,31 @@ monitoring/debugging output. If the library has been compiled with
748setting. 733setting.
749 734
750Parameter @code{workFactor} controls how the compression phase behaves 735Parameter @code{workFactor} controls how the compression phase behaves
751when presented with worst case, highly repetitive, input data. 736when presented with worst case, highly repetitive, input data. If
752If compression runs into difficulties caused by repetitive data, 737compression runs into difficulties caused by repetitive data, the
753some pseudo-random variations are inserted into the block, and 738library switches from the standard sorting algorithm to a fallback
754compression is restarted. Lower values of @code{workFactor} 739algorithm. The fallback is slower than the standard algorithm by
755reduce the tolerance of compression to repetitive data. 740perhaps a factor of three, but always behaves reasonably, no matter how
756You should set this parameter carefully; too low, and 741bad the input.
757compression ratio suffers, too high, and your average-to-worst 742
758case compression times can become very large. 743Lower values of @code{workFactor} reduce the amount of effort the
759The default value of 30 744standard algorithm will expend before resorting to the fallback. You
760gives reasonable behaviour over a wide range of circumstances. 745should set this parameter carefully; too low, and many inputs will be
761 746handled by the fallback algorithm and so compress rather slowly, too
762Allowable values range from 0 to 250 inclusive. 0 is a special 747high, and your average-to-worst case compression times can become very
763case, equivalent to using the default value of 30. 748large. The default value of 30 gives reasonable behaviour over a wide
764 749range of circumstances.
765Note that the randomisation process is entirely transparent. 750
766If the library decides to randomise and restart compression on a 751Allowable values range from 0 to 250 inclusive. 0 is a special case,
767block, it does so without comment. Randomised blocks are 752equivalent to using the default value of 30.
768automatically de-randomised during decompression, so data 753
769integrity is never compromised. 754Note that the compressed output generated is the same regardless of
755whether or not the fallback algorithm is used.
756
757Be aware also that this parameter may disappear entirely in future
758versions of the library. In principle it should be possible to devise a
759good way to automatically choose which algorithm to use. Such a
760mechanism would render the parameter obsolete.
770 761
771Possible return values: 762Possible return values:
772@display 763@display
@@ -1652,6 +1643,48 @@ Possible return values:
1652 1643
1653 1644
1654 1645
1646@section @code{zlib} compatibility functions
1647Yoshioka Tsuneo has contributed some functions to
1648give better @code{zlib} compatibility. These functions are
1649@code{bzopen}, @code{bzread}, @code{bzwrite}, @code{bzflush},
1650@code{bzclose},
1651@code{bzerror} and @code{bzlibVersion}.
1652These functions are not (yet) officially part of
1653the library. If they break, you get to keep all the pieces.
1654Nevertheless, I think they work ok.
1655@example
1656typedef void BZFILE;
1657
1658const char * bzlibVersion ( void );
1659@end example
1660Returns a string indicating the library version.
1661@example
1662BZFILE * bzopen ( const char *path, const char *mode );
1663BZFILE * bzdopen ( int fd, const char *mode );
1664@end example
1665Opens a @code{.bz2} file for reading or writing, using either its name
1666or a pre-existing file descriptor.
1667Analogous to @code{fopen} and @code{fdopen}.
1668@example
1669int bzread ( BZFILE* b, void* buf, int len );
1670int bzwrite ( BZFILE* b, void* buf, int len );
1671@end example
1672Reads/writes data from/to a previously opened @code{BZFILE}.
1673Analogous to @code{fread} and @code{fwrite}.
1674@example
1675int bzflush ( BZFILE* b );
1676void bzclose ( BZFILE* b );
1677@end example
1678Flushes/closes a @code{BZFILE}. @code{bzflush} doesn't actually do
1679anything. Analogous to @code{fflush} and @code{fclose}.
1680
1681@example
1682const char * bzerror ( BZFILE *b, int *errnum )
1683@end example
1684Returns a string describing the more recent error status of
1685@code{b}, and also sets @code{*errnum} to its numerical value.
1686
1687
1655@section Using the library in a @code{stdio}-free environment 1688@section Using the library in a @code{stdio}-free environment
1656 1689
1657@subsection Getting rid of @code{stdio} 1690@subsection Getting rid of @code{stdio}
@@ -1677,14 +1710,14 @@ was compiled with @code{BZ_NO_STDIO} set.
1677 1710
1678For a normal compile, an assertion failure yields the message 1711For a normal compile, an assertion failure yields the message
1679@example 1712@example
1680 bzip2/libbzip2, v0.9.0: internal error number N. 1713 bzip2/libbzip2, v0.9.5: internal error number N.
1681 This is a bug in bzip2/libbzip2, v0.9.0. Please report 1714 This is a bug in bzip2/libbzip2, v0.9.5. Please report
1682 it to me at: jseward@@acm.org. If this happened when 1715 it to me at: jseward@@acm.org. If this happened when
1683 you were using some program which uses libbzip2 as a 1716 you were using some program which uses libbzip2 as a
1684 component, you should also report this bug to the author(s) 1717 component, you should also report this bug to the author(s)
1685 of that program. Please make an effort to report this bug; 1718 of that program. Please make an effort to report this bug;
1686 timely and accurate bug reports eventually lead to higher 1719 timely and accurate bug reports eventually lead to higher
1687 quality software. Thx. Julian Seward, 27 June 1998. 1720 quality software. Thanks. Julian Seward, 24 May 1999.
1688@end example 1721@end example
1689where @code{N} is some error code number. @code{exit(3)} 1722where @code{N} is some error code number. @code{exit(3)}
1690is then called. 1723is then called.
@@ -1721,7 +1754,7 @@ If you can't
1721open the project file for some reason, make a new one, naming these files: 1754open the project file for some reason, make a new one, naming these files:
1722@code{blocksort.c}, @code{bzlib.c}, @code{compress.c}, 1755@code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
1723@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @* 1756@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
1724@code{randtable.c} and @code{libbz2.def}. You might also need 1757@code{randtable.c} and @code{libbz2.def}. You will also need
1725to name the header files @code{bzlib.h} and @code{bzlib_private.h}. 1758to name the header files @code{bzlib.h} and @code{bzlib_private.h}.
1726 1759
1727If you don't use VC++, you may need to define the proprocessor symbol 1760If you don't use VC++, you may need to define the proprocessor symbol
@@ -1730,6 +1763,14 @@ If you don't use VC++, you may need to define the proprocessor symbol
1730Finally, @code{dlltest.c} is a sample program using the DLL. It has a 1763Finally, @code{dlltest.c} is a sample program using the DLL. It has a
1731project file, @code{dlltest.dsp}. 1764project file, @code{dlltest.dsp}.
1732 1765
1766If you just want a makefile for Visual C, have a look at
1767@code{makefile.msc}.
1768
1769Be aware that if you compile @code{bzip2} itself on Win32, you must set
1770@code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file
1771@code{bzip2.c}, before compiling. Otherwise the resulting binary won't
1772work correctly.
1773
1733I haven't tried any of this stuff myself, but it all looks plausible. 1774I haven't tried any of this stuff myself, but it all looks plausible.
1734 1775
1735 1776
@@ -1740,7 +1781,8 @@ These are just some random thoughts of mine. Your mileage may
1740vary. 1781vary.
1741 1782
1742@section Limitations of the compressed file format 1783@section Limitations of the compressed file format
1743@code{bzip2-0.9.0} uses exactly the same file format as the previous 1784@code{bzip2-0.9.5} and @code{0.9.0}
1785use exactly the same file format as the previous
1744version, @code{bzip2-0.1}. This decision was made in the interests of 1786version, @code{bzip2-0.1}. This decision was made in the interests of
1745stability. Creating yet another incompatible compressed file format 1787stability. Creating yet another incompatible compressed file format
1746would create further confusion and disruption for users. 1788would create further confusion and disruption for users.
@@ -1776,12 +1818,11 @@ decompression and, in retrospect, are unnecessary. These are:
1776 @code{bzip2-0.1}'s performance on repetitive data, so 1818 @code{bzip2-0.1}'s performance on repetitive data, so
1777 perhaps it isn't a problem for real inputs. 1819 perhaps it isn't a problem for real inputs.
1778 1820
1779 Probably the best long-term solution 1821 Probably the best long-term solution,
1822 and the one I have incorporated into 0.9.5 and above,
1780 is to use the existing sorting 1823 is to use the existing sorting
1781 algorithm initially, and fall back to a O(N (log N)^2) 1824 algorithm initially, and fall back to a O(N (log N)^2)
1782 algorithm if the standard algorithm gets into difficulties. 1825 algorithm if the standard algorithm gets into difficulties.
1783 This can be done without much difficulty; I made
1784 a prototype implementation of it some months now.
1785@item The compressed file format was never designed to be 1826@item The compressed file format was never designed to be
1786 handled by a library, and I have had to jump though 1827 handled by a library, and I have had to jump though
1787 some hoops to produce an efficient implementation of 1828 some hoops to produce an efficient implementation of
@@ -1797,7 +1838,7 @@ It would be fair to say that the @code{bzip2} format was frozen
1797before I properly and fully understood the performance 1838before I properly and fully understood the performance
1798consequences of doing so. 1839consequences of doing so.
1799 1840
1800Improvements which I have been able to incorporate into 1841Improvements which I was able to incorporate into
18010.9.0, despite using the same file format, are: 18420.9.0, despite using the same file format, are:
1802@itemize @bullet 1843@itemize @bullet
1803@item Single array implementation of the inverse BWT. This 1844@item Single array implementation of the inverse BWT. This
@@ -1808,8 +1849,7 @@ Improvements which I have been able to incorporate into
1808 of values. 1849 of values.
1809@item @code{bzip2-0.9.0} now reads and writes files with @code{fread} 1850@item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
1810 and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}. 1851 and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
1811 Duh! I'm embarrassed at my own moronicness (moronicity?) on this 1852 Duh! Well, you live and learn.
1812 one.
1813 1853
1814@end itemize 1854@end itemize
1815Further ahead, it would be nice 1855Further ahead, it would be nice
@@ -1820,7 +1860,7 @@ require some careful design of compressed file formats.
1820 1860
1821@section Portability issues 1861@section Portability issues
1822After some consideration, I have decided not to use 1862After some consideration, I have decided not to use
1823GNU @code{autoconf} to configure 0.9.0. 1863GNU @code{autoconf} to configure 0.9.5.
1824 1864
1825@code{autoconf}, admirable and wonderful though it is, 1865@code{autoconf}, admirable and wonderful though it is,
1826mainly assists with portability problems between Unix-like 1866mainly assists with portability problems between Unix-like
@@ -1835,8 +1875,10 @@ under Unix straight out-of-the-box, so to speak, especially
1835if you have a version of GNU C available. 1875if you have a version of GNU C available.
1836 1876
1837There are a couple of @code{__inline__} directives in the code. GNU C 1877There are a couple of @code{__inline__} directives in the code. GNU C
1838(@code{gcc}) should be able to handle them. If your compiler doesn't 1878(@code{gcc}) should be able to handle them. If you're not using
1839like them, just @code{#define} @code{__inline__} to be null. One 1879GNU C, your C compiler shouldn't see them at all.
1880If your compiler does, for some reason, see them and doesn't
1881like them, just @code{#define} @code{__inline__} to be @code{/* */}. One
1840easy way to do this is to compile with the flag @code{-D__inline__=}, 1882easy way to do this is to compile with the flag @code{-D__inline__=},
1841which should be understood by most Unix compilers. 1883which should be understood by most Unix compilers.
1842 1884
@@ -1853,6 +1895,11 @@ distribution, please try and link it statically (@code{gcc -s}). This
1853avoids all sorts of library-version issues that others may encounter 1895avoids all sorts of library-version issues that others may encounter
1854later on. 1896later on.
1855 1897
1898If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and
1899@code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling.
1900Otherwise the resulting binary won't work correctly.
1901
1902
1856 1903
1857@section Reporting bugs 1904@section Reporting bugs
1858I tried pretty hard to make sure @code{bzip2} is 1905I tried pretty hard to make sure @code{bzip2} is
@@ -1976,45 +2023,39 @@ A record of the tests I've done.
1976 2023
1977First, some data sets: 2024First, some data sets:
1978@itemize @bullet 2025@itemize @bullet
1979@item B: a directory containing a 6001 files, one for every length in the 2026@item B: a directory containing 6001 files, one for every length in the
1980 range 0 to 6000 bytes. The files contain random lowercase 2027 range 0 to 6000 bytes. The files contain random lowercase
1981 letters. 18.7 megabytes. 2028 letters. 18.7 megabytes.
1982@item H: my home directory tree. Documents, source code, mail files, 2029@item H: my home directory tree. Documents, source code, mail files,
1983 compressed data. H contains B, and also a directory of 2030 compressed data. H contains B, and also a directory of
1984 files designed as boundary cases for the sorting; mostly very 2031 files designed as boundary cases for the sorting; mostly very
1985 repetitive, nasty files. 445 megabytes. 2032 repetitive, nasty files. 565 megabytes.
1986@item A: directory tree holding various applications built from source: 2033@item A: directory tree holding various applications built from source:
1987 @code{egcs-1.0.2}, @code{gcc-2.8.1}, KDE Beta 4, GTK, Octave, etc. 2034 @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc.
1988 827 megabytes. 2035 2200 megabytes.
1989@item P: directory tree holding large amounts of source code (@code{.tar}
1990 files) of the entire GNU distribution, plus a couple of
1991 Linux distributions. 2400 megabytes.
1992@end itemize 2036@end itemize
1993The tests conducted are as follows. Each test means compressing 2037The tests conducted are as follows. Each test means compressing
1994(a copy of) each file in the data set, decompressing it and 2038(a copy of) each file in the data set, decompressing it and
1995comparing it against the original. 2039comparing it against the original.
1996 2040
1997First, a bunch of tests with block sizes, internal buffer 2041First, a bunch of tests with block sizes and internal buffer
1998sizes and randomisation lengths set very small, 2042sizes set very small,
1999to detect any problems with the 2043to detect any problems with the
2000blocking, buffering and randomisation mechanisms. 2044blocking and buffering mechanisms.
2001This required modifying the source code so as to try to 2045This required modifying the source code so as to try to
2002break it. 2046break it.
2003@enumerate 2047@enumerate
2004@item Data set H, with 2048@item Data set H, with
2005 buffer size of 1 byte, and block size of 23 bytes. 2049 buffer size of 1 byte, and block size of 23 bytes.
2006@item Data set B, buffer sizes 1 byte, block size 1 byte. 2050@item Data set B, buffer sizes 1 byte, block size 1 byte.
2007@item As (2) but small-mode decompression (first 1700 files). 2051@item As (2) but small-mode decompression.
2008@item As (2) with block size 2 bytes. 2052@item As (2) with block size 2 bytes.
2009@item As (2) with block size 3 bytes. 2053@item As (2) with block size 3 bytes.
2010@item As (2) with block size 4 bytes. 2054@item As (2) with block size 4 bytes.
2011@item As (2) with block size 5 bytes. 2055@item As (2) with block size 5 bytes.
2012@item As (2) with block size 6 bytes and small-mode decompression. 2056@item As (2) with block size 6 bytes and small-mode decompression.
2013@item H with normal buffer sizes (5000 bytes), normal block 2057@item H with buffer size of 1 byte, but normal block
2014 size (up to 900000 bytes), but with randomisation 2058 size (up to 900000 bytes).
2015 mechanism running intensely (randomising approximately every
2016 third byte).
2017@item As (9) with small-mode decompression.
2018@end enumerate 2059@end enumerate
2019Then some tests with unmodified source code. 2060Then some tests with unmodified source code.
2020@enumerate 2061@enumerate
@@ -2023,18 +2064,26 @@ Then some tests with unmodified source code.
2023@item H, compress with flag @code{-1}. 2064@item H, compress with flag @code{-1}.
2024@item H, compress with flag @code{-s}, decompress with flag @code{-s}. 2065@item H, compress with flag @code{-s}, decompress with flag @code{-s}.
2025@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing, 2066@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
2026 @code{bzip2-0.9.0} decompressing, all settings normal. 2067 @code{bzip2-0.9.5} decompressing, all settings normal.
2027@item Backwards compatibility: H, @code{bzip2-0.9.0} compressing, 2068@item Backwards compatibility: H, @code{bzip2-0.9.5} compressing,
2028 @code{bzip2-0.1pl2} decompressing, all settings normal. 2069 @code{bzip2-0.1pl2} decompressing, all settings normal.
2029@item Bigger tests: A, all settings normal. 2070@item Bigger tests: A, all settings normal.
2030@item P, all settings normal. 2071@item As (7), using the fallback (Sadakane-like) sorting algorithm.
2031@item Misc test: about 100 megabytes of @code{.tar} files with 2072@item As (8), compress with flag @code{-1}, decompress with flag
2032 @code{bzip2} compiled with Purify. 2073 @code{-s}.
2074@item H, using the fallback sorting algorithm.
2075@item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing,
2076 @code{bzip2-0.9.5} decompressing, all settings normal.
2077@item Backwards compatibility: A, @code{bzip2-0.9.5} compressing,
2078 @code{bzip2-0.1pl2} decompressing, all settings normal.
2079@item Misc test: about 400 megabytes of @code{.tar} files with
2080 @code{bzip2} compiled with Checker (a memory access error
2081 detector, like Purify).
2033@item Misc tests to make sure it builds and runs ok on non-Linux/x86 2082@item Misc tests to make sure it builds and runs ok on non-Linux/x86
2034 platforms. 2083 platforms.
2035@end enumerate 2084@end enumerate
2036These tests were conducted on a 205 MHz Cyrix 6x86MX machine, running 2085These tests were conducted on a 225 MHz IDT WinChip machine, running
2037Linux 2.0.32. They represent nearly a week of continuous computation. 2086Linux 2.0.36. They represent nearly a week of continuous computation.
2038All tests completed successfully. 2087All tests completed successfully.
2039 2088
2040 2089
@@ -2063,7 +2112,7 @@ David J. Wheeler
2063 Program bred3.c and accompanying document bred3.ps. 2112 Program bred3.c and accompanying document bred3.ps.
2064 This contains the idea behind the multi-table Huffman 2113 This contains the idea behind the multi-table Huffman
2065 coding scheme. 2114 coding scheme.
2066 ftp://ftp.cl.cam.ac.uk/pub/user/djw3/ 2115 ftp://ftp.cl.cam.ac.uk/users/djw3/
2067 2116
2068Jon L. Bentley and Robert Sedgewick 2117Jon L. Bentley and Robert Sedgewick
2069 "Fast Algorithms for Sorting and Searching Strings" 2118 "Fast Algorithms for Sorting and Searching Strings"