From 977101ad5f833f5c0a574bfeea408e5301a6b052 Mon Sep 17 00:00:00 2001
From: Julian Seward
Date: Sun, 23 Aug 1998 22:13:13 +0200
Subject: bzip2-0.9.0c

---
 bzip2.txt | 292 ++++++++++++++++++++------------------------------------------
 1 file changed, 91 insertions(+), 201 deletions(-)
 (limited to 'bzip2.txt')

diff --git a/bzip2.txt b/bzip2.txt
index aee8e2b..898dfe8 100644
--- a/bzip2.txt
+++ b/bzip2.txt
@@ -1,22 +1,22 @@
-
-
 bzip2(1)                                                 bzip2(1)
 
 
 
 NAME
-       bzip2, bunzip2 - a block-sorting file compressor, v0.1
+       bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
+       bzcat - decompresses files to stdout
        bzip2recover - recovers data from damaged bzip2 files
 
 
 SYNOPSIS
-       bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ]
-       bunzip2 [ -kvsVL ] [ filenames ... ]
+       bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ]
+       bunzip2 [ -fkvsVL ] [ filenames ... ]
+       bzcat [ -s ] [ filenames ... ]
        bzip2recover filename
 
 
 DESCRIPTION
-       Bzip2 compresses files using the Burrows-Wheeler block-
+       bzip2 compresses files using the Burrows-Wheeler block-
        sorting text compression algorithm, and Huffman coding.
        Compression is generally considerably better than that
        achieved by more conventional LZ77/LZ78-based compressors,
@@ -26,7 +26,7 @@ DESCRIPTION
        The command-line options are deliberately very similar to
        those of GNU Gzip, but they are not identical.
 
-       Bzip2 expects a list of file names to accompany the com-
+       bzip2 expects a list of file names to accompany the com-
        mand-line flags. Each file is replaced by a compressed
        version of itself, with the name "original_name.bz2".
        Each compressed file has the same modification date and
@@ -38,8 +38,8 @@ DESCRIPTION
        cepts, or have serious file name length restrictions, such
        as MS-DOS.
 
-       Bzip2 and bunzip2 will not overwrite existing files; if
-       you want this to happen, you should delete them first.
+       bzip2 and bunzip2 will by default not overwrite existing
+       files; if you want this to happen, specify the -f flag.
 
        If no file names are specified, bzip2 compresses from
        standard input to standard output. In this case, bzip2
@@ -47,28 +47,29 @@ DESCRIPTION
        this would be entirely incomprehensible and therefore
        pointless.
 
-       Bunzip2 (or bzip2 -d ) decompresses and restores all spec-
+       bunzip2 (or bzip2 -d ) decompresses and restores all spec-
        ified files whose names end in ".bz2". Files without this
        suffix are ignored. Again, supplying no filenames causes
        decompression from standard input to standard output.
 
-       You can also compress or decompress files to the standard
-       output by giving the -c flag. You can decompress multiple
-       files like this, but you may only compress a single file
-       this way, since it would otherwise be difficult to sepa-
-       rate out the compressed representations of the original
-       files.
-
-
-
-                                1
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
+       bunzip2 will correctly decompress a file which is the con-
+       catenation of two or more compressed files. The result is
+       the concatenation of the corresponding uncompressed files.
+       Integrity testing (-t) of concatenated compressed files is
+       also supported.
 
+       You can also compress or decompress files to the standard
+       output by giving the -c flag. Multiple files may be com-
+       pressed and decompressed like this. The resulting outputs
+       are fed sequentially to stdout. Compression of multiple
+       files in this manner generates a stream containing multi-
+       ple compressed file representations. Such a stream can be
+       decompressed correctly only by bzip2 version 0.9.0 or
+       later. Earlier versions of bzip2 will stop after decom-
+       pressing the first file in the stream.
+
+       bzcat (or bzip2 -dc ) decompresses all specified files to
+       the standard output.
 
        Compression is always performed, even if the compressed
        file is slightly larger than the original. Files of less
@@ -108,13 +109,14 @@ MEMORY MANAGEMENT
        file, and bunzip2 then allocates itself just enough memory
        to decompress the file.
        Since block sizes are stored in
        compressed files, it follows that the flags -1 to -9 are
-       irrelevant to and so ignored during decompression. Com-
-       pression and decompression requirements, in bytes, can be
-       estimated as:
+       irrelevant to and so ignored during decompression.
+
+       Compression and decompression requirements, in bytes, can
+       be estimated as:
 
              Compression:   400k + ( 7 x block size )
 
-             Decompression: 100k + ( 5 x block size ), or
+             Decompression: 100k + ( 4 x block size ), or
                             100k + ( 2.5 x block size )
 
        Larger block sizes give rapidly diminishing marginal
@@ -125,19 +127,8 @@ MEMORY MANAGEMENT
        requirement is set at compression-time by the choice of
        block size.
 
-
-
-                                2
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        For files compressed with the default 900k block size,
-       bunzip2 will require about 4600 kbytes to decompress. To
+       bunzip2 will require about 3700 kbytes to decompress. To
        support decompression of any file on a 4 megabyte machine,
        bunzip2 has an option to decompress using approximately
        half this amount of memory, about 2300 kbytes. Decompres-
@@ -157,8 +148,8 @@ bzip2(1)                                                 bzip2(1)
        file 20,000 bytes long with the flag -9 will cause the
        compressor to allocate around 6700k of memory, but only
        touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
-       decompressor will allocate 4600k but only touch 100k +
-       20000 * 5 = 200 kbytes.
+       decompressor will allocate 3700k but only touch 100k +
+       20000 * 4 = 180 kbytes.
 
        Here is a table which summarises the maximum memory usage
        for different block sizes.
        Also recorded is the total
@@ -172,15 +163,15 @@ bzip2(1)                                                 bzip2(1)
                   Compress   Decompress   Decompress   Corpus
           Flag     usage       usage       -s usage     Size
 
-           -1      1100k        600k         350k      914704
-           -2      1800k       1100k         600k      877703
-           -3      2500k       1600k         850k      860338
-           -4      3200k       2100k        1100k      846899
-           -5      3900k       2600k        1350k      845160
-           -6      4600k       3100k        1600k      838626
-           -7      5400k       3600k        1850k      834096
-           -8      6000k       4100k        2100k      828642
-           -9      6700k       4600k        2350k      828642
+           -1      1100k        500k         350k      914704
+           -2      1800k        900k         600k      877703
+           -3      2500k       1300k         850k      860338
+           -4      3200k       1700k        1100k      846899
+           -5      3900k       2100k        1350k      845160
+           -6      4600k       2500k        1600k      838626
+           -7      5400k       2900k        1850k      834096
+           -8      6000k       3300k        2100k      828642
+           -9      6700k       3700k        2350k      828642
 
 
 OPTIONS
@@ -189,47 +180,37 @@ OPTIONS
               decompress multiple files to stdout, but will only
               compress a single file to stdout.
 
-
-
-
-
-                                3
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        -d --decompress
-              Force decompression. Bzip2 and bunzip2 are really
-              the same program, and the decision about whether to
-              compress or decompress is done on the basis of
-              which name is used. This flag overrides that mech-
-              anism, and forces bzip2 to decompress.
+              Force decompression. bzip2, bunzip2 and bzcat are
+              really the same program, and the decision about
+              what actions to take is done on the basis of which
+              name is used. This flag overrides that mechanism,
+              and forces bzip2 to decompress.
 
-       -f --compress
+       -z --compress
               The complement to -d: forces compression, regard-
               less of the invokation name.
 
        -t --test
               Check integrity of the specified file(s), but don't
               decompress them. This really performs a trial
-              decompression and throws away the result, using the
-              low-memory decompression algorithm (see -s).
+              decompression and throws away the result.
+
+       -f --force
+              Force overwrite of output files. Normally, bzip2
+              will not overwrite existing output files.
 
        -k --keep
               Keep (don't delete) input files during compression
               or decompression.
 
        -s --small
-              Reduce memory usage, both for compression and
-              decompression.
-              Files are decompressed using a mod-
-              ified algorithm which only requires 2.5 bytes per
-              block byte. This means any file can be decom-
-              pressed in 2300k of memory, albeit somewhat more
-              slowly than usual.
+              Reduce memory usage, for compression, decompression
+              and testing. Files are decompressed and tested
+              using a modified algorithm which only requires 2.5
+              bytes per block byte. This means any file can be
+              decompressed in 2300k of memory, albeit at about
+              half the normal speed.
 
               During compression, -s selects a block size of
               200k, which limits memory use to around the same
@@ -238,36 +219,21 @@ bzip2(1)                                                 bzip2(1)
               megabytes or less), use -s for everything. See
               MEMORY MANAGEMENT above.
 
-
        -v --verbose
               Verbose mode -- show the compression ratio for each
               file processed. Further -v's increase the ver-
               bosity level, spewing out lots of information which
               is primarily of interest for diagnostic purposes.
 
-       -L --license
+       -L --license -V --version
               Display the software version, license terms and
               conditions.
 
-       -V --version
-              Same as -L.
-
        -1 to -9
               Set the block size to 100 k, 200 k .. 900 k when
               compressing. Has no effect when decompressing.
               See MEMORY MANAGEMENT above.
 
-
-
-                                4
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        --repetitive-fast
               bzip2 injects some small pseudo-random variations
               into very repetitive blocks to limit worst-case
@@ -278,7 +244,6 @@ bzip2(1)                                                 bzip2(1)
               would take before resorting to randomisation. This
               flag makes it give up much sooner.
 
-
        --repetitive-best
               Opposite of --repetitive-fast; try a lot harder
               before resorting to randomisation.
@@ -306,10 +271,10 @@ RECOVERING DATA FROM DAMAGED FILES
        bzip2recover takes a single argument, the name of the dam-
        aged file, and writes a number of files "rec0001file.bz2",
        "rec0002file.bz2", etc, containing the extracted blocks.
-       The output filenames are designed so that the use of wild-
-       cards in subsequent processing -- for example, "bzip2 -dc
-       rec*file.bz2 > recovered_data" -- lists the files in the
-       "right" order.
+       The output filenames are designed so that the use of
+       wildcards in subsequent processing -- for example, "bzip2
+       -dc rec*file.bz2 > recovered_data" -- lists the files in
+       the "right" order.
 
        bzip2recover should be of most use dealing with large .bz2
        files, as these will contain many blocks. It is clearly
@@ -322,18 +287,6 @@ RECOVERING DATA FROM DAMAGED FILES
 
 PERFORMANCE NOTES
        The sorting phase of compression gathers together similar
-
-
-
-                                5
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        strings in the file. Because of this, files containing
        very long runs of repeated symbols, like "aabaabaabaab
        ..." (repeated several hundred times) may compress
@@ -348,10 +301,6 @@ bzip2(1)                                                 bzip2(1)
        severe slowness in compression, try making the block size
        as small as possible, with flag -1.
 
-       Incompressible or virtually-incompressible data may decom-
-       press rather more slowly than one would hope. This is due
-       to a naive implementation of the move-to-front coder.
-
        bzip2 usually allocates several megabytes of memory to
        operate in, and then charges all over it in a fairly ran-
        dom fashion. This means that performance, both for com-
@@ -362,12 +311,6 @@ bzip2(1)                                                 bzip2(1)
        large performance improvements. I imagine bzip2 will per-
        form best on machines with very large caches.
 
-       Test mode (-t) uses the low-memory decompression algorithm
-       (-s). This means test mode does not run as fast as it
-       could; it could run as fast as the normal decompression
-       machinery. This could easily be fixed at the cost of some
-       code bloat.
-
 
 CAVEATS
        I/O error messages are not as helpful as they could be.
@@ -375,91 +318,38 @@ CAVEATS
        but the details of what the problem is sometimes seem
        rather misleading.
 
-       This manual page pertains to version 0.1 of bzip2. It may
-       well happen that some future version will use a different
-       compressed file format.
-       If you try to decompress, using
-       0.1, a .bz2 file created with some future version which
-       uses a different compressed file format, 0.1 will complain
-       that your file "is not a bzip2 file". If that happens,
-       you should obtain a more recent version of bzip2 and use
-       that to decompress the file.
+       This manual page pertains to version 0.9.0 of bzip2. Com-
+       pressed data created by this version is entirely forwards
+       and backwards compatible with the previous public release,
+       version 0.1pl2, but with the following exception: 0.9.0
+       can correctly decompress multiple concatenated compressed
+       files. 0.1pl2 cannot do this; it will stop after decom-
+       pressing just the first file in the stream.
 
        Wildcard expansion for Windows 95 and NT is flaky.
 
-       bzip2recover uses 32-bit integers to represent bit posi-
-       tions in compressed files, so it cannot handle compressed
-
-
-
-                                6
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
-       files more than 512 megabytes long. This could easily be
+       bzip2recover uses 32-bit integers to represent bit posi-
+       tions in compressed files, so it cannot handle compressed
+       files more than 512 megabytes long. This could easily be
        fixed.
 
-       bzip2recover sometimes reports a very small, incomplete
-       final block. This is spurious and can be safely ignored.
-
-
-RELATIONSHIP TO bzip-0.21
-       This program is a descendant of the bzip program, version
-       0.21, which I released in August 1996. The primary dif-
-       ference of bzip2 is its avoidance of the possibly patented
-       algorithms which were used in 0.21. bzip2 also brings
-       various useful refinements (-s, -t), uses less memory,
-       decompresses significantly faster, and has support for
-       recovering data from damaged files.
-
-       Because bzip2 uses Huffman coding to construct the com-
-       pressed bitstream, rather than the arithmetic coding used
-       in 0.21, the compressed representations generated by the
-       two programs are incompatible, and they will not interop-
-       erate. The change in suffix from .bz to .bz2 reflects
-       this.
-       It would have been helpful to at least allow bzip2
-       to decompress files created by 0.21, but this would defeat
-       the primary aim of having a patent-free compressor.
-
-       For a more precise statement about patent issues in bzip2,
-       please see the README file in the distribution.
-
-       Huffman coding necessarily involves some coding ineffi-
-       ciency compared to arithmetic coding. This means that
-       bzip2 compresses about 1% worse than 0.21, an unfortunate
-       but unavoidable fact-of-life. On the other hand, decom-
-       pression is approximately 50% faster for the same reason,
-       and the change in file format gave an opportunity to add
-       data-recovery features. So it is not all bad.
-
 
 AUTHOR
        Julian Seward, jseward@acm.org.
-
-       The ideas embodied in bzip and bzip2 are due to (at least)
-       the following people: Michael Burrows and David Wheeler
-       (for the block sorting transformation), David Wheeler
-       (again, for the Huffman coder), Peter Fenwick (for the
-       structured coding model in 0.21, and many refinements),
-       and Alistair Moffat, Radford Neal and Ian Witten (for the
-       arithmetic coder in 0.21). I am much indebted for their
-       help, support and advice. See the file ALGORITHMS in the
-       source distribution for pointers to sources of documenta-
-       tion. Christian von Roques encouraged me to look for
-       faster sorting algorithms, so as to speed up compression.
-       Bela Lubkin encouraged me to improve the worst-case com-
-       pression performance. Many people sent patches, helped
+       http://www.muraroa.demon.co.uk
+
+       The ideas embodied in bzip2 are due to (at least) the fol-
+       lowing people: Michael Burrows and David Wheeler (for the
+       block sorting transformation), David Wheeler (again, for
+       the Huffman coder), Peter Fenwick (for the structured cod-
+       ing model in the original bzip, and many refinements), and
+       Alistair Moffat, Radford Neal and Ian Witten (for the
+       arithmetic coder in the original bzip). I am much
+       indebted for their help, support and advice.
+       See the man-
+       ual in the source distribution for pointers to sources of
+       documentation. Christian von Roques encouraged me to look
+       for faster sorting algorithms, so as to speed up compres-
+       sion. Bela Lubkin encouraged me to improve the worst-case
+       compression performance. Many people sent patches, helped
        with portability problems, lent machines, gave advice and
        were generally helpful.
-
-
-
-
-
-                                7
-
-
-- 
cgit v1.2.3-55-g6feb
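
The memory estimates that this patch revises in MEMORY MANAGEMENT (decompression dropping from 5 to 4 bytes per block byte) can be checked with a little arithmetic. The following is only a rough sketch of the formulas quoted in the man page text, not code from bzip2 itself. The function names are hypothetical, and it assumes that "k" means 1000 bytes, which the man page does not state.

```python
"""Rough memory estimates from the formulas quoted in the bzip2 0.9.0 man page.

Illustrative sketch only: names are hypothetical, not part of bzip2.
"""

K = 1000  # assumption: the man page's "k" is taken as 1000 bytes


def block_size(level: int) -> int:
    """Block size in bytes for flags -1 .. -9 (100k .. 900k)."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100 * K


def compress_bytes(block: int) -> int:
    """Compression: 400k + ( 7 x block size )."""
    return 400 * K + 7 * block


def decompress_bytes(block: int, small: bool = False) -> int:
    """Decompression: 100k + ( 4 x block size ), or 2.5 x block size with -s."""
    factor = 2.5 if small else 4
    return int(100 * K + factor * block)


if __name__ == "__main__":
    # Print the three usage columns of the man page table, in units of k.
    for level in range(1, 10):
        block = block_size(level)
        print(
            f"-{level}",
            f"{compress_bytes(block) // K}k",
            f"{decompress_bytes(block) // K}k",
            f"{decompress_bytes(block, small=True) // K}k",
        )
```

Passing a file size smaller than the block size gives the "touched" figures from the text: `compress_bytes(20_000)` is 540000 and `decompress_bytes(20_000)` is 180000, matching the 540 and 180 kbytes in the revised paragraph. Under these assumptions the formula gives 5300k for compression at -7, whereas the table lists 5400k; every other row agrees with the table.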