From 099d844292f60f9d58914da29e5773204dc55e7a Mon Sep 17 00:00:00 2001
From: Julian Seward
Date: Sun, 30 Dec 2001 22:13:13 +0100
Subject: bzip2-1.0.2

---
 bzip2.1.preformatted | 226 ++++++++++++++++++---------------------------------
 1 file changed, 81 insertions(+), 145 deletions(-)

diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
index 9f18339..0f20cb5 100644
--- a/bzip2.1.preformatted
+++ b/bzip2.1.preformatted
@@ -1,11 +1,9 @@
 - - - bzip2(1) bzip2(1) + NNAAMMEE - bzip2, bunzip2 - a block-sorting file compressor, v1.0 + bzip2, bunzip2 - a block-sorting file compressor, v1.0.2 bzcat - decompresses files to stdout bzip2recover - recovers data from damaged bzip2 files
@@ -22,20 +20,20 @@ DDEESSCCRRIIPPTTIIOONN
 sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors, - and approaches the performance of the PPM family of sta- + and approaches the performance of the PPM family of sta­ tistical compressors. The command-line options are deliberately very similar to those of _G_N_U _g_z_i_p_, but they are not identical. - _b_z_i_p_2 expects a list of file names to accompany the com- + _b_z_i_p_2 expects a list of file names to accompany the com­ mand-line flags. Each file is replaced by a compressed version of itself, with the name "original_name.bz2". - Each compressed file has the same modification date, per- - missions, and, when possible, ownership as the correspond- + Each compressed file has the same modification date, per­ + missions, and, when possible, ownership as the correspond­ ing original, so that these properties can be correctly restored at decompression time.
File name handling is - naive in the sense that there is no mechanism for preserv- + naive in the sense that there is no mechanism for preserv­ ing original file names, permissions, ownerships or dates in filesystems which lack these concepts, or have serious file name length restrictions, such as MS-DOS. @@ -58,18 +56,6 @@ DDEESSCCRRIIPPTTIIOONN filename.bz2 becomes filename filename.bz becomes filename filename.tbz2 becomes filename.tar - - - - 1 - - - - - -bzip2(1) bzip2(1) - - filename.tbz becomes filename.tar anyothername becomes anyothername.out @@ -78,23 +64,23 @@ bzip2(1) bzip2(1) guess the name of the original file, and uses the original name with _._o_u_t appended. - As with compression, supplying no filenames causes decom- + As with compression, supplying no filenames causes decom­ pression from standard input to standard output. - _b_u_n_z_i_p_2 will correctly decompress a file which is the con- + _b_u_n_z_i_p_2 will correctly decompress a file which is the con­ catenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing (-t) of concatenated compressed files is also supported. You can also compress or decompress files to the standard - output by giving the -c flag. Multiple files may be com- + output by giving the -c flag. Multiple files may be com­ pressed and decompressed like this. The resulting outputs are fed sequentially to stdout. Compression of multiple - files in this manner generates a stream containing multi- + files in this manner generates a stream containing multi­ ple compressed file representations. Such a stream can be decompressed correctly only by _b_z_i_p_2 version 0.9.0 or - later. Earlier versions of _b_z_i_p_2 will stop after decom- + later. Earlier versions of _b_z_i_p_2 will stop after decom­ pressing the first file in the stream. 
_b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to @@ -115,7 +101,7 @@ bzip2(1) bzip2(1) As a self-check for your protection, _b_z_i_p_2 uses 32-bit CRCs to make sure that the decompressed version of a file - is identical to the original. This guards against corrup- + is identical to the original. This guards against corrup­ tion of the compressed data, and against undetected bugs in _b_z_i_p_2 (hopefully very unlikely). The chances of data corruption going undetected is microscopic, about one @@ -125,17 +111,6 @@ bzip2(1) bzip2(1) you recover the original uncompressed data. You can use _b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files. - - - 2 - - - - - -bzip2(1) bzip2(1) - - Return values: 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt compressed file, 3 for an internal @@ -154,8 +129,8 @@ OOPPTTIIOONNSS and forces _b_z_i_p_2 to decompress. --zz ----ccoommpprreessss - The complement to -d: forces compression, regard- - less of the invokation name. + The complement to -d: forces compression, + regardless of the invocation name. --tt ----tteesstt Check integrity of the specified file(s), but don't @@ -168,6 +143,11 @@ OOPPTTIIOONNSS forces _b_z_i_p_2 to break hard links to files, which it otherwise wouldn't do. + bzip2 normally declines to decompress files which + don't have the correct magic header bytes. If + forced (-f), however, it will pass such files + through unmodified. This is how GNU gzip behaves. + --kk ----kkeeeepp Keep (don't delete) input files during compression or decompression. @@ -190,23 +170,11 @@ OOPPTTIIOONNSS --qq ----qquuiieett Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events - - - - 3 - - - - - -bzip2(1) bzip2(1) - - will not be suppressed. --vv ----vveerrbboossee Verbose mode -- show the compression ratio for each - file processed. 
Further -v's increase the ver- + file processed. Further -v's increase the ver­ bosity level, spewing out lots of information which is primarily of interest for diagnostic purposes. @@ -214,20 +182,24 @@ bzip2(1) bzip2(1) Display the software version, license terms and conditions. - --11 ttoo --99 + --11 ((oorr ----ffaasstt)) ttoo --99 ((oorr ----bbeesstt)) Set the block size to 100 k, 200 k .. 900 k when compressing. Has no effect when decompressing. - See MEMORY MANAGEMENT below. + See MEMORY MANAGEMENT below. The --fast and --best + aliases are primarily for GNU gzip compatibility. + In particular, --fast doesn't make things signifi­ + cantly faster. And --best merely selects the + default behaviour. ---- Treats all subsequent arguments as file names, even - if they start with a dash. This is so you can han- + if they start with a dash. This is so you can han­ dle files with names beginning with a dash, for example: bzip2 -- -myfilename. ----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt These flags are redundant in versions 0.9.5 and above. They provided some coarse control over the - behaviour of the sorting algorithm in earlier ver- + behaviour of the sorting algorithm in earlier ver­ sions, which was sometimes useful. 0.9.5 and above have an improved algorithm which renders these flags irrelevant. @@ -238,7 +210,7 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT affects both the compression ratio achieved, and the amount of memory needed for compression and decompression. The flags -1 through -9 specify the block size to be - 100,000 bytes through 900,000 bytes (the default) respec- + 100,000 bytes through 900,000 bytes (the default) respec­ tively. At decompression time, the block size used for compression is read from the header of the compressed file, and _b_u_n_z_i_p_2 then allocates itself just enough memory @@ -256,18 +228,6 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT Larger block sizes give rapidly diminishing marginal returns. 
Most of the compression comes from the first two - - - - 4 - - - - - -bzip2(1) bzip2(1) - - or three hundred k of block size, a fact worth bearing in mind when using _b_z_i_p_2 on small machines. It is also important to appreciate that the decompression memory @@ -278,13 +238,13 @@ bzip2(1) bzip2(1) _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To support decompression of any file on a 4 megabyte machine, _b_u_n_z_i_p_2 has an option to decompress using approximately - half this amount of memory, about 2300 kbytes. Decompres- + half this amount of memory, about 2300 kbytes. Decompres­ sion speed is also halved, so you should use this option only where necessary. The relevant flag is -s. - In general, try and use the largest block size memory con- + In general, try and use the largest block size memory con­ straints allow, since that maximises the compression - achieved. Compression and decompression speed are virtu- + achieved. Compression and decompression speed are virtu­ ally unaffected by block size. Another significant point applies to files which fit in a @@ -300,11 +260,11 @@ bzip2(1) bzip2(1) Here is a table which summarises the maximum memory usage for different block sizes. Also recorded is the total - compressed size for 14 files of the Calgary Text Compres- + compressed size for 14 files of the Calgary Text Compres­ sion Corpus totalling 3,141,622 bytes. This column gives some feel for how compression varies with block size. These figures tend to understate the advantage of larger - block sizes for larger files, since the Corpus is domi- + block sizes for larger files, since the Corpus is domi­ nated by smaller files. Compress Decompress Decompress Corpus @@ -321,22 +281,9 @@ bzip2(1) bzip2(1) -9 7600k 3700k 2350k 828642 - - - - - 5 - - - - - -bzip2(1) bzip2(1) - - RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS _b_z_i_p_2 compresses files in blocks, usually 900kbytes long. - Each block is handled independently. 
If a media or trans- + Each block is handled independently. If a media or trans­ mission error causes a multi-block .bz2 file to become damaged, it may be possible to recover data from the undamaged blocks in the file. @@ -353,19 +300,19 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F the integrity of the resulting files, and decompress those which are undamaged. - _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- - aged file, and writes a number of files "rec0001file.bz2", - "rec0002file.bz2", etc, containing the extracted blocks. - The output filenames are designed so that the use of - wildcards in subsequent processing -- for example, "bzip2 - -dc rec*file.bz2 > recovered_data" -- lists the files in - the correct order. + _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam­ + aged file, and writes a number of files + "rec00001file.bz2", "rec00002file.bz2", etc, containing + the extracted blocks. The output filenames are + designed so that the use of wildcards in subsequent pro­ + cessing -- for example, "bzip2 -dc rec*file.bz2 > recov­ + ered_data" -- processes the files in the correct order. _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 files, as these will contain many blocks. It is clearly futile to use it on damaged single-block files, since a - damaged block cannot be recovered. If you wish to min- - imise any potential data loss through media or transmis- + damaged block cannot be recovered. If you wish to min­ + imise any potential data loss through media or transmis­ sion errors, you might consider compressing with a smaller block size. @@ -379,31 +326,19 @@ PPEERRFFOORRMMAANNCCEE NNOOTTEESS better than previous versions in this respect. The ratio between worst-case and average-case compression time is in the region of 10:1. For previous versions, this figure - was more like 100:1. You can use the -vvvv option to mon- + was more like 100:1. 
You can use the -vvvv option to mon­ itor progress in great detail, if you want. Decompression speed is unaffected by these phenomena. _b_z_i_p_2 usually allocates several megabytes of memory to - operate in, and then charges all over it in a fairly ran- - dom fashion. This means that performance, both for com- + operate in, and then charges all over it in a fairly ran­ + dom fashion. This means that performance, both for com­ pressing and decompressing, is largely determined by the - - - - 6 - - - - - -bzip2(1) bzip2(1) - - speed at which your machine can service cache misses. Because of this, small changes to the code to reduce the miss rate have been observed to give disproportionately - large performance improvements. I imagine _b_z_i_p_2 will per- + large performance improvements. I imagine _b_z_i_p_2 will per­ form best on machines with very large caches. @@ -413,50 +348,51 @@ CCAAVVEEAATTSS but the details of what the problem is sometimes seem rather misleading. - This manual page pertains to version 1.0 of _b_z_i_p_2_. Com- + This manual page pertains to version 1.0.2 of _b_z_i_p_2_. Com­ pressed data created by this version is entirely forwards and backwards compatible with the previous public - releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the - following exception: 0.9.0 and above can correctly decom- - press multiple concatenated compressed files. 0.1pl2 can- - not do this; it will stop after decompressing just the - first file in the stream. + releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, + but with the following exception: 0.9.0 and above can cor­ + rectly decompress multiple concatenated compressed files. + 0.1pl2 cannot do this; it will stop after decompressing + just the first file in the stream. + + _b_z_i_p_2_r_e_c_o_v_e_r versions prior to this one, 1.0.2, used + 32-bit integers to represent bit positions in compressed + files, so it could not handle compressed files more than + 512 megabytes long. 
Version 1.0.2 and above uses 64-bit + ints on some platforms which support them (GNU supported + targets, and Windows). To establish whether or not + bzip2recover was built with such a limitation, run it + without arguments. In any event you can build yourself an + unlimited version if you can recompile it with MaybeUInt64 + set to be an unsigned 64-bit integer. + - _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi- - tions in compressed files, so it cannot handle compressed - files more than 512 megabytes long. This could easily be - fixed. AAUUTTHHOORR Julian Seward, jseward@acm.org. - http://sourceware.cygnus.com/bzip2 - http://www.muraroa.demon.co.uk + http://sources.redhat.com/bzip2 - The ideas embodied in _b_z_i_p_2 are due to (at least) the fol- - lowing people: Michael Burrows and David Wheeler (for the - block sorting transformation), David Wheeler (again, for - the Huffman coder), Peter Fenwick (for the structured cod- + The ideas embodied in _b_z_i_p_2 are due to (at least) the fol­ + lowing people: Michael Burrows and David Wheeler (for the + block sorting transformation), David Wheeler (again, for + the Huffman coder), Peter Fenwick (for the structured cod­ ing model in the original _b_z_i_p_, and many refinements), and - Alistair Moffat, Radford Neal and Ian Witten (for the + Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic coder in the original _b_z_i_p_)_. I am much - indebted for their help, support and advice. See the man- - ual in the source distribution for pointers to sources of + indebted for their help, support and advice. See the man­ + ual in the source distribution for pointers to sources of documentation. Christian von Roques encouraged me to look - for faster sorting algorithms, so as to speed up compres- + for faster sorting algorithms, so as to speed up compres­ sion. Bela Lubkin encouraged me to improve the worst-case - compression performance. 
Many people sent patches, helped - with portability problems, lent machines, gave advice and - were generally helpful. - - - - - - - + compression performance. The bz* scripts are derived from + those of GNU gzip. Many people sent patches, helped with + portability problems, lent machines, gave advice and were + generally helpful. - 7 + bzip2(1)