Diffstat (limited to 'bzip2.1.preformatted')
| -rw-r--r-- | bzip2.1.preformatted | 318 |
1 files changed, 158 insertions, 160 deletions
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
index 5206e05..8c4fab1 100644
--- a/bzip2.1.preformatted
+++ b/bzip2.1.preformatted
| @@ -5,18 +5,20 @@ bzip2(1) bzip2(1) | |||
| 5 | 5 | ||
| 6 | 6 | ||
| 7 | NNAAMMEE | 7 | NNAAMMEE |
| 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.1 | 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 |
| 9 | bzcat - decompresses files to stdout | ||
| 9 | bzip2recover - recovers data from damaged bzip2 files | 10 | bzip2recover - recovers data from damaged bzip2 files |
| 10 | 11 | ||
| 11 | 12 | ||
| 12 | SSYYNNOOPPSSIISS | 13 | SSYYNNOOPPSSIISS |
| 13 | bbzziipp22 [ --ccddffkkssttvvVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] | 14 | bbzziipp22 [ --ccddffkkssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] |
| 14 | bbuunnzziipp22 [ --kkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] | 15 | bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] |
| 16 | bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ] | ||
| 15 | bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e | 17 | bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e |
| 16 | 18 | ||
| 17 | 19 | ||
| 18 | DDEESSCCRRIIPPTTIIOONN | 20 | DDEESSCCRRIIPPTTIIOONN |
| 19 | _B_z_i_p_2 compresses files using the Burrows-Wheeler block- | 21 | _b_z_i_p_2 compresses files using the Burrows-Wheeler block- |
| 20 | sorting text compression algorithm, and Huffman coding. | 22 | sorting text compression algorithm, and Huffman coding. |
| 21 | Compression is generally considerably better than that | 23 | Compression is generally considerably better than that |
| 22 | achieved by more conventional LZ77/LZ78-based compressors, | 24 | achieved by more conventional LZ77/LZ78-based compressors, |
| @@ -26,7 +28,7 @@ DDEESSCCRRIIPPTTIIOONN | |||
| 26 | The command-line options are deliberately very similar to | 28 | The command-line options are deliberately very similar to |
| 27 | those of _G_N_U _G_z_i_p_, but they are not identical. | 29 | those of _G_N_U _G_z_i_p_, but they are not identical. |
| 28 | 30 | ||
| 29 | _B_z_i_p_2 expects a list of file names to accompany the com- | 31 | _b_z_i_p_2 expects a list of file names to accompany the com- |
| 30 | mand-line flags. Each file is replaced by a compressed | 32 | mand-line flags. Each file is replaced by a compressed |
| 31 | version of itself, with the name "original_name.bz2". | 33 | version of itself, with the name "original_name.bz2". |
| 32 | Each compressed file has the same modification date and | 34 | Each compressed file has the same modification date and |
| @@ -38,8 +40,8 @@ DDEESSCCRRIIPPTTIIOONN | |||
| 38 | cepts, or have serious file name length restrictions, such | 40 | cepts, or have serious file name length restrictions, such |
| 39 | as MS-DOS. | 41 | as MS-DOS. |
| 40 | 42 | ||
| 41 | _B_z_i_p_2 and _b_u_n_z_i_p_2 will not overwrite existing files; if | 43 | _b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing |
| 42 | you want this to happen, you should delete them first. | 44 | files; if you want this to happen, specify the -f flag. |
| 43 | 45 | ||
| 44 | If no file names are specified, _b_z_i_p_2 compresses from | 46 | If no file names are specified, _b_z_i_p_2 compresses from |
| 45 | standard input to standard output. In this case, _b_z_i_p_2 | 47 | standard input to standard output. In this case, _b_z_i_p_2 |
| @@ -47,17 +49,15 @@ DDEESSCCRRIIPPTTIIOONN | |||
| 47 | this would be entirely incomprehensible and therefore | 49 | this would be entirely incomprehensible and therefore |
| 48 | pointless. | 50 | pointless. |
| 49 | 51 | ||
| 50 | _B_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- | 52 | _b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- |
| 51 | ified files whose names end in ".bz2". Files without this | 53 | ified files whose names end in ".bz2". Files without this |
| 52 | suffix are ignored. Again, supplying no filenames causes | 54 | suffix are ignored. Again, supplying no filenames causes |
| 53 | decompression from standard input to standard output. | 55 | decompression from standard input to standard output. |
| 54 | 56 | ||
| 55 | You can also compress or decompress files to the standard | 57 | _b_u_n_z_i_p_2 will correctly decompress a file which is the con- |
| 56 | output by giving the -c flag. You can decompress multiple | 58 | catenation of two or more compressed files. The result is |
| 57 | files like this, but you may only compress a single file | 59 | the concatenation of the corresponding uncompressed files. |
| 58 | this way, since it would otherwise be difficult to sepa- | 60 | Integrity testing (-t) of concatenated compressed files is |
| 59 | rate out the compressed representations of the original | ||
| 60 | files. | ||
| 61 | 61 | ||
| 62 | 62 | ||
| 63 | 63 | ||
| @@ -70,6 +70,21 @@ DDEESSCCRRIIPPTTIIOONN | |||
| 70 | bzip2(1) bzip2(1) | 70 | bzip2(1) bzip2(1) |
| 71 | 71 | ||
| 72 | 72 | ||
| 73 | also supported. | ||
| 74 | |||
| 75 | You can also compress or decompress files to the standard | ||
| 76 | output by giving the -c flag. Multiple files may be com- | ||
| 77 | pressed and decompressed like this. The resulting outputs | ||
| 78 | are fed sequentially to stdout. Compression of multiple | ||
| 79 | files in this manner generates a stream containing multi- | ||
| 80 | ple compressed file representations. Such a stream can be | ||
| 81 | decompressed correctly only by _b_z_i_p_2 version 0.9.0 or | ||
| 82 | later. Earlier versions of _b_z_i_p_2 will stop after decom- | ||
| 83 | pressing the first file in the stream. | ||
| 84 | |||
| 85 | _b_z_c_a_t (or _b_z_i_p_2 _-_d_c ) decompresses all specified files to | ||
| 86 | the standard output. | ||
| 87 | |||
| 73 | Compression is always performed, even if the compressed | 88 | Compression is always performed, even if the compressed |
| 74 | file is slightly larger than the original. Files of less | 89 | file is slightly larger than the original. Files of less |
| 75 | than about one hundred bytes tend to get larger, since the | 90 | than about one hundred bytes tend to get larger, since the |
| @@ -108,36 +123,37 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT | |||
| 108 | file, and _b_u_n_z_i_p_2 then allocates itself just enough memory | 123 | file, and _b_u_n_z_i_p_2 then allocates itself just enough memory |
| 109 | to decompress the file. Since block sizes are stored in | 124 | to decompress the file. Since block sizes are stored in |
| 110 | compressed files, it follows that the flags -1 to -9 are | 125 | compressed files, it follows that the flags -1 to -9 are |
| 111 | irrelevant to and so ignored during decompression. Com- | 126 | irrelevant to and so ignored during decompression. |
| 112 | pression and decompression requirements, in bytes, can be | ||
| 113 | estimated as: | ||
| 114 | 127 | ||
| 115 | Compression: 400k + ( 7 x block size ) | ||
| 116 | 128 | ||
| 117 | Decompression: 100k + ( 5 x block size ), or | ||
| 118 | 100k + ( 2.5 x block size ) | ||
| 119 | 129 | ||
| 120 | Larger block sizes give rapidly diminishing marginal | 130 | 2 |
| 121 | returns; most of the compression comes from the first two | ||
| 122 | or three hundred k of block size, a fact worth bearing in | ||
| 123 | mind when using _b_z_i_p_2 on small machines. It is also | ||
| 124 | important to appreciate that the decompression memory | ||
| 125 | requirement is set at compression-time by the choice of | ||
| 126 | block size. | ||
| 127 | 131 | ||
| 128 | 132 | ||
| 129 | 133 | ||
| 130 | 2 | ||
| 131 | 134 | ||
| 132 | 135 | ||
| 136 | bzip2(1) bzip2(1) | ||
| 133 | 137 | ||
| 134 | 138 | ||
| 139 | Compression and decompression requirements, in bytes, can | ||
| 140 | be estimated as: | ||
| 135 | 141 | ||
| 136 | bzip2(1) bzip2(1) | 142 | Compression: 400k + ( 7 x block size ) |
| 137 | 143 | ||
| 144 | Decompression: 100k + ( 4 x block size ), or | ||
| 145 | 100k + ( 2.5 x block size ) | ||
| 146 | |||
| 147 | Larger block sizes give rapidly diminishing marginal | ||
| 148 | returns; most of the compression comes from the first two | ||
| 149 | or three hundred k of block size, a fact worth bearing in | ||
| 150 | mind when using _b_z_i_p_2 on small machines. It is also | ||
| 151 | important to appreciate that the decompression memory | ||
| 152 | requirement is set at compression-time by the choice of | ||
| 153 | block size. | ||
| 138 | 154 | ||
| 139 | For files compressed with the default 900k block size, | 155 | For files compressed with the default 900k block size, |
| 140 | _b_u_n_z_i_p_2 will require about 4600 kbytes to decompress. To | 156 | _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To |
| 141 | support decompression of any file on a 4 megabyte machine, | 157 | support decompression of any file on a 4 megabyte machine, |
| 142 | _b_u_n_z_i_p_2 has an option to decompress using approximately | 158 | _b_u_n_z_i_p_2 has an option to decompress using approximately |
| 143 | half this amount of memory, about 2300 kbytes. Decompres- | 159 | half this amount of memory, about 2300 kbytes. Decompres- |
| @@ -157,8 +173,8 @@ bzip2(1) bzip2(1) | |||
| 157 | file 20,000 bytes long with the flag -9 will cause the | 173 | file 20,000 bytes long with the flag -9 will cause the |
| 158 | compressor to allocate around 6700k of memory, but only | 174 | compressor to allocate around 6700k of memory, but only |
| 159 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the | 175 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the |
| 160 | decompressor will allocate 4600k but only touch 100k + | 176 | decompressor will allocate 3700k but only touch 100k + |
| 161 | 20000 * 5 = 200 kbytes. | 177 | 20000 * 4 = 180 kbytes. |
| 162 | 178 | ||
| 163 | Here is a table which summarises the maximum memory usage | 179 | Here is a table which summarises the maximum memory usage |
| 164 | for different block sizes. Also recorded is the total | 180 | for different block sizes. Also recorded is the total |
| @@ -172,64 +188,66 @@ bzip2(1) bzip2(1) | |||
| 172 | Compress Decompress Decompress Corpus | 188 | Compress Decompress Decompress Corpus |
| 173 | Flag usage usage -s usage Size | 189 | Flag usage usage -s usage Size |
| 174 | 190 | ||
| 175 | -1 1100k 600k 350k 914704 | 191 | -1 1100k 500k 350k 914704 |
| 176 | -2 1800k 1100k 600k 877703 | 192 | -2 1800k 900k 600k 877703 |
| 177 | -3 2500k 1600k 850k 860338 | ||
| 178 | -4 3200k 2100k 1100k 846899 | ||
| 179 | -5 3900k 2600k 1350k 845160 | ||
| 180 | -6 4600k 3100k 1600k 838626 | ||
| 181 | -7 5400k 3600k 1850k 834096 | ||
| 182 | -8 6000k 4100k 2100k 828642 | ||
| 183 | -9 6700k 4600k 2350k 828642 | ||
| 184 | 193 | ||
| 185 | 194 | ||
| 186 | OOPPTTIIOONNSS | ||
| 187 | --cc ----ssttddoouutt | ||
| 188 | Compress or decompress to standard output. -c will | ||
| 189 | decompress multiple files to stdout, but will only | ||
| 190 | compress a single file to stdout. | ||
| 191 | |||
| 192 | 195 | ||
| 196 | 3 | ||
| 193 | 197 | ||
| 194 | 198 | ||
| 195 | 199 | ||
| 196 | 3 | ||
| 197 | 200 | ||
| 198 | 201 | ||
| 202 | bzip2(1) bzip2(1) | ||
| 199 | 203 | ||
| 200 | 204 | ||
| 205 | -3 2500k 1300k 850k 860338 | ||
| 206 | -4 3200k 1700k 1100k 846899 | ||
| 207 | -5 3900k 2100k 1350k 845160 | ||
| 208 | -6 4600k 2500k 1600k 838626 | ||
| 209 | -7 5400k 2900k 1850k 834096 | ||
| 210 | -8 6000k 3300k 2100k 828642 | ||
| 211 | -9 6700k 3700k 2350k 828642 | ||
| 201 | 212 | ||
| 202 | bzip2(1) bzip2(1) | ||
| 203 | 213 | ||
| 214 | OOPPTTIIOONNSS | ||
| 215 | --cc ----ssttddoouutt | ||
| 216 | Compress or decompress to standard output. -c will | ||
| 217 | decompress multiple files to stdout, but will only | ||
| 218 | compress a single file to stdout. | ||
| 204 | 219 | ||
| 205 | --dd ----ddeeccoommpprreessss | 220 | --dd ----ddeeccoommpprreessss |
| 206 | Force decompression. _B_z_i_p_2 and _b_u_n_z_i_p_2 are really | 221 | Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are |
| 207 | the same program, and the decision about whether to | 222 | really the same program, and the decision about |
| 208 | compress or decompress is done on the basis of | 223 | what actions to take is done on the basis of which |
| 209 | which name is used. This flag overrides that mech- | 224 | name is used. This flag overrides that mechanism, |
| 210 | anism, and forces _b_z_i_p_2 to decompress. | 225 | and forces _b_z_i_p_2 to decompress. |
| 211 | 226 | ||
| 212 | --ff ----ccoommpprreessss | 227 | --zz ----ccoommpprreessss |
| 213 | The complement to -d: forces compression, regard- | 228 | The complement to -d: forces compression, regard- |
| 214 | less of the invokation name. | 229 | less of the invokation name. |
| 215 | 230 | ||
| 216 | --tt ----tteesstt | 231 | --tt ----tteesstt |
| 217 | Check integrity of the specified file(s), but don't | 232 | Check integrity of the specified file(s), but don't |
| 218 | decompress them. This really performs a trial | 233 | decompress them. This really performs a trial |
| 219 | decompression and throws away the result, using the | 234 | decompression and throws away the result. |
| 220 | low-memory decompression algorithm (see -s). | 235 | |
| 236 | --ff ----ffoorrccee | ||
| 237 | Force overwrite of output files. Normally, _b_z_i_p_2 | ||
| 238 | will not overwrite existing output files. | ||
| 221 | 239 | ||
| 222 | --kk ----kkeeeepp | 240 | --kk ----kkeeeepp |
| 223 | Keep (don't delete) input files during compression | 241 | Keep (don't delete) input files during compression |
| 224 | or decompression. | 242 | or decompression. |
| 225 | 243 | ||
| 226 | --ss ----ssmmaallll | 244 | --ss ----ssmmaallll |
| 227 | Reduce memory usage, both for compression and | 245 | Reduce memory usage, for compression, decompression |
| 228 | decompression. Files are decompressed using a mod- | 246 | and testing. Files are decompressed and tested |
| 229 | ified algorithm which only requires 2.5 bytes per | 247 | using a modified algorithm which only requires 2.5 |
| 230 | block byte. This means any file can be decom- | 248 | bytes per block byte. This means any file can be |
| 231 | pressed in 2300k of memory, albeit somewhat more | 249 | decompressed in 2300k of memory, albeit at about |
| 232 | slowly than usual. | 250 | half the normal speed. |
| 233 | 251 | ||
| 234 | During compression, -s selects a block size of | 252 | During compression, -s selects a block size of |
| 235 | 200k, which limits memory use to around the same | 253 | 200k, which limits memory use to around the same |
| @@ -239,35 +257,32 @@ bzip2(1) bzip2(1) | |||
| 239 | MEMORY MANAGEMENT above. | 257 | MEMORY MANAGEMENT above. |
| 240 | 258 | ||
| 241 | 259 | ||
| 260 | |||
| 261 | |||
| 262 | 4 | ||
| 263 | |||
| 264 | |||
| 265 | |||
| 266 | |||
| 267 | |||
| 268 | bzip2(1) bzip2(1) | ||
| 269 | |||
| 270 | |||
| 242 | --vv ----vveerrbboossee | 271 | --vv ----vveerrbboossee |
| 243 | Verbose mode -- show the compression ratio for each | 272 | Verbose mode -- show the compression ratio for each |
| 244 | file processed. Further -v's increase the ver- | 273 | file processed. Further -v's increase the ver- |
| 245 | bosity level, spewing out lots of information which | 274 | bosity level, spewing out lots of information which |
| 246 | is primarily of interest for diagnostic purposes. | 275 | is primarily of interest for diagnostic purposes. |
| 247 | 276 | ||
| 248 | --LL ----lliicceennssee | 277 | --LL ----lliicceennssee --VV ----vveerrssiioonn |
| 249 | Display the software version, license terms and | 278 | Display the software version, license terms and |
| 250 | conditions. | 279 | conditions. |
| 251 | 280 | ||
| 252 | --VV ----vveerrssiioonn | ||
| 253 | Same as -L. | ||
| 254 | |||
| 255 | --11 ttoo --99 | 281 | --11 ttoo --99 |
| 256 | Set the block size to 100 k, 200 k .. 900 k when | 282 | Set the block size to 100 k, 200 k .. 900 k when |
| 257 | compressing. Has no effect when decompressing. | 283 | compressing. Has no effect when decompressing. |
| 258 | See MEMORY MANAGEMENT above. | 284 | See MEMORY MANAGEMENT above. |
| 259 | 285 | ||
| 260 | |||
| 261 | |||
| 262 | 4 | ||
| 263 | |||
| 264 | |||
| 265 | |||
| 266 | |||
| 267 | |||
| 268 | bzip2(1) bzip2(1) | ||
| 269 | |||
| 270 | |||
| 271 | ----rreeppeettiittiivvee--ffaasstt | 286 | ----rreeppeettiittiivvee--ffaasstt |
| 272 | _b_z_i_p_2 injects some small pseudo-random variations | 287 | _b_z_i_p_2 injects some small pseudo-random variations |
| 273 | into very repetitive blocks to limit worst-case | 288 | into very repetitive blocks to limit worst-case |
| @@ -306,34 +321,34 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F | |||
| 306 | _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- | 321 | _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- |
| 307 | aged file, and writes a number of files "rec0001file.bz2", | 322 | aged file, and writes a number of files "rec0001file.bz2", |
| 308 | "rec0002file.bz2", etc, containing the extracted blocks. | 323 | "rec0002file.bz2", etc, containing the extracted blocks. |
| 309 | The output filenames are designed so that the use of wild- | 324 | The output filenames are designed so that the use of |
| 310 | cards in subsequent processing -- for example, "bzip2 -dc | ||
| 311 | rec*file.bz2 > recovered_data" -- lists the files in the | ||
| 312 | "right" order. | ||
| 313 | 325 | ||
| 314 | _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 | ||
| 315 | files, as these will contain many blocks. It is clearly | ||
| 316 | futile to use it on damaged single-block files, since a | ||
| 317 | damaged block cannot be recovered. If you wish to min- | ||
| 318 | imise any potential data loss through media or transmis- | ||
| 319 | sion errors, you might consider compressing with a smaller | ||
| 320 | block size. | ||
| 321 | 326 | ||
| 322 | 327 | ||
| 323 | PPEERRFFOORRMMAANNCCEE NNOOTTEESS | 328 | 5 |
| 324 | The sorting phase of compression gathers together similar | ||
| 325 | 329 | ||
| 326 | 330 | ||
| 327 | 331 | ||
| 328 | 5 | ||
| 329 | 332 | ||
| 330 | 333 | ||
| 334 | bzip2(1) bzip2(1) | ||
| 331 | 335 | ||
| 332 | 336 | ||
| 337 | wildcards in subsequent processing -- for example, "bzip2 | ||
| 338 | -dc rec*file.bz2 > recovered_data" -- lists the files in | ||
| 339 | the "right" order. | ||
| 333 | 340 | ||
| 334 | bzip2(1) bzip2(1) | 341 | _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2 |
| 342 | files, as these will contain many blocks. It is clearly | ||
| 343 | futile to use it on damaged single-block files, since a | ||
| 344 | damaged block cannot be recovered. If you wish to min- | ||
| 345 | imise any potential data loss through media or transmis- | ||
| 346 | sion errors, you might consider compressing with a smaller | ||
| 347 | block size. | ||
| 335 | 348 | ||
| 336 | 349 | ||
| 350 | PPEERRFFOORRMMAANNCCEE NNOOTTEESS | ||
| 351 | The sorting phase of compression gathers together similar | ||
| 337 | strings in the file. Because of this, files containing | 352 | strings in the file. Because of this, files containing |
| 338 | very long runs of repeated symbols, like "aabaabaabaab | 353 | very long runs of repeated symbols, like "aabaabaabaab |
| 339 | ..." (repeated several hundred times) may compress | 354 | ..." (repeated several hundred times) may compress |
| @@ -348,10 +363,6 @@ bzip2(1) bzip2(1) | |||
| 348 | severe slowness in compression, try making the block size | 363 | severe slowness in compression, try making the block size |
| 349 | as small as possible, with flag -1. | 364 | as small as possible, with flag -1. |
| 350 | 365 | ||
| 351 | Incompressible or virtually-incompressible data may decom- | ||
| 352 | press rather more slowly than one would hope. This is due | ||
| 353 | to a naive implementation of the move-to-front coder. | ||
| 354 | |||
| 355 | _b_z_i_p_2 usually allocates several megabytes of memory to | 366 | _b_z_i_p_2 usually allocates several megabytes of memory to |
| 356 | operate in, and then charges all over it in a fairly ran- | 367 | operate in, and then charges all over it in a fairly ran- |
| 357 | dom fashion. This means that performance, both for com- | 368 | dom fashion. This means that performance, both for com- |
| @@ -362,12 +373,6 @@ bzip2(1) bzip2(1) | |||
| 362 | large performance improvements. I imagine _b_z_i_p_2 will per- | 373 | large performance improvements. I imagine _b_z_i_p_2 will per- |
| 363 | form best on machines with very large caches. | 374 | form best on machines with very large caches. |
| 364 | 375 | ||
| 365 | Test mode (-t) uses the low-memory decompression algorithm | ||
| 366 | (-s). This means test mode does not run as fast as it | ||
| 367 | could; it could run as fast as the normal decompression | ||
| 368 | machinery. This could easily be fixed at the cost of some | ||
| 369 | code bloat. | ||
| 370 | |||
| 371 | 376 | ||
| 372 | CCAAVVEEAATTSS | 377 | CCAAVVEEAATTSS |
| 373 | I/O error messages are not as helpful as they could be. | 378 | I/O error messages are not as helpful as they could be. |
| @@ -375,19 +380,14 @@ CCAAVVEEAATTSS | |||
| 375 | but the details of what the problem is sometimes seem | 380 | but the details of what the problem is sometimes seem |
| 376 | rather misleading. | 381 | rather misleading. |
| 377 | 382 | ||
| 378 | This manual page pertains to version 0.1 of _b_z_i_p_2_. It may | 383 | This manual page pertains to version 0.9.0 of _b_z_i_p_2_. Com- |
| 379 | well happen that some future version will use a different | 384 | pressed data created by this version is entirely forwards |
| 380 | compressed file format. If you try to decompress, using | 385 | and backwards compatible with the previous public release, |
| 381 | 0.1, a .bz2 file created with some future version which | 386 | version 0.1pl2, but with the following exception: 0.9.0 |
| 382 | uses a different compressed file format, 0.1 will complain | 387 | can correctly decompress multiple concatenated compressed |
| 383 | that your file "is not a bzip2 file". If that happens, | 388 | files. 0.1pl2 cannot do this; it will stop after decom- |
| 384 | you should obtain a more recent version of _b_z_i_p_2 and use | 389 | pressing just the first file in the stream. |
| 385 | that to decompress the file. | ||
| 386 | 390 | ||
| 387 | Wildcard expansion for Windows 95 and NT is flaky. | ||
| 388 | |||
| 389 | _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi- | ||
| 390 | tions in compressed files, so it cannot handle compressed | ||
| 391 | 391 | ||
| 392 | 392 | ||
| 393 | 393 | ||
| @@ -400,61 +400,59 @@ CCAAVVEEAATTSS | |||
| 400 | bzip2(1) bzip2(1) | 400 | bzip2(1) bzip2(1) |
| 401 | 401 | ||
| 402 | 402 | ||
| 403 | files more than 512 megabytes long. This could easily be | 403 | Wildcard expansion for Windows 95 and NT is flaky. |
| 404 | |||
| 405 | _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi- | ||
| 406 | tions in compressed files, so it cannot handle compressed | ||
| 407 | files more than 512 megabytes long. This could easily be | ||
| 404 | fixed. | 408 | fixed. |
| 405 | 409 | ||
| 406 | _b_z_i_p_2_r_e_c_o_v_e_r sometimes reports a very small, incomplete | 410 | |
| 407 | final block. This is spurious and can be safely ignored. | 411 | AAUUTTHHOORR |
| 412 | Julian Seward, jseward@acm.org. | ||
| 413 | http://www.muraroa.demon.co.uk | ||
| 414 | |||
| 415 | The ideas embodied in _b_z_i_p_2 are due to (at least) the fol- | ||
| 416 | lowing people: Michael Burrows and David Wheeler (for the | ||
| 417 | block sorting transformation), David Wheeler (again, for | ||
| 418 | the Huffman coder), Peter Fenwick (for the structured cod- | ||
| 419 | ing model in the original _b_z_i_p_, and many refinements), and | ||
| 420 | Alistair Moffat, Radford Neal and Ian Witten (for the | ||
| 421 | arithmetic coder in the original _b_z_i_p_)_. I am much | ||
| 422 | indebted for their help, support and advice. See the man- | ||
| 423 | ual in the source distribution for pointers to sources of | ||
| 424 | documentation. Christian von Roques encouraged me to look | ||
| 425 | for faster sorting algorithms, so as to speed up compres- | ||
| 426 | sion. Bela Lubkin encouraged me to improve the worst-case | ||
| 427 | compression performance. Many people sent patches, helped | ||
| 428 | with portability problems, lent machines, gave advice and | ||
| 429 | were generally helpful. | ||
| 430 | |||
| 431 | |||
| 432 | |||
| 433 | |||
| 434 | |||
| 435 | |||
| 436 | |||
| 437 | |||
| 438 | |||
| 439 | |||
| 440 | |||
| 441 | |||
| 442 | |||
| 443 | |||
| 444 | |||
| 445 | |||
| 446 | |||
| 447 | |||
| 408 | 448 | ||
| 409 | 449 | ||
| 410 | RREELLAATTIIOONNSSHHIIPP TTOO bbzziipp--00..2211 | ||
| 411 | This program is a descendant of the _b_z_i_p program, version | ||
| 412 | 0.21, which I released in August 1996. The primary dif- | ||
| 413 | ference of _b_z_i_p_2 is its avoidance of the possibly patented | ||
| 414 | algorithms which were used in 0.21. _b_z_i_p_2 also brings | ||
| 415 | various useful refinements (-s, -t), uses less memory, | ||
| 416 | decompresses significantly faster, and has support for | ||
| 417 | recovering data from damaged files. | ||
| 418 | 450 | ||
| 419 | Because _b_z_i_p_2 uses Huffman coding to construct the com- | ||
| 420 | pressed bitstream, rather than the arithmetic coding used | ||
| 421 | in 0.21, the compressed representations generated by the | ||
| 422 | two programs are incompatible, and they will not interop- | ||
| 423 | erate. The change in suffix from .bz to .bz2 reflects | ||
| 424 | this. It would have been helpful to at least allow _b_z_i_p_2 | ||
| 425 | to decompress files created by 0.21, but this would defeat | ||
| 426 | the primary aim of having a patent-free compressor. | ||
| 427 | 451 | ||
| 428 | For a more precise statement about patent issues in bzip2, | ||
| 429 | please see the README file in the distribution. | ||
| 430 | 452 | ||
| 431 | Huffman coding necessarily involves some coding ineffi- | ||
| 432 | ciency compared to arithmetic coding. This means that | ||
| 433 | _b_z_i_p_2 compresses about 1% worse than 0.21, an unfortunate | ||
| 434 | but unavoidable fact-of-life. On the other hand, decom- | ||
| 435 | pression is approximately 50% faster for the same reason, | ||
| 436 | and the change in file format gave an opportunity to add | ||
| 437 | data-recovery features. So it is not all bad. | ||
| 438 | 453 | ||
| 439 | 454 | ||
| 440 | AAUUTTHHOORR | ||
| 441 | Julian Seward, jseward@acm.org. | ||
| 442 | 455 | ||
| 443 | The ideas embodied in _b_z_i_p and _b_z_i_p_2 are due to (at least) | ||
| 444 | the following people: Michael Burrows and David Wheeler | ||
| 445 | (for the block sorting transformation), David Wheeler | ||
| 446 | (again, for the Huffman coder), Peter Fenwick (for the | ||
| 447 | structured coding model in 0.21, and many refinements), | ||
| 448 | and Alistair Moffat, Radford Neal and Ian Witten (for the | ||
| 449 | arithmetic coder in 0.21). I am much indebted for their | ||
| 450 | help, support and advice. See the file ALGORITHMS in the | ||
| 451 | source distribution for pointers to sources of documenta- | ||
| 452 | tion. Christian von Roques encouraged me to look for | ||
| 453 | faster sorting algorithms, so as to speed up compression. | ||
| 454 | Bela Lubkin encouraged me to improve the worst-case com- | ||
| 455 | pression performance. Many people sent patches, helped | ||
| 456 | with portability problems, lent machines, gave advice and | ||
| 457 | were generally helpful. | ||
| 458 | 456 | ||
| 459 | 457 | ||
| 460 | 458 | ||
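
Reviewer note: two points in the new (0.9.0) text can be checked with a short Python sketch. The first part recomputes the memory estimates from the MEMORY MANAGEMENT formulas (400k + 7 x block size for compression, 100k + 4 x block size for decompression, 100k + 2.5 x block size with -s). The second part illustrates the concatenation behaviour described for bunzip2, using Python's standard bz2 module as a stand-in rather than the bzip2 0.9.0 binary itself. The function names and the choice of 1k = 1000 bytes are assumptions made for this illustration; the latter is inferred from the page's own arithmetic (400k + 20000 * 7 = 540 kbytes).

```python
import bz2

# Assumption: 1k = 1000 bytes, as implied by "400k + 20000 * 7 = 540 kbytes".
K = 1000


def compress_mem(block_size: int) -> int:
    """Estimated bytes needed to compress with the given block size."""
    return 400 * K + 7 * block_size


def decompress_mem(block_size: int, small: bool = False) -> int:
    """Estimated bytes needed to decompress; small=True models the -s path."""
    per_byte = 2.5 if small else 4
    return 100 * K + int(per_byte * block_size)


# Flags -1 .. -9 select block sizes of 100k .. 900k.
for flag in range(1, 10):
    block = flag * 100 * K
    print(
        f"-{flag}  compress {compress_mem(block) // K}k  "
        f"decompress {decompress_mem(block) // K}k  "
        f"decompress -s {decompress_mem(block, small=True) // K}k"
    )

# Concatenation: two independently compressed streams, joined byte for byte.
# Python's bz2.decompress accepts multi-stream input (Python 3.3 and later),
# which mirrors the behaviour the new man page text describes for bunzip2.
first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")
joined = bz2.decompress(first + second)
print(joined)
```

The formulas reproduce the new Decompress column (500k for -1 up to 3700k for -9) and the "-s" column. One row differs on the compression side: the formula gives 5300k for -7, whereas the table in both the old and new versions lists 5400k.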
