From 977101ad5f833f5c0a574bfeea408e5301a6b052 Mon Sep 17 00:00:00 2001
From: Julian Seward <jseward@acm.org>
Date: Sun, 23 Aug 1998 22:13:13 +0200
Subject: bzip2-0.9.0c

---
 bzip2.txt | 292 ++++++++++++++++++++------------------------------------------
 1 file changed, 91 insertions(+), 201 deletions(-)
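
Note on the memory figures: this patch lowers the decompression estimate in MEMORY MANAGEMENT from 5 x block size to 4 x block size. The sketch below recomputes the table from the stated formulas as a sanity check. It is not taken from the bzip2 sources: the function names are hypothetical, and it assumes "k" means 1000 bytes, which is what the man page's own arithmetic (400k + 20000 * 7 = 540 kbytes) implies.

```python
# Sanity check of the MEMORY MANAGEMENT formulas in bzip2.txt.
# Names here are illustrative only; they do not come from bzip2's source.

K = 1000  # assumption: the man page's "k" is 1000 bytes


def compress_bytes(block_size):
    """Compression estimate: 400k + (7 x block size)."""
    return 400 * K + 7 * block_size


def decompress_bytes(block_size, small=False):
    """Decompression estimate: 100k + (4 x block size),
    or 100k + (2.5 x block size) when -s is given."""
    if small:
        return 100 * K + (5 * block_size) // 2
    return 100 * K + 4 * block_size


if __name__ == "__main__":
    # Flags -1 to -9 select block sizes of 100k to 900k.
    for level in range(1, 10):
        block = level * 100 * K
        print("-%d  %5dk  %5dk  %5dk" % (
            level,
            compress_bytes(block) // K,
            decompress_bytes(block) // K,
            decompress_bytes(block, small=True) // K,
        ))
```

The formulas reproduce the new decompression column (500k to 3700k) and the -s column. For -7 the compression formula gives 5300k, whereas the table, unchanged by this patch, lists 5400k.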

diff --git a/bzip2.txt b/bzip2.txt
index aee8e2b..898dfe8 100644
--- a/bzip2.txt
+++ b/bzip2.txt
@@ -1,22 +1,22 @@
 
-
-
 bzip2(1)                                                 bzip2(1)
 
 
 NAME
-       bzip2, bunzip2 - a block-sorting file compressor, v0.1
+       bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
+       bzcat - decompresses files to stdout
        bzip2recover - recovers data from damaged bzip2 files
 
 
 SYNOPSIS
-       bzip2 [ -cdfkstvVL123456789 ] [ filenames ...  ]
-       bunzip2 [ -kvsVL ] [ filenames ...  ]
+       bzip2 [ -cdfkstvzVL123456789 ] [ filenames ...  ]
+       bunzip2 [ -fkvsVL ] [ filenames ...  ]
+       bzcat [ -s ] [ filenames ...  ]
        bzip2recover filename
 
 
 DESCRIPTION
-       Bzip2  compresses  files  using the Burrows-Wheeler block-
+       bzip2  compresses  files  using the Burrows-Wheeler block-
        sorting text compression algorithm,  and  Huffman  coding.
        Compression  is  generally  considerably  better than that
        achieved by more conventional LZ77/LZ78-based compressors,
@@ -26,7 +26,7 @@ DESCRIPTION
        The command-line options are deliberately very similar  to
        those of GNU Gzip, but they are not identical.
 
-       Bzip2  expects  a list of file names to accompany the com-
+       bzip2  expects  a list of file names to accompany the com-
        mand-line flags.  Each file is replaced  by  a  compressed
        version  of  itself,  with  the  name "original_name.bz2".
        Each compressed file has the same  modification  date  and
@@ -38,8 +38,8 @@ DESCRIPTION
        cepts, or have serious file name length restrictions, such
        as MS-DOS.
 
-       Bzip2  and  bunzip2  will not overwrite existing files; if
-       you want this to happen, you should delete them first.
+       bzip2  and  bunzip2 will by default not overwrite existing
+       files; if you want this to happen, specify the -f flag.
 
        If no file names  are  specified,  bzip2  compresses  from
        standard  input  to  standard output.  In this case, bzip2
@@ -47,28 +47,29 @@ DESCRIPTION
        this  would  be  entirely  incomprehensible  and therefore
        pointless.
 
-       Bunzip2 (or bzip2 -d ) decompresses and restores all spec-
+       bunzip2 (or bzip2 -d ) decompresses and restores all spec-
        ified files whose names end in ".bz2".  Files without this
        suffix are ignored.  Again, supplying no filenames  causes
        decompression from standard input to standard output.
 
-       You  can also compress or decompress files to the standard
-       output by giving the -c flag.  You can decompress multiple
-       files  like  this, but you may only compress a single file
-       this way, since it would otherwise be difficult  to  sepa-
-       rate  out  the  compressed representations of the original
-       files.
-
-
-
-                                                                1
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
+       bunzip2 will correctly decompress a file which is the con-
+       catenation of two or more compressed files.  The result is
+       the concatenation of the corresponding uncompressed files.
+       Integrity testing (-t) of concatenated compressed files is
+       also supported.
 
+       You  can also compress or decompress files to the standard
+       output by giving the -c flag.  Multiple files may be  com-
+       pressed and decompressed like this.  The resulting outputs
+       are fed sequentially to stdout.  Compression  of  multiple
+       files  in this manner generates a stream containing multi-
+       ple compressed file representations.  Such a stream can be
+       decompressed  correctly  only  by  bzip2  version 0.9.0 or
+       later.  Earlier versions of bzip2 will stop  after  decom-
+       pressing the first file in the stream.
+
+       bzcat  (or bzip2 -dc ) decompresses all specified files to
+       the standard output.
 
        Compression is always performed, even  if  the  compressed
        file  is slightly larger than the original.  Files of less
@@ -108,13 +109,14 @@ MEMORY MANAGEMENT
        file, and bunzip2 then allocates itself just enough memory
        to decompress the file.  Since block sizes are  stored  in
        compressed  files,  it follows that the flags -1 to -9 are
-       irrelevant to and so ignored during  decompression.   Com-
-       pression  and decompression requirements, in bytes, can be
-       estimated as:
+       irrelevant  to  and  so  ignored   during   decompression.
+
+       Compression  and decompression requirements, in bytes, can
+       be estimated as:
 
              Compression:   400k + ( 7 x block size )
 
-             Decompression: 100k + ( 5 x block size ), or
+             Decompression: 100k + ( 4 x block size ), or
                             100k + ( 2.5 x block size )
 
        Larger  block  sizes  give  rapidly  diminishing  marginal
@@ -125,19 +127,8 @@ MEMORY MANAGEMENT
        requirement  is  set  at compression-time by the choice of
        block size.
 
-
-
-                                                                2
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        For files compressed with the  default  900k  block  size,
-       bunzip2  will require about 4600 kbytes to decompress.  To
+       bunzip2  will require about 3700 kbytes to decompress.  To
        support decompression of any file on a 4 megabyte machine,
        bunzip2  has  an  option to decompress using approximately
        half this amount of memory, about 2300 kbytes.  Decompres-
@@ -157,8 +148,8 @@ bzip2(1)                                                 bzip2(1)
        file 20,000 bytes long with the flag  -9  will  cause  the
        compressor  to  allocate  around 6700k of memory, but only
        touch 400k + 20000 * 7 = 540 kbytes of it.  Similarly, the
-       decompressor  will  allocate  4600k  but only touch 100k +
-       20000 * 5 = 200 kbytes.
+       decompressor  will  allocate  3700k  but only touch 100k +
+       20000 * 4 = 180 kbytes.
 
        Here is a table which summarises the maximum memory  usage
        for  different  block  sizes.   Also recorded is the total
@@ -172,15 +163,15 @@ bzip2(1)                                                 bzip2(1)
                   Compress   Decompress   Decompress   Corpus
            Flag     usage      usage       -s usage     Size
 
-            -1      1100k       600k         350k      914704
-            -2      1800k      1100k         600k      877703
-            -3      2500k      1600k         850k      860338
-            -4      3200k      2100k        1100k      846899
-            -5      3900k      2600k        1350k      845160
-            -6      4600k      3100k        1600k      838626
-            -7      5400k      3600k        1850k      834096
-            -8      6000k      4100k        2100k      828642
-            -9      6700k      4600k        2350k      828642
+            -1      1100k       500k         350k      914704
+            -2      1800k       900k         600k      877703
+            -3      2500k      1300k         850k      860338
+            -4      3200k      1700k        1100k      846899
+            -5      3900k      2100k        1350k      845160
+            -6      4600k      2500k        1600k      838626
+            -7      5400k      2900k        1850k      834096
+            -8      6000k      3300k        2100k      828642
+            -9      6700k      3700k        2350k      828642
 
 
 OPTIONS
@@ -189,47 +180,37 @@ OPTIONS
               decompress multiple files to stdout, but will  only
               compress a single file to stdout.
 
-
-
-
-
-                                                                3
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        -d --decompress
-              Force  decompression.  Bzip2 and bunzip2 are really
-              the same program, and the decision about whether to
-              compress  or  decompress  is  done  on the basis of
-              which name is used.  This flag overrides that mech-
-              anism, and forces bzip2 to decompress.
+              Force  decompression.  bzip2, bunzip2 and bzcat are
+              really the same program,  and  the  decision  about
+              what  actions to take is done on the basis of which
+              name is used.  This flag overrides that  mechanism,
+              and forces bzip2 to decompress.
 
-       -f --compress
+       -z --compress
               The  complement  to -d: forces compression, regard-
               less of the invokation name.
 
        -t --test
               Check integrity of the specified file(s), but don't
               decompress  them.   This  really  performs  a trial
-              decompression and throws away the result, using the
-              low-memory decompression algorithm (see -s).
+              decompression and throws away the result.
+
+       -f --force
+              Force overwrite of output files.   Normally,  bzip2
+              will not overwrite existing output files.
 
        -k --keep
               Keep  (don't delete) input files during compression
               or decompression.
 
        -s --small
-              Reduce  memory  usage,  both  for  compression  and
-              decompression.  Files are decompressed using a mod-
-              ified algorithm which only requires 2.5  bytes  per
-              block  byte.   This  means  any  file can be decom-
-              pressed in 2300k of memory,  albeit  somewhat  more
-              slowly than usual.
+              Reduce memory usage, for compression, decompression
+              and  testing.   Files  are  decompressed and tested
+              using a modified algorithm which only requires  2.5
+              bytes  per  block byte.  This means any file can be
+              decompressed in 2300k of memory,  albeit  at  about
+              half the normal speed.
 
               During  compression,  -s  selects  a  block size of
               200k, which limits memory use to  around  the  same
@@ -238,36 +219,21 @@ bzip2(1)                                                 bzip2(1)
               megabytes  or  less),  use  -s for everything.  See
               MEMORY MANAGEMENT above.
 
-
        -v --verbose
               Verbose mode -- show the compression ratio for each
               file  processed.   Further  -v's  increase the ver-
               bosity level, spewing out lots of information which
               is primarily of interest for diagnostic purposes.
 
-       -L --license
+       -L --license -V --version
               Display  the  software  version,  license terms and
               conditions.
 
-       -V --version
-              Same as -L.
-
        -1 to -9
               Set the block size to 100 k, 200 k ..  900  k  when
               compressing.   Has  no  effect  when decompressing.
               See MEMORY MANAGEMENT above.
 
-
-
-                                                                4
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        --repetitive-fast
               bzip2 injects some small  pseudo-random  variations
               into  very  repetitive  blocks  to limit worst-case
@@ -278,7 +244,6 @@ bzip2(1)                                                 bzip2(1)
               would take before resorting to randomisation.  This
               flag makes it give up much sooner.
 
-
        --repetitive-best
               Opposite  of  --repetitive-fast;  try  a lot harder
               before resorting to randomisation.
@@ -306,10 +271,10 @@ RECOVERING DATA FROM DAMAGED FILES
        bzip2recover takes a single argument, the name of the dam-
        aged file, and writes a number of files "rec0001file.bz2",
        "rec0002file.bz2", etc, containing the  extracted  blocks.
-       The output filenames are designed so that the use of wild-
-       cards in subsequent processing -- for example, "bzip2  -dc
-       rec*file.bz2  >  recovered_data" -- lists the files in the
-       "right" order.
+       The  output  filenames  are  designed  so  that the use of
+       wildcards in subsequent processing -- for example,  "bzip2
+       -dc  rec*file.bz2  > recovered_data" -- lists the files in
+       the "right" order.
 
        bzip2recover should be of most use dealing with large .bz2
        files,  as  these will contain many blocks.  It is clearly
@@ -322,18 +287,6 @@ RECOVERING DATA FROM DAMAGED FILES
 
 PERFORMANCE NOTES
        The sorting phase of compression gathers together  similar
-
-
-
-                                                                5
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
        strings  in  the  file.  Because of this, files containing
        very long runs of  repeated  symbols,  like  "aabaabaabaab
        ..."   (repeated   several  hundred  times)  may  compress
@@ -348,10 +301,6 @@ bzip2(1)                                                 bzip2(1)
        severe slowness in compression, try making the block  size
        as small as possible, with flag -1.
 
-       Incompressible or virtually-incompressible data may decom-
-       press rather more slowly than one would hope.  This is due
-       to a naive implementation of the move-to-front coder.
-
        bzip2  usually  allocates  several  megabytes of memory to
        operate in, and then charges all over it in a fairly  ran-
        dom  fashion.   This means that performance, both for com-
@@ -362,12 +311,6 @@ bzip2(1)                                                 bzip2(1)
        large performance improvements.  I imagine bzip2 will per-
        form best on machines with very large caches.
 
-       Test mode (-t) uses the low-memory decompression algorithm
-       (-s).  This means test mode does not run  as  fast  as  it
-       could;  it  could  run as fast as the normal decompression
-       machinery.  This could easily be fixed at the cost of some
-       code bloat.
-
 
 CAVEATS
        I/O  error  messages  are not as helpful as they could be.
@@ -375,91 +318,38 @@ CAVEATS
        but  the  details  of  what  the problem is sometimes seem
        rather misleading.
 
-       This manual page pertains to version 0.1 of bzip2.  It may
-       well  happen that some future version will use a different
-       compressed file format.  If you try to  decompress,  using
-       0.1,  a  .bz2  file created with some future version which
-       uses a different compressed file format, 0.1 will complain
-       that  your  file  "is not a bzip2 file".  If that happens,
-       you should obtain a more recent version of bzip2  and  use
-       that to decompress the file.
+       This manual page pertains to version 0.9.0 of bzip2.  Com-
+       pressed  data created by this version is entirely forwards
+       and backwards compatible with the previous public release,
+       version  0.1pl2,  but  with the following exception: 0.9.0
+       can correctly decompress multiple concatenated  compressed
+       files.   0.1pl2  cannot do this; it will stop after decom-
+       pressing just the first file in the stream.
 
        Wildcard expansion for Windows 95 and NT is flaky.
 
-       bzip2recover  uses  32-bit integers to represent bit posi-
-       tions in compressed files, so it cannot handle  compressed
-
-
-
-                                                                6
-
-
-
-
-
-bzip2(1)                                                 bzip2(1)
-
-
-       files  more than 512 megabytes long.  This could easily be
+       bzip2recover uses 32-bit integers to represent  bit  posi-
+       tions  in compressed files, so it cannot handle compressed
+       files more than 512 megabytes long.  This could easily  be
        fixed.
 
-       bzip2recover sometimes reports a  very  small,  incomplete
-       final  block.  This is spurious and can be safely ignored.
-
-
-RELATIONSHIP TO bzip-0.21
-       This program is a descendant of the bzip program,  version
-       0.21,  which  I released in August 1996.  The primary dif-
-       ference of bzip2 is its avoidance of the possibly patented
-       algorithms  which  were  used  in 0.21.  bzip2 also brings
-       various useful refinements (-s,  -t),  uses  less  memory,
-       decompresses  significantly  faster,  and  has support for
-       recovering data from damaged files.
-
-       Because bzip2 uses Huffman coding to  construct  the  com-
-       pressed  bitstream, rather than the arithmetic coding used
-       in 0.21, the compressed representations generated  by  the
-       two  programs are incompatible, and they will not interop-
-       erate.  The change in suffix from  .bz  to  .bz2  reflects
-       this.   It would have been helpful to at least allow bzip2
-       to decompress files created by 0.21, but this would defeat
-       the primary aim of having a patent-free compressor.
-
-       For a more precise statement about patent issues in bzip2,
-       please see the README file in the distribution.
-
-       Huffman  coding  necessarily  involves some coding ineffi-
-       ciency compared to arithmetic  coding.   This  means  that
-       bzip2  compresses about 1% worse than 0.21, an unfortunate
-       but unavoidable fact-of-life.  On the other  hand,  decom-
-       pression  is approximately 50% faster for the same reason,
-       and the change in file format gave an opportunity  to  add
-       data-recovery features.  So it is not all bad.
-
 
 AUTHOR
        Julian Seward, jseward@acm.org.
-
-       The ideas embodied in bzip and bzip2 are due to (at least)
-       the following people: Michael Burrows  and  David  Wheeler
-       (for  the  block  sorting  transformation),  David Wheeler
-       (again, for the Huffman coder),  Peter  Fenwick  (for  the
-       structured  coding  model  in 0.21, and many refinements),
-       and Alistair Moffat, Radford Neal and Ian Witten (for  the
-       arithmetic  coder  in 0.21).  I am much indebted for their
-       help, support and advice.  See the file ALGORITHMS in  the
-       source  distribution for pointers to sources of documenta-
-       tion.  Christian von Roques  encouraged  me  to  look  for
-       faster  sorting algorithms, so as to speed up compression.
-       Bela Lubkin encouraged me to improve the  worst-case  com-
-       pression  performance.   Many  people sent patches, helped
+       http://www.muraroa.demon.co.uk
+
+       The ideas embodied in bzip2 are due to (at least) the fol-
+       lowing people: Michael Burrows and David Wheeler (for  the
+       block  sorting  transformation), David Wheeler (again, for
+       the Huffman coder), Peter Fenwick (for the structured cod-
+       ing model in the original bzip, and many refinements), and
+       Alistair Moffat, Radford Neal  and  Ian  Witten  (for  the
+       arithmetic  coder  in  the  original  bzip).   I  am  much
+       indebted for their help, support and advice.  See the man-
+       ual  in the source distribution for pointers to sources of
+       documentation.  Christian von Roques encouraged me to look
+       for  faster sorting algorithms, so as to speed up compres-
+       sion.  Bela Lubkin encouraged me to improve the worst-case
+       compression performance.  Many people sent patches, helped
        with portability problems, lent machines, gave advice  and
        were generally helpful.
-
-
-
-
-
-                                                                7
-
-
-- 
cgit v1.2.3-55-g6feb