Diffstat (limited to 'bzip2.1.preformatted')
 -rw-r--r--  bzip2.1.preformatted  462
 1 file changed, 462 insertions, 0 deletions
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
new file mode 100644
index 0000000..947dc97
--- /dev/null
+++ b/bzip2.1.preformatted
@@ -0,0 +1,462 @@

bzip2(1)                                                      bzip2(1)


NAME
       bzip2, bunzip2 - a block-sorting file compressor, v0.1
       bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
       bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ]
       bunzip2 [ -kvsVL ] [ filenames ... ]
       bzip2recover filename

DESCRIPTION
       Bzip2 compresses files using the Burrows-Wheeler block-sorting
       text compression algorithm, and Huffman coding.  Compression is
       generally considerably better than that achieved by more
       conventional LZ77/LZ78-based compressors, and approaches the
       performance of the PPM family of statistical compressors.

       The command-line options are deliberately very similar to those
       of GNU Gzip, but they are not identical.

       Bzip2 expects a list of file names to accompany the command-line
       flags.  Each file is replaced by a compressed version of itself,
       with the name "original_name.bz2".  Each compressed file has the
       same modification date and permissions as the corresponding
       original, so that these properties can be correctly restored at
       decompression time.  File name handling is naive in the sense
       that there is no mechanism for preserving original file names,
       permissions and dates in filesystems which lack these concepts,
       or have serious file name length restrictions, such as MS-DOS.

       Bzip2 and bunzip2 will not overwrite existing files; if you want
       this to happen, you should delete them first.

       If no file names are specified, bzip2 compresses from standard
       input to standard output.  In this case, bzip2 will decline to
       write compressed output to a terminal, as this would be entirely
       incomprehensible and therefore pointless.

       Bunzip2 (or bzip2 -d) decompresses and restores all specified
       files whose names end in ".bz2".  Files without this suffix are
       ignored.  Again, supplying no filenames causes decompression
       from standard input to standard output.

       You can also compress or decompress files to the standard output
       by giving the -c flag.  You can decompress multiple files like
       this, but you may only compress a single file this way, since it
       would otherwise be difficult to separate out the compressed
       representations of the original files.
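
       For example, assuming an existing file called data.tar (the file
       names here are purely illustrative), either of the following
       writes a compressed copy to standard output:

              bzip2 -c data.tar > data.tar.bz2
              bzip2 < data.tar > data.tar.bz2

       and several compressed files can be decompressed to a single
       stream with:

              bzip2 -dc part1.bz2 part2.bz2 > whole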

       Compression is always performed, even if the compressed file is
       slightly larger than the original.  Files of less than about one
       hundred bytes tend to get larger, since the compression
       mechanism has a constant overhead in the region of 50 bytes.
       Random data (including the output of most file compressors) is
       coded at about 8.05 bits per byte, giving an expansion of around
       0.5%.

       As a self-check for your protection, bzip2 uses 32-bit CRCs to
       make sure that the decompressed version of a file is identical
       to the original.  This guards against corruption of the
       compressed data, and against undetected bugs in bzip2 (hopefully
       very unlikely).  The chances of data corruption going undetected
       are microscopic, about one chance in four billion for each file
       processed.  Be aware, though, that the check occurs upon
       decompression, so it can only tell you that something is wrong.
       It can't help you recover the original uncompressed data.  You
       can use bzip2recover to try to recover data from damaged files.

       Return values: 0 for a normal exit, 1 for environmental problems
       (file not found, invalid flags, I/O errors, &c), 2 to indicate a
       corrupt compressed file, 3 for an internal consistency error
       (e.g., a bug) which caused bzip2 to panic.
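
       These codes make it straightforward to act on the outcome from a
       shell script.  A minimal sketch (the file name is purely
       illustrative):

              bzip2 -t archive.bz2
              case $? in
                  0) echo "archive.bz2 is intact"       ;;
                  2) echo "archive.bz2 is corrupt"      ;;
                  *) echo "could not test archive.bz2"  ;;
              esac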

MEMORY MANAGEMENT
       Bzip2 compresses large files in blocks.  The block size affects
       both the compression ratio achieved, and the amount of memory
       needed both for compression and decompression.  The flags -1
       through -9 specify the block size to be 100,000 bytes through
       900,000 bytes (the default) respectively.  At decompression
       time, the block size used for compression is read from the
       header of the compressed file, and bunzip2 then allocates itself
       just enough memory to decompress the file.  Since block sizes
       are stored in compressed files, it follows that the flags -1 to
       -9 are irrelevant to, and so ignored during, decompression.
       Compression and decompression requirements, in bytes, can be
       estimated as:

              Compression:   400k + ( 7 x block size )

              Decompression: 100k + ( 5 x block size ), or
                             100k + ( 2.5 x block size )
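
       With the default 900k block size, for example, these formulae
       give 400k + 7 x 900k = 6700 kbytes for compression, and 100k +
       5 x 900k = 4600 kbytes (or 100k + 2.5 x 900k = 2350 kbytes with
       the -s option described below) for decompression, matching the
       -9 row of the table below.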

       Larger block sizes give rapidly diminishing marginal returns;
       most of the compression comes from the first two or three
       hundred k of block size, a fact worth bearing in mind when using
       bzip2 on small machines.  It is also important to appreciate
       that the decompression memory requirement is set at compression
       time by the choice of block size.

       For files compressed with the default 900k block size, bunzip2
       will require about 4600 kbytes to decompress.  To support
       decompression of any file on a 4 megabyte machine, bunzip2 has
       an option to decompress using approximately half this amount of
       memory, about 2300 kbytes.  Decompression speed is also halved,
       so you should use this option only where necessary.  The
       relevant flag is -s.

       In general, try to use the largest block size memory
       constraints allow, since that maximises the compression
       achieved.  Compression and decompression speed are virtually
       unaffected by block size.

       Another significant point applies to files which fit in a
       single block -- that means most files you'd encounter using a
       large block size.  The amount of real memory touched is
       proportional to the size of the file, since the file is smaller
       than a block.  For example, compressing a file 20,000 bytes
       long with the flag -9 will cause the compressor to allocate
       around 6700k of memory, but only touch 400k + 20000 * 7 = 540
       kbytes of it.  Similarly, the decompressor will allocate 4600k
       but only touch 100k + 20000 * 5 = 200 kbytes.

       Here is a table which summarises the maximum memory usage for
       different block sizes.  Also recorded is the total compressed
       size for 14 files of the Calgary Text Compression Corpus
       totalling 3,141,622 bytes.  This column gives some feel for how
       compression varies with block size.  These figures tend to
       understate the advantage of larger block sizes for larger
       files, since the Corpus is dominated by smaller files.

               Compress   Decompress   Decompress   Corpus
        Flag     usage      usage      -s usage      Size

         -1      1100k       600k        350k       914704
         -2      1800k      1100k        600k       877703
         -3      2500k      1600k        850k       860338
         -4      3200k      2100k       1100k       846899
         -5      3900k      2600k       1350k       845160
         -6      4600k      3100k       1600k       838626
         -7      5400k      3600k       1850k       834096
         -8      6000k      4100k       2100k       828642
         -9      6700k      4600k       2350k       828642

OPTIONS
       -c --stdout
              Compress or decompress to standard output.  -c will
              decompress multiple files to stdout, but will only
              compress a single file to stdout.

       -d --decompress
              Force decompression.  Bzip2 and bunzip2 are really the
              same program, and the decision about whether to compress
              or decompress is done on the basis of which name is
              used.  This flag overrides that mechanism, and forces
              bzip2 to decompress.

       -f --compress
              The complement to -d: forces compression, regardless of
              the invocation name.

       -t --test
              Check integrity of the specified file(s), but don't
              decompress them.  This really performs a trial
              decompression and throws away the result, using the
              low-memory decompression algorithm (see -s).

       -k --keep
              Keep (don't delete) input files during compression or
              decompression.
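
              For instance (the file name is illustrative), the
              following leaves sample.log in place and also creates
              sample.log.bz2:

                     bzip2 -k sample.log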

       -s --small
              Reduce memory usage, both for compression and
              decompression.  Files are decompressed using a modified
              algorithm which only requires 2.5 bytes per block byte.
              This means any file can be decompressed in 2300k of
              memory, albeit somewhat more slowly than usual.

              During compression, -s selects a block size of 200k,
              which limits memory use to around the same figure, at
              the expense of your compression ratio.  In short, if
              your machine is low on memory (8 megabytes or less), use
              -s for everything.  See MEMORY MANAGEMENT above.
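
              A low-memory session might therefore look like this (the
              file name is illustrative):

                     bzip2 -s bigfile
                     bunzip2 -s bigfile.bz2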

       -v --verbose
              Verbose mode -- show the compression ratio for each file
              processed.  Further -v's increase the verbosity level,
              spewing out lots of information which is primarily of
              interest for diagnostic purposes.

       -L --license
              Display the software version, license terms and
              conditions.

       -V --version
              Same as -L.

       -1 to -9
              Set the block size to 100 k, 200 k ... 900 k when
              compressing.  Has no effect when decompressing.  See
              MEMORY MANAGEMENT above.
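
              For example (the file name is illustrative), this
              compresses with a 200,000-byte block size, trading some
              compression for a smaller memory footprint:

                     bzip2 -2 tree.tar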

       --repetitive-fast
              bzip2 injects some small pseudo-random variations into
              very repetitive blocks to limit worst-case performance
              during compression.  If sorting runs into difficulties,
              the block is randomised, and sorting is restarted.  Very
              roughly, bzip2 persists for three times as long as a
              well-behaved input would take before resorting to
              randomisation.  This flag makes it give up much sooner.

       --repetitive-best
              Opposite of --repetitive-fast; try a lot harder before
              resorting to randomisation.

RECOVERING DATA FROM DAMAGED FILES
       bzip2 compresses files in blocks, usually 900 kbytes long.
       Each block is handled independently.  If a media or
       transmission error causes a multi-block .bz2 file to become
       damaged, it may be possible to recover data from the undamaged
       blocks in the file.

       The compressed representation of each block is delimited by a
       48-bit pattern, which makes it possible to find the block
       boundaries with reasonable certainty.  Each block also carries
       its own 32-bit CRC, so damaged blocks can be distinguished from
       undamaged ones.

       bzip2recover is a simple program whose purpose is to search for
       blocks in .bz2 files, and write each block out into its own
       .bz2 file.  You can then use bzip2 -t to test the integrity of
       the resulting files, and decompress those which are undamaged.

       bzip2recover takes a single argument, the name of the damaged
       file, and writes a number of files "rec0001file.bz2",
       "rec0002file.bz2", etc., containing the extracted blocks.  The
       output filenames are designed so that the use of wildcards in
       subsequent processing -- for example, "bzip2 -dc rec*file.bz2 >
       recovered_data" -- lists the files in the "right" order.
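
       A recovery session might therefore run bzip2recover on the
       damaged file, test the pieces, and then decompress the good
       ones (the file names are illustrative; the rec* pattern follows
       the naming scheme described above):

              bzip2recover file.bz2
              bzip2 -t rec*file.bz2
              bzip2 -dc rec*file.bz2 > recovered_data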

       bzip2recover should be of most use dealing with large .bz2
       files, as these will contain many blocks.  It is clearly futile
       to use it on damaged single-block files, since a damaged block
       cannot be recovered.  If you wish to minimise any potential
       data loss through media or transmission errors, you might
       consider compressing with a smaller block size.

PERFORMANCE NOTES
       The sorting phase of compression gathers together similar
       strings in the file.  Because of this, files containing very
       long runs of repeated symbols, like "aabaabaabaab ..."
       (repeated several hundred times) may compress extraordinarily
       slowly.  You can use the -vvvvv option to monitor progress in
       great detail, if you want.  Decompression speed is unaffected.

       Such pathological cases seem rare in practice, appearing mostly
       in artificially-constructed test files, and in low-level disk
       images.  It may be inadvisable to use bzip2 to compress the
       latter.  If you do get a file which causes severe slowness in
       compression, try making the block size as small as possible,
       with flag -1.
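
       For instance (the file name is illustrative), a troublesome
       disk image could be compressed with the smallest block size
       while watching progress in detail:

              bzip2 -1 -vvvvv disk.img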

       Incompressible or virtually-incompressible data may decompress
       rather more slowly than one would hope.  This is due to a naive
       implementation of the move-to-front coder.

       bzip2 usually allocates several megabytes of memory to operate
       in, and then charges all over it in a fairly random fashion.
       This means that performance, both for compressing and
       decompressing, is largely determined by the speed at which your
       machine can service cache misses.  Because of this, small
       changes to the code to reduce the miss rate have been observed
       to give disproportionately large performance improvements.  I
       imagine bzip2 will perform best on machines with very large
       caches.

       Test mode (-t) uses the low-memory decompression algorithm
       (-s).  This means test mode does not run as fast as it could;
       it could run as fast as the normal decompression machinery.
       This could easily be fixed at the cost of some code bloat.

CAVEATS
       I/O error messages are not as helpful as they could be.  Bzip2
       tries hard to detect I/O errors and exit cleanly, but the
       details of what the problem is sometimes seem rather
       misleading.

       This manual page pertains to version 0.1 of bzip2.  It may well
       happen that some future version will use a different compressed
       file format.  If you try to decompress, using 0.1, a .bz2 file
       created with some future version which uses a different
       compressed file format, 0.1 will complain that your file "is
       not a bzip2 file".  If that happens, you should obtain a more
       recent version of bzip2 and use that to decompress the file.

       Wildcard expansion for Windows 95 and NT is flaky.

       bzip2recover uses 32-bit integers to represent bit positions in
       compressed files, so it cannot handle compressed files more
       than 512 megabytes long.  This could easily be fixed.

       bzip2recover sometimes reports a very small, incomplete final
       block.  This is spurious and can be safely ignored.

RELATIONSHIP TO bzip-0.21
       This program is a descendant of the bzip program, version 0.21,
       which I released in August 1996.  The primary difference of
       bzip2 is its avoidance of the possibly patented algorithms
       which were used in 0.21.  bzip2 also brings various useful
       refinements (-s, -t), uses less memory, decompresses
       significantly faster, and has support for recovering data from
       damaged files.

       Because bzip2 uses Huffman coding to construct the compressed
       bitstream, rather than the arithmetic coding used in 0.21, the
       compressed representations generated by the two programs are
       incompatible, and they will not interoperate.  The change in
       suffix from .bz to .bz2 reflects this.  It would have been
       helpful to at least allow bzip2 to decompress files created by
       0.21, but this would defeat the primary aim of having a
       patent-free compressor.

       Huffman coding necessarily involves some coding inefficiency
       compared to arithmetic coding.  This means that bzip2
       compresses about 1% worse than 0.21, an unfortunate but
       unavoidable fact-of-life.  On the other hand, decompression is
       approximately 50% faster for the same reason, and the change in
       file format gave an opportunity to add data-recovery features.
       So it is not all bad.

AUTHOR
       Julian Seward, jseward@acm.org.

       The ideas embodied in bzip and bzip2 are due to (at least) the
       following people: Michael Burrows and David Wheeler (for the
       block sorting transformation), David Wheeler (again, for the
       Huffman coder), Peter Fenwick (for the structured coding model
       in 0.21, and many refinements), and Alistair Moffat, Radford
       Neal and Ian Witten (for the arithmetic coder in 0.21).  I am
       much indebted for their help, support and advice.  See the file
       ALGORITHMS in the source distribution for pointers to sources
       of documentation.  Christian von Roques encouraged me to look
       for faster sorting algorithms, so as to speed up compression.
       Bela Lubkin encouraged me to improve the worst-case compression
       performance.  Many people sent patches, helped with portability
       problems, lent machines, gave advice and were generally
       helpful.
