author     Julian Seward <jseward@acm.org>    1997-08-07 22:13:13 +0200
committer  Julian Seward <jseward@acm.org>    1997-08-07 22:13:13 +0200
commit     33d134030248633ffa7d60c0a35a783c46da034b (patch)
tree       b760dc34185dccc7054989c1472478574223cc31 /bzip2.1
tag        bzip2-0.1

Diffstat (limited to 'bzip2.1'): bzip2.1 | 441
1 file changed, 441 insertions, 0 deletions

@@ -0,0 +1,441 @@
.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v0.1
.br
bzip2recover \- recovers data from damaged bzip2 files

.SH SYNOPSIS
.ll +8
.B bzip2
.RB [ " \-cdfkstvVL123456789 " ]
[
.I "filenames \&..."
]
.ll -8
.br
.B bunzip2
.RB [ " \-kvsVL " ]
[
.I "filenames \&..."
]
.br
.B bzip2recover
.I "filename"

.SH DESCRIPTION
.I Bzip2
compresses files using the Burrows-Wheeler block-sorting
text compression algorithm, and Huffman coding.
Compression is generally considerably
better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of statistical
compressors.

The command-line options are deliberately very similar to
those of
.I GNU Gzip,
but they are not identical.

.I Bzip2
expects a list of file names to accompany the command-line flags.
Each file is replaced by a compressed version of itself,
with the name "original_name.bz2".
Each compressed file has the same modification date and permissions
as the corresponding original, so that these properties can be
correctly restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserving
original file names, permissions and dates in filesystems
which lack these concepts, or have serious file name length
restrictions, such as MS-DOS.

.I Bzip2
and
.I bunzip2
will not overwrite existing files; if you want this to happen,
you should delete them first.

If no file names are specified,
.I bzip2
compresses from standard input to standard output.
In this case,
.I bzip2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore pointless.

.I Bunzip2
(or
.I bzip2 \-d
) decompresses and restores all specified files whose names
end in ".bz2".
Files without this suffix are ignored.
Again, supplying no filenames
causes decompression from standard input to standard output.

You can also compress or decompress files to
the standard output by giving the \-c flag.
You can decompress multiple files like this, but you may
only compress a single file this way, since it would otherwise
be difficult to separate out the compressed representations of
the original files.

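As a sketch of the \-c usage described above (assuming a bzip2 binary on the PATH; the file name is purely illustrative):

```shell
# Compress a named file to stdout, then decompress the result back to stdout.
printf 'hello, world\n' > greeting.txt
bzip2 -c greeting.txt > greeting.txt.bz2
bzip2 -dc greeting.txt.bz2
```

The last command writes the original text back to standard output, which is the usual way to feed decompressed data into a pipeline.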
Compression is always performed, even if the compressed file is
slightly larger than the original. Files of less than about
one hundred bytes tend to get larger, since the compression
mechanism has a constant overhead in the region of 50 bytes.
Random data (including the output of most file compressors)
is coded at about 8.05 bits per byte, giving an expansion of
around 0.5%.

As a self-check for your protection,
.I bzip2
uses 32-bit CRCs to make sure that the decompressed
version of a file is identical to the original.
This guards against corruption of the compressed data,
and against undetected bugs in
.I bzip2
(hopefully very unlikely).
The chance of data corruption going undetected is
microscopic, about one chance in four billion
for each file processed. Be aware, though, that the check
occurs upon decompression, so it can only tell you
that something is wrong. It can't help you recover the
original uncompressed data.
You can use
.I bzip2recover
to try to recover data from damaged files.

Return values:
0 for a normal exit,
1 for environmental
problems (file not found, invalid flags, I/O errors, etc.),
2 to indicate a corrupt compressed file,
3 for an internal consistency error (eg, a bug) which caused
.I bzip2
to panic.

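The return values can be inspected from the shell; a small sketch (assuming a bzip2 binary on the PATH, file name invented for the example):

```shell
# A file with the wrong magic bytes should be reported as corrupt.
printf 'this is not bzip2 data\n' > bogus.bz2
status=0
bzip2 -t bogus.bz2 2>/dev/null || status=$?
echo "$status"
```

Per the table above, a corrupt compressed file yields status 2, while a missing file would yield 1.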
.SH MEMORY MANAGEMENT
.I Bzip2
compresses large files in blocks. The block size affects both the
compression ratio achieved, and the amount of memory needed both for
compression and decompression. The flags \-1 through \-9
specify the block size to be 100,000 bytes through 900,000 bytes
(the default) respectively. At decompression-time, the block size used for
compression is read from the header of the compressed file, and
.I bunzip2
then allocates itself just enough memory to decompress the file.
Since block sizes are stored in compressed files, it follows that the flags
\-1 to \-9
are irrelevant to, and so ignored during, decompression.
Compression and decompression requirements, in bytes, can be estimated as:

     Compression:   400k + ( 7 x block size )

     Decompression: 100k + ( 5 x block size ), or
.br
                    100k + ( 2.5 x block size ) with \-s

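The two formulas can be turned into a small calculator. This Python sketch (not part of bzip2; the function names are invented here) reproduces the figures in the table in this section, taking k as 1,000 bytes:

```python
# Sketch only: estimate bzip2 0.1 memory use from the man page formulas.
def compress_mem(block_size):
    # Compression: 400k + (7 x block size)
    return 400_000 + 7 * block_size

def decompress_mem(block_size, small=False):
    # Decompression: 100k + (5 x block size), or 2.5x with -s
    factor = 2.5 if small else 5
    return int(100_000 + factor * block_size)

# Default -9 block size of 900,000 bytes:
print(compress_mem(900_000))                # 6700000, i.e. "6700k"
print(decompress_mem(900_000))              # 4600000, i.e. "4600k"
print(decompress_mem(900_000, small=True))  # 2350000, i.e. "2350k"
```

These match the \-9 row of the table; the other rows follow by substituting the corresponding block sizes.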
Larger block sizes give rapidly diminishing marginal returns; most
of the
compression comes from the first two or three hundred k of block size,
a fact worth bearing in mind when using
.I bzip2
on small machines. It is also important to appreciate that the
decompression memory requirement is set at compression-time by the
choice of block size.

For files compressed with the default 900k block size,
.I bunzip2
will require about 4600 kbytes to decompress.
To support decompression of any file on a 4 megabyte machine,
.I bunzip2
has an option to decompress using approximately half this
amount of memory, about 2300 kbytes. Decompression speed is
also halved, so you should use this option only where necessary.
The relevant flag is \-s.

In general, try to use the largest block size
memory constraints allow, since that maximises the compression
achieved. Compression and decompression
speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single
block -- that means most files you'd encounter using a large
block size. The amount of real memory touched is proportional
to the size of the file, since the file is smaller than a block.
For example, compressing a file 20,000 bytes long with the flag
\-9
will cause the compressor to allocate around
6700k of memory, but only touch 400k + 20000 * 7 = 540
kbytes of it. Similarly, the decompressor will allocate 4600k but
only touch 100k + 20000 * 5 = 200 kbytes.

Here is a table which summarises the maximum memory usage for
different block sizes. Also recorded is the total compressed
size for 14 files of the Calgary Text Compression Corpus
totalling 3,141,622 bytes. This column gives some feel for how
compression varies with block size. These figures tend to understate
the advantage of larger block sizes for larger files, since the
Corpus is dominated by smaller files.

           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size

     -1      1100k       600k         350k      914704
     -2      1800k      1100k         600k      877703
     -3      2500k      1600k         850k      860338
     -4      3200k      2100k        1100k      846899
     -5      3900k      2600k        1350k      845160
     -6      4600k      3100k        1600k      838626
     -7      5400k      3600k        1850k      834096
     -8      6000k      4100k        2100k      828642
     -9      6700k      4600k        2350k      828642

.SH OPTIONS
.TP
.B \-c --stdout
Compress or decompress to standard output. \-c will decompress
multiple files to stdout, but will only compress a single file to
stdout.
.TP
.B \-d --decompress
Force decompression.
.I Bzip2
and
.I bunzip2
are really the same program, and the decision about whether to
compress or decompress is done on the basis of which name is
used. This flag overrides that mechanism, and forces
.I bzip2
to decompress.
.TP
.B \-f --compress
The complement to \-d: forces compression, regardless of the invocation
name.
.TP
.B \-t --test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result,
using the low-memory decompression algorithm (see \-s).
.TP
.B \-k --keep
Keep (don't delete) input files during compression or decompression.
.TP
.B \-s --small
Reduce memory usage, both for compression and decompression.
Files are decompressed using a modified algorithm which only
requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit somewhat more slowly than
usual.

During compression, \-s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your
compression ratio. In short, if your machine is low on memory
(8 megabytes or less), use \-s for everything. See
MEMORY MANAGEMENT above.

.TP
.B \-v --verbose
Verbose mode -- show the compression ratio for each file processed.
Further \-v's increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
.TP
.B \-L --license
Display the software version, license terms and conditions.
.TP
.B \-V --version
Same as \-L.
.TP
.B \-1 to \-9
Set the block size to 100 k, 200 k ... 900 k when
compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT above.
.TP
.B \--repetitive-fast
.I bzip2
injects some small pseudo-random variations
into very repetitive blocks to limit
worst-case performance during compression.
If sorting runs into difficulties, the block
is randomised, and sorting is restarted.
Very roughly,
.I bzip2
persists for three times as long as a well-behaved input
would take before resorting to randomisation.
This flag makes it give up much sooner.

.TP
.B \--repetitive-best
Opposite of \--repetitive-fast; try a lot harder before
resorting to randomisation.

.SH RECOVERING DATA FROM DAMAGED FILES
.I bzip2
compresses files in blocks, usually 900 kbytes long.
Each block is handled independently. If a media or
transmission error causes a multi-block .bz2
file to become damaged,
it may be possible to recover data from the undamaged blocks
in the file.

The compressed representation of each block is delimited by
a 48-bit pattern, which makes it possible to find the block
boundaries with reasonable certainty. Each block also carries
its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.

.I bzip2recover
is a simple program whose purpose is to search for
blocks in .bz2 files, and write each block out into
its own .bz2 file. You can then use
.I bzip2 \-t
to test the integrity of the resulting files,
and decompress those which are undamaged.

.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec0001file.bz2", "rec0002file.bz2",
etc, containing the extracted blocks. The output filenames
are designed so that the use of wildcards in subsequent processing
-- for example, "bzip2 -dc rec*file.bz2 > recovered_data" --
lists the files in the "right" order.

.I bzip2recover
should be of most use dealing with large .bz2 files, as
these will contain many blocks. It is clearly futile to
use it on damaged single-block files, since a damaged
block cannot be recovered. If you wish to minimise
any potential data loss through media or transmission
errors, you might consider compressing with a smaller
block size.

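The claim about wildcard ordering is easy to check without a damaged archive. This shell sketch (the files are empty stand-ins, named the way bzip2recover names its output) shows that the zero-padded names expand in the intended order:

```shell
# Create dummy files mimicking bzip2recover output, deliberately out of order.
mkdir -p recdemo && cd recdemo
touch rec0002file.bz2 rec0010file.bz2 rec0001file.bz2
# Shell globs sort lexicographically, so zero-padding gives numeric order:
printf '%s\n' rec*file.bz2
```

Because the numbers are zero-padded to a fixed width, lexicographic glob expansion coincides with the numeric block order, which is what makes "bzip2 -dc rec*file.bz2" reassemble the data correctly.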
.SH PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings
in the file. Because of this, files containing very long
runs of repeated symbols, like "aabaabaabaab ..." (repeated
several hundred times) may compress extraordinarily slowly.
You can use the
\-vvvvv
option to monitor progress in great detail, if you want.
Decompression speed is unaffected.

Such pathological cases
seem rare in practice, appearing mostly in artificially-constructed
test files, and in low-level disk images. It may be inadvisable to
use
.I bzip2
to compress the latter.
If you do get a file which causes severe slowness in compression,
try making the block size as small as possible, with flag \-1.

Incompressible or virtually-incompressible data may decompress
rather more slowly than one would hope. This is due to
a naive implementation of the move-to-front coder.

.I bzip2
usually allocates several megabytes of memory to operate in,
and then charges all over it in a fairly random fashion. This
means that performance, both for compressing and decompressing,
is largely determined by the speed
at which your machine can service cache misses.
Because of this, small changes
to the code to reduce the miss rate have been observed to give
disproportionately large performance improvements.
I imagine
.I bzip2
will perform best on machines with very large caches.

Test mode (\-t) uses the low-memory decompression algorithm
(\-s). This means test mode does not run as fast as it could;
it could run as fast as the normal decompression machinery.
This could easily be fixed at the cost of some code bloat.

.SH CAVEATS
I/O error messages are not as helpful as they could be.
.I Bzip2
tries hard to detect I/O errors and exit cleanly, but the
details of what the problem is sometimes seem rather misleading.

This manual page pertains to version 0.1 of
.I bzip2.
It may well happen that some future version will
use a different compressed file format. If you try to
decompress, using 0.1, a .bz2 file created with some
future version which uses a different compressed file format,
0.1 will complain that your file "is not a bzip2 file".
If that happens, you should obtain a more recent version
of
.I bzip2
and use that to decompress the file.

Wildcard expansion for Windows 95 and NT
is flaky.

.I bzip2recover
uses 32-bit integers to represent bit positions in
compressed files, so it cannot handle compressed files
more than 512 megabytes long. This could easily be fixed.

.I bzip2recover
sometimes reports a very small, incomplete final block.
This is spurious and can be safely ignored.

.SH RELATIONSHIP TO bzip-0.21
This program is a descendant of the
.I bzip
program, version 0.21, which I released in August 1996.
The primary difference of
.I bzip2
is its avoidance of the possibly patented algorithms
which were used in 0.21.
.I bzip2
also brings various useful refinements (\-s, \-t),
uses less memory, decompresses significantly faster, and
has support for recovering data from damaged files.

Because
.I bzip2
uses Huffman coding to construct the compressed bitstream,
rather than the arithmetic coding used in 0.21,
the compressed representations generated by the two programs
are incompatible, and they will not interoperate. The change
in suffix from .bz to .bz2 reflects this. It would have been
helpful to at least allow
.I bzip2
to decompress files created by 0.21, but this would
defeat the primary aim of having a patent-free compressor.

Huffman coding necessarily involves some coding inefficiency
compared to arithmetic coding. This means that
.I bzip2
compresses about 1% worse than 0.21, an unfortunate but
unavoidable fact of life. On the other hand, decompression
is approximately 50% faster for the same reason, and the
change in file format gave an opportunity to add data-recovery
features. So it is not all bad.

.SH AUTHOR
Julian Seward, jseward@acm.org.

The ideas embodied in
.I bzip
and
.I bzip2
are due to (at least) the following people:
Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder),
Peter Fenwick (for the structured coding model in 0.21,
and many refinements),
and
Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic
coder in 0.21). I am much indebted for their help, support and advice.
See the file ALGORITHMS in the source distribution for pointers to
sources of documentation.
Christian von Roques encouraged me to look for faster
sorting algorithms, so as to speed up compression.
Bela Lubkin encouraged me to improve the worst-case
compression performance.
Many people sent patches, helped with portability problems,
lent machines, gave advice and were generally helpful.