author | Julian Seward <jseward@acm.org> | 2005-02-15 22:13:13 +0100 |
---|---|---|
committer | Julian Seward <jseward@acm.org> | 2005-02-15 22:13:13 +0100 |
commit | 4d540bfc95a4b0eefc1d1f388ec33534aaeb3a2f (patch) | |
tree | 3b7e9c650b4c61d114e1716c4698e40d5c8d7ef7 /manual.xml | |
parent | 099d844292f60f9d58914da29e5773204dc55e7a (diff) | |
bzip2-1.0.3
Diffstat (limited to 'manual.xml')
-rw-r--r-- | manual.xml | 2966 |
1 files changed, 2966 insertions, 0 deletions
diff --git a/manual.xml b/manual.xml
new file mode 100644
index 0000000..1ab5bd7
--- /dev/null
+++ b/manual.xml
@@ -0,0 +1,2966 @@
1 | <?xml version="1.0"?> <!-- -*- sgml -*- --> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"[ | ||
4 | |||
5 | <!-- various strings, dates etc. common to all docs --> | ||
6 | <!ENTITY % common-ents SYSTEM "entities.xml"> %common-ents; | ||
7 | ]> | ||
8 | |||
9 | <book lang="en" id="userman" xreflabel="bzip2 Manual"> | ||
10 | |||
11 | <bookinfo> | ||
12 | <title>bzip2 and libbzip2, version 1.0.3</title> | ||
13 | <subtitle>A program and library for data compression</subtitle> | ||
14 | <copyright> | ||
15 | <year>&bz-lifespan;</year> | ||
16 | <holder>Julian Seward</holder> | ||
17 | </copyright> | ||
18 | <releaseinfo>Version &bz-version; of &bz-date;</releaseinfo> | ||
19 | |||
20 | <authorgroup> | ||
21 | <author> | ||
22 | <firstname>Julian</firstname> | ||
23 | <surname>Seward</surname> | ||
24 | <affiliation> | ||
25 | <orgname>&bz-url;</orgname> | ||
26 | </affiliation> | ||
27 | </author> | ||
28 | </authorgroup> | ||
29 | |||
30 | <legalnotice> | ||
31 | |||
32 | <para>This program, <computeroutput>bzip2</computeroutput>, the | ||
33 | associated library <computeroutput>libbzip2</computeroutput>, and | ||
34 | all documentation, are copyright © &bz-lifespan; Julian Seward. | ||
35 | All rights reserved.</para> | ||
36 | |||
37 | <para>Redistribution and use in source and binary forms, with | ||
38 | or without modification, are permitted provided that the | ||
39 | following conditions are met:</para> | ||
40 | |||
41 | <itemizedlist mark='bullet'> | ||
42 | |||
43 | <listitem><para>Redistributions of source code must retain the | ||
44 | above copyright notice, this list of conditions and the | ||
45 | following disclaimer.</para></listitem> | ||
46 | |||
47 | <listitem><para>The origin of this software must not be | ||
48 | misrepresented; you must not claim that you wrote the original | ||
49 | software. If you use this software in a product, an | ||
50 | acknowledgment in the product documentation would be | ||
51 | appreciated but is not required.</para></listitem> | ||
52 | |||
53 | <listitem><para>Altered source versions must be plainly marked | ||
54 | as such, and must not be misrepresented as being the original | ||
55 | software.</para></listitem> | ||
56 | |||
57 | <listitem><para>The name of the author may not be used to | ||
58 | endorse or promote products derived from this software without | ||
59 | specific prior written permission.</para></listitem> | ||
60 | |||
61 | </itemizedlist> | ||
62 | |||
63 | <para>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY | ||
64 | EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, | ||
65 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A | ||
66 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE | ||
67 | AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, | ||
68 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED | ||
69 | TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | ||
70 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND | ||
71 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | ||
72 | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING | ||
73 | IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF | ||
74 | THE POSSIBILITY OF SUCH DAMAGE.</para> | ||
75 | |||
76 | <para>PATENTS: To the best of my knowledge, | ||
77 | <computeroutput>bzip2</computeroutput> and | ||
78 | <computeroutput>libbzip2</computeroutput> do not use any patented | ||
79 | algorithms. However, I do not have the resources to carry | ||
80 | out a patent search. Therefore I cannot give any guarantee of | ||
81 | the above statement. | ||
82 | </para> | ||
83 | |||
84 | </legalnotice> | ||
85 | |||
86 | </bookinfo> | ||
87 | |||
88 | |||
89 | |||
90 | <chapter id="intro" xreflabel="Introduction"> | ||
91 | <title>Introduction</title> | ||
92 | |||
93 | <para><computeroutput>bzip2</computeroutput> compresses files | ||
94 | using the Burrows-Wheeler block-sorting text compression | ||
95 | algorithm, and Huffman coding. Compression is generally | ||
96 | considerably better than that achieved by more conventional | ||
97 | LZ77/LZ78-based compressors, and approaches the performance of | ||
98 | the PPM family of statistical compressors.</para> | ||
99 | |||
100 | <para><computeroutput>bzip2</computeroutput> is built on top of | ||
101 | <computeroutput>libbzip2</computeroutput>, a flexible library for | ||
102 | handling compressed data in the | ||
103 | <computeroutput>bzip2</computeroutput> format. This manual | ||
104 | describes both how to use the program and how to work with the | ||
105 | library interface. Most of the manual is devoted to this | ||
106 | library, not the program, which is good news if your interest is | ||
107 | only in the program.</para> | ||
108 | |||
109 | <itemizedlist mark='bullet'> | ||
110 | |||
111 | <listitem><para><xref linkend="using"/> describes how to use | ||
112 | <computeroutput>bzip2</computeroutput>; this is the only part | ||
113 | you need to read if you just want to know how to operate the | ||
114 | program.</para></listitem> | ||
115 | |||
116 | <listitem><para><xref linkend="libprog"/> describes the | ||
117 | programming interfaces in detail, and</para></listitem> | ||
118 | |||
119 | <listitem><para><xref linkend="misc"/> records some | ||
120 | miscellaneous notes which I thought ought to be recorded | ||
121 | somewhere.</para></listitem> | ||
122 | |||
123 | </itemizedlist> | ||
124 | |||
125 | </chapter> | ||
126 | |||
127 | |||
128 | <chapter id="using" xreflabel="How to use bzip2"> | ||
129 | <title>How to use bzip2</title> | ||
130 | |||
131 | <para>This chapter contains a copy of the | ||
132 | <computeroutput>bzip2</computeroutput> man page, and nothing | ||
133 | else.</para> | ||
134 | |||
135 | <sect1 id="name" xreflabel="NAME"> | ||
136 | <title>NAME</title> | ||
137 | |||
138 | <itemizedlist mark='bullet'> | ||
139 | |||
140 | <listitem><para><computeroutput>bzip2</computeroutput>, | ||
141 | <computeroutput>bunzip2</computeroutput> - a block-sorting file | ||
142 | compressor, v1.0.3</para></listitem> | ||
143 | |||
144 | <listitem><para><computeroutput>bzcat</computeroutput> - | ||
145 | decompresses files to stdout</para></listitem> | ||
146 | |||
147 | <listitem><para><computeroutput>bzip2recover</computeroutput> - | ||
148 | recovers data from damaged bzip2 files</para></listitem> | ||
149 | |||
150 | </itemizedlist> | ||
151 | |||
152 | </sect1> | ||
153 | |||
154 | |||
155 | <sect1 id="synopsis" xreflabel="SYNOPSIS"> | ||
156 | <title>SYNOPSIS</title> | ||
157 | |||
158 | <itemizedlist mark='bullet'> | ||
159 | |||
160 | <listitem><para><computeroutput>bzip2</computeroutput> [ | ||
161 | -cdfkqstvzVL123456789 ] [ filenames ... ]</para></listitem> | ||
162 | |||
163 | <listitem><para><computeroutput>bunzip2</computeroutput> [ | ||
164 | -fkvsVL ] [ filenames ... ]</para></listitem> | ||
165 | |||
166 | <listitem><para><computeroutput>bzcat</computeroutput> [ -s ] [ | ||
167 | filenames ... ]</para></listitem> | ||
168 | |||
169 | <listitem><para><computeroutput>bzip2recover</computeroutput> | ||
170 | filename</para></listitem> | ||
171 | |||
172 | </itemizedlist> | ||
173 | |||
174 | </sect1> | ||
175 | |||
176 | |||
177 | <sect1 id="description" xreflabel="DESCRIPTION"> | ||
178 | <title>DESCRIPTION</title> | ||
179 | |||
180 | <para><computeroutput>bzip2</computeroutput> compresses files | ||
181 | using the Burrows-Wheeler block-sorting text compression | ||
182 | algorithm, and Huffman coding. Compression is generally | ||
183 | considerably better than that achieved by more conventional | ||
184 | LZ77/LZ78-based compressors, and approaches the performance of | ||
185 | the PPM family of statistical compressors.</para> | ||
186 | |||
187 | <para>The command-line options are deliberately very similar to | ||
188 | those of GNU <computeroutput>gzip</computeroutput>, but they are | ||
189 | not identical.</para> | ||
190 | |||
191 | <para><computeroutput>bzip2</computeroutput> expects a list of | ||
192 | file names to accompany the command-line flags. Each file is | ||
193 | replaced by a compressed version of itself, with the name | ||
194 | <computeroutput>original_name.bz2</computeroutput>. Each | ||
195 | compressed file has the same modification date, permissions, and, | ||
196 | when possible, ownership as the corresponding original, so that | ||
197 | these properties can be correctly restored at decompression time. | ||
198 | File name handling is naive in the sense that there is no | ||
199 | mechanism for preserving original file names, permissions, | ||
200 | ownerships or dates in filesystems which lack these concepts, or | ||
201 | have serious file name length restrictions, such as | ||
202 | MS-DOS.</para> | ||
203 | |||
204 | <para><computeroutput>bzip2</computeroutput> and | ||
205 | <computeroutput>bunzip2</computeroutput> will by default not | ||
206 | overwrite existing files. If you want this to happen, specify | ||
207 | the <computeroutput>-f</computeroutput> flag.</para> | ||
208 | |||
209 | <para>If no file names are specified, | ||
210 | <computeroutput>bzip2</computeroutput> compresses from standard | ||
211 | input to standard output. In this case, | ||
212 | <computeroutput>bzip2</computeroutput> will decline to write | ||
213 | compressed output to a terminal, as this would be entirely | ||
214 | incomprehensible and therefore pointless.</para> | ||
215 | |||
216 | <para><computeroutput>bunzip2</computeroutput> (or | ||
217 | <computeroutput>bzip2 -d</computeroutput>) decompresses all | ||
218 | specified files. Files which were not created by | ||
219 | <computeroutput>bzip2</computeroutput> will be detected and | ||
220 | ignored, and a warning issued. | ||
221 | <computeroutput>bzip2</computeroutput> attempts to guess the | ||
222 | filename for the decompressed file from that of the compressed | ||
223 | file as follows:</para> | ||
224 | |||
225 | <itemizedlist mark='bullet'> | ||
226 | |||
227 | <listitem><para><computeroutput>filename.bz2 </computeroutput> | ||
228 | becomes | ||
229 | <computeroutput>filename</computeroutput></para></listitem> | ||
230 | |||
231 | <listitem><para><computeroutput>filename.bz </computeroutput> | ||
232 | becomes | ||
233 | <computeroutput>filename</computeroutput></para></listitem> | ||
234 | |||
235 | <listitem><para><computeroutput>filename.tbz2</computeroutput> | ||
236 | becomes | ||
237 | <computeroutput>filename.tar</computeroutput></para></listitem> | ||
238 | |||
239 | <listitem><para><computeroutput>filename.tbz </computeroutput> | ||
240 | becomes | ||
241 | <computeroutput>filename.tar</computeroutput></para></listitem> | ||
242 | |||
243 | <listitem><para><computeroutput>anyothername </computeroutput> | ||
244 | becomes | ||
245 | <computeroutput>anyothername.out</computeroutput></para></listitem> | ||
246 | |||
247 | </itemizedlist> | ||
248 | |||
249 | <para>If the file does not end in one of the recognised endings, | ||
250 | <computeroutput>.bz2</computeroutput>, | ||
251 | <computeroutput>.bz</computeroutput>, | ||
252 | <computeroutput>.tbz2</computeroutput> or | ||
253 | <computeroutput>.tbz</computeroutput>, | ||
254 | <computeroutput>bzip2</computeroutput> complains that it cannot | ||
255 | guess the name of the original file, and uses the original name | ||
256 | with <computeroutput>.out</computeroutput> appended.</para> | ||
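<para>The naming rules above can be sketched in a few lines of
Python. This is only an illustration of the rules as listed, not
the code <computeroutput>bzip2</computeroutput> itself uses;
<computeroutput>SUFFIX_RULES</computeroutput> and
<computeroutput>guess_output_name</computeroutput> are
hypothetical names.</para>

```python
# Suffix rules as listed in the manual: (compressed suffix, replacement).
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(compressed_name):
    """Return the decompressed file name implied by the rules above."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            return compressed_name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the whole name and append ".out".
    return compressed_name + ".out"


for name in ["notes.bz2", "notes.bz", "backup.tbz2", "backup.tbz", "data.gz"]:
    print(name, "->", guess_output_name(name))
```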
257 | |||
258 | <para>As with compression, supplying no filenames causes | ||
259 | decompression from standard input to standard output.</para> | ||
260 | |||
261 | <para><computeroutput>bunzip2</computeroutput> will correctly | ||
262 | decompress a file which is the concatenation of two or more | ||
263 | compressed files. The result is the concatenation of the | ||
264 | corresponding uncompressed files. Integrity testing | ||
265 | (<computeroutput>-t</computeroutput>) of concatenated compressed | ||
266 | files is also supported.</para> | ||
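<para>The concatenation behaviour can be tried out with Python's
standard <computeroutput>bz2</computeroutput> module, which wraps
<computeroutput>libbzip2</computeroutput>. This is a minimal
sketch that assumes a recent Python 3; it exercises the library
rather than the <computeroutput>bunzip2</computeroutput> program
itself.</para>

```python
import bz2

first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")

# Joining two complete compressed streams gives a multi-stream file.
combined = first + second

# Decompressing the joined data yields the joined originals.
restored = bz2.decompress(combined)
print(restored)
```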
267 | |||
268 | <para>You can also compress or decompress files to the standard | ||
269 | output by giving the <computeroutput>-c</computeroutput> flag. | ||
270 | Multiple files may be compressed and decompressed like this. The | ||
271 | resulting outputs are fed sequentially to stdout. Compression of | ||
272 | multiple files in this manner generates a stream containing | ||
273 | multiple compressed file representations. Such a stream can be | ||
274 | decompressed correctly only by | ||
275 | <computeroutput>bzip2</computeroutput> version 0.9.0 or later. | ||
276 | Earlier versions of <computeroutput>bzip2</computeroutput> will | ||
277 | stop after decompressing the first file in the stream.</para> | ||
278 | |||
279 | <para><computeroutput>bzcat</computeroutput> (or | ||
280 | <computeroutput>bzip2 -dc</computeroutput>) decompresses all | ||
281 | specified files to the standard output.</para> | ||
282 | |||
283 | <para><computeroutput>bzip2</computeroutput> will read arguments | ||
284 | from the environment variables | ||
285 | <computeroutput>BZIP2</computeroutput> and | ||
286 | <computeroutput>BZIP</computeroutput>, in that order, and will | ||
287 | process them before any arguments read from the command line. | ||
288 | This gives a convenient way to supply default arguments.</para> | ||
289 | |||
290 | <para>Compression is always performed, even if the compressed | ||
291 | file is slightly larger than the original. Files of less than | ||
292 | about one hundred bytes tend to get larger, since the compression | ||
293 | mechanism has a constant overhead in the region of 50 bytes. | ||
294 | Random data (including the output of most file compressors) is | ||
295 | coded at about 8.05 bits per byte, giving an expansion of around | ||
296 | 0.5%.</para> | ||
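<para>The two cases described above, very small inputs and random
data, can be observed with Python's standard
<computeroutput>bz2</computeroutput> module. This is an
illustrative sketch only: the exact sizes depend on the input,
and <computeroutput>os.urandom</computeroutput> is used here as a
stand-in for already-compressed data.</para>

```python
import bz2
import os

tiny = b"hello"                      # far below one hundred bytes
random_data = os.urandom(200_000)    # stands in for already-compressed data

for label, data in [("tiny", tiny), ("random", random_data)]:
    packed = bz2.compress(data)
    print(label, len(data), "->", len(packed), "bytes")
```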
297 | |||
298 | <para>As a self-check for your protection, | ||
299 | <computeroutput>bzip2</computeroutput> uses 32-bit CRCs to make | ||
300 | sure that the decompressed version of a file is identical to the | ||
301 | original. This guards against corruption of the compressed data, | ||
302 | and against undetected bugs in | ||
303 | <computeroutput>bzip2</computeroutput> (hopefully very unlikely). | ||
304 | The chance of data corruption going undetected is microscopic, | ||
305 | about one chance in four billion for each file processed. Be | ||
306 | aware, though, that the check occurs upon decompression, so it | ||
307 | can only tell you that something is wrong. It can't help you | ||
308 | recover the original uncompressed data. You can use | ||
309 | <computeroutput>bzip2recover</computeroutput> to try to recover | ||
310 | data from damaged files.</para> | ||
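<para>The CRC check can be seen in action with Python's standard
<computeroutput>bz2</computeroutput> module. The sketch below
damages one byte of a compressed stream and shows that
decompression is refused. It assumes that bytes 10 to 13 of the
stream hold the stored CRC of the first block, which is a detail
of the file format and is not stated in this manual.</para>

```python
import bz2

original = b"The quick brown fox jumps over the lazy dog.\n" * 200
packed = bytearray(bz2.compress(original))

# Assumption: bytes 10-13 of the stream hold the first block's stored CRC
# (after the 4-byte "BZh" header and the 6-byte block marker).
packed[10] ^= 0xFF

try:
    bz2.decompress(bytes(packed))
    corruption_detected = False
except OSError as error:
    # The library reports the mismatch; it cannot repair the data.
    corruption_detected = True
    print("decompression refused:", error)
```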
311 | |||
312 | <para>Return values: 0 for a normal exit, 1 for environmental | ||
313 | problems (file not found, invalid flags, I/O errors, etc.), 2 | ||
314 | to indicate a corrupt compressed file, 3 for an internal | ||
315 | consistency error (e.g., a bug) which caused | ||
316 | <computeroutput>bzip2</computeroutput> to panic.</para> | ||
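<para>A script that runs <computeroutput>bzip2</computeroutput>
might translate these return values into messages along the
following lines. The names
<computeroutput>EXIT_STATUS_MEANING</computeroutput> and
<computeroutput>describe_exit_status</computeroutput> are
hypothetical; only the four values listed above come from this
manual.</para>

```python
# Exit statuses as listed in the manual.
EXIT_STATUS_MEANING = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error, ...)",
    2: "corrupt compressed file",
    3: "internal consistency error (bzip2 panicked)",
}


def describe_exit_status(status):
    """Translate a bzip2 exit status into a short description."""
    return EXIT_STATUS_MEANING.get(status, "status not documented in the manual")


print(describe_exit_status(2))
```

<para>In practice the status would typically come from the
<computeroutput>returncode</computeroutput> attribute of the
result of <computeroutput>subprocess.run</computeroutput>.</para>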
317 | |||
318 | </sect1> | ||
319 | |||
320 | |||
321 | <sect1 id="options" xreflabel="OPTIONS"> | ||
322 | <title>OPTIONS</title> | ||
323 | |||
324 | <variablelist> | ||
325 | |||
326 | <varlistentry> | ||
327 | <term><computeroutput>-c --stdout</computeroutput></term> | ||
328 | <listitem><para>Compress or decompress to standard | ||
329 | output.</para></listitem> | ||
330 | </varlistentry> | ||
331 | |||
332 | <varlistentry> | ||
333 | <term><computeroutput>-d --decompress</computeroutput></term> | ||
334 | <listitem><para>Force decompression. | ||
335 | <computeroutput>bzip2</computeroutput>, | ||
336 | <computeroutput>bunzip2</computeroutput> and | ||
337 | <computeroutput>bzcat</computeroutput> are really the same | ||
338 | program, and the decision about what actions to take is done on | ||
339 | the basis of which name is used. This flag overrides that | ||
340 | mechanism, and forces bzip2 to decompress.</para></listitem> | ||
341 | </varlistentry> | ||
342 | |||
343 | <varlistentry> | ||
344 | <term><computeroutput>-z --compress</computeroutput></term> | ||
345 | <listitem><para>The complement to | ||
346 | <computeroutput>-d</computeroutput>: forces compression, | ||
347 | regardless of the invocation name.</para></listitem> | ||
348 | </varlistentry> | ||
349 | |||
350 | <varlistentry> | ||
351 | <term><computeroutput>-t --test</computeroutput></term> | ||
352 | <listitem><para>Check integrity of the specified file(s), but | ||
353 | don't decompress them. This really performs a trial | ||
354 | decompression and throws away the result.</para></listitem> | ||
355 | </varlistentry> | ||
356 | |||
357 | <varlistentry> | ||
358 | <term><computeroutput>-f --force</computeroutput></term> | ||
359 | <listitem><para>Force overwrite of output files. Normally, | ||
360 | <computeroutput>bzip2</computeroutput> will not overwrite | ||
361 | existing output files. Also forces | ||
362 | <computeroutput>bzip2</computeroutput> to break hard links to | ||
363 | files, which it otherwise wouldn't do.</para> | ||
364 | <para><computeroutput>bzip2</computeroutput> normally declines | ||
365 | to decompress files which don't have the correct magic header | ||
366 | bytes. If forced (<computeroutput>-f</computeroutput>), | ||
367 | however, it will pass such files through unmodified. This is | ||
368 | how GNU <computeroutput>gzip</computeroutput> behaves.</para> | ||
369 | </listitem> | ||
370 | </varlistentry> | ||
371 | |||
372 | <varlistentry> | ||
373 | <term><computeroutput>-k --keep</computeroutput></term> | ||
374 | <listitem><para>Keep (don't delete) input files during | ||
375 | compression or decompression.</para></listitem> | ||
376 | </varlistentry> | ||
377 | |||
378 | <varlistentry> | ||
379 | <term><computeroutput>-s --small</computeroutput></term> | ||
380 | <listitem><para>Reduce memory usage, for compression, | ||
381 | decompression and testing. Files are decompressed and tested | ||
382 | using a modified algorithm which only requires 2.5 bytes per | ||
383 | block byte. This means any file can be decompressed in 2300k | ||
384 | of memory, albeit at about half the normal speed.</para> | ||
385 | <para>During compression, <computeroutput>-s</computeroutput> | ||
386 | selects a block size of 200k, which limits memory use to around | ||
387 | the same figure, at the expense of your compression ratio. In | ||
388 | short, if your machine is low on memory (8 megabytes or less), | ||
389 | use <computeroutput>-s</computeroutput> for everything. See | ||
390 | <xref linkend="memory-management"/> below.</para></listitem> | ||
391 | </varlistentry> | ||
392 | |||
393 | <varlistentry> | ||
394 | <term><computeroutput>-q --quiet</computeroutput></term> | ||
395 | <listitem><para>Suppress non-essential warning messages. | ||
396 | Messages pertaining to I/O errors and other critical events | ||
397 | will not be suppressed.</para></listitem> | ||
398 | </varlistentry> | ||
399 | |||
400 | <varlistentry> | ||
401 | <term><computeroutput>-v --verbose</computeroutput></term> | ||
402 | <listitem><para>Verbose mode -- show the compression ratio for | ||
403 | each file processed. Further | ||
404 | <computeroutput>-v</computeroutput>'s increase the verbosity | ||
405 | level, spewing out lots of information which is primarily of | ||
406 | interest for diagnostic purposes.</para></listitem> | ||
407 | </varlistentry> | ||
408 | |||
409 | <varlistentry> | ||
410 | <term><computeroutput>-L --license -V --version</computeroutput></term> | ||
411 | <listitem><para>Display the software version, license terms and | ||
412 | conditions.</para></listitem> | ||
413 | </varlistentry> | ||
414 | |||
415 | <varlistentry> | ||
416 | <term><computeroutput>-1</computeroutput> (or | ||
417 | <computeroutput>--fast</computeroutput>) to | ||
418 | <computeroutput>-9</computeroutput> (or | ||
419 | <computeroutput>--best</computeroutput>)</term> | ||
420 | <listitem><para>Set the block size to 100 k, 200 k ... 900 k | ||
421 | when compressing. Has no effect when decompressing. See <xref | ||
422 | linkend="memory-management" /> below. The | ||
423 | <computeroutput>--fast</computeroutput> and | ||
424 | <computeroutput>--best</computeroutput> aliases are primarily | ||
425 | for GNU <computeroutput>gzip</computeroutput> compatibility. | ||
426 | In particular, <computeroutput>--fast</computeroutput> doesn't | ||
427 | make things significantly faster. And | ||
428 | <computeroutput>--best</computeroutput> merely selects the | ||
429 | default behaviour.</para></listitem> | ||
430 | </varlistentry> | ||
431 | |||
432 | <varlistentry> | ||
433 | <term><computeroutput>--</computeroutput></term> | ||
434 | <listitem><para>Treats all subsequent arguments as file names, | ||
435 | even if they start with a dash. This is so you can handle | ||
436 | files with names beginning with a dash, for example: | ||
437 | <computeroutput>bzip2 -- | ||
438 | -myfilename</computeroutput>.</para></listitem> | ||
439 | </varlistentry> | ||
440 | |||
441 | <varlistentry> | ||
442 | <term><computeroutput>--repetitive-fast</computeroutput></term> | ||
443 | <term><computeroutput>--repetitive-best</computeroutput></term> | ||
444 | <listitem><para>These flags are redundant in versions 0.9.5 and | ||
445 | above. They provided some coarse control over the behaviour of | ||
446 | the sorting algorithm in earlier versions, which was sometimes | ||
447 | useful. 0.9.5 and above have an improved algorithm which | ||
448 | renders these flags irrelevant.</para></listitem> | ||
449 | </varlistentry> | ||
450 | |||
451 | </variablelist> | ||
452 | |||
453 | </sect1> | ||
454 | |||
455 | |||
456 | <sect1 id="memory-management" xreflabel="MEMORY MANAGEMENT"> | ||
457 | <title>MEMORY MANAGEMENT</title> | ||
458 | |||
459 | <para><computeroutput>bzip2</computeroutput> compresses large | ||
460 | files in blocks. The block size affects both the compression | ||
461 | ratio achieved, and the amount of memory needed for compression | ||
462 | and decompression. The flags <computeroutput>-1</computeroutput> | ||
463 | through <computeroutput>-9</computeroutput> specify the block | ||
464 | size to be 100,000 bytes through 900,000 bytes (the default) | ||
465 | respectively. At decompression time, the block size used for | ||
466 | compression is read from the header of the compressed file, and | ||
467 | <computeroutput>bunzip2</computeroutput> then allocates itself | ||
468 | just enough memory to decompress the file. Since block sizes are | ||
469 | stored in compressed files, it follows that the flags | ||
470 | <computeroutput>-1</computeroutput> to | ||
471 | <computeroutput>-9</computeroutput> are irrelevant to and so | ||
472 | ignored during decompression.</para> | ||
473 | |||
474 | <para>Compression and decompression requirements, in bytes, can be | ||
475 | estimated as:</para> | ||
476 | <programlisting> | ||
477 | Compression:   400k + ( 8 x block size ) | ||
478 | |||
479 | Decompression: 100k + ( 4 x block size ), or | ||
480 |                100k + ( 2.5 x block size ) | ||
481 | </programlisting> | ||
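<para>The estimates above are easy to evaluate for a given flag.
The following Python sketch does so, assuming that "k" means
1000 bytes (consistent with block sizes of 100,000 to 900,000
bytes); <computeroutput>memory_estimates</computeroutput> is a
hypothetical helper, not part of
<computeroutput>bzip2</computeroutput>.</para>

```python
K = 1000  # the manual's "k"; treated here as 1000 bytes (an assumption)


def memory_estimates(level):
    """Estimated memory use in bytes for flag -1 .. -9 (level 1 .. 9)."""
    block_size = level * 100 * K
    return {
        "compress": 400 * K + 8 * block_size,
        "decompress": 100 * K + 4 * block_size,
        "decompress_small": 100 * K + (5 * block_size) // 2,  # the -s variant
    }


for level in (1, 9):
    print(level, memory_estimates(level))
```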
482 | |||
483 | <para>Larger block sizes give rapidly diminishing marginal | ||
484 | returns. Most of the compression comes from the first two or | ||
485 | three hundred k of block size, a fact worth bearing in mind when | ||
486 | using <computeroutput>bzip2</computeroutput> on small machines. | ||
487 | It is also important to appreciate that the decompression memory | ||
488 | requirement is set at compression time by the choice of block | ||
489 | size.</para> | ||
490 | |||
491 | <para>For files compressed with the default 900k block size, | ||
492 | <computeroutput>bunzip2</computeroutput> will require about 3700 | ||
493 | kbytes to decompress. To support decompression of any file on a | ||
494 | 4 megabyte machine, <computeroutput>bunzip2</computeroutput> has | ||
495 | an option to decompress using approximately half this amount of | ||
496 | memory, about 2300 kbytes. Decompression speed is also halved, | ||
497 | so you should use this option only where necessary. The relevant | ||
498 | flag is <computeroutput>-s</computeroutput>.</para> | ||
499 | |||
500 | <para>In general, try to use the largest block size memory | ||
501 | constraints allow, since that maximises the compression achieved. | ||
502 | Compression and decompression speed are virtually unaffected by | ||
503 | block size.</para> | ||
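<para>The effect of block size on compression can be demonstrated
with Python's standard <computeroutput>bz2</computeroutput>
module, whose <computeroutput>compresslevel</computeroutput>
argument corresponds to the flags
<computeroutput>-1</computeroutput> to
<computeroutput>-9</computeroutput>. The input below is
deliberately artificial: it repeats itself at a distance larger
than the smallest block, so it exaggerates the difference and
should not be read as a benchmark.</para>

```python
import bz2
import random

# 150,000 pseudo-random bytes, then the same bytes again: the repetition is
# only visible to a compressor whose block covers both copies.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(150_000))
data = chunk + chunk

small_blocks = bz2.compress(data, compresslevel=1)  # like bzip2 -1
large_blocks = bz2.compress(data, compresslevel=9)  # like bzip2 -9

print("level 1:", len(small_blocks), "bytes")
print("level 9:", len(large_blocks), "bytes")
```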
504 | |||
505 | <para>Another significant point applies to files which fit in a | ||
506 | single block -- that means most files you'd encounter using a | ||
507 | large block size. The amount of real memory touched is | ||
508 | proportional to the size of the file, since the file is smaller | ||
509 | than a block. For example, compressing a file 20,000 bytes long | ||
510 | with the flag <computeroutput>-9</computeroutput> will cause the | ||
511 | compressor to allocate around 7600k of memory, but only touch | ||
512 | 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor | ||
513 | will allocate 3700k but only touch 100k + 20000 * 4 = 180 | ||
514 | kbytes.</para> | ||
515 | |||
516 | <para>Here is a table which summarises the maximum memory usage | ||
517 | for different block sizes. Also recorded is the total compressed | ||
518 | size for 14 files of the Calgary Text Compression Corpus | ||
519 | totalling 3,141,622 bytes. This column gives some feel for how | ||
520 | compression varies with block size. These figures tend to | ||
521 | understate the advantage of larger block sizes for larger files, | ||
522 | since the Corpus is dominated by smaller files.</para> | ||
523 | |||
524 | <programlisting> | ||
525 |         Compress   Decompress   Decompress   Corpus | ||
526 | Flag     usage      usage       -s usage     Size | ||
527 | |||
528 |  -1      1200k       500k         350k      914704 | ||
529 |  -2      2000k       900k         600k      877703 | ||
530 |  -3      2800k      1300k         850k      860338 | ||
531 |  -4      3600k      1700k        1100k      846899 | ||
532 |  -5      4400k      2100k        1350k      845160 | ||
533 |  -6      5200k      2500k        1600k      838626 | ||
534 |  -7      6000k      2900k        1850k      834096 | ||
535 |  -8      6800k      3300k        2100k      828642 | ||
536 |  -9      7600k      3700k        2350k      828642 | ||
537 | </programlisting> | ||
538 | |||
539 | </sect1> | ||
540 | |||
541 | |||
542 | <sect1 id="recovering" xreflabel="RECOVERING DATA FROM DAMAGED FILES"> | ||
543 | <title>RECOVERING DATA FROM DAMAGED FILES</title> | ||
544 | |||
545 | <para><computeroutput>bzip2</computeroutput> compresses files in | ||
546 | blocks, usually 900 kbytes long. Each block is handled | ||
547 | independently. If a media or transmission error causes a | ||
548 | multi-block <computeroutput>.bz2</computeroutput> file to become | ||
549 | damaged, it may be possible to recover data from the undamaged | ||
550 | blocks in the file.</para> | ||
551 | |||
552 | <para>The compressed representation of each block is delimited by | ||
553 | a 48-bit pattern, which makes it possible to find the block | ||
554 | boundaries with reasonable certainty. Each block also carries | ||
555 | its own 32-bit CRC, so damaged blocks can be distinguished from | ||
556 | undamaged ones.</para> | ||
557 | |||
558 | <para><computeroutput>bzip2recover</computeroutput> is a simple | ||
559 | program whose purpose is to search for blocks in | ||
560 | <computeroutput>.bz2</computeroutput> files, and write each block | ||
561 | out into its own <computeroutput>.bz2</computeroutput> file. You | ||
562 | can then use <computeroutput>bzip2 -t</computeroutput> to test | ||
563 | the integrity of the resulting files, and decompress those which | ||
564 | are undamaged.</para> | ||
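<para>The block search can be sketched in Python as follows. This
is not the <computeroutput>bzip2recover</computeroutput>
implementation; it only locates block markers and does not write
out any files. It assumes that the 48-bit pattern is
<computeroutput>0x314159265359</computeroutput>, which is a
detail of the file format and is not stated in this manual.
Because blocks need not start on a byte boundary, the search is
repeated for each of the eight possible bit alignments.</para>

```python
import bz2
import random

# Assumption (not stated in the manual text): the 48-bit block marker is
# 0x314159265359, the value used by the bzip2 file format.
BLOCK_MAGIC = bytes.fromhex("314159265359")


def find_block_starts(data):
    """Return the bit offsets at which the block marker occurs in data."""
    as_int = int.from_bytes(data, "big")
    starts = []
    for k in range(8):
        # Multiplying by 2**k shifts the bit stream left by k bits, so a
        # marker that began k bits past a byte boundary becomes byte-aligned
        # and can be found with an ordinary byte search.
        shifted = (as_int * 2**k).to_bytes(len(data) + 1, "big")
        index = shifted.find(BLOCK_MAGIC)
        while index != -1:
            start = 8 * index - (8 - k)
            # Ignore matches that overlap the zero padding added by the shift.
            if start >= 0 and 8 * len(data) - start >= 48:
                starts.append(start)
            index = shifted.find(BLOCK_MAGIC, index + 1)
    return sorted(starts)


# Incompressible input with 100k blocks, so the stream holds several blocks.
rng = random.Random(1)
payload = bytes(rng.getrandbits(8) for _ in range(250_000))
stream = bz2.compress(payload, compresslevel=1)
block_starts = find_block_starts(stream)
print(block_starts)
```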
565 | |||
566 | <para><computeroutput>bzip2recover</computeroutput> takes a | ||
567 | single argument, the name of the damaged file, and writes a | ||
568 | number of files <computeroutput>rec0001file.bz2</computeroutput>, | ||
569 | <computeroutput>rec0002file.bz2</computeroutput>, etc., containing | ||
570 | the extracted blocks. The output filenames are designed so that | ||
571 | the use of wildcards in subsequent processing -- for example, | ||
572 | <computeroutput>bzip2 -dc rec*file.bz2 > | ||
573 | recovered_data</computeroutput> -- lists the files in the correct | ||
574 | order.</para> | ||
575 | |||
576 | <para><computeroutput>bzip2recover</computeroutput> should be | ||
577 | most useful when dealing with large <computeroutput>.bz2</computeroutput> | ||
578 | files, as these will contain many blocks. It is clearly futile | ||
579 | to use it on damaged single-block files, since a damaged block | ||
580 | cannot be recovered. If you wish to minimise any potential data | ||
581 | loss through media or transmission errors, you might consider | ||
582 | compressing with a smaller block size.</para> | ||
583 | |||
584 | </sect1> | ||
585 | |||
586 | |||
587 | <sect1 id="performance" xreflabel="PERFORMANCE NOTES"> | ||
588 | <title>PERFORMANCE NOTES</title> | ||
589 | |||
590 | <para>The sorting phase of compression gathers together similar | ||
591 | strings in the file. Because of this, files containing very long | ||
592 | runs of repeated symbols, like "aabaabaabaab ..." (repeated | ||
593 | several hundred times) may compress more slowly than normal. | ||
594 | Versions 0.9.5 and above fare much better than previous versions | ||
595 | in this respect. The ratio between worst-case and average-case | ||
596 | compression time is in the region of 10:1. For previous | ||
597 | versions, this figure was more like 100:1. You can use the | ||
598 | <computeroutput>-vvvv</computeroutput> option to monitor progress | ||
599 | in great detail, if you want.</para> | ||
600 | |||
601 | <para>Decompression speed is unaffected by these | ||
602 | phenomena.</para> | ||
603 | |||
604 | <para><computeroutput>bzip2</computeroutput> usually allocates | ||
605 | several megabytes of memory to operate in, and then charges all | ||
606 | over it in a fairly random fashion. This means that performance, | ||
607 | both for compressing and decompressing, is largely determined by | ||
608 | the speed at which your machine can service cache misses. | ||
609 | Because of this, small changes to the code to reduce the miss | ||
610 | rate have been observed to give disproportionately large | ||
611 | performance improvements. I imagine | ||
612 | <computeroutput>bzip2</computeroutput> will perform best on | ||
613 | machines with very large caches.</para> | ||
614 | |||
615 | </sect1> | ||
616 | |||
617 | |||
618 | |||
619 | <sect1 id="caveats" xreflabel="CAVEATS"> | ||
620 | <title>CAVEATS</title> | ||
621 | |||
622 | <para>I/O error messages are not as helpful as they could be. | ||
623 | <computeroutput>bzip2</computeroutput> tries hard to detect I/O | ||
624 | errors and exit cleanly, but the details of what the problem is | ||
625 | sometimes seem rather misleading.</para> | ||
626 | |||
627 | <para>This manual page pertains to version &bz-version; of | ||
628 | <computeroutput>bzip2</computeroutput>. Compressed data created | ||
629 | by this version is entirely forwards and backwards compatible | ||
630 | with the previous public releases, versions 0.1pl2, 0.9.0, | ||
631 | 0.9.5, 1.0.0, 1.0.1 and 1.0.2, with the following exception: 0.9.0 | ||
632 | and above can correctly decompress multiple concatenated | ||
633 | compressed files. 0.1pl2 cannot do this; it will stop after | ||
634 | decompressing just the first file in the stream.</para> | ||
635 | |||
636 | <para><computeroutput>bzip2recover</computeroutput> versions | ||
637 | prior to 1.0.2 used 32-bit integers to represent bit positions in | ||
638 | compressed files, so they could not handle compressed files more | ||
639 | than 512 megabytes long. Versions 1.0.2 and above use 64-bit integers | ||
640 | on some platforms which support them (GNU supported targets, and | ||
641 | Windows). To establish whether or not | ||
642 | <computeroutput>bzip2recover</computeroutput> was built with such | ||
643 | a limitation, run it without arguments. In any event you can | ||
644 | build yourself an unlimited version if you can recompile it with | ||
645 | <computeroutput>MaybeUInt64</computeroutput> set to be an | ||
646 | unsigned 64-bit integer.</para> | ||
647 | |||
648 | </sect1> | ||
649 | |||
650 | |||
651 | |||
652 | <sect1 id="author" xreflabel="AUTHOR"> | ||
653 | <title>AUTHOR</title> | ||
654 | |||
655 | <para>Julian Seward, | ||
656 | <computeroutput>&bz-email;</computeroutput></para> | ||
657 | |||
658 | <para>The ideas embodied in | ||
659 | <computeroutput>bzip2</computeroutput> are due to (at least) the | ||
660 | following people: Michael Burrows and David Wheeler (for the | ||
661 | block sorting transformation), David Wheeler (again, for the | ||
662 | Huffman coder), Peter Fenwick (for the structured coding model in | ||
663 | the original <computeroutput>bzip</computeroutput>, and many | ||
664 | refinements), and Alistair Moffat, Radford Neal and Ian Witten | ||
665 | (for the arithmetic coder in the original | ||
666 | <computeroutput>bzip</computeroutput>). I am much indebted for | ||
667 | their help, support and advice. See the manual in the source | ||
668 | distribution for pointers to sources of documentation. Christian | ||
669 | von Roques encouraged me to look for faster sorting algorithms, | ||
670 | so as to speed up compression. Bela Lubkin encouraged me to | ||
671 | improve the worst-case compression performance. | ||
672 | Donna Robinson XMLised the documentation. | ||
673 | Many people sent | ||
674 | patches, helped with portability problems, lent machines, gave | ||
675 | advice and were generally helpful.</para> | ||
676 | |||
677 | </sect1> | ||
678 | |||
679 | </chapter> | ||
680 | |||
681 | |||
682 | |||
683 | <chapter id="libprog" xreflabel="Programming with libbzip2"> | ||
684 | <title> | ||
685 | Programming with <computeroutput>libbzip2</computeroutput> | ||
686 | </title> | ||
687 | |||
688 | <para>This chapter describes the programming interface to | ||
689 | <computeroutput>libbzip2</computeroutput>.</para> | ||
690 | |||
691 | <para>For general background information, particularly about | ||
692 | memory use and performance aspects, you'd be well advised to read | ||
693 | <xref linkend="using"/> as well.</para> | ||
694 | |||
695 | |||
696 | <sect1 id="top-level" xreflabel="Top-level structure"> | ||
697 | <title>Top-level structure</title> | ||
698 | |||
699 | <para><computeroutput>libbzip2</computeroutput> is a flexible | ||
700 | library for compressing and decompressing data in the | ||
701 | <computeroutput>bzip2</computeroutput> data format. Although | ||
702 | packaged as a single entity, it helps to regard the library as | ||
703 | three separate parts: the low-level interface, the high-level | ||
704 | interface, and some utility functions.</para> | ||
705 | |||
706 | <para>The structure of | ||
707 | <computeroutput>libbzip2</computeroutput>'s interfaces is similar | ||
708 | to that of Jean-loup Gailly's and Mark Adler's excellent | ||
709 | <computeroutput>zlib</computeroutput> library.</para> | ||
710 | |||
711 | <para>All externally visible symbols have names beginning | ||
712 | <computeroutput>BZ2_</computeroutput>. This is new in version | ||
713 | 1.0. The intention is to minimise pollution of the namespaces of | ||
714 | library clients.</para> | ||
715 | |||
716 | <para>To use any part of the library, you need to | ||
717 | <computeroutput>#include <bzlib.h></computeroutput> | ||
718 | in your sources.</para> | ||
719 | |||
720 | |||
721 | |||
722 | <sect2 id="ll-summary" xreflabel="Low-level summary"> | ||
723 | <title>Low-level summary</title> | ||
724 | |||
725 | <para>This interface provides services for compressing and | ||
726 | decompressing data in memory. There's no provision for dealing | ||
727 | with files, streams or any other I/O mechanisms, just straight | ||
728 | memory-to-memory work. In fact, this part of the library can be | ||
729 | compiled without inclusion of | ||
730 | <computeroutput>stdio.h</computeroutput>, which may be helpful | ||
731 | for embedded applications.</para> | ||
732 | |||
733 | <para>The low-level part of the library has no global variables | ||
734 | and is therefore thread-safe.</para> | ||
735 | |||
736 | <para>Six routines make up the low level interface: | ||
737 | <computeroutput>BZ2_bzCompressInit</computeroutput>, | ||
738 | <computeroutput>BZ2_bzCompress</computeroutput>, and | ||
739 | <computeroutput>BZ2_bzCompressEnd</computeroutput> for | ||
740 | compression, and a corresponding trio | ||
741 | <computeroutput>BZ2_bzDecompressInit</computeroutput>, | ||
742 | <computeroutput>BZ2_bzDecompress</computeroutput> and | ||
743 | <computeroutput>BZ2_bzDecompressEnd</computeroutput> for | ||
744 | decompression. The <computeroutput>*Init</computeroutput> | ||
745 | functions allocate memory for compression/decompression and do | ||
746 | other initialisations, whilst the | ||
747 | <computeroutput>*End</computeroutput> functions close down | ||
748 | operations and release memory.</para> | ||
749 | |||
750 | <para>The real work is done by | ||
751 | <computeroutput>BZ2_bzCompress</computeroutput> and | ||
752 | <computeroutput>BZ2_bzDecompress</computeroutput>. These | ||
753 | compress and decompress data from a user-supplied input buffer to | ||
754 | a user-supplied output buffer. These buffers can be any size; | ||
755 | arbitrary quantities of data are handled by making repeated calls | ||
756 | to these functions. This is a flexible mechanism allowing a | ||
757 | consumer-pull style of activity, or producer-push, or a mixture | ||
758 | of both.</para> | ||
759 | |||
760 | </sect2> | ||
761 | |||
762 | |||
763 | <sect2 id="hl-summary" xreflabel="High-level summary"> | ||
764 | <title>High-level summary</title> | ||
765 | |||
766 | <para>This interface provides some handy wrappers around the | ||
767 | low-level interface to facilitate reading and writing | ||
768 | <computeroutput>bzip2</computeroutput> format files | ||
769 | (<computeroutput>.bz2</computeroutput> files). The routines | ||
770 | provide hooks to facilitate reading files in which the | ||
771 | <computeroutput>bzip2</computeroutput> data stream is embedded | ||
772 | within some larger-scale file structure, or where there are | ||
773 | multiple <computeroutput>bzip2</computeroutput> data streams | ||
774 | concatenated end-to-end.</para> | ||
775 | |||
776 | <para>For reading files, | ||
777 | <computeroutput>BZ2_bzReadOpen</computeroutput>, | ||
778 | <computeroutput>BZ2_bzRead</computeroutput>, | ||
779 | <computeroutput>BZ2_bzReadClose</computeroutput> and | ||
780 | <computeroutput>BZ2_bzReadGetUnused</computeroutput> are | ||
781 | supplied. For writing files, | ||
782 | <computeroutput>BZ2_bzWriteOpen</computeroutput>, | ||
783 | <computeroutput>BZ2_bzWrite</computeroutput> and | ||
784 | <computeroutput>BZ2_bzWriteClose</computeroutput> are | ||
785 | available.</para> | ||
786 | |||
787 | <para>As with the low-level library, no global variables are used | ||
788 | so the library is per se thread-safe. However, if I/O errors | ||
789 | occur whilst reading or writing the underlying compressed files, | ||
790 | you may have to consult <computeroutput>errno</computeroutput> to | ||
791 | determine the cause of the error. In that case, you'd need a C | ||
792 | library which correctly supports | ||
793 | <computeroutput>errno</computeroutput> in a multithreaded | ||
794 | environment.</para> | ||
795 | |||
796 | <para>To make the library a little simpler and more portable, | ||
797 | <computeroutput>BZ2_bzReadOpen</computeroutput> and | ||
798 | <computeroutput>BZ2_bzWriteOpen</computeroutput> require you to | ||
799 | pass them file handles (<computeroutput>FILE*</computeroutput>s) | ||
800 | which have previously been opened for reading or writing | ||
801 | respectively. That avoids portability problems associated with | ||
802 | file operations and file attributes, whilst not being much of an | ||
803 | imposition on the programmer.</para> | ||
804 | |||
805 | </sect2> | ||
806 | |||
807 | |||
808 | <sect2 id="util-fns-summary" xreflabel="Utility functions summary"> | ||
809 | <title>Utility functions summary</title> | ||
810 | |||
811 | <para>For very simple needs, | ||
812 | <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and | ||
813 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> are | ||
814 | provided. These compress or decompress data in memory from one buffer to | ||
815 | another buffer in a single function call. You should assess | ||
816 | whether these functions fulfill your memory-to-memory | ||
817 | compression/decompression requirements before investing effort in | ||
818 | understanding the more general but more complex low-level | ||
819 | interface.</para> | ||
820 | |||
821 | <para>Yoshioka Tsuneo | ||
822 | (<computeroutput>QWF00133@niftyserve.or.jp</computeroutput> / | ||
823 | <computeroutput>tsuneo-y@is.aist-nara.ac.jp</computeroutput>) has | ||
824 | contributed some functions to give better | ||
825 | <computeroutput>zlib</computeroutput> compatibility. These | ||
826 | functions are <computeroutput>BZ2_bzopen</computeroutput>, | ||
827 | <computeroutput>BZ2_bzread</computeroutput>, | ||
828 | <computeroutput>BZ2_bzwrite</computeroutput>, | ||
829 | <computeroutput>BZ2_bzflush</computeroutput>, | ||
830 | <computeroutput>BZ2_bzclose</computeroutput>, | ||
831 | <computeroutput>BZ2_bzerror</computeroutput> and | ||
832 | <computeroutput>BZ2_bzlibVersion</computeroutput>. You may find | ||
833 | these functions more convenient for simple file reading and | ||
834 | writing than those in the high-level interface. These functions | ||
835 | are not (yet) officially part of the library, and are minimally | ||
836 | documented here. If they break, you get to keep all the pieces. | ||
837 | I hope to document them properly when time permits.</para> | ||
838 | |||
839 | <para>Yoshioka also contributed modifications to allow the | ||
840 | library to be built as a Windows DLL.</para> | ||
841 | |||
842 | </sect2> | ||
843 | |||
844 | </sect1> | ||
845 | |||
846 | |||
847 | <sect1 id="err-handling" xreflabel="Error handling"> | ||
848 | <title>Error handling</title> | ||
849 | |||
850 | <para>The library is designed to recover cleanly in all | ||
851 | situations, including the worst-case situation of decompressing | ||
852 | random data. I'm not 100% sure that it can always do this, so | ||
853 | you might want to add a signal handler to catch segmentation | ||
854 | violations during decompression if you are feeling especially | ||
855 | paranoid. I would be interested in hearing more about the | ||
856 | robustness of the library to corrupted compressed data.</para> | ||
857 | |||
858 | <para>Version 1.0.3 is more robust in this respect than any | ||
859 | previous version. Investigations with Valgrind (a tool for detecting | ||
860 | problems with memory management) indicate | ||
861 | that, at least for the few files I tested, all single-bit errors | ||
862 | in the compressed data are caught properly, with no | ||
863 | segmentation faults, no uses of uninitialised data, no out of | ||
864 | range reads or writes, and no infinite looping in the decompressor. | ||
865 | So it's certainly pretty robust, although | ||
866 | I wouldn't claim it to be totally bombproof.</para> | ||
867 | |||
868 | <para>The file <computeroutput>bzlib.h</computeroutput> contains | ||
869 | all definitions needed to use the library. In particular, you | ||
870 | should definitely not include | ||
871 | <computeroutput>bzlib_private.h</computeroutput>.</para> | ||
872 | |||
873 | <para>In <computeroutput>bzlib.h</computeroutput>, the various | ||
874 | return values are defined. The following list is not intended as | ||
875 | an exhaustive description of the circumstances in which a given | ||
876 | value may be returned -- those descriptions are given later. | ||
877 | Rather, it is intended to convey the rough meaning of each return | ||
878 | value. The first five values are normal and not intended to | ||
879 | denote an error situation.</para> | ||
880 | |||
881 | <variablelist> | ||
882 | |||
883 | <varlistentry> | ||
884 | <term><computeroutput>BZ_OK</computeroutput></term> | ||
885 | <listitem><para>The requested action was completed | ||
886 | successfully.</para></listitem> | ||
887 | </varlistentry> | ||
888 | |||
889 | <varlistentry> | ||
890 | <term><computeroutput>BZ_RUN_OK, BZ_FLUSH_OK, | ||
891 | BZ_FINISH_OK</computeroutput></term> | ||
892 | <listitem><para>In | ||
893 | <computeroutput>BZ2_bzCompress</computeroutput>, the requested | ||
894 | flush/finish/nothing-special action was completed | ||
895 | successfully.</para></listitem> | ||
896 | </varlistentry> | ||
897 | |||
898 | <varlistentry> | ||
899 | <term><computeroutput>BZ_STREAM_END</computeroutput></term> | ||
900 | <listitem><para>Compression of data was completed, or the | ||
901 | logical stream end was detected during | ||
902 | decompression.</para></listitem> | ||
903 | </varlistentry> | ||
904 | |||
905 | </variablelist> | ||
906 | |||
907 | <para>The following return values indicate an error of some | ||
908 | kind.</para> | ||
909 | |||
910 | <variablelist> | ||
911 | |||
912 | <varlistentry> | ||
913 | <term><computeroutput>BZ_CONFIG_ERROR</computeroutput></term> | ||
914 | <listitem><para>Indicates that the library has been improperly | ||
915 | compiled on your platform -- a major configuration error. | ||
916 | Specifically, it means that | ||
917 | <computeroutput>sizeof(char)</computeroutput>, | ||
918 | <computeroutput>sizeof(short)</computeroutput> and | ||
919 | <computeroutput>sizeof(int)</computeroutput> are not 1, 2 and | ||
920 | 4 respectively, as they should be. Note that the library | ||
921 | should still work properly on 64-bit platforms which follow | ||
922 | the LP64 programming model -- that is, where | ||
923 | <computeroutput>sizeof(long)</computeroutput> and | ||
924 | <computeroutput>sizeof(void*)</computeroutput> are 8. Under | ||
925 | LP64, <computeroutput>sizeof(int)</computeroutput> is still 4, | ||
926 | so <computeroutput>libbzip2</computeroutput>, which doesn't | ||
927 | use the <computeroutput>long</computeroutput> type, is | ||
928 | OK.</para></listitem> | ||
929 | </varlistentry> | ||
930 | |||
931 | <varlistentry> | ||
932 | <term><computeroutput>BZ_SEQUENCE_ERROR</computeroutput></term> | ||
933 | <listitem><para>When using the library, it is important to call | ||
934 | the functions in the correct sequence and with data structures | ||
935 | (buffers etc) in the correct states. | ||
936 | <computeroutput>libbzip2</computeroutput> checks as much as it | ||
937 | can to ensure this is happening, and returns | ||
938 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> if not. | ||
939 | Code which complies precisely with the function semantics, as | ||
940 | detailed below, should never receive this value; such an event | ||
941 | denotes buggy code which you should | ||
942 | investigate.</para></listitem> | ||
943 | </varlistentry> | ||
944 | |||
945 | <varlistentry> | ||
946 | <term><computeroutput>BZ_PARAM_ERROR</computeroutput></term> | ||
947 | <listitem><para>Returned when a parameter to a function call is | ||
948 | out of range or otherwise manifestly incorrect. As with | ||
949 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, this | ||
950 | denotes a bug in the client code. The distinction between | ||
951 | <computeroutput>BZ_PARAM_ERROR</computeroutput> and | ||
952 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> is a bit | ||
953 | hazy, but still worth making.</para></listitem> | ||
954 | </varlistentry> | ||
955 | |||
956 | <varlistentry> | ||
957 | <term><computeroutput>BZ_MEM_ERROR</computeroutput></term> | ||
958 | <listitem><para>Returned when a request to allocate memory | ||
959 | failed. Note that the quantity of memory needed to decompress | ||
960 | a stream cannot be determined until the stream's header has | ||
961 | been read. So | ||
962 | <computeroutput>BZ2_bzDecompress</computeroutput> and | ||
963 | <computeroutput>BZ2_bzRead</computeroutput> may return | ||
964 | <computeroutput>BZ_MEM_ERROR</computeroutput> even though some | ||
965 | of the compressed data has been read. The same is not true | ||
966 | for compression; once | ||
967 | <computeroutput>BZ2_bzCompressInit</computeroutput> or | ||
968 | <computeroutput>BZ2_bzWriteOpen</computeroutput> have | ||
969 | successfully completed, | ||
970 | <computeroutput>BZ_MEM_ERROR</computeroutput> cannot | ||
971 | occur.</para></listitem> | ||
972 | </varlistentry> | ||
973 | |||
974 | <varlistentry> | ||
975 | <term><computeroutput>BZ_DATA_ERROR</computeroutput></term> | ||
976 | <listitem><para>Returned when a data integrity error is | ||
977 | detected during decompression. Most importantly, this means | ||
978 | when stored and computed CRCs for the data do not match. This | ||
979 | value is also returned upon detection of any other anomaly in | ||
980 | the compressed data.</para></listitem> | ||
981 | </varlistentry> | ||
982 | |||
983 | <varlistentry> | ||
984 | <term><computeroutput>BZ_DATA_ERROR_MAGIC</computeroutput></term> | ||
985 | <listitem><para>As a special case of | ||
986 | <computeroutput>BZ_DATA_ERROR</computeroutput>, it is | ||
987 | sometimes useful to know when the compressed stream does not | ||
988 | start with the correct magic bytes (<computeroutput>'B' 'Z' | ||
989 | 'h'</computeroutput>).</para></listitem> | ||
990 | </varlistentry> | ||
991 | |||
992 | <varlistentry> | ||
993 | <term><computeroutput>BZ_IO_ERROR</computeroutput></term> | ||
994 | <listitem><para>Returned by | ||
995 | <computeroutput>BZ2_bzRead</computeroutput> and | ||
996 | <computeroutput>BZ2_bzWrite</computeroutput> when there is an | ||
997 | error reading or writing in the compressed file, and by | ||
998 | <computeroutput>BZ2_bzReadOpen</computeroutput> and | ||
999 | <computeroutput>BZ2_bzWriteOpen</computeroutput> for attempts | ||
1000 | to use a file for which the error indicator (viz, | ||
1001 | <computeroutput>ferror(f)</computeroutput>) is set. On | ||
1002 | receipt of <computeroutput>BZ_IO_ERROR</computeroutput>, the | ||
1003 | caller should consult <computeroutput>errno</computeroutput> | ||
1004 | and/or <computeroutput>perror</computeroutput> to acquire | ||
1005 | operating-system specific information about the | ||
1006 | problem.</para></listitem> | ||
1007 | </varlistentry> | ||
1008 | |||
1009 | <varlistentry> | ||
1010 | <term><computeroutput>BZ_UNEXPECTED_EOF</computeroutput></term> | ||
1011 | <listitem><para>Returned by | ||
1012 | <computeroutput>BZ2_bzRead</computeroutput> when the | ||
1013 | compressed file finishes before the logical end of stream is | ||
1014 | detected.</para></listitem> | ||
1015 | </varlistentry> | ||
1016 | |||
1017 | <varlistentry> | ||
1018 | <term><computeroutput>BZ_OUTBUFF_FULL</computeroutput></term> | ||
1019 | <listitem><para>Returned by | ||
1020 | <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and | ||
1021 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> to | ||
1022 | indicate that the output data will not fit into the output | ||
1023 | buffer provided.</para></listitem> | ||
1024 | </varlistentry> | ||
1025 | |||
1026 | </variablelist> | ||
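<para>The <computeroutput>BZ_CONFIG_ERROR</computeroutput> condition described above comes down to three size requirements. The following is only a sketch of such a check, not the library's own code; the helper names <computeroutput>type_sizes_ok</computeroutput> and <computeroutput>platform_sizes_ok</computeroutput> are hypothetical and do not exist in <computeroutput>libbzip2</computeroutput>.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the given type sizes match the
   1/2/4 layout the manual says the library expects, 0 otherwise. */
static int type_sizes_ok(size_t char_size, size_t short_size, size_t int_size)
{
    return char_size == 1 && short_size == 2 && int_size == 4;
}

/* Hypothetical helper: applies the check to the compiling platform.
   sizeof(long) and sizeof(void*) are deliberately not examined, which
   is why an LP64 platform still passes. */
static int platform_sizes_ok(void)
{
    return type_sizes_ok(sizeof(char), sizeof(short), sizeof(int));
}
```

<para>A client could call <computeroutput>platform_sizes_ok()</computeroutput> once at start-up to get an early diagnostic, although the library performs its own check regardless.</para>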
1027 | |||
1028 | </sect1> | ||
1029 | |||
1030 | |||
1031 | |||
1032 | <sect1 id="low-level" xreflabel="Low-level interface"> | ||
1033 | <title>Low-level interface</title> | ||
1034 | |||
1035 | |||
1036 | <sect2 id="bzcompress-init" xreflabel="BZ2_bzCompressInit"> | ||
1037 | <title><computeroutput>BZ2_bzCompressInit</computeroutput></title> | ||
1038 | |||
1039 | <programlisting> | ||
1040 | typedef struct { | ||
1041 | char *next_in; | ||
1042 | unsigned int avail_in; | ||
1043 | unsigned int total_in_lo32; | ||
1044 | unsigned int total_in_hi32; | ||
1045 | |||
1046 | char *next_out; | ||
1047 | unsigned int avail_out; | ||
1048 | unsigned int total_out_lo32; | ||
1049 | unsigned int total_out_hi32; | ||
1050 | |||
1051 | void *state; | ||
1052 | |||
1053 | void *(*bzalloc)(void *,int,int); | ||
1054 | void (*bzfree)(void *,void *); | ||
1055 | void *opaque; | ||
1056 | } bz_stream; | ||
1057 | |||
1058 | int BZ2_bzCompressInit ( bz_stream *strm, | ||
1059 | int blockSize100k, | ||
1060 | int verbosity, | ||
1061 | int workFactor ); | ||
1062 | </programlisting> | ||
1063 | |||
1064 | <para>Prepares for compression. The | ||
1065 | <computeroutput>bz_stream</computeroutput> structure holds all | ||
1066 | data pertaining to the compression activity. A | ||
1067 | <computeroutput>bz_stream</computeroutput> structure should be | ||
1068 | allocated and initialised prior to the call. The fields of | ||
1069 | <computeroutput>bz_stream</computeroutput> comprise the entirety | ||
1070 | of the user-visible data. <computeroutput>state</computeroutput> | ||
1071 | is a pointer to the private data structures required for | ||
1072 | compression.</para> | ||
1073 | |||
1074 | <para>Custom memory allocators are supported, via fields | ||
1075 | <computeroutput>bzalloc</computeroutput>, | ||
1076 | <computeroutput>bzfree</computeroutput>, and | ||
1077 | <computeroutput>opaque</computeroutput>. The value | ||
1078 | <computeroutput>opaque</computeroutput> is passed as the first | ||
1079 | argument to all calls to <computeroutput>bzalloc</computeroutput> | ||
1080 | and <computeroutput>bzfree</computeroutput>, but is otherwise | ||
1081 | ignored by the library. The call <computeroutput>bzalloc ( | ||
1082 | opaque, n, m )</computeroutput> is expected to return a pointer | ||
1083 | <computeroutput>p</computeroutput> to <computeroutput>n * | ||
1084 | m</computeroutput> bytes of memory, and <computeroutput>bzfree ( | ||
1085 | opaque, p )</computeroutput> should free that memory.</para> | ||
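<para>As an illustration of the allocator contract just described, here is a minimal sketch of a pair of functions with the <computeroutput>bzalloc</computeroutput> / <computeroutput>bzfree</computeroutput> signatures. The names <computeroutput>alloc_stats</computeroutput>, <computeroutput>counting_alloc</computeroutput> and <computeroutput>counting_free</computeroutput> are hypothetical, and the use of <computeroutput>opaque</computeroutput> to carry a counter is just one possible choice.</para>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
typedef struct {
    int live_blocks;   /* allocations made and not yet freed */
} alloc_stats;

/* Same shape as the bzalloc field: return n * m bytes, or NULL on failure. */
static void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *stats = opaque;
    void *p;

    if (n < 0 || m < 0)
        return NULL;
    p = malloc((size_t)n * (size_t)m);
    if (p != NULL && stats != NULL)
        stats->live_blocks++;
    return p;
}

/* Same shape as the bzfree field: release a block from counting_alloc. */
static void counting_free(void *opaque, void *p)
{
    alloc_stats *stats = opaque;

    if (p == NULL)
        return;
    if (stats != NULL)
        stats->live_blocks--;
    free(p);
}
```

<para>To use such a pair, a caller would store <computeroutput>counting_alloc</computeroutput> in <computeroutput>bzalloc</computeroutput>, <computeroutput>counting_free</computeroutput> in <computeroutput>bzfree</computeroutput>, and the address of an <computeroutput>alloc_stats</computeroutput> in <computeroutput>opaque</computeroutput> before initialising the stream.</para>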
1086 | |||
1087 | <para>If you don't want to use a custom memory allocator, set | ||
1088 | <computeroutput>bzalloc</computeroutput>, | ||
1089 | <computeroutput>bzfree</computeroutput> and | ||
1090 | <computeroutput>opaque</computeroutput> to | ||
1091 | <computeroutput>NULL</computeroutput>, and the library will then | ||
1092 | use the standard <computeroutput>malloc</computeroutput> / | ||
1093 | <computeroutput>free</computeroutput> routines.</para> | ||
1094 | |||
1095 | <para>Before calling | ||
1096 | <computeroutput>BZ2_bzCompressInit</computeroutput>, fields | ||
1097 | <computeroutput>bzalloc</computeroutput>, | ||
1098 | <computeroutput>bzfree</computeroutput> and | ||
1099 | <computeroutput>opaque</computeroutput> should be filled | ||
1100 | appropriately, as just described. Upon return, the internal | ||
1101 | state will have been allocated and initialised, and | ||
1102 | <computeroutput>total_in_lo32</computeroutput>, | ||
1103 | <computeroutput>total_in_hi32</computeroutput>, | ||
1104 | <computeroutput>total_out_lo32</computeroutput> and | ||
1105 | <computeroutput>total_out_hi32</computeroutput> will have been | ||
1106 | set to zero. These four fields are used by the library to inform | ||
1107 | the caller of the total amount of data passed into and out of the | ||
1108 | library, respectively. You should not try to change them. As of | ||
1109 | version 1.0, 64-bit counts are maintained, even on 32-bit | ||
1110 | platforms, using the <computeroutput>_hi32</computeroutput> | ||
1111 | fields to store the upper 32 bits of the count. So, for example, | ||
1112 | the total amount of data in is <computeroutput>(total_in_hi32 | ||
1113 | << 32) + total_in_lo32</computeroutput>.</para> | ||
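<para>The expression above is best read as arithmetic rather than literal C: where <computeroutput>unsigned int</computeroutput> is 32 bits wide, shifting it left by 32 is not well defined, so the upper half has to be widened first. A minimal sketch, assuming a C99 compiler that provides <computeroutput>uint64_t</computeroutput>; the helper name <computeroutput>combine_count</computeroutput> is hypothetical and not part of the library.</para>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: joins a _hi32 / _lo32 pair into one 64-bit count.
   The cast to uint64_t must happen before the shift. */
static uint64_t combine_count(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

<para>For example, <computeroutput>combine_count(strm.total_in_hi32, strm.total_in_lo32)</computeroutput> would give the total number of input bytes consumed so far.</para>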
1114 | |||
1115 | <para>Parameter <computeroutput>blockSize100k</computeroutput> | ||
1116 | specifies the block size to be used for compression. It should | ||
1117 | be a value between 1 and 9 inclusive, and the actual block size | ||
1118 | used is 100000 x this figure. 9 gives the best compression but | ||
1119 | takes most memory.</para> | ||
1120 | |||
1121 | <para>Parameter <computeroutput>verbosity</computeroutput> should | ||
1122 | be set to a number between 0 and 4 inclusive. 0 is silent, and | ||
1123 | greater numbers give increasingly verbose monitoring/debugging | ||
1124 | output. If the library has been compiled with | ||
1125 | <computeroutput>-DBZ_NO_STDIO</computeroutput>, no such output | ||
1126 | will appear for any verbosity setting.</para> | ||
1127 | |||
1128 | <para>Parameter <computeroutput>workFactor</computeroutput> | ||
1129 | controls how the compression phase behaves when presented with | ||
1130 | worst case, highly repetitive, input data. If compression runs | ||
1131 | into difficulties caused by repetitive data, the library switches | ||
1132 | from the standard sorting algorithm to a fallback algorithm. The | ||
1133 | fallback is slower than the standard algorithm by perhaps a | ||
1134 | factor of three, but always behaves reasonably, no matter how bad | ||
1135 | the input.</para> | ||
1136 | |||
1137 | <para>Lower values of <computeroutput>workFactor</computeroutput> | ||
1138 | reduce the amount of effort the standard algorithm will expend | ||
1139 | before resorting to the fallback. You should set this parameter | ||
1140 | carefully: too low, and many inputs will be handled by the | ||
1141 | fallback algorithm and so compress rather slowly; too high, and | ||
1142 | your average-to-worst case compression times can become very | ||
1143 | large. The default value of 30 gives reasonable behaviour over a | ||
1144 | wide range of circumstances.</para> | ||
1145 | |||
1146 | <para>Allowable values range from 0 to 250 inclusive. 0 is a | ||
1147 | special case, equivalent to using the default value of 30.</para> | ||
1148 | |||
1149 | <para>Note that the compressed output generated is the same | ||
1150 | regardless of whether or not the fallback algorithm is | ||
1151 | used.</para> | ||
1152 | |||
1153 | <para>Be aware also that this parameter may disappear entirely in | ||
1154 | future versions of the library. In principle it should be | ||
1155 | possible to devise a good way to automatically choose which | ||
1156 | algorithm to use. Such a mechanism would render the parameter | ||
1157 | obsolete.</para> | ||
1158 | |||
1159 | <para>Possible return values:</para> | ||
1160 | |||
1161 | <programlisting> | ||
1162 | BZ_CONFIG_ERROR | ||
1163 | if the library has been mis-compiled | ||
1164 | BZ_PARAM_ERROR | ||
1165 | if strm is NULL | ||
1166 | or blockSize100k < 1 or blockSize100k > 9 | ||
1167 | or verbosity < 0 or verbosity > 4 | ||
1168 | or workFactor < 0 or workFactor > 250 | ||
1169 | BZ_MEM_ERROR | ||
1170 | if not enough memory is available | ||
1171 | BZ_OK | ||
1172 | otherwise | ||
1173 | </programlisting> | ||
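<para>The parameter ranges listed above can be checked by the caller before the call is made. The following is a sketch only: <computeroutput>params_in_range</computeroutput>, <computeroutput>effective_work_factor</computeroutput> and <computeroutput>DEFAULT_WORK_FACTOR</computeroutput> are hypothetical names, not library symbols, and the library carries out its own validation in any case.</para>

```c
#include <assert.h>

/* The default workFactor value quoted in this manual. */
enum { DEFAULT_WORK_FACTOR = 30 };

/* Hypothetical helper: returns 1 if the three tuning parameters lie in
   the documented ranges, 0 if BZ_PARAM_ERROR would be expected. */
static int params_in_range(int blockSize100k, int verbosity, int workFactor)
{
    if (blockSize100k < 1 || blockSize100k > 9)
        return 0;
    if (verbosity < 0 || verbosity > 4)
        return 0;
    if (workFactor < 0 || workFactor > 250)
        return 0;
    return 1;
}

/* Hypothetical helper: a workFactor of 0 selects the default. */
static int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? DEFAULT_WORK_FACTOR : workFactor;
}
```

<para>Checking up front lets a program report which argument was wrong, something the single <computeroutput>BZ_PARAM_ERROR</computeroutput> code cannot convey.</para>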
1174 | |||
1175 | <para>Allowable next actions:</para> | ||
1176 | |||
1177 | <programlisting> | ||
1178 | BZ2_bzCompress | ||
1179 | if BZ_OK is returned | ||
1180 | no specific action needed in case of error | ||
1181 | </programlisting> | ||
1182 | |||
1183 | </sect2> | ||
1184 | |||
1185 | |||
1186 | <sect2 id="bzCompress" xreflabel="BZ2_bzCompress"> | ||
1187 | <title><computeroutput>BZ2_bzCompress</computeroutput></title> | ||
1188 | |||
1189 | <programlisting> | ||
1190 | int BZ2_bzCompress ( bz_stream *strm, int action ); | ||
1191 | </programlisting> | ||
1192 | |||
1193 | <para>Provides more input and/or output buffer space for the | ||
1194 | library. The caller maintains input and output buffers, and | ||
1195 | calls <computeroutput>BZ2_bzCompress</computeroutput> to transfer | ||
1196 | data between them.</para> | ||
1197 | |||
1198 | <para>Before each call to | ||
1199 | <computeroutput>BZ2_bzCompress</computeroutput>, | ||
1200 | <computeroutput>next_in</computeroutput> should point at the data | ||
1201 | to be compressed, and <computeroutput>avail_in</computeroutput> | ||
1202 | should indicate how many bytes the library may read. | ||
1203 | <computeroutput>BZ2_bzCompress</computeroutput> updates | ||
1204 | <computeroutput>next_in</computeroutput>, | ||
1205 | <computeroutput>avail_in</computeroutput> and | ||
1206 | <computeroutput>total_in_lo32</computeroutput> / <computeroutput>total_in_hi32</computeroutput> to reflect the number | ||
1207 | of bytes it has read.</para> | ||
1208 | |||
1209 | <para>Similarly, <computeroutput>next_out</computeroutput> should | ||
1210 | point to a buffer in which the compressed data is to be placed, | ||
1211 | with <computeroutput>avail_out</computeroutput> indicating how | ||
1212 | much output space is available. | ||
1213 | <computeroutput>BZ2_bzCompress</computeroutput> updates | ||
1214 | <computeroutput>next_out</computeroutput>, | ||
1215 | <computeroutput>avail_out</computeroutput> and | ||
1216 | <computeroutput>total_out_lo32</computeroutput> / <computeroutput>total_out_hi32</computeroutput> to reflect the number | ||
1217 | of bytes output.</para> | ||
1218 | |||
1219 | <para>You may provide and remove as little or as much data as you | ||
1220 | like on each call of | ||
1221 | <computeroutput>BZ2_bzCompress</computeroutput>. In the limit, | ||
1222 | it is acceptable to supply and remove data one byte at a time, | ||
1223 | although this would be terribly inefficient. You should always | ||
1224 | ensure that at least one byte of output space is available at | ||
1225 | each call.</para> | ||
1226 | |||
1227 | <para>A second purpose of | ||
1228 | <computeroutput>BZ2_bzCompress</computeroutput> is to request a | ||
1229 | change of mode of the compressed stream.</para> | ||
1230 | |||
1231 | <para>Conceptually, a compressed stream can be in one of four | ||
1232 | states: IDLE, RUNNING, FLUSHING and FINISHING. Before | ||
1233 | initialisation | ||
1234 | (<computeroutput>BZ2_bzCompressInit</computeroutput>) and after | ||
1235 | termination (<computeroutput>BZ2_bzCompressEnd</computeroutput>), | ||
1236 | a stream is regarded as IDLE.</para> | ||
1237 | |||
1238 | <para>Upon initialisation | ||
1239 | (<computeroutput>BZ2_bzCompressInit</computeroutput>), the stream | ||
1240 | is placed in the RUNNING state. Subsequent calls to | ||
1241 | <computeroutput>BZ2_bzCompress</computeroutput> should pass | ||
1242 | <computeroutput>BZ_RUN</computeroutput> as the requested action; | ||
1243 | other actions are illegal and will result in | ||
1244 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.</para> | ||
1245 | |||
1246 | <para>At some point, the calling program will have provided all | ||
1247 | the input data it wants to. It will then want to finish up -- in | ||
1248 | effect, asking the library to process any data it might have | ||
1249 | buffered internally. In this state, | ||
1250 | <computeroutput>BZ2_bzCompress</computeroutput> will no longer | ||
1251 | attempt to read data from | ||
1252 | <computeroutput>next_in</computeroutput>, but it will want to | ||
1253 | write data to <computeroutput>next_out</computeroutput>. Because | ||
1254 | the output buffer supplied by the user can be arbitrarily small, | ||
1255 | the finishing-up operation cannot necessarily be done with a | ||
1256 | single call of | ||
1257 | <computeroutput>BZ2_bzCompress</computeroutput>.</para> | ||
1258 | |||
1259 | <para>Instead, the calling program passes | ||
1260 | <computeroutput>BZ_FINISH</computeroutput> as an action to | ||
1261 | <computeroutput>BZ2_bzCompress</computeroutput>. This changes | ||
1262 | the stream's state to FINISHING. Any remaining input (ie, | ||
1263 | <computeroutput>next_in[0 .. avail_in-1]</computeroutput>) is | ||
1264 | compressed and transferred to the output buffer. To do this, | ||
1265 | <computeroutput>BZ2_bzCompress</computeroutput> must be called | ||
1266 | repeatedly until all the output has been consumed. At that | ||
1267 | point, <computeroutput>BZ2_bzCompress</computeroutput> returns | ||
1268 | <computeroutput>BZ_STREAM_END</computeroutput>, and the stream's | ||
1269 | state is set back to IDLE. | ||
1270 | <computeroutput>BZ2_bzCompressEnd</computeroutput> should then be | ||
1271 | called.</para> | ||
1272 | |||
1273 | <para>Just to make sure the calling program does not cheat, the | ||
1274 | library makes a note of <computeroutput>avail_in</computeroutput> | ||
1275 | at the time of the first call to | ||
1276 | <computeroutput>BZ2_bzCompress</computeroutput> which has | ||
1277 | <computeroutput>BZ_FINISH</computeroutput> as an action (ie, at | ||
1278 | the time the program has announced its intention to not supply | ||
1279 | any more input). By comparing this value with that of | ||
1280 | <computeroutput>avail_in</computeroutput> over subsequent calls | ||
1281 | to <computeroutput>BZ2_bzCompress</computeroutput>, the library | ||
1282 | can detect any attempts to slip in more data to compress. Any | ||
1283 | calls for which this is detected will return | ||
1284 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>. This | ||
1285 | indicates a programming mistake which should be corrected.</para> | ||
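<para>The check described above can be pictured with a small
stand-alone model. This is only a sketch of one way such a
comparison could work, not the library's code: the type
<computeroutput>finish_guard</computeroutput> and both functions
are hypothetical names invented for illustration.</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for this sketch; not a libbzip2 type. */
typedef struct {
    bool     armed;     /* true once the first finishing call has been seen */
    unsigned expected;  /* avail_in expected at the start of the next call  */
} finish_guard;

/* Model the start of a call with action BZ_FINISH.  Returns false in
   the situation where the text says BZ_SEQUENCE_ERROR would result. */
bool finish_guard_enter(finish_guard *g, unsigned avail_in)
{
    if (!g->armed) {
        /* First finishing call: note how much input is still pending. */
        g->armed = true;
        g->expected = avail_in;
        return true;
    }
    /* Later calls: any change means the caller altered the input. */
    return avail_in == g->expected;
}

/* Model the call consuming some of the pending input bytes. */
void finish_guard_consume(finish_guard *g, unsigned consumed)
{
    assert(consumed <= g->expected);
    g->expected -= consumed;
}
```

<para>In this model, a caller that enters with 100 bytes pending,
has 60 of them consumed, and then enters again with 40 is
accepted; entering with any other count is rejected.</para>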
1286 | |||
1287 | <para>Instead of asking to finish, the calling program may ask | ||
1288 | <computeroutput>BZ2_bzCompress</computeroutput> to take all the | ||
1289 | remaining input, compress it and terminate the current | ||
1290 | (Burrows-Wheeler) compression block. This could be useful for | ||
1291 | error control purposes. The mechanism is analogous to that for | ||
1292 | finishing: call <computeroutput>BZ2_bzCompress</computeroutput> | ||
1293 | with an action of <computeroutput>BZ_FLUSH</computeroutput>, | ||
1294 | remove output data, and persist with the | ||
1295 | <computeroutput>BZ_FLUSH</computeroutput> action until the value | ||
1296 | <computeroutput>BZ_RUN_OK</computeroutput> is returned. As with | ||
1297 | finishing, <computeroutput>BZ2_bzCompress</computeroutput> | ||
1298 | detects any attempt to provide more input data once the flush has | ||
1299 | begun.</para> | ||
1300 | |||
1301 | <para>Once the flush is complete, the stream returns to the | ||
1302 | normal RUNNING state.</para> | ||
1303 | |||
1304 | <para>This all sounds pretty complex, but isn't really. Here's a | ||
1305 | table which shows which actions are allowable in each state, what | ||
1306 | action will be taken, what the next state is, and what the | ||
1307 | non-error return values are. Note that you can't explicitly ask | ||
1308 | what state the stream is in, but nor do you need to -- it can be | ||
1309 | inferred from the values returned by | ||
1310 | <computeroutput>BZ2_bzCompress</computeroutput>.</para> | ||
1311 | |||
1312 | <programlisting> | ||
1313 | IDLE/any | ||
1314 | Illegal. IDLE state only exists after BZ2_bzCompressEnd or | ||
1315 | before BZ2_bzCompressInit. | ||
1316 | Return value = BZ_SEQUENCE_ERROR | ||
1317 | |||
1318 | RUNNING/BZ_RUN | ||
1319 | Compress from next_in to next_out as much as possible. | ||
1320 | Next state = RUNNING | ||
1321 | Return value = BZ_RUN_OK | ||
1322 | |||
1323 | RUNNING/BZ_FLUSH | ||
1324 | Remember current value of avail_in. Compress from next_in | ||
1325 | to next_out as much as possible, but do not accept any more input. | ||
1326 | Next state = FLUSHING | ||
1327 | Return value = BZ_FLUSH_OK | ||
1328 | |||
1329 | RUNNING/BZ_FINISH | ||
1330 | Remember current value of avail_in. Compress from next_in | ||
1331 | to next_out as much as possible, but do not accept any more input. | ||
1332 | Next state = FINISHING | ||
1333 | Return value = BZ_FINISH_OK | ||
1334 | |||
1335 | FLUSHING/BZ_FLUSH | ||
1336 | Compress from next_in to next_out as much as possible, | ||
1337 | but do not accept any more input. | ||
1338 | If all the existing input has been used up and all compressed | ||
1339 | output has been removed | ||
1340 | Next state = RUNNING; Return value = BZ_RUN_OK | ||
1341 | else | ||
1342 | Next state = FLUSHING; Return value = BZ_FLUSH_OK | ||
1343 | |||
1344 | FLUSHING/other | ||
1345 | Illegal. | ||
1346 | Return value = BZ_SEQUENCE_ERROR | ||
1347 | |||
1348 | FINISHING/BZ_FINISH | ||
1349 | Compress from next_in to next_out as much as possible, | ||
1350 | but do not accept any more input. | ||
1351 | If all the existing input has been used up and all compressed | ||
1352 | output has been removed | ||
1353 | Next state = IDLE; Return value = BZ_STREAM_END | ||
1354 | else | ||
1355 | Next state = FINISHING; Return value = BZ_FINISH_OK | ||
1356 | |||
1357 | FINISHING/other | ||
1358 | Illegal. | ||
1359 | Return value = BZ_SEQUENCE_ERROR | ||
1360 | </programlisting> | ||
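<para>For readers who prefer code to tables, the transitions above
can be restated as a small stand-alone function. This is a model
of the table only, not an excerpt from the library: the
<computeroutput>model_*</computeroutput> names are invented for
this sketch, and the <computeroutput>drained</computeroutput> flag
stands for "all the existing input has been used up and all
compressed output has been removed".</para>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the real stream state is private to the library. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } model_state;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } model_action;
typedef enum {
    RET_SEQUENCE_ERROR, RET_RUN_OK, RET_FLUSH_OK,
    RET_FINISH_OK, RET_STREAM_END
} model_result;

/* Apply one row of the table: update *st and report the return value. */
model_result model_step(model_state *st, model_action act, bool drained)
{
    assert(st != 0);
    switch (*st) {
    case ST_IDLE:                       /* IDLE/any is illegal */
        return RET_SEQUENCE_ERROR;
    case ST_RUNNING:
        if (act == ACT_RUN)   return RET_RUN_OK;
        if (act == ACT_FLUSH) { *st = ST_FLUSHING; return RET_FLUSH_OK; }
        *st = ST_FINISHING;
        return RET_FINISH_OK;
    case ST_FLUSHING:
        if (act != ACT_FLUSH) return RET_SEQUENCE_ERROR;
        if (drained) { *st = ST_RUNNING; return RET_RUN_OK; }
        return RET_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH) return RET_SEQUENCE_ERROR;
        if (drained) { *st = ST_IDLE; return RET_STREAM_END; }
        return RET_FINISH_OK;
    }
    return RET_SEQUENCE_ERROR;          /* not reached for valid states */
}
```

<para>Stepping the model through RUN, FINISH, FINISH (drained)
takes it from RUNNING through FINISHING to IDLE, which mirrors the
usual call sequence for compressing a buffer.</para>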
1361 | |||
1362 | |||
1363 | <para>That still looks complicated? Well, fair enough. The | ||
1364 | usual sequence of calls for compressing a load of data is:</para> | ||
1365 | |||
1366 | <orderedlist> | ||
1367 | |||
1368 | <listitem><para>Get started with | ||
1369 | <computeroutput>BZ2_bzCompressInit</computeroutput>.</para></listitem> | ||
1370 | |||
1371 | <listitem><para>Shovel data in and shlurp out its compressed form | ||
1372 | using zero or more calls of | ||
1373 | <computeroutput>BZ2_bzCompress</computeroutput> with action = | ||
1374 | <computeroutput>BZ_RUN</computeroutput>.</para></listitem> | ||
1375 | |||
1376 | <listitem><para>Finish up. Repeatedly call | ||
1377 | <computeroutput>BZ2_bzCompress</computeroutput> with action = | ||
1378 | <computeroutput>BZ_FINISH</computeroutput>, copying out the | ||
1379 | compressed output, until | ||
1380 | <computeroutput>BZ_STREAM_END</computeroutput> is | ||
1381 | returned.</para></listitem> <listitem><para>Close up and go home. Call | ||
1382 | <computeroutput>BZ2_bzCompressEnd</computeroutput>.</para></listitem> | ||
1383 | |||
1384 | </orderedlist> | ||
1385 | |||
1386 | <para>If the data you want to compress fits into your input | ||
1387 | buffer all at once, you can skip the calls of | ||
1388 | <computeroutput>BZ2_bzCompress ( ..., BZ_RUN )</computeroutput> | ||
1389 | and just do the <computeroutput>BZ2_bzCompress ( ..., BZ_FINISH | ||
1390 | )</computeroutput> calls.</para> | ||
1391 | |||
1392 | <para>All required memory is allocated by | ||
1393 | <computeroutput>BZ2_bzCompressInit</computeroutput>. The | ||
1394 | compression library can accept any data at all (obviously). So | ||
1395 | you shouldn't get any error return values from the | ||
1396 | <computeroutput>BZ2_bzCompress</computeroutput> calls. If you | ||
1397 | do, they will be | ||
1398 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, and indicate | ||
1399 | a bug in your programming.</para> | ||
1400 | |||
1401 | <para>Trivial other possible return values:</para> | ||
1402 | |||
1403 | <programlisting> | ||
1404 | BZ_PARAM_ERROR | ||
1405 | if strm is NULL, or strm->s is NULL | ||
1406 | </programlisting> | ||
1407 | |||
1408 | </sect2> | ||
1409 | |||
1410 | |||
1411 | <sect2 id="bzCompress-end" xreflabel="BZ2_bzCompressEnd"> | ||
1412 | <title><computeroutput>BZ2_bzCompressEnd</computeroutput></title> | ||
1413 | |||
1414 | <programlisting> | ||
1415 | int BZ2_bzCompressEnd ( bz_stream *strm ); | ||
1416 | </programlisting> | ||
1417 | |||
1418 | <para>Releases all memory associated with a compression | ||
1419 | stream.</para> | ||
1420 | |||
1421 | <para>Possible return values:</para> | ||
1422 | |||
1423 | <programlisting> | ||
1424 | BZ_PARAM_ERROR if strm is NULL or strm->s is NULL | ||
1425 | BZ_OK otherwise | ||
1426 | </programlisting> | ||
1427 | |||
1428 | </sect2> | ||
1429 | |||
1430 | |||
1431 | <sect2 id="bzDecompress-init" xreflabel="BZ2_bzDecompressInit"> | ||
1432 | <title><computeroutput>BZ2_bzDecompressInit</computeroutput></title> | ||
1433 | |||
1434 | <programlisting> | ||
1435 | int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small ); | ||
1436 | </programlisting> | ||
1437 | |||
1438 | <para>Prepares for decompression. As with | ||
1439 | <computeroutput>BZ2_bzCompressInit</computeroutput>, a | ||
1440 | <computeroutput>bz_stream</computeroutput> record should be | ||
1441 | allocated and initialised before the call. Fields | ||
1442 | <computeroutput>bzalloc</computeroutput>, | ||
1443 | <computeroutput>bzfree</computeroutput> and | ||
1444 | <computeroutput>opaque</computeroutput> should be set if a custom | ||
1445 | memory allocator is required, or made | ||
1446 | <computeroutput>NULL</computeroutput> for the normal | ||
1447 | <computeroutput>malloc</computeroutput> / | ||
1448 | <computeroutput>free</computeroutput> routines. Upon return, the | ||
1449 | internal state will have been initialised, and | ||
1450 | <computeroutput>total_in</computeroutput> and | ||
1451 | <computeroutput>total_out</computeroutput> will be zero.</para> | ||
1452 | |||
1453 | <para>For the meaning of parameter | ||
1454 | <computeroutput>verbosity</computeroutput>, see | ||
1455 | <computeroutput>BZ2_bzCompressInit</computeroutput>.</para> | ||
1456 | |||
1457 | <para>If <computeroutput>small</computeroutput> is nonzero, the | ||
1458 | library will use an alternative decompression algorithm which | ||
1459 | uses less memory but at the cost of decompressing more slowly | ||
1460 | (roughly speaking, half the speed, but the maximum memory | ||
1461 | requirement drops to around 2300k). See <xref linkend="using"/> | ||
1462 | for more information on memory management.</para> | ||
1463 | |||
1464 | <para>Note that the amount of memory needed to decompress a | ||
1465 | stream cannot be determined until the stream's header has been | ||
1466 | read, so even if | ||
1467 | <computeroutput>BZ2_bzDecompressInit</computeroutput> succeeds, a | ||
1468 | subsequent <computeroutput>BZ2_bzDecompress</computeroutput> | ||
1469 | could fail with | ||
1470 | <computeroutput>BZ_MEM_ERROR</computeroutput>.</para> | ||
1471 | |||
1472 | <para>Possible return values:</para> | ||
1473 | |||
1474 | <programlisting> | ||
1475 | BZ_CONFIG_ERROR | ||
1476 | if the library has been mis-compiled | ||
1477 | BZ_PARAM_ERROR | ||
1478 | if ( small != 0 && small != 1 ) | ||
1479 | or (verbosity < 0 || verbosity > 4) | ||
1480 | BZ_MEM_ERROR | ||
1481 | if insufficient memory is available | ||
1482 | </programlisting> | ||
1483 | |||
1484 | <para>Allowable next actions:</para> | ||
1485 | |||
1486 | <programlisting> | ||
1487 | BZ2_bzDecompress | ||
1488 | if BZ_OK was returned | ||
1489 | no specific action required in case of error | ||
1490 | </programlisting> | ||
1491 | |||
1492 | </sect2> | ||
1493 | |||
1494 | |||
1495 | <sect2 id="bzDecompress" xreflabel="BZ2_bzDecompress"> | ||
1496 | <title><computeroutput>BZ2_bzDecompress</computeroutput></title> | ||
1497 | |||
1498 | <programlisting> | ||
1499 | int BZ2_bzDecompress ( bz_stream *strm ); | ||
1500 | </programlisting> | ||
1501 | |||
1502 | <para>Provides more input and/or output buffer space for the | ||
1503 | library. The caller maintains input and output buffers, and uses | ||
1504 | <computeroutput>BZ2_bzDecompress</computeroutput> to transfer | ||
1505 | data between them.</para> | ||
1506 | |||
1507 | <para>Before each call to | ||
1508 | <computeroutput>BZ2_bzDecompress</computeroutput>, | ||
1509 | <computeroutput>next_in</computeroutput> should point at the | ||
1510 | compressed data, and <computeroutput>avail_in</computeroutput> | ||
1511 | should indicate how many bytes the library may read. | ||
1512 | <computeroutput>BZ2_bzDecompress</computeroutput> updates | ||
1513 | <computeroutput>next_in</computeroutput>, | ||
1514 | <computeroutput>avail_in</computeroutput> and | ||
1515 | <computeroutput>total_in</computeroutput> to reflect the number | ||
1516 | of bytes it has read.</para> | ||
1517 | |||
1518 | <para>Similarly, <computeroutput>next_out</computeroutput> should | ||
1519 | point to a buffer in which the uncompressed output is to be | ||
1520 | placed, with <computeroutput>avail_out</computeroutput> | ||
1521 | indicating how much output space is available. | ||
1522 | <computeroutput>BZ2_bzDecompress</computeroutput> updates | ||
1523 | <computeroutput>next_out</computeroutput>, | ||
1524 | <computeroutput>avail_out</computeroutput> and | ||
1525 | <computeroutput>total_out</computeroutput> to reflect the number | ||
1526 | of bytes output.</para> | ||
1527 | |||
1528 | <para>You may provide and remove as little or as much data as you | ||
1529 | like on each call of | ||
1530 | <computeroutput>BZ2_bzDecompress</computeroutput>. In the limit, | ||
1531 | it is acceptable to supply and remove data one byte at a time, | ||
1532 | although this would be terribly inefficient. You should always | ||
1533 | ensure that at least one byte of output space is available at | ||
1534 | each call.</para> | ||
1535 | |||
1536 | <para>Use of <computeroutput>BZ2_bzDecompress</computeroutput> is | ||
1537 | simpler than | ||
1538 | <computeroutput>BZ2_bzCompress</computeroutput>.</para> | ||
1539 | |||
1540 | <para>You should provide input and remove output as described | ||
1541 | above, and repeatedly call | ||
1542 | <computeroutput>BZ2_bzDecompress</computeroutput> until | ||
1543 | <computeroutput>BZ_STREAM_END</computeroutput> is returned. | ||
1544 | Appearance of <computeroutput>BZ_STREAM_END</computeroutput> | ||
1545 | denotes that <computeroutput>BZ2_bzDecompress</computeroutput> | ||
1546 | has detected the logical end of the compressed stream. | ||
1547 | <computeroutput>BZ2_bzDecompress</computeroutput> will not | ||
1548 | produce <computeroutput>BZ_STREAM_END</computeroutput> until all | ||
1549 | output data has been placed into the output buffer, so once | ||
1550 | <computeroutput>BZ_STREAM_END</computeroutput> appears, you are | ||
1551 | guaranteed to have available all the decompressed output, and | ||
1552 | <computeroutput>BZ2_bzDecompressEnd</computeroutput> can safely | ||
1553 | be called.</para> | ||
1554 | |||
1555 | <para>In case of an error return value, you should call | ||
1556 | <computeroutput>BZ2_bzDecompressEnd</computeroutput> to clean up | ||
1557 | and release memory.</para> | ||
1558 | |||
1559 | <para>Possible return values:</para> | ||
1560 | |||
1561 | <programlisting> | ||
1562 | BZ_PARAM_ERROR | ||
1563 | if strm is NULL or strm->s is NULL | ||
1564 | or strm->avail_out < 1 | ||
1565 | BZ_DATA_ERROR | ||
1566 | if a data integrity error is detected in the compressed stream | ||
1567 | BZ_DATA_ERROR_MAGIC | ||
1568 | if the compressed stream doesn't begin with the right magic bytes | ||
1569 | BZ_MEM_ERROR | ||
1570 | if there wasn't enough memory available | ||
1571 | BZ_STREAM_END | ||
1572 | if the logical end of the data stream was detected and all | ||
1573 | output has been consumed, eg strm->avail_out > 0 | ||
1574 | BZ_OK | ||
1575 | otherwise | ||
1576 | </programlisting> | ||
1577 | |||
1578 | <para>Allowable next actions:</para> | ||
1579 | |||
1580 | <programlisting> | ||
1581 | BZ2_bzDecompress | ||
1582 | if BZ_OK was returned | ||
1583 | BZ2_bzDecompressEnd | ||
1584 | otherwise | ||
1585 | </programlisting> | ||
1586 | |||
1587 | </sect2> | ||
1588 | |||
1589 | |||
1590 | <sect2 id="bzDecompress-end" xreflabel="BZ2_bzDecompressEnd"> | ||
1591 | <title><computeroutput>BZ2_bzDecompressEnd</computeroutput></title> | ||
1592 | |||
1593 | <programlisting> | ||
1594 | int BZ2_bzDecompressEnd ( bz_stream *strm ); | ||
1595 | </programlisting> | ||
1596 | |||
1597 | <para>Releases all memory associated with a decompression | ||
1598 | stream.</para> | ||
1599 | |||
1600 | <para>Possible return values:</para> | ||
1601 | |||
1602 | <programlisting> | ||
1603 | BZ_PARAM_ERROR | ||
1604 | if strm is NULL or strm->s is NULL | ||
1605 | BZ_OK | ||
1606 | otherwise | ||
1607 | </programlisting> | ||
1608 | |||
1609 | <para>Allowable next actions:</para> | ||
1610 | |||
1611 | <programlisting> | ||
1612 | None. | ||
1613 | </programlisting> | ||
1614 | |||
1615 | </sect2> | ||
1616 | |||
1617 | </sect1> | ||
1618 | |||
1619 | |||
1620 | <sect1 id="hl-interface" xreflabel="High-level interface"> | ||
1621 | <title>High-level interface</title> | ||
1622 | |||
1623 | <para>This interface provides functions for reading and writing | ||
1624 | <computeroutput>bzip2</computeroutput> format files. First, some | ||
1625 | general points.</para> | ||
1626 | |||
1627 | <itemizedlist mark='bullet'> | ||
1628 | |||
1629 | <listitem><para>All of the functions take an | ||
1630 | <computeroutput>int*</computeroutput> first argument, | ||
1631 | <computeroutput>bzerror</computeroutput>. After each call, | ||
1632 | <computeroutput>bzerror</computeroutput> should be consulted | ||
1633 | first to determine the outcome of the call. If | ||
1634 | <computeroutput>bzerror</computeroutput> is | ||
1635 | <computeroutput>BZ_OK</computeroutput>, the call completed | ||
1636 | successfully, and only then should the return value of the | ||
1637 | function (if any) be consulted. If | ||
1638 | <computeroutput>bzerror</computeroutput> is | ||
1639 | <computeroutput>BZ_IO_ERROR</computeroutput>, there was an | ||
1640 | error reading/writing the underlying compressed file, and you | ||
1641 | should then consult <computeroutput>errno</computeroutput> / | ||
1642 | <computeroutput>perror</computeroutput> to determine the cause | ||
1643 | of the difficulty. <computeroutput>bzerror</computeroutput> | ||
1644 | may also be set to various other values; precise details are | ||
1645 | given on a per-function basis below.</para></listitem> | ||
1646 | |||
1647 | <listitem><para>If <computeroutput>bzerror</computeroutput> indicates | ||
1648 | an error (ie, anything except | ||
1649 | <computeroutput>BZ_OK</computeroutput> and | ||
1650 | <computeroutput>BZ_STREAM_END</computeroutput>), you should | ||
1651 | immediately call | ||
1652 | <computeroutput>BZ2_bzReadClose</computeroutput> (or | ||
1653 | <computeroutput>BZ2_bzWriteClose</computeroutput>, depending on | ||
1654 | whether you are attempting to read or to write) to free up all | ||
1655 | resources associated with the stream. Once an error has been | ||
1656 | indicated, behaviour of all calls except | ||
1657 | <computeroutput>BZ2_bzReadClose</computeroutput> | ||
1658 | (<computeroutput>BZ2_bzWriteClose</computeroutput>) is | ||
1659 | undefined. The implication is that (1) | ||
1660 | <computeroutput>bzerror</computeroutput> should be checked | ||
1661 | after each call, and (2) if | ||
1662 | <computeroutput>bzerror</computeroutput> indicates an error, | ||
1663 | <computeroutput>BZ2_bzReadClose</computeroutput> | ||
1664 | (<computeroutput>BZ2_bzWriteClose</computeroutput>) should then | ||
1665 | be called to clean up.</para></listitem> | ||
1666 | |||
1667 | <listitem><para>The <computeroutput>FILE*</computeroutput> arguments | ||
1668 | passed to <computeroutput>BZ2_bzReadOpen</computeroutput> / | ||
1669 | <computeroutput>BZ2_bzWriteOpen</computeroutput> should be set | ||
1670 | to binary mode. Most Unix systems will do this by default, but | ||
1671 | other platforms, including Windows and Mac, will not. If you | ||
1672 | omit this, you may encounter problems when moving code to new | ||
1673 | platforms.</para></listitem> | ||
1674 | |||
1675 | <listitem><para>Memory allocation requests are handled by | ||
1676 | <computeroutput>malloc</computeroutput> / | ||
1677 | <computeroutput>free</computeroutput>. At present there is no | ||
1678 | facility for user-defined memory allocators in the file I/O | ||
1679 | functions (could easily be added, though).</para></listitem> | ||
1680 | |||
1681 | </itemizedlist> | ||
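<para>The first two points amount to a small decision rule, which
can be sketched as follows. The function and the
<computeroutput>SKETCH_*</computeroutput> /
<computeroutput>NEXT_*</computeroutput> names are hypothetical;
the three numeric values are assumed to mirror
<computeroutput>BZ_OK</computeroutput>,
<computeroutput>BZ_STREAM_END</computeroutput> and
<computeroutput>BZ_IO_ERROR</computeroutput> in
<computeroutput>bzlib.h</computeroutput>, and real code should
include that header rather than redefine them.</para>

```c
/* Assumed to match BZ_OK, BZ_STREAM_END and BZ_IO_ERROR in bzlib.h. */
enum { SKETCH_BZ_OK = 0, SKETCH_BZ_STREAM_END = 4, SKETCH_BZ_IO_ERROR = -6 };

/* What the caller should do after a high-level call (hypothetical names). */
typedef enum {
    NEXT_CONTINUE,           /* success: the return value may be used       */
    NEXT_FINISHED,           /* logical end of stream reached, not an error */
    NEXT_CLOSE_CHECK_ERRNO,  /* I/O failure: inspect errno, then close      */
    NEXT_CLOSE               /* any other error: close immediately          */
} next_step;

next_step classify_bzerror(int bzerror)
{
    if (bzerror == SKETCH_BZ_OK)         return NEXT_CONTINUE;
    if (bzerror == SKETCH_BZ_STREAM_END) return NEXT_FINISHED;
    if (bzerror == SKETCH_BZ_IO_ERROR)   return NEXT_CLOSE_CHECK_ERRNO;
    return NEXT_CLOSE;
}
```

<para>Here "close" means
<computeroutput>BZ2_bzReadClose</computeroutput> or
<computeroutput>BZ2_bzWriteClose</computeroutput>, whichever
matches the way the stream was opened.</para>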
1682 | |||
1683 | |||
1684 | |||
1685 | <sect2 id="bzreadopen" xreflabel="BZ2_bzReadOpen"> | ||
1686 | <title><computeroutput>BZ2_bzReadOpen</computeroutput></title> | ||
1687 | |||
1688 | <programlisting> | ||
1689 | typedef void BZFILE; | ||
1690 | |||
1691 | BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f, | ||
1692 | int verbosity, int small, | ||
1693 | void *unused, int nUnused ); | ||
1694 | </programlisting> | ||
1695 | |||
1696 | <para>Prepare to read compressed data from file handle | ||
1697 | <computeroutput>f</computeroutput>. | ||
1698 | <computeroutput>f</computeroutput> should refer to a file which | ||
1699 | has been opened for reading, and for which the error indicator | ||
1700 | (<computeroutput>ferror(f)</computeroutput>) is not set. If | ||
1701 | <computeroutput>small</computeroutput> is 1, the library will try | ||
1702 | to decompress using less memory, at the expense of speed.</para> | ||
1703 | |||
1704 | <para>For reasons explained below, | ||
1705 | <computeroutput>BZ2_bzRead</computeroutput> will decompress the | ||
1706 | <computeroutput>nUnused</computeroutput> bytes starting at | ||
1707 | <computeroutput>unused</computeroutput>, before starting to read | ||
1708 | from the file <computeroutput>f</computeroutput>. At most | ||
1709 | <computeroutput>BZ_MAX_UNUSED</computeroutput> bytes may be | ||
1710 | supplied like this. If this facility is not required, you should | ||
1711 | pass <computeroutput>NULL</computeroutput> and | ||
1712 | <computeroutput>0</computeroutput> for | ||
1713 | <computeroutput>unused</computeroutput> and | ||
1714 | <computeroutput>nUnused</computeroutput> respectively.</para> | ||
1715 | |||
1716 | <para>For the meaning of parameters | ||
1717 | <computeroutput>small</computeroutput> and | ||
1718 | <computeroutput>verbosity</computeroutput>, see | ||
1719 | <computeroutput>BZ2_bzDecompressInit</computeroutput>.</para> | ||
1720 | |||
1721 | <para>The amount of memory needed to decompress a file cannot be | ||
1722 | determined until the file's header has been read. So it is | ||
1723 | possible that <computeroutput>BZ2_bzReadOpen</computeroutput> | ||
1724 | sets <computeroutput>bzerror</computeroutput> to <computeroutput>BZ_OK</computeroutput> but a subsequent | ||
1725 | call of <computeroutput>BZ2_bzRead</computeroutput> sets it to | ||
1726 | <computeroutput>BZ_MEM_ERROR</computeroutput>.</para> | ||
1727 | |||
1728 | <para>Possible assignments to | ||
1729 | <computeroutput>bzerror</computeroutput>:</para> | ||
1730 | |||
1731 | <programlisting> | ||
1732 | BZ_CONFIG_ERROR | ||
1733 | if the library has been mis-compiled | ||
1734 | BZ_PARAM_ERROR | ||
1735 | if f is NULL | ||
1736 | or small is neither 0 nor 1 | ||
1737 | or ( unused == NULL && nUnused != 0 ) | ||
1738 | or ( unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED) ) | ||
1739 | BZ_IO_ERROR | ||
1740 | if ferror(f) is nonzero | ||
1741 | BZ_MEM_ERROR | ||
1742 | if insufficient memory is available | ||
1743 | BZ_OK | ||
1744 | otherwise. | ||
1745 | </programlisting> | ||
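<para>The <computeroutput>BZ_PARAM_ERROR</computeroutput>
conditions listed above use the shorthand
<computeroutput>0 <= nUnused <= BZ_MAX_UNUSED</computeroutput>,
which is not valid C as written. The sketch below spells the same
conditions out as a predicate. It is an illustration of the list
above rather than the library's own check:
<computeroutput>read_open_args_ok</computeroutput> is a
hypothetical helper, and
<computeroutput>SKETCH_MAX_UNUSED</computeroutput> is assumed to
equal <computeroutput>BZ_MAX_UNUSED</computeroutput> from
<computeroutput>bzlib.h</computeroutput>.</para>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed to equal BZ_MAX_UNUSED in bzlib.h; real code should use that. */
#define SKETCH_MAX_UNUSED 5000

/* Returns true when none of the listed BZ_PARAM_ERROR conditions hold. */
bool read_open_args_ok(FILE *f, int small, const void *unused, int nUnused)
{
    if (f == NULL)                      return false;
    if (small != 0 && small != 1)       return false;
    if (unused == NULL && nUnused != 0) return false;
    if (unused != NULL &&
        (nUnused < 0 || nUnused > SKETCH_MAX_UNUSED))
        return false;
    return true;
}
```

<para>Passing <computeroutput>NULL</computeroutput> and
<computeroutput>0</computeroutput> for the last two arguments, as
recommended when the facility is not needed, satisfies the
predicate.</para>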
1746 | |||
1747 | <para>Possible return values:</para> | ||
1748 | |||
1749 | <programlisting> | ||
1750 | Pointer to an abstract BZFILE | ||
1751 | if bzerror is BZ_OK | ||
1752 | NULL | ||
1753 | otherwise | ||
1754 | </programlisting> | ||
1755 | |||
1756 | <para>Allowable next actions:</para> | ||
1757 | |||
1758 | <programlisting> | ||
1759 | BZ2_bzRead | ||
1760 | if bzerror is BZ_OK | ||
1761 | BZ2_bzReadClose | ||
1762 | otherwise | ||
1763 | </programlisting> | ||
1764 | |||
1765 | </sect2> | ||
1766 | |||
1767 | |||
1768 | <sect2 id="bzread" xreflabel="BZ2_bzRead"> | ||
1769 | <title><computeroutput>BZ2_bzRead</computeroutput></title> | ||
1770 | |||
1771 | <programlisting> | ||
1772 | int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len ); | ||
1773 | </programlisting> | ||
1774 | |||
1775 | <para>Reads up to <computeroutput>len</computeroutput> | ||
1776 | (uncompressed) bytes from the compressed file | ||
1777 | <computeroutput>b</computeroutput> into the buffer | ||
1778 | <computeroutput>buf</computeroutput>. If the read was | ||
1779 | successful, <computeroutput>bzerror</computeroutput> is set to | ||
1780 | <computeroutput>BZ_OK</computeroutput> and the number of bytes | ||
1781 | read is returned. If the logical end-of-stream was detected, | ||
1782 | <computeroutput>bzerror</computeroutput> will be set to | ||
1783 | <computeroutput>BZ_STREAM_END</computeroutput>, and the number of | ||
1784 | bytes read is returned. All other | ||
1785 | <computeroutput>bzerror</computeroutput> values denote an | ||
1786 | error.</para> | ||
1787 | |||
1788 | <para><computeroutput>BZ2_bzRead</computeroutput> will supply | ||
1789 | <computeroutput>len</computeroutput> bytes, unless the logical | ||
1790 | stream end is detected or an error occurs. Because of this, it | ||
1791 | is possible to detect the stream end by observing when the number | ||
1792 | of bytes returned is less than the number requested. | ||
1793 | Nevertheless, this is regarded as inadvisable; you should instead | ||
1794 | check <computeroutput>bzerror</computeroutput> after every call | ||
1795 | and watch out for | ||
1796 | <computeroutput>BZ_STREAM_END</computeroutput>.</para> | ||
1797 | |||
1798 | <para>Internally, <computeroutput>BZ2_bzRead</computeroutput> | ||
1799 | copies data from the compressed file in chunks of size | ||
1800 | <computeroutput>BZ_MAX_UNUSED</computeroutput> bytes before | ||
1801 | decompressing it. If the file contains more bytes than strictly | ||
1802 | needed to reach the logical end-of-stream, | ||
1803 | <computeroutput>BZ2_bzRead</computeroutput> will almost certainly | ||
1804 | read some of the trailing data before signalling | ||
1805 | <computeroutput>BZ_STREAM_END</computeroutput>. To collect the | ||
1806 | read but unused data once | ||
1807 | <computeroutput>BZ_STREAM_END</computeroutput> has appeared, | ||
1808 | call <computeroutput>BZ2_bzReadGetUnused</computeroutput> | ||
1809 | immediately before | ||
1810 | <computeroutput>BZ2_bzReadClose</computeroutput>.</para> | ||
1811 | |||
1812 | <para>Possible assignments to | ||
1813 | <computeroutput>bzerror</computeroutput>:</para> | ||
1814 | |||
1815 | <programlisting> | ||
1816 | BZ_PARAM_ERROR | ||
1817 | if b is NULL or buf is NULL or len < 0 | ||
1818 | BZ_SEQUENCE_ERROR | ||
1819 | if b was opened with BZ2_bzWriteOpen | ||
1820 | BZ_IO_ERROR | ||
1821 | if there is an error reading from the compressed file | ||
1822 | BZ_UNEXPECTED_EOF | ||
1823 | if the compressed file ended before | ||
1824 | the logical end-of-stream was detected | ||
1825 | BZ_DATA_ERROR | ||
1826 | if a data integrity error was detected in the compressed stream | ||
1827 | BZ_DATA_ERROR_MAGIC | ||
1828 | if the stream does not begin with the requisite header bytes | ||
1829 | (ie, is not a bzip2 data file). This is really | ||
1830 | a special case of BZ_DATA_ERROR. | ||
1831 | BZ_MEM_ERROR | ||
1832 | if insufficient memory was available | ||
1833 | BZ_STREAM_END | ||
1834 | if the logical end of stream was detected. | ||
1835 | BZ_OK | ||
1836 | otherwise. | ||
1837 | </programlisting> | ||
1838 | |||
1839 | <para>Possible return values:</para> | ||
1840 | |||
1841 | <programlisting> | ||
1842 | number of bytes read | ||
1843 | if bzerror is BZ_OK or BZ_STREAM_END | ||
1844 | undefined | ||
1845 | otherwise | ||
1846 | </programlisting> | ||
1847 | |||
1848 | <para>Allowable next actions:</para> | ||
1849 | |||
1850 | <programlisting> | ||
1851 | collect data from buf, then BZ2_bzRead or BZ2_bzReadClose | ||
1852 | if bzerror is BZ_OK | ||
1853 | collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused | ||
1854 | if bzerror is BZ_STREAM_END | ||
1855 | BZ2_bzReadClose | ||
1856 | otherwise | ||
1857 | </programlisting> | ||
1858 | |||
1859 | </sect2> | ||
1860 | |||
1861 | |||
1862 | <sect2 id="bzreadgetunused" xreflabel="BZ2_bzReadGetUnused"> | ||
1863 | <title><computeroutput>BZ2_bzReadGetUnused</computeroutput></title> | ||
1864 | |||
1865 | <programlisting> | ||
1866 | void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b, | ||
1867 | void** unused, int* nUnused ); | ||
1868 | </programlisting> | ||
1869 | |||
1870 | <para>Returns data which was read from the compressed file but | ||
1871 | was not needed to get to the logical end-of-stream. | ||
1872 | <computeroutput>*unused</computeroutput> is set to the address of | ||
1873 | the data, and <computeroutput>*nUnused</computeroutput> to the | ||
1874 | number of bytes. <computeroutput>*nUnused</computeroutput> will | ||
1875 | be set to a value between <computeroutput>0</computeroutput> and | ||
1876 | <computeroutput>BZ_MAX_UNUSED</computeroutput> inclusive.</para> | ||
1877 | |||
1878 | <para>This function may only be called once | ||
1879 | <computeroutput>BZ2_bzRead</computeroutput> has signalled | ||
1880 | <computeroutput>BZ_STREAM_END</computeroutput> but before | ||
1881 | <computeroutput>BZ2_bzReadClose</computeroutput>.</para> | ||
1882 | |||
1883 | <para>Possible assignments to | ||
1884 | <computeroutput>bzerror</computeroutput>:</para> | ||
1885 | |||
1886 | <programlisting> | ||
1887 | BZ_PARAM_ERROR | ||
1888 | if b is NULL | ||
1889 | or unused is NULL or nUnused is NULL | ||
1890 | BZ_SEQUENCE_ERROR | ||
1891 | if BZ_STREAM_END has not been signalled | ||
1892 | or if b was opened with BZ2_bzWriteOpen | ||
1893 | BZ_OK | ||
1894 | otherwise | ||
1895 | </programlisting> | ||
1896 | |||
1897 | <para>Allowable next actions:</para> | ||
1898 | |||
1899 | <programlisting> | ||
1900 | BZ2_bzReadClose | ||
1901 | </programlisting> | ||
1902 | |||
1903 | </sect2> | ||
1904 | |||
1905 | |||
1906 | <sect2 id="bzreadclose" xreflabel="BZ2_bzReadClose"> | ||
1907 | <title><computeroutput>BZ2_bzReadClose</computeroutput></title> | ||
1908 | |||
1909 | <programlisting> | ||
1910 | void BZ2_bzReadClose ( int *bzerror, BZFILE *b ); | ||
1911 | </programlisting> | ||
1912 | |||
1913 | <para>Releases all memory pertaining to the compressed file | ||
1914 | <computeroutput>b</computeroutput>. | ||
1915 | <computeroutput>BZ2_bzReadClose</computeroutput> does not call | ||
1916 | <computeroutput>fclose</computeroutput> on the underlying file | ||
1917 | handle, so you should do that yourself if appropriate. | ||
1918 | <computeroutput>BZ2_bzReadClose</computeroutput> should be called | ||
1919 | to clean up after all error situations.</para> | ||
1920 | |||
1921 | <para>Possible assignments to | ||
1922 | <computeroutput>bzerror</computeroutput>:</para> | ||
1923 | |||
1924 | <programlisting> | ||
1925 | BZ_SEQUENCE_ERROR | ||
1926 | if b was opened with BZ2_bzWriteOpen | ||
1927 | BZ_OK | ||
1928 | otherwise | ||
1929 | </programlisting> | ||
1930 | |||
1931 | <para>Allowable next actions:</para> | ||
1932 | |||
1933 | <programlisting> | ||
1934 | none | ||
1935 | </programlisting> | ||
1936 | |||
1937 | </sect2> | ||
1938 | |||
1939 | |||
1940 | <sect2 id="bzwriteopen" xreflabel="BZ2_bzWriteOpen"> | ||
1941 | <title><computeroutput>BZ2_bzWriteOpen</computeroutput></title> | ||
1942 | |||
1943 | <programlisting> | ||
1944 | BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f, | ||
1945 | int blockSize100k, int verbosity, | ||
1946 | int workFactor ); | ||
1947 | </programlisting> | ||
1948 | |||
1949 | <para>Prepare to write compressed data to file handle | ||
1950 | <computeroutput>f</computeroutput>. | ||
1951 | <computeroutput>f</computeroutput> should refer to a file which | ||
1952 | has been opened for writing, and for which the error indicator | ||
1953 | (<computeroutput>ferror(f)</computeroutput>) is not set.</para> | ||
1954 | |||
1955 | <para>For the meaning of parameters | ||
1956 | <computeroutput>blockSize100k</computeroutput>, | ||
1957 | <computeroutput>verbosity</computeroutput> and | ||
1958 | <computeroutput>workFactor</computeroutput>, see | ||
1959 | <computeroutput>BZ2_bzCompressInit</computeroutput>.</para> | ||
1960 | |||
1961 | <para>All required memory is allocated at this stage, so if the | ||
1962 | call completes successfully, | ||
1963 | <computeroutput>BZ_MEM_ERROR</computeroutput> cannot be signalled | ||
1964 | by a subsequent call to | ||
1965 | <computeroutput>BZ2_bzWrite</computeroutput>.</para> | ||
1966 | |||
1967 | <para>Possible assignments to | ||
1968 | <computeroutput>bzerror</computeroutput>:</para> | ||
1969 | |||
1970 | <programlisting> | ||
1971 | BZ_CONFIG_ERROR | ||
1972 | if the library has been mis-compiled | ||
1973 | BZ_PARAM_ERROR | ||
1974 | if f is NULL | ||
1975 | or blockSize100k < 1 or blockSize100k > 9 | ||
1976 | BZ_IO_ERROR | ||
1977 | if ferror(f) is nonzero | ||
1978 | BZ_MEM_ERROR | ||
1979 | if insufficient memory is available | ||
1980 | BZ_OK | ||
1981 | otherwise | ||
1982 | </programlisting> | ||
1983 | |||
1984 | <para>Possible return values:</para> | ||
1985 | |||
1986 | <programlisting> | ||
1987 | Pointer to an abstract BZFILE | ||
1988 | if bzerror is BZ_OK | ||
1989 | NULL | ||
1990 | otherwise | ||
1991 | </programlisting> | ||
1992 | |||
1993 | <para>Allowable next actions:</para> | ||
1994 | |||
1995 | <programlisting> | ||
1996 | BZ2_bzWrite | ||
1997 | if bzerror is BZ_OK | ||
1998 | (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless) | ||
1999 | BZ2_bzWriteClose | ||
2000 | otherwise | ||
2001 | </programlisting> | ||
2002 | |||
2003 | </sect2> | ||
2004 | |||
2005 | |||
2006 | <sect2 id="bzwrite" xreflabel="BZ2_bzWrite"> | ||
2007 | <title><computeroutput>BZ2_bzWrite</computeroutput></title> | ||
2008 | |||
2009 | <programlisting> | ||
2010 | void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len ); | ||
2011 | </programlisting> | ||
2012 | |||
2013 | <para>Absorbs <computeroutput>len</computeroutput> bytes from the | ||
2014 | buffer <computeroutput>buf</computeroutput>, eventually to be | ||
2015 | compressed and written to the file.</para> | ||
2016 | |||
2017 | <para>Possible assignments to | ||
2018 | <computeroutput>bzerror</computeroutput>:</para> | ||
2019 | |||
2020 | <programlisting> | ||
2021 | BZ_PARAM_ERROR | ||
2022 | if b is NULL or buf is NULL or len < 0 | ||
2023 | BZ_SEQUENCE_ERROR | ||
2024 | if b was opened with BZ2_bzReadOpen | ||
2025 | BZ_IO_ERROR | ||
2026 | if there is an error writing the compressed file. | ||
2027 | BZ_OK | ||
2028 | otherwise | ||
2029 | </programlisting> | ||
2030 | |||
2031 | </sect2> | ||
2032 | |||
2033 | |||
2034 | <sect2 id="bzwriteclose" xreflabel="BZ2_bzWriteClose"> | ||
2035 | <title><computeroutput>BZ2_bzWriteClose</computeroutput></title> | ||
2036 | |||
2037 | <programlisting> | ||
2038 | void BZ2_bzWriteClose( int *bzerror, BZFILE* b, | ||
2039 | int abandon, | ||
2040 | unsigned int* nbytes_in, | ||
2041 | unsigned int* nbytes_out ); | ||
2042 | |||
2043 | void BZ2_bzWriteClose64( int *bzerror, BZFILE* b, | ||
2044 | int abandon, | ||
2045 | unsigned int* nbytes_in_lo32, | ||
2046 | unsigned int* nbytes_in_hi32, | ||
2047 | unsigned int* nbytes_out_lo32, | ||
2048 | unsigned int* nbytes_out_hi32 ); | ||
2049 | </programlisting> | ||
2050 | |||
2051 | <para>Compresses and flushes to the compressed file all data so | ||
2052 | far supplied by <computeroutput>BZ2_bzWrite</computeroutput>. | ||
2053 | The logical end-of-stream markers are also written, so subsequent | ||
2054 | calls to <computeroutput>BZ2_bzWrite</computeroutput> are | ||
2055 | illegal. All memory associated with the compressed file | ||
2056 | <computeroutput>b</computeroutput> is released. | ||
2057 | <computeroutput>fflush</computeroutput> is called on the | ||
2058 | compressed file, but it is not | ||
2059 | <computeroutput>fclose</computeroutput>'d.</para> | ||
2060 | |||
2061 | <para>If <computeroutput>BZ2_bzWriteClose</computeroutput> is | ||
2062 | called to clean up after an error, the only action is to release | ||
2063 | the memory. The library records the error codes issued by | ||
2064 | previous calls, so this situation will be detected automatically. | ||
2065 | There is no attempt to complete the compression operation, nor to | ||
2066 | <computeroutput>fflush</computeroutput> the compressed file. You | ||
2067 | can force this behaviour to happen even in the case of no error, | ||
2068 | by passing a nonzero value to | ||
2069 | <computeroutput>abandon</computeroutput>.</para> | ||
2070 | |||
2071 | <para>If <computeroutput>nbytes_in</computeroutput> is non-null, | ||
2072 | <computeroutput>*nbytes_in</computeroutput> will be set to be the | ||
2073 | total volume of uncompressed data handled. Similarly, | ||
2074 | <computeroutput>nbytes_out</computeroutput> will be set to the | ||
2075 | total volume of compressed data written. For compatibility with | ||
2076 | older versions of the library, | ||
2077 | <computeroutput>BZ2_bzWriteClose</computeroutput> only yields the | ||
2078 | lower 32 bits of these counts. Use | ||
2079 | <computeroutput>BZ2_bzWriteClose64</computeroutput> if you want | ||
2080 | the full 64 bit counts. These two functions are otherwise | ||
2081 | absolutely identical.</para> | ||
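<para>As a minimal sketch of how the two halves reported by
<computeroutput>BZ2_bzWriteClose64</computeroutput> might be joined into a
single count: the helper name <computeroutput>bz_count64</computeroutput>
is hypothetical and not part of the library, and the code assumes a
compiler that provides <computeroutput>stdint.h</computeroutput>.</para>

```c
#include <stdint.h>

/* Hypothetical helper, not part of libbzip2: joins the low and high
   32-bit halves of a byte count (as reported through the *_lo32 and
   *_hi32 arguments of BZ2_bzWriteClose64) into one 64-bit value. */
uint64_t bz_count64(unsigned int lo32, unsigned int hi32)
{
    /* Mask each half to 32 bits in case unsigned int is wider. */
    uint64_t lo = (uint64_t)lo32 & 0xFFFFFFFFu;
    uint64_t hi = (uint64_t)hi32 & 0xFFFFFFFFu;
    return (hi << 32) | lo;
}
```

<para>A caller would pass the four output values from
<computeroutput>BZ2_bzWriteClose64</computeroutput> to this helper in two
calls, one for the input count and one for the output count.</para>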
2082 | |||
2083 | <para>Possible assignments to | ||
2084 | <computeroutput>bzerror</computeroutput>:</para> | ||
2085 | |||
2086 | <programlisting> | ||
2087 | BZ_SEQUENCE_ERROR | ||
2088 | if b was opened with BZ2_bzReadOpen | ||
2089 | BZ_IO_ERROR | ||
2090 | if there is an error writing the compressed file | ||
2091 | BZ_OK | ||
2092 | otherwise | ||
2093 | </programlisting> | ||
2094 | |||
2095 | </sect2> | ||
2096 | |||
2097 | |||
2098 | <sect2 id="embed" xreflabel="Handling embedded compressed data streams"> | ||
2099 | <title>Handling embedded compressed data streams</title> | ||
2100 | |||
2101 | <para>The high-level library facilitates use of | ||
2102 | <computeroutput>bzip2</computeroutput> data streams which form | ||
2103 | some part of a surrounding, larger data stream.</para> | ||
2104 | |||
2105 | <itemizedlist mark='bullet'> | ||
2106 | |||
2107 | <listitem><para>For writing, the library takes an open file handle, | ||
2108 | writes compressed data to it, | ||
2109 | <computeroutput>fflush</computeroutput>es it but does not | ||
2110 | <computeroutput>fclose</computeroutput> it. The calling | ||
2111 | application can write its own data before and after the | ||
2112 | compressed data stream, using that same file handle.</para></listitem> | ||
2113 | |||
2114 | <listitem><para>Reading is more complex, and the facilities are not as | ||
2115 | general as they could be since generality is hard to reconcile | ||
2116 | with efficiency. <computeroutput>BZ2_bzRead</computeroutput> | ||
2117 | reads from the compressed file in blocks of size | ||
2118 | <computeroutput>BZ_MAX_UNUSED</computeroutput> bytes, and in | ||
2119 | doing so probably will overshoot the logical end of compressed | ||
2120 | stream. To recover this data once decompression has ended, | ||
2121 | call <computeroutput>BZ2_bzReadGetUnused</computeroutput> after | ||
2122 | the last call of <computeroutput>BZ2_bzRead</computeroutput> | ||
2123 | (the one returning | ||
2124 | <computeroutput>BZ_STREAM_END</computeroutput>) but before | ||
2125 | calling | ||
2126 | <computeroutput>BZ2_bzReadClose</computeroutput>.</para></listitem> | ||
2127 | |||
2128 | </itemizedlist> | ||
2129 | |||
2130 | <para>This mechanism makes it easy to decompress multiple | ||
2131 | <computeroutput>bzip2</computeroutput> streams placed end-to-end. | ||
2132 | At the end of one stream, when | ||
2133 | <computeroutput>BZ2_bzRead</computeroutput> returns | ||
2134 | <computeroutput>BZ_STREAM_END</computeroutput>, call | ||
2135 | <computeroutput>BZ2_bzReadGetUnused</computeroutput> to collect | ||
2136 | the unused data (copy it into your own buffer somewhere). That | ||
2137 | data forms the start of the next compressed stream. To start | ||
2138 | uncompressing that next stream, call | ||
2139 | <computeroutput>BZ2_bzReadOpen</computeroutput> again, feeding in | ||
2140 | the unused data via the <computeroutput>unused</computeroutput> / | ||
2141 | <computeroutput>nUnused</computeroutput> parameters. Keep doing | ||
2142 | this until <computeroutput>BZ_STREAM_END</computeroutput> return | ||
2143 | coincides with the physical end of file | ||
2144 | (<computeroutput>feof(f)</computeroutput>). In this situation | ||
2145 | <computeroutput>BZ2_bzReadGetUnused</computeroutput> will of | ||
2146 | course return no data.</para> | ||
2147 | |||
2148 | <para>This should give some feel for how the high-level interface | ||
2149 | can be used. If you require extra flexibility, you'll have to | ||
2150 | bite the bullet and get to grips with the low-level | ||
2151 | interface.</para> | ||
2152 | |||
2153 | </sect2> | ||
2154 | |||
2155 | |||
2156 | <sect2 id="std-rdwr" xreflabel="Standard file-reading/writing code"> | ||
2157 | <title>Standard file-reading/writing code</title> | ||
2158 | |||
2159 | <para>Here's how you'd write data to a compressed file:</para> | ||
2160 | |||
2161 | <programlisting> | ||
2162 | FILE* f; | ||
2163 | BZFILE* b; | ||
2164 | int nBuf; | ||
2165 | char buf[ /* whatever size you like */ ]; | ||
2166 | int bzerror; | ||
2168 | |||
2169 | f = fopen ( "myfile.bz2", "wb" ); | ||
2170 | if ( !f ) { | ||
2171 |  /* handle error */ | ||
2172 | } | ||
2173 | b = BZ2_bzWriteOpen( &bzerror, f, 9, 0, 0 ); /* verbosity 0, default workFactor */ | ||
2174 | if (bzerror != BZ_OK) { | ||
2175 |  BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | ||
2176 |  /* handle error */ | ||
2177 | } | ||
2178 | |||
2179 | while ( /* condition */ ) { | ||
2180 |  /* get data to write into buf, and set nBuf appropriately */ | ||
2181 |  BZ2_bzWrite ( &bzerror, b, buf, nBuf ); | ||
2182 |  if (bzerror == BZ_IO_ERROR) { | ||
2183 |   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | ||
2184 |   /* handle error */ | ||
2185 |  } | ||
2186 | } | ||
2187 | |||
2188 | BZ2_bzWriteClose( &bzerror, b, 0, NULL, NULL ); | ||
2189 | if (bzerror == BZ_IO_ERROR) { | ||
2190 | /* handle error */ | ||
2191 | } | ||
2192 | </programlisting> | ||
2193 | |||
2194 | <para>And to read from a compressed file:</para> | ||
2195 | |||
2196 | <programlisting> | ||
2197 | FILE* f; | ||
2198 | BZFILE* b; | ||
2199 | int nBuf; | ||
2200 | char buf[ /* whatever size you like */ ]; | ||
2201 | int bzerror; | ||
2203 | |||
2204 | f = fopen ( "myfile.bz2", "rb" ); | ||
2205 | if ( !f ) { | ||
2206 |  /* handle error */ | ||
2207 | } | ||
2208 | b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 ); /* verbosity 0, small 0, no unused data */ | ||
2209 | if ( bzerror != BZ_OK ) { | ||
2210 |  BZ2_bzReadClose ( &bzerror, b ); | ||
2211 |  /* handle error */ | ||
2212 | } | ||
2213 | |||
2214 | bzerror = BZ_OK; | ||
2215 | while ( bzerror == BZ_OK && /* arbitrary other conditions */) { | ||
2216 |  nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ ); | ||
2217 |  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) { | ||
2218 |   /* do something with buf[0 .. nBuf-1] */ | ||
2219 |  } | ||
2220 | } | ||
2221 | if ( bzerror != BZ_STREAM_END ) { | ||
2222 |  BZ2_bzReadClose ( &bzerror, b ); | ||
2223 |  /* handle error */ | ||
2224 | } else { | ||
2225 |  BZ2_bzReadClose ( &bzerror, b ); | ||
2226 | } | ||
2227 | </programlisting> | ||
2228 | |||
2229 | </sect2> | ||
2230 | |||
2231 | </sect1> | ||
2232 | |||
2233 | |||
2234 | <sect1 id="util-fns" xreflabel="Utility functions"> | ||
2235 | <title>Utility functions</title> | ||
2236 | |||
2237 | |||
2238 | <sect2 id="bzbufftobuffcompress" xreflabel="BZ2_bzBuffToBuffCompress"> | ||
2239 | <title><computeroutput>BZ2_bzBuffToBuffCompress</computeroutput></title> | ||
2240 | |||
2241 | <programlisting> | ||
2242 | int BZ2_bzBuffToBuffCompress( char* dest, | ||
2243 | unsigned int* destLen, | ||
2244 | char* source, | ||
2245 | unsigned int sourceLen, | ||
2246 | int blockSize100k, | ||
2247 | int verbosity, | ||
2248 | int workFactor ); | ||
2249 | </programlisting> | ||
2250 | |||
2251 | <para>Attempts to compress the data in <computeroutput>source[0 | ||
2252 | .. sourceLen-1]</computeroutput> into the destination buffer, | ||
2253 | <computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the | ||
2254 | destination buffer is big enough, | ||
2255 | <computeroutput>*destLen</computeroutput> is set to the size of | ||
2256 | the compressed data, and <computeroutput>BZ_OK</computeroutput> | ||
2257 | is returned. If the compressed data won't fit, | ||
2258 | <computeroutput>*destLen</computeroutput> is unchanged, and | ||
2259 | <computeroutput>BZ_OUTBUFF_FULL</computeroutput> is | ||
2260 | returned.</para> | ||
2261 | |||
2262 | <para>Compression in this manner is a one-shot event, done with a | ||
2263 | single call to this function. The resulting compressed data is a | ||
2264 | complete <computeroutput>bzip2</computeroutput> format data | ||
2265 | stream. There is no mechanism for making additional calls to | ||
2266 | provide extra input data. If you want that kind of mechanism, | ||
2267 | use the low-level interface.</para> | ||
2268 | |||
2269 | <para>For the meaning of parameters | ||
2270 | <computeroutput>blockSize100k</computeroutput>, | ||
2271 | <computeroutput>verbosity</computeroutput> and | ||
2272 | <computeroutput>workFactor</computeroutput>, see | ||
2273 | <computeroutput>BZ2_bzCompressInit</computeroutput>.</para> | ||
2274 | |||
2275 | <para>To guarantee that the compressed data will fit in its | ||
2276 | buffer, allocate an output buffer of size 1% larger than the | ||
2277 | uncompressed data, plus six hundred extra bytes.</para> | ||
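<para>The sizing rule above can be sketched as a small helper. The name
<computeroutput>bz_compress_bound</computeroutput> is hypothetical (the
library does not provide it), and rounding the 1% up and returning 0 on
overflow are assumptions made for this illustration.</para>

```c
#include <limits.h>

/* Hypothetical helper, not part of libbzip2: size of an output buffer
   that is 1% larger than the input plus six hundred bytes.  The 1% is
   rounded up.  Returns 0 if the result does not fit in an unsigned int. */
unsigned int bz_compress_bound(unsigned int sourceLen)
{
    unsigned int onePercent = sourceLen / 100 + 1;  /* 1%, rounded up */
    unsigned int extra = onePercent + 600;

    if (sourceLen > UINT_MAX - extra)
        return 0;                                   /* would overflow */
    return sourceLen + extra;
}
```

<para>The result would be used both to allocate
<computeroutput>dest</computeroutput> and as the initial value of
<computeroutput>*destLen</computeroutput> before calling
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>.</para>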
2278 | |||
2279 | <para><computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> | ||
2280 | will not write data at or beyond | ||
2281 | <computeroutput>dest[*destLen]</computeroutput>, even in case of | ||
2282 | buffer overflow.</para> | ||
2283 | |||
2284 | <para>Possible return values:</para> | ||
2285 | |||
2286 | <programlisting> | ||
2287 | BZ_CONFIG_ERROR | ||
2288 | if the library has been mis-compiled | ||
2289 | BZ_PARAM_ERROR | ||
2290 | if dest is NULL or destLen is NULL | ||
2291 | or blockSize100k < 1 or blockSize100k > 9 | ||
2292 | or verbosity < 0 or verbosity > 4 | ||
2293 | or workFactor < 0 or workFactor > 250 | ||
2294 | BZ_MEM_ERROR | ||
2295 | if insufficient memory is available | ||
2296 | BZ_OUTBUFF_FULL | ||
2297 | if the size of the compressed data exceeds *destLen | ||
2298 | BZ_OK | ||
2299 | otherwise | ||
2300 | </programlisting> | ||
2301 | |||
2302 | </sect2> | ||
2303 | |||
2304 | |||
2305 | <sect2 id="bzbufftobuffdecompress" xreflabel="BZ2_bzBuffToBuffDecompress"> | ||
2306 | <title><computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput></title> | ||
2307 | |||
2308 | <programlisting> | ||
2309 | int BZ2_bzBuffToBuffDecompress( char* dest, | ||
2310 | unsigned int* destLen, | ||
2311 | char* source, | ||
2312 | unsigned int sourceLen, | ||
2313 | int small, | ||
2314 | int verbosity ); | ||
2315 | </programlisting> | ||
2316 | |||
2317 | <para>Attempts to decompress the data in <computeroutput>source[0 | ||
2318 | .. sourceLen-1]</computeroutput> into the destination buffer, | ||
2319 | <computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the | ||
2320 | destination buffer is big enough, | ||
2321 | <computeroutput>*destLen</computeroutput> is set to the size of | ||
2322 | the uncompressed data, and <computeroutput>BZ_OK</computeroutput> | ||
2323 | is returned. If the uncompressed data won't fit, | ||
2324 | <computeroutput>*destLen</computeroutput> is unchanged, and | ||
2325 | <computeroutput>BZ_OUTBUFF_FULL</computeroutput> is | ||
2326 | returned.</para> | ||
2327 | |||
2328 | <para><computeroutput>source</computeroutput> is assumed to hold | ||
2329 | a complete <computeroutput>bzip2</computeroutput> format data | ||
2330 | stream. | ||
2331 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> tries | ||
2332 | to decompress the entirety of the stream into the output | ||
2333 | buffer.</para> | ||
2334 | |||
2335 | <para>For the meaning of parameters | ||
2336 | <computeroutput>small</computeroutput> and | ||
2337 | <computeroutput>verbosity</computeroutput>, see | ||
2338 | <computeroutput>BZ2_bzDecompressInit</computeroutput>.</para> | ||
2339 | |||
2340 | <para>Because the compression ratio of the compressed data cannot | ||
2341 | be known in advance, there is no easy way to guarantee that the | ||
2342 | output buffer will be big enough. You may of course make | ||
2343 | arrangements in your code to record the size of the uncompressed | ||
2344 | data, but such a mechanism is beyond the scope of this | ||
2345 | library.</para> | ||
2346 | |||
2347 | <para><computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> | ||
2348 | will not write data at or beyond | ||
2349 | <computeroutput>dest[*destLen]</computeroutput>, even in case of | ||
2350 | buffer overflow.</para> | ||
2351 | |||
2352 | <para>Possible return values:</para> | ||
2353 | |||
2354 | <programlisting> | ||
2355 | BZ_CONFIG_ERROR | ||
2356 | if the library has been mis-compiled | ||
2357 | BZ_PARAM_ERROR | ||
2358 | if dest is NULL or destLen is NULL | ||
2359 | or small != 0 && small != 1 | ||
2360 | or verbosity < 0 or verbosity > 4 | ||
2361 | BZ_MEM_ERROR | ||
2362 | if insufficient memory is available | ||
2363 | BZ_OUTBUFF_FULL | ||
2364 | if the size of the uncompressed data exceeds *destLen | ||
2365 | BZ_DATA_ERROR | ||
2366 | if a data integrity error was detected in the compressed data | ||
2367 | BZ_DATA_ERROR_MAGIC | ||
2368 | if the compressed data doesn't begin with the right magic bytes | ||
2369 | BZ_UNEXPECTED_EOF | ||
2370 | if the compressed data ends unexpectedly | ||
2371 | BZ_OK | ||
2372 | otherwise | ||
2373 | </programlisting> | ||
2374 | |||
2375 | </sect2> | ||
2376 | |||
2377 | </sect1> | ||
2378 | |||
2379 | |||
2380 | <sect1 id="zlib-compat" xreflabel="zlib compatibility functions"> | ||
2381 | <title><computeroutput>zlib</computeroutput> compatibility functions</title> | ||
2382 | |||
2383 | <para>Yoshioka Tsuneo has contributed some functions to give | ||
2384 | better <computeroutput>zlib</computeroutput> compatibility. | ||
2385 | These functions are <computeroutput>BZ2_bzopen</computeroutput>, <computeroutput>BZ2_bzdopen</computeroutput>, | ||
2386 | <computeroutput>BZ2_bzread</computeroutput>, | ||
2387 | <computeroutput>BZ2_bzwrite</computeroutput>, | ||
2388 | <computeroutput>BZ2_bzflush</computeroutput>, | ||
2389 | <computeroutput>BZ2_bzclose</computeroutput>, | ||
2390 | <computeroutput>BZ2_bzerror</computeroutput> and | ||
2391 | <computeroutput>BZ2_bzlibVersion</computeroutput>. These | ||
2392 | functions are not (yet) officially part of the library. If they | ||
2393 | break, you get to keep all the pieces. Nevertheless, I think | ||
2394 | they work ok.</para> | ||
2395 | |||
2396 | <programlisting> | ||
2397 | typedef void BZFILE; | ||
2398 | |||
2399 | const char * BZ2_bzlibVersion ( void ); | ||
2400 | </programlisting> | ||
2401 | |||
2402 | <para>Returns a string indicating the library version.</para> | ||
2403 | |||
2404 | <programlisting> | ||
2405 | BZFILE * BZ2_bzopen ( const char *path, const char *mode ); | ||
2406 | BZFILE * BZ2_bzdopen ( int fd, const char *mode ); | ||
2407 | </programlisting> | ||
2408 | |||
2409 | <para>Opens a <computeroutput>.bz2</computeroutput> file for | ||
2410 | reading or writing, using either its name or a pre-existing file | ||
2411 | descriptor. Analogous to <computeroutput>fopen</computeroutput> | ||
2412 | and <computeroutput>fdopen</computeroutput>.</para> | ||
2413 | |||
2414 | <programlisting> | ||
2415 | int BZ2_bzread ( BZFILE* b, void* buf, int len ); | ||
2416 | int BZ2_bzwrite ( BZFILE* b, void* buf, int len ); | ||
2417 | </programlisting> | ||
2418 | |||
2419 | <para>Reads/writes data from/to a previously opened | ||
2420 | <computeroutput>BZFILE</computeroutput>. Analogous to | ||
2421 | <computeroutput>fread</computeroutput> and | ||
2422 | <computeroutput>fwrite</computeroutput>.</para> | ||
2423 | |||
2424 | <programlisting> | ||
2425 | int BZ2_bzflush ( BZFILE* b ); | ||
2426 | void BZ2_bzclose ( BZFILE* b ); | ||
2427 | </programlisting> | ||
2428 | |||
2429 | <para>Flushes/closes a <computeroutput>BZFILE</computeroutput>. | ||
2430 | <computeroutput>BZ2_bzflush</computeroutput> doesn't actually do | ||
2431 | anything. Analogous to <computeroutput>fflush</computeroutput> | ||
2432 | and <computeroutput>fclose</computeroutput>.</para> | ||
2433 | |||
2434 | <programlisting> | ||
2435 | const char * BZ2_bzerror ( BZFILE *b, int *errnum ); | ||
2436 | </programlisting> | ||
2437 | |||
2438 | <para>Returns a string describing the most recent error status of | ||
2439 | <computeroutput>b</computeroutput>, and also sets | ||
2440 | <computeroutput>*errnum</computeroutput> to its numerical | ||
2441 | value.</para> | ||
2442 | |||
2443 | </sect1> | ||
2444 | |||
2445 | |||
2446 | <sect1 id="stdio-free" | ||
2447 | xreflabel="Using the library in a stdio-free environment"> | ||
2448 | <title>Using the library in a <computeroutput>stdio</computeroutput>-free environment</title> | ||
2449 | |||
2450 | |||
2451 | <sect2 id="stdio-bye" xreflabel="Getting rid of stdio"> | ||
2452 | <title>Getting rid of <computeroutput>stdio</computeroutput></title> | ||
2453 | |||
2454 | <para>In a deeply embedded application, you might want to use | ||
2455 | just the memory-to-memory functions. You can do this | ||
2456 | conveniently by compiling the library with preprocessor symbol | ||
2457 | <computeroutput>BZ_NO_STDIO</computeroutput> defined. Doing this | ||
2458 | gives you a library containing only the following eight | ||
2459 | functions:</para> | ||
2460 | |||
2461 | <para><computeroutput>BZ2_bzCompressInit</computeroutput>, | ||
2462 | <computeroutput>BZ2_bzCompress</computeroutput>, | ||
2463 | <computeroutput>BZ2_bzCompressEnd</computeroutput>, | ||
2464 | <computeroutput>BZ2_bzDecompressInit</computeroutput>, | ||
2465 | <computeroutput>BZ2_bzDecompress</computeroutput>, | ||
2466 | <computeroutput>BZ2_bzDecompressEnd</computeroutput>, | ||
2467 | <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>, | ||
2468 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput></para> | ||
2469 | |||
2470 | <para>When compiled like this, all functions will ignore | ||
2471 | <computeroutput>verbosity</computeroutput> settings.</para> | ||
2472 | |||
2473 | </sect2> | ||
2474 | |||
2475 | |||
2476 | <sect2 id="critical-error" xreflabel="Critical error handling"> | ||
2477 | <title>Critical error handling</title> | ||
2478 | |||
2479 | <para><computeroutput>libbzip2</computeroutput> contains a number | ||
2480 | of internal assertion checks which should, needless to say, never | ||
2481 | be activated. Nevertheless, if an assertion should fail, | ||
2482 | behaviour depends on whether or not the library was compiled with | ||
2483 | <computeroutput>BZ_NO_STDIO</computeroutput> set.</para> | ||
2484 | |||
2485 | <para>For a normal compile, an assertion failure yields the | ||
2486 | message:</para> | ||
2487 | |||
2488 | <blockquote> | ||
2489 | <para>bzip2/libbzip2: internal error number N.</para> | ||
2490 | <para>This is a bug in bzip2/libbzip2, &bz-version; of &bz-date;. | ||
2491 | Please report it to me at: &bz-email;. If this happened | ||
2492 | when you were using some program which uses libbzip2 as a | ||
2493 | component, you should also report this bug to the author(s) | ||
2494 | of that program. Please make an effort to report this bug; | ||
2495 | timely and accurate bug reports eventually lead to higher | ||
2496 | quality software. Thanks. Julian Seward, &bz-date;. | ||
2497 | </para></blockquote> | ||
2498 | |||
2499 | <para>where <computeroutput>N</computeroutput> is some error code | ||
2500 | number. If <computeroutput>N == 1007</computeroutput>, it also | ||
2501 | prints some extra text advising the reader that unreliable memory | ||
2502 | is often associated with internal error 1007. (This is a | ||
2503 | frequently observed phenomenon with versions 1.0.0/1.0.1.)</para> | ||
2504 | |||
2505 | <para><computeroutput>exit(3)</computeroutput> is then | ||
2506 | called.</para> | ||
2507 | |||
2508 | <para>For a <computeroutput>stdio</computeroutput>-free library, | ||
2509 | assertion failures result in a call to a function declared | ||
2510 | as:</para> | ||
2511 | |||
2512 | <programlisting> | ||
2513 | extern void bz_internal_error ( int errcode ); | ||
2514 | </programlisting> | ||
2515 | |||
2516 | <para>The relevant code is passed as a parameter. You should | ||
2517 | supply such a function.</para> | ||
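<para>A minimal sketch of such a function follows. Only the name and
signature of <computeroutput>bz_internal_error</computeroutput> come from
the declaration above; the variable
<computeroutput>bz_last_internal_error</computeroutput> and the function
pointer <computeroutput>bz_panic_hook</computeroutput> are hypothetical
names introduced for this illustration.</para>

```c
/* Hypothetical: last assertion code seen, for the application to inspect. */
volatile int bz_last_internal_error = 0;

/* Hypothetical: optional platform hook, e.g. one that halts or resets. */
void (*bz_panic_hook)(int errcode) = 0;

/* Called by a BZ_NO_STDIO build of the library when an internal
   assertion fails.  Records the code, then hands over to the hook
   if the application installed one. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;
    if (bz_panic_hook)
        bz_panic_hook(errcode);
}
```

<para>In a real embedded system the hook would probably not return,
since the library's state is no longer trustworthy after an assertion
failure.</para>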
2518 | |||
2519 | <para>In either case, once an assertion failure has occurred, any | ||
2520 | <computeroutput>bz_stream</computeroutput> records involved can | ||
2521 | be regarded as invalid. You should not attempt to resume normal | ||
2522 | operation with them.</para> | ||
2523 | |||
2524 | <para>You may, of course, change critical error handling to suit | ||
2525 | your needs. As I said above, critical errors indicate bugs in | ||
2526 | the library and should not occur. All "normal" error situations | ||
2527 | are indicated via error return codes from functions, and can be | ||
2528 | recovered from.</para> | ||
2529 | |||
2530 | </sect2> | ||
2531 | |||
2532 | </sect1> | ||
2533 | |||
2534 | |||
2535 | <sect1 id="win-dll" xreflabel="Making a Windows DLL"> | ||
2536 | <title>Making a Windows DLL</title> | ||
2537 | |||
2538 | <para>Everything related to Windows has been contributed by | ||
2539 | Yoshioka Tsuneo | ||
2540 | (<computeroutput>QWF00133@niftyserve.or.jp</computeroutput> / | ||
2541 | <computeroutput>tsuneo-y@is.aist-nara.ac.jp</computeroutput>), so | ||
2542 | you should send your queries to him (but perhaps Cc: me, | ||
2543 | <computeroutput>&bz-email;</computeroutput>).</para> | ||
2544 | |||
2545 | <para>My vague understanding of what to do is: using Visual C++ | ||
2546 | 5.0, open the project file | ||
2547 | <computeroutput>libbz2.dsp</computeroutput>, and build. That's | ||
2548 | all.</para> | ||
2549 | |||
2550 | <para>If you can't open the project file for some reason, make a | ||
2551 | new one, naming these files: | ||
2552 | <computeroutput>blocksort.c</computeroutput>, | ||
2553 | <computeroutput>bzlib.c</computeroutput>, | ||
2554 | <computeroutput>compress.c</computeroutput>, | ||
2555 | <computeroutput>crctable.c</computeroutput>, | ||
2556 | <computeroutput>decompress.c</computeroutput>, | ||
2557 | <computeroutput>huffman.c</computeroutput>, | ||
2558 | <computeroutput>randtable.c</computeroutput> and | ||
2559 | <computeroutput>libbz2.def</computeroutput>. You will also need | ||
2560 | to name the header files <computeroutput>bzlib.h</computeroutput> | ||
2561 | and <computeroutput>bzlib_private.h</computeroutput>.</para> | ||
2562 | |||
2563 | <para>If you don't use VC++, you may need to define the | ||
2564 | preprocessor symbol | ||
2565 | <computeroutput>_WIN32</computeroutput>.</para> | ||
2566 | |||
2567 | <para>Finally, <computeroutput>dlltest.c</computeroutput> is a | ||
2568 | sample program using the DLL. It has a project file, | ||
2569 | <computeroutput>dlltest.dsp</computeroutput>.</para> | ||
2570 | |||
2571 | <para>If you just want a makefile for Visual C, have a look at | ||
2572 | <computeroutput>makefile.msc</computeroutput>.</para> | ||
2573 | |||
2574 | <para>Be aware that if you compile | ||
2575 | <computeroutput>bzip2</computeroutput> itself on Win32, you must | ||
2576 | set <computeroutput>BZ_UNIX</computeroutput> to 0 and | ||
2577 | <computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the file | ||
2578 | <computeroutput>bzip2.c</computeroutput>, before compiling. | ||
2579 | Otherwise the resulting binary won't work correctly.</para> | ||
2580 | |||
2581 | <para>I haven't tried any of this stuff myself, but it all looks | ||
2582 | plausible.</para> | ||
2583 | |||
2584 | </sect1> | ||
2585 | |||
2586 | </chapter> | ||
2587 | |||
2588 | |||
2589 | |||
2590 | <chapter id="misc" xreflabel="Miscellanea"> | ||
2591 | <title>Miscellanea</title> | ||
2592 | |||
2593 | <para>These are just some random thoughts of mine. Your mileage | ||
2594 | may vary.</para> | ||
2595 | |||
2596 | |||
2597 | <sect1 id="limits" xreflabel="Limitations of the compressed file format"> | ||
2598 | <title>Limitations of the compressed file format</title> | ||
2599 | |||
2600 | <para><computeroutput>bzip2-1.0.X</computeroutput>, | ||
2601 | <computeroutput>0.9.5</computeroutput> and | ||
2602 | <computeroutput>0.9.0</computeroutput> use exactly the same file | ||
2603 | format as the original version, | ||
2604 | <computeroutput>bzip2-0.1</computeroutput>. This decision was | ||
2605 | made in the interests of stability. Creating yet another | ||
2606 | incompatible compressed file format would create further | ||
2607 | confusion and disruption for users.</para> | ||
2608 | |||
2609 | <para>Nevertheless, this is not a painless decision. Development | ||
2610 | work since the release of | ||
2611 | <computeroutput>bzip2-0.1</computeroutput> in August 1997 has | ||
2612 | shown complexities in the file format which slow down | ||
2613 | decompression and, in retrospect, are unnecessary. These | ||
2614 | are:</para> | ||
2615 | |||
2616 | <itemizedlist mark='bullet'> | ||
2617 | |||
2618 | <listitem><para>The run-length encoder, which is the first of the | ||
2619 | compression transformations, is entirely irrelevant. The | ||
2620 | original purpose was to protect the sorting algorithm from the | ||
2621 | very worst case input: a string of repeated symbols. But | ||
2622 | algorithm steps Q6a and Q6b in the original Burrows-Wheeler | ||
2623 | technical report (SRC-124) show how repeats can be handled | ||
2624 | without difficulty in block sorting.</para></listitem> | ||
2625 | |||
2626 | <listitem><para>The randomisation mechanism doesn't really need to be | ||
2627 | there. Udi Manber and Gene Myers published a suffix array | ||
2628 | construction algorithm a few years back, which can be employed | ||
2629 | to sort any block, no matter how repetitive, in O(N log N) | ||
2630 | time. Subsequent work by Kunihiko Sadakane has produced a | ||
2631 | derivative O(N (log N)^2) algorithm which usually outperforms | ||
2632 | the Manber-Myers algorithm.</para> | ||
2633 | |||
2634 | <para>I could have changed to Sadakane's algorithm, but I find | ||
2635 | it to be slower than <computeroutput>bzip2</computeroutput>'s | ||
2636 | existing algorithm for most inputs, and the randomisation | ||
2637 | mechanism protects adequately against bad cases. I didn't | ||
2638 | think it was a good tradeoff to make. Partly this is due to | ||
2639 | the fact that I was not flooded with email complaints about | ||
2640 | <computeroutput>bzip2-0.1</computeroutput>'s performance on | ||
2641 | repetitive data, so perhaps it isn't a problem for real | ||
2642 | inputs.</para> | ||
2643 | |||
2644 | <para>Probably the best long-term solution, and the one I have | ||
2645 | incorporated into 0.9.5 and above, is to use the existing | ||
2646 | sorting algorithm initially, and fall back to an O(N (log N)^2) | ||
2647 | algorithm if the standard algorithm gets into | ||
2648 | difficulties.</para></listitem> | ||
2649 | |||
2650 | <listitem><para>The compressed file format was never designed to be | ||
2651 | handled by a library, and I have had to jump through some hoops | ||
2652 | to produce an efficient implementation of decompression. It's | ||
2653 | a bit hairy. Try passing | ||
2654 | <computeroutput>decompress.c</computeroutput> through the C | ||
2655 | preprocessor and you'll see what I mean. Much of this | ||
2656 | complexity could have been avoided if the compressed size of | ||
2657 | each block of data was recorded in the data stream.</para></listitem> | ||
2658 | |||
2659 | <listitem><para>An Adler-32 checksum, rather than a CRC32 checksum, | ||
2660 | would be faster to compute.</para></listitem> | ||
2661 | |||
2662 | </itemizedlist> | ||
2663 | |||
2664 | <para>It would be fair to say that the | ||
2665 | <computeroutput>bzip2</computeroutput> format was frozen before I | ||
2666 | properly and fully understood the performance consequences of | ||
2667 | doing so.</para> | ||
2668 | |||
2669 | <para>Improvements which I was able to incorporate into 0.9.0, | ||
2670 | despite using the same file format, are:</para> | ||
2671 | |||
2672 | <itemizedlist mark='bullet'> | ||
2673 | |||
2674 | <listitem><para>Single array implementation of the inverse BWT. This | ||
2675 | significantly speeds up decompression, presumably because it | ||
2676 | reduces the number of cache misses.</para></listitem> | ||
2677 | |||
2678 | <listitem><para>Faster inverse MTF transform for large MTF values. | ||
2679 | The new implementation is based on the notion of sliding blocks | ||
2680 | of values.</para></listitem> | ||
2681 | |||
2682 | <listitem><para><computeroutput>bzip2-0.9.0</computeroutput> now reads | ||
2683 | and writes files with <computeroutput>fread</computeroutput> | ||
2684 | and <computeroutput>fwrite</computeroutput>; version 0.1 used | ||
2685 | <computeroutput>putc</computeroutput> and | ||
2686 | <computeroutput>getc</computeroutput>. Duh! Well, you live | ||
2687 | and learn.</para></listitem> | ||
2688 | |||
2689 | </itemizedlist> | ||
2690 | |||
2691 | <para>Further ahead, it would be nice to be able to do random | ||
2692 | access into files. This will require some careful design of | ||
2693 | compressed file formats.</para> | ||
2694 | |||
2695 | </sect1> | ||
2696 | |||
2697 | |||
2698 | <sect1 id="port-issues" xreflabel="Portability issues"> | ||
2699 | <title>Portability issues</title> | ||
2700 | |||
2701 | <para>After some consideration, I have decided not to use GNU | ||
2702 | <computeroutput>autoconf</computeroutput> to configure 0.9.5 or | ||
2703 | 1.0.</para> | ||
2704 | |||
2705 | <para><computeroutput>autoconf</computeroutput>, admirable and | ||
2706 | wonderful though it is, mainly assists with portability problems | ||
2707 | between Unix-like platforms. But | ||
2708 | <computeroutput>bzip2</computeroutput> doesn't have much in the | ||
2709 | way of portability problems on Unix; most of the difficulties | ||
2710 | appear when porting to the Mac, or to Microsoft's operating | ||
2711 | systems. <computeroutput>autoconf</computeroutput> doesn't help | ||
2712 | in those cases, and brings in a whole load of new | ||
2713 | complexity.</para> | ||
2714 | |||
2715 | <para>Most people should be able to compile the library and | ||
2716 | program under Unix straight out-of-the-box, so to speak, | ||
2717 | especially if a version of GNU C is available.</para> | ||
2718 | |||
2719 | <para>There are a couple of | ||
2720 | <computeroutput>__inline__</computeroutput> directives in the | ||
2721 | code. GNU C (<computeroutput>gcc</computeroutput>) should be | ||
2722 | able to handle them. If you're not using GNU C, your C compiler | ||
2723 | shouldn't see them at all. If your compiler does, for some | ||
2724 | reason, see them and doesn't like them, just | ||
2725 | <computeroutput>#define</computeroutput> | ||
2726 | <computeroutput>__inline__</computeroutput> to be | ||
2727 | <computeroutput>/* */</computeroutput>. One easy way to do this | ||
2728 | is to compile with the flag | ||
2729 | <computeroutput>-D__inline__=</computeroutput>, which should be | ||
2730 | understood by most Unix compilers.</para> | ||
2731 | |||
2732 | <para>If you still have difficulties, try compiling with the | ||
2733 | macro <computeroutput>BZ_STRICT_ANSI</computeroutput> defined. | ||
2734 | This should enable you to build the library in a strictly ANSI | ||
2735 | compliant environment. Building the program itself like this is | ||
2736 | dangerous and not supported, since you remove | ||
2737 | <computeroutput>bzip2</computeroutput>'s checks against | ||
2738 | compressing directories, symbolic links, devices, and other | ||
2739 | not-really-a-file entities. This could cause filesystem | ||
2740 | corruption!</para> | ||
2741 | |||
2742 | <para>One other thing: if you create a | ||
2743 | <computeroutput>bzip2</computeroutput> binary for public distribution, | ||
2744 | please consider linking it statically (<computeroutput>gcc | ||
2745 | -static</computeroutput>). This avoids all sorts of library-version | ||
2746 | issues that others may encounter later on.</para> | ||
2747 | |||
2748 | <para>If you build <computeroutput>bzip2</computeroutput> on | ||
2749 | Win32, you must set <computeroutput>BZ_UNIX</computeroutput> to 0 | ||
2750 | and <computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the | ||
2751 | file <computeroutput>bzip2.c</computeroutput>, before compiling. | ||
2752 | Otherwise the resulting binary won't work correctly.</para> | ||
2753 | |||
2754 | </sect1> | ||
2755 | |||
2756 | |||
2757 | <sect1 id="bugs" xreflabel="Reporting bugs"> | ||
2758 | <title>Reporting bugs</title> | ||
2759 | |||
2760 | <para>I tried pretty hard to make sure | ||
2761 | <computeroutput>bzip2</computeroutput> is bug free, both by | ||
2762 | design and by testing. Hopefully you'll never need to read this | ||
2763 | section for real.</para> | ||
2764 | |||
2765 | <para>Nevertheless, if <computeroutput>bzip2</computeroutput> dies | ||
2766 | with a segmentation fault, a bus error or an internal assertion | ||
2767 | failure, it will ask you to email me a bug report. Experience from | ||
2768 | years of feedback from bzip2 users indicates that almost all of these | ||
2769 | problems can be traced to either compiler bugs or hardware | ||
2770 | problems.</para> | ||
2771 | |||
2772 | <itemizedlist mark='bullet'> | ||
2773 | |||
2774 | <listitem><para>Recompile the program with no optimisation, and | ||
2775 | see if it works. And/or try a different compiler. I heard all | ||
2776 | sorts of stories about various flavours of GNU C (and other | ||
2777 | compilers) generating bad code for | ||
2778 | <computeroutput>bzip2</computeroutput>, and I've run across two | ||
2779 | such examples myself.</para> | ||
2780 | |||
2781 | <para>2.7.X versions of GNU C are known to generate bad code | ||
2782 | from time to time, at high optimisation levels. If you get | ||
2783 | problems, try using the flags | ||
2784 | <computeroutput>-O2</computeroutput> | ||
2785 | <computeroutput>-fomit-frame-pointer</computeroutput> | ||
2786 | <computeroutput>-fno-strength-reduce</computeroutput>. You | ||
2787 | should specifically <emphasis>not</emphasis> use | ||
2788 | <computeroutput>-funroll-loops</computeroutput>.</para> | ||
2789 | |||
2790 | <para>You may notice that the Makefile runs six tests as part | ||
2791 | of the build process. If the program passes all of these, it's | ||
2792 | a pretty good (but not 100%) indication that the compiler has | ||
2793 | done its job correctly.</para></listitem> | ||
2794 | |||
2795 | <listitem><para>If <computeroutput>bzip2</computeroutput> | ||
2796 | crashes randomly, and the crashes are not repeatable, you may | ||
2797 | have a flaky memory subsystem. | ||
2798 | <computeroutput>bzip2</computeroutput> really hammers your | ||
2799 | memory hierarchy, and if it's a bit marginal, you may get these | ||
2800 | problems. Ditto if your disk or I/O subsystem is slowly | ||
2801 | failing. Yup, this really does happen.</para> | ||
2802 | |||
2803 | <para>Try using a different machine of the same type, and see | ||
2804 | if you can repeat the problem.</para></listitem> | ||
2805 | |||
2806 | <listitem><para>This isn't really a bug, but ... If | ||
2807 | <computeroutput>bzip2</computeroutput> tells you your file is | ||
2808 | corrupted on decompression, and you obtained the file via FTP, | ||
2809 | there is a possibility that you forgot to tell FTP to do a | ||
2810 | binary mode transfer. That absolutely will cause the file to | ||
2811 | be non-decompressible. You'll have to transfer it | ||
2812 | again.</para></listitem> | ||
2813 | |||
2814 | </itemizedlist> | ||
2815 | |||
2816 | <para>If you've incorporated | ||
2817 | <computeroutput>libbzip2</computeroutput> into your own program | ||
2818 | and are getting problems, please, please, please, check that the | ||
2819 | parameters you are passing in calls to the library are correct, | ||
2820 | and in accordance with what the documentation says is allowable. | ||
2821 | I have tried to make the library robust against such problems, | ||
2822 | but I'm sure I haven't succeeded.</para> | ||
2823 | |||
2824 | <para>Finally, if the above comments don't help, you'll have to | ||
2825 | send me a bug report. Now, it's just amazing how many people | ||
2826 | will send me a bug report saying something like:</para> | ||
2827 | |||
2828 | <programlisting> | ||
2829 | bzip2 crashed with segmentation fault on my machine | ||
2830 | </programlisting> | ||
2831 | |||
2832 | <para>and absolutely nothing else. Needless to say, such a | ||
2833 | report is <emphasis>totally, utterly, completely and | ||
2834 | comprehensively 100% useless; a waste of your time, my time, and | ||
2835 | net bandwidth</emphasis>. With no details at all, there's no way | ||
2836 | I can possibly begin to figure out what the problem is.</para> | ||
2837 | |||
2838 | <para>The rules of the game are: facts, facts, facts. Don't omit | ||
2839 | them because "oh, they won't be relevant". At the bare | ||
2840 | minimum:</para> | ||
2841 | |||
2842 | <programlisting> | ||
2843 | Machine type. Operating system version. | ||
2844 | Exact version of bzip2 (do bzip2 -V). | ||
2845 | Exact version of the compiler used. | ||
2846 | Flags passed to the compiler. | ||
2847 | </programlisting> | ||
2848 | |||
2849 | <para>However, the most important single thing that will help me | ||
2850 | is the file that you were trying to compress or decompress at the | ||
2851 | time the problem happened. Without that, my ability to do | ||
2852 | anything more than speculate about the cause is limited.</para> | ||
2853 | |||
2854 | </sect1> | ||
2855 | |||
2856 | |||
2857 | <sect1 id="package" xreflabel="Did you get the right package?"> | ||
2858 | <title>Did you get the right package?</title> | ||
2859 | |||
2860 | <para><computeroutput>bzip2</computeroutput> is a resource hog. | ||
2861 | It soaks up large amounts of CPU cycles and memory. Also, it | ||
2862 | gives very large latencies. In the worst case, you can feed many | ||
2863 | megabytes of uncompressed data into the library before getting | ||
2864 | any compressed output, so this probably rules out applications | ||
2865 | requiring interactive behaviour.</para> | ||
2866 | |||
2867 | <para>These aren't faults of my implementation, I hope, but more | ||
2868 | an intrinsic property of the Burrows-Wheeler transform | ||
2869 | (unfortunately). Maybe this isn't what you want.</para> | ||
2870 | |||
2871 | <para>If you want a compressor and/or library which is faster, | ||
2872 | uses less memory but gets pretty good compression, and has | ||
2873 | minimal latency, consider Jean-loup Gailly's and Mark Adler's | ||
2874 | work, <computeroutput>zlib-1.2.1</computeroutput> and | ||
2875 | <computeroutput>gzip-1.2.4</computeroutput>. Look for them at | ||
2876 | <ulink url="http://www.zlib.org">http://www.zlib.org</ulink> and | ||
2877 | <ulink url="http://www.gzip.org">http://www.gzip.org</ulink> | ||
2878 | respectively.</para> | ||
2879 | |||
2880 | <para>For something faster and lighter still, you might try Markus F | ||
2881 | X J Oberhumer's <computeroutput>LZO</computeroutput> real-time | ||
2882 | compression/decompression library, at | ||
2883 | <ulink url="http://www.oberhumer.com/opensource">http://www.oberhumer.com/opensource</ulink>.</para> | ||
2884 | |||
2885 | </sect1> | ||
2886 | |||
2887 | |||
2888 | |||
2889 | <sect1 id="reading" xreflabel="Further Reading"> | ||
2890 | <title>Further Reading</title> | ||
2891 | |||
2892 | <para><computeroutput>bzip2</computeroutput> is not research | ||
2893 | work, in the sense that it doesn't present any new ideas. | ||
2894 | Rather, it's an engineering exercise based on existing | ||
2895 | ideas.</para> | ||
2896 | |||
2897 | <para>Four documents describe essentially all the ideas behind | ||
2898 | <computeroutput>bzip2</computeroutput>:</para> | ||
2899 | |||
2900 | <literallayout>Michael Burrows and D. J. Wheeler: | ||
2901 | "A block-sorting lossless data compression algorithm" | ||
2902 | 10th May 1994. | ||
2903 | Digital SRC Research Report 124. | ||
2904 | ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz | ||
2905 | If you have trouble finding it, try searching at the | ||
2906 | New Zealand Digital Library, http://www.nzdl.org. | ||
2907 | |||
2908 | Daniel S. Hirschberg and Debra A. Lelewer | ||
2909 | "Efficient Decoding of Prefix Codes" | ||
2910 | Communications of the ACM, April 1990, Vol 33, Number 4. | ||
2911 | You might be able to get an electronic copy of this | ||
2912 | from the ACM Digital Library. | ||
2913 | |||
2914 | David J. Wheeler | ||
2915 | Program bred3.c and accompanying document bred3.ps. | ||
2916 | This contains the idea behind the multi-table Huffman coding scheme. | ||
2917 | ftp://ftp.cl.cam.ac.uk/users/djw3/ | ||
2918 | |||
2919 | Jon L. Bentley and Robert Sedgewick | ||
2920 | "Fast Algorithms for Sorting and Searching Strings" | ||
2921 | Available from Sedgewick's web page, | ||
2922 | www.cs.princeton.edu/~rs | ||
2923 | </literallayout> | ||
2924 | |||
2925 | <para>The following paper gives valuable additional insights into | ||
2926 | the algorithm, but is not immediately the basis of any code used | ||
2927 | in bzip2.</para> | ||
2928 | |||
2929 | <literallayout>Peter Fenwick: | ||
2930 | Block Sorting Text Compression | ||
2931 | Proceedings of the 19th Australasian Computer Science Conference, | ||
2932 | Melbourne, Australia. Jan 31 - Feb 2, 1996. | ||
2933 | ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</literallayout> | ||
2934 | |||
2935 | <para>Kunihiko Sadakane's sorting algorithm, mentioned above, is | ||
2936 | available from:</para> | ||
2937 | |||
2938 | <literallayout>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz | ||
2939 | </literallayout> | ||
2940 | |||
2941 | <para>The Manber-Myers suffix array construction algorithm is | ||
2942 | described in a paper available from:</para> | ||
2943 | |||
2944 | <literallayout>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps | ||
2945 | </literallayout> | ||
2946 | |||
2947 | <para>Finally, the following papers document some | ||
2948 | investigations I made into the performance of sorting | ||
2949 | and decompression algorithms:</para> | ||
2950 | |||
2951 | <literallayout>Julian Seward | ||
2952 | On the Performance of BWT Sorting Algorithms | ||
2953 | Proceedings of the IEEE Data Compression Conference 2000 | ||
2954 | Snowbird, Utah. 28-30 March 2000. | ||
2955 | |||
2956 | Julian Seward | ||
2957 | Space-time Tradeoffs in the Inverse B-W Transform | ||
2958 | Proceedings of the IEEE Data Compression Conference 2001 | ||
2959 | Snowbird, Utah. 27-29 March 2001. | ||
2960 | </literallayout> | ||
2961 | |||
2962 | </sect1> | ||
2963 | |||
2964 | </chapter> | ||
2965 | |||
2966 | </book> | ||