author     Julian Seward <jseward@acm.org>  1998-08-23 22:13:13 +0200
committer  Julian Seward <jseward@acm.org>  1998-08-23 22:13:13 +0200
commit     977101ad5f833f5c0a574bfeea408e5301a6b052
tree       fc1e8fed202869c116cbf6b8c362456042494a0a
parent     1eb67a9d8f7f05ae310bc9ef297d176f3a3f8a37
bzip2-0.9.0c
-rw-r--r-- ALGORITHMS              47
-rw-r--r-- CHANGES                 45
-rw-r--r-- LICENSE                360
-rw-r--r-- Makefile                52
-rw-r--r-- README                 230
-rw-r--r-- README.DOS              16
-rw-r--r-- blocksort.c            709
-rw-r--r-- bzip2.1                191
-rw-r--r-- bzip2.1.preformatted   318
-rw-r--r-- bzip2.c               3389
-rw-r--r-- bzip2.txt              292
-rw-r--r-- bzip2recover.c         125
-rw-r--r-- bzlib.c               1512
-rw-r--r-- bzlib.h                299
-rw-r--r-- bzlib_private.h        523
-rw-r--r-- compress.c             588
-rw-r--r-- crctable.c             144
-rw-r--r-- decompress.c           636
-rw-r--r-- dlltest.c              163
-rw-r--r-- dlltest.dsp             93
-rw-r--r-- howbig.c                37
-rw-r--r-- huffman.c              228
-rw-r--r-- libbz2.def              25
-rw-r--r-- libbz2.dsp             130
-rw-r--r-- manual.texi           2100
-rw-r--r-- randtable.c            124
-rw-r--r-- test.bat                 9
-rw-r--r-- test.cmd                 9
-rw-r--r-- words0                   7
-rw-r--r-- words1                   1
-rw-r--r-- words2                   1
-rw-r--r-- words3                  21
-rw-r--r-- words3sh                12
33 files changed, 8332 insertions, 4104 deletions
diff --git a/ALGORITHMS b/ALGORITHMS
deleted file mode 100644
index 7c7d2ca..0000000
--- a/ALGORITHMS
+++ /dev/null
@@ -1,47 +0,0 @@
-
-Bzip2 is not research work, in the sense that it doesn't present any
-new ideas. Rather, it's an engineering exercise based on existing
-ideas.
-
-Four documents describe essentially all the ideas behind bzip2:
-
-   Michael Burrows and D. J. Wheeler:
-      "A block-sorting lossless data compression algorithm"
-      10th May 1994.
-      Digital SRC Research Report 124.
-      ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
-
-   Daniel S. Hirschberg and Debra A. Lelewer
-      "Efficient Decoding of Prefix Codes"
-      Communications of the ACM, April 1990, Vol 33, Number 4.
-      You might be able to get an electronic copy of this
-      from the ACM Digital Library.
-
-   David J. Wheeler
-      Program bred3.c and accompanying document bred3.ps.
-      This contains the idea behind the multi-table Huffman
-      coding scheme.
-      ftp://ftp.cl.cam.ac.uk/pub/user/djw3/
-
-   Jon L. Bentley and Robert Sedgewick
-      "Fast Algorithms for Sorting and Searching Strings"
-      Available from Sedgewick's web page,
-      www.cs.princeton.edu/~rs
-
-The following paper gives valuable additional insights into the
-algorithm, but is not directly the basis of any code
-used in bzip2.
-
-   Peter Fenwick:
-      Block Sorting Text Compression
-      Proceedings of the 19th Australasian Computer Science Conference,
-      Melbourne, Australia. Jan 31 - Feb 2, 1996.
-      ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
-
-All of these are well written, and make fascinating reading. If you want
-to modify bzip2 in any non-trivial way, I strongly suggest you obtain,
-read and understand these papers.
-
-I am much indebted to the various authors for their help, support and
-advice.
-
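The Burrows and Wheeler report listed above describes the block-sorting transform at the heart of bzip2. The following is a deliberately naive sketch of that transform and its inverse, for illustration only. It sorts the cyclic rotations of a block with `qsort()`, which is far slower than the sorting strategy in `blocksort.c`, and the names `bwt_forward`, `bwt_inverse` and `cmp_rotations` are hypothetical, not part of bzlib.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() passes no user context, so the comparator reads the block
   being sorted through these file-scope variables (not thread-safe). */
static const unsigned char *g_block;
static size_t g_len;

/* Compare the cyclic rotations of g_block starting at offsets *a and *b. */
static int cmp_rotations(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    for (size_t k = 0; k < g_len; k++) {
        unsigned char ci = g_block[(i + k) % g_len];
        unsigned char cj = g_block[(j + k) % g_len];
        if (ci != cj)
            return ci < cj ? -1 : 1;
    }
    return 0; /* identical rotations (periodic block) */
}

/* Forward transform: sort all n rotations of in[], write the last column
   of the sorted matrix to out[], and return the row that holds the
   original block (the "primary index"). Requires n > 0. */
static size_t bwt_forward(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t *rot = malloc(n * sizeof *rot);
    size_t primary = 0;
    assert(n > 0 && rot != NULL); /* sketch: no real error handling */

    for (size_t i = 0; i < n; i++)
        rot[i] = i;
    g_block = in;
    g_len = n;
    qsort(rot, n, sizeof *rot, cmp_rotations);

    for (size_t r = 0; r < n; r++) {
        out[r] = in[(rot[r] + n - 1) % n]; /* byte preceding this rotation */
        if (rot[r] == 0)
            primary = r;
    }
    free(rot);
    return primary;
}

/* Inverse transform using the last-to-first (LF) mapping: the k-th
   occurrence of byte c in the last column corresponds to the k-th
   sorted row that starts with c. */
static void bwt_inverse(const unsigned char *last, size_t n, size_t primary,
                        unsigned char *out)
{
    size_t count[256] = {0}, seen[256] = {0}, start[256];
    size_t *lf = malloc(n * sizeof *lf);
    size_t sum = 0, row = primary;
    assert(n > 0 && lf != NULL && primary < n);

    for (size_t i = 0; i < n; i++)
        count[last[i]]++;
    for (int c = 0; c < 256; c++) {
        start[c] = sum;
        sum += count[c];
    }
    for (size_t i = 0; i < n; i++)
        lf[i] = start[last[i]] + seen[last[i]]++;

    /* last[row] is the byte preceding row's rotation, so following the
       LF mapping from the primary row emits the block back to front. */
    for (size_t k = n; k-- > 0;) {
        out[k] = last[row];
        row = lf[row];
    }
    free(lf);
}
```

For the block "banana" the forward transform gives the last column "nnbaaa" with primary index 3, and passing both to `bwt_inverse` recovers "banana". bzip2 stores a comparable starting-row value (`origPtr`) with each compressed block so the decompressor can perform this inversion.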
diff --git a/CHANGES b/CHANGES
new file mode 100644
index 0000000..ac00f3a
--- /dev/null
+++ b/CHANGES
@@ -0,0 +1,45 @@
+
+
+0.9.0
+~~~~~
+First version.
+
+
+0.9.0a
+~~~~~~
+Removed 'ranlib' from Makefile, since most modern Unixes
+don't need it, or even know about it.
+
+
+0.9.0b
+~~~~~~
+Fixed a problem with error reporting in bzip2.c. This does not affect
+the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
+program proper) compress and decompress correctly, but give misleading
+error messages (internal panics) when an I/O error occurs, instead of
+reporting the problem correctly. This shouldn't give any data loss
+(as far as I can see), but is confusing.
+
+Made the inline declarations disappear for non-GCC compilers.
+
+
+0.9.0c
+~~~~~~
+Fixed some problems in the library pertaining to some boundary cases.
+This makes the library behave more correctly in those situations. The
+fixes apply only to features (calls and parameters) not used by
+bzip2.c, so the non-fixedness of them in previous versions has no
+effect on reliability of bzip2.c.
+
+In bzlib.c:
+   * made zero-length BZ_FLUSH work correctly in bzCompress().
+   * fixed bzWrite/bzRead to ignore zero-length requests.
+   * fixed bzRead to correctly handle read requests after EOF.
+   * wrong parameter order in call to bzDecompressInit in
+     bzBuffToBuffDecompress. Fixed.
+
+In compress.c:
+   * changed setting of nGroups in sendMTFValues() so as to
+     do a bit better on small files. This _does_ affect
+     bzip2.c.
+
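The last 0.9.0c entry concerns how many Huffman coding tables `sendMTFValues()` uses per block; the diff does not show the new rule itself. As an illustration only, the sketch below uses the thresholds found in the later bzip2 1.0.x sources, which may or may not match 0.9.0c exactly, and the helper names are hypothetical. The intuition is that every table's code lengths must be transmitted, so a short block (few MTF symbols) is better served by fewer tables, while a long block can afford up to six.

```c
#include <assert.h>

/* Symbols coded with one table before the encoder may switch tables
   (bzip2 1.0.x names this constant BZ_G_SIZE). */
#define GROUP_SIZE 50

/* Hypothetical helper: number of coding tables for a block whose
   MTF/RLE output holds n_mtf symbols. Thresholds follow bzip2 1.0.x. */
static int choose_n_groups(int n_mtf)
{
    assert(n_mtf > 0);
    if (n_mtf < 200)  return 2;
    if (n_mtf < 600)  return 3;
    if (n_mtf < 1200) return 4;
    if (n_mtf < 2400) return 5;
    return 6;
}

/* One selector (the index of the table to use) is sent for each
   GROUP_SIZE-symbol group, rounding up for a final partial group. */
static int count_selectors(int n_mtf)
{
    return (n_mtf + GROUP_SIZE - 1) / GROUP_SIZE;
}
```

For example, a block producing 150 MTF symbols would be coded with 2 tables and 3 selectors under these thresholds.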
diff --git a/LICENSE b/LICENSE
index a43ea21..3de0301 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,339 +1,39 @@
-                    GNU GENERAL PUBLIC LICENSE
-                       Version 2, June 1991
 
- Copyright (C) 1989, 1991 Free Software Foundation, Inc.
-                          675 Mass Ave, Cambridge, MA 02139, USA
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
+This program, "bzip2" and associated library "libbzip2", are
+copyright (C) 1996-1998 Julian R Seward. All rights reserved.
 
-                            Preamble
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
 
-  The licenses for most software are designed to take away your
-freedom to share and change it. By contrast, the GNU General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. This
-General Public License applies to most of the Free Software
-Foundation's software and to any other program whose authors commit to
-using it. (Some other Free Software Foundation software is covered by
-the GNU Library General Public License instead.) You can apply it to
-your programs, too.
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
 
-  When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-this service if you wish), that you receive source code or can get it
-if you want it, that you can change the software or use pieces of it
-in new free programs; and that you know you can do these things.
+2. The origin of this software must not be misrepresented; you must
+   not claim that you wrote the original software. If you use this
+   software in a product, an acknowledgment in the product
+   documentation would be appreciated but is not required.
 
-  To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
+3. Altered source versions must be plainly marked as such, and must
+   not be misrepresented as being the original software.
 
-  For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must show them these terms so they know their
-rights.
+4. The name of the author may not be used to endorse or promote
+   products derived from this software without specific prior written
+   permission.
 
-  We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
+GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-  Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
+Julian Seward, Guildford, Surrey, UK.
+jseward@acm.org
+bzip2/libbzip2 version 0.9.0 of 28 June 1998
 
-  Finally, any free program is threatened constantly by software
-patents. We wish to avoid the danger that redistributors of a free
-program will individually obtain patent licenses, in effect making the
-program proprietary. To prevent this, we have made it clear that any
-patent must be licensed for everyone's free use or not licensed at all.
-
-  The precise terms and conditions for copying, distribution and
-modification follow.
-
-                    GNU GENERAL PUBLIC LICENSE
-   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
-  0. This License applies to any program or other work which contains
-a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The "Program", below,
-refers to any such program or work, and a "work based on the Program"
-means either the Program or any derivative work under copyright law:
-that is to say, a work containing the Program or a portion of it,
-either verbatim or with modifications and/or translated into another
-language. (Hereinafter, translation is included without limitation in
-the term "modification".) Each licensee is addressed as "you".
-
-Activities other than copying, distribution and modification are not
-covered by this License; they are outside its scope. The act of
-running the Program is not restricted, and the output from the Program
-is covered only if its contents constitute a work based on the
-Program (independent of having been made by running the Program).
-Whether that is true depends on what the Program does.
-
-  1. You may copy and distribute verbatim copies of the Program's
-source code as you receive it, in any medium, provided that you
-conspicuously and appropriately publish on each copy an appropriate
-copyright notice and disclaimer of warranty; keep intact all the
-notices that refer to this License and to the absence of any warranty;
-and give any other recipients of the Program a copy of this License
-along with the Program.
-
-You may charge a fee for the physical act of transferring a copy, and
-you may at your option offer warranty protection in exchange for a fee.
-
-  2. You may modify your copy or copies of the Program or any portion
-of it, thus forming a work based on the Program, and copy and
-distribute such modifications or work under the terms of Section 1
-above, provided that you also meet all of these conditions:
-
-    a) You must cause the modified files to carry prominent notices
-    stating that you changed the files and the date of any change.
-
-    b) You must cause any work that you distribute or publish, that in
-    whole or in part contains or is derived from the Program or any
-    part thereof, to be licensed as a whole at no charge to all third
-    parties under the terms of this License.
-
-    c) If the modified program normally reads commands interactively
-    when run, you must cause it, when started running for such
-    interactive use in the most ordinary way, to print or display an
-    announcement including an appropriate copyright notice and a
-    notice that there is no warranty (or else, saying that you provide
-    a warranty) and that users may redistribute the program under
-    these conditions, and telling the user how to view a copy of this
-    License. (Exception: if the Program itself is interactive but
-    does not normally print such an announcement, your work based on
-    the Program is not required to print an announcement.)
-
-These requirements apply to the modified work as a whole. If
-identifiable sections of that work are not derived from the Program,
-and can be reasonably considered independent and separate works in
-themselves, then this License, and its terms, do not apply to those
-sections when you distribute them as separate works. But when you
-distribute the same sections as part of a whole which is a work based
-on the Program, the distribution of the whole must be on the terms of
-this License, whose permissions for other licensees extend to the
-entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest
-your rights to work written entirely by you; rather, the intent is to
-exercise the right to control the distribution of derivative or
-collective works based on the Program.
-
-In addition, mere aggregation of another work not based on the Program
-with the Program (or with a work based on the Program) on a volume of
-a storage or distribution medium does not bring the other work under
-the scope of this License.
-
-  3. You may copy and distribute the Program (or a work based on it,
-under Section 2) in object code or executable form under the terms of
-Sections 1 and 2 above provided that you also do one of the following:
-
-    a) Accompany it with the complete corresponding machine-readable
-    source code, which must be distributed under the terms of Sections
-    1 and 2 above on a medium customarily used for software interchange; or,
-
-    b) Accompany it with a written offer, valid for at least three
-    years, to give any third party, for a charge no more than your
-    cost of physically performing source distribution, a complete
-    machine-readable copy of the corresponding source code, to be
-    distributed under the terms of Sections 1 and 2 above on a medium
-    customarily used for software interchange; or,
-
-    c) Accompany it with the information you received as to the offer
-    to distribute corresponding source code. (This alternative is
-    allowed only for noncommercial distribution and only if you
-    received the program in object code or executable form with such
-    an offer, in accord with Subsection b above.)
-
-The source code for a work means the preferred form of the work for
-making modifications to it. For an executable work, complete source
-code means all the source code for all modules it contains, plus any
-associated interface definition files, plus the scripts used to
-control compilation and installation of the executable. However, as a
-special exception, the source code distributed need not include
-anything that is normally distributed (in either source or binary
-form) with the major components (compiler, kernel, and so on) of the
-operating system on which the executable runs, unless that component
-itself accompanies the executable.
-
-If distribution of executable or object code is made by offering
-access to copy from a designated place, then offering equivalent
-access to copy the source code from the same place counts as
-distribution of the source code, even though third parties are not
-compelled to copy the source along with the object code.
-
-  4. You may not copy, modify, sublicense, or distribute the Program
-except as expressly provided under this License. Any attempt
-otherwise to copy, modify, sublicense or distribute the Program is
-void, and will automatically terminate your rights under this License.
-However, parties who have received copies, or rights, from you under
-this License will not have their licenses terminated so long as such
-parties remain in full compliance.
-
-  5. You are not required to accept this License, since you have not
-signed it. However, nothing else grants you permission to modify or
-distribute the Program or its derivative works. These actions are
-prohibited by law if you do not accept this License. Therefore, by
-modifying or distributing the Program (or any work based on the
-Program), you indicate your acceptance of this License to do so, and
-all its terms and conditions for copying, distributing or modifying
-the Program or works based on it.
-
-  6. Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the
-original licensor to copy, distribute or modify the Program subject to
-these terms and conditions. You may not impose any further
-restrictions on the recipients' exercise of the rights granted herein.
-You are not responsible for enforcing compliance by third parties to
-this License.
-
-  7. If, as a consequence of a court judgment or allegation of patent
-infringement or for any other reason (not limited to patent issues),
-conditions are imposed on you (whether by court order, agreement or
-otherwise) that contradict the conditions of this License, they do not
-excuse you from the conditions of this License. If you cannot
-distribute so as to satisfy simultaneously your obligations under this
-License and any other pertinent obligations, then as a consequence you
-may not distribute the Program at all. For example, if a patent
-license would not permit royalty-free redistribution of the Program by
-all those who receive copies directly or indirectly through you, then
-the only way you could satisfy both it and this License would be to
-refrain entirely from distribution of the Program.
-
-If any portion of this section is held invalid or unenforceable under
-any particular circumstance, the balance of the section is intended to
-apply and the section as a whole is intended to apply in other
-circumstances.
-
-It is not the purpose of this section to induce you to infringe any
-patents or other property right claims or to contest validity of any
-such claims; this section has the sole purpose of protecting the
-integrity of the free software distribution system, which is
-implemented by public license practices. Many people have made
-generous contributions to the wide range of software distributed
-through that system in reliance on consistent application of that
-system; it is up to the author/donor to decide if he or she is willing
-to distribute software through any other system and a licensee cannot
-impose that choice.
-
-This section is intended to make thoroughly clear what is believed to
-be a consequence of the rest of this License.
-
-  8. If the distribution and/or use of the Program is restricted in
-certain countries either by patents or by copyrighted interfaces, the
-original copyright holder who places the Program under this License
-may add an explicit geographical distribution limitation excluding
-those countries, so that distribution is permitted only in or among
-countries not thus excluded. In such case, this License incorporates
-the limitation as if written in the body of this License.
-
-  9. The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of this License which applies to it and "any
-later version", you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-this License, you may choose any version ever published by the Free Software
-Foundation.
-
-  10. If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
-                            NO WARRANTY
-
-  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
-  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
-OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
-TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
-YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
-PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGES.
-
-                     END OF TERMS AND CONDITIONS
-
-            Appendix: How to Apply These Terms to Your New Programs
-
-  If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
-  To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
-    <one line to give the program's name and a brief idea of what it does.>
-    Copyright (C) 19yy  <name of author>
-
-    This program is free software; you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation; either version 2 of the License, or
-    (at your option) any later version.
-
-    This program is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-    GNU General Public License for more details.
-
-    You should have received a copy of the GNU General Public License
-    along with this program; if not, write to the Free Software
-    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
-    Gnomovision version 69, Copyright (C) 19yy name of author
-    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
-    This is free software, and you are welcome to redistribute it
-    under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License. Of course, the commands you use may
-be called something other than `show w' and `show c'; they could even be
-mouse-clicks or menu items--whatever suits your program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a "copyright disclaimer" for the program, if
-necessary. Here is a sample; alter the names:
-
-  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
-  `Gnomovision' (which makes passes at compilers) written by James Hacker.
-
-  <signature of Ty Coon>, 1 April 1989
-  Ty Coon, President of Vice
-
-This General Public License does not permit incorporating your program into
-proprietary programs. If your program is a subroutine library, you may
-consider it more useful to permit linking proprietary applications with the
-library. If this is what you want to do, use the GNU Library General
-Public License instead of this License.
diff --git a/Makefile b/Makefile
index 9d35b43..8ebea66 100644
--- a/Makefile
+++ b/Makefile
@@ -1,30 +1,46 @@
 
-CC = gcc
-SH = /bin/sh
+CC=gcc
+CFLAGS=-Wall -O2 -fomit-frame-pointer -fno-strength-reduce
 
-CFLAGS = -O3 -fomit-frame-pointer -funroll-loops
-
+OBJS= blocksort.o  \
+      huffman.o    \
+      crctable.o   \
+      randtable.o  \
+      compress.o   \
+      decompress.o \
+      bzlib.o
+
+all: lib bzip2 test
+
+bzip2: lib
+	$(CC) $(CFLAGS) -c bzip2.c
+	$(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2
+	$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
 
+lib: $(OBJS)
+	rm -f libbz2.a
+	ar clq libbz2.a $(OBJS)
 
-all:
-	cat words0
-	$(CC) $(CFLAGS) -o bzip2 bzip2.c
-	$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
-	rm -f bunzip2
-	ln -s ./bzip2 ./bunzip2
-	cat words1
+test: bzip2
+	@cat words1
 	./bzip2 -1 < sample1.ref > sample1.rb2
 	./bzip2 -2 < sample2.ref > sample2.rb2
-	./bunzip2 < sample1.bz2 > sample1.tst
-	./bunzip2 < sample2.bz2 > sample2.tst
-	cat words2
+	./bzip2 -d < sample1.bz2 > sample1.tst
+	./bzip2 -d < sample2.bz2 > sample2.tst
+	@cat words2
 	cmp sample1.bz2 sample1.rb2
 	cmp sample2.bz2 sample2.rb2
 	cmp sample1.tst sample1.ref
 	cmp sample2.tst sample2.ref
-	cat words3
+	@cat words3
+
+
+clean:
+	rm -f *.o libbz2.a bzip2 bzip2recover sample1.rb2 sample2.rb2 sample1.tst sample2.tst
 
+.c.o: $*.o bzlib.h bzlib_private.h
+	$(CC) $(CFLAGS) -c $*.c -o $*.o
 
-clean:
-	rm -f bzip2 bunzip2 bzip2recover sample*.tst sample*.rb2
+tarfile:
+	tar cvf interim.tar *.c *.h Makefile manual.texi manual.ps LICENSE bzip2.1 bzip2.1.preformatted bzip2.txt words1 words2 words3 sample1.ref sample2.ref sample1.bz2 sample2.bz2 *.html README CHANGES libbz2.def libbz2.dsp dlltest.dsp
 
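The new `test` target checks its four compress/decompress runs with `cmp`, so the build counts as good only if every output is byte-for-byte identical to its reference file. As a hedged illustration of what that check amounts to, here is a portable C stand-in; `files_identical` is a hypothetical name, not part of the bzip2 sources, and unlike `cmp` it reports only same or different, not the position of the first mismatch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the Makefile's `cmp a b` check:
   returns 1 if both files can be opened and contain identical bytes,
   0 otherwise (including when either file is missing). A read error
   also surfaces as EOF from getc(), which this sketch does not
   distinguish from end-of-file. */
static int files_identical(const char *path_a, const char *path_b)
{
    FILE *a, *b;
    int same;

    assert(path_a != NULL && path_b != NULL);
    a = fopen(path_a, "rb");
    b = fopen(path_b, "rb");
    same = (a != NULL && b != NULL);

    while (same) {
        int ca = getc(a);
        int cb = getc(b);
        if (ca != cb)
            same = 0;   /* differing byte, or one file is shorter */
        else if (ca == EOF)
            break;      /* both files ended together */
    }
    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return same;
}
```

After a successful `make test`, a call such as `files_identical("sample1.tst", "sample1.ref")` should return 1, matching a zero exit status from `cmp`.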
diff --git a/README b/README
index d58bb49..2f59ef7 100644
--- a/README
+++ b/README
@@ -1,194 +1,61 @@
 
-GREETINGS!
 
-   This is the README for bzip2, my block-sorting file compressor,
-   version 0.1.
+This is the README for bzip2, a block-sorting file compressor, version
+0.9.0. This version is fully compatible with the previous public
+release, bzip2-0.1pl2.
 
-   bzip2 is distributed under the GNU General Public License version 2;
-   for details, see the file LICENSE. Pointers to the algorithms used
-   are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted.
+bzip2-0.9.0 is distributed under a BSD-style license. For details,
+see the file LICENSE.
 
-   Please read all of this file carefully.
+Complete documentation is available in Postscript form (manual.ps)
+or html (manual_toc.html). A plain-text version of the manual page is
+available as bzip2.txt.
 
 
+HOW TO BUILD -- UNIX
 
-HOW TO BUILD
+Type `make'.
 
-   -- for UNIX:
+This creates binaries "bzip2" and "bzip2recover".
 
-        Type `make'. (tough, huh? :-)
+It also runs four compress-decompress tests to make sure things are
+working properly. If all goes well, you should be up & running.
+Please be sure to read the output from `make' just to be sure that the
+tests went ok.
 
-        This creates binaries "bzip2", and "bunzip2",
-        which is a symbolic link to "bzip2".
+To install bzip2 properly:
 
-        It also runs four compress-decompress tests to make sure
-        things are working properly. If all goes well, you should be up &
-        running. Please be sure to read the output from `make'
-        just to be sure that the tests went ok.
+* Copy the binaries "bzip2" and "bzip2recover" to a publicly visible
+  place, possibly /usr/bin or /usr/local/bin.
 
-        To install bzip2 properly:
+* In that directory, make "bunzip2" and "bzcat" be symbolic links
+  to "bzip2".
 
-           -- Copy the binary "bzip2" to a publicly visible place,
-              possibly /usr/bin, /usr/common/bin or /usr/local/bin.
-
-           -- In that directory, make "bunzip2" be a symbolic link
-              to "bzip2".
-
-           -- Copy the manual page, bzip2.1, to the relevant place.
-              Probably the right place is /usr/man/man1/.
-
-   -- for Windows 95 and NT:
+* Copy the manual page, bzip2.1, to the relevant place.
+  Probably the right place is /usr/man/man1/.
 
42 For a start, do you *really* want to recompile bzip2? 37If you want to program with the library, you'll need to copy libbz2.a
43 The standard distribution includes a pre-compiled version 38and bzlib.h to /usr/lib and /usr/include respectively.
44 for Windows 95 and NT, `bzip2.exe'. 39
45 40
46 This executable was created with Jacob Navia's excellent 41HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.
47 port to Win32 of Chris Fraser & David Hanson's excellent
48 ANSI C compiler, "lcc". You can get to it at the pages
49 of the CS department of Princeton University,
50 www.cs.princeton.edu.
51 I have not tried to compile this version of bzip2 with
52 a commercial C compiler such as MS Visual C, as I don't
53 have one available.
54
55 Note that lcc is designed primarily to be portable and
56 fast. Code quality is a secondary aim, so bzip2.exe
57 runs perhaps 40% slower than it could if compiled with
58 a good optimising compiler.
59
60 I compiled a previous version of bzip (0.21) with Borland
61 C 5.0, which worked fine, and with MS VC++ 2.0, which
62 didn't. Here is an comment from the README for bzip-0.21.
63
64 MS VC++ 2.0's optimising compiler has a bug which, at
65 maximum optimisation, gives an executable which produces
66 garbage compressed files. Proceed with caution.
67 I do not know whether or not this happens with later
68 versions of VC++.
69
70 Edit the defines starting at line 86 of bzip.c to
71 select your platform/compiler combination, and then compile.
72 Then check that the resulting executable (assumed to be
73 called bzip.exe) works correctly, using the SELFTEST.BAT file.
74 Bearing in mind the previous paragraph, the self-test is
75 important.
76
77 Note that the defines which bzip-0.21 had, to support
78 compilation with VC 2.0 and BC 5.0, are gone. Windows
79 is not my preferred operating system, and I am, for the
80 moment, content with the modestly fast executable created
81 by lcc-win32.
82
83 A manual page is supplied, unformatted (bzip2.1),
84 preformatted (bzip2.1.preformatted), and preformatted
85 and sanitised for MS-DOS (bzip2.txt).
86
87
88
89COMPILATION NOTES
90
91 bzip2 should work on any 32 or 64-bit machine. It is known to work
92 [meaning: it has compiled and passed self-tests] on the
93 following platform-os combinations:
94
95 Intel i386/i486 running Linux 2.0.21
96 Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5
97 Intel i386/i486 running Windows 95 and NT
98 DEC Alpha running Digital Unix 4.0
99
100 Following the release of bzip-0.21, many people mailed me
101 from around the world to say they had made it work on all sorts
102 of weird and wonderful machines. Chances are, if you have
103 a reasonable ANSI C compiler and a 32-bit machine, you can
104 get it to work.
105
106 The #defines starting at around line 82 of bzip2.c supply some
107 degree of platform-independence. If you configure bzip2 for some
108 new far-out platform which is not covered by the existing definitions,
109 please send me the relevant definitions.
110
111 I recommend GNU C for compilation. The code is standard ANSI C,
112 except for the Unix-specific file handling, so any ANSI C compiler
113 should work. Note however that the many routines marked INLINE
114 should be inlined by your compiler, else performance will be very
115 poor. Asking your compiler to unroll loops gives some
116 small improvement too; for gcc, the relevant flag is
117 -funroll-loops.
118
119 On 386/486 machines, I'd recommend giving gcc the
120 -fomit-frame-pointer flag; this liberates another register for
121 allocation, which measurably improves performance.
122
123 I used the above-mentioned lcc compiler to develop bzip2.
124 I would highly recommend this compiler for day-to-day development;
125 it is fast, reliable, lightweight, has an excellent profiler,
126 and is generally excellent. And it's fun to retarget, if you're
127 into that kind of thing.
128
129 If you compile bzip2 on a new platform or with a new compiler,
130 please be sure to run the four compress-decompress tests, either
131 using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2)
132 files. Some compilers have been seen to introduce subtle bugs
133 when optimising, so this check is important. Ideally you should
134 then go on to test bzip2 on a file several megabytes or even
135 tens of megabytes long, just to be 110% sure. ``Professional
136 programmers are paranoid programmers.'' (anon).
137 42
43It's difficult for me to support compilation on all these platforms.
44My approach is to collect binaries for these platforms, and put them
45on my web page (http://www.muraroa.demon.co.uk). Look there.
138 46
139 47
140VALIDATION 48VALIDATION
141 49
142 Correct operation, in the sense that a compressed file can always be 50Correct operation, in the sense that a compressed file can always be
143 decompressed to reproduce the original, is obviously of paramount 51decompressed to reproduce the original, is obviously of paramount
144 importance. To validate bzip2, I used a modified version of 52importance. To validate bzip2, I used a modified version of Mark
145 Mark Nelson's churn program. Churn is an automated test driver 53Nelson's churn program. Churn is an automated test driver which
146 which recursively traverses a directory structure, using bzip2 to 54recursively traverses a directory structure, using bzip2 to compress
147 compress and then decompress each file it encounters, and checking 55and then decompress each file it encounters, and checking that the
148 that the decompressed data is the same as the original. As test 56decompressed data is the same as the original. There are more details
149 material, I used several runs over several filesystems of differing 57in Section 4 of the user guide.
150 sizes.
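The comparison step at the heart of such a round-trip test can be sketched in C. This is a minimal illustration under stated assumptions, not churn or the distribution's own test code: `first_mismatch` is a hypothetical helper, and it assumes the original file and its compressed-then-decompressed copy are already available as open streams.

```c
#include <stdio.h>

/* Hypothetical helper (not part of churn or bzip2): compare two open
   streams byte by byte. Returns -1 if they are identical, otherwise the
   zero-based offset of the first difference. A length mismatch counts
   as a difference at the end of the shorter stream. */
long first_mismatch(FILE *orig, FILE *roundtrip)
{
    long offset = 0;
    for (;;) {
        int a = fgetc(orig);
        int b = fgetc(roundtrip);
        if (a != b) return offset;   /* differing byte, or one stream ended early */
        if (a == EOF) return -1;     /* both streams ended together: identical */
        offset++;
    }
}
```

A churn-style driver would call this once per file after compressing and decompressing it, and treat any result other than -1 as a failure worth investigating.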
151
152 One set of tests was done on my base Linux filesystem,
153 410 megabytes in 23,000 files. There were several runs over
154 this filesystem, in various configurations designed to break bzip2.
155 That filesystem also contained some specially constructed test
156 files designed to exercise boundary cases in the code.
157 This included files of zero length, various long, highly repetitive
158 files, and some files which generate blocks with all values the same.
159 58
160 The other set of tests was done just with the "normal" configuration,
161 but on a much larger quantity of data.
162
163 Tests are:
164
165 Linux FS, 410M, 23000 files
166
167 As above, with --repetitive-fast
168
169 As above, with -1
170
171 Low level disk image of a disk containing
172 Windows NT4.0; 420M in a single huge file
173
174 Linux distribution, incl Slackware,
175 all GNU sources. 1900M in 2300 files.
176
177 Approx ~100M compiler sources and related
178 programming tools, running under Purify.
179
180 About 500M of data in 120 files of around
181 4 M each. This is raw data from a
182 biomagnetometer (SQUID-based thing).
183
184 Overall, total volume of test data is about
185 3300 megabytes in 25000 files.
186
187 The distribution runs four tests after building bzip. These tests
188 include test decompressions of pre-supplied compressed files, so
189 they check not only that bzip works correctly on the machine it was
190 built on, but also that it can decompress files compressed on a
191 different machine. This guards against unforeseen interoperability problems.
192 59
193 60
194Please read and be aware of the following: 61Please read and be aware of the following:
@@ -234,14 +101,30 @@ PATENTS:
234End of legalities. 101End of legalities.
235 102
236 103
104WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ?
105
106 * Approx 10% faster compression, 30% faster decompression
107 * -t (test mode) is a lot quicker
108 * Can decompress concatenated compressed files
109 * Programming interface, so programs can directly read/write .bz2 files
110 * Less restrictive (BSD-style) licensing
111 * Flag handling more compatible with GNU gzip
112 * Much more documentation, i.e., a proper user manual
113 * Hopefully, improved portability (at least of the library)
114
115
237I hope you find bzip2 useful. Feel free to contact me at 116I hope you find bzip2 useful. Feel free to contact me at
238 jseward@acm.org 117 jseward@acm.org
239if you have any suggestions or queries. Many people mailed me with 118if you have any suggestions or queries. Many people mailed me with
240comments, suggestions and patches after the releases of 0.15 and 0.21, 119comments, suggestions and patches after the releases of bzip-0.15,
241and the changes in bzip2 are largely a result of this feedback. 120bzip-0.21 and bzip2-0.1pl2, and the changes in bzip2 are largely a
242I thank you for your comments. 121result of this feedback. I thank you for your comments.
122
123At least for the time being, bzip2's "home" is
124http://www.muraroa.demon.co.uk.
243 125
244Julian Seward 126Julian Seward
127jseward@acm.org
245 128
246Manchester, UK 129Manchester, UK
24718 July 1996 (version 0.15) 13018 July 1996 (version 0.15)
@@ -250,4 +133,5 @@ Manchester, UK
250Guildford, Surrey, UK 133Guildford, Surrey, UK
2517 August 1997 (bzip2, version 0.1) 1347 August 1997 (bzip2, version 0.1)
25229 August 1997 (bzip2, version 0.1pl2) 13529 August 1997 (bzip2, version 0.1pl2)
13623 August 1998 (bzip2, version 0.9.0)
253 137
diff --git a/README.DOS b/README.DOS
deleted file mode 100644
index 048de8c..0000000
--- a/README.DOS
+++ /dev/null
@@ -1,16 +0,0 @@
1
2As of today (3 March 1998) I've removed the
3Win95/NT executables from this distribution, sorry.
4
5You can still get an executable from
6http://www.muraroa.demon.co.uk, or (as a last
7resort) by mailing me at jseward@acm.org.
8
9The reason for this change of packaging is that it
10makes it easier for me to fix problems with specific
11executables if they are not included in the main
12distribution.
13
14J
15
16
diff --git a/blocksort.c b/blocksort.c
new file mode 100644
index 0000000..d8bb26a
--- /dev/null
+++ b/blocksort.c
@@ -0,0 +1,709 @@
1
2/*-------------------------------------------------------------*/
3/*--- Block sorting machinery ---*/
4/*--- blocksort.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#include "bzlib_private.h"
63
64/*---------------------------------------------*/
65/*--
66 Compare two strings in block. We assume (see
67 discussion above) that i1 and i2 have a max
68 offset of 10 on entry, and that the first
69 bytes of both block and quadrant have been
70 copied into the "overshoot area", ie
71 into the subscript range
72 [nblock .. nblock+NUM_OVERSHOOT_BYTES-1].
73--*/
74static __inline__ Bool fullGtU ( UChar* block,
75 UInt16* quadrant,
76 UInt32 nblock,
77 Int32* workDone,
78 Int32 i1,
79 Int32 i2
80 )
81{
82 Int32 k;
83 UChar c1, c2;
84 UInt16 s1, s2;
85
86 AssertD ( i1 != i2, "fullGtU(1)" );
87
88 c1 = block[i1];
89 c2 = block[i2];
90 if (c1 != c2) return (c1 > c2);
91 i1++; i2++;
92
93 c1 = block[i1];
94 c2 = block[i2];
95 if (c1 != c2) return (c1 > c2);
96 i1++; i2++;
97
98 c1 = block[i1];
99 c2 = block[i2];
100 if (c1 != c2) return (c1 > c2);
101 i1++; i2++;
102
103 c1 = block[i1];
104 c2 = block[i2];
105 if (c1 != c2) return (c1 > c2);
106 i1++; i2++;
107
108 c1 = block[i1];
109 c2 = block[i2];
110 if (c1 != c2) return (c1 > c2);
111 i1++; i2++;
112
113 c1 = block[i1];
114 c2 = block[i2];
115 if (c1 != c2) return (c1 > c2);
116 i1++; i2++;
117
118 k = nblock;
119
120 do {
121
122 c1 = block[i1];
123 c2 = block[i2];
124 if (c1 != c2) return (c1 > c2);
125 s1 = quadrant[i1];
126 s2 = quadrant[i2];
127 if (s1 != s2) return (s1 > s2);
128 i1++; i2++;
129
130 c1 = block[i1];
131 c2 = block[i2];
132 if (c1 != c2) return (c1 > c2);
133 s1 = quadrant[i1];
134 s2 = quadrant[i2];
135 if (s1 != s2) return (s1 > s2);
136 i1++; i2++;
137
138 c1 = block[i1];
139 c2 = block[i2];
140 if (c1 != c2) return (c1 > c2);
141 s1 = quadrant[i1];
142 s2 = quadrant[i2];
143 if (s1 != s2) return (s1 > s2);
144 i1++; i2++;
145
146 c1 = block[i1];
147 c2 = block[i2];
148 if (c1 != c2) return (c1 > c2);
149 s1 = quadrant[i1];
150 s2 = quadrant[i2];
151 if (s1 != s2) return (s1 > s2);
152 i1++; i2++;
153
154 if (i1 >= nblock) i1 -= nblock;
155 if (i2 >= nblock) i2 -= nblock;
156
157 k -= 4;
158 (*workDone)++;
159 }
160 while (k >= 0);
161
162 return False;
163}
164
165/*---------------------------------------------*/
166/*--
167 Knuth's increments seem to work better
168 than Incerpi-Sedgewick here. Possibly
169 because the number of elems to sort is
170 usually small, typically <= 20.
171--*/
172static Int32 incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280,
173 9841, 29524, 88573, 265720,
174 797161, 2391484 };
175
176static void simpleSort ( EState* s, Int32 lo, Int32 hi, Int32 d )
177{
178 Int32 i, j, h, bigN, hp;
179 Int32 v;
180
181 UChar* block = s->block;
182 UInt32* zptr = s->zptr;
183 UInt16* quadrant = s->quadrant;
184 Int32* workDone = &(s->workDone);
185 Int32 nblock = s->nblock;
186 Int32 workLimit = s->workLimit;
187 Bool firstAttempt = s->firstAttempt;
188
189 bigN = hi - lo + 1;
190 if (bigN < 2) return;
191
192 hp = 0;
193 while (incs[hp] < bigN) hp++;
194 hp--;
195
196 for (; hp >= 0; hp--) {
197 h = incs[hp];
198 i = lo + h;
199 while (True) {
200
201 /*-- copy 1 --*/
202 if (i > hi) break;
203 v = zptr[i];
204 j = i;
205 while ( fullGtU ( block, quadrant, nblock, workDone,
206 zptr[j-h]+d, v+d ) ) {
207 zptr[j] = zptr[j-h];
208 j = j - h;
209 if (j <= (lo + h - 1)) break;
210 }
211 zptr[j] = v;
212 i++;
213
214 /*-- copy 2 --*/
215 if (i > hi) break;
216 v = zptr[i];
217 j = i;
218 while ( fullGtU ( block, quadrant, nblock, workDone,
219 zptr[j-h]+d, v+d ) ) {
220 zptr[j] = zptr[j-h];
221 j = j - h;
222 if (j <= (lo + h - 1)) break;
223 }
224 zptr[j] = v;
225 i++;
226
227 /*-- copy 3 --*/
228 if (i > hi) break;
229 v = zptr[i];
230 j = i;
231 while ( fullGtU ( block, quadrant, nblock, workDone,
232 zptr[j-h]+d, v+d ) ) {
233 zptr[j] = zptr[j-h];
234 j = j - h;
235 if (j <= (lo + h - 1)) break;
236 }
237 zptr[j] = v;
238 i++;
239
240 if (*workDone > workLimit && firstAttempt) return;
241 }
242 }
243}
244
245
246/*---------------------------------------------*/
247/*--
248 The following is an implementation of
249 an elegant 3-way quicksort for strings,
250 described in a paper "Fast Algorithms for
251 Sorting and Searching Strings", by Robert
252 Sedgewick and Jon L. Bentley.
253--*/
254
255#define swap(lv1, lv2) \
256 { Int32 tmp = lv1; lv1 = lv2; lv2 = tmp; }
257
258static void vswap ( UInt32* zptr, Int32 p1, Int32 p2, Int32 n )
259{
260 while (n > 0) {
261 swap(zptr[p1], zptr[p2]);
262 p1++; p2++; n--;
263 }
264}
265
266static UChar med3 ( UChar a, UChar b, UChar c )
267{
268 UChar t;
269 if (a > b) { t = a; a = b; b = t; };
270 if (b > c) { t = b; b = c; c = t; };
271 if (a > b) b = a;
272 return b;
273}
274
275
276#define min(a,b) (((a) < (b)) ? (a) : (b))
277
278typedef
279 struct { Int32 ll; Int32 hh; Int32 dd; }
280 StackElem;
281
282#define push(lz,hz,dz) { stack[sp].ll = lz; \
283 stack[sp].hh = hz; \
284 stack[sp].dd = dz; \
285 sp++; }
286
287#define pop(lz,hz,dz) { sp--; \
288 lz = stack[sp].ll; \
289 hz = stack[sp].hh; \
290 dz = stack[sp].dd; }
291
292#define SMALL_THRESH 20
293#define DEPTH_THRESH 10
294
295/*--
296 If you are ever unlucky/improbable enough
297 to get a stack overflow whilst sorting,
298 increase the following constant and try
299 again. In practice I have never seen the
300 stack go above 27 elems, so the following
301 limit seems very generous.
302--*/
303#define QSORT_STACK_SIZE 1000
304
305
306static void qSort3 ( EState* s, Int32 loSt, Int32 hiSt, Int32 dSt )
307{
308 Int32 unLo, unHi, ltLo, gtHi, med, n, m;
309 Int32 sp, lo, hi, d;
310 StackElem stack[QSORT_STACK_SIZE];
311
312 UChar* block = s->block;
313 UInt32* zptr = s->zptr;
314 Int32* workDone = &(s->workDone);
315 Int32 workLimit = s->workLimit;
316 Bool firstAttempt = s->firstAttempt;
317
318 sp = 0;
319 push ( loSt, hiSt, dSt );
320
321 while (sp > 0) {
322
323 AssertH ( sp < QSORT_STACK_SIZE, 1001 );
324
325 pop ( lo, hi, d );
326
327 if (hi - lo < SMALL_THRESH || d > DEPTH_THRESH) {
328 simpleSort ( s, lo, hi, d );
329 if (*workDone > workLimit && firstAttempt) return;
330 continue;
331 }
332
333 med = med3 ( block[zptr[ lo ]+d],
334 block[zptr[ hi ]+d],
335 block[zptr[ (lo+hi)>>1 ]+d] );
336
337 unLo = ltLo = lo;
338 unHi = gtHi = hi;
339
340 while (True) {
341 while (True) {
342 if (unLo > unHi) break;
343 n = ((Int32)block[zptr[unLo]+d]) - med;
344 if (n == 0) { swap(zptr[unLo], zptr[ltLo]); ltLo++; unLo++; continue; };
345 if (n > 0) break;
346 unLo++;
347 }
348 while (True) {
349 if (unLo > unHi) break;
350 n = ((Int32)block[zptr[unHi]+d]) - med;
351 if (n == 0) { swap(zptr[unHi], zptr[gtHi]); gtHi--; unHi--; continue; };
352 if (n < 0) break;
353 unHi--;
354 }
355 if (unLo > unHi) break;
356 swap(zptr[unLo], zptr[unHi]); unLo++; unHi--;
357 }
358
359 AssertD ( unHi == unLo-1, "bad termination in qSort3" );
360
361 if (gtHi < ltLo) {
362 push(lo, hi, d+1 );
363 continue;
364 }
365
366 n = min(ltLo-lo, unLo-ltLo); vswap(zptr, lo, unLo-n, n);
367 m = min(hi-gtHi, gtHi-unHi); vswap(zptr, unLo, hi-m+1, m);
368
369 n = lo + unLo - ltLo - 1;
370 m = hi - (gtHi - unHi) + 1;
371
372 push ( lo, n, d );
373 push ( n+1, m-1, d+1 );
374 push ( m, hi, d );
375 }
376}
377
378
379/*---------------------------------------------*/
380
381#define BIGFREQ(b) (ftab[((b)+1) << 8] - ftab[(b) << 8])
382
383#define SETMASK (1 << 21)
384#define CLEARMASK (~(SETMASK))
385
386static void sortMain ( EState* s )
387{
388 Int32 i, j, k, ss, sb;
389 Int32 runningOrder[256];
390 Int32 copy[256];
391 Bool bigDone[256];
392 UChar c1, c2;
393 Int32 numQSorted;
394
395 UChar* block = s->block;
396 UInt32* zptr = s->zptr;
397 UInt16* quadrant = s->quadrant;
398 Int32* ftab = s->ftab;
399 Int32* workDone = &(s->workDone);
400 Int32 nblock = s->nblock;
401 Int32 workLimit = s->workLimit;
402 Bool firstAttempt = s->firstAttempt;
403
404 /*--
405 In the various block-sized structures, live data runs
406 from 0 to last+NUM_OVERSHOOT_BYTES inclusive. First,
407 set up the overshoot area for block.
408 --*/
409
410 if (s->verbosity >= 4)
411 VPrintf0( " sort initialise ...\n" );
412
413 for (i = 0; i < BZ_NUM_OVERSHOOT_BYTES; i++)
414 block[nblock+i] = block[i % nblock];
415 for (i = 0; i < nblock+BZ_NUM_OVERSHOOT_BYTES; i++)
416 quadrant[i] = 0;
417
418
419 if (nblock <= 4000) {
420
421 /*--
422 Use simpleSort(), since the full sorting mechanism
423 has quite a large constant overhead.
424 --*/
425 if (s->verbosity >= 4) VPrintf0( " simpleSort ...\n" );
426 for (i = 0; i < nblock; i++) zptr[i] = i;
427 firstAttempt = False;
428 *workDone = workLimit = 0;
429 simpleSort ( s, 0, nblock-1, 0 );
430 if (s->verbosity >= 4) VPrintf0( " simpleSort done.\n" );
431
432 } else {
433
434 numQSorted = 0;
435 for (i = 0; i <= 255; i++) bigDone[i] = False;
436
437 if (s->verbosity >= 4) VPrintf0( " bucket sorting ...\n" );
438
439 for (i = 0; i <= 65536; i++) ftab[i] = 0;
440
441 c1 = block[nblock-1];
442 for (i = 0; i < nblock; i++) {
443 c2 = block[i];
444 ftab[(c1 << 8) + c2]++;
445 c1 = c2;
446 }
447
448 for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1];
449
450 c1 = block[0];
451 for (i = 0; i < nblock-1; i++) {
452 c2 = block[i+1];
453 j = (c1 << 8) + c2;
454 c1 = c2;
455 ftab[j]--;
456 zptr[ftab[j]] = i;
457 }
458 j = (block[nblock-1] << 8) + block[0];
459 ftab[j]--;
460 zptr[ftab[j]] = nblock-1;
461
462 /*--
463 Now ftab contains the first loc of every small bucket.
464 Calculate the running order, from smallest to largest
465 big bucket.
466 --*/
467
468 for (i = 0; i <= 255; i++) runningOrder[i] = i;
469
470 {
471 Int32 vv;
472 Int32 h = 1;
473 do h = 3 * h + 1; while (h <= 256);
474 do {
475 h = h / 3;
476 for (i = h; i <= 255; i++) {
477 vv = runningOrder[i];
478 j = i;
479 while ( BIGFREQ(runningOrder[j-h]) > BIGFREQ(vv) ) {
480 runningOrder[j] = runningOrder[j-h];
481 j = j - h;
482 if (j <= (h - 1)) goto zero;
483 }
484 zero:
485 runningOrder[j] = vv;
486 }
487 } while (h != 1);
488 }
489
490 /*--
491 The main sorting loop.
492 --*/
493
494 for (i = 0; i <= 255; i++) {
495
496 /*--
497 Process big buckets, starting with the least full.
498 Basically this is a 4-step process in which we call
499 qSort3 to sort the small buckets [ss, j], but
500 also make a big effort to avoid the calls if we can.
501 --*/
502 ss = runningOrder[i];
503
504 /*--
505 Step 1:
506 Complete the big bucket [ss] by quicksorting
507 any unsorted small buckets [ss, j], for j != ss.
508 Hopefully previous pointer-scanning phases have already
509 completed many of the small buckets [ss, j], so
510 we don't have to sort them at all.
511 --*/
512 for (j = 0; j <= 255; j++) {
513 if (j != ss) {
514 sb = (ss << 8) + j;
515 if ( ! (ftab[sb] & SETMASK) ) {
516 Int32 lo = ftab[sb] & CLEARMASK;
517 Int32 hi = (ftab[sb+1] & CLEARMASK) - 1;
518 if (hi > lo) {
519 if (s->verbosity >= 4)
520 VPrintf4( " qsort [0x%x, 0x%x] done %d this %d\n",
521 ss, j, numQSorted, hi - lo + 1 );
522 qSort3 ( s, lo, hi, 2 );
523 numQSorted += ( hi - lo + 1 );
524 if (*workDone > workLimit && firstAttempt) return;
525 }
526 }
527 ftab[sb] |= SETMASK;
528 }
529 }
530
531 /*--
532 Step 2:
533 Deal specially with case [ss, ss]. This establishes the
534 sorted order for [ss, ss] without any comparisons.
535 A clever trick, cryptically described as steps Q6b and Q6c
536 in SRC-124 (aka BW94). This makes it entirely practical to
537 not use a preliminary run-length coder, but unfortunately
538 we are now stuck with the .bz2 file format.
539 --*/
540 {
541 Int32 put0, get0, put1, get1;
542 Int32 sbn = (ss << 8) + ss;
543 Int32 lo = ftab[sbn] & CLEARMASK;
544 Int32 hi = (ftab[sbn+1] & CLEARMASK) - 1;
545 UChar ssc = (UChar)ss;
546 put0 = lo;
547 get0 = ftab[ss << 8] & CLEARMASK;
548 put1 = hi;
549 get1 = (ftab[(ss+1) << 8] & CLEARMASK) - 1;
550 while (get0 < put0) {
551 j = zptr[get0]-1; if (j < 0) j += nblock;
552 c1 = block[j];
553 if (c1 == ssc) { zptr[put0] = j; put0++; };
554 get0++;
555 }
556 while (get1 > put1) {
557 j = zptr[get1]-1; if (j < 0) j += nblock;
558 c1 = block[j];
559 if (c1 == ssc) { zptr[put1] = j; put1--; };
560 get1--;
561 }
562 ftab[sbn] |= SETMASK;
563 }
564
565 /*--
566 Step 3:
567 The [ss] big bucket is now done. Record this fact,
568 and update the quadrant descriptors. Remember to
569 update quadrants in the overshoot area too, if
570 necessary. The "if (i < 255)" test merely skips
571 this updating for the last bucket processed, since
572 updating for the last bucket is pointless.
573
574 The quadrant array provides a way to incrementally
575 cache sort orderings, as they appear, so as to
576 make subsequent comparisons in fullGtU() complete
577 faster. For repetitive blocks this makes a big
578 difference (but not big enough to be able to avoid
579 randomisation for very repetitive data.)
580
581 The precise meaning is: at all times:
582
583 for 0 <= i < nblock and 0 <= j <= nblock
584
585 if block[i] != block[j],
586
587 then the relative values of quadrant[i] and
588 quadrant[j] are meaningless.
589
590 else {
591 if quadrant[i] < quadrant[j]
592 then the string starting at i lexicographically
593 precedes the string starting at j
594
595 else if quadrant[i] > quadrant[j]
596 then the string starting at j lexicographically
597 precedes the string starting at i
598
599 else
600 the relative ordering of the strings starting
601 at i and j has not yet been determined.
602 }
603 --*/
604 bigDone[ss] = True;
605
606 if (i < 255) {
607 Int32 bbStart = ftab[ss << 8] & CLEARMASK;
608 Int32 bbSize = (ftab[(ss+1) << 8] & CLEARMASK) - bbStart;
609 Int32 shifts = 0;
610
611 while ((bbSize >> shifts) > 65534) shifts++;
612
613 for (j = 0; j < bbSize; j++) {
614 Int32 a2update = zptr[bbStart + j];
615 UInt16 qVal = (UInt16)(j >> shifts);
616 quadrant[a2update] = qVal;
617 if (a2update < BZ_NUM_OVERSHOOT_BYTES)
618 quadrant[a2update + nblock] = qVal;
619 }
620
621 AssertH ( ( ((bbSize-1) >> shifts) <= 65535 ), 1002 );
622 }
623
624 /*--
625 Step 4:
626 Now scan this big bucket [ss] so as to synthesise the
627 sorted order for small buckets [t, ss] for all t != ss.
628 This will avoid doing Real Work in subsequent Step 1's.
629 --*/
630 for (j = 0; j <= 255; j++)
631 copy[j] = ftab[(j << 8) + ss] & CLEARMASK;
632
633 for (j = ftab[ss << 8] & CLEARMASK;
634 j < (ftab[(ss+1) << 8] & CLEARMASK);
635 j++) {
636 k = zptr[j]-1; if (k < 0) k += nblock;
637 c1 = block[k];
638 if ( ! bigDone[c1] ) {
639 zptr[copy[c1]] = k;
640 copy[c1] ++;
641 }
642 }
643
644 for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK;
645 }
646 if (s->verbosity >= 4)
647 VPrintf3( " %d pointers, %d sorted, %d scanned\n",
648 nblock, numQSorted, nblock - numQSorted );
649 }
650}
651
652
653/*---------------------------------------------*/
654static void randomiseBlock ( EState* s )
655{
656 Int32 i;
657 BZ_RAND_INIT_MASK;
658 for (i = 0; i < 256; i++) s->inUse[i] = False;
659
660 for (i = 0; i < s->nblock; i++) {
661 BZ_RAND_UPD_MASK;
662 s->block[i] ^= BZ_RAND_MASK;
663 s->inUse[s->block[i]] = True;
664 }
665}
666
667
668/*---------------------------------------------*/
669void blockSort ( EState* s )
670{
671 Int32 i;
672
673 s->workLimit = s->workFactor * (s->nblock - 1);
674 s->workDone = 0;
675 s->blockRandomised = False;
676 s->firstAttempt = True;
677
678 sortMain ( s );
679
680 if (s->verbosity >= 3)
681 VPrintf3( " %d work, %d block, ratio %5.2f\n",
682 s->workDone, s->nblock-1,
683 (float)(s->workDone) / (float)(s->nblock-1) );
684
685 if (s->workDone > s->workLimit && s->firstAttempt) {
686 if (s->verbosity >= 2)
687 VPrintf0( " sorting aborted; randomising block\n" );
688 randomiseBlock ( s );
689 s->workLimit = s->workDone = 0;
690 s->blockRandomised = True;
691 s->firstAttempt = False;
692 sortMain ( s );
693 if (s->verbosity >= 3)
694 VPrintf3( " %d work, %d block, ratio %f\n",
695 s->workDone, s->nblock-1,
696 (float)(s->workDone) / (float)(s->nblock-1) );
697 }
698
699 s->origPtr = -1;
700 for (i = 0; i < s->nblock; i++)
701 if (s->zptr[i] == 0)
702 { s->origPtr = i; break; };
703
704 AssertH( s->origPtr != -1, 1003 );
705}
706
707/*-------------------------------------------------------------*/
708/*--- end blocksort.c ---*/
709/*-------------------------------------------------------------*/
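For orientation, the result that `blockSort` computes can also be obtained, far more slowly, by sorting all rotations of the block directly. The sketch below is such a reference version, offered as an illustration rather than the libbzip2 method: the name `naive_bwt` and its interface are hypothetical. It fills a `zptr`-style array with sorted rotation start positions, records `origPtr` as the sorted index of rotation 0 (mirroring the final loop of `blockSort`), and writes the byte preceding each sorted rotation, i.e. the last column of the sorted rotation matrix. Unlike `sortMain`, it uses no bucket sort, quadrant cache, work limit or randomisation, so it costs O(n^2 log n) in the worst case and is only suitable for cross-checking small blocks.

```c
#include <stdlib.h>

/* Context for the qsort comparator (plain qsort has no user-data argument). */
static const unsigned char *g_block;
static int g_n;

/* Order two rotations of g_block, identified by their start offsets. */
static int cmp_rotations(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    int k;
    for (k = 0; k < g_n; k++) {
        unsigned char ca = g_block[(a + k) % g_n];
        unsigned char cb = g_block[(b + k) % g_n];
        if (ca != cb) return (ca > cb) ? 1 : -1;
    }
    return 0;  /* identical rotations (periodic block): either order is fine */
}

/* Hypothetical reference BWT: writes the last column into out[0..n-1]
   and returns origPtr (the sorted position of rotation 0), or -1 on error. */
int naive_bwt(const unsigned char *block, int n, unsigned char *out)
{
    int *zptr;
    int i, origPtr = -1;
    if (n <= 0) return -1;
    zptr = malloc((size_t)n * sizeof *zptr);
    if (zptr == NULL) return -1;
    for (i = 0; i < n; i++) zptr[i] = i;
    g_block = block;
    g_n = n;
    qsort(zptr, (size_t)n, sizeof *zptr, cmp_rotations);
    for (i = 0; i < n; i++) {
        /* Last column: the byte just before each sorted rotation start. */
        out[i] = block[(zptr[i] + n - 1) % n];
        if (zptr[i] == 0) origPtr = i;
    }
    free(zptr);
    return origPtr;
}
```

For the block "banana" this produces the last column "nnbaaa" with origPtr 3.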
diff --git a/bzip2.1 b/bzip2.1
index 489668f..a6789a4 100644
--- a/bzip2.1
+++ b/bzip2.1
@@ -1,21 +1,29 @@
1.PU 1.PU
2.TH bzip2 1 2.TH bzip2 1
3.SH NAME 3.SH NAME
4bzip2, bunzip2 \- a block-sorting file compressor, v0.1 4bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0
5.br
6bzcat \- decompresses files to stdout
5.br 7.br
6bzip2recover \- recovers data from damaged bzip2 files 8bzip2recover \- recovers data from damaged bzip2 files
7 9
8.SH SYNOPSIS 10.SH SYNOPSIS
9.ll +8 11.ll +8
10.B bzip2 12.B bzip2
11.RB [ " \-cdfkstvVL123456789 " ] 13.RB [ " \-cdfkstvzVL123456789 " ]
12[ 14[
13.I "filenames \&..." 15.I "filenames \&..."
14] 16]
15.ll -8 17.ll -8
16.br 18.br
17.B bunzip2 19.B bunzip2
18.RB [ " \-kvsVL " ] 20.RB [ " \-fkvsVL " ]
21[
22.I "filenames \&..."
23]
24.br
25.B bzcat
26.RB [ " \-s " ]
19[ 27[
20.I "filenames \&..." 28.I "filenames \&..."
21] 29]
@@ -24,7 +32,7 @@ bzip2recover \- recovers data from damaged bzip2 files
24.I "filename" 32.I "filename"
25 33
26.SH DESCRIPTION 34.SH DESCRIPTION
27.I Bzip2 35.I bzip2
28compresses files using the Burrows-Wheeler block-sorting 36compresses files using the Burrows-Wheeler block-sorting
29text compression algorithm, and Huffman coding. 37text compression algorithm, and Huffman coding.
30Compression is generally considerably 38Compression is generally considerably
@@ -38,7 +46,7 @@ those of
38.I GNU Gzip, 46.I GNU Gzip,
39but they are not identical. 47but they are not identical.
40 48
41.I Bzip2 49.I bzip2
42expects a list of file names to accompany the command-line flags. 50expects a list of file names to accompany the command-line flags.
43Each file is replaced by a compressed version of itself, 51Each file is replaced by a compressed version of itself,
44with the name "original_name.bz2". 52with the name "original_name.bz2".
@@ -50,11 +58,11 @@ original file names, permissions and dates in filesystems
50which lack these concepts, or have serious file name length 58which lack these concepts, or have serious file name length
51restrictions, such as MS-DOS. 59restrictions, such as MS-DOS.
52 60
53.I Bzip2 61.I bzip2
54and 62and
55.I bunzip2 63.I bunzip2
56will not overwrite existing files; if you want this to happen, 64will by default not overwrite existing files;
57you should delete them first. 65if you want this to happen, specify the \-f flag.
58 66
59If no file names are specified, 67If no file names are specified,
60.I bzip2 68.I bzip2
@@ -64,7 +72,7 @@ In this case,
64will decline to write compressed output to a terminal, as 72will decline to write compressed output to a terminal, as
65this would be entirely incomprehensible and therefore pointless. 73this would be entirely incomprehensible and therefore pointless.
66 74
67.I Bunzip2 75.I bunzip2
68(or 76(or
69.I bzip2 \-d 77.I bzip2 \-d
70) decompresses and restores all specified files whose names 78) decompresses and restores all specified files whose names
@@ -73,12 +81,28 @@ Files without this suffix are ignored.
73Again, supplying no filenames 81Again, supplying no filenames
74causes decompression from standard input to standard output. 82causes decompression from standard input to standard output.
75 83
84.I bunzip2
85will correctly decompress a file which is the concatenation
86of two or more compressed files. The result is the concatenation
87of the corresponding uncompressed files. Integrity testing
88(\-t) of concatenated compressed files is also supported.
89
76You can also compress or decompress files to 90You can also compress or decompress files to
77the standard output by giving the \-c flag. 91the standard output by giving the \-c flag.
78You can decompress multiple files like this, but you may 92Multiple files may be compressed and decompressed like this.
79only compress a single file this way, since it would otherwise 93The resulting outputs are fed sequentially to stdout.
80be difficult to separate out the compressed representations of 94Compression of multiple files in this manner generates
81the original files. 95a stream containing multiple compressed file representations.
96Such a stream can be decompressed correctly only by
97.I bzip2
98version 0.9.0 or later. Earlier versions of
99.I bzip2
100will stop after decompressing the first file in the stream.
101
102.I bzcat
103(or
104.I bzip2 \-dc
105) decompresses all specified files to the standard output.
82 106
83Compression is always performed, even if the compressed file is 107Compression is always performed, even if the compressed file is
84slightly larger than the original. Files of less than about 108slightly larger than the original. Files of less than about
@@ -132,7 +156,7 @@ Compression and decompression requirements, in bytes, can be estimated as:
132 156
133 Compression: 400k + ( 7 x block size ) 157 Compression: 400k + ( 7 x block size )
134 158
135 Decompression: 100k + ( 5 x block size ), or 159 Decompression: 100k + ( 4 x block size ), or
136.br 160.br
137 100k + ( 2.5 x block size ) 161 100k + ( 2.5 x block size )
138 162
@@ -147,7 +171,7 @@ choice of block size.
147 171
148For files compressed with the default 900k block size, 172For files compressed with the default 900k block size,
149.I bunzip2 173.I bunzip2
150will require about 4600 kbytes to decompress. 174will require about 3700 kbytes to decompress.
151To support decompression of any file on a 4 megabyte machine, 175To support decompression of any file on a 4 megabyte machine,
152.I bunzip2 176.I bunzip2
153has an option to decompress using approximately half this 177has an option to decompress using approximately half this
@@ -168,8 +192,8 @@ For example, compressing a file 20,000 bytes long with the flag
168\-9 192\-9
169will cause the compressor to allocate around 193will cause the compressor to allocate around
1706700k of memory, but only touch 400k + 20000 * 7 = 540 1946700k of memory, but only touch 400k + 20000 * 7 = 540
171kbytes of it. Similarly, the decompressor will allocate 4600k but 195kbytes of it. Similarly, the decompressor will allocate 3700k but
172only touch 100k + 20000 * 5 = 200 kbytes. 196only touch 100k + 20000 * 4 = 180 kbytes.
173 197
174Here is a table which summarises the maximum memory usage for 198Here is a table which summarises the maximum memory usage for
175different block sizes. Also recorded is the total compressed 199different block sizes. Also recorded is the total compressed
@@ -182,71 +206,73 @@ Corpus is dominated by smaller files.
182 Compress Decompress Decompress Corpus 206 Compress Decompress Decompress Corpus
183 Flag usage usage -s usage Size 207 Flag usage usage -s usage Size
184 208
185 -1 1100k 600k 350k 914704 209 -1 1100k 500k 350k 914704
186 -2 1800k 1100k 600k 877703 210 -2 1800k 900k 600k 877703
187 -3 2500k 1600k 850k 860338 211 -3 2500k 1300k 850k 860338
188 -4 3200k 2100k 1100k 846899 212 -4 3200k 1700k 1100k 846899
189 -5 3900k 2600k 1350k 845160 213 -5 3900k 2100k 1350k 845160
190 -6 4600k 3100k 1600k 838626 214 -6 4600k 2500k 1600k 838626
191 -7 5400k 3600k 1850k 834096 215 -7 5400k 2900k 1850k 834096
192 -8 6000k 4100k 2100k 828642 216 -8 6000k 3300k 2100k 828642
193 -9 6700k 4600k 2350k 828642 217 -9 6700k 3700k 2350k 828642
194 218
195.SH OPTIONS 219.SH OPTIONS
196.TP 220.TP
197.B \-c --stdout 221.B \-c --stdout
198Compress or decompress to standard output. \-c will decompress 222Compress or decompress to standard output. \-c will decompress
199multiple files to stdout, but will only compress a single file to 223multiple files to stdout, but will only compress a single file to
200stdout. 224stdout.
201.TP 225.TP
202.B \-d --decompress 226.B \-d --decompress
203Force decompression. 227Force decompression.
204.I Bzip2 228.I bzip2,
205and
206.I bunzip2 229.I bunzip2
207are really the same program, and the decision about whether to 230and
208compress or decompress is done on the basis of which name is 231.I bzcat
232are really the same program, and the decision about what actions
233to take is done on the basis of which name is
209used. This flag overrides that mechanism, and forces 234used. This flag overrides that mechanism, and forces
210.I bzip2 235.I bzip2
211to decompress. 236to decompress.
212.TP 237.TP
213.B \-f --compress 238.B \-z --compress
214The complement to \-d: forces compression, regardless of the invokation 239The complement to \-d: forces compression, regardless of the invokation
215name. 240name.
216.TP 241.TP
217.B \-t --test 242.B \-t --test
218Check integrity of the specified file(s), but don't decompress them. 243Check integrity of the specified file(s), but don't decompress them.
219This really performs a trial decompression and throws away the result, 244This really performs a trial decompression and throws away the result.
220using the low-memory decompression algorithm (see \-s). 245.TP
246.B \-f --force
247Force overwrite of output files. Normally,
248.I bzip2
249will not overwrite existing output files.
221.TP 250.TP
222.B \-k --keep 251.B \-k --keep
223Keep (don't delete) input files during compression or decompression. 252Keep (don't delete) input files during compression or decompression.
224.TP 253.TP
225.B \-s --small 254.B \-s --small
226Reduce memory usage, both for compression and decompression. 255Reduce memory usage, for compression, decompression and
227Files are decompressed using a modified algorithm which only 256testing.
257Files are decompressed and tested using a modified algorithm which only
228requires 2.5 bytes per block byte. This means any file can be 258requires 2.5 bytes per block byte. This means any file can be
229decompressed in 2300k of memory, albeit somewhat more slowly than 259decompressed in 2300k of memory, albeit at about half the normal
230usual. 260speed.
231 261
232During compression, -s selects a block size of 200k, which limits 262During compression, -s selects a block size of 200k, which limits
233memory use to around the same figure, at the expense of your 263memory use to around the same figure, at the expense of your
234compression ratio. In short, if your machine is low on memory 264compression ratio. In short, if your machine is low on memory
235(8 megabytes or less), use -s for everything. See 265(8 megabytes or less), use -s for everything. See
236MEMORY MANAGEMENT above. 266MEMORY MANAGEMENT above.
237
238.TP 267.TP
239.B \-v --verbose 268.B \-v --verbose
240Verbose mode -- show the compression ratio for each file processed. 269Verbose mode -- show the compression ratio for each file processed.
241Further \-v's increase the verbosity level, spewing out lots of 270Further \-v's increase the verbosity level, spewing out lots of
242information which is primarily of interest for diagnostic purposes. 271information which is primarily of interest for diagnostic purposes.
243.TP 272.TP
244.B \-L --license 273.B \-L --license -V --version
245Display the software version, license terms and conditions. 274Display the software version, license terms and conditions.
246.TP 275.TP
247.B \-V --version
248Same as \-L.
249.TP
250.B \-1 to \-9 276.B \-1 to \-9
251Set the block size to 100 k, 200 k .. 900 k when 277Set the block size to 100 k, 200 k .. 900 k when
252compressing. Has no effect when decompressing. 278compressing. Has no effect when decompressing.
@@ -329,10 +355,6 @@ to compress the latter.
329If you do get a file which causes severe slowness in compression, 355If you do get a file which causes severe slowness in compression,
330try making the block size as small as possible, with flag \-1. 356try making the block size as small as possible, with flag \-1.
331 357
332Incompressible or virtually-incompressible data may decompress
333rather more slowly than one would hope. This is due to
334a naive implementation of the move-to-front coder.
335
336.I bzip2 358.I bzip2
337usually allocates several megabytes of memory to operate in, 359usually allocates several megabytes of memory to operate in,
338and then charges all over it in a fairly random fashion. This 360and then charges all over it in a fairly random fashion. This
@@ -346,28 +368,19 @@ I imagine
346.I bzip2 368.I bzip2
347will perform best on machines with very large caches. 369will perform best on machines with very large caches.
348 370
349Test mode (\-t) uses the low-memory decompression algorithm
350(\-s). This means test mode does not run as fast as it could;
351it could run as fast as the normal decompression machinery.
352This could easily be fixed at the cost of some code bloat.
353
354.SH CAVEATS 371.SH CAVEATS
355I/O error messages are not as helpful as they could be. 372I/O error messages are not as helpful as they could be.
356.I Bzip2 373.I Bzip2
357tries hard to detect I/O errors and exit cleanly, but the 374tries hard to detect I/O errors and exit cleanly, but the
358details of what the problem is sometimes seem rather misleading. 375details of what the problem is sometimes seem rather misleading.
359 376
360This manual page pertains to version 0.1 of 377This manual page pertains to version 0.9.0 of
361.I bzip2. 378.I bzip2.
362It may well happen that some future version will 379Compressed data created by this version is entirely forwards and
363use a different compressed file format. If you try to 380backwards compatible with the previous public release, version 0.1pl2,
364decompress, using 0.1, a .bz2 file created with some 381but with the following exception: 0.9.0 can correctly decompress
365future version which uses a different compressed file format, 382multiple concatenated compressed files. 0.1pl2 cannot do this; it
3660.1 will complain that your file "is not a bzip2 file". 383will stop after decompressing just the first file in the stream.
367If that happens, you should obtain a more recent version
368of
369.I bzip2
370and use that to decompress the file.
371 384
372Wildcard expansion for Windows 95 and NT 385Wildcard expansion for Windows 95 and NT
373is flaky. 386is flaky.
@@ -377,63 +390,25 @@ uses 32-bit integers to represent bit positions in
377compressed files, so it cannot handle compressed files 390compressed files, so it cannot handle compressed files
378more than 512 megabytes long. This could easily be fixed. 391more than 512 megabytes long. This could easily be fixed.
379 392
380.I bzip2recover
381sometimes reports a very small, incomplete final block.
382This is spurious and can be safely ignored.
383
384.SH RELATIONSHIP TO bzip-0.21
385This program is a descendant of the
386.I bzip
387program, version 0.21, which I released in August 1996.
388The primary difference of
389.I bzip2
390is its avoidance of the possibly patented algorithms
391which were used in 0.21.
392.I bzip2
393also brings various useful refinements (\-s, \-t),
394uses less memory, decompresses significantly faster, and
395has support for recovering data from damaged files.
396
397Because
398.I bzip2
399uses Huffman coding to construct the compressed bitstream,
400rather than the arithmetic coding used in 0.21,
401the compressed representations generated by the two programs
402are incompatible, and they will not interoperate. The change
403in suffix from .bz to .bz2 reflects this. It would have been
404helpful to at least allow
405.I bzip2
406to decompress files created by 0.21, but this would
407defeat the primary aim of having a patent-free compressor.
408
409For a more precise statement about patent issues in
410bzip2, please see the README file in the distribution.
411
412Huffman coding necessarily involves some coding inefficiency
413compared to arithmetic coding. This means that
414.I bzip2
415compresses about 1% worse than 0.21, an unfortunate but
416unavoidable fact-of-life. On the other hand, decompression
417is approximately 50% faster for the same reason, and the
418change in file format gave an opportunity to add data-recovery
419features. So it is not all bad.
420
421.SH AUTHOR 393.SH AUTHOR
422Julian Seward, jseward@acm.org. 394Julian Seward, jseward@acm.org.
423 395
396http://www.muraroa.demon.co.uk
397
424The ideas embodied in 398The ideas embodied in
425.I bzip
426and
427.I bzip2 399.I bzip2
428are due to (at least) the following people: 400are due to (at least) the following people:
429Michael Burrows and David Wheeler (for the block sorting 401Michael Burrows and David Wheeler (for the block sorting
430transformation), David Wheeler (again, for the Huffman coder), 402transformation), David Wheeler (again, for the Huffman coder),
431Peter Fenwick (for the structured coding model in 0.21, 403Peter Fenwick (for the structured coding model in the original
404.I bzip,
432and many refinements), 405and many refinements),
433and 406and
434Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic 407Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic
435coder in 0.21). I am much indebted for their help, support and advice. 408coder in the original
436See the file ALGORITHMS in the source distribution for pointers to 409.I bzip).
410I am much indebted for their help, support and advice.
411See the manual in the source distribution for pointers to
437sources of documentation. 412sources of documentation.
438Christian von Roques encouraged me to look for faster 413Christian von Roques encouraged me to look for faster
439sorting algorithms, so as to speed up compression. 414sorting algorithms, so as to speed up compression.
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted
index 5206e05..8c4fab1 100644
--- a/bzip2.1.preformatted
+++ b/bzip2.1.preformatted
@@ -5,18 +5,20 @@ bzip2(1) bzip2(1)
5 5
6 6
7NNAAMMEE 7NNAAMMEE
8 bzip2, bunzip2 - a block-sorting file compressor, v0.1 8 bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
9 bzcat - decompresses files to stdout
9 bzip2recover - recovers data from damaged bzip2 files 10 bzip2recover - recovers data from damaged bzip2 files
10 11
11 12
12SSYYNNOOPPSSIISS 13SSYYNNOOPPSSIISS
13 bbzziipp22 [ --ccddffkkssttvvVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ] 14 bbzziipp22 [ --ccddffkkssttvvzzVVLL112233445566778899 ] [ _f_i_l_e_n_a_m_e_s _._._. ]
14 bbuunnzziipp22 [ --kkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ] 15 bbuunnzziipp22 [ --ffkkvvssVVLL ] [ _f_i_l_e_n_a_m_e_s _._._. ]
16 bbzzccaatt [ --ss ] [ _f_i_l_e_n_a_m_e_s _._._. ]
15 bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e 17 bbzziipp22rreeccoovveerr _f_i_l_e_n_a_m_e
16 18
17 19
18DDEESSCCRRIIPPTTIIOONN 20DDEESSCCRRIIPPTTIIOONN
19 _B_z_i_p_2 compresses files using the Burrows-Wheeler block- 21 _b_z_i_p_2 compresses files using the Burrows-Wheeler block-
20 sorting text compression algorithm, and Huffman coding. 22 sorting text compression algorithm, and Huffman coding.
21 Compression is generally considerably better than that 23 Compression is generally considerably better than that
22 achieved by more conventional LZ77/LZ78-based compressors, 24 achieved by more conventional LZ77/LZ78-based compressors,
@@ -26,7 +28,7 @@ DDEESSCCRRIIPPTTIIOONN
26 The command-line options are deliberately very similar to 28 The command-line options are deliberately very similar to
27 those of _G_N_U _G_z_i_p_, but they are not identical. 29 those of _G_N_U _G_z_i_p_, but they are not identical.
28 30
29 _B_z_i_p_2 expects a list of file names to accompany the com- 31 _b_z_i_p_2 expects a list of file names to accompany the com-
30 mand-line flags. Each file is replaced by a compressed 32 mand-line flags. Each file is replaced by a compressed
31 version of itself, with the name "original_name.bz2". 33 version of itself, with the name "original_name.bz2".
32 Each compressed file has the same modification date and 34 Each compressed file has the same modification date and
@@ -38,8 +40,8 @@ DDEESSCCRRIIPPTTIIOONN
38 cepts, or have serious file name length restrictions, such 40 cepts, or have serious file name length restrictions, such
39 as MS-DOS. 41 as MS-DOS.
40 42
41 _B_z_i_p_2 and _b_u_n_z_i_p_2 will not overwrite existing files; if 43 _b_z_i_p_2 and _b_u_n_z_i_p_2 will by default not overwrite existing
42 you want this to happen, you should delete them first. 44 files; if you want this to happen, specify the -f flag.
43 45
44 If no file names are specified, _b_z_i_p_2 compresses from 46 If no file names are specified, _b_z_i_p_2 compresses from
45 standard input to standard output. In this case, _b_z_i_p_2 47 standard input to standard output. In this case, _b_z_i_p_2
@@ -47,17 +49,15 @@ DDEESSCCRRIIPPTTIIOONN
47 this would be entirely incomprehensible and therefore 49 this would be entirely incomprehensible and therefore
48 pointless. 50 pointless.
49 51
50 _B_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec- 52 _b_u_n_z_i_p_2 (or _b_z_i_p_2 _-_d ) decompresses and restores all spec-
51 ified files whose names end in ".bz2". Files without this 53 ified files whose names end in ".bz2". Files without this
52 suffix are ignored. Again, supplying no filenames causes 54 suffix are ignored. Again, supplying no filenames causes
53 decompression from standard input to standard output. 55 decompression from standard input to standard output.
54 56
55 You can also compress or decompress files to the standard 57 _b_u_n_z_i_p_2 will correctly decompress a file which is the con-
56 output by giving the -c flag. You can decompress multiple 58 catenation of two or more compressed files. The result is
57 files like this, but you may only compress a single file 59 the concatenation of the corresponding uncompressed files.
58 this way, since it would otherwise be difficult to sepa- 60 Integrity testing (-t) of concatenated compressed files is
59 rate out the compressed representations of the original
60 files.
61 61
62 62
63 63
@@ -70,6 +70,21 @@ DDEESSCCRRIIPPTTIIOONN
70bzip2(1) bzip2(1) 70bzip2(1) bzip2(1)
71 71
72 72
73 also supported.
74
75 You can also compress or decompress files to the standard
76 output by giving the -c flag. Multiple files may be com-
77 pressed and decompressed like this. The resulting outputs
78 are fed sequentially to stdout. Compression of multiple
79 files in this manner generates a stream containing multi-
80 ple compressed file representations. Such a stream can be
81 decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
82 later. Earlier versions of _b_z_i_p_2 will stop after decom-
83 pressing the first file in the stream.
84
85 _b_z_c_a_t (or _b_z_i_p_2 _-_d_c ) decompresses all specified files to
86 the standard output.
87
73 Compression is always performed, even if the compressed 88 Compression is always performed, even if the compressed
74 file is slightly larger than the original. Files of less 89 file is slightly larger than the original. Files of less
75 than about one hundred bytes tend to get larger, since the 90 than about one hundred bytes tend to get larger, since the
@@ -108,36 +123,37 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
108 file, and _b_u_n_z_i_p_2 then allocates itself just enough memory 123 file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
109 to decompress the file. Since block sizes are stored in 124 to decompress the file. Since block sizes are stored in
110 compressed files, it follows that the flags -1 to -9 are 125 compressed files, it follows that the flags -1 to -9 are
111 irrelevant to and so ignored during decompression. Com- 126 irrelevant to and so ignored during decompression.
112 pression and decompression requirements, in bytes, can be
113 estimated as:
114 127
115 Compression: 400k + ( 7 x block size )
116 128
117 Decompression: 100k + ( 5 x block size ), or
118 100k + ( 2.5 x block size )
119 129
120 Larger block sizes give rapidly diminishing marginal 130 2
121 returns; most of the compression comes from the first two
122 or three hundred k of block size, a fact worth bearing in
123 mind when using _b_z_i_p_2 on small machines. It is also
124 important to appreciate that the decompression memory
125 requirement is set at compression-time by the choice of
126 block size.
127 131
128 132
129 133
130 2
131 134
132 135
136bzip2(1) bzip2(1)
133 137
134 138
139 Compression and decompression requirements, in bytes, can
140 be estimated as:
135 141
136bzip2(1) bzip2(1) 142 Compression: 400k + ( 7 x block size )
137 143
144 Decompression: 100k + ( 4 x block size ), or
145 100k + ( 2.5 x block size )
146
147 Larger block sizes give rapidly diminishing marginal
148 returns; most of the compression comes from the first two
149 or three hundred k of block size, a fact worth bearing in
150 mind when using _b_z_i_p_2 on small machines. It is also
151 important to appreciate that the decompression memory
152 requirement is set at compression-time by the choice of
153 block size.
138 154
139 For files compressed with the default 900k block size, 155 For files compressed with the default 900k block size,
140 _b_u_n_z_i_p_2 will require about 4600 kbytes to decompress. To 156 _b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
141 support decompression of any file on a 4 megabyte machine, 157 support decompression of any file on a 4 megabyte machine,
142 _b_u_n_z_i_p_2 has an option to decompress using approximately 158 _b_u_n_z_i_p_2 has an option to decompress using approximately
143 half this amount of memory, about 2300 kbytes. Decompres- 159 half this amount of memory, about 2300 kbytes. Decompres-
@@ -157,8 +173,8 @@ bzip2(1) bzip2(1)
157 file 20,000 bytes long with the flag -9 will cause the 173 file 20,000 bytes long with the flag -9 will cause the
158 compressor to allocate around 6700k of memory, but only 174 compressor to allocate around 6700k of memory, but only
159 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the 175 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
160 decompressor will allocate 4600k but only touch 100k + 176 decompressor will allocate 3700k but only touch 100k +
161 20000 * 5 = 200 kbytes. 177 20000 * 4 = 180 kbytes.
162 178
163 Here is a table which summarises the maximum memory usage 179 Here is a table which summarises the maximum memory usage
164 for different block sizes. Also recorded is the total 180 for different block sizes. Also recorded is the total
@@ -172,64 +188,66 @@ bzip2(1) bzip2(1)
172 Compress Decompress Decompress Corpus 188 Compress Decompress Decompress Corpus
173 Flag usage usage -s usage Size 189 Flag usage usage -s usage Size
174 190
175 -1 1100k 600k 350k 914704 191 -1 1100k 500k 350k 914704
176 -2 1800k 1100k 600k 877703 192 -2 1800k 900k 600k 877703
177 -3 2500k 1600k 850k 860338
178 -4 3200k 2100k 1100k 846899
179 -5 3900k 2600k 1350k 845160
180 -6 4600k 3100k 1600k 838626
181 -7 5400k 3600k 1850k 834096
182 -8 6000k 4100k 2100k 828642
183 -9 6700k 4600k 2350k 828642
184 193
185 194
186OOPPTTIIOONNSS
187 --cc ----ssttddoouutt
188 Compress or decompress to standard output. -c will
189 decompress multiple files to stdout, but will only
190 compress a single file to stdout.
191
192 195
196 3
193 197
194 198
195 199
196 3
197 200
198 201
202bzip2(1) bzip2(1)
199 203
200 204
205 -3 2500k 1300k 850k 860338
206 -4 3200k 1700k 1100k 846899
207 -5 3900k 2100k 1350k 845160
208 -6 4600k 2500k 1600k 838626
209 -7 5400k 2900k 1850k 834096
210 -8 6000k 3300k 2100k 828642
211 -9 6700k 3700k 2350k 828642
201 212
202bzip2(1) bzip2(1)
203 213
214OOPPTTIIOONNSS
215 --cc ----ssttddoouutt
216 Compress or decompress to standard output. -c will
217 decompress multiple files to stdout, but will only
218 compress a single file to stdout.
204 219
205 --dd ----ddeeccoommpprreessss 220 --dd ----ddeeccoommpprreessss
206 Force decompression. _B_z_i_p_2 and _b_u_n_z_i_p_2 are really 221 Force decompression. _b_z_i_p_2_, _b_u_n_z_i_p_2 and _b_z_c_a_t are
207 the same program, and the decision about whether to 222 really the same program, and the decision about
208 compress or decompress is done on the basis of 223 what actions to take is done on the basis of which
209 which name is used. This flag overrides that mech- 224 name is used. This flag overrides that mechanism,
210 anism, and forces _b_z_i_p_2 to decompress. 225 and forces _b_z_i_p_2 to decompress.
211 226
212 --ff ----ccoommpprreessss 227 --zz ----ccoommpprreessss
213 The complement to -d: forces compression, regard- 228 The complement to -d: forces compression, regard-
214 less of the invokation name. 229 less of the invokation name.
215 230
216 --tt ----tteesstt 231 --tt ----tteesstt
217 Check integrity of the specified file(s), but don't 232 Check integrity of the specified file(s), but don't
218 decompress them. This really performs a trial 233 decompress them. This really performs a trial
219 decompression and throws away the result, using the 234 decompression and throws away the result.
220 low-memory decompression algorithm (see -s). 235
236 --ff ----ffoorrccee
237 Force overwrite of output files. Normally, _b_z_i_p_2
238 will not overwrite existing output files.
221 239
222 --kk ----kkeeeepp 240 --kk ----kkeeeepp
223 Keep (don't delete) input files during compression 241 Keep (don't delete) input files during compression
224 or decompression. 242 or decompression.
225 243
226 --ss ----ssmmaallll 244 --ss ----ssmmaallll
227 Reduce memory usage, both for compression and 245 Reduce memory usage, for compression, decompression
228 decompression. Files are decompressed using a mod- 246 and testing. Files are decompressed and tested
229 ified algorithm which only requires 2.5 bytes per 247 using a modified algorithm which only requires 2.5
230 block byte. This means any file can be decom- 248 bytes per block byte. This means any file can be
231 pressed in 2300k of memory, albeit somewhat more 249 decompressed in 2300k of memory, albeit at about
232 slowly than usual. 250 half the normal speed.
233 251
234 During compression, -s selects a block size of 252 During compression, -s selects a block size of
235 200k, which limits memory use to around the same 253 200k, which limits memory use to around the same
@@ -239,35 +257,32 @@ bzip2(1) bzip2(1)
239 MEMORY MANAGEMENT above. 257 MEMORY MANAGEMENT above.
240 258
241 259
260
261
262 4
263
264
265
266
267
268bzip2(1) bzip2(1)
269
270
242 --vv ----vveerrbboossee 271 --vv ----vveerrbboossee
243 Verbose mode -- show the compression ratio for each 272 Verbose mode -- show the compression ratio for each
244 file processed. Further -v's increase the ver- 273 file processed. Further -v's increase the ver-
245 bosity level, spewing out lots of information which 274 bosity level, spewing out lots of information which
246 is primarily of interest for diagnostic purposes. 275 is primarily of interest for diagnostic purposes.
247 276
248 --LL ----lliicceennssee 277 --LL ----lliicceennssee --VV ----vveerrssiioonn
249 Display the software version, license terms and 278 Display the software version, license terms and
250 conditions. 279 conditions.
251 280
252 --VV ----vveerrssiioonn
253 Same as -L.
254
255 --11 ttoo --99 281 --11 ttoo --99
256 Set the block size to 100 k, 200 k .. 900 k when 282 Set the block size to 100 k, 200 k .. 900 k when
257 compressing. Has no effect when decompressing. 283 compressing. Has no effect when decompressing.
258 See MEMORY MANAGEMENT above. 284 See MEMORY MANAGEMENT above.
259 285
260
261
262 4
263
264
265
266
267
268bzip2(1) bzip2(1)
269
270
271 ----rreeppeettiittiivvee--ffaasstt 286 ----rreeppeettiittiivvee--ffaasstt
272 _b_z_i_p_2 injects some small pseudo-random variations 287 _b_z_i_p_2 injects some small pseudo-random variations
273 into very repetitive blocks to limit worst-case 288 into very repetitive blocks to limit worst-case
@@ -306,34 +321,34 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F
306 _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam- 321 _b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam-
307 aged file, and writes a number of files "rec0001file.bz2", 322 aged file, and writes a number of files "rec0001file.bz2",
308 "rec0002file.bz2", etc, containing the extracted blocks. 323 "rec0002file.bz2", etc, containing the extracted blocks.
309 The output filenames are designed so that the use of wild- 324 The output filenames are designed so that the use of
310 cards in subsequent processing -- for example, "bzip2 -dc
311 rec*file.bz2 > recovered_data" -- lists the files in the
312 "right" order.
313 325
314 _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
315 files, as these will contain many blocks. It is clearly
316 futile to use it on damaged single-block files, since a
317 damaged block cannot be recovered. If you wish to min-
318 imise any potential data loss through media or transmis-
319 sion errors, you might consider compressing with a smaller
320 block size.
321 326
322 327
323PPEERRFFOORRMMAANNCCEE NNOOTTEESS 328 5
324 The sorting phase of compression gathers together similar
325 329
326 330
327 331
328 5
329 332
330 333
334bzip2(1) bzip2(1)
331 335
332 336
337 wildcards in subsequent processing -- for example, "bzip2
338 -dc rec*file.bz2 > recovered_data" -- lists the files in
339 the "right" order.
333 340
334bzip2(1) bzip2(1) 341 _b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
342 files, as these will contain many blocks. It is clearly
343 futile to use it on damaged single-block files, since a
344 damaged block cannot be recovered. If you wish to min-
345 imise any potential data loss through media or transmis-
346 sion errors, you might consider compressing with a smaller
347 block size.
335 348
336 349
350PPEERRFFOORRMMAANNCCEE NNOOTTEESS
351 The sorting phase of compression gathers together similar
337 strings in the file. Because of this, files containing 352 strings in the file. Because of this, files containing
338 very long runs of repeated symbols, like "aabaabaabaab 353 very long runs of repeated symbols, like "aabaabaabaab
339 ..." (repeated several hundred times) may compress 354 ..." (repeated several hundred times) may compress
@@ -348,10 +363,6 @@ bzip2(1) bzip2(1)
348 severe slowness in compression, try making the block size 363 severe slowness in compression, try making the block size
349 as small as possible, with flag -1. 364 as small as possible, with flag -1.
350 365
351 Incompressible or virtually-incompressible data may decom-
352 press rather more slowly than one would hope. This is due
353 to a naive implementation of the move-to-front coder.
354
355 _b_z_i_p_2 usually allocates several megabytes of memory to 366 _b_z_i_p_2 usually allocates several megabytes of memory to
356 operate in, and then charges all over it in a fairly ran- 367 operate in, and then charges all over it in a fairly ran-
357 dom fashion. This means that performance, both for com- 368 dom fashion. This means that performance, both for com-
@@ -362,12 +373,6 @@ bzip2(1) bzip2(1)
362 large performance improvements. I imagine _b_z_i_p_2 will per- 373 large performance improvements. I imagine _b_z_i_p_2 will per-
363 form best on machines with very large caches. 374 form best on machines with very large caches.
364 375
365 Test mode (-t) uses the low-memory decompression algorithm
366 (-s). This means test mode does not run as fast as it
367 could; it could run as fast as the normal decompression
368 machinery. This could easily be fixed at the cost of some
369 code bloat.
370
371 376
372CCAAVVEEAATTSS 377CCAAVVEEAATTSS
373 I/O error messages are not as helpful as they could be. 378 I/O error messages are not as helpful as they could be.
@@ -375,19 +380,14 @@ CCAAVVEEAATTSS
375 but the details of what the problem is sometimes seem 380 but the details of what the problem is sometimes seem
376 rather misleading. 381 rather misleading.
377 382
378 This manual page pertains to version 0.1 of _b_z_i_p_2_. It may 383 This manual page pertains to version 0.9.0 of _b_z_i_p_2_. Com-
379 well happen that some future version will use a different 384 pressed data created by this version is entirely forwards
380 compressed file format. If you try to decompress, using 385 and backwards compatible with the previous public release,
381 0.1, a .bz2 file created with some future version which 386 version 0.1pl2, but with the following exception: 0.9.0
382 uses a different compressed file format, 0.1 will complain 387 can correctly decompress multiple concatenated compressed
383 that your file "is not a bzip2 file". If that happens, 388 files. 0.1pl2 cannot do this; it will stop after decom-
384 you should obtain a more recent version of _b_z_i_p_2 and use 389 pressing just the first file in the stream.
385 that to decompress the file.
386 390
387 Wildcard expansion for Windows 95 and NT is flaky.
388
389 _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
390 tions in compressed files, so it cannot handle compressed
391 391
392 392
393 393
@@ -400,61 +400,59 @@ CCAAVVEEAATTSS
400bzip2(1) bzip2(1) 400bzip2(1) bzip2(1)
401 401
402 402
403 files more than 512 megabytes long. This could easily be 403 Wildcard expansion for Windows 95 and NT is flaky.
404
405 _b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
406 tions in compressed files, so it cannot handle compressed
407 files more than 512 megabytes long. This could easily be
404 fixed. 408 fixed.
405 409
406 _b_z_i_p_2_r_e_c_o_v_e_r sometimes reports a very small, incomplete 410
407 final block. This is spurious and can be safely ignored. 411AAUUTTHHOORR
412 Julian Seward, jseward@acm.org.
413 http://www.muraroa.demon.co.uk
414
415 The ideas embodied in _b_z_i_p_2 are due to (at least) the fol-
416 lowing people: Michael Burrows and David Wheeler (for the
417 block sorting transformation), David Wheeler (again, for
418 the Huffman coder), Peter Fenwick (for the structured cod-
419 ing model in the original _b_z_i_p_, and many refinements), and
420 Alistair Moffat, Radford Neal and Ian Witten (for the
421 arithmetic coder in the original _b_z_i_p_)_. I am much
422 indebted for their help, support and advice. See the man-
423 ual in the source distribution for pointers to sources of
424 documentation. Christian von Roques encouraged me to look
425 for faster sorting algorithms, so as to speed up compres-
426 sion. Bela Lubkin encouraged me to improve the worst-case
427 compression performance. Many people sent patches, helped
428 with portability problems, lent machines, gave advice and
429 were generally helpful.
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
408 448
409 449
410RELATIONSHIP TO bzip-0.21
411 This program is a descendant of the bzip program, version
412 0.21, which I released in August 1996. The primary dif-
413 ference of bzip2 is its avoidance of the possibly patented
414 algorithms which were used in 0.21. bzip2 also brings
415 various useful refinements (-s, -t), uses less memory,
416 decompresses significantly faster, and has support for
417 recovering data from damaged files.
418 450
419 Because _b_z_i_p_2 uses Huffman coding to construct the com-
420 pressed bitstream, rather than the arithmetic coding used
421 in 0.21, the compressed representations generated by the
422 two programs are incompatible, and they will not interop-
423 erate. The change in suffix from .bz to .bz2 reflects
424 this. It would have been helpful to at least allow bzip2
425 to decompress files created by 0.21, but this would defeat
426 the primary aim of having a patent-free compressor.
427 451
428 For a more precise statement about patent issues in bzip2,
429 please see the README file in the distribution.
430 452
431 Huffman coding necessarily involves some coding ineffi-
432 ciency compared to arithmetic coding. This means that
433 bzip2 compresses about 1% worse than 0.21, an unfortunate
434 but unavoidable fact-of-life. On the other hand, decom-
435 pression is approximately 50% faster for the same reason,
436 and the change in file format gave an opportunity to add
437 data-recovery features. So it is not all bad.
438 453
439 454
440AUTHOR
441 Julian Seward, jseward@acm.org.
442 455
443 The ideas embodied in bzip and bzip2 are due to (at least)
444 the following people: Michael Burrows and David Wheeler
445 (for the block sorting transformation), David Wheeler
446 (again, for the Huffman coder), Peter Fenwick (for the
447 structured coding model in 0.21, and many refinements),
448 and Alistair Moffat, Radford Neal and Ian Witten (for the
449 arithmetic coder in 0.21). I am much indebted for their
450 help, support and advice. See the file ALGORITHMS in the
451 source distribution for pointers to sources of documenta-
452 tion. Christian von Roques encouraged me to look for
453 faster sorting algorithms, so as to speed up compression.
454 Bela Lubkin encouraged me to improve the worst-case com-
455 pression performance. Many people sent patches, helped
456 with portability problems, lent machines, gave advice and
457 were generally helpful.
458 456
459 457
460 458
diff --git a/bzip2.c b/bzip2.c
index 53ce10d..6a3ab95 100644
--- a/bzip2.c
+++ b/bzip2.c
@@ -4,28 +4,45 @@
4/*-----------------------------------------------------------*/ 4/*-----------------------------------------------------------*/
5 5
6/*-- 6/*--
7 This program is bzip2, a lossless, block-sorting data compressor, 7 This file is a part of bzip2 and/or libbzip2, a program and
8 version 0.1pl2, dated 29-Aug-1997. 8 library for lossless, block-sorting data compression.
9 9
10 Copyright (C) 1996, 1997 by Julian Seward. 10 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
11 Guildford, Surrey, UK 11
12 email: jseward@acm.org 12 Redistribution and use in source and binary forms, with or without
13 13 modification, are permitted provided that the following conditions
14 This program is free software; you can redistribute it and/or modify 14 are met:
15 it under the terms of the GNU General Public License as published by 15
16 the Free Software Foundation; either version 2 of the License, or 16 1. Redistributions of source code must retain the above copyright
17 (at your option) any later version. 17 notice, this list of conditions and the following disclaimer.
18 18
19 This program is distributed in the hope that it will be useful, 19 2. The origin of this software must not be misrepresented; you must
20 but WITHOUT ANY WARRANTY; without even the implied warranty of 20 not claim that you wrote the original software. If you use this
21 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 21 software in a product, an acknowledgment in the product
22 GNU General Public License for more details. 22 documentation would be appreciated but is not required.
23 23
24 You should have received a copy of the GNU General Public License 24 3. Altered source versions must be plainly marked as such, and must
25 along with this program; if not, write to the Free Software 25 not be misrepresented as being the original software.
26 Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 26
27 27 4. The name of the author may not be used to endorse or promote
28 The GNU General Public License is contained in the file LICENSE. 28 products derived from this software without specific prior written
29 permission.
30
31 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
32 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
33 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
34 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
35 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
36 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
37 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
38 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
39 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
40 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
41 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
42
43 Julian Seward, Guildford, Surrey, UK.
44 jseward@acm.org
45 bzip2/libbzip2 version 0.9.0c of 18 October 1998
29 46
30 This program is based on (at least) the work of: 47 This program is based on (at least) the work of:
31 Mike Burrows 48 Mike Burrows
@@ -37,21 +54,23 @@
37 Robert Sedgewick 54 Robert Sedgewick
38 Jon L. Bentley 55 Jon L. Bentley
39 56
40 For more information on these sources, see the file ALGORITHMS. 57 For more information on these sources, see the manual.
41--*/ 58--*/
42 59
60
43/*----------------------------------------------------*/ 61/*----------------------------------------------------*/
44/*--- IMPORTANT ---*/ 62/*--- IMPORTANT ---*/
45/*----------------------------------------------------*/ 63/*----------------------------------------------------*/
46 64
47/*-- 65/*--
48 WARNING: 66 WARNING:
49 This program (attempts to) compress data by performing several 67 This program and library (attempts to) compress data by
50 non-trivial transformations on it. Unless you are 100% familiar 68 performing several non-trivial transformations on it.
51 with *all* the algorithms contained herein, and with the 69 Unless you are 100% familiar with *all* the algorithms
52 consequences of modifying them, you should NOT meddle with the 70 contained herein, and with the consequences of modifying them,
53 compression or decompression machinery. Incorrect changes can 71 you should NOT meddle with the compression or decompression
54 and very likely *will* lead to disasterous loss of data. 72 machinery. Incorrect changes can and very likely *will*
73 lead to disasterous loss of data.
55 74
56 DISCLAIMER: 75 DISCLAIMER:
57 I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE 76 I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
@@ -65,18 +84,19 @@
65 of various special cases in the code which occur with very low 84 of various special cases in the code which occur with very low
66 but non-zero probability make it impossible to rule out the 85 but non-zero probability make it impossible to rule out the
67 possibility of bugs remaining in the program. DO NOT COMPRESS 86 possibility of bugs remaining in the program. DO NOT COMPRESS
68 ANY DATA WITH THIS PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE 87 ANY DATA WITH THIS PROGRAM AND/OR LIBRARY UNLESS YOU ARE PREPARED
69 POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL NOT BE RECOVERABLE. 88 TO ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL
89 NOT BE RECOVERABLE.
70 90
71 That is not to say this program is inherently unreliable. 91 That is not to say this program is inherently unreliable.
72 Indeed, I very much hope the opposite is true. bzip2 has been 92 Indeed, I very much hope the opposite is true. bzip2/libbzip2
73 carefully constructed and extensively tested. 93 has been carefully constructed and extensively tested.
74 94
75 PATENTS: 95 PATENTS:
76 To the best of my knowledge, bzip2 does not use any patented 96 To the best of my knowledge, bzip2/libbzip2 does not use any
77 algorithms. However, I do not have the resources available to 97 patented algorithms. However, I do not have the resources
78 carry out a full patent search. Therefore I cannot give any 98 available to carry out a full patent search. Therefore I cannot
79 guarantee of the above statement. 99 give any guarantee of the above statement.
80--*/ 100--*/
81 101
82 102
@@ -103,6 +123,10 @@
103--*/ 123--*/
104#define BZ_LCCWIN32 0 124#define BZ_LCCWIN32 0
105 125
126#ifdef _WIN32
127#define BZ_LCCWIN32 1
128#define BZ_UNIX 0
129#endif
106 130
107 131
108/*---------------------------------------------*/ 132/*---------------------------------------------*/
@@ -112,12 +136,10 @@
112 136
113#include <stdio.h> 137#include <stdio.h>
114#include <stdlib.h> 138#include <stdlib.h>
115#if DEBUG
116 #include <assert.h>
117#endif
118#include <string.h> 139#include <string.h>
119#include <signal.h> 140#include <signal.h>
120#include <math.h> 141#include <math.h>
142#include "bzlib.h"
121 143
122#define ERROR_IF_EOF(i) { if ((i) == EOF) ioError(); } 144#define ERROR_IF_EOF(i) { if ((i) == EOF) ioError(); }
123#define ERROR_IF_NOT_ZERO(i) { if ((i) != 0) ioError(); } 145#define ERROR_IF_NOT_ZERO(i) { if ((i) != 0) ioError(); }
@@ -130,68 +152,45 @@
130--*/ 152--*/
131 153
132#if BZ_UNIX 154#if BZ_UNIX
133 #include <sys/types.h> 155# include <sys/types.h>
134 #include <utime.h> 156# include <utime.h>
135 #include <unistd.h> 157# include <unistd.h>
136 #include <malloc.h> 158# include <sys/stat.h>
137 #include <sys/stat.h> 159# include <sys/times.h>
138 #include <sys/times.h> 160
139 161# define PATH_SEP '/'
140 #define Int32 int 162# define MY_LSTAT lstat
141 #define UInt32 unsigned int 163# define MY_S_IFREG S_ISREG
142 #define Char char 164# define MY_STAT stat
143 #define UChar unsigned char 165
144 #define Int16 short 166# define APPEND_FILESPEC(root, name) \
145 #define UInt16 unsigned short
146
147 #define PATH_SEP '/'
148 #define MY_LSTAT lstat
149 #define MY_S_IFREG S_ISREG
150 #define MY_STAT stat
151
152 #define APPEND_FILESPEC(root, name) \
153 root=snocString((root), (name)) 167 root=snocString((root), (name))
154 168
155 #define SET_BINARY_MODE(fd) /**/ 169# define SET_BINARY_MODE(fd) /**/
156 170
157 /*-- 171# ifdef __GNUC__
158 You should try very hard to persuade your C compiler 172# define NORETURN __attribute__ ((noreturn))
159 to inline the bits marked INLINE. Otherwise bzip2 will 173# else
160 run rather slowly. gcc version 2.x is recommended. 174# define NORETURN /**/
161 --*/ 175# endif
162 #ifdef __GNUC__
163 #define INLINE inline
164 #define NORETURN __attribute__ ((noreturn))
165 #else
166 #define INLINE /**/
167 #define NORETURN /**/
168 #endif
169#endif 176#endif
170 177
171 178
172 179
173#if BZ_LCCWIN32 180#if BZ_LCCWIN32
174 #include <io.h> 181# include <io.h>
175 #include <fcntl.h> 182# include <fcntl.h>
176 #include <sys\stat.h> 183# include <sys\stat.h>
177 184
178 #define Int32 int 185# define NORETURN /**/
179 #define UInt32 unsigned int 186# define PATH_SEP '\\'
180 #define Int16 short 187# define MY_LSTAT _stat
181 #define UInt16 unsigned short 188# define MY_STAT _stat
182 #define Char char 189# define MY_S_IFREG(x) ((x) & _S_IFREG)
183 #define UChar unsigned char 190
184 191# if 0
185 #define INLINE /**/
186 #define NORETURN /**/
187 #define PATH_SEP '\\'
188 #define MY_LSTAT _stat
189 #define MY_STAT _stat
190 #define MY_S_IFREG(x) ((x) & _S_IFREG)
191
192 #if 0
193 /*-- lcc-win32 seems to expand wildcards itself --*/ 192 /*-- lcc-win32 seems to expand wildcards itself --*/
194 #define APPEND_FILESPEC(root, spec) \ 193# define APPEND_FILESPEC(root, spec) \
195 do { \ 194 do { \
196 if ((spec)[0] == '-') { \ 195 if ((spec)[0] == '-') { \
197 root = snocString((root), (spec)); \ 196 root = snocString((root), (spec)); \
@@ -211,12 +210,12 @@
211 } \ 210 } \
212 } \ 211 } \
213 } while ( 0 ) 212 } while ( 0 )
214 #else 213# else
215 #define APPEND_FILESPEC(root, name) \ 214# define APPEND_FILESPEC(root, name) \
216 root = snocString ((root), (name)) 215 root = snocString ((root), (name))
217 #endif 216# endif
218 217
219 #define SET_BINARY_MODE(fd) \ 218# define SET_BINARY_MODE(fd) \
220 do { \ 219 do { \
221 int retVal = setmode ( fileno ( fd ), \ 220 int retVal = setmode ( fileno ( fd ), \
222 O_BINARY ); \ 221 O_BINARY ); \
@@ -231,111 +230,32 @@
231 Some more stuff for all platforms :-) 230 Some more stuff for all platforms :-)
232--*/ 231--*/
233 232
234#define Bool unsigned char 233typedef char Char;
235#define True 1 234typedef unsigned char Bool;
236#define False 0 235typedef unsigned char UChar;
236typedef int Int32;
237typedef unsigned int UInt32;
238typedef short Int16;
239typedef unsigned short UInt16;
240
241#define True ((Bool)1)
242#define False ((Bool)0)
237 243
238/*-- 244/*--
239 IntNative is your platform's `native' int size. 245 IntNative is your platform's `native' int size.
240 Only here to avoid probs with 64-bit platforms. 246 Only here to avoid probs with 64-bit platforms.
241--*/ 247--*/
242#define IntNative int 248typedef int IntNative;
243
244
245/*--
246 change to 1, or compile with -DDEBUG=1 to debug
247--*/
248#ifndef DEBUG
249#define DEBUG 0
250#endif
251
252
253/*---------------------------------------------------*/
254/*--- ---*/
255/*---------------------------------------------------*/
256
257/*--
258 Implementation notes, July 1997
259 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
260
261 Memory allocation
262 ~~~~~~~~~~~~~~~~~
263 All large data structures are allocated on the C heap,
264 for better or for worse. That includes the various
265 arrays of pointers, striped words, bytes, frequency
266 tables and buffers for compression and decompression.
267
268 bzip2 can operate at various block-sizes, ranging from
269 100k to 900k in 100k steps, and it allocates only as
270 much as it needs to. When compressing, we know from the
271 command-line options what the block-size is going to be,
272 so all allocation can be done at start-up; if that
273 succeeds, there can be no further allocation problems.
274
275 Decompression is more complicated. Each compressed file
276 contains, in its header, a byte indicating the block
277 size used for compression. This means bzip2 potentially
278 needs to reallocate memory for each file it deals with,
279 which in turn opens the possibility for a memory allocation
280 failure part way through a run of files, by encountering
281 a file requiring a much larger block size than all the
282 ones preceding it.
283
284 The policy is to simply give up if a memory allocation
285 failure occurs. During decompression, it would be
286 possible to move on to subsequent files in the hope that
287 some might ask for a smaller block size, but the
288 complications for doing this seem more trouble than they
289 are worth.
290
291
292 Compressed file formats
293 ~~~~~~~~~~~~~~~~~~~~~~~
294 [This is now entirely different from both 0.21, and from
295 any previous Huffman-coded variant of bzip.
296 See the associated file bzip2.txt for details.]
297
298
299 Error conditions
300 ~~~~~~~~~~~~~~~~
301 Dealing with error conditions is the least satisfactory
302 aspect of bzip2. The policy is to try and leave the
303 filesystem in a consistent state, then quit, even if it
304 means not processing some of the files mentioned in the
305 command line. `A consistent state' means that a file
306 exists either in its compressed or uncompressed form,
307 but not both. This boils down to the rule `delete the
308 output file if an error condition occurs, leaving the
309 input intact'. Input files are only deleted when we can
310 be pretty sure the output file has been written and
311 closed successfully.
312
313 Errors are a dog because there's so many things to
314 deal with. The following can happen mid-file, and
315 require cleaning up.
316
317 internal `panics' -- indicating a bug
318 corrupted or inconsistent compressed file
319 can't allocate enough memory to decompress this file
320 I/O error reading/writing/opening/closing
321 signal catches -- Control-C, SIGTERM, SIGHUP.
322
323 Other conditions, primarily pertaining to file names,
324 can be checked in-between files, which makes dealing
325 with them easier.
326--*/
327
328 249
329 250
330/*---------------------------------------------------*/ 251/*---------------------------------------------------*/
331/*--- Misc (file handling) data decls ---*/ 252/*--- Misc (file handling) data decls ---*/
332/*---------------------------------------------------*/ 253/*---------------------------------------------------*/
333 254
334UInt32 bytesIn, bytesOut;
335Int32 verbosity; 255Int32 verbosity;
336Bool keepInputFiles, smallMode, testFailsExist; 256Bool keepInputFiles, smallMode;
337UInt32 globalCrc; 257Bool forceOverwrite, testFailsExist;
338Int32 numFileNames, numFilesProcessed; 258Int32 numFileNames, numFilesProcessed, blockSize100k;
339 259
340 260
341/*-- source modes; F==file, I==stdin, O==stdout --*/ 261/*-- source modes; F==file, I==stdin, O==stdout --*/
@@ -351,2691 +271,304 @@ Int32 numFileNames, numFilesProcessed;
351Int32 opMode; 271Int32 opMode;
352Int32 srcMode; 272Int32 srcMode;
353 273
274#define FILE_NAME_LEN 1034
354 275
355Int32 longestFileName; 276Int32 longestFileName;
356Char inName[1024]; 277Char inName[FILE_NAME_LEN];
357Char outName[1024]; 278Char outName[FILE_NAME_LEN];
358Char *progName; 279Char *progName;
359Char progNameReally[1024]; 280Char progNameReally[FILE_NAME_LEN];
360FILE *outputHandleJustInCase; 281FILE *outputHandleJustInCase;
361 282Int32 workFactor;
362void panic ( Char* ) NORETURN; 283
363void ioError ( void ) NORETURN; 284void panic ( Char* ) NORETURN;
364void compressOutOfMemory ( Int32, Int32 ) NORETURN; 285void ioError ( void ) NORETURN;
365void uncompressOutOfMemory ( Int32, Int32 ) NORETURN; 286void outOfMemory ( void ) NORETURN;
366void blockOverrun ( void ) NORETURN; 287void blockOverrun ( void ) NORETURN;
367void badBlockHeader ( void ) NORETURN; 288void badBlockHeader ( void ) NORETURN;
368void badBGLengths ( void ) NORETURN; 289void badBGLengths ( void ) NORETURN;
369void crcError ( UInt32, UInt32 ) NORETURN; 290void crcError ( void ) NORETURN;
370void bitStreamEOF ( void ) NORETURN; 291void bitStreamEOF ( void ) NORETURN;
371void cleanUpAndFail ( Int32 ) NORETURN; 292void cleanUpAndFail ( Int32 ) NORETURN;
372void compressedStreamEOF ( void ) NORETURN; 293void compressedStreamEOF ( void ) NORETURN;
373 294
295void copyFileName ( Char*, Char* );
374void* myMalloc ( Int32 ); 296void* myMalloc ( Int32 );
375 297
376 298
377 299
378/*---------------------------------------------------*/ 300/*---------------------------------------------------*/
379/*--- Data decls for the front end ---*/ 301/*--- Processing of complete files and streams ---*/
380/*---------------------------------------------------*/
381
382/*--
383 The overshoot bytes allow us to avoid most of
384 the cost of pointer renormalisation during
385 comparison of rotations in sorting.
386 The figure of 20 is derived as follows:
387 qSort3 allows an overshoot of up to 10.
388 It then calls simpleSort, which calls
389 fullGtU, also with max overshoot 10.
390 fullGtU does up to 10 comparisons without
391 renormalising, giving 10+10 == 20.
392--*/
393#define NUM_OVERSHOOT_BYTES 20
394
395/*--
396 These are the main data structures for
397 the Burrows-Wheeler transform.
398--*/
399
400/*--
401 Pointers to compression and decompression
402 structures. Set by
403 allocateCompressStructures and
404 setDecompressStructureSizes
405
406 The structures are always set to be suitable
407 for a block of size 100000 * blockSize100k.
408--*/
409UChar *block; /*-- compress --*/
410UInt16 *quadrant; /*-- compress --*/
411Int32 *zptr; /*-- compress --*/
412UInt16 *szptr; /*-- overlays zptr ---*/
413Int32 *ftab; /*-- compress --*/
414
415UInt16 *ll16; /*-- small decompress --*/
416UChar *ll4; /*-- small decompress --*/
417
418Int32 *tt; /*-- fast decompress --*/
419UChar *ll8; /*-- fast decompress --*/
420
421
422/*--
423 freq table collected to save a pass over the data
424 during decompression.
425--*/
426Int32 unzftab[256];
427
428
429/*--
430 index of the last char in the block, so
431 the block size == last + 1.
432--*/
433Int32 last;
434
435
436/*--
437 index in zptr[] of original string after sorting.
438--*/
439Int32 origPtr;
440
441
442/*--
443 always: in the range 0 .. 9.
444 The current block size is 100000 * this number.
445--*/
446Int32 blockSize100k;
447
448
449/*--
450 Used when sorting. If too many long comparisons
451 happen, we stop sorting, randomise the block
452 slightly, and try again.
453--*/
454
455Int32 workFactor;
456Int32 workDone;
457Int32 workLimit;
458Bool blockRandomised;
459Bool firstAttempt;
460Int32 nBlocksRandomised;
461
462
463
464/*---------------------------------------------------*/
465/*--- Data decls for the back end ---*/
466/*---------------------------------------------------*/
467
468#define MAX_ALPHA_SIZE 258
469#define MAX_CODE_LEN 23
470
471#define RUNA 0
472#define RUNB 1
473
474#define N_GROUPS 6
475#define G_SIZE 50
476#define N_ITERS 4
477
478#define MAX_SELECTORS (2 + (900000 / G_SIZE))
479
480Bool inUse[256];
481Int32 nInUse;
482
483UChar seqToUnseq[256];
484UChar unseqToSeq[256];
485
486UChar selector [MAX_SELECTORS];
487UChar selectorMtf[MAX_SELECTORS];
488
489Int32 nMTF;
490
491Int32 mtfFreq[MAX_ALPHA_SIZE];
492
493UChar len [N_GROUPS][MAX_ALPHA_SIZE];
494
495/*-- decompress only --*/
496Int32 limit [N_GROUPS][MAX_ALPHA_SIZE];
497Int32 base [N_GROUPS][MAX_ALPHA_SIZE];
498Int32 perm [N_GROUPS][MAX_ALPHA_SIZE];
499Int32 minLens[N_GROUPS];
500
501/*-- compress only --*/
502Int32 code [N_GROUPS][MAX_ALPHA_SIZE];
503Int32 rfreq[N_GROUPS][MAX_ALPHA_SIZE];
504
505
506/*---------------------------------------------------*/
507/*--- 32-bit CRC grunge ---*/
508/*---------------------------------------------------*/
509
510/*--
511 I think this is an implementation of the AUTODIN-II,
512 Ethernet & FDDI 32-bit CRC standard. Vaguely derived
513 from code by Rob Warnock, in Section 51 of the
514 comp.compression FAQ.
515--*/
516
517UInt32 crc32Table[256] = {
518
519 /*-- Ugly, innit? --*/
520
521 0x00000000UL, 0x04c11db7UL, 0x09823b6eUL, 0x0d4326d9UL,
522 0x130476dcUL, 0x17c56b6bUL, 0x1a864db2UL, 0x1e475005UL,
523 0x2608edb8UL, 0x22c9f00fUL, 0x2f8ad6d6UL, 0x2b4bcb61UL,
524 0x350c9b64UL, 0x31cd86d3UL, 0x3c8ea00aUL, 0x384fbdbdUL,
525 0x4c11db70UL, 0x48d0c6c7UL, 0x4593e01eUL, 0x4152fda9UL,
526 0x5f15adacUL, 0x5bd4b01bUL, 0x569796c2UL, 0x52568b75UL,
527 0x6a1936c8UL, 0x6ed82b7fUL, 0x639b0da6UL, 0x675a1011UL,
528 0x791d4014UL, 0x7ddc5da3UL, 0x709f7b7aUL, 0x745e66cdUL,
529 0x9823b6e0UL, 0x9ce2ab57UL, 0x91a18d8eUL, 0x95609039UL,
530 0x8b27c03cUL, 0x8fe6dd8bUL, 0x82a5fb52UL, 0x8664e6e5UL,
531 0xbe2b5b58UL, 0xbaea46efUL, 0xb7a96036UL, 0xb3687d81UL,
532 0xad2f2d84UL, 0xa9ee3033UL, 0xa4ad16eaUL, 0xa06c0b5dUL,
533 0xd4326d90UL, 0xd0f37027UL, 0xddb056feUL, 0xd9714b49UL,
534 0xc7361b4cUL, 0xc3f706fbUL, 0xceb42022UL, 0xca753d95UL,
535 0xf23a8028UL, 0xf6fb9d9fUL, 0xfbb8bb46UL, 0xff79a6f1UL,
536 0xe13ef6f4UL, 0xe5ffeb43UL, 0xe8bccd9aUL, 0xec7dd02dUL,
537 0x34867077UL, 0x30476dc0UL, 0x3d044b19UL, 0x39c556aeUL,
538 0x278206abUL, 0x23431b1cUL, 0x2e003dc5UL, 0x2ac12072UL,
539 0x128e9dcfUL, 0x164f8078UL, 0x1b0ca6a1UL, 0x1fcdbb16UL,
540 0x018aeb13UL, 0x054bf6a4UL, 0x0808d07dUL, 0x0cc9cdcaUL,
541 0x7897ab07UL, 0x7c56b6b0UL, 0x71159069UL, 0x75d48ddeUL,
542 0x6b93dddbUL, 0x6f52c06cUL, 0x6211e6b5UL, 0x66d0fb02UL,
543 0x5e9f46bfUL, 0x5a5e5b08UL, 0x571d7dd1UL, 0x53dc6066UL,
544 0x4d9b3063UL, 0x495a2dd4UL, 0x44190b0dUL, 0x40d816baUL,
545 0xaca5c697UL, 0xa864db20UL, 0xa527fdf9UL, 0xa1e6e04eUL,
546 0xbfa1b04bUL, 0xbb60adfcUL, 0xb6238b25UL, 0xb2e29692UL,
547 0x8aad2b2fUL, 0x8e6c3698UL, 0x832f1041UL, 0x87ee0df6UL,
548 0x99a95df3UL, 0x9d684044UL, 0x902b669dUL, 0x94ea7b2aUL,
549 0xe0b41de7UL, 0xe4750050UL, 0xe9362689UL, 0xedf73b3eUL,
550 0xf3b06b3bUL, 0xf771768cUL, 0xfa325055UL, 0xfef34de2UL,
551 0xc6bcf05fUL, 0xc27dede8UL, 0xcf3ecb31UL, 0xcbffd686UL,
552 0xd5b88683UL, 0xd1799b34UL, 0xdc3abdedUL, 0xd8fba05aUL,
553 0x690ce0eeUL, 0x6dcdfd59UL, 0x608edb80UL, 0x644fc637UL,
554 0x7a089632UL, 0x7ec98b85UL, 0x738aad5cUL, 0x774bb0ebUL,
555 0x4f040d56UL, 0x4bc510e1UL, 0x46863638UL, 0x42472b8fUL,
556 0x5c007b8aUL, 0x58c1663dUL, 0x558240e4UL, 0x51435d53UL,
557 0x251d3b9eUL, 0x21dc2629UL, 0x2c9f00f0UL, 0x285e1d47UL,
558 0x36194d42UL, 0x32d850f5UL, 0x3f9b762cUL, 0x3b5a6b9bUL,
559 0x0315d626UL, 0x07d4cb91UL, 0x0a97ed48UL, 0x0e56f0ffUL,
560 0x1011a0faUL, 0x14d0bd4dUL, 0x19939b94UL, 0x1d528623UL,
561 0xf12f560eUL, 0xf5ee4bb9UL, 0xf8ad6d60UL, 0xfc6c70d7UL,
562 0xe22b20d2UL, 0xe6ea3d65UL, 0xeba91bbcUL, 0xef68060bUL,
563 0xd727bbb6UL, 0xd3e6a601UL, 0xdea580d8UL, 0xda649d6fUL,
564 0xc423cd6aUL, 0xc0e2d0ddUL, 0xcda1f604UL, 0xc960ebb3UL,
565 0xbd3e8d7eUL, 0xb9ff90c9UL, 0xb4bcb610UL, 0xb07daba7UL,
566 0xae3afba2UL, 0xaafbe615UL, 0xa7b8c0ccUL, 0xa379dd7bUL,
567 0x9b3660c6UL, 0x9ff77d71UL, 0x92b45ba8UL, 0x9675461fUL,
568 0x8832161aUL, 0x8cf30badUL, 0x81b02d74UL, 0x857130c3UL,
569 0x5d8a9099UL, 0x594b8d2eUL, 0x5408abf7UL, 0x50c9b640UL,
570 0x4e8ee645UL, 0x4a4ffbf2UL, 0x470cdd2bUL, 0x43cdc09cUL,
571 0x7b827d21UL, 0x7f436096UL, 0x7200464fUL, 0x76c15bf8UL,
572 0x68860bfdUL, 0x6c47164aUL, 0x61043093UL, 0x65c52d24UL,
573 0x119b4be9UL, 0x155a565eUL, 0x18197087UL, 0x1cd86d30UL,
574 0x029f3d35UL, 0x065e2082UL, 0x0b1d065bUL, 0x0fdc1becUL,
575 0x3793a651UL, 0x3352bbe6UL, 0x3e119d3fUL, 0x3ad08088UL,
576 0x2497d08dUL, 0x2056cd3aUL, 0x2d15ebe3UL, 0x29d4f654UL,
577 0xc5a92679UL, 0xc1683bceUL, 0xcc2b1d17UL, 0xc8ea00a0UL,
578 0xd6ad50a5UL, 0xd26c4d12UL, 0xdf2f6bcbUL, 0xdbee767cUL,
579 0xe3a1cbc1UL, 0xe760d676UL, 0xea23f0afUL, 0xeee2ed18UL,
580 0xf0a5bd1dUL, 0xf464a0aaUL, 0xf9278673UL, 0xfde69bc4UL,
581 0x89b8fd09UL, 0x8d79e0beUL, 0x803ac667UL, 0x84fbdbd0UL,
582 0x9abc8bd5UL, 0x9e7d9662UL, 0x933eb0bbUL, 0x97ffad0cUL,
583 0xafb010b1UL, 0xab710d06UL, 0xa6322bdfUL, 0xa2f33668UL,
584 0xbcb4666dUL, 0xb8757bdaUL, 0xb5365d03UL, 0xb1f740b4UL
585};
586
587
588/*---------------------------------------------*/
589void initialiseCRC ( void )
590{
591 globalCrc = 0xffffffffUL;
592}
593
594
595/*---------------------------------------------*/
596UInt32 getFinalCRC ( void )
597{
598 return ~globalCrc;
599}
600
601
602/*---------------------------------------------*/
603UInt32 getGlobalCRC ( void )
604{
605 return globalCrc;
606}
607
608
609/*---------------------------------------------*/
610void setGlobalCRC ( UInt32 newCrc )
611{
612 globalCrc = newCrc;
613}
614
615
616/*---------------------------------------------*/
617#define UPDATE_CRC(crcVar,cha) \
618{ \
619 crcVar = (crcVar << 8) ^ \
620 crc32Table[(crcVar >> 24) ^ \
621 ((UChar)cha)]; \
622}
623
624
625/*---------------------------------------------------*/
626/*--- Bit stream I/O ---*/
627/*---------------------------------------------------*/ 302/*---------------------------------------------------*/
628 303
629
630UInt32 bsBuff;
631Int32 bsLive;
632FILE* bsStream;
633Bool bsWriting;
634
635
636/*---------------------------------------------*/
637void bsSetStream ( FILE* f, Bool wr )
638{
639 if (bsStream != NULL) panic ( "bsSetStream" );
640 bsStream = f;
641 bsLive = 0;
642 bsBuff = 0;
643 bytesOut = 0;
644 bytesIn = 0;
645 bsWriting = wr;
646}
647
648
649/*---------------------------------------------*/
650void bsFinishedWithStream ( void )
651{
652 if (bsWriting)
653 while (bsLive > 0) {
654 fputc ( (UChar)(bsBuff >> 24), bsStream );
655 bsBuff <<= 8;
656 bsLive -= 8;
657 bytesOut++;
658 }
659 bsStream = NULL;
660}
661
662
663/*---------------------------------------------*/
664#define bsNEEDR(nz) \
665{ \
666 while (bsLive < nz) { \
667 Int32 zzi = fgetc ( bsStream ); \
668 if (zzi == EOF) compressedStreamEOF(); \
669 bsBuff = (bsBuff << 8) | (zzi & 0xffL); \
670 bsLive += 8; \
671 } \
672}
673
674
675/*---------------------------------------------*/
676#define bsNEEDW(nz) \
677{ \
678 while (bsLive >= 8) { \
679 fputc ( (UChar)(bsBuff >> 24), \
680 bsStream ); \
681 bsBuff <<= 8; \
682 bsLive -= 8; \
683 bytesOut++; \
684 } \
685}
686
687
688/*---------------------------------------------*/
689#define bsR1(vz) \
690{ \
691 bsNEEDR(1); \
692 vz = (bsBuff >> (bsLive-1)) & 1; \
693 bsLive--; \
694}
695
696
697/*---------------------------------------------*/
698INLINE UInt32 bsR ( Int32 n )
699{
700 UInt32 v;
701 bsNEEDR ( n );
702 v = (bsBuff >> (bsLive-n)) & ((1 << n)-1);
703 bsLive -= n;
704 return v;
705}
706
707
708/*---------------------------------------------*/
709INLINE void bsW ( Int32 n, UInt32 v )
710{
711 bsNEEDW ( n );
712 bsBuff |= (v << (32 - bsLive - n));
713 bsLive += n;
714}
715
716
717/*---------------------------------------------*/
718UChar bsGetUChar ( void )
719{
720 return (UChar)bsR(8);
721}
722
723
724/*---------------------------------------------*/
725void bsPutUChar ( UChar c )
726{
727 bsW(8, (UInt32)c );
728}
729
730
731/*---------------------------------------------*/ 304/*---------------------------------------------*/
732Int32 bsGetUInt32 ( void ) 305Bool myfeof ( FILE* f )
733{ 306{
734 UInt32 u; 307 Int32 c = fgetc ( f );
735 u = 0; 308 if (c == EOF) return True;
736 u = (u << 8) | bsR(8); 309 ungetc ( c, f );
737 u = (u << 8) | bsR(8); 310 return False;
738 u = (u << 8) | bsR(8);
739 u = (u << 8) | bsR(8);
740 return u;
741}
742
743
744/*---------------------------------------------*/
745UInt32 bsGetIntVS ( UInt32 numBits )
746{
747 return (UInt32)bsR(numBits);
748}
749
750
751/*---------------------------------------------*/
752UInt32 bsGetInt32 ( void )
753{
754 return (Int32)bsGetUInt32();
755}
756
757
758/*---------------------------------------------*/
759void bsPutUInt32 ( UInt32 u )
760{
761 bsW ( 8, (u >> 24) & 0xffL );
762 bsW ( 8, (u >> 16) & 0xffL );
763 bsW ( 8, (u >> 8) & 0xffL );
764 bsW ( 8, u & 0xffL );
765}
766
767
768/*---------------------------------------------*/
769void bsPutInt32 ( Int32 c )
770{
771 bsPutUInt32 ( (UInt32)c );
772} 311}
773 312
774 313
775/*---------------------------------------------*/ 314/*---------------------------------------------*/
776void bsPutIntVS ( Int32 numBits, UInt32 c ) 315void compressStream ( FILE *stream, FILE *zStream )
777{ 316{
778 bsW ( numBits, c ); 317 BZFILE* bzf = NULL;
779} 318 UChar ibuf[5000];
780 319 Int32 nIbuf;
781 320 UInt32 nbytes_in, nbytes_out;
782/*---------------------------------------------------*/ 321 Int32 bzerr, bzerr_dummy, ret;
/*--- Huffman coding low-level stuff ---*/
/*---------------------------------------------------*/

#define WEIGHTOF(zz0) ((zz0) & 0xffffff00)
#define DEPTHOF(zz1) ((zz1) & 0x000000ff)
#define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3))

#define ADDWEIGHTS(zw1,zw2) \
   (WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \
   (1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2)))

#define UPHEAP(z) \
{ \
   Int32 zz, tmp; \
   zz = z; tmp = heap[zz]; \
   while (weight[tmp] < weight[heap[zz >> 1]]) { \
      heap[zz] = heap[zz >> 1]; \
      zz >>= 1; \
   } \
   heap[zz] = tmp; \
}

#define DOWNHEAP(z) \
{ \
   Int32 zz, yy, tmp; \
   zz = z; tmp = heap[zz]; \
   while (True) { \
      yy = zz << 1; \
      if (yy > nHeap) break; \
      if (yy < nHeap && \
          weight[heap[yy+1]] < weight[heap[yy]]) \
         yy++; \
      if (weight[tmp] < weight[heap[yy]]) break; \
      heap[zz] = heap[yy]; \
      zz = yy; \
   } \
   heap[zz] = tmp; \
}
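The macros above pack two quantities into each node weight. The upper 24 bits hold the frequency-derived weight and the low 8 bits hold the depth of the subtree. Combining two nodes adds their weights and sets the depth to one more than the deeper child. Because UPHEAP and DOWNHEAP compare the packed values, the shallower of two equal-weight nodes compares smaller.

The sketch below restates ADDWEIGHTS as a function. leaf() and combine() are hypothetical helpers. Writing it as a function also sidesteps a pitfall: ADDWEIGHTS has no outer pair of parentheses. That is harmless in its only use here (`weight[nNodes] = ADDWEIGHTS(...)`), but would misbehave inside a larger expression.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the listing: weight in bits 8..31, depth in bits 0..7. */
#define WEIGHTOF(zz0)  ((zz0) & 0xffffff00)
#define DEPTHOF(zz1)   ((zz1) & 0x000000ff)
#define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3))

/* Hypothetical helper: a leaf of frequency freq (< 2^24) has depth 0. */
static uint32_t leaf(uint32_t freq)
{
   return freq << 8;
}

/* Hypothetical helper equivalent to ADDWEIGHTS, fully parenthesised. */
static uint32_t combine(uint32_t a, uint32_t b)
{
   return (WEIGHTOF(a) + WEIGHTOF(b)) | (1 + MYMAX(DEPTHOF(a), DEPTHOF(b)));
}
```

For example, combine(leaf(5), leaf(3)) has weight 8 and depth 1, so it sorts after a plain leaf(8).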
821 322
323 SET_BINARY_MODE(stream);
324 SET_BINARY_MODE(zStream);
822 325
823/*---------------------------------------------*/ 326 if (ferror(stream)) goto errhandler_io;
824void hbMakeCodeLengths ( UChar *len, 327 if (ferror(zStream)) goto errhandler_io;
                         Int32 *freq,
                         Int32 alphaSize,
                         Int32 maxLen )
{
   /*--
      Nodes and heap entries run from 1. Entry 0
      for both the heap and nodes is a sentinel.
   --*/
   Int32 nNodes, nHeap, n1, n2, i, j, k;
   Bool tooLong;
835 328
836 Int32 heap [ MAX_ALPHA_SIZE + 2 ]; 329 bzf = bzWriteOpen ( &bzerr, zStream,
837 Int32 weight [ MAX_ALPHA_SIZE * 2 ]; 330 blockSize100k, verbosity, workFactor );
838 Int32 parent [ MAX_ALPHA_SIZE * 2 ]; 331 if (bzerr != BZ_OK) goto errhandler;
839 332
840 for (i = 0; i < alphaSize; i++) 333 if (verbosity >= 2) fprintf ( stderr, "\n" );
      weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8;
842 334
843 while (True) { 335 while (True) {
844 336
845 nNodes = alphaSize; 337 if (myfeof(stream)) break;
846 nHeap = 0; 338 nIbuf = fread ( ibuf, sizeof(UChar), 5000, stream );
847 339 if (ferror(stream)) goto errhandler_io;
848 heap[0] = 0; 340 if (nIbuf > 0) bzWrite ( &bzerr, bzf, (void*)ibuf, nIbuf );
849 weight[0] = 0; 341 if (bzerr != BZ_OK) goto errhandler;
      parent[0] = -2;

      for (i = 1; i <= alphaSize; i++) {
         parent[i] = -1;
         nHeap++;
         heap[nHeap] = i;
         UPHEAP(nHeap);
      }
      if (!(nHeap < (MAX_ALPHA_SIZE+2)))
         panic ( "hbMakeCodeLengths(1)" );

      while (nHeap > 1) {
         n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
         n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
         nNodes++;
         parent[n1] = parent[n2] = nNodes;
         weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]);
         parent[nNodes] = -1;
         nHeap++;
         heap[nHeap] = nNodes;
         UPHEAP(nHeap);
      }
      if (!(nNodes < (MAX_ALPHA_SIZE * 2)))
         panic ( "hbMakeCodeLengths(2)" );

      tooLong = False;
      for (i = 1; i <= alphaSize; i++) {
         j = 0;
         k = i;
         while (parent[k] >= 0) { k = parent[k]; j++; }
         len[i-1] = j;
         if (j > maxLen) tooLong = True;
      }

      if (! tooLong) break;
885 342
      for (i = 1; i < alphaSize; i++) {
         j = weight[i] >> 8;
         j = 1 + (j / 2);
         weight[i] = j << 8;
      }
891 } 343 }
}

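hbMakeCodeLengths works in four steps:

1. Build an ordinary Huffman tree with the heap macros.
2. Measure each leaf's depth.
3. If any depth exceeds maxLen, flatten the frequencies with f -> 1 + f/2.
4. Rebuild.

A minimal sketch of that retry loop follows. To stay short, it uses a plain O(n^2) search for the two lightest nodes instead of the heap, and it has no depth tie-breaker, so on ties its lengths may differ from bzip2's. make_code_lengths and MAX_SYMS are hypothetical names. One difference from the listing: its rescaling loop runs i from 1 to alphaSize-1 over 1-based weights, so it appears to leave weight[alphaSize] unscaled. The sketch rescales every symbol.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SYMS 258   /* hypothetical bound: 256 byte values + 2 */

/* Sketch of the hbMakeCodeLengths retry loop (O(n^2), no heap).
   Assumes maxLen leaves comfortable room above log2(n), as a limit of 20
   does for at most 258 symbols; with a very tight limit the flattening
   loop might never succeed. */
static void make_code_lengths(uint8_t *len, const int32_t *freq,
                              int n, int maxLen)
{
   uint32_t w[2 * MAX_SYMS];
   int      parent[2 * MAX_SYMS];
   int      alive[2 * MAX_SYMS];

   assert(n >= 2 && n <= MAX_SYMS && maxLen >= 1);
   for (int i = 0; i < n; i++)
      w[i] = freq[i] == 0 ? 1u : (uint32_t)freq[i];   /* no zero weights */

   for (;;) {
      int nNodes = n, tooLong = 0;
      for (int i = 0; i < n; i++) { parent[i] = -1; alive[i] = 1; }

      /* n-1 merges of the two lightest live nodes. */
      for (int m = 0; m < n - 1; m++) {
         int a = -1, b = -1;
         for (int i = 0; i < nNodes; i++) {
            if (!alive[i]) continue;
            if (a < 0 || w[i] < w[a])      { b = a; a = i; }
            else if (b < 0 || w[i] < w[b]) { b = i; }
         }
         w[nNodes] = w[a] + w[b];
         parent[nNodes] = -1;
         alive[nNodes] = 1;
         parent[a] = parent[b] = nNodes;
         alive[a] = alive[b] = 0;
         nNodes++;
      }

      /* Code length = number of edges from the leaf up to the root. */
      for (int i = 0; i < n; i++) {
         int k = i, d = 0;
         while (parent[k] >= 0) { k = parent[k]; d++; }
         len[i] = (uint8_t)d;
         if (d > maxLen) tooLong = 1;
      }
      if (!tooLong) return;

      /* Flatten as the listing does: f -> 1 + f/2, then rebuild. */
      for (int i = 0; i < n; i++) w[i] = 1 + w[i] / 2;
   }
}
```

With frequencies {1000, 100, 10, 1, 1}, the unconstrained lengths are {1, 2, 3, 4, 4}. With maxLen = 3, several rounds of flattening yield {1, 3, 3, 3, 3}.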
894 344
895/*---------------------------------------------*/ 345 bzWriteClose ( &bzerr, bzf, 0, &nbytes_in, &nbytes_out );
896void hbAssignCodes ( Int32 *code, 346 if (bzerr != BZ_OK) goto errhandler;
                     UChar *length,
                     Int32 minLen,
                     Int32 maxLen,
                     Int32 alphaSize )
{
   Int32 n, vec, i;
903 347
904 vec = 0; 348 if (ferror(zStream)) goto errhandler_io;
905 for (n = minLen; n <= maxLen; n++) { 349 ret = fflush ( zStream );
906 for (i = 0; i < alphaSize; i++) 350 if (ret == EOF) goto errhandler_io;
907 if (length[i] == n) { code[i] = vec; vec++; }; 351 if (zStream != stdout) {
908 vec <<= 1; 352 ret = fclose ( zStream );
353 if (ret == EOF) goto errhandler_io;
909 } 354 }
910} 355 if (ferror(stream)) goto errhandler_io;
911 356 ret = fclose ( stream );
912 357 if (ret == EOF) goto errhandler_io;
/*---------------------------------------------*/
void hbCreateDecodeTables ( Int32 *limit,
                            Int32 *base,
                            Int32 *perm,
                            UChar *length,
                            Int32 minLen,
                            Int32 maxLen,
                            Int32 alphaSize )
{
   Int32 pp, i, j, vec;

   pp = 0;
   for (i = minLen; i <= maxLen; i++)
      for (j = 0; j < alphaSize; j++)
         if (length[j] == i) { perm[pp] = j; pp++; };

   for (i = 0; i < MAX_CODE_LEN; i++) base[i] = 0;
   for (i = 0; i < alphaSize; i++) base[length[i]+1]++;
931 358
932 for (i = 1; i < MAX_CODE_LEN; i++) base[i] += base[i-1]; 359 if (nbytes_in == 0) nbytes_in = 1;
933 360
934 for (i = 0; i < MAX_CODE_LEN; i++) limit[i] = 0; 361 if (verbosity >= 1)
935 vec = 0; 362 fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, "
936 363 "%5.2f%% saved, %d in, %d out.\n",
937 for (i = minLen; i <= maxLen; i++) { 364 (float)nbytes_in / (float)nbytes_out,
938 vec += (base[i+1] - base[i]); 365 (8.0 * (float)nbytes_out) / (float)nbytes_in,
939 limit[i] = vec-1; 366 100.0 * (1.0 - (float)nbytes_out / (float)nbytes_in),
940 vec <<= 1; 367 nbytes_in,
941 } 368 nbytes_out
942 for (i = minLen + 1; i <= maxLen; i++) 369 );
      base[i] = ((limit[i-1] + 1) << 1) - base[i];
}
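hbAssignCodes hands out canonical codes: shorter lengths come first, and ties go in symbol order. For each code length L, hbCreateDecodeTables derives three tables:

- perm[]: the symbols in code order.
- limit[L]: the largest code of length L.
- base[L]: the value to subtract from a length-L code to index perm[].

GET_MTF_VAL, further down, reads minLen bits and then extends the code one bit at a time until the value no longer exceeds limit. The sketch below reproduces those steps on a toy alphabet. DecodeTables, MAX_LEN and the '0'/'1' character string standing in for bsR() are assumptions for illustration. The table construction computes the same base and limit values directly rather than reusing the listing's prefix-sum trick on base[].

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEN 20   /* assumed cap, matching the maxLen of 20 used above */

typedef struct {
   int32_t limit[MAX_LEN + 1];  /* largest code value of each length     */
   int32_t base [MAX_LEN + 1];  /* subtract from a code to index perm[]   */
   int32_t perm [258];          /* symbols sorted by (length, symbol)     */
   int     minLen, maxLen;
} DecodeTables;

/* Canonical code assignment, as in hbAssignCodes. */
static void assign_codes(int32_t *code, const uint8_t *len, int n,
                         int minLen, int maxLen)
{
   int32_t vec = 0;
   for (int L = minLen; L <= maxLen; L++) {
      for (int i = 0; i < n; i++)
         if (len[i] == L) code[i] = vec++;
      vec <<= 1;
   }
}

/* Same limit/base/perm meaning as hbCreateDecodeTables. */
static void create_decode_tables(DecodeTables *t, const uint8_t *len, int n,
                                 int minLen, int maxLen)
{
   int32_t count[MAX_LEN + 1] = { 0 };
   int32_t code = 0, index = 0;
   int pp = 0;

   assert(n <= 258 && minLen >= 1 && maxLen <= MAX_LEN);
   for (int L = minLen; L <= maxLen; L++)
      for (int i = 0; i < n; i++)
         if (len[i] == L) t->perm[pp++] = i;
   for (int i = 0; i < n; i++) count[len[i]]++;

   for (int L = minLen; L <= maxLen; L++) {
      t->base[L]  = code - index;          /* first code minus first perm slot */
      t->limit[L] = code + count[L] - 1;   /* last code of this length         */
      code   = (code + count[L]) << 1;     /* first code of the next length    */
      index += count[L];
   }
   t->minLen = minLen;
   t->maxLen = maxLen;
}

/* Decode one symbol from a '0'/'1' string, advancing *pos (cf. GET_MTF_VAL). */
static int decode_symbol(const DecodeTables *t, const char *bits, int *pos)
{
   int L = t->minLen;
   int32_t v = 0;
   for (int k = 0; k < L; k++) v = (v << 1) | (bits[(*pos)++] - '0');
   while (v > t->limit[L]) {
      L++;
      assert(L <= t->maxLen);
      v = (v << 1) | (bits[(*pos)++] - '0');
   }
   return t->perm[v - t->base[L]];
}
```

With lengths {2, 1, 3, 3}, symbol 1 gets code 0, symbol 0 gets 10, and symbols 2 and 3 get 110 and 111. The string "111010110" therefore decodes to 3, 1, 0, 2.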



/*---------------------------------------------------*/
/*--- Undoing the reversible transformation ---*/
/*---------------------------------------------------*/

/*---------------------------------------------*/
#define SET_LL4(i,n) \
   { if (((i) & 0x1) == 0) \
        ll4[(i) >> 1] = (ll4[(i) >> 1] & 0xf0) | (n); else \
        ll4[(i) >> 1] = (ll4[(i) >> 1] & 0x0f) | ((n) << 4); \
   }

#define GET_LL4(i) \
   (((UInt32)(ll4[(i) >> 1])) >> (((i) << 2) & 0x4) & 0xF)

#define SET_LL(i,n) \
   { ll16[i] = (UInt16)(n & 0x0000ffff); \
     SET_LL4(i, n >> 16); \
   }

#define GET_LL(i) \
   (((UInt32)ll16[i]) | (GET_LL4(i) << 16))
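In small-memory mode the decoder stores each 20-bit value in two pieces. That is enough for block positions below 900000, since 2^20 = 1048576:

- The low 16 bits go in ll16.
- The high 4 bits go in a nibble of ll4, packed two per byte. Even indices use the low nibble and odd indices the high one.

That comes to 2.5 bytes per entry, compared with 4 for a plain Int32. The sketch below restates SET_LL and GET_LL as functions over caller-supplied arrays. set_ll, get_ll and their parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical equivalent of SET_LL: low 16 bits to lo16[i], high nibble
   into hi4[i/2] (low nibble for even i, high nibble for odd i). */
static void set_ll(uint16_t *lo16, uint8_t *hi4, uint32_t i, uint32_t n)
{
   assert(n < (1u << 20));
   uint32_t nib = n >> 16;
   lo16[i] = (uint16_t)(n & 0xffff);
   if ((i & 1) == 0)
      hi4[i >> 1] = (uint8_t)((hi4[i >> 1] & 0xf0) | nib);
   else
      hi4[i >> 1] = (uint8_t)((hi4[i >> 1] & 0x0f) | (nib << 4));
}

/* Hypothetical equivalent of GET_LL: shift by 0 or 4 depending on parity. */
static uint32_t get_ll(const uint16_t *lo16, const uint8_t *hi4, uint32_t i)
{
   uint32_t nib = ((uint32_t)hi4[i >> 1] >> ((i << 2) & 4)) & 0xf;
   return (uint32_t)lo16[i] | (nib << 16);
}
```

The expression `(i << 2) & 4` is 0 for even i and 4 for odd i, which is the same trick GET_LL4 uses to select the nibble without a branch.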


/*---------------------------------------------*/
/*--
   Manage memory for compression/decompression.
   When compressing, a single block size applies to
   all files processed, and that's set when the
   program starts. But when decompressing, each file
   processed could have been compressed with a
   different block size, so we may have to free
   and reallocate on a per-file basis.

   A call with argument of zero means
   `free up everything.' And a value of zero for
   blockSize100k means no memory is currently allocated.
--*/
985 370
371 return;
986 372
987/*---------------------------------------------*/ 373 errhandler:
988void allocateCompressStructures ( void ) 374 bzWriteClose ( &bzerr_dummy, bzf, 1, &nbytes_in, &nbytes_out );
989{ 375 switch (bzerr) {
990 Int32 n = 100000 * blockSize100k; 376 case BZ_MEM_ERROR:
991 block = malloc ( (n + 1 + NUM_OVERSHOOT_BYTES) * sizeof(UChar) ); 377 outOfMemory ();
992 quadrant = malloc ( (n + NUM_OVERSHOOT_BYTES) * sizeof(Int16) ); 378 case BZ_IO_ERROR:
993 zptr = malloc ( n * sizeof(Int32) ); 379 errhandler_io:
994 ftab = malloc ( 65537 * sizeof(Int32) ); 380 ioError(); break;
995 381 default:
996 if (block == NULL || quadrant == NULL || 382 panic ( "compress:unexpected error" );
       zptr == NULL || ftab == NULL) {
      Int32 totalDraw
         = (n + 1 + NUM_OVERSHOOT_BYTES) * sizeof(UChar) +
           (n + NUM_OVERSHOOT_BYTES) * sizeof(Int16) +
           n * sizeof(Int32) +
           65537 * sizeof(Int32);

      compressOutOfMemory ( totalDraw, n );
1005 } 383 }
1006 384
1007 /*-- 385 panic ( "compress:end" );
1008 Since we want valid indexes for block of 386 /*notreached*/
      -1 to n + NUM_OVERSHOOT_BYTES - 1
      inclusive.
   --*/
   block++;

   /*--
      The back end needs a place to store the MTF values
      whilst it calculates the coding tables. We could
      put them in the zptr array. However, these values
      will fit in a short, so we overlay szptr at the
      start of zptr, in the hope of reducing the number
      of cache misses induced by the multiple traversals
      of the MTF values when calculating coding tables.
      Seems to improve compression speed by about 1%.
   --*/
   szptr = (UInt16*)zptr;
}


/*---------------------------------------------*/
void setDecompressStructureSizes ( Int32 newSize100k )
{
   if (! (0 <= newSize100k && newSize100k <= 9 &&
          0 <= blockSize100k && blockSize100k <= 9))
      panic ( "setDecompressStructureSizes" );

   if (newSize100k == blockSize100k) return;

   blockSize100k = newSize100k;

   if (ll16 != NULL) free ( ll16 );
   if (ll4 != NULL) free ( ll4 );
   if (ll8 != NULL) free ( ll8 );
   if (tt != NULL) free ( tt );

   if (newSize100k == 0) return;

   if (smallMode) {

      Int32 n = 100000 * newSize100k;
      ll16 = malloc ( n * sizeof(UInt16) );
      ll4 = malloc ( ((n+1) >> 1) * sizeof(UChar) );

      if (ll4 == NULL || ll16 == NULL) {
         Int32 totalDraw
            = n * sizeof(Int16) + ((n+1) >> 1) * sizeof(UChar);
         uncompressOutOfMemory ( totalDraw, n );
      }

   } else {

      Int32 n = 100000 * newSize100k;
      ll8 = malloc ( n * sizeof(UChar) );
      tt = malloc ( n * sizeof(Int32) );

      if (ll8 == NULL || tt == NULL) {
         Int32 totalDraw
            = n * sizeof(UChar) + n * sizeof(UInt32);
         uncompressOutOfMemory ( totalDraw, n );
      }

   }
1071} 387}
1072 388
1073 389
1074 390
/*---------------------------------------------------*/
/*--- The new back end ---*/
/*---------------------------------------------------*/

/*---------------------------------------------*/
void makeMaps ( void )
{
   Int32 i;
   nInUse = 0;
   for (i = 0; i < 256; i++)
      if (inUse[i]) {
         seqToUnseq[nInUse] = i;
         unseqToSeq[i] = nInUse;
         nInUse++;
      }
}


1093/*---------------------------------------------*/ 391/*---------------------------------------------*/
1094void generateMTFValues ( void ) 392Bool uncompressStream ( FILE *zStream, FILE *stream )
{
   UChar yy[256];
   Int32 i, j;
   UChar tmp;
   UChar tmp2;
   Int32 zPend;
   Int32 wr;
   Int32 EOB;

   makeMaps();
   EOB = nInUse+1;

   for (i = 0; i <= EOB; i++) mtfFreq[i] = 0;

   wr = 0;
   zPend = 0;
   for (i = 0; i < nInUse; i++) yy[i] = (UChar) i;


   for (i = 0; i <= last; i++) {
      UChar ll_i;

      #if DEBUG
         assert (wr <= i);
      #endif

      ll_i = unseqToSeq[block[zptr[i] - 1]];
      #if DEBUG
         assert (ll_i < nInUse);
      #endif

      j = 0;
      tmp = yy[j];
      while ( ll_i != tmp ) {
         j++;
         tmp2 = tmp;
         tmp = yy[j];
         yy[j] = tmp2;
      };
      yy[0] = tmp;

      if (j == 0) {
         zPend++;
      } else {
         if (zPend > 0) {
            zPend--;
            while (True) {
               switch (zPend % 2) {
                  case 0: szptr[wr] = RUNA; wr++; mtfFreq[RUNA]++; break;
                  case 1: szptr[wr] = RUNB; wr++; mtfFreq[RUNB]++; break;
               };
               if (zPend < 2) break;
               zPend = (zPend - 2) / 2;
            };
            zPend = 0;
         }
         szptr[wr] = j+1; wr++; mtfFreq[j+1]++;
      }
   }

   if (zPend > 0) {
      zPend--;
      while (True) {
         switch (zPend % 2) {
            case 0: szptr[wr] = RUNA; wr++; mtfFreq[RUNA]++; break;
            case 1: szptr[wr] = RUNB; wr++; mtfFreq[RUNB]++; break;
         };
         if (zPend < 2) break;
         zPend = (zPend - 2) / 2;
      };
   }

   szptr[wr] = EOB; wr++; mtfFreq[EOB]++;

   nMTF = wr;
}
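generateMTFValues combines move-to-front coding with a run-length code for zeros:

- A nonzero MTF position j is emitted as j+1.
- A run of r zeros is written in bijective base 2, least significant digit first, with RUNA standing for digit 1 and RUNB for digit 2. So r = 1 gives RUNA, 2 gives RUNB, 3 gives RUNA RUNA, and 4 gives RUNB RUNA.

A compact restatement follows, with these assumptions:

- RUNA = 0 and RUNB = 1. bzip2's definitions are not part of this excerpt.
- The input symbols have already been mapped through unseqToSeq.
- The mtfFreq counting is left out.

flush_run and mtf_encode are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RUNA 0   /* assumed values for this sketch */
#define RUNB 1

/* Write a run of zPend zeros in bijective base 2, least significant
   digit first (RUNA = digit 1, RUNB = digit 2), as generateMTFValues does. */
static int flush_run(int32_t zPend, uint16_t *out, int wr)
{
   if (zPend == 0) return wr;
   zPend--;
   for (;;) {
      out[wr++] = (zPend % 2 == 0) ? RUNA : RUNB;
      if (zPend < 2) break;
      zPend = (zPend - 2) / 2;
   }
   return wr;
}

/* MTF-encode syms[0..n-1] (each < nInUse). Returns the number of values
   written to out[], which ends with EOB = nInUse + 1. */
static int mtf_encode(const uint8_t *syms, int n, int nInUse, uint16_t *out)
{
   uint8_t yy[256];
   int32_t zPend = 0;
   int wr = 0;

   assert(nInUse >= 1 && nInUse <= 256);
   for (int i = 0; i < nInUse; i++) yy[i] = (uint8_t)i;

   for (int i = 0; i < n; i++) {
      int j = 0;
      uint8_t tmp = yy[0];
      assert(syms[i] < nInUse);
      /* Find the symbol, sliding the entries before it down one slot. */
      while (tmp != syms[i]) {
         uint8_t tmp2 = tmp;
         j++;
         tmp = yy[j];
         yy[j] = tmp2;
      }
      yy[0] = tmp;

      if (j == 0) {
         zPend++;                        /* extend the current zero run */
      } else {
         wr = flush_run(zPend, out, wr);
         zPend = 0;
         out[wr++] = (uint16_t)(j + 1);
      }
   }
   wr = flush_run(zPend, out, wr);
   out[wr++] = (uint16_t)(nInUse + 1);   /* EOB */
   return wr;
}
```

For the symbol sequence 1 1 1 0 0 2 with three symbols in use, the output is 2, RUNB, 2, RUNA, 3, EOB. The two repeats of 1 become a single RUNB, and the one repeat of 0 becomes RUNA.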


/*---------------------------------------------*/
#define LESSER_ICOST 0
#define GREATER_ICOST 15

void sendMTFValues ( void )
1178{ 393{
1179 Int32 v, t, i, j, gs, ge, totc, bt, bc, iter; 394 BZFILE* bzf = NULL;
1180 Int32 nSelectors, alphaSize, minLen, maxLen, selCtr; 395 Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i;
1181 Int32 nGroups, nBytes; 396 UChar obuf[5000];
1182 397 UChar unused[BZ_MAX_UNUSED];
1183 /*-- 398 Int32 nUnused;
1184 UChar len [N_GROUPS][MAX_ALPHA_SIZE]; 399 UChar* unusedTmp;
      is a global since the decoder also needs it.

      Int32 code[N_GROUPS][MAX_ALPHA_SIZE];
      Int32 rfreq[N_GROUPS][MAX_ALPHA_SIZE];
      are also globals only used in this proc.
      Made global to keep stack frame size small.
   --*/


   UInt16 cost[N_GROUPS];
   Int32 fave[N_GROUPS];

   if (verbosity >= 3)
      fprintf ( stderr,
         " %d in block, %d after MTF & 1-2 coding, %d+2 syms in use\n",
         last+1, nMTF, nInUse );

   alphaSize = nInUse+2;
   for (t = 0; t < N_GROUPS; t++)
      for (v = 0; v < alphaSize; v++)
         len[t][v] = GREATER_ICOST;

   /*--- Decide how many coding tables to use ---*/
   if (nMTF <= 0) panic ( "sendMTFValues(0)" );
   if (nMTF < 200) nGroups = 2; else
   if (nMTF < 800) nGroups = 4; else
                   nGroups = 6;

   /*--- Generate an initial set of coding tables ---*/
   {
      Int32 nPart, remF, tFreq, aFreq;

      nPart = nGroups;
      remF = nMTF;
      gs = 0;
      while (nPart > 0) {
         tFreq = remF / nPart;
         ge = gs-1;
         aFreq = 0;
         while (aFreq < tFreq && ge < alphaSize-1) {
            ge++;
            aFreq += mtfFreq[ge];
         }

         if (ge > gs
             && nPart != nGroups && nPart != 1
             && ((nGroups-nPart) % 2 == 1)) {
            aFreq -= mtfFreq[ge];
            ge--;
         }
1235 400
1236 if (verbosity >= 3) 401 nUnused = 0;
1237 fprintf ( stderr, 402 streamNo = 0;
               " initial group %d, [%d .. %d], has %d syms (%4.1f%%)\n",
               nPart, gs, ge, aFreq,
               (100.0 * (float)aFreq) / (float)nMTF );

         for (v = 0; v < alphaSize; v++)
            if (v >= gs && v <= ge)
               len[nPart-1][v] = LESSER_ICOST; else
               len[nPart-1][v] = GREATER_ICOST;

         nPart--;
         gs = ge+1;
         remF -= aFreq;
      }
   }

   /*---
      Iterate up to N_ITERS times to improve the tables.
   ---*/
   for (iter = 0; iter < N_ITERS; iter++) {

      for (t = 0; t < nGroups; t++) fave[t] = 0;

      for (t = 0; t < nGroups; t++)
         for (v = 0; v < alphaSize; v++)
            rfreq[t][v] = 0;

      nSelectors = 0;
      totc = 0;
      gs = 0;
      while (True) {

         /*--- Set group start & end marks. --*/
         if (gs >= nMTF) break;
         ge = gs + G_SIZE - 1;
         if (ge >= nMTF) ge = nMTF-1;

         /*--
            Calculate the cost of this group as coded
            by each of the coding tables.
         --*/
         for (t = 0; t < nGroups; t++) cost[t] = 0;

         if (nGroups == 6) {
            register UInt16 cost0, cost1, cost2, cost3, cost4, cost5;
            cost0 = cost1 = cost2 = cost3 = cost4 = cost5 = 0;
            for (i = gs; i <= ge; i++) {
               UInt16 icv = szptr[i];
               cost0 += len[0][icv];
               cost1 += len[1][icv];
               cost2 += len[2][icv];
               cost3 += len[3][icv];
               cost4 += len[4][icv];
               cost5 += len[5][icv];
            }
            cost[0] = cost0; cost[1] = cost1; cost[2] = cost2;
            cost[3] = cost3; cost[4] = cost4; cost[5] = cost5;
         } else {
            for (i = gs; i <= ge; i++) {
               UInt16 icv = szptr[i];
               for (t = 0; t < nGroups; t++) cost[t] += len[t][icv];
            }
         }

         /*--
            Find the coding table which is best for this group,
            and record its identity in the selector table.
         --*/
         bc = 999999999; bt = -1;
         for (t = 0; t < nGroups; t++)
            if (cost[t] < bc) { bc = cost[t]; bt = t; };
         totc += bc;
         fave[bt]++;
         selector[nSelectors] = bt;
         nSelectors++;

         /*--
            Increment the symbol frequencies for the selected table.
         --*/
         for (i = gs; i <= ge; i++)
            rfreq[bt][ szptr[i] ]++;

         gs = ge+1;
      }
      if (verbosity >= 3) {
         fprintf ( stderr,
                   " pass %d: size is %d, grp uses are ",
                   iter+1, totc/8 );
         for (t = 0; t < nGroups; t++)
            fprintf ( stderr, "%d ", fave[t] );
         fprintf ( stderr, "\n" );
      }

      /*--
         Recompute the tables based on the accumulated frequencies.
      --*/
      for (t = 0; t < nGroups; t++)
         hbMakeCodeLengths ( &len[t][0], &rfreq[t][0], alphaSize, 20 );
   }
1336 403
404 SET_BINARY_MODE(stream);
405 SET_BINARY_MODE(zStream);
1337 406
1338 if (!(nGroups < 8)) panic ( "sendMTFValues(1)" ); 407 if (ferror(stream)) goto errhandler_io;
1339 if (!(nSelectors < 32768 && 408 if (ferror(zStream)) goto errhandler_io;
         nSelectors <= (2 + (900000 / G_SIZE))))
      panic ( "sendMTFValues(2)" );


   /*--- Compute MTF values for the selectors. ---*/
   {
      UChar pos[N_GROUPS], ll_i, tmp2, tmp;
      for (i = 0; i < nGroups; i++) pos[i] = i;
      for (i = 0; i < nSelectors; i++) {
         ll_i = selector[i];
         j = 0;
         tmp = pos[j];
         while ( ll_i != tmp ) {
            j++;
            tmp2 = tmp;
            tmp = pos[j];
            pos[j] = tmp2;
         };
         pos[0] = tmp;
         selectorMtf[i] = j;
      }
   };

   /*--- Assign actual codes for the tables. --*/
   for (t = 0; t < nGroups; t++) {
      minLen = 32;
      maxLen = 0;
      for (i = 0; i < alphaSize; i++) {
         if (len[t][i] > maxLen) maxLen = len[t][i];
         if (len[t][i] < minLen) minLen = len[t][i];
      }
      if (maxLen > 20) panic ( "sendMTFValues(3)" );
      if (minLen < 1) panic ( "sendMTFValues(4)" );
      hbAssignCodes ( &code[t][0], &len[t][0],
                      minLen, maxLen, alphaSize );
   }

   /*--- Transmit the mapping table. ---*/
   {
      Bool inUse16[16];
      for (i = 0; i < 16; i++) {
         inUse16[i] = False;
         for (j = 0; j < 16; j++)
            if (inUse[i * 16 + j]) inUse16[i] = True;
      }

      nBytes = bytesOut;
      for (i = 0; i < 16; i++)
         if (inUse16[i]) bsW(1,1); else bsW(1,0);

      for (i = 0; i < 16; i++)
         if (inUse16[i])
            for (j = 0; j < 16; j++)
               if (inUse[i * 16 + j]) bsW(1,1); else bsW(1,0);

      if (verbosity >= 3)
         fprintf ( stderr, " bytes: mapping %d, ", bytesOut-nBytes );
   }

   /*--- Now the selectors. ---*/
   nBytes = bytesOut;
   bsW ( 3, nGroups );
   bsW ( 15, nSelectors );
   for (i = 0; i < nSelectors; i++) {
      for (j = 0; j < selectorMtf[i]; j++) bsW(1,1);
      bsW(1,0);
   }
   if (verbosity >= 3)
      fprintf ( stderr, "selectors %d, ", bytesOut-nBytes );

   /*--- Now the coding tables. ---*/
   nBytes = bytesOut;

   for (t = 0; t < nGroups; t++) {
      Int32 curr = len[t][0];
      bsW ( 5, curr );
      for (i = 0; i < alphaSize; i++) {
         while (curr < len[t][i]) { bsW(2,2); curr++; /* 10 */ };
         while (curr > len[t][i]) { bsW(2,3); curr--; /* 11 */ };
         bsW ( 1, 0 );
      }
   }

   if (verbosity >= 3)
      fprintf ( stderr, "code lengths %d, ", bytesOut-nBytes );
1425 409
   /*--- And finally, the block data proper ---*/
   nBytes = bytesOut;
   selCtr = 0;
   gs = 0;
1430 while (True) { 410 while (True) {
      if (gs >= nMTF) break;
      ge = gs + G_SIZE - 1;
      if (ge >= nMTF) ge = nMTF-1;
      for (i = gs; i <= ge; i++) {
         #if DEBUG
            assert (selector[selCtr] < nGroups);
         #endif
         bsW ( len [selector[selCtr]] [szptr[i]],
               code [selector[selCtr]] [szptr[i]] );
      }
1441 411
1442 gs = ge+1; 412 bzf = bzReadOpen (
1443 selCtr++; 413 &bzerr, zStream, verbosity,
1444 } 414 (int)smallMode, unused, nUnused
1445 if (!(selCtr == nSelectors)) panic ( "sendMTFValues(5)" ); 415 );
1446 416 if (bzf == NULL || bzerr != BZ_OK) goto errhandler;
1447 if (verbosity >= 3) 417 streamNo++;
1448 fprintf ( stderr, "codes %d\n", bytesOut-nBytes ); 418
1449} 419 while (bzerr == BZ_OK) {
1450 420 nread = bzRead ( &bzerr, bzf, obuf, 5000 );
1451 421 if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler;
1452/*---------------------------------------------*/ 422 if ((bzerr == BZ_OK || bzerr == BZ_STREAM_END) && nread > 0)
1453void moveToFrontCodeAndSend ( void ) 423 fwrite ( obuf, sizeof(UChar), nread, stream );
1454{ 424 if (ferror(stream)) goto errhandler_io;
   bsPutIntVS ( 24, origPtr );
   generateMTFValues();
   sendMTFValues();
}


/*---------------------------------------------*/
void recvDecodingTables ( void )
{
   Int32 i, j, t, nGroups, nSelectors, alphaSize;
   Int32 minLen, maxLen;
   Bool inUse16[16];

   /*--- Receive the mapping table ---*/
   for (i = 0; i < 16; i++)
      if (bsR(1) == 1)
         inUse16[i] = True; else
         inUse16[i] = False;

   for (i = 0; i < 256; i++) inUse[i] = False;

   for (i = 0; i < 16; i++)
      if (inUse16[i])
         for (j = 0; j < 16; j++)
            if (bsR(1) == 1) inUse[i * 16 + j] = True;

   makeMaps();
   alphaSize = nInUse+2;

   /*--- Now the selectors ---*/
   nGroups = bsR ( 3 );
   nSelectors = bsR ( 15 );
   for (i = 0; i < nSelectors; i++) {
      j = 0;
      while (bsR(1) == 1) j++;
      selectorMtf[i] = j;
   }

   /*--- Undo the MTF values for the selectors. ---*/
   {
      UChar pos[N_GROUPS], tmp, v;
      for (v = 0; v < nGroups; v++) pos[v] = v;

      for (i = 0; i < nSelectors; i++) {
         v = selectorMtf[i];
         tmp = pos[v];
         while (v > 0) { pos[v] = pos[v-1]; v--; }
         pos[0] = tmp;
         selector[i] = tmp;
1504 } 425 }
1505 } 426 if (bzerr != BZ_STREAM_END) goto errhandler;

   /*--- Now the coding tables ---*/
   for (t = 0; t < nGroups; t++) {
      Int32 curr = bsR ( 5 );
      for (i = 0; i < alphaSize; i++) {
         while (bsR(1) == 1) {
            if (bsR(1) == 0) curr++; else curr--;
         }
         len[t][i] = curr;
      }
   }

   /*--- Create the Huffman decoding tables ---*/
   for (t = 0; t < nGroups; t++) {
      minLen = 32;
      maxLen = 0;
      for (i = 0; i < alphaSize; i++) {
         if (len[t][i] > maxLen) maxLen = len[t][i];
         if (len[t][i] < minLen) minLen = len[t][i];
      }
      hbCreateDecodeTables (
         &limit[t][0], &base[t][0], &perm[t][0], &len[t][0],
         minLen, maxLen, alphaSize
      );
      minLens[t] = minLen;
   }
}
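Each coding table travels as a 5-bit starting length. After that, each symbol gets a string of 2-bit adjustments (10 = increment, 11 = decrement) closed by a single 0 bit. sendMTFValues writes this format and recvDecodingTables reads it back.

The round trip can be sketched as below, with a string of '0'/'1' characters standing in for bsW()/bsR(). The Bits type and the function names are hypothetical. The range check on the received length is an addition of the sketch, not something the listing does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit stream: one '0'/'1' character per bit. */
typedef struct { char *s; int pos; } Bits;

static void put_bits(Bits *b, int n, uint32_t v)
{
   for (int k = n - 1; k >= 0; k--)
      b->s[b->pos++] = (char)('0' + ((v >> k) & 1));
}

static uint32_t get_bits(Bits *b, int n)
{
   uint32_t v = 0;
   for (int k = 0; k < n; k++)
      v = (v << 1) | (uint32_t)(b->s[b->pos++] - '0');
   return v;
}

/* Sender, as in sendMTFValues: 5-bit start, then "10" = +1, "11" = -1,
   and "0" to accept the current length for the next symbol. */
static void send_lengths(Bits *b, const uint8_t *len, int n)
{
   int curr = len[0];
   put_bits(b, 5, (uint32_t)curr);
   for (int i = 0; i < n; i++) {
      while (curr < len[i]) { put_bits(b, 2, 2); curr++; }
      while (curr > len[i]) { put_bits(b, 2, 3); curr--; }
      put_bits(b, 1, 0);
   }
}

/* Receiver, as in recvDecodingTables (plus a range check). */
static void recv_lengths(Bits *b, uint8_t *len, int n)
{
   int curr = (int)get_bits(b, 5);
   for (int i = 0; i < n; i++) {
      while (get_bits(b, 1) == 1) {
         if (get_bits(b, 1) == 0) curr++; else curr--;
      }
      assert(curr >= 1 && curr <= 20);
      len[i] = (uint8_t)curr;
   }
}
```

The lengths {3, 3, 5, 2} cost 19 bits:

- 5 bits for the start value.
- One bit each for the two 3s.
- Two increments plus a terminator for the 5.
- Three decrements plus a terminator for the 2.

Neighbouring symbols tend to have similar lengths, which keeps the table cheap.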


/*---------------------------------------------*/
#define GET_MTF_VAL(lval) \
{ \
   Int32 zt, zn, zvec, zj; \
   if (groupPos == 0) { \
      groupNo++; \
      groupPos = G_SIZE; \
   } \
   groupPos--; \
   zt = selector[groupNo]; \
   zn = minLens[zt]; \
   zvec = bsR ( zn ); \
   while (zvec > limit[zt][zn]) { \
      zn++; bsR1(zj); \
      zvec = (zvec << 1) | zj; \
   }; \
   lval = perm[zt][zvec - base[zt][zn]]; \
}


/*---------------------------------------------*/
void getAndMoveToFrontDecode ( void )
{
   UChar yy[256];
   Int32 i, j, nextSym, limitLast;
   Int32 EOB, groupNo, groupPos;

   limitLast = 100000 * blockSize100k;
   origPtr = bsGetIntVS ( 24 );

   recvDecodingTables();
   EOB = nInUse+1;
   groupNo = -1;
   groupPos = 0;

   /*--
      Setting up the unzftab entries here is not strictly
      necessary, but it does save having to do it later
      in a separate pass, and so saves a block's worth of
      cache misses.
   --*/
   for (i = 0; i <= 255; i++) unzftab[i] = 0;

   for (i = 0; i <= 255; i++) yy[i] = (UChar) i;

   last = -1;
1581 427
1582 GET_MTF_VAL(nextSym); 428 bzReadGetUnused ( &bzerr, bzf, (void**)(&unusedTmp), &nUnused );
429 if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" );
1583 430
1584 while (True) { 431 for (i = 0; i < nUnused; i++) unused[i] = unusedTmp[i];

      if (nextSym == EOB) break;

      if (nextSym == RUNA || nextSym == RUNB) {
         UChar ch;
         Int32 s = -1;
         Int32 N = 1;
         do {
            if (nextSym == RUNA) s = s + (0+1) * N; else
            if (nextSym == RUNB) s = s + (1+1) * N;
            N = N * 2;
            GET_MTF_VAL(nextSym);
         }
         while (nextSym == RUNA || nextSym == RUNB);
1599 432
1600 s++; 433 bzReadClose ( &bzerr, bzf );
1601 ch = seqToUnseq[yy[0]]; 434 if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" );
         unzftab[ch] += s;
1603 435
1604 if (smallMode) 436 if (nUnused == 0 && myfeof(zStream)) break;
            while (s > 0) {
               last++;
               ll16[last] = ch;
               s--;
            }
         else
            while (s > 0) {
               last++;
               ll8[last] = ch;
               s--;
            };

         if (last >= limitLast) blockOverrun();
         continue;

      } else {

         UChar tmp;
         last++; if (last >= limitLast) blockOverrun();

         tmp = yy[nextSym-1];
         unzftab[seqToUnseq[tmp]]++;
         if (smallMode)
            ll16[last] = seqToUnseq[tmp]; else
            ll8[last] = seqToUnseq[tmp];

         /*--
            This loop is hammered during decompression,
            hence the unrolling.

            for (j = nextSym-1; j > 0; j--) yy[j] = yy[j-1];
         --*/

         j = nextSym-1;
         for (; j > 3; j -= 4) {
            yy[j] = yy[j-1];
            yy[j-1] = yy[j-2];
            yy[j-2] = yy[j-3];
            yy[j-3] = yy[j-4];
         }
         for (; j > 0; j--) yy[j] = yy[j-1];
1646 437
         yy[0] = tmp;
         GET_MTF_VAL(nextSym);
         continue;
      }
1651 } 438 }
}
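getAndMoveToFrontDecode undoes both layers of the encoding:

- A maximal run of RUNA/RUNB symbols is read as a bijective base-2 number. The listing starts s at -1 and adds 1 at the end, which is the same as starting from 0.
- Every other symbol v moves entry v-1 of yy[] to the front and outputs it.

The sketch below decodes into an array instead of ll8/ll16. It outputs symbol indices rather than bytes (no seqToUnseq mapping), and it skips the unzftab counts and the Huffman layer. It assumes RUNA = 0 and RUNB = 1; mtf_decode is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define RUNA 0   /* assumed values for this sketch */
#define RUNB 1

/* Invert MTF + RUNA/RUNB coding. mtf[] must end with EOB = nInUse + 1.
   Decoded symbol indices go to out[]; the count is returned. */
static int mtf_decode(const uint16_t *mtf, int nInUse, uint8_t *out, int outCap)
{
   uint8_t yy[256];
   const int EOB = nInUse + 1;
   int last = 0, k = 0;

   assert(nInUse >= 1 && nInUse <= 256);
   for (int i = 0; i < nInUse; i++) yy[i] = (uint8_t)i;

   int sym = mtf[k++];
   while (sym != EOB) {
      if (sym == RUNA || sym == RUNB) {
         /* Bijective base 2, least significant digit first. */
         int32_t s = 0, N = 1;
         do {
            s += (sym == RUNA ? 1 : 2) * N;
            N *= 2;
            sym = mtf[k++];
         } while (sym == RUNA || sym == RUNB);
         assert(last + s <= outCap);
         while (s-- > 0) out[last++] = yy[0];    /* repeat front symbol */
      } else {
         int j = sym - 1;
         assert(j >= 1 && j < nInUse && last < outCap);
         uint8_t tmp = yy[j];
         for (; j > 0; j--) yy[j] = yy[j - 1];   /* move entry to front */
         yy[0] = tmp;
         out[last++] = tmp;
         sym = mtf[k++];
      }
   }
   return last;
}
```

As in the listing, the run loop has already fetched the next symbol when it exits, so that symbol is handled on the following pass without another read.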


/*---------------------------------------------------*/
/*--- Block-sorting machinery ---*/
/*---------------------------------------------------*/
1658 439
1659/*---------------------------------------------*/ 440 if (ferror(zStream)) goto errhandler_io;
1660/*-- 441 ret = fclose ( zStream );
1661 Compare two strings in block. We assume (see 442 if (ret == EOF) goto errhandler_io;
   discussion above) that i1 and i2 have a max
   offset of 10 on entry, and that the first
   bytes of both block and quadrant have been
   copied into the "overshoot area", ie
   into the subscript range
   [last+1 .. last+NUM_OVERSHOOT_BYTES].
--*/
INLINE Bool fullGtU ( Int32 i1, Int32 i2 )
{
   Int32 k;
   UChar c1, c2;
   UInt16 s1, s2;

   #if DEBUG
      /*--
         shellsort shouldn't ask to compare
         something with itself.
      --*/
      assert (i1 != i2);
   #endif

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   c1 = block[i1];
   c2 = block[i2];
   if (c1 != c2) return (c1 > c2);
   i1++; i2++;

   k = last + 1;

   do {

      c1 = block[i1];
      c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1];
      s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;

      c1 = block[i1];
      c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1];
      s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;

      c1 = block[i1];
      c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1];
      s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;

      c1 = block[i1];
      c2 = block[i2];
      if (c1 != c2) return (c1 > c2);
      s1 = quadrant[i1];
      s2 = quadrant[i2];
      if (s1 != s2) return (s1 > s2);
      i1++; i2++;

      if (i1 > last) { i1 -= last; i1--; };
      if (i2 > last) { i2 -= last; i2--; };

      k -= 4;
      workDone++;
   }
   while (k >= 0);
1756 443
1757 return False; 444 if (ferror(stream)) goto errhandler_io;
1758} 445 ret = fflush ( stream );
1759 446 if (ret != 0) goto errhandler_io;
1760/*---------------------------------------------*/ 447 if (stream != stdout) {
1761/*-- 448 ret = fclose ( stream );
1762 Knuth's increments seem to work better 449 if (ret == EOF) goto errhandler_io;
   than Incerpi-Sedgewick here. Possibly
   because the number of elems to sort is
   usually small, typically <= 20.
--*/
Int32 incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280,
                   9841, 29524, 88573, 265720,
                   797161, 2391484 };

void simpleSort ( Int32 lo, Int32 hi, Int32 d )
{
   Int32 i, j, h, bigN, hp;
   Int32 v;

   bigN = hi - lo + 1;
   if (bigN < 2) return;

   hp = 0;
   while (incs[hp] < bigN) hp++;
   hp--;

   for (; hp >= 0; hp--) {
      h = incs[hp];
      if (verbosity >= 5)
         fprintf ( stderr, " shell increment %d\n", h );

      i = lo + h;
      while (True) {

         /*-- copy 1 --*/
         if (i > hi) break;
         v = zptr[i];
         j = i;
         while ( fullGtU ( zptr[j-h]+d, v+d ) ) {
            zptr[j] = zptr[j-h];
            j = j - h;
            if (j <= (lo + h - 1)) break;
         }
         zptr[j] = v;
         i++;

         /*-- copy 2 --*/
         if (i > hi) break;
         v = zptr[i];
         j = i;
         while ( fullGtU ( zptr[j-h]+d, v+d ) ) {
            zptr[j] = zptr[j-h];
            j = j - h;
            if (j <= (lo + h - 1)) break;
         }
         zptr[j] = v;
         i++;

         /*-- copy 3 --*/
         if (i > hi) break;
         v = zptr[i];
         j = i;
         while ( fullGtU ( zptr[j-h]+d, v+d ) ) {
            zptr[j] = zptr[j-h];
            j = j - h;
            if (j <= (lo + h - 1)) break;
         }
         zptr[j] = v;
         i++;

         if (workDone > workLimit && firstAttempt) return;
      }
   }
}
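simpleSort is a Shell sort of zptr[] using the increments listed in incs[] (1, 4, 13, 40, ..., each one 3h + 1). It starts from the largest increment below the range size. The three unrolled copies and the workDone check are optimisations around that core.

The sketch below is illustrative and simplified. It sorts the cyclic rotations of a small block in the same way, but uses a naive modulo comparison in place of fullGtU and its overshoot area. It then reads off the last column and the sorted position of the unrotated block, which is the role origPtr plays. The last column is the character preceding each rotation, taken as block[(p + n - 1) % n]. This matches the block[zptr[i] - 1] lookup in generateMTFValues together with block[-1] = block[last] in sortIt. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the rotation of block[0..n-1] starting at a sorts after the
   one starting at b (naive stand-in for fullGtU). */
static int rot_greater(const uint8_t *block, int n, int32_t a, int32_t b)
{
   for (int k = 0; k < n; k++) {
      uint8_t ca = block[(a + k) % n], cb = block[(b + k) % n];
      if (ca != cb) return ca > cb;
   }
   return 0;   /* identical rotations */
}

/* Shell sort with Knuth's increments 1, 4, 13, 40, ... */
static void shell_sort_rotations(const uint8_t *block, int n, int32_t *ptr)
{
   int h = 1;
   while (h < n) h = 3 * h + 1;     /* smallest increment >= n */
   while (h > 1) {
      h /= 3;                       /* step down: 40 -> 13 -> 4 -> 1 */
      for (int i = h; i < n; i++) {
         int32_t v = ptr[i];
         int j = i;
         while (j >= h && rot_greater(block, n, ptr[j - h], v)) {
            ptr[j] = ptr[j - h];
            j -= h;
         }
         ptr[j] = v;
      }
   }
}

/* Sort rotations, fill lastCol, and return the sorted position of the
   unrotated block (what bzip2 calls origPtr). */
static int bwt_shell(const uint8_t *block, int n, int32_t *ptr, uint8_t *lastCol)
{
   int origPtr = -1;
   assert(n >= 1);
   for (int i = 0; i < n; i++) ptr[i] = i;
   shell_sort_rotations(block, n, ptr);
   for (int i = 0; i < n; i++) {
      lastCol[i] = block[(ptr[i] + n - 1) % n];   /* char before the rotation */
      if (ptr[i] == 0) origPtr = i;
   }
   return origPtr;
}
```

For "banana" the rotations sort as 5, 3, 1, 0, 4, 2. The last column is "nnbaaa" and the original string sits in row 3.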


/*---------------------------------------------*/
/*--
   The following is an implementation of
   an elegant 3-way quicksort for strings,
   described in a paper "Fast Algorithms for
   Sorting and Searching Strings", by Robert
   Sedgewick and Jon L. Bentley.
--*/

#define swap(lv1, lv2) \
   { Int32 tmp = lv1; lv1 = lv2; lv2 = tmp; }

INLINE void vswap ( Int32 p1, Int32 p2, Int32 n )
{
   while (n > 0) {
      swap(zptr[p1], zptr[p2]);
      p1++; p2++; n--;
   }
}

INLINE UChar med3 ( UChar a, UChar b, UChar c )
{
   UChar t;
   if (a > b) { t = a; a = b; b = t; };
   if (b > c) { t = b; b = c; c = t; };
   if (a > b) b = a;
   return b;
}


#define min(a,b) ((a) < (b)) ? (a) : (b)

typedef
   struct { Int32 ll; Int32 hh; Int32 dd; }
   StackElem;

#define push(lz,hz,dz) { stack[sp].ll = lz; \
                         stack[sp].hh = hz; \
                         stack[sp].dd = dz; \
                         sp++; }

#define pop(lz,hz,dz) { sp--; \
                        lz = stack[sp].ll; \
                        hz = stack[sp].hh; \
                        dz = stack[sp].dd; }

#define SMALL_THRESH 20
#define DEPTH_THRESH 10

/*--
   If you are ever unlucky/improbable enough
   to get a stack overflow whilst sorting,
   increase the following constant and try
   again. In practice I have never seen the
   stack go above 27 elems, so the following
   limit seems very generous.
--*/
#define QSORT_STACK_SIZE 1000


void qSort3 ( Int32 loSt, Int32 hiSt, Int32 dSt )
{
   Int32 unLo, unHi, ltLo, gtHi, med, n, m;
   Int32 sp, lo, hi, d;
   StackElem stack[QSORT_STACK_SIZE];

   sp = 0;
   push ( loSt, hiSt, dSt );

   while (sp > 0) {

      if (sp >= QSORT_STACK_SIZE) panic ( "stack overflow in qSort3" );

      pop ( lo, hi, d );

      if (hi - lo < SMALL_THRESH || d > DEPTH_THRESH) {
         simpleSort ( lo, hi, d );
         if (workDone > workLimit && firstAttempt) return;
         continue;
      }

      med = med3 ( block[zptr[ lo ]+d],
                   block[zptr[ hi ]+d],
                   block[zptr[ (lo+hi)>>1 ]+d] );

      unLo = ltLo = lo;
      unHi = gtHi = hi;

      while (True) {
         while (True) {
            if (unLo > unHi) break;
            n = ((Int32)block[zptr[unLo]+d]) - med;
            if (n == 0) { swap(zptr[unLo], zptr[ltLo]); ltLo++; unLo++; continue; };
            if (n > 0) break;
            unLo++;
         }
         while (True) {
            if (unLo > unHi) break;
            n = ((Int32)block[zptr[unHi]+d]) - med;
            if (n == 0) { swap(zptr[unHi], zptr[gtHi]); gtHi--; unHi--; continue; };
            if (n < 0) break;
            unHi--;
         }
         if (unLo > unHi) break;
         swap(zptr[unLo], zptr[unHi]); unLo++; unHi--;
      }
      #if DEBUG
         assert (unHi == unLo-1);
      #endif

      if (gtHi < ltLo) {
         push(lo, hi, d+1 );
         continue;
      }

      n = min(ltLo-lo, unLo-ltLo); vswap(lo, unLo-n, n);
      m = min(hi-gtHi, gtHi-unHi); vswap(unLo, hi-m+1, m);

      n = lo + unLo - ltLo - 1;
      m = hi - (gtHi - unHi) + 1;

      push ( lo, n, d );
      push ( n+1, m-1, d+1 );
      push ( m, hi, d );
   }
}
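qSort3 is an iterative form of Bentley and Sedgewick's multikey quicksort:

1. Partition on the d-th character into less, equal and greater parts.
2. Sort the equal part on character d+1.
3. Sort the other two parts on the same d.
4. Hand small or deep ranges to simpleSort.

The recursive sketch below shows only that core, applied to cyclic rotations. It differs from the listing in three ways:

- It uses Dijkstra's single-pass three-way partition and a middle-element pivot, instead of the med3 pivot and the listing's swap-equals-to-the-ends partition.
- It has no simpleSort cutoff.
- It recurses instead of using an explicit stack.

All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* d-th character of the rotation of block[0..n-1] starting at p. */
static int rot_char(const uint8_t *block, int n, int32_t p, int d)
{
   return block[(p + d) % n];
}

static void swap_i32(int32_t *a, int32_t *b)
{
   int32_t t = *a; *a = *b; *b = t;
}

/* Multikey quicksort of ptr[lo..hi]; all entries agree on their
   first d characters. */
static void mk_qsort(const uint8_t *block, int n, int32_t *ptr,
                     int lo, int hi, int d)
{
   if (hi <= lo || d >= n) return;   /* <= 1 item, or identical rotations */

   int pivot = rot_char(block, n, ptr[lo + (hi - lo) / 2], d);
   int lt = lo, gt = hi, i = lo;

   /* Three-way partition: [lo,lt) < pivot, [lt,gt] == pivot, (gt,hi] > pivot. */
   while (i <= gt) {
      int c = rot_char(block, n, ptr[i], d);
      if (c < pivot)      swap_i32(&ptr[lt++], &ptr[i++]);
      else if (c > pivot) swap_i32(&ptr[i], &ptr[gt--]);
      else                i++;
   }
   mk_qsort(block, n, ptr, lo, lt - 1, d);    /* smaller: same depth */
   mk_qsort(block, n, ptr, lt, gt, d + 1);    /* equal: next char    */
   mk_qsort(block, n, ptr, gt + 1, hi, d);    /* larger: same depth  */
}
```

Only the equal band advances to the next character, so common prefixes are examined once per group rather than once per comparison. That is the property that makes the method attractive for sorting the many similar rotations of a block.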


/*---------------------------------------------*/

#define BIGFREQ(b) (ftab[((b)+1) << 8] - ftab[(b) << 8])

#define SETMASK (1 << 21)
#define CLEARMASK (~(SETMASK))

void sortIt ( void )
{
   Int32 i, j, ss, sb;
   Int32 runningOrder[256];
   Int32 copy[256];
   Bool bigDone[256];
   UChar c1, c2;
   Int32 numQSorted;

   /*--
      In the various block-sized structures, live data runs
      from 0 to last+NUM_OVERSHOOT_BYTES inclusive. First,
      set up the overshoot area for block.
   --*/

   if (verbosity >= 4) fprintf ( stderr, " sort initialise ...\n" );
   for (i = 0; i < NUM_OVERSHOOT_BYTES; i++)
      block[last+i+1] = block[i % (last+1)];
   for (i = 0; i <= last+NUM_OVERSHOOT_BYTES; i++)
      quadrant[i] = 0;

   block[-1] = block[last];

   if (last < 4000) {

      /*--
         Use simpleSort(), since the full sorting mechanism
         has quite a large constant overhead.
      --*/
      if (verbosity >= 4) fprintf ( stderr, " simpleSort ...\n" );
      for (i = 0; i <= last; i++) zptr[i] = i;
      firstAttempt = False;
      workDone = workLimit = 0;
      simpleSort ( 0, last, 0 );
      if (verbosity >= 4) fprintf ( stderr, " simpleSort done.\n" );

   } else {

      numQSorted = 0;
      for (i = 0; i <= 255; i++) bigDone[i] = False;

      if (verbosity >= 4) fprintf ( stderr, " bucket sorting ...\n" );

      for (i = 0; i <= 65536; i++) ftab[i] = 0;

      c1 = block[-1];
      for (i = 0; i <= last; i++) {
         c2 = block[i];
         ftab[(c1 << 8) + c2]++;
         c1 = c2;
      }

      for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1];

      c1 = block[0];
      for (i = 0; i < last; i++) {
         c2 = block[i+1];
         j = (c1 << 8) + c2;
         c1 = c2;
         ftab[j]--;
         zptr[ftab[j]] = i;
      }
      j = (block[last] << 8) + block[0];
      ftab[j]--;
      zptr[ftab[j]] = last;

      /*--
         Now ftab contains the first loc of every small bucket.
         Calculate the running order, from smallest to largest
         big bucket.
      --*/

      for (i = 0; i <= 255; i++) runningOrder[i] = i;

      {
         Int32 vv;
         Int32 h = 1;
         do h = 3 * h + 1; while (h <= 256);
         do {
            h = h / 3;
            for (i = h; i <= 255; i++) {
               vv = runningOrder[i];
               j = i;
               while ( BIGFREQ(runningOrder[j-h]) > BIGFREQ(vv) ) {
                  runningOrder[j] = runningOrder[j-h];
                  j = j - h;
                  if (j <= (h - 1)) goto zero;
               }
               zero:
               runningOrder[j] = vv;
            }
         } while (h != 1);
      }

      /*--
         The main sorting loop.
      --*/

      for (i = 0; i <= 255; i++) {

         /*--
            Process big buckets, starting with the least full.
         --*/
         ss = runningOrder[i];

         /*--
            Complete the big bucket [ss] by quicksorting
2075 any unsorted small buckets [ss, j]. Hopefully
2076 previous pointer-scanning phases have already
2077 completed many of the small buckets [ss, j], so
2078 we don't have to sort them at all.
2079 --*/
2080 for (j = 0; j <= 255; j++) {
2081 sb = (ss << 8) + j;
2082 if ( ! (ftab[sb] & SETMASK) ) {
2083 Int32 lo = ftab[sb] & CLEARMASK;
2084 Int32 hi = (ftab[sb+1] & CLEARMASK) - 1;
2085 if (hi > lo) {
2086 if (verbosity >= 4)
2087 fprintf ( stderr,
2088 " qsort [0x%x, 0x%x] done %d this %d\n",
2089 ss, j, numQSorted, hi - lo + 1 );
2090 qSort3 ( lo, hi, 2 );
2091 numQSorted += ( hi - lo + 1 );
2092 if (workDone > workLimit && firstAttempt) return;
2093 }
2094 ftab[sb] |= SETMASK;
2095 }
2096 }
2097
2098 /*--
2099 The ss big bucket is now done. Record this fact,
2100 and update the quadrant descriptors. Remember to
2101 update quadrants in the overshoot area too, if
2102 necessary. The "if (i < 255)" test merely skips
2103 this updating for the last bucket processed, since
2104 updating for the last bucket is pointless.
2105 --*/
2106 bigDone[ss] = True;
2107
2108 if (i < 255) {
2109 Int32 bbStart = ftab[ss << 8] & CLEARMASK;
2110 Int32 bbSize = (ftab[(ss+1) << 8] & CLEARMASK) - bbStart;
2111 Int32 shifts = 0;
2112
2113 while ((bbSize >> shifts) > 65534) shifts++;
2114
2115 for (j = 0; j < bbSize; j++) {
2116 Int32 a2update = zptr[bbStart + j];
2117 UInt16 qVal = (UInt16)(j >> shifts);
2118 quadrant[a2update] = qVal;
2119 if (a2update < NUM_OVERSHOOT_BYTES)
2120 quadrant[a2update + last + 1] = qVal;
2121 }
2122
2123 if (! ( ((bbSize-1) >> shifts) <= 65535 )) panic ( "sortIt" );
2124 }
2125
2126 /*--
2127 Now scan this big bucket so as to synthesise the
2128 sorted order for small buckets [t, ss] for all t != ss.
2129 --*/
2130 for (j = 0; j <= 255; j++)
2131 copy[j] = ftab[(j << 8) + ss] & CLEARMASK;
2132
2133 for (j = ftab[ss << 8] & CLEARMASK;
2134 j < (ftab[(ss+1) << 8] & CLEARMASK);
2135 j++) {
2136 c1 = block[zptr[j]-1];
2137 if ( ! bigDone[c1] ) {
2138 zptr[copy[c1]] = zptr[j] == 0 ? last : zptr[j] - 1;
2139 copy[c1] ++;
2140 }
2141 }
2142
2143 for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK;
2144 }
2145 if (verbosity >= 4)
2146 fprintf ( stderr, " %d pointers, %d sorted, %d scanned\n",
2147 last+1, numQSorted, (last+1) - numQSorted );
2148 }
2149}
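Before any quicksorting, sortIt() does a counting sort of all rotations on their first two bytes. ftab[(c1 << 8) + c2] counts each cyclic byte pair, a prefix sum turns the counts into bucket end positions, and a second pass drops each rotation index into its bucket from the back, which leaves ftab[k] pointing at the start of bucket k. A standalone sketch of just that step might look like this; the name bucket_by_two_bytes is hypothetical, ftab must have 65537 entries as in the code above, and SETMASK, block[-1] and the overshoot area are ignored.

```c
#include <assert.h>
#include <string.h>

/* Counting sort of rotation start positions by their first two bytes.
   On return zptr[] is grouped by bucket and ftab[k] is the first slot
   of bucket k = (c1 << 8) | c2.  ftab must have 65537 entries. */
void bucket_by_two_bytes(const unsigned char *block, int n,
                         int *zptr, int *ftab)
{
    assert(n > 0);
    memset(ftab, 0, 65537 * sizeof ftab[0]);

    /* Count every cyclic pair (block[i], block[i+1]). */
    for (int i = 0; i < n; i++)
        ftab[(block[i] << 8) | block[(i + 1) % n]]++;

    /* Prefix sums: ftab[k] becomes one past the end of bucket k. */
    for (int k = 1; k <= 65536; k++)
        ftab[k] += ftab[k - 1];

    /* Fill each bucket from its end; afterwards ftab[k] is its start. */
    for (int i = 0; i < n; i++) {
        int k = (block[i] << 8) | block[(i + 1) % n];
        zptr[--ftab[k]] = i;
    }
}
```

Buckets holding more than one rotation are only grouped, not sorted, by this pass; in bzip2 they are finished by qSort3 starting at depth 2 or by the pointer-scanning pass. For "banana" the two multi-entry buckets ("an" and "na") happen to come out already in order.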
2150
2151
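The running order in sortIt() is a Shell sort of the 256 big buckets keyed on BIGFREQ, using the gap sequence h = 3h + 1: the first loop overshoots to 364, and the second divides back down through 121, 40, 13, 4 and 1. A structured equivalent of that loop, written without the goto and with the bucket sizes passed in as a plain array, is sketched below; the names running_order and freq are hypothetical.

```c
#include <assert.h>

/* Order the 256 big buckets from least to most full (Shell sort). */
void running_order(const int freq[256], int order[256])
{
    int h = 1;
    for (int i = 0; i < 256; i++) order[i] = i;

    do h = 3 * h + 1; while (h <= 256);      /* ends with h = 364 */
    do {
        h = h / 3;                            /* 121, 40, 13, 4, 1 */
        for (int i = h; i < 256; i++) {
            int v = order[i], j = i;
            /* Gapped insertion sort; the j >= h test replaces the goto. */
            while (j >= h && freq[order[j - h]] > freq[v]) {
                order[j] = order[j - h];
                j -= h;
            }
            order[j] = v;
        }
    } while (h != 1);
}
```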
2152/*---------------------------------------------------*/
2153/*--- Stuff for randomising repetitive blocks ---*/
2154/*---------------------------------------------------*/
2155
2156/*---------------------------------------------*/
2157Int32 rNums[512] = {
2158 619, 720, 127, 481, 931, 816, 813, 233, 566, 247,
2159 985, 724, 205, 454, 863, 491, 741, 242, 949, 214,
2160 733, 859, 335, 708, 621, 574, 73, 654, 730, 472,
2161 419, 436, 278, 496, 867, 210, 399, 680, 480, 51,
2162 878, 465, 811, 169, 869, 675, 611, 697, 867, 561,
2163 862, 687, 507, 283, 482, 129, 807, 591, 733, 623,
2164 150, 238, 59, 379, 684, 877, 625, 169, 643, 105,
2165 170, 607, 520, 932, 727, 476, 693, 425, 174, 647,
2166 73, 122, 335, 530, 442, 853, 695, 249, 445, 515,
2167 909, 545, 703, 919, 874, 474, 882, 500, 594, 612,
2168 641, 801, 220, 162, 819, 984, 589, 513, 495, 799,
2169 161, 604, 958, 533, 221, 400, 386, 867, 600, 782,
2170 382, 596, 414, 171, 516, 375, 682, 485, 911, 276,
2171 98, 553, 163, 354, 666, 933, 424, 341, 533, 870,
2172 227, 730, 475, 186, 263, 647, 537, 686, 600, 224,
2173 469, 68, 770, 919, 190, 373, 294, 822, 808, 206,
2174 184, 943, 795, 384, 383, 461, 404, 758, 839, 887,
2175 715, 67, 618, 276, 204, 918, 873, 777, 604, 560,
2176 951, 160, 578, 722, 79, 804, 96, 409, 713, 940,
2177 652, 934, 970, 447, 318, 353, 859, 672, 112, 785,
2178 645, 863, 803, 350, 139, 93, 354, 99, 820, 908,
2179 609, 772, 154, 274, 580, 184, 79, 626, 630, 742,
2180 653, 282, 762, 623, 680, 81, 927, 626, 789, 125,
2181 411, 521, 938, 300, 821, 78, 343, 175, 128, 250,
2182 170, 774, 972, 275, 999, 639, 495, 78, 352, 126,
2183 857, 956, 358, 619, 580, 124, 737, 594, 701, 612,
2184 669, 112, 134, 694, 363, 992, 809, 743, 168, 974,
2185 944, 375, 748, 52, 600, 747, 642, 182, 862, 81,
2186 344, 805, 988, 739, 511, 655, 814, 334, 249, 515,
2187 897, 955, 664, 981, 649, 113, 974, 459, 893, 228,
2188 433, 837, 553, 268, 926, 240, 102, 654, 459, 51,
2189 686, 754, 806, 760, 493, 403, 415, 394, 687, 700,
2190 946, 670, 656, 610, 738, 392, 760, 799, 887, 653,
2191 978, 321, 576, 617, 626, 502, 894, 679, 243, 440,
2192 680, 879, 194, 572, 640, 724, 926, 56, 204, 700,
2193 707, 151, 457, 449, 797, 195, 791, 558, 945, 679,
2194 297, 59, 87, 824, 713, 663, 412, 693, 342, 606,
2195 134, 108, 571, 364, 631, 212, 174, 643, 304, 329,
2196 343, 97, 430, 751, 497, 314, 983, 374, 822, 928,
2197 140, 206, 73, 263, 980, 736, 876, 478, 430, 305,
2198 170, 514, 364, 692, 829, 82, 855, 953, 676, 246,
2199 369, 970, 294, 750, 807, 827, 150, 790, 288, 923,
2200 804, 378, 215, 828, 592, 281, 565, 555, 710, 82,
2201 896, 831, 547, 261, 524, 462, 293, 465, 502, 56,
2202 661, 821, 976, 991, 658, 869, 905, 758, 745, 193,
2203 768, 550, 608, 933, 378, 286, 215, 979, 792, 961,
2204 61, 688, 793, 644, 986, 403, 106, 366, 905, 644,
2205 372, 567, 466, 434, 645, 210, 389, 550, 919, 135,
2206 780, 773, 635, 389, 707, 100, 626, 958, 165, 504,
2207 920, 176, 193, 713, 857, 265, 203, 50, 668, 108,
2208 645, 990, 626, 197, 510, 357, 358, 850, 858, 364,
2209 936, 638
2210};
2211
2212
2213#define RAND_DECLS \
2214 Int32 rNToGo = 0; \
2215 Int32 rTPos = 0; \
2216
2217#define RAND_MASK ((rNToGo == 1) ? 1 : 0)
2218
2219#define RAND_UPD_MASK \
2220 if (rNToGo == 0) { \
2221 rNToGo = rNums[rTPos]; \
2222 rTPos++; if (rTPos == 512) rTPos = 0; \
2223 } \
2224 rNToGo--;
2225
2226
2227
2228/*---------------------------------------------------*/
2229/*--- The Reversible Transformation (tm) ---*/
2230/*---------------------------------------------------*/
2231
2232/*---------------------------------------------*/
2233void randomiseBlock ( void )
2234{
2235 Int32 i;
2236 RAND_DECLS;
2237 for (i = 0; i < 256; i++) inUse[i] = False;
2238
2239 for (i = 0; i <= last; i++) {
2240 RAND_UPD_MASK;
2241 block[i] ^= RAND_MASK;
2242 inUse[block[i]] = True;
2243 }
2244}
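randomiseBlock() flips the low bit of a sparse, deterministic set of bytes. RAND_UPD_MASK counts rNToGo down, reloading it from rNums[] (cycling through all 512 entries) whenever it reaches zero, and RAND_MASK is 1 only when the counter has just dropped to 1. Because the decoder replays the same sequence and XOR is its own inverse, applying the mask twice restores the block. The sketch below restates the two macros as functions over a caller-supplied table; the struct and function names are hypothetical, and a tiny table stands in for rNums[] so the flipped positions are easy to check by hand.

```c
#include <assert.h>

/* State corresponding to RAND_DECLS. */
typedef struct { int nToGo; int tPos; } RandState;

/* RAND_UPD_MASK followed by RAND_MASK: advance, then report the bit. */
static int rand_next_mask(RandState *s, const int *table, int tableLen)
{
    if (s->nToGo == 0) {
        assert(table[s->tPos] >= 1);   /* a 0 entry would never reload */
        s->nToGo = table[s->tPos];
        s->tPos++;
        if (s->tPos == tableLen) s->tPos = 0;
    }
    s->nToGo--;
    return s->nToGo == 1 ? 1 : 0;
}

/* XOR each byte with the mask, as randomiseBlock() does (without inUse[]). */
void randomise(unsigned char *buf, int n, const int *table, int tableLen)
{
    RandState s = { 0, 0 };
    for (int i = 0; i < n; i++)
        buf[i] ^= (unsigned char)rand_next_mask(&s, table, tableLen);
}
```

With the table {3, 2}, bytes 1, 3 and 6 of an 8-byte buffer are flipped. With bzip2's own rNums[] (first entry 619), the first flip would land at byte 617.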
2245
2246
2247/*---------------------------------------------*/
2248void doReversibleTransformation ( void )
2249{
2250 Int32 i;
2251
2252 if (verbosity >= 2) fprintf ( stderr, "\n" );
2253
2254 workLimit = workFactor * last;
2255 workDone = 0;
2256 blockRandomised = False;
2257 firstAttempt = True;
2258
2259 sortIt ();
2260
2261 if (verbosity >= 3)
2262 fprintf ( stderr, " %d work, %d block, ratio %5.2f\n",
2263 workDone, last, (float)workDone / (float)(last) );
2264
2265 if (workDone > workLimit && firstAttempt) {
2266 if (verbosity >= 2)
2267 fprintf ( stderr, " sorting aborted; randomising block\n" );
2268 randomiseBlock ();
2269 workLimit = workDone = 0;
2270 blockRandomised = True;
2271 firstAttempt = False;
2272 sortIt();
2273 if (verbosity >= 3)
2274 fprintf ( stderr, " %d work, %d block, ratio %f\n",
2275 workDone, last, (float)workDone / (float)(last) );
2276 }
2277
2278 origPtr = -1;
2279 for (i = 0; i <= last; i++)
2280 if (zptr[i] == 0)
2281 { origPtr = i; break; };
2282
2283 if (origPtr == -1) panic ( "doReversibleTransformation" );
2284}
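Once zptr[] holds the sorted rotations, origPtr is simply the rank of the unrotated block, that is, the slot where zptr[i] == 0; this is what the loop at the end of doReversibleTransformation() finds. By the usual definition of the Burrows-Wheeler transform, the transformed column is the byte preceding each sorted rotation, with wrap-around at the start of the block (sortIt() sets block[-1] = block[last] so that block[zptr[j]-1] is valid when zptr[j] is 0). A sketch combining both, under the hypothetical name bwt_last_column:

```c
#include <assert.h>

/* Emit the last column of the sorted rotation matrix; return origPtr. */
int bwt_last_column(const unsigned char *block, int n,
                    const int *zptr, unsigned char *out)
{
    int origPtr = -1;
    for (int i = 0; i < n; i++) {
        /* Byte just before rotation zptr[i], wrapping at the start. */
        out[i] = block[zptr[i] == 0 ? n - 1 : zptr[i] - 1];
        if (zptr[i] == 0) origPtr = i;
    }
    assert(origPtr != -1);   /* doReversibleTransformation() panics here */
    return origPtr;
}
```

For "banana" with the sorted order 5, 3, 1, 0, 4, 2, this gives the column "nnbaaa" and origPtr 3.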
2285
2286
2287/*---------------------------------------------*/
2288
2289INLINE Int32 indexIntoF ( Int32 indx, Int32 *cftab )
2290{
2291 Int32 nb, na, mid;
2292 nb = 0;
2293 na = 256;
2294 do {
2295 mid = (nb + na) >> 1;
2296 if (indx >= cftab[mid]) nb = mid; else na = mid;
2297 }
2298 while (na - nb != 1);
2299 return nb;
2300}
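indexIntoF() recovers a byte of the first column F without storing F. If cftab[c] holds the number of bytes smaller than c (and cftab[256] the block length), then the byte at F position indx is the largest c with cftab[c] <= indx, which the loop finds by bisection over 0..256 while keeping cftab[nb] <= indx < cftab[na]. Below is a self-contained restatement, together with a hypothetical helper that builds cftab the same way undoReversibleTransformation_small() does from unzftab-style counts.

```c
#include <assert.h>

/* cftab[c] = number of bytes in block[0..n-1] that are < c; cftab[256] = n. */
void build_cftab(const unsigned char *block, int n, int cftab[257])
{
    int counts[256] = { 0 };
    for (int i = 0; i < n; i++) counts[block[i]]++;
    cftab[0] = 0;
    for (int c = 1; c <= 256; c++) cftab[c] = cftab[c - 1] + counts[c - 1];
}

/* Largest nb with cftab[nb] <= indx; requires 0 <= indx < cftab[256]. */
int index_into_f(int indx, const int cftab[257])
{
    int nb = 0, na = 256;
    assert(indx >= 0 && indx < cftab[256]);
    do {
        int mid = (nb + na) >> 1;
        if (indx >= cftab[mid]) nb = mid; else na = mid;
    } while (na - nb != 1);
    return nb;
}
```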
2301
2302
2303#define GET_SMALL(cccc) \
2304 \
2305 cccc = indexIntoF ( tPos, cftab ); \
2306 tPos = GET_LL(tPos);
2307
2308
2309void undoReversibleTransformation_small ( FILE* dst )
2310{
2311 Int32 cftab[257], cftabAlso[257];
2312 Int32 i, j, tmp, tPos;
2313 UChar ch;
2314
2315 /*--
2316 We assume here that the global array unzftab will
2317 already be holding the frequency counts for
2318 ll8[0 .. last].
2319 --*/
2320
2321 /*-- Set up cftab to facilitate generation of indexIntoF --*/
2322 cftab[0] = 0;
2323 for (i = 1; i <= 256; i++) cftab[i] = unzftab[i-1];
2324 for (i = 1; i <= 256; i++) cftab[i] += cftab[i-1];
2325
2326 /*-- Make a copy of it, used in generation of T --*/
2327 for (i = 0; i <= 256; i++) cftabAlso[i] = cftab[i];
2328
2329 /*-- compute the T vector --*/
2330 for (i = 0; i <= last; i++) {
2331 ch = (UChar)ll16[i];
2332 SET_LL(i, cftabAlso[ch]);
2333 cftabAlso[ch]++;
2334 }
2335
2336 /*--
2337 Compute T^(-1) by pointer reversal on T. This is rather
2338 subtle, in that, if the original block was two or more
2339 (in general, N) concatenated copies of the same thing,
2340 the T vector will consist of N cycles, each of length
2341 blocksize / N, and decoding will involve traversing one
2342 of these cycles N times. Which particular cycle doesn't
2343 matter -- they are all equivalent. The tricky part is to
2344 make sure that the pointer reversal creates a correct
2345 reversed cycle for us to traverse. So, the code below
2346 simply reverses whatever cycle origPtr happens to fall into,
2347 without regard to the cycle length. That gives one reversed
2348 cycle, which for normal blocks, is the entire block-size long.
2349 For repeated blocks, it will be interspersed with the other
2350 N-1 non-reversed cycles. Providing that the F-subscripting
2351 phase which follows starts at origPtr, all then works ok.
2352 --*/
2353 i = origPtr;
2354 j = GET_LL(i);
2355 do {
2356 tmp = GET_LL(j);
2357 SET_LL(j, i);
2358 i = j;
2359 j = tmp;
2360 }
2361 while (i != origPtr);
2362
2363 /*--
2364 We recreate the original by subscripting F through T^(-1).
2365 The run-length-decoder below requires characters incrementally,
2366 so tPos is set to a starting value, and is updated by
2367 the GET_SMALL macro.
2368 --*/
2369 tPos = origPtr;
2370
2371 /*-------------------------------------------------*/
2372 /*--
2373 This is pretty much a verbatim copy of the
2374 run-length decoder present in the distribution
2375 bzip-0.21; it has to be here to avoid creating
2376 block[] as an intermediary structure. As in 0.21,
2377 this code derives from some sent to me by
2378 Christian von Roques.
2379
2380 It allows dst==NULL, so as to support the test (-t)
2381 option without slowing down the fast decompression
2382 code.
2383 --*/
2384 {
2385 IntNative retVal;
2386 Int32 i2, count, chPrev, ch2;
2387 UInt32 localCrc;
2388
2389 count = 0;
2390 i2 = 0;
2391 ch2 = 256; /*-- not a char and not EOF --*/
2392 localCrc = getGlobalCRC();
2393
2394 {
2395 RAND_DECLS;
2396 while ( i2 <= last ) {
2397 chPrev = ch2;
2398 GET_SMALL(ch2);
2399 if (blockRandomised) {
2400 RAND_UPD_MASK;
2401 ch2 ^= (UInt32)RAND_MASK;
2402 }
2403 i2++;
2404
2405 if (dst)
2406 retVal = putc ( ch2, dst );
2407
2408 UPDATE_CRC ( localCrc, (UChar)ch2 );
2409
2410 if (ch2 != chPrev) {
2411 count = 1;
2412 } else {
2413 count++;
2414 if (count >= 4) {
2415 Int32 j2;
2416 UChar z;
2417 GET_SMALL(z);
2418 if (blockRandomised) {
2419 RAND_UPD_MASK;
2420 z ^= RAND_MASK;
2421 }
2422 for (j2 = 0; j2 < (Int32)z; j2++) {
2423 if (dst) retVal = putc (ch2, dst);
2424 UPDATE_CRC ( localCrc, (UChar)ch2 );
2425 }
2426 i2++;
2427 count = 0;
2428 }
2429 }
2430 }
2431 }
2432
2433 setGlobalCRC ( localCrc );
2434 }
2435 /*-- end of the in-line run-length-decoder. --*/
2436}
2437#undef GET_SMALL
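The small-memory decoder stores only the vector T, where T[i] is the F position of the byte at L position i. It then reverses the pointers along the cycle through origPtr to obtain T^(-1) on that cycle, and finally walks F through T^(-1) starting at origPtr, fetching each F byte with indexIntoF(). The sketch below performs the same three steps on plain int arrays. bzip2 packs these pointers into ll16/ll4 via SET_LL/GET_LL, which the sketch omits, along with run-length decoding, randomisation and the CRC; all names are hypothetical.

```c
#include <assert.h>

static int f_byte(int indx, const int cftab[257])  /* as indexIntoF() */
{
    int nb = 0, na = 256;
    do {
        int mid = (nb + na) >> 1;
        if (indx >= cftab[mid]) nb = mid; else na = mid;
    } while (na - nb != 1);
    return nb;
}

/* Invert the BWT of L[0..n-1] into out[0..n-1]; t[] is n ints of scratch. */
void unbwt_small(const unsigned char *L, int n, int origPtr,
                 int *t, unsigned char *out)
{
    int cftab[257], next[256], counts[256] = { 0 };
    assert(n > 0 && origPtr >= 0 && origPtr < n);

    for (int i = 0; i < n; i++) counts[L[i]]++;
    cftab[0] = 0;
    for (int c = 1; c <= 256; c++) cftab[c] = cftab[c - 1] + counts[c - 1];
    for (int c = 0; c < 256; c++) next[c] = cftab[c];   /* like cftabAlso */

    /* T: L position -> F position of the same byte occurrence. */
    for (int i = 0; i < n; i++) t[i] = next[L[i]]++;

    /* Reverse the cycle through origPtr in place, giving T^(-1) on it. */
    int i = origPtr, j = t[i];
    do {
        int tmp = t[j];
        t[j] = i;
        i = j;
        j = tmp;
    } while (i != origPtr);

    /* Subscript F through T^(-1), starting at origPtr. */
    int tPos = origPtr;
    for (int k = 0; k < n; k++) {
        out[k] = (unsigned char)f_byte(tPos, cftab);
        tPos = t[tPos];
    }
}
```

The input "bbaa" with origPtr 0 (the transform of "abab") exercises the repeated-block case described in the comment: T splits into two 2-cycles, only the one through origPtr is reversed, and walking it twice reproduces the block.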
2438
2439
2440/*---------------------------------------------*/
2441
2442#define GET_FAST(cccc) \
2443 \
2444 cccc = ll8[tPos]; \
2445 tPos = tt[tPos];
2446
2447
2448void undoReversibleTransformation_fast ( FILE* dst )
2449{
2450 Int32 cftab[257];
2451 Int32 i, tPos;
2452 UChar ch;
2453
2454 /*--
2455 We assume here that the global array unzftab will
2456 already be holding the frequency counts for
2457 ll8[0 .. last].
2458 --*/
2459
2460 /*-- Set up cftab to facilitate generation of T^(-1) --*/
2461 cftab[0] = 0;
2462 for (i = 1; i <= 256; i++) cftab[i] = unzftab[i-1];
2463 for (i = 1; i <= 256; i++) cftab[i] += cftab[i-1];
2464
2465 /*-- compute the T^(-1) vector --*/
2466 for (i = 0; i <= last; i++) {
2467 ch = (UChar)ll8[i];
2468 tt[cftab[ch]] = i;
2469 cftab[ch]++;
2470 } 450 }
451 if (verbosity >= 2) fprintf ( stderr, "\n " );
452 return True;
2471 453
2472 /*-- 454 errhandler:
2473 We recreate the original by subscripting L through T^(-1). 455 bzReadClose ( &bzerr_dummy, bzf );
2474 The run-length-decoder below requires characters incrementally, 456 switch (bzerr) {
2475 so tPos is set to a starting value, and is updated by 457 case BZ_IO_ERROR:
2476 the GET_FAST macro. 458 errhandler_io:
2477 --*/ 459 ioError(); break;
2478 tPos = tt[origPtr]; 460 case BZ_DATA_ERROR:
2479 461 crcError();
2480 /*-------------------------------------------------*/ 462 case BZ_MEM_ERROR:
2481 /*-- 463 outOfMemory();
2482 This is pretty much a verbatim copy of the 464 case BZ_UNEXPECTED_EOF:
2483 run-length decoder present in the distribution 465 compressedStreamEOF();
2484 bzip-0.21; it has to be here to avoid creating 466 case BZ_DATA_ERROR_MAGIC:
2485 block[] as an intermediary structure. As in 0.21, 467 if (streamNo == 1) {
2486 this code derives from some sent to me by 468 return False;
2487 Christian von Roques. 469 } else {
2488 --*/ 470 fprintf ( stderr,
2489 { 471 "\n%s: %s: trailing garbage after EOF ignored\n",
2490 IntNative retVal; 472 progName, inName );
2491 Int32 i2, count, chPrev, ch2; 473 return True;
2492 UInt32 localCrc;
2493
2494 count = 0;
2495 i2 = 0;
2496 ch2 = 256; /*-- not a char and not EOF --*/
2497 localCrc = getGlobalCRC();
2498
2499 if (blockRandomised) {
2500 RAND_DECLS;
2501 while ( i2 <= last ) {
2502 chPrev = ch2;
2503 GET_FAST(ch2);
2504 RAND_UPD_MASK;
2505 ch2 ^= (UInt32)RAND_MASK;
2506 i2++;
2507
2508 retVal = putc ( ch2, dst );
2509 UPDATE_CRC ( localCrc, (UChar)ch2 );
2510
2511 if (ch2 != chPrev) {
2512 count = 1;
2513 } else {
2514 count++;
2515 if (count >= 4) {
2516 Int32 j2;
2517 UChar z;
2518 GET_FAST(z);
2519 RAND_UPD_MASK;
2520 z ^= RAND_MASK;
2521 for (j2 = 0; j2 < (Int32)z; j2++) {
2522 retVal = putc (ch2, dst);
2523 UPDATE_CRC ( localCrc, (UChar)ch2 );
2524 }
2525 i2++;
2526 count = 0;
2527 }
2528 }
2529 }
2530
2531 } else {
2532
2533 while ( i2 <= last ) {
2534 chPrev = ch2;
2535 GET_FAST(ch2);
2536 i2++;
2537
2538 retVal = putc ( ch2, dst );
2539 UPDATE_CRC ( localCrc, (UChar)ch2 );
2540
2541 if (ch2 != chPrev) {
2542 count = 1;
2543 } else {
2544 count++;
2545 if (count >= 4) {
2546 Int32 j2;
2547 UChar z;
2548 GET_FAST(z);
2549 for (j2 = 0; j2 < (Int32)z; j2++) {
2550 retVal = putc (ch2, dst);
2551 UPDATE_CRC ( localCrc, (UChar)ch2 );
2552 }
2553 i2++;
2554 count = 0;
2555 }
2556 }
2557 } 474 }
2558 475 default:
2559 } /*-- if (blockRandomised) --*/ 476 panic ( "decompress:unexpected error" );
2560
2561 setGlobalCRC ( localCrc );
2562 }
2563 /*-- end of the in-line run-length-decoder. --*/
2564}
2565#undef GET_FAST
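The fast decoder trades memory for speed. It builds T^(-1) directly as tt[], mapping each F position to the L position of the same byte occurrence, then starts at tt[origPtr] and reads L bytes (ll8) along tt. Here is a minimal sketch of that walk on its own, again without the run-length stage, randomisation or CRC, and with hypothetical names.

```c
#include <assert.h>

/* Invert the BWT of L[0..n-1] using the tt[] (T^(-1)) formulation. */
void unbwt_fast(const unsigned char *L, int n, int origPtr,
                int *tt, unsigned char *out)
{
    int cftab[257], counts[256] = { 0 };
    assert(n > 0 && origPtr >= 0 && origPtr < n);

    for (int i = 0; i < n; i++) counts[L[i]]++;
    cftab[0] = 0;
    for (int c = 1; c <= 256; c++) cftab[c] = cftab[c - 1] + counts[c - 1];

    /* tt[F position] = L position of the same byte occurrence. */
    for (int i = 0; i < n; i++) tt[cftab[L[i]]++] = i;

    int tPos = tt[origPtr];
    for (int k = 0; k < n; k++) {   /* GET_FAST: take L[tPos], then follow tt */
        out[k] = L[tPos];
        tPos = tt[tPos];
    }
}
```

This keeps a full int of tt plus one byte of L per position, whereas the small-memory path above re-derives F bytes by bisection; that is the time/space trade-off between the two undoReversibleTransformation variants.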
2566
2567
2568/*---------------------------------------------------*/
2569/*--- The block loader and RLEr ---*/
2570/*---------------------------------------------------*/
2571
2572/*---------------------------------------------*/
2573/* Top 16: run length, 1 to 255.
2574* Lower 16: the char, or MY_EOF for EOF.
2575*/
2576
2577#define MY_EOF 257
2578
2579INLINE Int32 getRLEpair ( FILE* src )
2580{
2581 Int32 runLength;
2582 IntNative ch, chLatest;
2583
2584 ch = getc ( src );
2585
2586 /*--- Because I have no idea what kind of a value EOF is. ---*/
2587 if (ch == EOF) {
2588 ERROR_IF_NOT_ZERO ( ferror(src));
2589 return (1 << 16) | MY_EOF;
2590 }
2591
2592 runLength = 0;
2593 do {
2594 chLatest = getc ( src );
2595 runLength++;
2596 bytesIn++;
2597 }
2598 while (ch == chLatest && runLength < 255);
2599
2600 if ( chLatest != EOF ) {
2601 if ( ungetc ( chLatest, src ) == EOF )
2602 panic ( "getRLEpair: ungetc failed" );
2603 } else {
2604 ERROR_IF_NOT_ZERO ( ferror(src) );
2605 }
2606
2607 /*--- Conditional is just a speedup hack. ---*/
2608 if (runLength == 1) {
2609 UPDATE_CRC ( globalCrc, (UChar)ch );
2610 return (1 << 16) | ch;
2611 } else {
2612 Int32 i;
2613 for (i = 1; i <= runLength; i++)
2614 UPDATE_CRC ( globalCrc, (UChar)ch );
2615 return (runLength << 16) | ch;
2616 } 477 }
2617}
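getRLEpair() returns two values in one Int32: the run length (1 to 255) in the top 16 bits, and the byte, or the out-of-band MY_EOF value 257, in the bottom 16 bits. loadAndRLEsource() splits them apart again with & 0xFFFF and an unsigned >> 16. A tiny sketch of that packing, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define MY_EOF 257   /* as in the code above: outside the byte range 0..255 */

int32_t pack_rle_pair(int runLength, int ch)
{
    assert(runLength >= 1 && runLength <= 255);
    assert((ch >= 0 && ch <= 255) || ch == MY_EOF);
    return (int32_t)(((uint32_t)runLength << 16) | (uint32_t)ch);
}

int pair_char(int32_t pair) { return pair & 0xFFFF; }
int pair_run(int32_t pair)  { return (int)((uint32_t)pair >> 16); }
```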
2618 478
2619 479 panic ( "decompress:end" );
2620/*---------------------------------------------*/ 480 return True; /*notreached*/
2621void loadAndRLEsource ( FILE* src )
2622{
2623 Int32 ch, allowableBlockSize, i;
2624
2625 last = -1;
2626 ch = 0;
2627
2628 for (i = 0; i < 256; i++) inUse[i] = False;
2629
2630 /*--- 20 is just a paranoia constant ---*/
2631 allowableBlockSize = 100000 * blockSize100k - 20;
2632
2633 while (last < allowableBlockSize && ch != MY_EOF) {
2634 Int32 rlePair, runLen;
2635 rlePair = getRLEpair ( src );
2636 ch = rlePair & 0xFFFF;
2637 runLen = (UInt32)rlePair >> 16;
2638
2639 #if DEBUG
2640 assert (runLen >= 1 && runLen <= 255);
2641 #endif
2642
2643 if (ch != MY_EOF) {
2644 inUse[ch] = True;
2645 switch (runLen) {
2646 case 1:
2647 last++; block[last] = (UChar)ch; break;
2648 case 2:
2649 last++; block[last] = (UChar)ch;
2650 last++; block[last] = (UChar)ch; break;
2651 case 3:
2652 last++; block[last] = (UChar)ch;
2653 last++; block[last] = (UChar)ch;
2654 last++; block[last] = (UChar)ch; break;
2655 default:
2656 inUse[runLen-4] = True;
2657 last++; block[last] = (UChar)ch;
2658 last++; block[last] = (UChar)ch;
2659 last++; block[last] = (UChar)ch;
2660 last++; block[last] = (UChar)ch;
2661 last++; block[last] = (UChar)(runLen-4); break;
2662 }
2663 }
2664 }
2665} 481}
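Taken together, loadAndRLEsource() and the in-line decoders in the two undoReversibleTransformation functions implement a simple initial run-length code. Runs of 1 to 3 bytes are stored literally, and a run of 4 to 255 is stored as four copies of the byte followed by a count byte holding runLength - 4. The decoder counts identical consecutive bytes and, after the fourth, reads one count byte and emits that many extra copies. A round-trip sketch under those rules follows; the function names are hypothetical, and it works on memory buffers rather than FILE* streams and ignores the block-size limit, inUse[] and CRC updates.

```c
#include <assert.h>
#include <stddef.h>

/* Encode: runs of 1..3 literally; runs of 4..255 as 4 bytes + (run - 4). */
size_t rle1_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char ch = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == ch && run < 255) run++;
        if (run < 4) {
            for (size_t k = 0; k < run; k++) out[o++] = ch;
        } else {
            for (int k = 0; k < 4; k++) out[o++] = ch;
            out[o++] = (unsigned char)(run - 4);
        }
        i += run;
    }
    return o;
}

/* Decode, mirroring the count/chPrev logic of the in-line decoders. */
size_t rle1_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    int count = 0, ch = 256;          /* 256: not a byte value */
    while (i < n) {
        int chPrev = ch;
        ch = in[i++];
        out[o++] = (unsigned char)ch;
        if (ch != chPrev) {
            count = 1;
        } else if (++count >= 4) {
            assert(i < n);            /* a count byte must follow */
            for (int z = in[i++]; z > 0; z--) out[o++] = (unsigned char)ch;
            count = 0;
        }
    }
    return o;
}
```

A run longer than 255 is split by the encoder (300 copies of a byte become 4 + count 251, then 4 + count 41), and the decoder's reset of count to 0 after each count byte makes the second chunk start a fresh run.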
2666 482
2667 483
2668/*---------------------------------------------------*/
2669/*--- Processing of complete files and streams ---*/
2670/*---------------------------------------------------*/
2671
2672/*---------------------------------------------*/ 484/*---------------------------------------------*/
2673void compressStream ( FILE *stream, FILE *zStream ) 485Bool testStream ( FILE *zStream )
2674{ 486{
2675 IntNative retVal; 487 BZFILE* bzf = NULL;
2676 UInt32 blockCRC, combinedCRC; 488 Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i;
2677 Int32 blockNo; 489 UChar obuf[5000];
490 UChar unused[BZ_MAX_UNUSED];
491 Int32 nUnused;
492 UChar* unusedTmp;
2678 493
2679 blockNo = 0; 494 nUnused = 0;
2680 bytesIn = 0; 495 streamNo = 0;
2681 bytesOut = 0;
2682 nBlocksRandomised = 0;
2683 496
2684 SET_BINARY_MODE(stream);
2685 SET_BINARY_MODE(zStream); 497 SET_BINARY_MODE(zStream);
2686 498 if (ferror(zStream)) goto errhandler_io;
2687 ERROR_IF_NOT_ZERO ( ferror(stream) );
2688 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2689
2690 bsSetStream ( zStream, True );
2691
2692 /*--- Write `magic' bytes B and Z,
2693 then h indicating file-format == huffmanised,
2694 followed by a digit indicating blockSize100k.
2695 ---*/
2696 bsPutUChar ( 'B' );
2697 bsPutUChar ( 'Z' );
2698 bsPutUChar ( 'h' );
2699 bsPutUChar ( '0' + blockSize100k );
2700
2701 combinedCRC = 0;
2702
2703 if (verbosity >= 2) fprintf ( stderr, "\n" );
2704 499
2705 while (True) { 500 while (True) {
2706 501
2707 blockNo++; 502 bzf = bzReadOpen (
2708 initialiseCRC (); 503 &bzerr, zStream, verbosity,
2709 loadAndRLEsource ( stream ); 504 (int)smallMode, unused, nUnused
2710 ERROR_IF_NOT_ZERO ( ferror(stream) ); 505 );
2711 if (last == -1) break; 506 if (bzf == NULL || bzerr != BZ_OK) goto errhandler;
2712 507 streamNo++;
2713 blockCRC = getFinalCRC ();
2714 combinedCRC = (combinedCRC << 1) | (combinedCRC >> 31);
2715 combinedCRC ^= blockCRC;
2716
2717 if (verbosity >= 2)
2718 fprintf ( stderr, " block %d: crc = 0x%8x, combined CRC = 0x%8x, size = %d",
2719 blockNo, blockCRC, combinedCRC, last+1 );
2720
2721 /*-- sort the block and establish posn of original string --*/
2722 doReversibleTransformation ();
2723
2724 /*--
2725 A 6-byte block header, the value chosen arbitrarily
2726 as 0x314159265359 :-). A 32 bit value does not really
2727 give a strong enough guarantee that the value will not
2728 appear by chance in the compressed datastream. Worst-case
2729 probability of this event, for a 900k block, is about
2730 2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
2731 For a compressed file of size 100Gb -- about 100000 blocks --
2732 only a 48-bit marker will do. NB: normal compression/
2733 decompression do *not* rely on these statistical properties.
2734 They are only important when trying to recover blocks from
2735 damaged files.
2736 --*/
2737 bsPutUChar ( 0x31 ); bsPutUChar ( 0x41 );
2738 bsPutUChar ( 0x59 ); bsPutUChar ( 0x26 );
2739 bsPutUChar ( 0x53 ); bsPutUChar ( 0x59 );
2740
2741 /*-- Now the block's CRC, so it is in a known place. --*/
2742 bsPutUInt32 ( blockCRC );
2743
2744 /*-- Now a single bit indicating randomisation. --*/
2745 if (blockRandomised) {
2746 bsW(1,1); nBlocksRandomised++;
2747 } else
2748 bsW(1,0);
2749
2750 /*-- Finally, block's contents proper. --*/
2751 moveToFrontCodeAndSend ();
2752
2753 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2754 }
2755
2756 if (verbosity >= 2 && nBlocksRandomised > 0)
2757 fprintf ( stderr, " %d block%s needed randomisation\n",
2758 nBlocksRandomised,
2759 nBlocksRandomised == 1 ? "" : "s" );
2760
2761 /*--
2762 Now another magic 48-bit number, 0x177245385090, to
2763 indicate the end of the last block. (sqrt(pi), if
2764 you want to know. I did want to use e, but it contains
2765 too much repetition -- 27 18 28 18 28 46 -- for me
2766 to feel statistically comfortable. Call me paranoid.)
2767 --*/
2768
2769 bsPutUChar ( 0x17 ); bsPutUChar ( 0x72 );
2770 bsPutUChar ( 0x45 ); bsPutUChar ( 0x38 );
2771 bsPutUChar ( 0x50 ); bsPutUChar ( 0x90 );
2772
2773 bsPutUInt32 ( combinedCRC );
2774 if (verbosity >= 2)
2775 fprintf ( stderr, " final combined CRC = 0x%x\n ", combinedCRC );
2776 508
2777 /*-- Close the files in an utterly paranoid way. --*/ 509 while (bzerr == BZ_OK) {
2778 bsFinishedWithStream (); 510 nread = bzRead ( &bzerr, bzf, obuf, 5000 );
2779 511 if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler;
2780 ERROR_IF_NOT_ZERO ( ferror(zStream) ); 512 }
2781 retVal = fflush ( zStream ); 513 if (bzerr != BZ_STREAM_END) goto errhandler;
2782 ERROR_IF_EOF ( retVal );
2783 retVal = fclose ( zStream );
2784 ERROR_IF_EOF ( retVal );
2785
2786 ERROR_IF_NOT_ZERO ( ferror(stream) );
2787 retVal = fclose ( stream );
2788 ERROR_IF_EOF ( retVal );
2789
2790 if (bytesIn == 0) bytesIn = 1;
2791 if (bytesOut == 0) bytesOut = 1;
2792 514
2793 if (verbosity >= 1) 515 bzReadGetUnused ( &bzerr, bzf, (void**)(&unusedTmp), &nUnused );
2794 fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, " 516 if (bzerr != BZ_OK) panic ( "test:bzReadGetUnused" );
2795 "%5.2f%% saved, %d in, %d out.\n",
2796 (float)bytesIn / (float)bytesOut,
2797 (8.0 * (float)bytesOut) / (float)bytesIn,
2798 100.0 * (1.0 - (float)bytesOut / (float)bytesIn),
2799 bytesIn,
2800 bytesOut
2801 );
2802}
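The stream trailer carries a combined CRC built from the per-block CRCs: before each block is folded in, the running value is rotated left by one bit, and then the block's CRC is XORed in. uncompressStream() and testStream() recompute it the same way and compare it with the stored value. Restated as a helper with a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* One step of the combined-CRC update used for the stream trailer. */
uint32_t combine_crc(uint32_t combined, uint32_t blockCRC)
{
    combined = (combined << 1) | (combined >> 31);   /* rotate left by 1 */
    return combined ^ blockCRC;
}
```

Because of the rotation, the combined value depends on the order of the blocks as well as on their CRCs, so two swapped blocks would normally be detected even if each block's own CRC checks out.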
2803 517
518 for (i = 0; i < nUnused; i++) unused[i] = unusedTmp[i];
2804 519
2805/*---------------------------------------------*/ 520 bzReadClose ( &bzerr, bzf );
2806Bool uncompressStream ( FILE *zStream, FILE *stream ) 521 if (bzerr != BZ_OK) panic ( "test:bzReadGetUnused" );
2807{ 522 if (nUnused == 0 && myfeof(zStream)) break;
2808 UChar magic1, magic2, magic3, magic4;
2809 UChar magic5, magic6;
2810 UInt32 storedBlockCRC, storedCombinedCRC;
2811 UInt32 computedBlockCRC, computedCombinedCRC;
2812 Int32 currBlockNo;
2813 IntNative retVal;
2814 523
2815 SET_BINARY_MODE(stream);
2816 SET_BINARY_MODE(zStream);
2817
2818 ERROR_IF_NOT_ZERO ( ferror(stream) );
2819 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2820
2821 bsSetStream ( zStream, False );
2822
2823 /*--
2824 A bad magic number is `recoverable from';
2825 return with False so the caller skips the file.
2826 --*/
2827 magic1 = bsGetUChar ();
2828 magic2 = bsGetUChar ();
2829 magic3 = bsGetUChar ();
2830 magic4 = bsGetUChar ();
2831 if (magic1 != 'B' ||
2832 magic2 != 'Z' ||
2833 magic3 != 'h' ||
2834 magic4 < '1' ||
2835 magic4 > '9') {
2836 bsFinishedWithStream();
2837 retVal = fclose ( stream );
2838 ERROR_IF_EOF ( retVal );
2839 return False;
2840 } 524 }
2841 525
2842 setDecompressStructureSizes ( magic4 - '0' ); 526 if (ferror(zStream)) goto errhandler_io;
2843 computedCombinedCRC = 0; 527 ret = fclose ( zStream );
2844 528 if (ret == EOF) goto errhandler_io;
2845 if (verbosity >= 2) fprintf ( stderr, "\n " );
2846 currBlockNo = 0;
2847
2848 while (True) {
2849 magic1 = bsGetUChar ();
2850 magic2 = bsGetUChar ();
2851 magic3 = bsGetUChar ();
2852 magic4 = bsGetUChar ();
2853 magic5 = bsGetUChar ();
2854 magic6 = bsGetUChar ();
2855 if (magic1 == 0x17 && magic2 == 0x72 &&
2856 magic3 == 0x45 && magic4 == 0x38 &&
2857 magic5 == 0x50 && magic6 == 0x90) break;
2858
2859 if (magic1 != 0x31 || magic2 != 0x41 ||
2860 magic3 != 0x59 || magic4 != 0x26 ||
2861 magic5 != 0x53 || magic6 != 0x59) badBlockHeader();
2862
2863 storedBlockCRC = bsGetUInt32 ();
2864
2865 if (bsR(1) == 1)
2866 blockRandomised = True; else
2867 blockRandomised = False;
2868
2869 currBlockNo++;
2870 if (verbosity >= 2)
2871 fprintf ( stderr, "[%d: huff+mtf ", currBlockNo );
2872 getAndMoveToFrontDecode ();
2873 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2874
2875 initialiseCRC();
2876 if (verbosity >= 2) fprintf ( stderr, "rt+rld" );
2877 if (smallMode)
2878 undoReversibleTransformation_small ( stream );
2879 else
2880 undoReversibleTransformation_fast ( stream );
2881
2882 ERROR_IF_NOT_ZERO ( ferror(stream) );
2883
2884 computedBlockCRC = getFinalCRC();
2885 if (verbosity >= 3)
2886 fprintf ( stderr, " {0x%x, 0x%x}", storedBlockCRC, computedBlockCRC );
2887 if (verbosity >= 2) fprintf ( stderr, "] " );
2888
2889 /*-- A bad CRC is considered a fatal error. --*/
2890 if (storedBlockCRC != computedBlockCRC)
2891 crcError ( storedBlockCRC, computedBlockCRC );
2892
2893 computedCombinedCRC = (computedCombinedCRC << 1) | (computedCombinedCRC >> 31);
2894 computedCombinedCRC ^= computedBlockCRC;
2895 };
2896 529
2897 if (verbosity >= 2) fprintf ( stderr, "\n " ); 530 if (verbosity >= 2) fprintf ( stderr, "\n " );
2898
2899 storedCombinedCRC = bsGetUInt32 ();
2900 if (verbosity >= 2)
2901 fprintf ( stderr,
2902 "combined CRCs: stored = 0x%x, computed = 0x%x\n ",
2903 storedCombinedCRC, computedCombinedCRC );
2904 if (storedCombinedCRC != computedCombinedCRC)
2905 crcError ( storedCombinedCRC, computedCombinedCRC );
2906
2907
2908 bsFinishedWithStream ();
2909 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2910 retVal = fclose ( zStream );
2911 ERROR_IF_EOF ( retVal );
2912
2913 ERROR_IF_NOT_ZERO ( ferror(stream) );
2914 retVal = fflush ( stream );
2915 ERROR_IF_NOT_ZERO ( retVal );
2916 if (stream != stdout) {
2917 retVal = fclose ( stream );
2918 ERROR_IF_EOF ( retVal );
2919 }
2920 return True; 531 return True;
2921}
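Both decoders begin by checking the four-byte stream header ('B', 'Z', 'h', then a digit '1' to '9' giving blockSize100k). Before each block they read six more bytes, which must be either the block marker 0x314159265359 or the end-of-stream marker 0x177245385090 described in the comments in compressStream(). A sketch of those two checks as pure functions over byte buffers, with hypothetical names and return conventions:

```c
#include <assert.h>

/* Returns blockSize100k (1..9) for a valid "BZh<digit>" header, else 0. */
int parse_stream_header(const unsigned char h[4])
{
    if (h[0] != 'B' || h[1] != 'Z' || h[2] != 'h') return 0;
    if (h[3] < '1' || h[3] > '9') return 0;
    return h[3] - '0';
}

enum { MARK_BAD = 0, MARK_BLOCK = 1, MARK_END = 2 };

/* Classify the 48-bit marker that precedes every block or ends the stream. */
int classify_marker(const unsigned char m[6])
{
    static const unsigned char block[6] = { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };
    static const unsigned char end[6]   = { 0x17, 0x72, 0x45, 0x38, 0x50, 0x90 };
    int isBlock = 1, isEnd = 1;
    for (int i = 0; i < 6; i++) {
        if (m[i] != block[i]) isBlock = 0;
        if (m[i] != end[i])   isEnd = 0;
    }
    return isBlock ? MARK_BLOCK : isEnd ? MARK_END : MARK_BAD;
}
```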
2922
2923
2924/*---------------------------------------------*/
2925Bool testStream ( FILE *zStream )
2926{
2927 UChar magic1, magic2, magic3, magic4;
2928 UChar magic5, magic6;
2929 UInt32 storedBlockCRC, storedCombinedCRC;
2930 UInt32 computedBlockCRC, computedCombinedCRC;
2931 Int32 currBlockNo;
2932 IntNative retVal;
2933
2934 SET_BINARY_MODE(zStream);
2935 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2936
2937 bsSetStream ( zStream, False );
2938
2939 magic1 = bsGetUChar ();
2940 magic2 = bsGetUChar ();
2941 magic3 = bsGetUChar ();
2942 magic4 = bsGetUChar ();
2943 if (magic1 != 'B' ||
2944 magic2 != 'Z' ||
2945 magic3 != 'h' ||
2946 magic4 < '1' ||
2947 magic4 > '9') {
2948 bsFinishedWithStream();
2949 fclose ( zStream );
2950 fprintf ( stderr, "\n%s: bad magic number (ie, not created by bzip2)\n",
2951 inName );
2952 return False;
2953 }
2954 532
2955 smallMode = True; 533 errhandler:
2956 setDecompressStructureSizes ( magic4 - '0' ); 534 bzReadClose ( &bzerr_dummy, bzf );
2957 computedCombinedCRC = 0; 535 switch (bzerr) {
2958 536 case BZ_IO_ERROR:
2959 if (verbosity >= 2) fprintf ( stderr, "\n" ); 537 errhandler_io:
2960 currBlockNo = 0; 538 ioError(); break;
2961 539 case BZ_DATA_ERROR:
2962 while (True) {
2963 magic1 = bsGetUChar ();
2964 magic2 = bsGetUChar ();
2965 magic3 = bsGetUChar ();
2966 magic4 = bsGetUChar ();
2967 magic5 = bsGetUChar ();
2968 magic6 = bsGetUChar ();
2969 if (magic1 == 0x17 && magic2 == 0x72 &&
2970 magic3 == 0x45 && magic4 == 0x38 &&
2971 magic5 == 0x50 && magic6 == 0x90) break;
2972
2973 currBlockNo++;
2974 if (magic1 != 0x31 || magic2 != 0x41 ||
2975 magic3 != 0x59 || magic4 != 0x26 ||
2976 magic5 != 0x53 || magic6 != 0x59) {
2977 bsFinishedWithStream();
2978 fclose ( zStream );
2979 fprintf ( stderr, 540 fprintf ( stderr,
2980 "\n%s, block %d: bad header (not == 0x314159265359)\n", 541 "\n%s: data integrity (CRC) error in data\n",
2981 inName, currBlockNo ); 542 inName );
2982 return False; 543 return False;
2983 } 544 case BZ_MEM_ERROR:
2984 storedBlockCRC = bsGetUInt32 (); 545 outOfMemory();
2985 546 case BZ_UNEXPECTED_EOF:
2986 if (bsR(1) == 1) 547 fprintf ( stderr,
2987 blockRandomised = True; else 548 "\n%s: file ends unexpectedly\n",
2988 blockRandomised = False; 549 inName );
2989
2990 if (verbosity >= 2)
2991 fprintf ( stderr, " block [%d: huff+mtf ", currBlockNo );
2992 getAndMoveToFrontDecode ();
2993 ERROR_IF_NOT_ZERO ( ferror(zStream) );
2994
2995 initialiseCRC();
2996 if (verbosity >= 2) fprintf ( stderr, "rt+rld" );
2997 undoReversibleTransformation_small ( NULL );
2998
2999 computedBlockCRC = getFinalCRC();
3000 if (verbosity >= 3)
3001 fprintf ( stderr, " {0x%x, 0x%x}", storedBlockCRC, computedBlockCRC );
3002 if (verbosity >= 2) fprintf ( stderr, "] " );
3003
3004 if (storedBlockCRC != computedBlockCRC) {
3005 bsFinishedWithStream();
3006 fclose ( zStream );
3007 fprintf ( stderr, "\n%s, block %d: computed CRC does not match stored one\n",
3008 inName, currBlockNo );
3009 return False; 550 return False;
3010 } 551 case BZ_DATA_ERROR_MAGIC:
3011 552 if (streamNo == 1) {
3012 if (verbosity >= 2) fprintf ( stderr, "ok\n" ); 553 fprintf ( stderr,
3013 computedCombinedCRC = (computedCombinedCRC << 1) | (computedCombinedCRC >> 31); 554 "\n%s: bad magic number (ie, not created by bzip2)\n",
3014 computedCombinedCRC ^= computedBlockCRC; 555 inName );
3015 }; 556 return False;
3016 557 } else {
3017 storedCombinedCRC = bsGetUInt32 (); 558 fprintf ( stderr,
3018 if (verbosity >= 2) 559 "\n%s: %s: trailing garbage after EOF ignored\n",
3019 fprintf ( stderr, 560 progName, inName );
3020 " combined CRCs: stored = 0x%x, computed = 0x%x\n ", 561 return True;
3021 storedCombinedCRC, computedCombinedCRC ); 562 }
3022 if (storedCombinedCRC != computedCombinedCRC) { 563 default:
3023 bsFinishedWithStream(); 564 panic ( "test:unexpected error" );
3024 fclose ( zStream );
3025 fprintf ( stderr, "\n%s: computed CRC does not match stored one\n",
3026 inName );
3027 return False;
3028 } 565 }
3029 566
3030 bsFinishedWithStream (); 567 panic ( "test:end" );
3031 ERROR_IF_NOT_ZERO ( ferror(zStream) ); 568 return True; /*notreached*/
3032 retVal = fclose ( zStream );
3033 ERROR_IF_EOF ( retVal );
3034 return True;
3035} 569}
3036 570
3037 571
3038
3039/*---------------------------------------------------*/ 572/*---------------------------------------------------*/
3040/*--- Error [non-] handling grunge ---*/ 573/*--- Error [non-] handling grunge ---*/
3041/*---------------------------------------------------*/ 574/*---------------------------------------------------*/
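The loop above checks the 48-bit marker that starts each unit of the stream. 0x314159265359 opens another block, 0x177245385090 ends the stream, and anything else is reported as a bad header. Each verified block CRC is then folded into a running combined CRC with a one-bit left rotation followed by an XOR. A minimal standalone C sketch of those two steps follows. The names `MagicKind`, `pack48`, `classifyMagic` and `combineCRC` are hypothetical and do not appear in bzip2.

```c
#include <stdint.h>

/* Hypothetical classification of the 6-byte marker read with bsGetUChar(). */
typedef enum { MAGIC_BLOCK, MAGIC_EOS, MAGIC_BAD } MagicKind;

/* Pack six bytes, most significant first, into a 48-bit value. */
static uint64_t pack48(const unsigned char b[6])
{
    uint64_t v = 0;
    for (int i = 0; i < 6; i++)
        v = (v << 8) | b[i];
    return v;
}

static MagicKind classifyMagic(const unsigned char b[6])
{
    uint64_t v = pack48(b);
    if (v == UINT64_C(0x314159265359)) return MAGIC_BLOCK; /* digits of pi */
    if (v == UINT64_C(0x177245385090)) return MAGIC_EOS;   /* leading digits of sqrt(pi) */
    return MAGIC_BAD;
}

/* Fold one block CRC into the combined CRC, as in the loop above:
   rotate left by one bit, then XOR in the block CRC. */
static uint32_t combineCRC(uint32_t combined, uint32_t blockCRC)
{
    combined = (combined << 1) | (combined >> 31);
    return combined ^ blockCRC;
}
```

Because the rotation happens before every XOR, the combined value depends on block order as well as block contents, so swapping two blocks will generally change the value compared against `storedCombinedCRC`.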
@@ -3059,8 +592,7 @@ void showFileNames ( void )
3059 fprintf ( 592 fprintf (
3060 stderr, 593 stderr,
3061 "\tInput file = %s, output file = %s\n", 594 "\tInput file = %s, output file = %s\n",
3062 inName==NULL ? "(null)" : inName, 595 inName, outName
3063 outName==NULL ? "(null)" : outName
3064 ); 596 );
3065} 597}
3066 598
@@ -3072,8 +604,7 @@ void cleanUpAndFail ( Int32 ec )
3072 604
3073 if ( srcMode == SM_F2F && opMode != OM_TEST ) { 605 if ( srcMode == SM_F2F && opMode != OM_TEST ) {
3074 fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n", 606 fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n",
3075 progName, 607 progName, outName );
3076 outName==NULL ? "(null)" : outName );
3077 if (outputHandleJustInCase != NULL) 608 if (outputHandleJustInCase != NULL)
3078 fclose ( outputHandleJustInCase ); 609 fclose ( outputHandleJustInCase );
3079 retVal = remove ( outName ); 610 retVal = remove ( outName );
@@ -3108,11 +639,10 @@ void panic ( Char* s )
3108 639
3109 640
3110/*---------------------------------------------*/ 641/*---------------------------------------------*/
3111void badBGLengths ( void ) 642void crcError ()
3112{ 643{
3113 fprintf ( stderr, 644 fprintf ( stderr,
3114 "\n%s: error when reading background model code lengths,\n" 645 "\n%s: Data integrity error when decompressing.\n",
3115 "\twhich probably means the compressed file is corrupted.\n",
3116 progName ); 646 progName );
3117 showFileNames(); 647 showFileNames();
3118 cadvise(); 648 cadvise();
@@ -3121,19 +651,6 @@ void badBGLengths ( void )
3121 651
3122 652
3123/*---------------------------------------------*/ 653/*---------------------------------------------*/
3124void crcError ( UInt32 crcStored, UInt32 crcComputed )
3125{
3126 fprintf ( stderr,
3127 "\n%s: Data integrity error when decompressing.\n"
3128 "\tStored CRC = 0x%x, computed CRC = 0x%x\n",
3129 progName, crcStored, crcComputed );
3130 showFileNames();
3131 cadvise();
3132 cleanUpAndFail( 2 );
3133}
3134
3135
3136/*---------------------------------------------*/
3137void compressedStreamEOF ( void ) 654void compressedStreamEOF ( void )
3138{ 655{
3139 fprintf ( stderr, 656 fprintf ( stderr,
@@ -3160,46 +677,6 @@ void ioError ( )
3160 677
3161 678
3162/*---------------------------------------------*/ 679/*---------------------------------------------*/
3163void blockOverrun ()
3164{
3165 fprintf ( stderr,
3166 "\n%s: block overrun during decompression,\n"
3167 "\twhich probably means the compressed file\n"
3168 "\tis corrupted.\n",
3169 progName );
3170 showFileNames();
3171 cadvise();
3172 cleanUpAndFail( 2 );
3173}
3174
3175
3176/*---------------------------------------------*/
3177void badBlockHeader ()
3178{
3179 fprintf ( stderr,
3180 "\n%s: bad block header in the compressed file,\n"
3181 "\twhich probably means it is corrupted.\n",
3182 progName );
3183 showFileNames();
3184 cadvise();
3185 cleanUpAndFail( 2 );
3186}
3187
3188
3189/*---------------------------------------------*/
3190void bitStreamEOF ()
3191{
3192 fprintf ( stderr,
3193 "\n%s: read past the end of compressed data,\n"
3194 "\twhich probably means it is corrupted.\n",
3195 progName );
3196 showFileNames();
3197 cadvise();
3198 cleanUpAndFail( 2 );
3199}
3200
3201
3202/*---------------------------------------------*/
3203void mySignalCatcher ( IntNative n ) 680void mySignalCatcher ( IntNative n )
3204{ 681{
3205 fprintf ( stderr, 682 fprintf ( stderr,
@@ -3233,27 +710,11 @@ void mySIGSEGVorSIGBUScatcher ( IntNative n )
3233 710
3234 711
3235/*---------------------------------------------*/ 712/*---------------------------------------------*/
3236void uncompressOutOfMemory ( Int32 draw, Int32 blockSize ) 713void outOfMemory ( void )
3237{
3238 fprintf ( stderr,
3239 "\n%s: Can't allocate enough memory for decompression.\n"
3240 "\tRequested %d bytes for a block size of %d.\n"
3241 "\tTry selecting space-economic decompress (with flag -s)\n"
3242 "\tand failing that, find a machine with more memory.\n",
3243 progName, draw, blockSize );
3244 showFileNames();
3245 cleanUpAndFail(1);
3246}
3247
3248
3249/*---------------------------------------------*/
3250void compressOutOfMemory ( Int32 draw, Int32 blockSize )
3251{ 714{
3252 fprintf ( stderr, 715 fprintf ( stderr,
3253 "\n%s: Can't allocate enough memory for compression.\n" 716 "\n%s: couldn't allocate enough memory\n",
3254 "\tRequested %d bytes for a block size of %d.\n" 717 progName );
3255 "\tTry selecting a small block size (with flag -s).\n",
3256 progName, draw, blockSize );
3257 showFileNames(); 718 showFileNames();
3258 cleanUpAndFail(1); 719 cleanUpAndFail(1);
3259} 720}
@@ -3274,6 +735,24 @@ void pad ( Char *s )
3274 735
3275 736
3276/*---------------------------------------------*/ 737/*---------------------------------------------*/
738void copyFileName ( Char* to, Char* from )
739{
740 if ( strlen(from) > FILE_NAME_LEN-10 ) {
741 fprintf (
742 stderr,
743 "bzip2: file name\n`%s'\nis suspiciously (> 1024 chars) long.\n"
744 "Try using a reasonable file name instead. Sorry! :)\n",
745 from
746 );
747 exit(1);
748 }
749
750 strncpy(to,from,FILE_NAME_LEN-10);
751 to[FILE_NAME_LEN-10]='\0';
752}
753
754
755/*---------------------------------------------*/
3277Bool fileExists ( Char* name ) 756Bool fileExists ( Char* name )
3278{ 757{
3279 FILE *tmp = fopen ( name, "rb" ); 758 FILE *tmp = fopen ( name, "rb" );
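copyFileName replaces the unbounded strcpy calls: names longer than FILE_NAME_LEN-10 characters are rejected, and the copy is explicitly NUL-terminated. The 10-byte margin appears to leave room for the ".bz2" suffix that compress() later appends with strcat. The sketch below restates the check in standalone C but returns an error code instead of calling exit(1). `copyFileNameChecked` is a hypothetical name, and FILE_NAME_LEN = 1034 is an assumption inferred from the "> 1024 chars" message.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: chosen so that FILE_NAME_LEN-10 == 1024, matching the message. */
#define FILE_NAME_LEN 1034

/* Copy 'from' into 'to', which must hold FILE_NAME_LEN bytes.
   Returns 0 on success, -1 if the name is too long. */
static int copyFileNameChecked(char *to, const char *from)
{
    size_t limit = FILE_NAME_LEN - 10;
    if (strlen(from) > limit) {
        fprintf(stderr, "file name is suspiciously (> %zu chars) long\n", limit);
        return -1;
    }
    strncpy(to, from, limit);
    to[limit] = '\0';   /* strncpy does not terminate if 'from' fills 'limit' */
    return 0;
}
```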
@@ -3287,7 +766,7 @@ Bool fileExists ( Char* name )
3287/*-- 766/*--
3288 if in doubt, return True 767 if in doubt, return True
3289--*/ 768--*/
3290Bool notABogStandardFile ( Char* name ) 769Bool notAStandardFile ( Char* name )
3291{ 770{
3292 IntNative i; 771 IntNative i;
3293 struct MY_STAT statBuf; 772 struct MY_STAT statBuf;
@@ -3300,9 +779,9 @@ Bool notABogStandardFile ( Char* name )
3300 779
3301 780
3302/*---------------------------------------------*/ 781/*---------------------------------------------*/
3303void copyDateAndPermissions ( Char *srcName, Char *dstName ) 782void copyDatePermissionsAndOwner ( Char *srcName, Char *dstName )
3304{ 783{
3305 #if BZ_UNIX 784#if BZ_UNIX
3306 IntNative retVal; 785 IntNative retVal;
3307 struct MY_STAT statBuf; 786 struct MY_STAT statBuf;
3308 struct utimbuf uTimBuf; 787 struct utimbuf uTimBuf;
@@ -3314,13 +793,34 @@ void copyDateAndPermissions ( Char *srcName, Char *dstName )
3314 793
3315 retVal = chmod ( dstName, statBuf.st_mode ); 794 retVal = chmod ( dstName, statBuf.st_mode );
3316 ERROR_IF_NOT_ZERO ( retVal ); 795 ERROR_IF_NOT_ZERO ( retVal );
796 /* Not sure if this is really portable or not. Causes
797 problems on my x86-Linux Redhat 5.0 box. Decided
798 to omit it from 0.9.0. JRS, 27 June 98. If you
799 understand Unix file semantics and portability issues
800 well enough to fix this properly, drop me a line
801 at jseward@acm.org.
802 retVal = chown ( dstName, statBuf.st_uid, statBuf.st_gid );
803 ERROR_IF_NOT_ZERO ( retVal );
804 */
3317 retVal = utime ( dstName, &uTimBuf ); 805 retVal = utime ( dstName, &uTimBuf );
3318 ERROR_IF_NOT_ZERO ( retVal ); 806 ERROR_IF_NOT_ZERO ( retVal );
3319 #endif 807#endif
3320} 808}
3321 809
3322 810
3323/*---------------------------------------------*/ 811/*---------------------------------------------*/
812void setInterimPermissions ( Char *dstName )
813{
814#if BZ_UNIX
815 IntNative retVal;
816 retVal = chmod ( dstName, S_IRUSR | S_IWUSR );
817 ERROR_IF_NOT_ZERO ( retVal );
818#endif
819}
820
821
822
823/*---------------------------------------------*/
3324Bool endsInBz2 ( Char* name ) 824Bool endsInBz2 ( Char* name )
3325{ 825{
3326 Int32 n = strlen ( name ); 826 Int32 n = strlen ( name );
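In file-to-file mode, compress() and uncompress() now appear to use a two-step permissions scheme. Right after the output file is opened, setInterimPermissions restricts it to owner read/write (S_IRUSR | S_IWUSR); only after a successful run does copyDatePermissionsAndOwner copy the input's mode and timestamps across, still without the chown call. A minimal POSIX sketch of that ordering follows. All function names are hypothetical, the payload is written with plain stdio rather than through the compressor, and masking the mode with 07777 is a choice of this sketch (the original passes st_mode to chmod unchanged).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <utime.h>

/* Owner read/write only while the output is incomplete. */
static int setInterimMode(const char *dst)
{
    return chmod(dst, S_IRUSR | S_IWUSR);
}

/* Copy permission bits and access/modification times from src to dst.
   Ownership is not copied, mirroring the commented-out chown above. */
static int copyModeAndTimes(const char *src, const char *dst)
{
    struct stat st;
    struct utimbuf times;

    if (stat(src, &st) != 0) return -1;
    times.actime  = st.st_atime;
    times.modtime = st.st_mtime;
    if (chmod(dst, st.st_mode & 07777) != 0) return -1;
    if (utime(dst, &times) != 0) return -1;
    return 0;
}

/* Hypothetical driver: open, restrict, write, close, then copy metadata. */
static int writeWithInterimPermissions(const char *src, const char *dst,
                                       const char *payload)
{
    FILE *out = fopen(dst, "wb");
    if (out == NULL) return -1;
    if (setInterimMode(dst) != 0) { fclose(out); return -1; }
    if (fputs(payload, out) == EOF) { fclose(out); return -1; }
    if (fclose(out) != 0) return -1;
    return copyModeAndTimes(src, dst);
}
```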
@@ -3353,13 +853,13 @@ void compress ( Char *name )
3353 panic ( "compress: bad modes\n" ); 853 panic ( "compress: bad modes\n" );
3354 854
3355 switch (srcMode) { 855 switch (srcMode) {
3356 case SM_I2O: strcpy ( inName, "(stdin)" ); 856 case SM_I2O: copyFileName ( inName, "(stdin)" );
3357 strcpy ( outName, "(stdout)" ); break; 857 copyFileName ( outName, "(stdout)" ); break;
3358 case SM_F2F: strcpy ( inName, name ); 858 case SM_F2F: copyFileName ( inName, name );
3359 strcpy ( outName, name ); 859 copyFileName ( outName, name );
3360 strcat ( outName, ".bz2" ); break; 860 strcat ( outName, ".bz2" ); break;
3361 case SM_F2O: strcpy ( inName, name ); 861 case SM_F2O: copyFileName ( inName, name );
3362 strcpy ( outName, "(stdout)" ); break; 862 copyFileName ( outName, "(stdout)" ); break;
3363 } 863 }
3364 864
3365 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { 865 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) {
@@ -3377,12 +877,12 @@ void compress ( Char *name )
3377 progName, inName ); 877 progName, inName );
3378 return; 878 return;
3379 } 879 }
3380 if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { 880 if ( srcMode != SM_I2O && notAStandardFile ( inName )) {
3381 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", 881 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n",
3382 progName, inName ); 882 progName, inName );
3383 return; 883 return;
3384 } 884 }
3385 if ( srcMode == SM_F2F && fileExists ( outName ) ) { 885 if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) {
3386 fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", 886 fprintf ( stderr, "%s: Output file %s already exists, skipping.\n",
3387 progName, outName ); 887 progName, outName );
3388 return; 888 return;
@@ -3434,6 +934,7 @@ void compress ( Char *name )
3434 progName, inName ); 934 progName, inName );
3435 return; 935 return;
3436 }; 936 };
937 setInterimPermissions ( outName );
3437 break; 938 break;
3438 939
3439 default: 940 default:
@@ -3454,7 +955,7 @@ void compress ( Char *name )
3454 955
3455 /*--- If there was an I/O error, we won't get here. ---*/ 956 /*--- If there was an I/O error, we won't get here. ---*/
3456 if ( srcMode == SM_F2F ) { 957 if ( srcMode == SM_F2F ) {
3457 copyDateAndPermissions ( inName, outName ); 958 copyDatePermissionsAndOwner ( inName, outName );
3458 if ( !keepInputFiles ) { 959 if ( !keepInputFiles ) {
3459 IntNative retVal = remove ( inName ); 960 IntNative retVal = remove ( inName );
3460 ERROR_IF_NOT_ZERO ( retVal ); 961 ERROR_IF_NOT_ZERO ( retVal );
@@ -3474,15 +975,15 @@ void uncompress ( Char *name )
3474 panic ( "uncompress: bad modes\n" ); 975 panic ( "uncompress: bad modes\n" );
3475 976
3476 switch (srcMode) { 977 switch (srcMode) {
3477 case SM_I2O: strcpy ( inName, "(stdin)" ); 978 case SM_I2O: copyFileName ( inName, "(stdin)" );
3478 strcpy ( outName, "(stdout)" ); break; 979 copyFileName ( outName, "(stdout)" ); break;
3479 case SM_F2F: strcpy ( inName, name ); 980 case SM_F2F: copyFileName ( inName, name );
3480 strcpy ( outName, name ); 981 copyFileName ( outName, name );
3481 if (endsInBz2 ( outName )) 982 if (endsInBz2 ( outName ))
3482 outName [ strlen ( outName ) - 4 ] = '\0'; 983 outName [ strlen ( outName ) - 4 ] = '\0';
3483 break; 984 break;
3484 case SM_F2O: strcpy ( inName, name ); 985 case SM_F2O: copyFileName ( inName, name );
3485 strcpy ( outName, "(stdout)" ); break; 986 copyFileName ( outName, "(stdout)" ); break;
3486 } 987 }
3487 988
3488 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { 989 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) {
@@ -3501,12 +1002,12 @@ void uncompress ( Char *name )
3501 progName, inName ); 1002 progName, inName );
3502 return; 1003 return;
3503 } 1004 }
3504 if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { 1005 if ( srcMode != SM_I2O && notAStandardFile ( inName )) {
3505 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", 1006 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n",
3506 progName, inName ); 1007 progName, inName );
3507 return; 1008 return;
3508 } 1009 }
3509 if ( srcMode == SM_F2F && fileExists ( outName ) ) { 1010 if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) {
3510 fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", 1011 fprintf ( stderr, "%s: Output file %s already exists, skipping.\n",
3511 progName, outName ); 1012 progName, outName );
3512 return; 1013 return;
@@ -3550,6 +1051,7 @@ void uncompress ( Char *name )
3550 progName, inName ); 1051 progName, inName );
3551 return; 1052 return;
3552 }; 1053 };
1054 setInterimPermissions ( outName );
3553 break; 1055 break;
3554 1056
3555 default: 1057 default:
@@ -3571,7 +1073,7 @@ void uncompress ( Char *name )
3571 /*--- If there was an I/O error, we won't get here. ---*/ 1073 /*--- If there was an I/O error, we won't get here. ---*/
3572 if ( magicNumberOK ) { 1074 if ( magicNumberOK ) {
3573 if ( srcMode == SM_F2F ) { 1075 if ( srcMode == SM_F2F ) {
3574 copyDateAndPermissions ( inName, outName ); 1076 copyDatePermissionsAndOwner ( inName, outName );
3575 if ( !keepInputFiles ) { 1077 if ( !keepInputFiles ) {
3576 IntNative retVal = remove ( inName ); 1078 IntNative retVal = remove ( inName );
3577 ERROR_IF_NOT_ZERO ( retVal ); 1079 ERROR_IF_NOT_ZERO ( retVal );
@@ -3607,11 +1109,11 @@ void testf ( Char *name )
3607 if (name == NULL && srcMode != SM_I2O) 1109 if (name == NULL && srcMode != SM_I2O)
3608 panic ( "testf: bad modes\n" ); 1110 panic ( "testf: bad modes\n" );
3609 1111
3610 strcpy ( outName, "(none)" ); 1112 copyFileName ( outName, "(none)" );
3611 switch (srcMode) { 1113 switch (srcMode) {
3612 case SM_I2O: strcpy ( inName, "(stdin)" ); break; 1114 case SM_I2O: copyFileName ( inName, "(stdin)" ); break;
3613 case SM_F2F: strcpy ( inName, name ); break; 1115 case SM_F2F: copyFileName ( inName, name ); break;
3614 case SM_F2O: strcpy ( inName, name ); break; 1116 case SM_F2O: copyFileName ( inName, name ); break;
3615 } 1117 }
3616 1118
3617 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { 1119 if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) {
@@ -3630,7 +1132,7 @@ void testf ( Char *name )
3630 progName, inName ); 1132 progName, inName );
3631 return; 1133 return;
3632 } 1134 }
3633 if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { 1135 if ( srcMode != SM_I2O && notAStandardFile ( inName )) {
3634 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", 1136 fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n",
3635 progName, inName ); 1137 progName, inName );
3636 return; 1138 return;
@@ -3684,25 +1186,18 @@ void license ( void )
3684 fprintf ( stderr, 1186 fprintf ( stderr,
3685 1187
3686 "bzip2, a block-sorting file compressor. " 1188 "bzip2, a block-sorting file compressor. "
3687 "Version 0.1pl2, 29-Aug-97.\n" 1189 "Version 0.9.0c, 18-Oct-98.\n"
3688 " \n" 1190 " \n"
3689 " Copyright (C) 1996, 1997 by Julian Seward.\n" 1191 " Copyright (C) 1996, 1997, 1998 by Julian Seward.\n"
3690 " \n" 1192 " \n"
3691 " This program is free software; you can redistribute it and/or modify\n" 1193 " This program is free software; you can redistribute it and/or modify\n"
3692 " it under the terms of the GNU General Public License as published by\n" 1194 " it under the terms set out in the LICENSE file, which is included\n"
3693 " the Free Software Foundation; either version 2 of the License, or\n" 1195 " in the bzip2-0.9.0c source distribution.\n"
3694 " (at your option) any later version.\n"
3695 " \n" 1196 " \n"
3696 " This program is distributed in the hope that it will be useful,\n" 1197 " This program is distributed in the hope that it will be useful,\n"
3697 " but WITHOUT ANY WARRANTY; without even the implied warranty of\n" 1198 " but WITHOUT ANY WARRANTY; without even the implied warranty of\n"
3698 " MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n" 1199 " MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n"
3699 " GNU General Public License for more details.\n" 1200 " LICENSE file for more details.\n"
3700 " \n"
3701 " You should have received a copy of the GNU General Public License\n"
3702 " along with this program; if not, write to the Free Software\n"
3703 " Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.\n"
3704 " \n"
3705 " The GNU General Public License is contained in the file LICENSE.\n"
3706 " \n" 1201 " \n"
3707 ); 1202 );
3708} 1203}
@@ -3714,16 +1209,17 @@ void usage ( Char *fullProgName )
3714 fprintf ( 1209 fprintf (
3715 stderr, 1210 stderr,
3716 "bzip2, a block-sorting file compressor. " 1211 "bzip2, a block-sorting file compressor. "
3717 "Version 0.1pl2, 29-Aug-97.\n" 1212 "Version 0.9.0c, 18-Oct-98.\n"
3718 "\n usage: %s [flags and input files in any order]\n" 1213 "\n usage: %s [flags and input files in any order]\n"
3719 "\n" 1214 "\n"
3720 " -h --help print this message\n" 1215 " -h --help print this message\n"
3721 " -d --decompress force decompression\n" 1216 " -d --decompress force decompression\n"
3722 " -f --compress force compression\n" 1217 " -z --compress force compression\n"
1218 " -k --keep keep (don't delete) input files\n"
                                                                  1219 "  -f --force          overwrite existing output files\n"
3723 " -t --test test compressed file integrity\n" 1220 " -t --test test compressed file integrity\n"
3724 " -c --stdout output to standard out\n" 1221 " -c --stdout output to standard out\n"
3725 " -v --verbose be verbose (a 2nd -v gives more)\n" 1222 " -v --verbose be verbose (a 2nd -v gives more)\n"
3726 " -k --keep keep (don't delete) input files\n"
3727 " -L --license display software version & license\n" 1223 " -L --license display software version & license\n"
3728 " -V --version display software version & license\n" 1224 " -V --version display software version & license\n"
3729 " -s --small use less memory (at most 2500k)\n" 1225 " -s --small use less memory (at most 2500k)\n"
@@ -3731,15 +1227,16 @@ void usage ( Char *fullProgName )
3731 " --repetitive-fast compress repetitive blocks faster\n" 1227 " --repetitive-fast compress repetitive blocks faster\n"
3732 " --repetitive-best compress repetitive blocks better\n" 1228 " --repetitive-best compress repetitive blocks better\n"
3733 "\n" 1229 "\n"
3734 " If invoked as `bzip2', the default action is to compress.\n" 1230 " If invoked as `bzip2', default action is to compress.\n"
3735 " as `bunzip2', the default action is to decompress.\n" 1231 " as `bunzip2', default action is to decompress.\n"
1232 " as `bz2cat', default action is to decompress to stdout.\n"
3736 "\n" 1233 "\n"
3737 " If no file names are given, bzip2 compresses or decompresses\n" 1234 " If no file names are given, bzip2 compresses or decompresses\n"
3738 " from standard input to standard output. You can combine\n" 1235 " from standard input to standard output. You can combine\n"
3739 " flags, so `-v -4' means the same as -v4 or -4v, &c.\n" 1236 " short flags, so `-v -4' means the same as -v4 or -4v, &c.\n"
3740 #if BZ_UNIX 1237#if BZ_UNIX
3741 "\n" 1238 "\n"
3742 #endif 1239#endif
3743 , 1240 ,
3744 1241
3745 fullProgName 1242 fullProgName
@@ -3776,14 +1273,7 @@ void *myMalloc ( Int32 n )
3776 void* p; 1273 void* p;
3777 1274
3778 p = malloc ( (size_t)n ); 1275 p = malloc ( (size_t)n );
3779 if (p == NULL) { 1276 if (p == NULL) outOfMemory ();
3780 fprintf (
3781 stderr,
3782 "%s: `malloc' failed on request for %d bytes.\n",
3783 progName, n
3784 );
3785 exit ( 1 );
3786 }
3787 return p; 1277 return p;
3788} 1278}
3789 1279
@@ -3817,7 +1307,6 @@ Cell *snocString ( Cell *root, Char *name )
3817} 1307}
3818 1308
3819 1309
3820
3821/*---------------------------------------------*/ 1310/*---------------------------------------------*/
3822#define ISFLAG(s) (strcmp(aa->name, (s))==0) 1311#define ISFLAG(s) (strcmp(aa->name, (s))==0)
3823 1312
@@ -3829,11 +1318,6 @@ IntNative main ( IntNative argc, Char *argv[] )
3829 Cell *argList; 1318 Cell *argList;
3830 Cell *aa; 1319 Cell *aa;
3831 1320
3832
3833 #if DEBUG
3834 fprintf ( stderr, "bzip2: *** compiled with debugging ON ***\n" );
3835 #endif
3836
3837 /*-- Be really really really paranoid :-) --*/ 1321 /*-- Be really really really paranoid :-) --*/
3838 if (sizeof(Int32) != 4 || sizeof(UInt32) != 4 || 1322 if (sizeof(Int32) != 4 || sizeof(UInt32) != 4 ||
3839 sizeof(Int16) != 2 || sizeof(UInt16) != 2 || 1323 sizeof(Int16) != 2 || sizeof(UInt16) != 2 ||
@@ -3844,7 +1328,7 @@ IntNative main ( IntNative argc, Char *argv[] )
3844 "\tof 4, 2 and 1 bytes to run properly, and they don't.\n" 1328 "\tof 4, 2 and 1 bytes to run properly, and they don't.\n"
3845 "\tProbably you can fix this by defining them correctly,\n" 1329 "\tProbably you can fix this by defining them correctly,\n"
3846 "\tand recompiling. Bye!\n" ); 1330 "\tand recompiling. Bye!\n" );
3847 exit(1); 1331 exit(3);
3848 } 1332 }
3849 1333
3850 1334
@@ -3852,35 +1336,28 @@ IntNative main ( IntNative argc, Char *argv[] )
3852 signal (SIGINT, mySignalCatcher); 1336 signal (SIGINT, mySignalCatcher);
3853 signal (SIGTERM, mySignalCatcher); 1337 signal (SIGTERM, mySignalCatcher);
3854 signal (SIGSEGV, mySIGSEGVorSIGBUScatcher); 1338 signal (SIGSEGV, mySIGSEGVorSIGBUScatcher);
3855 #if BZ_UNIX 1339#if BZ_UNIX
3856 signal (SIGHUP, mySignalCatcher); 1340 signal (SIGHUP, mySignalCatcher);
3857 signal (SIGBUS, mySIGSEGVorSIGBUScatcher); 1341 signal (SIGBUS, mySIGSEGVorSIGBUScatcher);
3858 #endif 1342#endif
3859 1343
3860 1344
3861 /*-- Initialise --*/ 1345 /*-- Initialise --*/
3862 outputHandleJustInCase = NULL; 1346 outputHandleJustInCase = NULL;
3863 ftab = NULL;
3864 ll4 = NULL;
3865 ll16 = NULL;
3866 ll8 = NULL;
3867 tt = NULL;
3868 block = NULL;
3869 zptr = NULL;
3870 smallMode = False; 1347 smallMode = False;
3871 keepInputFiles = False; 1348 keepInputFiles = False;
1349 forceOverwrite = False;
3872 verbosity = 0; 1350 verbosity = 0;
3873 blockSize100k = 9; 1351 blockSize100k = 9;
3874 testFailsExist = False; 1352 testFailsExist = False;
3875 bsStream = NULL;
3876 numFileNames = 0; 1353 numFileNames = 0;
3877 numFilesProcessed = 0; 1354 numFilesProcessed = 0;
3878 workFactor = 30; 1355 workFactor = 30;
3879 1356
3880 strcpy ( inName, "(none)" ); 1357 copyFileName ( inName, "(none)" );
3881 strcpy ( outName, "(none)" ); 1358 copyFileName ( outName, "(none)" );
3882 1359
3883 strcpy ( progNameReally, argv[0] ); 1360 copyFileName ( progNameReally, argv[0] );
3884 progName = &progNameReally[0]; 1361 progName = &progNameReally[0];
3885 for (tmp = &progNameReally[0]; *tmp != '\0'; tmp++) 1362 for (tmp = &progNameReally[0]; *tmp != '\0'; tmp++)
3886 if (*tmp == PATH_SEP) progName = tmp + 1; 1363 if (*tmp == PATH_SEP) progName = tmp + 1;
@@ -3903,20 +1380,26 @@ IntNative main ( IntNative argc, Char *argv[] )
3903 } 1380 }
3904 1381
3905 1382
3906 /*-- Determine what to do (compress/uncompress/test). --*/ 1383 /*-- Determine source modes; flag handling may change this too. --*/
1384 if (numFileNames == 0)
1385 srcMode = SM_I2O; else srcMode = SM_F2F;
1386
1387
1388 /*-- Determine what to do (compress/uncompress/test/cat). --*/
3907 /*-- Note that subsequent flag handling may change this. --*/ 1389 /*-- Note that subsequent flag handling may change this. --*/
3908 opMode = OM_Z; 1390 opMode = OM_Z;
3909 1391
3910 if ( (strcmp ( "bunzip2", progName ) == 0) || 1392 if ( (strstr ( progName, "unzip" ) != 0) ||
3911 (strcmp ( "BUNZIP2", progName ) == 0) || 1393 (strstr ( progName, "UNZIP" ) != 0) )
3912 (strcmp ( "bunzip2.exe", progName ) == 0) ||
3913 (strcmp ( "BUNZIP2.EXE", progName ) == 0) )
3914 opMode = OM_UNZ; 1394 opMode = OM_UNZ;
3915 1395
3916 1396 if ( (strstr ( progName, "z2cat" ) != 0) ||
3917 /*-- Determine source modes; flag handling may change this too. --*/ 1397 (strstr ( progName, "Z2CAT" ) != 0) ||
3918 if (numFileNames == 0) 1398 (strstr ( progName, "zcat" ) != 0) ||
3919 srcMode = SM_I2O; else srcMode = SM_F2F; 1399 (strstr ( progName, "ZCAT" ) != 0) ) {
1400 opMode = OM_UNZ;
1401 srcMode = (numFileNames == 0) ? SM_I2O : SM_F2O;
1402 }
3920 1403
3921 1404
3922 /*-- Look at the flags. --*/ 1405 /*-- Look at the flags. --*/
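The new dispatch on the program name matches substrings rather than exact names. Any name containing "unzip" selects decompression, and any name containing "zcat" or "z2cat" selects decompression, writing to standard output when file names are given. As far as the code shows, names such as "gunzip" or "gzcat" would match too. The sketch below isolates that logic; the enums and the function name are hypothetical stand-ins for opMode, srcMode and the OM_/SM_ constants.

```c
#include <string.h>

typedef enum { OP_COMPRESS, OP_DECOMPRESS } OpMode;                          /* ~ OM_Z, OM_UNZ */
typedef enum { SRC_STDIN_TO_STDOUT, SRC_FILE_TO_FILE, SRC_FILE_TO_STDOUT } SrcMode; /* ~ SM_* */

/* Default modes from argv[0]; later flag handling may override them. */
static void modesFromProgName(const char *progName, int numFileNames,
                              OpMode *op, SrcMode *src)
{
    *src = (numFileNames == 0) ? SRC_STDIN_TO_STDOUT : SRC_FILE_TO_FILE;
    *op  = OP_COMPRESS;

    if (strstr(progName, "unzip") != NULL || strstr(progName, "UNZIP") != NULL)
        *op = OP_DECOMPRESS;

    if (strstr(progName, "z2cat") != NULL || strstr(progName, "Z2CAT") != NULL ||
        strstr(progName, "zcat")  != NULL || strstr(progName, "ZCAT")  != NULL) {
        *op  = OP_DECOMPRESS;
        *src = (numFileNames == 0) ? SRC_STDIN_TO_STDOUT : SRC_FILE_TO_STDOUT;
    }
}
```

Determining the source mode first, as the reordered code now does, lets the "zcat" branch override it.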
@@ -3926,7 +1409,8 @@ IntNative main ( IntNative argc, Char *argv[] )
3926 switch (aa->name[j]) { 1409 switch (aa->name[j]) {
3927 case 'c': srcMode = SM_F2O; break; 1410 case 'c': srcMode = SM_F2O; break;
3928 case 'd': opMode = OM_UNZ; break; 1411 case 'd': opMode = OM_UNZ; break;
3929 case 'f': opMode = OM_Z; break; 1412 case 'z': opMode = OM_Z; break;
1413 case 'f': forceOverwrite = True; break;
3930 case 't': opMode = OM_TEST; break; 1414 case 't': opMode = OM_TEST; break;
3931 case 'k': keepInputFiles = True; break; 1415 case 'k': keepInputFiles = True; break;
3932 case 's': smallMode = True; break; 1416 case 's': smallMode = True; break;
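Short flags are handled one character at a time, which is why the usage text can say that `-v -4` means the same as `-v4` or `-4v`. In this version 'f' changes from "force compression" to "force overwrite", 'z' takes over compression, and verbosity is clamped to 4 once the flags are read. The sketch below assumes, consistent with the man page's references to flags -1 to -9, that a digit sets the block size in units of 100k. The `Opts` struct and `parseShortFlags` are hypothetical, and long options are not covered.

```c
#include <stdbool.h>

typedef struct {                 /* hypothetical stand-in for the globals */
    bool decompress, force, keep, toStdout, test, small;
    int  verbosity;
    int  blockSize100k;
} Opts;

/* Parse one argument such as "-v4" or "-kf".
   Returns false for a non-flag, a long option, or an unknown character. */
static bool parseShortFlags(const char *arg, Opts *o)
{
    if (arg[0] != '-' || arg[1] == '-') return false;
    for (int j = 1; arg[j] != '\0'; j++) {
        switch (arg[j]) {
            case 'c': o->toStdout   = true;  break;
            case 'd': o->decompress = true;  break;
            case 'z': o->decompress = false; break;
            case 'f': o->force      = true;  break;   /* was "force compression" */
            case 't': o->test       = true;  break;
            case 'k': o->keep       = true;  break;
            case 's': o->small      = true;  break;
            case 'v': o->verbosity++;        break;
            case '1': case '2': case '3': case '4': case '5':
            case '6': case '7': case '8': case '9':
                o->blockSize100k = arg[j] - '0'; break;
            default:  return false;
        }
    }
    if (o->verbosity > 4) o->verbosity = 4;   /* same cap as the new code */
    return true;
}
```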
@@ -3957,6 +1441,7 @@ IntNative main ( IntNative argc, Char *argv[] )
3957 if (ISFLAG("--stdout")) srcMode = SM_F2O; else 1441 if (ISFLAG("--stdout")) srcMode = SM_F2O; else
3958 if (ISFLAG("--decompress")) opMode = OM_UNZ; else 1442 if (ISFLAG("--decompress")) opMode = OM_UNZ; else
3959 if (ISFLAG("--compress")) opMode = OM_Z; else 1443 if (ISFLAG("--compress")) opMode = OM_Z; else
1444 if (ISFLAG("--force")) forceOverwrite = True; else
3960 if (ISFLAG("--test")) opMode = OM_TEST; else 1445 if (ISFLAG("--test")) opMode = OM_TEST; else
3961 if (ISFLAG("--keep")) keepInputFiles = True; else 1446 if (ISFLAG("--keep")) keepInputFiles = True; else
3962 if (ISFLAG("--small")) smallMode = True; else 1447 if (ISFLAG("--small")) smallMode = True; else
@@ -3974,14 +1459,9 @@ IntNative main ( IntNative argc, Char *argv[] )
3974 } 1459 }
3975 } 1460 }
3976 1461
1462 if (verbosity > 4) verbosity = 4;
3977 if (opMode == OM_Z && smallMode) blockSize100k = 2; 1463 if (opMode == OM_Z && smallMode) blockSize100k = 2;
3978 1464
3979 if (opMode == OM_Z && srcMode == SM_F2O && numFileNames > 1) {
3980 fprintf ( stderr, "%s: I won't compress multiple files to stdout.\n",
3981 progName );
3982 exit ( 1 );
3983 }
3984
3985 if (srcMode == SM_F2O && numFileNames == 0) { 1465 if (srcMode == SM_F2O && numFileNames == 0) {
3986 fprintf ( stderr, "%s: -c expects at least one filename.\n", 1466 fprintf ( stderr, "%s: -c expects at least one filename.\n",
3987 progName ); 1467 progName );
@@ -3997,7 +1477,6 @@ IntNative main ( IntNative argc, Char *argv[] )
3997 if (opMode != OM_Z) blockSize100k = 0; 1477 if (opMode != OM_Z) blockSize100k = 0;
3998 1478
3999 if (opMode == OM_Z) { 1479 if (opMode == OM_Z) {
4000 allocateCompressStructures();
4001 if (srcMode == SM_I2O) 1480 if (srcMode == SM_I2O)
4002 compress ( NULL ); 1481 compress ( NULL );
4003 else 1482 else
diff --git a/bzip2.txt b/bzip2.txt
index aee8e2b..898dfe8 100644
--- a/bzip2.txt
+++ b/bzip2.txt
@@ -1,22 +1,22 @@
1 1
2
3
4bzip2(1) bzip2(1) 2bzip2(1) bzip2(1)
5 3
6 4
7NAME 5NAME
8 bzip2, bunzip2 - a block-sorting file compressor, v0.1 6 bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
7 bzcat - decompresses files to stdout
9 bzip2recover - recovers data from damaged bzip2 files 8 bzip2recover - recovers data from damaged bzip2 files
10 9
11 10
12SYNOPSIS 11SYNOPSIS
13 bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ] 12 bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ]
14 bunzip2 [ -kvsVL ] [ filenames ... ] 13 bunzip2 [ -fkvsVL ] [ filenames ... ]
14 bzcat [ -s ] [ filenames ... ]
15 bzip2recover filename 15 bzip2recover filename
16 16
17 17
18DESCRIPTION 18DESCRIPTION
19 Bzip2 compresses files using the Burrows-Wheeler block- 19 bzip2 compresses files using the Burrows-Wheeler block-
20 sorting text compression algorithm, and Huffman coding. 20 sorting text compression algorithm, and Huffman coding.
21 Compression is generally considerably better than that 21 Compression is generally considerably better than that
22 achieved by more conventional LZ77/LZ78-based compressors, 22 achieved by more conventional LZ77/LZ78-based compressors,
@@ -26,7 +26,7 @@ DESCRIPTION
26 The command-line options are deliberately very similar to 26 The command-line options are deliberately very similar to
27 those of GNU Gzip, but they are not identical. 27 those of GNU Gzip, but they are not identical.
28 28
29 Bzip2 expects a list of file names to accompany the com- 29 bzip2 expects a list of file names to accompany the com-
30 mand-line flags. Each file is replaced by a compressed 30 mand-line flags. Each file is replaced by a compressed
31 version of itself, with the name "original_name.bz2". 31 version of itself, with the name "original_name.bz2".
32 Each compressed file has the same modification date and 32 Each compressed file has the same modification date and
@@ -38,8 +38,8 @@ DESCRIPTION
38 cepts, or have serious file name length restrictions, such 38 cepts, or have serious file name length restrictions, such
39 as MS-DOS. 39 as MS-DOS.
40 40
41 Bzip2 and bunzip2 will not overwrite existing files; if 41 bzip2 and bunzip2 will by default not overwrite existing
42 you want this to happen, you should delete them first. 42 files; if you want this to happen, specify the -f flag.
43 43
44 If no file names are specified, bzip2 compresses from 44 If no file names are specified, bzip2 compresses from
45 standard input to standard output. In this case, bzip2 45 standard input to standard output. In this case, bzip2
@@ -47,28 +47,29 @@ DESCRIPTION
47 this would be entirely incomprehensible and therefore 47 this would be entirely incomprehensible and therefore
48 pointless. 48 pointless.
49 49
50 Bunzip2 (or bzip2 -d ) decompresses and restores all spec- 50 bunzip2 (or bzip2 -d ) decompresses and restores all spec-
51 ified files whose names end in ".bz2". Files without this 51 ified files whose names end in ".bz2". Files without this
52 suffix are ignored. Again, supplying no filenames causes 52 suffix are ignored. Again, supplying no filenames causes
53 decompression from standard input to standard output. 53 decompression from standard input to standard output.
54 54
55 You can also compress or decompress files to the standard 55 bunzip2 will correctly decompress a file which is the con-
56 output by giving the -c flag. You can decompress multiple 56 catenation of two or more compressed files. The result is
57 files like this, but you may only compress a single file 57 the concatenation of the corresponding uncompressed files.
58 this way, since it would otherwise be difficult to sepa- 58 Integrity testing (-t) of concatenated compressed files is
59 rate out the compressed representations of the original 59 also supported.
60 files.
61
62
63
64 1
65
66
67
68
69
70bzip2(1) bzip2(1)
71 60
61 You can also compress or decompress files to the standard
62 output by giving the -c flag. Multiple files may be com-
63 pressed and decompressed like this. The resulting outputs
64 are fed sequentially to stdout. Compression of multiple
65 files in this manner generates a stream containing multi-
66 ple compressed file representations. Such a stream can be
67 decompressed correctly only by bzip2 version 0.9.0 or
68 later. Earlier versions of bzip2 will stop after decom-
69 pressing the first file in the stream.
70
71 bzcat (or bzip2 -dc ) decompresses all specified files to
72 the standard output.
72 73
73 Compression is always performed, even if the compressed 74 Compression is always performed, even if the compressed
74 file is slightly larger than the original. Files of less 75 file is slightly larger than the original. Files of less
@@ -108,13 +109,14 @@ MEMORY MANAGEMENT
108 file, and bunzip2 then allocates itself just enough memory 109 file, and bunzip2 then allocates itself just enough memory
109 to decompress the file. Since block sizes are stored in 110 to decompress the file. Since block sizes are stored in
110 compressed files, it follows that the flags -1 to -9 are 111 compressed files, it follows that the flags -1 to -9 are
111 irrelevant to and so ignored during decompression. Com- 112 irrelevant to and so ignored during decompression.
112 pression and decompression requirements, in bytes, can be 113
113 estimated as: 114 Compression and decompression requirements, in bytes, can
115 be estimated as:
114 116
115 Compression: 400k + ( 7 x block size ) 117 Compression: 400k + ( 7 x block size )
116 118
117 Decompression: 100k + ( 5 x block size ), or 119 Decompression: 100k + ( 4 x block size ), or
118 100k + ( 2.5 x block size ) 120 100k + ( 2.5 x block size )
119 121
120 Larger block sizes give rapidly diminishing marginal 122 Larger block sizes give rapidly diminishing marginal
@@ -125,19 +127,8 @@ MEMORY MANAGEMENT
125 requirement is set at compression-time by the choice of 127 requirement is set at compression-time by the choice of
126 block size. 128 block size.
127 129
128
129
130 2
131
132
133
134
135
136bzip2(1) bzip2(1)
137
138
139 For files compressed with the default 900k block size, 130 For files compressed with the default 900k block size,
140 bunzip2 will require about 4600 kbytes to decompress. To 131 bunzip2 will require about 3700 kbytes to decompress. To
141 support decompression of any file on a 4 megabyte machine, 132 support decompression of any file on a 4 megabyte machine,
142 bunzip2 has an option to decompress using approximately 133 bunzip2 has an option to decompress using approximately
143 half this amount of memory, about 2300 kbytes. Decompres- 134 half this amount of memory, about 2300 kbytes. Decompres-
@@ -157,8 +148,8 @@ bzip2(1) bzip2(1)
157 file 20,000 bytes long with the flag -9 will cause the 148 file 20,000 bytes long with the flag -9 will cause the
158 compressor to allocate around 6700k of memory, but only 149 compressor to allocate around 6700k of memory, but only
159 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the 150 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
160 decompressor will allocate 4600k but only touch 100k + 151 decompressor will allocate 3700k but only touch 100k +
161 20000 * 5 = 200 kbytes. 152 20000 * 4 = 180 kbytes.
162 153
163 Here is a table which summarises the maximum memory usage 154 Here is a table which summarises the maximum memory usage
164 for different block sizes. Also recorded is the total 155 for different block sizes. Also recorded is the total
@@ -172,15 +163,15 @@ bzip2(1) bzip2(1)
172 Compress Decompress Decompress Corpus 163 Compress Decompress Decompress Corpus
173 Flag usage usage -s usage Size 164 Flag usage usage -s usage Size
174 165
175 -1 1100k 600k 350k 914704 166 -1 1100k 500k 350k 914704
176 -2 1800k 1100k 600k 877703 167 -2 1800k 900k 600k 877703
177 -3 2500k 1600k 850k 860338 168 -3 2500k 1300k 850k 860338
178 -4 3200k 2100k 1100k 846899 169 -4 3200k 1700k 1100k 846899
179 -5 3900k 2600k 1350k 845160 170 -5 3900k 2100k 1350k 845160
180 -6 4600k 3100k 1600k 838626 171 -6 4600k 2500k 1600k 838626
181 -7 5400k 3600k 1850k 834096 172 -7 5400k 2900k 1850k 834096
182 -8 6000k 4100k 2100k 828642 173 -8 6000k 3300k 2100k 828642
183 -9 6700k 4600k 2350k 828642 174 -9 6700k 3700k 2350k 828642
184 175
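The table rows follow from the two estimates given under MEMORY MANAGEMENT. The C sketch below recomputes them in kbytes, taking 1k as 1000 bytes as the manual's 20,000-byte example does. It also covers an input shorter than one block. It only reproduces the manual's rules of thumb. The function names are illustrative and are not part of bzip2.

```c
#include <stdio.h>

/* Rules of thumb from MEMORY MANAGEMENT, in kbytes (1k = 1000 bytes).
   `level` is the -1..-9 flag, i.e. a block size of level x 100k.
   Illustrative helper names only; bzip2 has no such functions. */
long est_compress_kb(int level)         { return 400 + 7 * 100L * level; }
long est_decompress_kb(int level)       { return 100 + 4 * 100L * level; }
long est_decompress_small_kb(int level) { return 100 + 250L * level; /* 2.5 x block */ }

/* Memory actually touched when the input is shorter than one block:
   the per-byte terms scale with min(file size, block size). */
long est_compress_touched_kb(long file_bytes, int level)
{
    long block = 100000L * level;
    long used = file_bytes < block ? file_bytes : block;
    return 400 + (7 * used) / 1000;
}

long est_decompress_touched_kb(long file_bytes, int level)
{
    long block = 100000L * level;
    long used = file_bytes < block ? file_bytes : block;
    return 100 + (4 * used) / 1000;
}

/* Print one row per flag, mirroring the memory columns of the table. */
void print_memory_table(void)
{
    int level;
    printf("Flag  Compress  Decompress  Decompress -s\n");
    for (level = 1; level <= 9; level++)
        printf(" -%d   %5ldk     %5ldk        %5ldk\n", level,
               est_compress_kb(level), est_decompress_kb(level),
               est_decompress_small_kb(level));
}
```

Calling `print_memory_table()` reproduces the table's memory columns with one exception. For -7, the compression formula gives 400k + 7 x 700k = 5300k, but the table lists 5400k. For the manual's 20,000-byte input at -9, `est_compress_touched_kb(20000, 9)` and `est_decompress_touched_kb(20000, 9)` give the quoted 540 and 180 kbytes.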
185 176
186OPTIONS 177OPTIONS
@@ -189,47 +180,37 @@ OPTIONS
189 decompress multiple files to stdout, but will only 180 decompress multiple files to stdout, but will only
190 compress a single file to stdout. 181 compress a single file to stdout.
191 182
192
193
194
195
196 3
197
198
199
200
201
202bzip2(1) bzip2(1)
203
204
205 -d --decompress 183 -d --decompress
206 Force decompression. Bzip2 and bunzip2 are really 184 Force decompression. bzip2, bunzip2 and bzcat are
207 the same program, and the decision about whether to 185 really the same program, and the decision about
208 compress or decompress is done on the basis of 186 what actions to take is done on the basis of which
209 which name is used. This flag overrides that mech- 187 name is used. This flag overrides that mechanism,
210 anism, and forces bzip2 to decompress. 188 and forces bzip2 to decompress.
211 189
212 -f --compress 190 -z --compress
213 The complement to -d: forces compression, regard- 191 The complement to -d: forces compression, regard-
214 less of the invokation name. 192 less of the invokation name.
215 193
216 -t --test 194 -t --test
217 Check integrity of the specified file(s), but don't 195 Check integrity of the specified file(s), but don't
218 decompress them. This really performs a trial 196 decompress them. This really performs a trial
219 decompression and throws away the result, using the 197 decompression and throws away the result.
220 low-memory decompression algorithm (see -s). 198
199 -f --force
200 Force overwrite of output files. Normally, bzip2
201 will not overwrite existing output files.
221 202
222 -k --keep 203 -k --keep
223 Keep (don't delete) input files during compression 204 Keep (don't delete) input files during compression
224 or decompression. 205 or decompression.
225 206
226 -s --small 207 -s --small
227 Reduce memory usage, both for compression and 208 Reduce memory usage, for compression, decompression
228 decompression. Files are decompressed using a mod- 209 and testing. Files are decompressed and tested
229 ified algorithm which only requires 2.5 bytes per 210 using a modified algorithm which only requires 2.5
230 block byte. This means any file can be decom- 211 bytes per block byte. This means any file can be
231 pressed in 2300k of memory, albeit somewhat more 212 decompressed in 2300k of memory, albeit at about
232 slowly than usual. 213 half the normal speed.
233 214
234 During compression, -s selects a block size of 215 During compression, -s selects a block size of
235 200k, which limits memory use to around the same 216 200k, which limits memory use to around the same
@@ -238,36 +219,21 @@ bzip2(1) bzip2(1)
238 megabytes or less), use -s for everything. See 219 megabytes or less), use -s for everything. See
239 MEMORY MANAGEMENT above. 220 MEMORY MANAGEMENT above.
240 221
241
242 -v --verbose 222 -v --verbose
243 Verbose mode -- show the compression ratio for each 223 Verbose mode -- show the compression ratio for each
244 file processed. Further -v's increase the ver- 224 file processed. Further -v's increase the ver-
245 bosity level, spewing out lots of information which 225 bosity level, spewing out lots of information which
246 is primarily of interest for diagnostic purposes. 226 is primarily of interest for diagnostic purposes.
247 227
248 -L --license 228 -L --license -V --version
249 Display the software version, license terms and 229 Display the software version, license terms and
250 conditions. 230 conditions.
251 231
252 -V --version
253 Same as -L.
254
255 -1 to -9 232 -1 to -9
256 Set the block size to 100 k, 200 k .. 900 k when 233 Set the block size to 100 k, 200 k .. 900 k when
257 compressing. Has no effect when decompressing. 234 compressing. Has no effect when decompressing.
258 See MEMORY MANAGEMENT above. 235 See MEMORY MANAGEMENT above.
259 236
260
261
262 4
263
264
265
266
267
268bzip2(1) bzip2(1)
269
270
271 --repetitive-fast 237 --repetitive-fast
272 bzip2 injects some small pseudo-random variations 238 bzip2 injects some small pseudo-random variations
273 into very repetitive blocks to limit worst-case 239 into very repetitive blocks to limit worst-case
@@ -278,7 +244,6 @@ bzip2(1) bzip2(1)
278 would take before resorting to randomisation. This 244 would take before resorting to randomisation. This
279 flag makes it give up much sooner. 245 flag makes it give up much sooner.
280 246
281
282 --repetitive-best 247 --repetitive-best
283 Opposite of --repetitive-fast; try a lot harder 248 Opposite of --repetitive-fast; try a lot harder
284 before resorting to randomisation. 249 before resorting to randomisation.
@@ -306,10 +271,10 @@ RECOVERING DATA FROM DAMAGED FILES
306 bzip2recover takes a single argument, the name of the dam- 271 bzip2recover takes a single argument, the name of the dam-
307 aged file, and writes a number of files "rec0001file.bz2", 272 aged file, and writes a number of files "rec0001file.bz2",
308 "rec0002file.bz2", etc, containing the extracted blocks. 273 "rec0002file.bz2", etc, containing the extracted blocks.
309 The output filenames are designed so that the use of wild- 274 The output filenames are designed so that the use of
310 cards in subsequent processing -- for example, "bzip2 -dc 275 wildcards in subsequent processing -- for example, "bzip2
311 rec*file.bz2 > recovered_data" -- lists the files in the 276 -dc rec*file.bz2 > recovered_data" -- lists the files in
312 "right" order. 277 the "right" order.
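bzip2recover builds each output name in three steps. It formats the block number with `sprintf(outFileName, "rec%4d", ...)` and turns the padding spaces into zeros. It then appends the input file name, and adds `.bz2` if the result does not already end in it. The zero padding makes a wildcard, which shells expand in sorted order, list the pieces in block order. A minimal sketch of that naming step follows. `make_rec_name` is a hypothetical helper, and `ends_in_bz2` is a simplified stand-in for the program's `endsInBz2`.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for bzip2recover's endsInBz2(). */
static int ends_in_bz2(const char *name)
{
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, ".bz2") == 0;
}

/* Hypothetical helper: build "recNNNN<inName>[.bz2]" for a 1-based
   block number. Returns 0 on success, -1 if `out` is too small. */
int make_rec_name(char *out, size_t outSize, int blockNo, const char *inName)
{
    char num[32];
    char *p;
    int n;

    /* "%4d" right-justifies in a 4-character field, padding with spaces... */
    snprintf(num, sizeof num, "rec%4d", blockNo);
    /* ...which are then turned into zeros: "rec   1" -> "rec0001". */
    for (p = num; *p != '\0'; p++) if (*p == ' ') *p = '0';

    n = snprintf(out, outSize, "%s%s%s", num, inName,
                 ends_in_bz2(inName) ? "" : ".bz2");
    if (n < 0 || (size_t)n >= outSize) return -1;
    return 0;
}
```

Every name shares the `rec` prefix and a fixed-width number, so sorted order matches block order. This holds only up to block 9999, where `%4d` stops padding and "rec10000..." would sort before "rec9999...".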
313 278
314 bzip2recover should be of most use dealing with large .bz2 279 bzip2recover should be of most use dealing with large .bz2
315 files, as these will contain many blocks. It is clearly 280 files, as these will contain many blocks. It is clearly
@@ -322,18 +287,6 @@ RECOVERING DATA FROM DAMAGED FILES
322 287
323PERFORMANCE NOTES 288PERFORMANCE NOTES
324 The sorting phase of compression gathers together similar 289 The sorting phase of compression gathers together similar
325
326
327
328 5
329
330
331
332
333
334bzip2(1) bzip2(1)
335
336
337 strings in the file. Because of this, files containing 290 strings in the file. Because of this, files containing
338 very long runs of repeated symbols, like "aabaabaabaab 291 very long runs of repeated symbols, like "aabaabaabaab
339 ..." (repeated several hundred times) may compress 292 ..." (repeated several hundred times) may compress
@@ -348,10 +301,6 @@ bzip2(1) bzip2(1)
348 severe slowness in compression, try making the block size 301 severe slowness in compression, try making the block size
349 as small as possible, with flag -1. 302 as small as possible, with flag -1.
350 303
351 Incompressible or virtually-incompressible data may decom-
352 press rather more slowly than one would hope. This is due
353 to a naive implementation of the move-to-front coder.
354
355 bzip2 usually allocates several megabytes of memory to 304 bzip2 usually allocates several megabytes of memory to
356 operate in, and then charges all over it in a fairly ran- 305 operate in, and then charges all over it in a fairly ran-
357 dom fashion. This means that performance, both for com- 306 dom fashion. This means that performance, both for com-
@@ -362,12 +311,6 @@ bzip2(1) bzip2(1)
362 large performance improvements. I imagine bzip2 will per- 311 large performance improvements. I imagine bzip2 will per-
363 form best on machines with very large caches. 312 form best on machines with very large caches.
364 313
365 Test mode (-t) uses the low-memory decompression algorithm
366 (-s). This means test mode does not run as fast as it
367 could; it could run as fast as the normal decompression
368 machinery. This could easily be fixed at the cost of some
369 code bloat.
370
371 314
372CAVEATS 315CAVEATS
373 I/O error messages are not as helpful as they could be. 316 I/O error messages are not as helpful as they could be.
@@ -375,91 +318,38 @@ CAVEATS
375 but the details of what the problem is sometimes seem 318 but the details of what the problem is sometimes seem
376 rather misleading. 319 rather misleading.
377 320
378 This manual page pertains to version 0.1 of bzip2. It may 321 This manual page pertains to version 0.9.0 of bzip2. Com-
379 well happen that some future version will use a different 322 pressed data created by this version is entirely forwards
380 compressed file format. If you try to decompress, using 323 and backwards compatible with the previous public release,
381 0.1, a .bz2 file created with some future version which 324 version 0.1pl2, but with the following exception: 0.9.0
382 uses a different compressed file format, 0.1 will complain 325 can correctly decompress multiple concatenated compressed
383 that your file "is not a bzip2 file". If that happens, 326 files. 0.1pl2 cannot do this; it will stop after decom-
384 you should obtain a more recent version of bzip2 and use 327 pressing just the first file in the stream.
385 that to decompress the file.
386 328
387 Wildcard expansion for Windows 95 and NT is flaky. 329 Wildcard expansion for Windows 95 and NT is flaky.
388 330
389 bzip2recover uses 32-bit integers to represent bit posi- 331 bzip2recover uses 32-bit integers to represent bit posi-
390 tions in compressed files, so it cannot handle compressed 332 tions in compressed files, so it cannot handle compressed
391 333 files more than 512 megabytes long. This could easily be
392
393
394 6
395
396
397
398
399
400bzip2(1) bzip2(1)
401
402
403 files more than 512 megabytes long. This could easily be
404 fixed. 334 fixed.
405 335
406 bzip2recover sometimes reports a very small, incomplete
407 final block. This is spurious and can be safely ignored.
408
409
410RELATIONSHIP TO bzip-0.21
411 This program is a descendant of the bzip program, version
412 0.21, which I released in August 1996. The primary dif-
413 ference of bzip2 is its avoidance of the possibly patented
414 algorithms which were used in 0.21. bzip2 also brings
415 various useful refinements (-s, -t), uses less memory,
416 decompresses significantly faster, and has support for
417 recovering data from damaged files.
418
419 Because bzip2 uses Huffman coding to construct the com-
420 pressed bitstream, rather than the arithmetic coding used
421 in 0.21, the compressed representations generated by the
422 two programs are incompatible, and they will not interop-
423 erate. The change in suffix from .bz to .bz2 reflects
424 this. It would have been helpful to at least allow bzip2
425 to decompress files created by 0.21, but this would defeat
426 the primary aim of having a patent-free compressor.
427
428 For a more precise statement about patent issues in bzip2,
429 please see the README file in the distribution.
430
431 Huffman coding necessarily involves some coding ineffi-
432 ciency compared to arithmetic coding. This means that
433 bzip2 compresses about 1% worse than 0.21, an unfortunate
434 but unavoidable fact-of-life. On the other hand, decom-
435 pression is approximately 50% faster for the same reason,
436 and the change in file format gave an opportunity to add
437 data-recovery features. So it is not all bad.
438
439 336
440AUTHOR 337AUTHOR
441 Julian Seward, jseward@acm.org. 338 Julian Seward, jseward@acm.org.
442 339 http://www.muraroa.demon.co.uk
443 The ideas embodied in bzip and bzip2 are due to (at least) 340
444 the following people: Michael Burrows and David Wheeler 341 The ideas embodied in bzip2 are due to (at least) the fol-
445 (for the block sorting transformation), David Wheeler 342 lowing people: Michael Burrows and David Wheeler (for the
446 (again, for the Huffman coder), Peter Fenwick (for the 343 block sorting transformation), David Wheeler (again, for
447 structured coding model in 0.21, and many refinements), 344 the Huffman coder), Peter Fenwick (for the structured cod-
448 and Alistair Moffat, Radford Neal and Ian Witten (for the 345 ing model in the original bzip, and many refinements), and
449 arithmetic coder in 0.21). I am much indebted for their 346 Alistair Moffat, Radford Neal and Ian Witten (for the
450 help, support and advice. See the file ALGORITHMS in the 347 arithmetic coder in the original bzip). I am much
451 source distribution for pointers to sources of documenta- 348 indebted for their help, support and advice. See the man-
452 tion. Christian von Roques encouraged me to look for 349 ual in the source distribution for pointers to sources of
453 faster sorting algorithms, so as to speed up compression. 350 documentation. Christian von Roques encouraged me to look
454 Bela Lubkin encouraged me to improve the worst-case com- 351 for faster sorting algorithms, so as to speed up compres-
455 pression performance. Many people sent patches, helped 352 sion. Bela Lubkin encouraged me to improve the worst-case
353 compression performance. Many people sent patches, helped
456 with portability problems, lent machines, gave advice and 354 with portability problems, lent machines, gave advice and
457 were generally helpful. 355 were generally helpful.
458
459
460
461
462
463 7
464
465
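The RECOVERING DATA section says bzip2recover extracts the blocks of a damaged file. Compressed blocks are not byte-aligned, so it finds them by scanning the file one bit at a time for 48-bit markers. A block starts with 0x314159265359, and the end-of-stream marker is 0x177245385090. bzip2recover writes the latter, byte by byte starting 0x17 0x72 0x45 0x38, at the end of each recovered file. The sketch below is a hypothetical `find_marker` helper. It works on an in-memory buffer and uses one 64-bit shift register instead of the program's `buffHi`/`buffLo` pair. It returns the bit offset where a marker starts.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAGIC 0x314159265359ULL  /* start of each compressed block */
#define EOS_MAGIC   0x177245385090ULL  /* end-of-stream marker */
#define MAGIC_MASK  0xFFFFFFFFFFFFULL  /* keep the last 48 bits */

/* Hypothetical helper: scan buf (nbytes long, bits taken MSB first)
   for a 48-bit marker starting at or after bit startBit (>= 0).
   Returns the bit offset where the marker starts, or -1 if absent. */
long find_marker(const unsigned char *buf, size_t nbytes, uint64_t magic,
                 long startBit)
{
    uint64_t window = 0;            /* last 48 bits seen */
    long bit, total = (long)nbytes * 8;

    for (bit = 0; bit < total; bit++) {
        int b = (buf[bit / 8] >> (7 - bit % 8)) & 1;
        window = ((window << 1) | (uint64_t)b) & MAGIC_MASK;
        /* window now holds bits [bit-47, bit]; report its start */
        if (bit >= startBit + 47 && window == magic)
            return bit - 47;
    }
    return -1;
}
```

Calling it again with `startBit` just past the previous hit walks through successive blocks. As with any bit-pattern search, the same 48 bits can also occur by chance inside compressed data.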
diff --git a/bzip2recover.c b/bzip2recover.c
index 0eef0e6..0e2822b 100644
--- a/bzip2recover.c
+++ b/bzip2recover.c
@@ -7,43 +7,63 @@
7/*-- 7/*--
8 This program is bzip2recover, a program to attempt data 8 This program is bzip2recover, a program to attempt data
9 salvage from damaged files created by the accompanying 9 salvage from damaged files created by the accompanying
10 bzip2-0.1 program. 10 bzip2-0.9.0c program.
11 11
12 Copyright (C) 1996, 1997 by Julian Seward. 12 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
13 Guildford, Surrey, UK 13
14 email: jseward@acm.org 14 Redistribution and use in source and binary forms, with or without
15 15 modification, are permitted provided that the following conditions
16 This program is free software; you can redistribute it and/or modify 16 are met:
17 it under the terms of the GNU General Public License as published by 17
18 the Free Software Foundation; either version 2 of the License, or 18 1. Redistributions of source code must retain the above copyright
19 (at your option) any later version. 19 notice, this list of conditions and the following disclaimer.
20 20
21 This program is distributed in the hope that it will be useful, 21 2. The origin of this software must not be misrepresented; you must
22 but WITHOUT ANY WARRANTY; without even the implied warranty of 22 not claim that you wrote the original software. If you use this
23 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 23 software in a product, an acknowledgment in the product
24 GNU General Public License for more details. 24 documentation would be appreciated but is not required.
25 25
26 You should have received a copy of the GNU General Public License 26 3. Altered source versions must be plainly marked as such, and must
27 along with this program; if not, write to the Free Software 27 not be misrepresented as being the original software.
28 Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 28
29 29 4. The name of the author may not be used to endorse or promote
30 The GNU General Public License is contained in the file LICENSE. 30 products derived from this software without specific prior written
31 permission.
32
33 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
34 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
35 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
36 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
37 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
38 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
39 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
40 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
41 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
42 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
43 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
44
45 Julian Seward, Guildford, Surrey, UK.
46 jseward@acm.org
47 bzip2/libbzip2 version 0.9.0c of 18 October 1998
31--*/ 48--*/
32 49
50/*--
51 This program is a complete hack and should be rewritten
52 properly. It isn't very complicated.
53--*/
33 54
34#include <stdio.h> 55#include <stdio.h>
35#include <errno.h> 56#include <errno.h>
36#include <malloc.h>
37#include <stdlib.h> 57#include <stdlib.h>
38#include <strings.h> /*-- or try string.h --*/ 58#include <string.h>
39 59
40#define UInt32 unsigned int 60typedef unsigned int UInt32;
41#define Int32 int 61typedef int Int32;
42#define UChar unsigned char 62typedef unsigned char UChar;
43#define Char char 63typedef char Char;
44#define Bool unsigned char 64typedef unsigned char Bool;
45#define True 1 65#define True ((Bool)1)
46#define False 0 66#define False ((Bool)0)
47 67
48 68
49Char inFileName[2000]; 69Char inFileName[2000];
@@ -191,8 +211,9 @@ void bsClose ( BitStream* bs )
191 if (retVal == EOF) writeError(); 211 if (retVal == EOF) writeError();
192 } 212 }
193 retVal = fclose ( bs->handle ); 213 retVal = fclose ( bs->handle );
194 if (retVal == EOF) 214 if (retVal == EOF) {
195 if (bs->mode == 'w') writeError(); else readError(); 215 if (bs->mode == 'w') writeError(); else readError();
216 }
196 free ( bs ); 217 free ( bs );
197} 218}
198 219
@@ -248,13 +269,19 @@ Int32 main ( Int32 argc, Char** argv )
248 UInt32 bitsRead; 269 UInt32 bitsRead;
249 UInt32 bStart[20000]; 270 UInt32 bStart[20000];
250 UInt32 bEnd[20000]; 271 UInt32 bEnd[20000];
272
273 UInt32 rbStart[20000];
274 UInt32 rbEnd[20000];
275 Int32 rbCtr;
276
277
251 UInt32 buffHi, buffLo, blockCRC; 278 UInt32 buffHi, buffLo, blockCRC;
252 Char* p; 279 Char* p;
253 280
254 strcpy ( progName, argv[0] ); 281 strcpy ( progName, argv[0] );
255 inFileName[0] = outFileName[0] = 0; 282 inFileName[0] = outFileName[0] = 0;
256 283
257 fprintf ( stderr, "bzip2recover: extracts blocks from damaged .bz2 files.\n" ); 284 fprintf ( stderr, "bzip2recover v0.9.0c: extracts blocks from damaged .bz2 files.\n" );
258 285
259 if (argc != 2) { 286 if (argc != 2) {
260 fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n", 287 fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
@@ -278,6 +305,8 @@ Int32 main ( Int32 argc, Char** argv )
278 currBlock = 0; 305 currBlock = 0;
279 bStart[currBlock] = 0; 306 bStart[currBlock] = 0;
280 307
308 rbCtr = 0;
309
281 while (True) { 310 while (True) {
282 b = bsGetBit ( bsIn ); 311 b = bsGetBit ( bsIn );
283 bitsRead++; 312 bitsRead++;
@@ -303,19 +332,25 @@ Int32 main ( Int32 argc, Char** argv )
303 if (bitsRead > 49) 332 if (bitsRead > 49)
304 bEnd[currBlock] = bitsRead-49; else 333 bEnd[currBlock] = bitsRead-49; else
305 bEnd[currBlock] = 0; 334 bEnd[currBlock] = 0;
306 if (currBlock > 0) 335 if (currBlock > 0 &&
336 (bEnd[currBlock] - bStart[currBlock]) >= 130) {
307 fprintf ( stderr, " block %d runs from %d to %d\n", 337 fprintf ( stderr, " block %d runs from %d to %d\n",
308 currBlock, bStart[currBlock], bEnd[currBlock] ); 338 rbCtr+1, bStart[currBlock], bEnd[currBlock] );
339 rbStart[rbCtr] = bStart[currBlock];
340 rbEnd[rbCtr] = bEnd[currBlock];
341 rbCtr++;
342 }
309 currBlock++; 343 currBlock++;
344
310 bStart[currBlock] = bitsRead; 345 bStart[currBlock] = bitsRead;
311 } 346 }
312 } 347 }
313 348
314 bsClose ( bsIn ); 349 bsClose ( bsIn );
315 350
316 /*-- identified blocks run from 1 to currBlock inclusive. --*/ 351 /*-- identified blocks run from 1 to rbCtr inclusive. --*/
317 352
318 if (currBlock < 1) { 353 if (rbCtr < 1) {
319 fprintf ( stderr, 354 fprintf ( stderr,
320 "%s: sorry, I couldn't find any block boundaries.\n", 355 "%s: sorry, I couldn't find any block boundaries.\n",
321 progName ); 356 progName );
@@ -336,23 +371,23 @@ Int32 main ( Int32 argc, Char** argv )
336 371
337 bitsRead = 0; 372 bitsRead = 0;
338 outFile = NULL; 373 outFile = NULL;
339 wrBlock = 1; 374 wrBlock = 0;
340 while (True) { 375 while (True) {
341 b = bsGetBit(bsIn); 376 b = bsGetBit(bsIn);
342 if (b == 2) break; 377 if (b == 2) break;
343 buffHi = (buffHi << 1) | (buffLo >> 31); 378 buffHi = (buffHi << 1) | (buffLo >> 31);
344 buffLo = (buffLo << 1) | (b & 1); 379 buffLo = (buffLo << 1) | (b & 1);
345 if (bitsRead == 47+bStart[wrBlock]) 380 if (bitsRead == 47+rbStart[wrBlock])
346 blockCRC = (buffHi << 16) | (buffLo >> 16); 381 blockCRC = (buffHi << 16) | (buffLo >> 16);
347 382
348 if (outFile != NULL && bitsRead >= bStart[wrBlock] 383 if (outFile != NULL && bitsRead >= rbStart[wrBlock]
349 && bitsRead <= bEnd[wrBlock]) { 384 && bitsRead <= rbEnd[wrBlock]) {
350 bsPutBit ( bsWr, b ); 385 bsPutBit ( bsWr, b );
351 } 386 }
352 387
353 bitsRead++; 388 bitsRead++;
354 389
355 if (bitsRead == bEnd[wrBlock]+1) { 390 if (bitsRead == rbEnd[wrBlock]+1) {
356 if (outFile != NULL) { 391 if (outFile != NULL) {
357 bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 ); 392 bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 );
358 bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 ); 393 bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 );
@@ -360,18 +395,18 @@ Int32 main ( Int32 argc, Char** argv )
360 bsPutUInt32 ( bsWr, blockCRC ); 395 bsPutUInt32 ( bsWr, blockCRC );
361 bsClose ( bsWr ); 396 bsClose ( bsWr );
362 } 397 }
363 if (wrBlock >= currBlock) break; 398 if (wrBlock >= rbCtr) break;
364 wrBlock++; 399 wrBlock++;
365 } else 400 } else
366 if (bitsRead == bStart[wrBlock]) { 401 if (bitsRead == rbStart[wrBlock]) {
367 outFileName[0] = 0; 402 outFileName[0] = 0;
368 sprintf ( outFileName, "rec%4d", wrBlock ); 403 sprintf ( outFileName, "rec%4d", wrBlock+1 );
369 for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0'; 404 for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0';
370 strcat ( outFileName, inFileName ); 405 strcat ( outFileName, inFileName );
371 if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" ); 406 if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
372 407
373 fprintf ( stderr, " writing block %d to `%s' ...\n", 408 fprintf ( stderr, " writing block %d to `%s' ...\n",
374 wrBlock, outFileName ); 409 wrBlock+1, outFileName );
375 410
376 outFile = fopen ( outFileName, "wb" ); 411 outFile = fopen ( outFileName, "wb" );
377 if (outFile == NULL) { 412 if (outFile == NULL) {
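In 0.9.0c, bzip2recover no longer writes every candidate block. A candidate is kept only if it spans at least 130 bits. The survivors are renumbered consecutively through `rbStart`, `rbEnd` and `rbCtr`, stored from index 0 but reported and named as blocks 1 to `rbCtr`. The manual page diff above likewise drops the old caveat about a spurious, very small final block. The sketch below applies the same filter to arrays of candidate boundaries; `filter_blocks` and `MIN_BLOCK_BITS` are hypothetical names. Like the scanning loop, it skips candidate 0, the bits before the first block marker. It also rejects `bEnd < bStart`, which the original unsigned subtraction does not check.

```c
#include <stddef.h>

#define MIN_BLOCK_BITS 130  /* threshold used by bzip2recover 0.9.0c */

/* Hypothetical helper: copy candidates [bStart[i], bEnd[i]], i >= 1,
   that span at least MIN_BLOCK_BITS bits into rbStart/rbEnd.
   Returns the number of blocks kept (the program's rbCtr). */
int filter_blocks(const unsigned int *bStart, const unsigned int *bEnd,
                  int nCandidates,
                  unsigned int *rbStart, unsigned int *rbEnd)
{
    int i, rbCtr = 0;
    for (i = 1; i < nCandidates; i++) {      /* candidate 0 = stream header */
        if (bEnd[i] >= bStart[i] &&          /* extra guard vs. wraparound */
            bEnd[i] - bStart[i] >= MIN_BLOCK_BITS) {
            rbStart[rbCtr] = bStart[i];
            rbEnd[rbCtr] = bEnd[i];
            rbCtr++;
        }
    }
    return rbCtr;
}
```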
diff --git a/bzlib.c b/bzlib.c
new file mode 100644
index 0000000..362e8ff
--- /dev/null
+++ b/bzlib.c
@@ -0,0 +1,1512 @@
1
2/*-------------------------------------------------------------*/
3/*--- Library top-level functions. ---*/
4/*--- bzlib.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61/*--
62 CHANGES
63 ~~~~~~~
64 0.9.0 -- original version.
65
66 0.9.0a/b -- no changes in this file.
67
68 0.9.0c
69 * made zero-length BZ_FLUSH work correctly in bzCompress().
70 * fixed bzWrite/bzRead to ignore zero-length requests.
71 * fixed bzread to correctly handle read requests after EOF.
72 * wrong parameter order in call to bzDecompressInit in
73 bzBuffToBuffDecompress. Fixed.
74--*/
75
76#include "bzlib_private.h"
77
78
79/*---------------------------------------------------*/
80/*--- Compression stuff ---*/
81/*---------------------------------------------------*/
82
83
84/*---------------------------------------------------*/
85#ifndef BZ_NO_STDIO
86void bz__AssertH__fail ( int errcode )
87{
88 fprintf(stderr,
89 "\n\nbzip2/libbzip2, v0.9.0c: internal error number %d.\n"
90 "This is a bug in bzip2/libbzip2, v0.9.0c. Please report\n"
91 "it to me at: jseward@acm.org. If this happened when\n"
92 "you were using some program which uses libbzip2 as a\n"
93 "component, you should also report this bug to the author(s)\n"
94 "of that program. Please make an effort to report this bug;\n"
95 "timely and accurate bug reports eventually lead to higher\n"
96 "quality software. Thx. Julian Seward, 18 October 1998.\n\n",
97 errcode
98 );
99 exit(3);
100}
101#endif
102
103
104/*---------------------------------------------------*/
105static
106void* default_bzalloc ( void* opaque, Int32 items, Int32 size )
107{
108 void* v = malloc ( items * size );
109 return v;
110}
111
112static
113void default_bzfree ( void* opaque, void* addr )
114{
115 if (addr != NULL) free ( addr );
116}
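`default_bzalloc` and `default_bzfree` are only fallbacks. `bzCompressInit` installs them when `strm->bzalloc` or `strm->bzfree` is NULL, so a caller can supply its own pair. The first argument is an opaque pointer, presumably passed through from the stream unchanged. The sketch below is a counting allocator with the same argument shape. It is not wired to the real `bz_stream`, whose definition is in bzlib.h and is not shown here. `alloc_stats`, `counting_alloc` and `counting_free` are hypothetical names. Unlike `default_bzalloc`, it rejects negative sizes and any `items * size` product that would overflow an `int`.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical bookkeeping reached through the opaque pointer. */
typedef struct {
    long nAllocs;   /* successful allocations */
    long nFrees;    /* frees of non-NULL pointers */
} alloc_stats;

/* Same argument shape as default_bzalloc: (opaque, items, size). */
void *counting_alloc(void *opaque, int items, int size)
{
    alloc_stats *st = (alloc_stats *)opaque;
    void *v;
    if (items < 0 || size < 0) return NULL;
    if (size != 0 && items > INT_MAX / size) return NULL;  /* overflow guard */
    v = malloc((size_t)items * (size_t)size);
    if (v != NULL && st != NULL) st->nAllocs++;
    return v;
}

/* Same argument shape as default_bzfree: (opaque, addr); NULL is ignored. */
void counting_free(void *opaque, void *addr)
{
    alloc_stats *st = (alloc_stats *)opaque;
    if (addr == NULL) return;
    free(addr);
    if (st != NULL) st->nFrees++;
}
```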
117
118
119/*---------------------------------------------------*/
120static
121void prepare_new_block ( EState* s )
122{
123 Int32 i;
124 s->nblock = 0;
125 s->numZ = 0;
126 s->state_out_pos = 0;
127 BZ_INITIALISE_CRC ( s->blockCRC );
128 for (i = 0; i < 256; i++) s->inUse[i] = False;
129 s->blockNo++;
130}
131
132
133/*---------------------------------------------------*/
134static
135void init_RL ( EState* s )
136{
137 s->state_in_ch = 256;
138 s->state_in_len = 0;
139}
140
141
142static
143Bool isempty_RL ( EState* s )
144{
145 if (s->state_in_ch < 256 && s->state_in_len > 0)
146 return False; else
147 return True;
148}
149
150
/*---------------------------------------------------*/
int BZ_API(bzCompressInit)
   ( bz_stream* strm,
     int blockSize100k,
     int verbosity,
     int workFactor )
{
   Int32 n;
   EState* s;

   if (strm == NULL ||
       blockSize100k < 1 || blockSize100k > 9 ||
       workFactor < 0 || workFactor > 250)
      return BZ_PARAM_ERROR;

   if (workFactor == 0) workFactor = 30;
   if (strm->bzalloc == NULL) strm->bzalloc = default_bzalloc;
   if (strm->bzfree == NULL) strm->bzfree = default_bzfree;

   s = BZALLOC( sizeof(EState) );
   if (s == NULL) return BZ_MEM_ERROR;
   s->strm = strm;

   s->block = NULL;
   s->quadrant = NULL;
   s->zptr = NULL;
   s->ftab = NULL;

   n = 100000 * blockSize100k;
   s->block = BZALLOC( (n + BZ_NUM_OVERSHOOT_BYTES) * sizeof(UChar) );
   s->quadrant = BZALLOC( (n + BZ_NUM_OVERSHOOT_BYTES) * sizeof(Int16) );
   s->zptr = BZALLOC( n * sizeof(Int32) );
   s->ftab = BZALLOC( 65537 * sizeof(Int32) );

   if (s->block == NULL || s->quadrant == NULL ||
       s->zptr == NULL || s->ftab == NULL) {
      if (s->block != NULL) BZFREE(s->block);
      if (s->quadrant != NULL) BZFREE(s->quadrant);
      if (s->zptr != NULL) BZFREE(s->zptr);
      if (s->ftab != NULL) BZFREE(s->ftab);
      if (s != NULL) BZFREE(s);
      return BZ_MEM_ERROR;
   }

   s->szptr = (UInt16*)(s->zptr);

   s->blockNo = 0;
   s->state = BZ_S_INPUT;
   s->mode = BZ_M_RUNNING;
   s->combinedCRC = 0;
   s->blockSize100k = blockSize100k;
   s->nblockMAX = 100000 * blockSize100k - 19;
   s->verbosity = verbosity;
   s->workFactor = workFactor;
   s->nBlocksRandomised = 0;
   strm->state = s;
   strm->total_in = 0;
   strm->total_out = 0;
   init_RL ( s );
   prepare_new_block ( s );
   return BZ_OK;
}


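As a rough guide to memory use, the four BZALLOC calls in bzCompressInit request n bytes for block, n Int16 values for quadrant (both padded by BZ_NUM_OVERSHOOT_BYTES), n Int32 values for zptr and a fixed 65537-entry Int32 ftab, where n = 100000 * blockSize100k. The sketch below only redoes that arithmetic. It assumes 1-, 2- and 4-byte UChar, Int16 and Int32, ignores sizeof(EState) and allocator overhead, and takes the overshoot count as a parameter because its value is defined in bzlib_private.h, which is not shown here. compress_arrays_bytes is a hypothetical name.

```c
/* Hypothetical helper: bytes requested by the four BZALLOC calls in
   bzCompressInit, assuming sizeof(UChar) == 1, sizeof(Int16) == 2 and
   sizeof(Int32) == 4.  The EState struct itself and any allocator
   overhead are not counted; `overshoot` stands in for
   BZ_NUM_OVERSHOOT_BYTES, whose value is not shown here. */
unsigned long compress_arrays_bytes ( int blockSize100k,
                                      unsigned long overshoot )
{
   unsigned long n        = 100000UL * (unsigned long)blockSize100k;
   unsigned long block    = (n + overshoot) * 1UL;  /* UChar block[]    */
   unsigned long quadrant = (n + overshoot) * 2UL;  /* Int16 quadrant[] */
   unsigned long zptr     = n * 4UL;                /* Int32 zptr[]     */
   unsigned long ftab     = 65537UL * 4UL;          /* Int32 ftab[]     */
   return block + quadrant + zptr + ftab;
}
```

With the overshoot taken as 0 this comes to 7 * n + 262148 bytes, for example 962148 bytes for blockSize100k = 1 and 6562148 bytes for blockSize100k = 9.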
/*---------------------------------------------------*/
static
void add_pair_to_block ( EState* s )
{
   Int32 i;
   UChar ch = (UChar)(s->state_in_ch);
   for (i = 0; i < s->state_in_len; i++) {
      BZ_UPDATE_CRC( s->blockCRC, ch );
   }
   s->inUse[s->state_in_ch] = True;
   switch (s->state_in_len) {
      case 1:
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         break;
      case 2:
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         break;
      case 3:
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         break;
      default:
         s->inUse[s->state_in_len-4] = True;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)ch; s->nblock++;
         s->block[s->nblock] = (UChar)(s->state_in_len-4);
         s->nblock++;
         break;
   }
}


/*---------------------------------------------------*/
static
void flush_RL ( EState* s )
{
   if (s->state_in_ch < 256) add_pair_to_block ( s );
   init_RL ( s );
}


/*---------------------------------------------------*/
#define ADD_CHAR_TO_BLOCK(zs,zchh0)              \
{                                                \
   UInt32 zchh = (UInt32)(zchh0);                \
   /*-- fast track the common case --*/          \
   if (zchh != zs->state_in_ch &&                \
       zs->state_in_len == 1) {                  \
      UChar ch = (UChar)(zs->state_in_ch);       \
      BZ_UPDATE_CRC( zs->blockCRC, ch );         \
      zs->inUse[zs->state_in_ch] = True;         \
      zs->block[zs->nblock] = (UChar)ch;         \
      zs->nblock++;                              \
      zs->state_in_ch = zchh;                    \
   }                                             \
   else                                          \
   /*-- general, uncommon cases --*/             \
   if (zchh != zs->state_in_ch ||                \
       zs->state_in_len == 255) {                \
      if (zs->state_in_ch < 256)                 \
         add_pair_to_block ( zs );               \
      zs->state_in_ch = zchh;                    \
      zs->state_in_len = 1;                      \
   } else {                                      \
      zs->state_in_len++;                        \
   }                                             \
}


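ADD_CHAR_TO_BLOCK and add_pair_to_block together implement bzip2's first run-length stage. Runs of one to three bytes are stored literally, while a run of 4 to 255 bytes is stored as four copies followed by a count byte holding length - 4. Runs longer than 255 are split, because the macro starts a new run once state_in_len reaches 255. Below is a minimal stand-alone sketch of the same mapping over a plain array. rle1_encode is a hypothetical name, and the sketch deliberately leaves out the CRC update, the inUse bookkeeping and the nblockMAX limit.

```c
#include <stddef.h>

/* Hypothetical stand-alone version of the run-length step performed by
   ADD_CHAR_TO_BLOCK / add_pair_to_block (CRC and inUse[] omitted).
   A run of 1..3 bytes is copied literally; a run of 4..255 bytes becomes
   four copies followed by one count byte (length - 4).  Runs are cut at
   255, as the macro does.  `out` needs room for in_len + in_len / 4
   bytes.  Returns the number of bytes written. */
size_t rle1_encode ( const unsigned char* in, size_t in_len,
                     unsigned char* out )
{
   size_t i = 0, o = 0;
   while (i < in_len) {
      unsigned char ch = in[i];
      size_t run = 1, k;
      /* measure the run, capped at 255 */
      while (i + run < in_len && in[i + run] == ch && run < 255) run++;
      if (run < 4) {
         for (k = 0; k < run; k++) out[o++] = ch;
      } else {
         for (k = 0; k < 4; k++) out[o++] = ch;
         out[o++] = (unsigned char)(run - 4);
      }
      i += run;
   }
   return o;
}
```

For example, the seven bytes "AAAAAAB" encode to A A A A 0x02 B. Because the count byte is itself stored in the block, add_pair_to_block also marks inUse[state_in_len-4].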
/*---------------------------------------------------*/
static
Bool copy_input_until_stop ( EState* s )
{
   Bool progress_in = False;

   if (s->mode == BZ_M_RUNNING) {

      /*-- fast track the common case --*/
      while (True) {
         /*-- block full? --*/
         if (s->nblock >= s->nblockMAX) break;
         /*-- no input? --*/
         if (s->strm->avail_in == 0) break;
         progress_in = True;
         ADD_CHAR_TO_BLOCK ( s, (UInt32)(*((UChar*)(s->strm->next_in))) );
         s->strm->next_in++;
         s->strm->avail_in--;
         s->strm->total_in++;
      }

   } else {

      /*-- general, uncommon case --*/
      while (True) {
         /*-- block full? --*/
         if (s->nblock >= s->nblockMAX) break;
         /*-- no input? --*/
         if (s->strm->avail_in == 0) break;
         /*-- flush/finish end? --*/
         if (s->avail_in_expect == 0) break;
         progress_in = True;
         ADD_CHAR_TO_BLOCK ( s, (UInt32)(*((UChar*)(s->strm->next_in))) );
         s->strm->next_in++;
         s->strm->avail_in--;
         s->strm->total_in++;
         s->avail_in_expect--;
      }
   }
   return progress_in;
}


/*---------------------------------------------------*/
static
Bool copy_output_until_stop ( EState* s )
{
   Bool progress_out = False;

   while (True) {

      /*-- no output space? --*/
      if (s->strm->avail_out == 0) break;

      /*-- block done? --*/
      if (s->state_out_pos >= s->numZ) break;

      progress_out = True;
      *(s->strm->next_out) = ((UChar*)(s->quadrant))[s->state_out_pos];
      s->state_out_pos++;
      s->strm->avail_out--;
      s->strm->next_out++;
      s->strm->total_out++;

   }

   return progress_out;
}


/*---------------------------------------------------*/
static
Bool handle_compress ( bz_stream* strm )
{
   Bool progress_in = False;
   Bool progress_out = False;
   EState* s = strm->state;

   while (True) {

      if (s->state == BZ_S_OUTPUT) {
         progress_out |= copy_output_until_stop ( s );
         if (s->state_out_pos < s->numZ) break;
         if (s->mode == BZ_M_FINISHING &&
             s->avail_in_expect == 0 &&
             isempty_RL(s)) break;
         prepare_new_block ( s );
         s->state = BZ_S_INPUT;
         if (s->mode == BZ_M_FLUSHING &&
             s->avail_in_expect == 0 &&
             isempty_RL(s)) break;
      }

      if (s->state == BZ_S_INPUT) {
         progress_in |= copy_input_until_stop ( s );
         if (s->mode != BZ_M_RUNNING && s->avail_in_expect == 0) {
            flush_RL ( s );
            compressBlock ( s, s->mode == BZ_M_FINISHING );
            s->state = BZ_S_OUTPUT;
         }
         else
         if (s->nblock >= s->nblockMAX) {
            compressBlock ( s, False );
            s->state = BZ_S_OUTPUT;
         }
         else
         if (s->strm->avail_in == 0) {
            break;
         }
      }

   }

   return progress_in || progress_out;
}


/*---------------------------------------------------*/
int BZ_API(bzCompress) ( bz_stream *strm, int action )
{
   Bool progress;
   EState* s;
   if (strm == NULL) return BZ_PARAM_ERROR;
   s = strm->state;
   if (s == NULL) return BZ_PARAM_ERROR;
   if (s->strm != strm) return BZ_PARAM_ERROR;

   preswitch:
   switch (s->mode) {

      case BZ_M_IDLE:
         return BZ_SEQUENCE_ERROR;

      case BZ_M_RUNNING:
         if (action == BZ_RUN) {
            progress = handle_compress ( strm );
            return progress ? BZ_RUN_OK : BZ_PARAM_ERROR;
         }
         else
         if (action == BZ_FLUSH) {
            s->avail_in_expect = strm->avail_in;
            s->mode = BZ_M_FLUSHING;
            goto preswitch;
         }
         else
         if (action == BZ_FINISH) {
            s->avail_in_expect = strm->avail_in;
            s->mode = BZ_M_FINISHING;
            goto preswitch;
         }
         else
            return BZ_PARAM_ERROR;

      case BZ_M_FLUSHING:
         if (action != BZ_FLUSH) return BZ_SEQUENCE_ERROR;
         if (s->avail_in_expect != s->strm->avail_in) return BZ_SEQUENCE_ERROR;
         progress = handle_compress ( strm );
         if (s->avail_in_expect > 0 || !isempty_RL(s) ||
             s->state_out_pos < s->numZ) return BZ_FLUSH_OK;
         s->mode = BZ_M_RUNNING;
         return BZ_RUN_OK;

      case BZ_M_FINISHING:
         if (action != BZ_FINISH) return BZ_SEQUENCE_ERROR;
         if (s->avail_in_expect != s->strm->avail_in) return BZ_SEQUENCE_ERROR;
         progress = handle_compress ( strm );
         if (!progress) return BZ_SEQUENCE_ERROR;
         if (s->avail_in_expect > 0 || !isempty_RL(s) ||
             s->state_out_pos < s->numZ) return BZ_FINISH_OK;
         s->mode = BZ_M_IDLE;
         return BZ_STREAM_END;
   }
   return BZ_OK; /*--not reached--*/
}


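The mode handling in bzCompress amounts to a small state machine. BZ_RUN keeps the stream in BZ_M_RUNNING; BZ_FLUSH and BZ_FINISH switch it to BZ_M_FLUSHING or BZ_M_FINISHING until the expected input and the pending output have drained. After that, a flush returns to running, while a finish goes idle, where every further call is a sequence error. The model below is a hypothetical sketch of just those transitions. The `drained` flag stands for the avail_in_expect / isempty_RL / state_out_pos test, and the avail_in consistency check and the "no progress" results are not modelled. All names are invented for illustration.

```c
/* Hypothetical model of the mode checks in bzCompress.  `drained` stands
   for the test made after handle_compress: no expected input left, the
   run-length state empty and the block's output fully copied out.
   Not modelled: the avail_in == avail_in_expect check and the
   "no progress" results of BZ_RUN and BZ_FINISH. */
typedef enum { M_IDLE, M_RUNNING, M_FLUSHING, M_FINISHING } Mode;
typedef enum { A_RUN, A_FLUSH, A_FINISH } Action;
typedef enum { R_RUN_OK, R_FLUSH_OK, R_FINISH_OK, R_STREAM_END,
               R_PARAM_ERROR, R_SEQUENCE_ERROR } Result;

Result step ( Mode* m, Action a, int drained )
{
   switch (*m) {
      case M_IDLE:
         return R_SEQUENCE_ERROR;
      case M_RUNNING:
         if (a == A_RUN) return R_RUN_OK;
         /* like `goto preswitch`: change mode, then handle the call */
         if (a == A_FLUSH)  { *m = M_FLUSHING;  return step ( m, a, drained ); }
         if (a == A_FINISH) { *m = M_FINISHING; return step ( m, a, drained ); }
         return R_PARAM_ERROR;
      case M_FLUSHING:
         if (a != A_FLUSH) return R_SEQUENCE_ERROR;
         if (!drained) return R_FLUSH_OK;
         *m = M_RUNNING;
         return R_RUN_OK;
      case M_FINISHING:
         if (a != A_FINISH) return R_SEQUENCE_ERROR;
         if (!drained) return R_FINISH_OK;
         *m = M_IDLE;
         return R_STREAM_END;
   }
   return R_PARAM_ERROR; /* not reached for valid modes */
}
```

The recursion for BZ_FLUSH and BZ_FINISH mirrors the original's `goto preswitch`: the mode changes first, and the same call is then handled under the new mode.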
/*---------------------------------------------------*/
int BZ_API(bzCompressEnd) ( bz_stream *strm )
{
   EState* s;
   if (strm == NULL) return BZ_PARAM_ERROR;
   s = strm->state;
   if (s == NULL) return BZ_PARAM_ERROR;
   if (s->strm != strm) return BZ_PARAM_ERROR;

   if (s->block != NULL) BZFREE(s->block);
   if (s->quadrant != NULL) BZFREE(s->quadrant);
   if (s->zptr != NULL) BZFREE(s->zptr);
   if (s->ftab != NULL) BZFREE(s->ftab);
   BZFREE(strm->state);

   strm->state = NULL;

   return BZ_OK;
}


/*---------------------------------------------------*/
/*--- Decompression stuff ---*/
/*---------------------------------------------------*/

/*---------------------------------------------------*/
int BZ_API(bzDecompressInit)
   ( bz_stream* strm,
     int verbosity,
     int small )
{
   DState* s;

   if (strm == NULL) return BZ_PARAM_ERROR;
   if (small != 0 && small != 1) return BZ_PARAM_ERROR;
   if (verbosity < 0 || verbosity > 4) return BZ_PARAM_ERROR;

   if (strm->bzalloc == NULL) strm->bzalloc = default_bzalloc;
   if (strm->bzfree == NULL) strm->bzfree = default_bzfree;

   s = BZALLOC( sizeof(DState) );
   if (s == NULL) return BZ_MEM_ERROR;
   s->strm = strm;
   strm->state = s;
   s->state = BZ_X_MAGIC_1;
   s->bsLive = 0;
   s->bsBuff = 0;
   s->calculatedCombinedCRC = 0;
   strm->total_in = 0;
   strm->total_out = 0;
   s->smallDecompress = (Bool)small;
   s->ll4 = NULL;
   s->ll16 = NULL;
   s->tt = NULL;
   s->currBlockNo = 0;
   s->verbosity = verbosity;

   return BZ_OK;
}


/*---------------------------------------------------*/
static
void unRLE_obuf_to_output_FAST ( DState* s )
{
   UChar k1;

   if (s->blockRandomised) {

      while (True) {
         /* try to finish existing run */
         while (True) {
            if (s->strm->avail_out == 0) return;
            if (s->state_out_len == 0) break;
            *( (UChar*)(s->strm->next_out) ) = s->state_out_ch;
            BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch );
            s->state_out_len--;
            s->strm->next_out++;
            s->strm->avail_out--;
            s->strm->total_out++;
         }

         /* can a new run be started? */
         if (s->nblock_used == s->save_nblock+1) return;

         s->state_out_len = 1;
         s->state_out_ch = s->k0;
         BZ_GET_FAST(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 2;
         BZ_GET_FAST(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 3;
         BZ_GET_FAST(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         BZ_GET_FAST(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         s->state_out_len = ((Int32)k1) + 4;
         BZ_GET_FAST(s->k0); BZ_RAND_UPD_MASK;
         s->k0 ^= BZ_RAND_MASK; s->nblock_used++;
      }

   } else {

      /* restore */
      UInt32 c_calculatedBlockCRC = s->calculatedBlockCRC;
      UChar c_state_out_ch = s->state_out_ch;
      Int32 c_state_out_len = s->state_out_len;
      Int32 c_nblock_used = s->nblock_used;
      Int32 c_k0 = s->k0;
      UInt32* c_tt = s->tt;
      UInt32 c_tPos = s->tPos;
      char* cs_next_out = s->strm->next_out;
      unsigned int cs_avail_out = s->strm->avail_out;
      /* end restore */

      UInt32 avail_out_INIT = cs_avail_out;
      Int32 s_save_nblockPP = s->save_nblock+1;

      while (True) {

         /* try to finish existing run */
         if (c_state_out_len > 0) {
            while (True) {
               if (cs_avail_out == 0) goto return_notr;
               if (c_state_out_len == 1) break;
               *( (UChar*)(cs_next_out) ) = c_state_out_ch;
               BZ_UPDATE_CRC ( c_calculatedBlockCRC, c_state_out_ch );
               c_state_out_len--;
               cs_next_out++;
               cs_avail_out--;
            }
            s_state_out_len_eq_one:
            {
               if (cs_avail_out == 0) {
                  c_state_out_len = 1; goto return_notr;
               };
               *( (UChar*)(cs_next_out) ) = c_state_out_ch;
               BZ_UPDATE_CRC ( c_calculatedBlockCRC, c_state_out_ch );
               cs_next_out++;
               cs_avail_out--;
            }
         }
         /* can a new run be started? */
         if (c_nblock_used == s_save_nblockPP) {
            c_state_out_len = 0; goto return_notr;
         };
         c_state_out_ch = c_k0;
         BZ_GET_FAST_C(k1); c_nblock_used++;
         if (k1 != c_k0) {
            c_k0 = k1; goto s_state_out_len_eq_one;
         };
         if (c_nblock_used == s_save_nblockPP)
            goto s_state_out_len_eq_one;

         c_state_out_len = 2;
         BZ_GET_FAST_C(k1); c_nblock_used++;
         if (c_nblock_used == s_save_nblockPP) continue;
         if (k1 != c_k0) { c_k0 = k1; continue; };

         c_state_out_len = 3;
         BZ_GET_FAST_C(k1); c_nblock_used++;
         if (c_nblock_used == s_save_nblockPP) continue;
         if (k1 != c_k0) { c_k0 = k1; continue; };

         BZ_GET_FAST_C(k1); c_nblock_used++;
         c_state_out_len = ((Int32)k1) + 4;
         BZ_GET_FAST_C(c_k0); c_nblock_used++;
      }

      return_notr:
      s->strm->total_out += (avail_out_INIT - cs_avail_out);

      /* save */
      s->calculatedBlockCRC = c_calculatedBlockCRC;
      s->state_out_ch = c_state_out_ch;
      s->state_out_len = c_state_out_len;
      s->nblock_used = c_nblock_used;
      s->k0 = c_k0;
      s->tt = c_tt;
      s->tPos = c_tPos;
      s->strm->next_out = cs_next_out;
      s->strm->avail_out = cs_avail_out;
      /* end save */
   }
}



/*---------------------------------------------------*/
__inline__ Int32 indexIntoF ( Int32 indx, Int32 *cftab )
{
   Int32 nb, na, mid;
   nb = 0;
   na = 256;
   do {
      mid = (nb + na) >> 1;
      if (indx >= cftab[mid]) nb = mid; else na = mid;
   }
   while (na - nb != 1);
   return nb;
}


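indexIntoF bisects a 257-entry cumulative table cftab, where cftab[c] is the number of bytes in the block smaller than c. It returns the byte c with cftab[c] <= indx < cftab[c+1]. The loop maintains that invariant for the pair (nb, na) and stops when the two are adjacent. Below is a stand-alone copy together with a hypothetical build_cftab helper that fills such a table from a raw block. The helper is for illustration only and is not taken from bzlib.

```c
/* Stand-alone copy of the bisection in indexIntoF.
   Invariant: cftab[nb] <= indx < cftab[na].  It holds on entry because
   cftab[0] == 0 and cftab[256] == block length, given 0 <= indx < length. */
int index_into_f ( int indx, const int* cftab )
{
   int nb = 0, na = 256, mid;
   do {
      mid = (nb + na) >> 1;
      if (indx >= cftab[mid]) nb = mid; else na = mid;
   } while (na - nb != 1);
   return nb;
}

/* Hypothetical helper: cftab must hold 257 entries; afterwards
   cftab[c] == number of bytes in block[0..len-1] that are < c. */
void build_cftab ( const unsigned char* block, int len, int* cftab )
{
   int i;
   for (i = 0; i <= 256; i++) cftab[i] = 0;
   for (i = 0; i < len; i++) cftab[block[i] + 1]++;   /* counts, shifted */
   for (i = 1; i <= 256; i++) cftab[i] += cftab[i - 1]; /* prefix sums */
}
```

For the block "babab", cftab['a'] = 0 and cftab['b'] = 2, so positions 0 and 1 map to 'a' and positions 2 to 4 map to 'b'. Because the range starts at 256 entries and halves on every probe, the loop always runs exactly eight times.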
/*---------------------------------------------------*/
static
void unRLE_obuf_to_output_SMALL ( DState* s )
{
   UChar k1;

   if (s->blockRandomised) {

      while (True) {
         /* try to finish existing run */
         while (True) {
            if (s->strm->avail_out == 0) return;
            if (s->state_out_len == 0) break;
            *( (UChar*)(s->strm->next_out) ) = s->state_out_ch;
            BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch );
            s->state_out_len--;
            s->strm->next_out++;
            s->strm->avail_out--;
            s->strm->total_out++;
         }

         /* can a new run be started? */
         if (s->nblock_used == s->save_nblock+1) return;

         s->state_out_len = 1;
         s->state_out_ch = s->k0;
         BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 2;
         BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 3;
         BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK;
         k1 ^= BZ_RAND_MASK; s->nblock_used++;
         s->state_out_len = ((Int32)k1) + 4;
         BZ_GET_SMALL(s->k0); BZ_RAND_UPD_MASK;
         s->k0 ^= BZ_RAND_MASK; s->nblock_used++;
      }

   } else {

      while (True) {
         /* try to finish existing run */
         while (True) {
            if (s->strm->avail_out == 0) return;
            if (s->state_out_len == 0) break;
            *( (UChar*)(s->strm->next_out) ) = s->state_out_ch;
            BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch );
            s->state_out_len--;
            s->strm->next_out++;
            s->strm->avail_out--;
            s->strm->total_out++;
         }

         /* can a new run be started? */
         if (s->nblock_used == s->save_nblock+1) return;

         s->state_out_len = 1;
         s->state_out_ch = s->k0;
         BZ_GET_SMALL(k1); s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 2;
         BZ_GET_SMALL(k1); s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         s->state_out_len = 3;
         BZ_GET_SMALL(k1); s->nblock_used++;
         if (s->nblock_used == s->save_nblock+1) continue;
         if (k1 != s->k0) { s->k0 = k1; continue; };

         BZ_GET_SMALL(k1); s->nblock_used++;
         s->state_out_len = ((Int32)k1) + 4;
         BZ_GET_SMALL(s->k0); s->nblock_used++;
      }

   }
}


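Both unRLE_obuf_to_output_* routines invert the run-length stage applied by add_pair_to_block. Once four identical bytes have been produced, the next symbol is a repeat count, and the byte after that starts a fresh run. The sketch below shows only that rule, applied to a plain array. rle1_decode is a hypothetical name. The sketch ignores block randomisation and the CRC, and it does not reproduce how the real code suspends when avail_out reaches 0 and resumes on the next call.

```c
#include <stddef.h>

/* Hypothetical inverse of the first run-length stage.  Assumes `in` is
   well formed (a count byte always follows four equal bytes) and that
   `out` is large enough.  Returns the number of bytes written. */
size_t rle1_decode ( const unsigned char* in, size_t in_len,
                     unsigned char* out )
{
   size_t i = 0, o = 0;
   int run = 0;     /* how many equal bytes in a row have been emitted */
   int prev = -1;   /* -1: no current run, like state_in_ch == 256 */
   while (i < in_len) {
      unsigned char ch = in[i++];
      out[o++] = ch;
      run = (ch == prev) ? run + 1 : 1;
      prev = ch;
      if (run == 4 && i < in_len) {
         /* fourth copy seen: the next byte is an extra-repeat count */
         int k, extra = in[i++];
         for (k = 0; k < extra; k++) out[o++] = ch;
         run = 0;
         prev = -1;  /* the following byte starts a fresh run */
      }
   }
   return o;
}
```

Resetting `prev` after the count byte matters. The encoder splits long runs at 255, so the next run may begin with the same byte value, and that run must not be merged with the one just finished.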
/*---------------------------------------------------*/
int BZ_API(bzDecompress) ( bz_stream *strm )
{
   DState* s;
   if (strm == NULL) return BZ_PARAM_ERROR;
   s = strm->state;
   if (s == NULL) return BZ_PARAM_ERROR;
   if (s->strm != strm) return BZ_PARAM_ERROR;

   while (True) {
      if (s->state == BZ_X_IDLE) return BZ_SEQUENCE_ERROR;
      if (s->state == BZ_X_OUTPUT) {
         if (s->smallDecompress)
            unRLE_obuf_to_output_SMALL ( s ); else
            unRLE_obuf_to_output_FAST ( s );
         if (s->nblock_used == s->save_nblock+1 && s->state_out_len == 0) {
            BZ_FINALISE_CRC ( s->calculatedBlockCRC );
            if (s->verbosity >= 3)
               VPrintf2 ( " {0x%x, 0x%x}", s->storedBlockCRC,
                          s->calculatedBlockCRC );
            if (s->verbosity >= 2) VPrintf0 ( "]" );
            if (s->calculatedBlockCRC != s->storedBlockCRC)
               return BZ_DATA_ERROR;
            s->calculatedCombinedCRC
               = (s->calculatedCombinedCRC << 1) |
                    (s->calculatedCombinedCRC >> 31);
            s->calculatedCombinedCRC ^= s->calculatedBlockCRC;
            s->state = BZ_X_BLKHDR_1;
         } else {
            return BZ_OK;
         }
      }
      if (s->state >= BZ_X_MAGIC_1) {
         Int32 r = decompress ( s );
         if (r == BZ_STREAM_END) {
            if (s->verbosity >= 3)
               VPrintf2 ( "\n combined CRCs: stored = 0x%x, computed = 0x%x",
                          s->storedCombinedCRC, s->calculatedCombinedCRC );
            if (s->calculatedCombinedCRC != s->storedCombinedCRC)
               return BZ_DATA_ERROR;
            return r;
         }
         if (s->state != BZ_X_OUTPUT) return r;
      }
   }

   AssertH ( 0, 6001 );
   /*notreached*/
}


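After each block's CRC checks out, bzDecompress folds it into calculatedCombinedCRC. It rotates the running value left by one bit and then XORs in the block CRC. At BZ_STREAM_END the result must match storedCombinedCRC. Below is a hypothetical helper that makes that update explicit, assuming 32-bit CRC values.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the update in bzDecompress: rotate the
   running combined CRC left by one bit, then XOR in the CRC of the
   block that has just been verified. */
uint32_t combine_crc ( uint32_t combined, uint32_t blockCRC )
{
   combined = (combined << 1) | (combined >> 31);   /* rotate left by 1 */
   return combined ^ blockCRC;
}
```

Because of the rotation, the combined value generally depends on the order of the blocks as well as their contents. Combining a then b gives rotl(a,1) ^ b, whereas b then a gives rotl(b,1) ^ a.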
/*---------------------------------------------------*/
int BZ_API(bzDecompressEnd) ( bz_stream *strm )
{
   DState* s;
   if (strm == NULL) return BZ_PARAM_ERROR;
   s = strm->state;
   if (s == NULL) return BZ_PARAM_ERROR;
   if (s->strm != strm) return BZ_PARAM_ERROR;

   if (s->tt != NULL) BZFREE(s->tt);
   if (s->ll16 != NULL) BZFREE(s->ll16);
   if (s->ll4 != NULL) BZFREE(s->ll4);

   BZFREE(strm->state);
   strm->state = NULL;

   return BZ_OK;
}


#ifndef BZ_NO_STDIO
/*---------------------------------------------------*/
/*--- File I/O stuff ---*/
/*---------------------------------------------------*/

#define BZ_SETERR(eee)                   \
{                                        \
   if (bzerror != NULL) *bzerror = eee;  \
   if (bzf != NULL) bzf->lastErr = eee;  \
}

typedef
   struct {
      FILE* handle;
      Char buf[BZ_MAX_UNUSED];
      Int32 bufN;
      Bool writing;
      bz_stream strm;
      Int32 lastErr;
      Bool initialisedOk;
   }
   bzFile;


/*---------------------------------------------*/
static Bool myfeof ( FILE* f )
{
   Int32 c = fgetc ( f );
   if (c == EOF) return True;
   ungetc ( c, f );
   return False;
}


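myfeof answers "is there another byte?" by reading one byte with fgetc and pushing it back with ungetc. This avoids any seeking, so it also works on pipes. fgetc returns EOF on a read error as well as at end of file, so this test cannot tell the two apart; bzRead makes up for that by calling ferror separately. Below is a self-contained sketch of the same peek. peek_eof is a hypothetical name.

```c
#include <stdio.h>

/* Same technique as myfeof: read one byte and push it back, so the
   stream position is unchanged when the answer is "not at EOF".
   Note: a read error also yields EOF from fgetc; check ferror() too. */
int peek_eof ( FILE* f )
{
   int c = fgetc ( f );
   if (c == EOF) return 1;
   ungetc ( c, f );      /* one byte of push-back is always allowed */
   return 0;
}
```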
/*---------------------------------------------------*/
BZFILE* BZ_API(bzWriteOpen)
   ( int* bzerror,
     FILE* f,
     int blockSize100k,
     int verbosity,
     int workFactor )
{
   Int32 ret;
   bzFile* bzf = NULL;

   BZ_SETERR(BZ_OK);

   if (f == NULL ||
       (blockSize100k < 1 || blockSize100k > 9) ||
       (workFactor < 0 || workFactor > 250) ||
       (verbosity < 0 || verbosity > 4))
      { BZ_SETERR(BZ_PARAM_ERROR); return NULL; };

   if (ferror(f))
      { BZ_SETERR(BZ_IO_ERROR); return NULL; };

   bzf = malloc ( sizeof(bzFile) );
   if (bzf == NULL)
      { BZ_SETERR(BZ_MEM_ERROR); return NULL; };

   BZ_SETERR(BZ_OK);
   bzf->initialisedOk = False;
   bzf->bufN = 0;
   bzf->handle = f;
   bzf->writing = True;
   bzf->strm.bzalloc = NULL;
   bzf->strm.bzfree = NULL;
   bzf->strm.opaque = NULL;

   if (workFactor == 0) workFactor = 30;
   ret = bzCompressInit ( &(bzf->strm), blockSize100k,
                          verbosity, workFactor );
   if (ret != BZ_OK)
      { BZ_SETERR(ret); free(bzf); return NULL; };

   bzf->strm.avail_in = 0;
   bzf->initialisedOk = True;
   return bzf;
}



/*---------------------------------------------------*/
void BZ_API(bzWrite)
   ( int* bzerror,
     BZFILE* b,
     void* buf,
     int len )
{
   Int32 n, n2, ret;
   bzFile* bzf = (bzFile*)b;

   BZ_SETERR(BZ_OK);
   if (bzf == NULL || buf == NULL || len < 0)
      { BZ_SETERR(BZ_PARAM_ERROR); return; };
   if (!(bzf->writing))
      { BZ_SETERR(BZ_SEQUENCE_ERROR); return; };
   if (ferror(bzf->handle))
      { BZ_SETERR(BZ_IO_ERROR); return; };

   if (len == 0)
      { BZ_SETERR(BZ_OK); return; };

   bzf->strm.avail_in = len;
   bzf->strm.next_in = buf;

   while (True) {
      bzf->strm.avail_out = BZ_MAX_UNUSED;
      bzf->strm.next_out = bzf->buf;
      ret = bzCompress ( &(bzf->strm), BZ_RUN );
      if (ret != BZ_RUN_OK)
         { BZ_SETERR(ret); return; };

      if (bzf->strm.avail_out < BZ_MAX_UNUSED) {
         n = BZ_MAX_UNUSED - bzf->strm.avail_out;
         n2 = fwrite ( (void*)(bzf->buf), sizeof(UChar),
                       n, bzf->handle );
         if (n != n2 || ferror(bzf->handle))
            { BZ_SETERR(BZ_IO_ERROR); return; };
      }

      if (bzf->strm.avail_in == 0)
         { BZ_SETERR(BZ_OK); return; };
   }
}


/*---------------------------------------------------*/
void BZ_API(bzWriteClose)
   ( int* bzerror,
     BZFILE* b,
     int abandon,
     unsigned int* nbytes_in,
     unsigned int* nbytes_out )
{
   Int32 n, n2, ret;
   bzFile* bzf = (bzFile*)b;

   if (bzf == NULL)
      { BZ_SETERR(BZ_OK); return; };
   if (!(bzf->writing))
      { BZ_SETERR(BZ_SEQUENCE_ERROR); return; };
   if (ferror(bzf->handle))
      { BZ_SETERR(BZ_IO_ERROR); return; };

   if (nbytes_in != NULL) *nbytes_in = 0;
   if (nbytes_out != NULL) *nbytes_out = 0;

   if ((!abandon) && bzf->lastErr == BZ_OK) {
      while (True) {
         bzf->strm.avail_out = BZ_MAX_UNUSED;
         bzf->strm.next_out = bzf->buf;
         ret = bzCompress ( &(bzf->strm), BZ_FINISH );
         if (ret != BZ_FINISH_OK && ret != BZ_STREAM_END)
            { BZ_SETERR(ret); return; };

         if (bzf->strm.avail_out < BZ_MAX_UNUSED) {
            n = BZ_MAX_UNUSED - bzf->strm.avail_out;
            n2 = fwrite ( (void*)(bzf->buf), sizeof(UChar),
                          n, bzf->handle );
            if (n != n2 || ferror(bzf->handle))
               { BZ_SETERR(BZ_IO_ERROR); return; };
         }

         if (ret == BZ_STREAM_END) break;
      }
   }

   if ( !abandon && !ferror ( bzf->handle ) ) {
      fflush ( bzf->handle );
      if (ferror(bzf->handle))
         { BZ_SETERR(BZ_IO_ERROR); return; };
   }

   if (nbytes_in != NULL) *nbytes_in = bzf->strm.total_in;
   if (nbytes_out != NULL) *nbytes_out = bzf->strm.total_out;

   BZ_SETERR(BZ_OK);
   bzCompressEnd ( &(bzf->strm) );
   free ( bzf );
}


/*---------------------------------------------------*/
BZFILE* BZ_API(bzReadOpen)
   ( int* bzerror,
     FILE* f,
     int verbosity,
     int small,
     void* unused,
     int nUnused )
{
   bzFile* bzf = NULL;
   int ret;

   BZ_SETERR(BZ_OK);

   if (f == NULL ||
       (small != 0 && small != 1) ||
       (verbosity < 0 || verbosity > 4) ||
       (unused == NULL && nUnused != 0) ||
       (unused != NULL && (nUnused < 0 || nUnused > BZ_MAX_UNUSED)))
      { BZ_SETERR(BZ_PARAM_ERROR); return NULL; };

   if (ferror(f))
      { BZ_SETERR(BZ_IO_ERROR); return NULL; };

   bzf = malloc ( sizeof(bzFile) );
   if (bzf == NULL)
      { BZ_SETERR(BZ_MEM_ERROR); return NULL; };

   BZ_SETERR(BZ_OK);

   bzf->initialisedOk = False;
   bzf->handle = f;
   bzf->bufN = 0;
   bzf->writing = False;
   bzf->strm.bzalloc = NULL;
   bzf->strm.bzfree = NULL;
   bzf->strm.opaque = NULL;

   while (nUnused > 0) {
      bzf->buf[bzf->bufN] = *((UChar*)(unused)); bzf->bufN++;
      unused = ((void*)( 1 + ((UChar*)(unused)) ));
      nUnused--;
   }

   ret = bzDecompressInit ( &(bzf->strm), verbosity, small );
   if (ret != BZ_OK)
      { BZ_SETERR(ret); free(bzf); return NULL; };

   bzf->strm.avail_in = bzf->bufN;
   bzf->strm.next_in = bzf->buf;

   bzf->initialisedOk = True;
   return bzf;
}


/*---------------------------------------------------*/
void BZ_API(bzReadClose) ( int *bzerror, BZFILE *b )
{
   bzFile* bzf = (bzFile*)b;

   BZ_SETERR(BZ_OK);
   if (bzf == NULL)
      { BZ_SETERR(BZ_OK); return; };

   if (bzf->writing)
      { BZ_SETERR(BZ_SEQUENCE_ERROR); return; };

   if (bzf->initialisedOk)
      (void)bzDecompressEnd ( &(bzf->strm) );
   free ( bzf );
}


/*---------------------------------------------------*/
int BZ_API(bzRead)
   ( int* bzerror,
     BZFILE* b,
     void* buf,
     int len )
{
   Int32 n, ret;
   bzFile* bzf = (bzFile*)b;

   BZ_SETERR(BZ_OK);

   if (bzf == NULL || buf == NULL || len < 0)
      { BZ_SETERR(BZ_PARAM_ERROR); return 0; };

   if (bzf->writing)
      { BZ_SETERR(BZ_SEQUENCE_ERROR); return 0; };

   if (len == 0)
      { BZ_SETERR(BZ_OK); return 0; };

   bzf->strm.avail_out = len;
   bzf->strm.next_out = buf;

   while (True) {

      if (ferror(bzf->handle))
         { BZ_SETERR(BZ_IO_ERROR); return 0; };

      if (bzf->strm.avail_in == 0 && !myfeof(bzf->handle)) {
         n = fread ( bzf->buf, sizeof(UChar),
                     BZ_MAX_UNUSED, bzf->handle );
         if (ferror(bzf->handle))
            { BZ_SETERR(BZ_IO_ERROR); return 0; };
         bzf->bufN = n;
         bzf->strm.avail_in = bzf->bufN;
         bzf->strm.next_in = bzf->buf;
      }

      ret = bzDecompress ( &(bzf->strm) );

      if (ret != BZ_OK && ret != BZ_STREAM_END)
         { BZ_SETERR(ret); return 0; };

      if (ret == BZ_OK && myfeof(bzf->handle) &&
          bzf->strm.avail_in == 0 && bzf->strm.avail_out > 0)
         { BZ_SETERR(BZ_UNEXPECTED_EOF); return 0; };

      if (ret == BZ_STREAM_END)
         { BZ_SETERR(BZ_STREAM_END);
           return len - bzf->strm.avail_out; };
      if (bzf->strm.avail_out == 0)
         { BZ_SETERR(BZ_OK); return len; };

   }

   return 0; /*not reached*/
}


/*---------------------------------------------------*/
void BZ_API(bzReadGetUnused)
   ( int* bzerror,
     BZFILE* b,
     void** unused,
     int* nUnused )
{
   bzFile* bzf = (bzFile*)b;
   if (bzf == NULL)
      { BZ_SETERR(BZ_PARAM_ERROR); return; };
   if (bzf->lastErr != BZ_STREAM_END)
      { BZ_SETERR(BZ_SEQUENCE_ERROR); return; };
   if (unused == NULL || nUnused == NULL)
      { BZ_SETERR(BZ_PARAM_ERROR); return; };

   BZ_SETERR(BZ_OK);
   *nUnused = bzf->strm.avail_in;
   *unused = bzf->strm.next_in;
}
#endif


/*---------------------------------------------------*/
/*--- Misc convenience stuff ---*/
/*---------------------------------------------------*/

/*---------------------------------------------------*/
int BZ_API(bzBuffToBuffCompress)
   ( char* dest,
     unsigned int* destLen,
     char* source,
     unsigned int sourceLen,
     int blockSize100k,
     int verbosity,
     int workFactor )
{
   bz_stream strm;
   int ret;

   if (dest == NULL || destLen == NULL ||
       source == NULL ||
       blockSize100k < 1 || blockSize100k > 9 ||
       verbosity < 0 || verbosity > 4 ||
       workFactor < 0 || workFactor > 250)
      return BZ_PARAM_ERROR;

   if (workFactor == 0) workFactor = 30;
   strm.bzalloc = NULL;
   strm.bzfree = NULL;
   strm.opaque = NULL;
   ret = bzCompressInit ( &strm, blockSize100k,
                          verbosity, workFactor );
   if (ret != BZ_OK) return ret;

   strm.next_in = source;
   strm.next_out = dest;
   strm.avail_in = sourceLen;
   strm.avail_out = *destLen;

   ret = bzCompress ( &strm, BZ_FINISH );
   if (ret == BZ_FINISH_OK) goto output_overflow;
   if (ret != BZ_STREAM_END) goto errhandler;

   /* normal termination */
   *destLen -= strm.avail_out;
   bzCompressEnd ( &strm );
   return BZ_OK;

   output_overflow:
   bzCompressEnd ( &strm );
   return BZ_OUTBUFF_FULL;

   errhandler:
   bzCompressEnd ( &strm );
   return ret;
}


/*---------------------------------------------------*/
int BZ_API(bzBuffToBuffDecompress)
   ( char* dest,
     unsigned int* destLen,
     char* source,
     unsigned int sourceLen,
     int small,
     int verbosity )
{
   bz_stream strm;
   int ret;

   if (dest == NULL || destLen == NULL ||
       source == NULL ||
       (small != 0 && small != 1) ||
       verbosity < 0 || verbosity > 4)
      return BZ_PARAM_ERROR;

   strm.bzalloc = NULL;
   strm.bzfree = NULL;
   strm.opaque = NULL;
   ret = bzDecompressInit ( &strm, verbosity, small );
   if (ret != BZ_OK) return ret;

   strm.next_in = source;
   strm.next_out = dest;
   strm.avail_in = sourceLen;
   strm.avail_out = *destLen;

   ret = bzDecompress ( &strm );
   if (ret == BZ_OK) goto output_overflow_or_eof;
   if (ret != BZ_STREAM_END) goto errhandler;

   /* normal termination */
   *destLen -= strm.avail_out;
   bzDecompressEnd ( &strm );
   return BZ_OK;

   output_overflow_or_eof:
   if (strm.avail_out > 0) {
      bzDecompressEnd ( &strm );
      return BZ_UNEXPECTED_EOF;
   } else {
      bzDecompressEnd ( &strm );
      return BZ_OUTBUFF_FULL;
   };

   errhandler:
   bzDecompressEnd ( &strm );
   /* pass on the real error (e.g. BZ_DATA_ERROR) instead of
      masking it as BZ_SEQUENCE_ERROR */
   return ret;
}


/*---------------------------------------------------*/
/*--
   Code contributed by Yoshioka Tsuneo
   (QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp),
   to support better zlib compatibility.
   This code is not _officially_ part of libbzip2 (yet);
   I haven't tested it, documented it, or considered its
   thread safety.
   If this code breaks, please contact both Yoshioka and me.
--*/
/*---------------------------------------------------*/

/*---------------------------------------------------*/
/*--
   Return the library version string, e.g. "0.9.0c".
--*/
const char * BZ_API(bzlibVersion)(void)
{
   return BZ_VERSION;
}


#ifndef BZ_NO_STDIO
/*---------------------------------------------------*/

#if defined(_WIN32) || defined(OS2) || defined(MSDOS)
# include <fcntl.h>
# include <io.h>
# define SET_BINARY_MODE(file) setmode(fileno(file),O_BINARY)
#else
# define SET_BINARY_MODE(file)
#endif
static
BZFILE * bzopen_or_bzdopen
   ( const char *path,   /* unused by bzdopen */
     int fd,             /* unused by bzopen */
     const char *mode,
     int open_mode)      /* bzopen: 0, bzdopen: 1 */
{
   int bzerr;
   char unused[BZ_MAX_UNUSED];
   int blockSize100k = 9;
   int writing = 0;
   char mode2[10] = "";
   FILE *fp = NULL;
   BZFILE *bzfp = NULL;
   int verbosity = 0;
   int workFactor = 30;
   int smallMode = 0;
   int nUnused = 0;

   if(mode==NULL){return NULL;}
   while(*mode){
      switch(*mode){
      case 'r':
         writing = 0;break;
      case 'w':
         writing = 1;break;
      case 's':
         smallMode = 1;break;
      default:
         if(isdigit((unsigned char)*mode)){
            blockSize100k = 0;
            while(isdigit((unsigned char)*mode)){
               blockSize100k = blockSize100k*10 + *mode-'0';
               mode++;
            }
            /* mode already points past the digits: skip the mode++
               below, which would otherwise step over the next character
               or run past the terminating NUL */
            continue;
         }else{
            /* ignore */
         }
      }
      mode++;
   }
   strcat(mode2, writing ? "w" : "r" );
   strcat(mode2,"b");   /* binary mode */

   if(open_mode==0){
      if(path==NULL || strcmp(path,"")==0){
         fp = (writing ? stdout : stdin);
         SET_BINARY_MODE(fp);
      }else{
         fp = fopen(path,mode2);
      }
   }else{
#ifdef BZ_STRICT_ANSI
      fp = NULL;
#else
      fp = fdopen(fd,mode2);
#endif
   }
   if(fp==NULL){return NULL;}

   if(writing){
      bzfp = bzWriteOpen(&bzerr,fp,blockSize100k,verbosity,workFactor);
   }else{
      bzfp = bzReadOpen(&bzerr,fp,verbosity,smallMode,unused,nUnused);
   }
   if(bzfp==NULL){
      if(fp!=stdin && fp!=stdout) fclose(fp);
      return NULL;
   }
   return bzfp;
}


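bzopen_or_bzdopen scans the mode string one character at a time. 'r' or 'w' selects the direction, 's' requests the small-memory decompressor, digits set blockSize100k, and anything else is ignored. The sketch below is a hypothetical stand-alone version of that scan; parse_bz_mode and BzMode are invented names. It simplifies in one respect: each digit is taken on its own, so the last digit wins. A block size outside 1..9, which bzWriteOpen would reject anyway, is reported as -1.

```c
#include <ctype.h>

/* Hypothetical stand-alone version of the mode-string scan in
   bzopen_or_bzdopen.  Defaults match that function: reading,
   normal-memory mode, block size 9.  Returns 0 on success, -1 if the
   block size is outside 1..9. */
typedef struct { int writing; int small; int blockSize100k; } BzMode;

int parse_bz_mode ( const char* mode, BzMode* out )
{
   out->writing = 0;
   out->small = 0;
   out->blockSize100k = 9;
   for (; *mode != '\0'; mode++) {
      if (*mode == 'r') out->writing = 0;
      else if (*mode == 'w') out->writing = 1;
      else if (*mode == 's') out->small = 1;
      else if (isdigit ( (unsigned char)*mode ))
         out->blockSize100k = *mode - '0';  /* single digit; last wins */
      /* any other character is ignored */
   }
   return (out->blockSize100k >= 1 && out->blockSize100k <= 9) ? 0 : -1;
}
```

With this sketch, "w9" yields writing with block size 9, "rs" yields small-memory reading, and "w0" is rejected.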
/*---------------------------------------------------*/
/*--
   Open a file for reading or writing, e.g. bzopen("file","w9").
   If path is "" or NULL, stdin or stdout is used instead.
--*/
BZFILE * BZ_API(bzopen)
   ( const char *path,
     const char *mode )
{
   return bzopen_or_bzdopen(path,-1,mode,/*bzopen*/0);
}


1410/*---------------------------------------------------*/
1411BZFILE * BZ_API(bzdopen)
1412 ( int fd,
1413 const char *mode )
1414{
1415 return bzopen_or_bzdopen(NULL,fd,mode,/*bzdopen*/1);
1416}
1417
1418
1419/*---------------------------------------------------*/
1420int BZ_API(bzread) (BZFILE* b, void* buf, int len )
1421{
1422 int bzerr, nread;
1423 if (((bzFile*)b)->lastErr == BZ_STREAM_END) return 0;
1424 nread = bzRead(&bzerr,b,buf,len);
1425 if (bzerr == BZ_OK || bzerr == BZ_STREAM_END) {
1426 return nread;
1427 } else {
1428 return -1;
1429 }
1430}
1431
1432
1433/*---------------------------------------------------*/
1434int BZ_API(bzwrite) (BZFILE* b, void* buf, int len )
1435{
1436 int bzerr;
1437
1438 bzWrite(&bzerr,b,buf,len);
1439 if(bzerr == BZ_OK){
1440 return len;
1441 }else{
1442 return -1;
1443 }
1444}
1445
1446
1447/*---------------------------------------------------*/
1448int BZ_API(bzflush) (BZFILE *b)
1449{
1450 /* do nothing now... */
1451 return 0;
1452}
1453
1454
1455/*---------------------------------------------------*/
1456void BZ_API(bzclose) (BZFILE* b)
1457{
1458 int bzerr;
1459 FILE *fp;
1460 if(b==NULL){return;}
1461 fp = ((bzFile *)b)->handle;
1462 if(((bzFile*)b)->writing){
1463 bzWriteClose(&bzerr,b,0,NULL,NULL);
1464 if(bzerr != BZ_OK){
1465 bzWriteClose(NULL,b,1,NULL,NULL);
1466 }
1467 }else{
1468 bzReadClose(&bzerr,b);
1469 }
1470 if(fp!=stdin && fp!=stdout){
1471 fclose(fp);
1472 }
1473}
1474
1475
1476/*---------------------------------------------------*/
1477/*--
1478 Return the last error code and a name for it.
1479--*/
1480static char *bzerrorstrings[] = {
1481 "OK"
1482 ,"SEQUENCE_ERROR"
1483 ,"PARAM_ERROR"
1484 ,"MEM_ERROR"
1485 ,"DATA_ERROR"
1486 ,"DATA_ERROR_MAGIC"
1487 ,"IO_ERROR"
1488 ,"UNEXPECTED_EOF"
1489 ,"OUTBUFF_FULL"
1490 ,"???" /* for future */
1491 ,"???" /* for future */
1492 ,"???" /* for future */
1493 ,"???" /* for future */
1494 ,"???" /* for future */
1495 ,"???" /* for future */
1496};
1497
1498
1499const char * BZ_API(bzerror) (BZFILE *b, int *errnum)
1500{
1501 int err = ((bzFile *)b)->lastErr;
1502
1503 if(err>0) err = 0;
1504 *errnum = err;
1505 return bzerrorstrings[err*-1];
1506}
1507#endif
1508
1509
1510/*-------------------------------------------------------------*/
1511/*--- end bzlib.c ---*/
1512/*-------------------------------------------------------------*/
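The mode string accepted by `bzopen`/`bzdopen` is scanned one character at a time: `r` and `w` select the direction, `s` selects small-memory decompression, a run of digits sets the block size, and any other character is ignored. The standalone sketch below reproduces that scan with a one-character lookahead in the digit loop, so the pointer is left on the last digit and the loop's own increment never steps past the terminating NUL. The names `bz_mode` and `parse_bz_mode` are hypothetical and not part of libbzip2; the defaults mirror the locals initialised in `bzopen_or_bzdopen`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type; bzopen_or_bzdopen keeps these as locals. */
typedef struct {
    int writing;        /* 1 after 'w', 0 after 'r' (default: read) */
    int smallMode;      /* 1 if 's' appeared                        */
    int blockSize100k;  /* value of the digit run; default 9        */
    char stdioMode[3];  /* "rb" or "wb", as passed to fopen/fdopen  */
} bz_mode;

/* Parse a bzopen-style mode string. Returns 0 on success, -1 if mode is NULL. */
int parse_bz_mode(const char *mode, bz_mode *out)
{
    out->writing = 0;
    out->smallMode = 0;
    out->blockSize100k = 9;
    if (mode == NULL) return -1;

    for (; *mode != '\0'; mode++) {
        switch (*mode) {
        case 'r': out->writing = 0; break;
        case 'w': out->writing = 1; break;
        case 's': out->smallMode = 1; break;
        default:
            if (isdigit((unsigned char)*mode)) {
                /* Accumulate digits, peeking at mode[1] so that mode
                   ends on the last digit rather than on the NUL. */
                out->blockSize100k = *mode - '0';
                while (isdigit((unsigned char)mode[1])) {
                    mode++;
                    out->blockSize100k = out->blockSize100k * 10 + (*mode - '0');
                }
            }
            /* any other character is ignored */
            break;
        }
    }
    strcpy(out->stdioMode, out->writing ? "wb" : "rb");
    return 0;
}
```

For example, `parse_bz_mode("w9", &m)` yields a writer with block size 9 and stdio mode `"wb"`. Out-of-range sizes such as `"w0"` pass through unchanged here, as they do in `bzopen_or_bzdopen`; range checking is left to the compressor, which accepts only 1 to 9.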
diff --git a/bzlib.h b/bzlib.h
new file mode 100644
index 0000000..bb62273
--- /dev/null
+++ b/bzlib.h
@@ -0,0 +1,299 @@
1
2/*-------------------------------------------------------------*/
3/*--- Public header file for the library. ---*/
4/*--- bzlib.h ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#ifndef _BZLIB_H
63#define _BZLIB_H
64
65#define BZ_RUN 0
66#define BZ_FLUSH 1
67#define BZ_FINISH 2
68
69#define BZ_OK 0
70#define BZ_RUN_OK 1
71#define BZ_FLUSH_OK 2
72#define BZ_FINISH_OK 3
73#define BZ_STREAM_END 4
74#define BZ_SEQUENCE_ERROR (-1)
75#define BZ_PARAM_ERROR (-2)
76#define BZ_MEM_ERROR (-3)
77#define BZ_DATA_ERROR (-4)
78#define BZ_DATA_ERROR_MAGIC (-5)
79#define BZ_IO_ERROR (-6)
80#define BZ_UNEXPECTED_EOF (-7)
81#define BZ_OUTBUFF_FULL (-8)
82
83typedef
84 struct {
85 char *next_in;
86 unsigned int avail_in;
87 unsigned int total_in;
88
89 char *next_out;
90 unsigned int avail_out;
91 unsigned int total_out;
92
93 void *state;
94
95 void *(*bzalloc)(void *,int,int);
96 void (*bzfree)(void *,void *);
97 void *opaque;
98 }
99 bz_stream;
100
101
102#ifndef BZ_IMPORT
103#define BZ_EXPORT
104#endif
105
106#ifdef _WIN32
107# include <stdio.h>
108# include <windows.h>
109# ifdef small
110 /* windows.h defines small as char */
111# undef small
112# endif
113# ifdef BZ_EXPORT
114# define BZ_API(func) WINAPI func
115# define BZ_EXTERN extern
116# else
117 /* import windows dll dynamically */
118# define BZ_API(func) (WINAPI * func)
119# define BZ_EXTERN
120# endif
121#else
122# define BZ_API(func) func
123# define BZ_EXTERN extern
124#endif
125
126
127/*-- Core (low-level) library functions --*/
128
129BZ_EXTERN int BZ_API(bzCompressInit) (
130 bz_stream* strm,
131 int blockSize100k,
132 int verbosity,
133 int workFactor
134 );
135
136BZ_EXTERN int BZ_API(bzCompress) (
137 bz_stream* strm,
138 int action
139 );
140
141BZ_EXTERN int BZ_API(bzCompressEnd) (
142 bz_stream* strm
143 );
144
145BZ_EXTERN int BZ_API(bzDecompressInit) (
146 bz_stream *strm,
147 int verbosity,
148 int small
149 );
150
151BZ_EXTERN int BZ_API(bzDecompress) (
152 bz_stream* strm
153 );
154
155BZ_EXTERN int BZ_API(bzDecompressEnd) (
156 bz_stream *strm
157 );
158
159
160
161/*-- High(er) level library functions --*/
162
163#ifndef BZ_NO_STDIO
164#include <stdio.h> /* FILE is used in the prototypes below */
165#define BZ_MAX_UNUSED 5000
166typedef void BZFILE;
167
168BZ_EXTERN BZFILE* BZ_API(bzReadOpen) (
169 int* bzerror,
170 FILE* f,
171 int verbosity,
172 int small,
173 void* unused,
174 int nUnused
175 );
176
177BZ_EXTERN void BZ_API(bzReadClose) (
178 int* bzerror,
179 BZFILE* b
180 );
181
182BZ_EXTERN void BZ_API(bzReadGetUnused) (
183 int* bzerror,
184 BZFILE* b,
185 void** unused,
186 int* nUnused
187 );
188
189BZ_EXTERN int BZ_API(bzRead) (
190 int* bzerror,
191 BZFILE* b,
192 void* buf,
193 int len
194 );
195
196BZ_EXTERN BZFILE* BZ_API(bzWriteOpen) (
197 int* bzerror,
198 FILE* f,
199 int blockSize100k,
200 int verbosity,
201 int workFactor
202 );
203
204BZ_EXTERN void BZ_API(bzWrite) (
205 int* bzerror,
206 BZFILE* b,
207 void* buf,
208 int len
209 );
210
211BZ_EXTERN void BZ_API(bzWriteClose) (
212 int* bzerror,
213 BZFILE* b,
214 int abandon,
215 unsigned int* nbytes_in,
216 unsigned int* nbytes_out
217 );
218#endif
219
220
221/*-- Utility functions --*/
222
223BZ_EXTERN int BZ_API(bzBuffToBuffCompress) (
224 char* dest,
225 unsigned int* destLen,
226 char* source,
227 unsigned int sourceLen,
228 int blockSize100k,
229 int verbosity,
230 int workFactor
231 );
232
233BZ_EXTERN int BZ_API(bzBuffToBuffDecompress) (
234 char* dest,
235 unsigned int* destLen,
236 char* source,
237 unsigned int sourceLen,
238 int small,
239 int verbosity
240 );
241
242
243/*--
244 Code contributed by Yoshioka Tsuneo
245 (QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp),
246 to support better zlib compatibility.
247 This code is not _officially_ part of libbzip2 (yet);
248 I haven't tested it, documented it, or considered its
249 thread safety.
250 If this code breaks, please contact both Yoshioka and me.
251--*/
252
253BZ_EXTERN const char * BZ_API(bzlibVersion) (
254 void
255 );
256
257#ifndef BZ_NO_STDIO
258BZ_EXTERN BZFILE * BZ_API(bzopen) (
259 const char *path,
260 const char *mode
261 );
262
263BZ_EXTERN BZFILE * BZ_API(bzdopen) (
264 int fd,
265 const char *mode
266 );
267
268BZ_EXTERN int BZ_API(bzread) (
269 BZFILE* b,
270 void* buf,
271 int len
272 );
273
274BZ_EXTERN int BZ_API(bzwrite) (
275 BZFILE* b,
276 void* buf,
277 int len
278 );
279
280BZ_EXTERN int BZ_API(bzflush) (
281 BZFILE* b
282 );
283
284BZ_EXTERN void BZ_API(bzclose) (
285 BZFILE* b
286 );
287
288BZ_EXTERN const char * BZ_API(bzerror) (
289 BZFILE *b,
290 int *errnum
291 );
292#endif
293
294
295#endif
296
297/*-------------------------------------------------------------*/
298/*--- end bzlib.h ---*/
299/*-------------------------------------------------------------*/
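The error codes defined above are consecutive negative integers, from `BZ_SEQUENCE_ERROR` (-1) down to `BZ_OUTBUFF_FULL` (-8). That is what lets `bzerror` in bzlib.c find a name with `bzerrorstrings[err*-1]`, after folding positive codes to 0. The original pads the table with `"???"` entries but does not bounds-check the index. The sketch below shows the same negate-and-index idea with an explicit range check. The helper name `bz_error_name` is hypothetical, and the constants are copied from bzlib.h so the sketch compiles on its own.

```c
#include <stddef.h>

/* Values copied from bzlib.h so this sketch is self-contained. */
#define BZ_OK                0
#define BZ_SEQUENCE_ERROR    (-1)
#define BZ_PARAM_ERROR       (-2)
#define BZ_MEM_ERROR         (-3)
#define BZ_DATA_ERROR        (-4)
#define BZ_DATA_ERROR_MAGIC  (-5)
#define BZ_IO_ERROR          (-6)
#define BZ_UNEXPECTED_EOF    (-7)
#define BZ_OUTBUFF_FULL      (-8)

/* Entry k names the code -k, so the table order must follow the #defines. */
static const char *const bz_error_names[] = {
    "OK", "SEQUENCE_ERROR", "PARAM_ERROR", "MEM_ERROR", "DATA_ERROR",
    "DATA_ERROR_MAGIC", "IO_ERROR", "UNEXPECTED_EOF", "OUTBUFF_FULL"
};

/* Hypothetical helper: printable name for a libbzip2 return code. */
const char *bz_error_name(int code)
{
    int last = (int)(sizeof bz_error_names / sizeof bz_error_names[0]) - 1;

    if (code > 0) code = BZ_OK;   /* BZ_RUN_OK .. BZ_STREAM_END are not errors */
    if (code < -last) return "???";  /* unknown code: stay inside the table */
    return bz_error_names[-code];
}
```

Keeping the range check inside the helper means a newly added error code degrades to `"???"` rather than to an out-of-bounds read.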
diff --git a/bzlib_private.h b/bzlib_private.h
new file mode 100644
index 0000000..4044aef
--- /dev/null
+++ b/bzlib_private.h
@@ -0,0 +1,523 @@
1
2/*-------------------------------------------------------------*/
3/*--- Private header file for the library. ---*/
4/*--- bzlib_private.h ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#ifndef _BZLIB_PRIVATE_H
63#define _BZLIB_PRIVATE_H
64
65#include <stdlib.h>
66
67#ifndef BZ_NO_STDIO
68#include <stdio.h>
69#include <ctype.h>
70#include <string.h>
71#endif
72
73#include "bzlib.h"
74
75
76
77/*-- General stuff. --*/
78
79#define BZ_VERSION "0.9.0c"
80
81typedef char Char;
82typedef unsigned char Bool;
83typedef unsigned char UChar;
84typedef int Int32;
85typedef unsigned int UInt32;
86typedef short Int16;
87typedef unsigned short UInt16;
88
89#define True ((Bool)1)
90#define False ((Bool)0)
91
92#ifndef __GNUC__
93#define __inline__ /* */
94#endif
95
96#ifndef BZ_NO_STDIO
97extern void bz__AssertH__fail ( int errcode );
98#define AssertH(cond,errcode) \
99 { if (!(cond)) bz__AssertH__fail ( errcode ); }
100#if BZ_DEBUG
101#define AssertD(cond,msg) \
102 { if (!(cond)) { \
103 fprintf ( stderr, \
104 "\n\nlibbzip2(debug build): internal error\n\t%s\n", msg );\
105 exit(1); \
106 }}
107#else
108#define AssertD(cond,msg) /* */
109#endif
110#define VPrintf0(zf) \
111 fprintf(stderr,zf)
112#define VPrintf1(zf,za1) \
113 fprintf(stderr,zf,za1)
114#define VPrintf2(zf,za1,za2) \
115 fprintf(stderr,zf,za1,za2)
116#define VPrintf3(zf,za1,za2,za3) \
117 fprintf(stderr,zf,za1,za2,za3)
118#define VPrintf4(zf,za1,za2,za3,za4) \
119 fprintf(stderr,zf,za1,za2,za3,za4)
120#define VPrintf5(zf,za1,za2,za3,za4,za5) \
121 fprintf(stderr,zf,za1,za2,za3,za4,za5)
122#else
123extern void bz_internal_error ( int errcode );
124#define AssertH(cond,errcode) \
125 { if (!(cond)) bz_internal_error ( errcode ); }
126#define AssertD(cond,msg) /* */
127#define VPrintf0(zf) /* */
128#define VPrintf1(zf,za1) /* */
129#define VPrintf2(zf,za1,za2) /* */
130#define VPrintf3(zf,za1,za2,za3) /* */
131#define VPrintf4(zf,za1,za2,za3,za4) /* */
132#define VPrintf5(zf,za1,za2,za3,za4,za5) /* */
133#endif
134
135
136#define BZALLOC(nnn) (strm->bzalloc)(strm->opaque,(nnn),1)
137#define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp))
138
139
140/*-- Constants for the back end. --*/
141
142#define BZ_MAX_ALPHA_SIZE 258
143#define BZ_MAX_CODE_LEN 23
144
145#define BZ_RUNA 0
146#define BZ_RUNB 1
147
148#define BZ_N_GROUPS 6
149#define BZ_G_SIZE 50
150#define BZ_N_ITERS 4
151
152#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
153
154
155
156/*-- Stuff for randomising repetitive blocks. --*/
157
158extern Int32 rNums[512];
159
160#define BZ_RAND_DECLS \
161 Int32 rNToGo; \
162 Int32 rTPos \
163
164#define BZ_RAND_INIT_MASK \
165 s->rNToGo = 0; \
166 s->rTPos = 0 \
167
168#define BZ_RAND_MASK ((s->rNToGo == 1) ? 1 : 0)
169
170#define BZ_RAND_UPD_MASK \
171 if (s->rNToGo == 0) { \
172 s->rNToGo = rNums[s->rTPos]; \
173 s->rTPos++; \
174 if (s->rTPos == 512) s->rTPos = 0; \
175 } \
176 s->rNToGo--;
177
178
179
180/*-- Stuff for doing CRCs. --*/
181
182extern UInt32 crc32Table[256];
183
184#define BZ_INITIALISE_CRC(crcVar) \
185{ \
186 crcVar = 0xffffffffL; \
187}
188
189#define BZ_FINALISE_CRC(crcVar) \
190{ \
191 crcVar = ~(crcVar); \
192}
193
194#define BZ_UPDATE_CRC(crcVar,cha) \
195{ \
196 crcVar = (crcVar << 8) ^ \
197 crc32Table[(crcVar >> 24) ^ \
198 ((UChar)cha)]; \
199}
200
201
202
203/*-- States and modes for compression. --*/
204
205#define BZ_M_IDLE 1
206#define BZ_M_RUNNING 2
207#define BZ_M_FLUSHING 3
208#define BZ_M_FINISHING 4
209
210#define BZ_S_OUTPUT 1
211#define BZ_S_INPUT 2
212
213#define BZ_NUM_OVERSHOOT_BYTES 20
214
215
216
217/*-- Structure holding all the compression-side stuff. --*/
218
219typedef
220 struct {
221 /* pointer back to the struct bz_stream */
222 bz_stream* strm;
223
224 /* mode this stream is in, and whether inputting */
225 /* or outputting data */
226 Int32 mode;
227 Int32 state;
228
229 /* remembers avail_in when flush/finish requested */
230 UInt32 avail_in_expect;
231
232 /* for doing the block sorting */
233 UChar* block;
234 UInt16* quadrant;
235 UInt32* zptr;
236 UInt16* szptr;
237 Int32* ftab;
238 Int32 workDone;
239 Int32 workLimit;
240 Int32 workFactor;
241 Bool firstAttempt;
242 Bool blockRandomised;
243 Int32 origPtr;
244
245 /* run-length-encoding of the input */
246 UInt32 state_in_ch;
247 Int32 state_in_len;
248 BZ_RAND_DECLS;
249
250 /* input and output limits and current posns */
251 Int32 nblock;
252 Int32 nblockMAX;
253 Int32 numZ;
254 Int32 state_out_pos;
255
256 /* map of bytes used in block */
257 Int32 nInUse;
258 Bool inUse[256];
259 UChar unseqToSeq[256];
260
261 /* the buffer for bit stream creation */
262 UInt32 bsBuff;
263 Int32 bsLive;
264
265 /* block and combined CRCs */
266 UInt32 blockCRC;
267 UInt32 combinedCRC;
268
269 /* misc administratium */
270 Int32 verbosity;
271 Int32 blockNo;
272 Int32 nBlocksRandomised;
273 Int32 blockSize100k;
274
275 /* stuff for coding the MTF values */
276 Int32 nMTF;
277 Int32 mtfFreq [BZ_MAX_ALPHA_SIZE];
278 UChar selector [BZ_MAX_SELECTORS];
279 UChar selectorMtf[BZ_MAX_SELECTORS];
280
281 UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
282 Int32 code [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
283 Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
284
285 }
286 EState;
287
288
289
290/*-- externs for compression. --*/
291
292extern void
293blockSort ( EState* );
294
295extern void
296compressBlock ( EState*, Bool );
297
298extern void
299bsInitWrite ( EState* );
300
301extern void
302hbAssignCodes ( Int32*, UChar*, Int32, Int32, Int32 );
303
304extern void
305hbMakeCodeLengths ( UChar*, Int32*, Int32, Int32 );
306
307
308
309/*-- states for decompression. --*/
310
311#define BZ_X_IDLE 1
312#define BZ_X_OUTPUT 2
313
314#define BZ_X_MAGIC_1 10
315#define BZ_X_MAGIC_2 11
316#define BZ_X_MAGIC_3 12
317#define BZ_X_MAGIC_4 13
318#define BZ_X_BLKHDR_1 14
319#define BZ_X_BLKHDR_2 15
320#define BZ_X_BLKHDR_3 16
321#define BZ_X_BLKHDR_4 17
322#define BZ_X_BLKHDR_5 18
323#define BZ_X_BLKHDR_6 19
324#define BZ_X_BCRC_1 20
325#define BZ_X_BCRC_2 21
326#define BZ_X_BCRC_3 22
327#define BZ_X_BCRC_4 23
328#define BZ_X_RANDBIT 24
329#define BZ_X_ORIGPTR_1 25
330#define BZ_X_ORIGPTR_2 26
331#define BZ_X_ORIGPTR_3 27
332#define BZ_X_MAPPING_1 28
333#define BZ_X_MAPPING_2 29
334#define BZ_X_SELECTOR_1 30
335#define BZ_X_SELECTOR_2 31
336#define BZ_X_SELECTOR_3 32
337#define BZ_X_CODING_1 33
338#define BZ_X_CODING_2 34
339#define BZ_X_CODING_3 35
340#define BZ_X_MTF_1 36
341#define BZ_X_MTF_2 37
342#define BZ_X_MTF_3 38
343#define BZ_X_MTF_4 39
344#define BZ_X_MTF_5 40
345#define BZ_X_MTF_6 41
346#define BZ_X_ENDHDR_2 42
347#define BZ_X_ENDHDR_3 43
348#define BZ_X_ENDHDR_4 44
349#define BZ_X_ENDHDR_5 45
350#define BZ_X_ENDHDR_6 46
351#define BZ_X_CCRC_1 47
352#define BZ_X_CCRC_2 48
353#define BZ_X_CCRC_3 49
354#define BZ_X_CCRC_4 50
355
356
357
358/*-- Constants for the fast MTF decoder. --*/
359
360#define MTFA_SIZE 4096
361#define MTFL_SIZE 16
362
363
364
365/*-- Structure holding all the decompression-side stuff. --*/
366
367typedef
368 struct {
369 /* pointer back to the struct bz_stream */
370 bz_stream* strm;
371
372 /* state indicator for this stream */
373 Int32 state;
374
375 /* for doing the final run-length decoding */
376 UChar state_out_ch;
377 Int32 state_out_len;
378 Bool blockRandomised;
379 BZ_RAND_DECLS;
380
381 /* the buffer for bit stream reading */
382 UInt32 bsBuff;
383 Int32 bsLive;
384
385 /* misc administratium */
386 Int32 blockSize100k;
387 Bool smallDecompress;
388 Int32 currBlockNo;
389 Int32 verbosity;
390
391 /* for undoing the Burrows-Wheeler transform */
392 Int32 origPtr;
393 UInt32 tPos;
394 Int32 k0;
395 Int32 unzftab[256];
396 Int32 nblock_used;
397 Int32 cftab[257];
398 Int32 cftabCopy[257];
399
400 /* for undoing the Burrows-Wheeler transform (FAST) */
401 UInt32 *tt;
402
403 /* for undoing the Burrows-Wheeler transform (SMALL) */
404 UInt16 *ll16;
405 UChar *ll4;
406
407 /* stored and calculated CRCs */
408 UInt32 storedBlockCRC;
409 UInt32 storedCombinedCRC;
410 UInt32 calculatedBlockCRC;
411 UInt32 calculatedCombinedCRC;
412
413 /* map of bytes used in block */
414 Int32 nInUse;
415 Bool inUse[256];
416 Bool inUse16[16];
417 UChar seqToUnseq[256];
418
419 /* for decoding the MTF values */
420 UChar mtfa [MTFA_SIZE];
421 Int32 mtfbase[256 / MTFL_SIZE];
422 UChar selector [BZ_MAX_SELECTORS];
423 UChar selectorMtf[BZ_MAX_SELECTORS];
424 UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
425
426 Int32 limit [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
427 Int32 base [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
428 Int32 perm [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
429 Int32 minLens[BZ_N_GROUPS];
430
431 /* save area for scalars in the main decompress code */
432 Int32 save_i;
433 Int32 save_j;
434 Int32 save_t;
435 Int32 save_alphaSize;
436 Int32 save_nGroups;
437 Int32 save_nSelectors;
438 Int32 save_EOB;
439 Int32 save_groupNo;
440 Int32 save_groupPos;
441 Int32 save_nextSym;
442 Int32 save_nblockMAX;
443 Int32 save_nblock;
444 Int32 save_es;
445 Int32 save_N;
446 Int32 save_curr;
447 Int32 save_zt;
448 Int32 save_zn;
449 Int32 save_zvec;
450 Int32 save_zj;
451 Int32 save_gSel;
452 Int32 save_gMinlen;
453 Int32* save_gLimit;
454 Int32* save_gBase;
455 Int32* save_gPerm;
456
457 }
458 DState;
459
460
461
462/*-- Macros for decompression. --*/
463
464#define BZ_GET_FAST(cccc) \
465 s->tPos = s->tt[s->tPos]; \
466 cccc = (UChar)(s->tPos & 0xff); \
467 s->tPos >>= 8;
468
469#define BZ_GET_FAST_C(cccc) \
470 c_tPos = c_tt[c_tPos]; \
471 cccc = (UChar)(c_tPos & 0xff); \
472 c_tPos >>= 8;
473
474#define SET_LL4(i,n) \
475 { if (((i) & 0x1) == 0) \
476 s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0xf0) | (n); else \
477 s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0x0f) | ((n) << 4); \
478 }
479
480#define GET_LL4(i) \
481 (((UInt32)(s->ll4[(i) >> 1])) >> (((i) << 2) & 0x4) & 0xF)
482
483#define SET_LL(i,n) \
484 { s->ll16[i] = (UInt16)(n & 0x0000ffff); \
485 SET_LL4(i, n >> 16); \
486 }
487
488#define GET_LL(i) \
489 (((UInt32)s->ll16[i]) | (GET_LL4(i) << 16))
490
491#define BZ_GET_SMALL(cccc) \
492 cccc = indexIntoF ( s->tPos, s->cftab ); \
493 s->tPos = GET_LL(s->tPos);
494
495
496/*-- externs for decompression. --*/
497
498extern Int32
499indexIntoF ( Int32, Int32* );
500
501extern Int32
502decompress ( DState* );
503
504extern void
505hbCreateDecodeTables ( Int32*, Int32*, Int32*, UChar*,
506 Int32, Int32, Int32 );
507
508
509#endif
510
511
512/*-- BZ_NO_STDIO seems to make NULL disappear on some platforms. --*/
513
514#ifdef BZ_NO_STDIO
515#ifndef NULL
516#define NULL 0
517#endif
518#endif
519
520
521/*-------------------------------------------------------------*/
522/*--- end bzlib_private.h ---*/
523/*-------------------------------------------------------------*/
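`BZ_INITIALISE_CRC`, `BZ_UPDATE_CRC` and `BZ_FINALISE_CRC` together implement a table-driven, most-significant-bit-first CRC-32. The register starts at all ones, and each byte is folded in by shifting the register left 8 bits and XORing in the table entry selected by the register's top byte XORed with the input byte. The result is complemented at the end. The sketch below performs those three steps in one function. It builds the table at start-up instead of using the precomputed `crc32Table` from crctable.c, assuming the standard CRC-32 generator polynomial 0x04C11DB7 in its non-reflected form (the variant usually catalogued as CRC-32/BZIP2). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* MSB-first table: entry i is byte i placed in the top 8 bits and
   divided through 8 steps of the polynomial 0x04C11DB7 (assumed). */
static void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int k = 0; k < 8; k++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : (c << 1);
        crc_table[i] = c;
    }
}

/* The same three steps as the bzlib_private.h macros, over one buffer. */
uint32_t bz_style_crc32(const unsigned char *p, size_t n)
{
    static int ready = 0;
    if (!ready) { crc_table_init(); ready = 1; }

    uint32_t crc = 0xFFFFFFFFu;                            /* BZ_INITIALISE_CRC */
    for (size_t i = 0; i < n; i++)
        crc = (crc << 8) ^ crc_table[(crc >> 24) ^ p[i]];  /* BZ_UPDATE_CRC */
    return ~crc;                                           /* BZ_FINALISE_CRC */
}
```

Under that assumption, the CRC of the nine bytes `"123456789"` should be `0xFC891918`, the published check value for CRC-32/BZIP2. The lazy table set-up is not thread-safe. A static table, as in crctable.c, avoids the issue.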
diff --git a/compress.c b/compress.c
new file mode 100644
index 0000000..23abd43
--- /dev/null
+++ b/compress.c
@@ -0,0 +1,588 @@
1
2/*-------------------------------------------------------------*/
3/*--- Compression machinery (not incl block sorting) ---*/
4/*--- compress.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0 of 28 June 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61/*--
62 CHANGES
63 ~~~~~~~
64 0.9.0 -- original version.
65
66 0.9.0a/b -- no changes in this file.
67
68 0.9.0c
69 * changed setting of nGroups in sendMTFValues() so as to
70 do a bit better on small files
71--*/
72
73#include "bzlib_private.h"
74
75
76/*---------------------------------------------------*/
77/*--- Bit stream I/O ---*/
78/*---------------------------------------------------*/
79
80/*---------------------------------------------------*/
81void bsInitWrite ( EState* s )
82{
83 s->bsLive = 0;
84 s->bsBuff = 0;
85}
86
87
88/*---------------------------------------------------*/
89static
90void bsFinishWrite ( EState* s )
91{
92 while (s->bsLive > 0) {
93 ((UChar*)(s->quadrant))[s->numZ] = (UChar)(s->bsBuff >> 24);
94 s->numZ++;
95 s->bsBuff <<= 8;
96 s->bsLive -= 8;
97 }
98}
99
100
101/*---------------------------------------------------*/
102#define bsNEEDW(nz) \
103{ \
104 while (s->bsLive >= 8) { \
105 ((UChar*)(s->quadrant))[s->numZ] \
106 = (UChar)(s->bsBuff >> 24); \
107 s->numZ++; \
108 s->bsBuff <<= 8; \
109 s->bsLive -= 8; \
110 } \
111}
112
113
114/*---------------------------------------------------*/
115static
116void bsW ( EState* s, Int32 n, UInt32 v )
117{
118 bsNEEDW ( n );
119 s->bsBuff |= (v << (32 - s->bsLive - n));
120 s->bsLive += n;
121}
122
123
124/*---------------------------------------------------*/
125static
126void bsPutUInt32 ( EState* s, UInt32 u )
127{
128 bsW ( s, 8, (u >> 24) & 0xffL );
129 bsW ( s, 8, (u >> 16) & 0xffL );
130 bsW ( s, 8, (u >> 8) & 0xffL );
131 bsW ( s, 8, u & 0xffL );
132}
133
134
135/*---------------------------------------------------*/
136static
137void bsPutUChar ( EState* s, UChar c )
138{
139 bsW( s, 8, (UInt32)c );
140}
141
142
143/*---------------------------------------------------*/
144/*--- The back end proper ---*/
145/*---------------------------------------------------*/
146
147/*---------------------------------------------------*/
148static
149void makeMaps_e ( EState* s )
150{
151 Int32 i;
152 s->nInUse = 0;
153 for (i = 0; i < 256; i++)
154 if (s->inUse[i]) {
155 s->unseqToSeq[i] = s->nInUse;
156 s->nInUse++;
157 }
158}
159
160
161/*---------------------------------------------------*/
162static
163void generateMTFValues ( EState* s )
164{
165 UChar yy[256];
166 Int32 i, j;
167 UChar tmp;
168 UChar tmp2;
169 Int32 zPend;
170 Int32 wr;
171 Int32 EOB;
172
173 makeMaps_e ( s );
174 EOB = s->nInUse+1;
175
176 for (i = 0; i <= EOB; i++) s->mtfFreq[i] = 0;
177
178 wr = 0;
179 zPend = 0;
180 for (i = 0; i < s->nInUse; i++) yy[i] = (UChar) i;
181
182 for (i = 0; i < s->nblock; i++) {
183 UChar ll_i;
184
185 AssertD ( wr <= i, "generateMTFValues(1)" );
186 j = s->zptr[i]-1; if (j < 0) j += s->nblock;
187 ll_i = s->unseqToSeq[s->block[j]];
188 AssertD ( ll_i < s->nInUse, "generateMTFValues(2a)" );
189
190 j = 0;
191 tmp = yy[j];
192 while ( ll_i != tmp ) {
193 j++;
194 tmp2 = tmp;
195 tmp = yy[j];
196 yy[j] = tmp2;
197 };
198 yy[0] = tmp;
199
200 if (j == 0) {
201 zPend++;
202 } else {
203 if (zPend > 0) {
204 zPend--;
205 while (True) {
206 switch (zPend % 2) {
207 case 0: s->szptr[wr] = BZ_RUNA; wr++; s->mtfFreq[BZ_RUNA]++; break;
208 case 1: s->szptr[wr] = BZ_RUNB; wr++; s->mtfFreq[BZ_RUNB]++; break;
209 };
210 if (zPend < 2) break;
211 zPend = (zPend - 2) / 2;
212 };
213 zPend = 0;
214 }
215 s->szptr[wr] = j+1; wr++; s->mtfFreq[j+1]++;
216 }
217 }
218
219 if (zPend > 0) {
220 zPend--;
221 while (True) {
222 switch (zPend % 2) {
223 case 0: s->szptr[wr] = BZ_RUNA; wr++; s->mtfFreq[BZ_RUNA]++; break;
224 case 1: s->szptr[wr] = BZ_RUNB; wr++; s->mtfFreq[BZ_RUNB]++; break;
225 };
226 if (zPend < 2) break;
227 zPend = (zPend - 2) / 2;
228 };
229 }
230
231 s->szptr[wr] = EOB; wr++; s->mtfFreq[EOB]++;
232
233 s->nMTF = wr;
234}
235
236
237/*---------------------------------------------------*/
238#define BZ_LESSER_ICOST 0
239#define BZ_GREATER_ICOST 15
240
241static
242void sendMTFValues ( EState* s )
243{
244 Int32 v, t, i, j, gs, ge, totc, bt, bc, iter;
245 Int32 nSelectors, alphaSize, minLen, maxLen, selCtr;
246 Int32 nGroups, nBytes;
247
248 /*--
249 UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
250 is a field of EState (the decoder keeps its own in DState).
251
252 Int32 code[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
253 Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
254 are also EState fields, though only this proc uses them.
255 Keeping them out of the stack frame keeps it small.
256 --*/
257
258
259 UInt16 cost[BZ_N_GROUPS];
260 Int32 fave[BZ_N_GROUPS];
261
262 if (s->verbosity >= 3)
263 VPrintf3( " %d in block, %d after MTF & 1-2 coding, "
264 "%d+2 syms in use\n",
265 s->nblock, s->nMTF, s->nInUse );
266
267 alphaSize = s->nInUse+2;
268 for (t = 0; t < BZ_N_GROUPS; t++)
269 for (v = 0; v < alphaSize; v++)
270 s->len[t][v] = BZ_GREATER_ICOST;
271
272 /*--- Decide how many coding tables to use ---*/
273 AssertH ( s->nMTF > 0, 3001 );
274 if (s->nMTF < 200) nGroups = 2; else
275 if (s->nMTF < 600) nGroups = 3; else
276 if (s->nMTF < 1200) nGroups = 4; else
277 if (s->nMTF < 2400) nGroups = 5; else
278 nGroups = 6;
279
280 /*--- Generate an initial set of coding tables ---*/
281 {
282 Int32 nPart, remF, tFreq, aFreq;
283
284 nPart = nGroups;
285 remF = s->nMTF;
286 gs = 0;
287 while (nPart > 0) {
288 tFreq = remF / nPart;
289 ge = gs-1;
290 aFreq = 0;
291 while (aFreq < tFreq && ge < alphaSize-1) {
292 ge++;
293 aFreq += s->mtfFreq[ge];
294 }
295
296 if (ge > gs
297 && nPart != nGroups && nPart != 1
298 && ((nGroups-nPart) % 2 == 1)) {
299 aFreq -= s->mtfFreq[ge];
300 ge--;
301 }
302
303 if (s->verbosity >= 3)
304 VPrintf5( " initial group %d, [%d .. %d], "
305 "has %d syms (%4.1f%%)\n",
306 nPart, gs, ge, aFreq,
307 (100.0 * (float)aFreq) / (float)(s->nMTF) );
308
309 for (v = 0; v < alphaSize; v++)
310 if (v >= gs && v <= ge)
311 s->len[nPart-1][v] = BZ_LESSER_ICOST; else
312 s->len[nPart-1][v] = BZ_GREATER_ICOST;
313
314 nPart--;
315 gs = ge+1;
316 remF -= aFreq;
317 }
318 }
319
320 /*---
321 Iterate BZ_N_ITERS times to improve the tables.
322 ---*/
323 for (iter = 0; iter < BZ_N_ITERS; iter++) {
324
325 for (t = 0; t < nGroups; t++) fave[t] = 0;
326
327 for (t = 0; t < nGroups; t++)
328 for (v = 0; v < alphaSize; v++)
329 s->rfreq[t][v] = 0;
330
331 nSelectors = 0;
332 totc = 0;
333 gs = 0;
334 while (True) {
335
336 /*--- Set group start & end marks. --*/
337 if (gs >= s->nMTF) break;
338 ge = gs + BZ_G_SIZE - 1;
339 if (ge >= s->nMTF) ge = s->nMTF-1;
340
341 /*--
342 Calculate the cost of this group as coded
343 by each of the coding tables.
344 --*/
345 for (t = 0; t < nGroups; t++) cost[t] = 0;
346
347 if (nGroups == 6) {
348 register UInt16 cost0, cost1, cost2, cost3, cost4, cost5;
349 cost0 = cost1 = cost2 = cost3 = cost4 = cost5 = 0;
350 for (i = gs; i <= ge; i++) {
351 UInt16 icv = s->szptr[i];
352 cost0 += s->len[0][icv];
353 cost1 += s->len[1][icv];
354 cost2 += s->len[2][icv];
355 cost3 += s->len[3][icv];
356 cost4 += s->len[4][icv];
357 cost5 += s->len[5][icv];
358 }
359 cost[0] = cost0; cost[1] = cost1; cost[2] = cost2;
360 cost[3] = cost3; cost[4] = cost4; cost[5] = cost5;
361 } else {
362 for (i = gs; i <= ge; i++) {
363 UInt16 icv = s->szptr[i];
364 for (t = 0; t < nGroups; t++) cost[t] += s->len[t][icv];
365 }
366 }
367
368 /*--
369 Find the coding table which is best for this group,
370 and record its identity in the selector table.
371 --*/
372 bc = 999999999; bt = -1;
373 for (t = 0; t < nGroups; t++)
374 if (cost[t] < bc) { bc = cost[t]; bt = t; };
375 totc += bc;
376 fave[bt]++;
377 s->selector[nSelectors] = bt;
378 nSelectors++;
379
380 /*--
381 Increment the symbol frequencies for the selected table.
382 --*/
383 for (i = gs; i <= ge; i++)
384 s->rfreq[bt][ s->szptr[i] ]++;
385
386 gs = ge+1;
387 }
388 if (s->verbosity >= 3) {
389 VPrintf2 ( " pass %d: size is %d, grp uses are ",
390 iter+1, totc/8 );
391 for (t = 0; t < nGroups; t++)
392 VPrintf1 ( "%d ", fave[t] );
393 VPrintf0 ( "\n" );
394 }
395
396 /*--
397 Recompute the tables based on the accumulated frequencies.
398 --*/
399 for (t = 0; t < nGroups; t++)
400 hbMakeCodeLengths ( &(s->len[t][0]), &(s->rfreq[t][0]),
401 alphaSize, 20 );
402 }
403
404
405 AssertH( nGroups < 8, 3002 );
406 AssertH( nSelectors < 32768 &&
407 nSelectors <= (2 + (900000 / BZ_G_SIZE)),
408 3003 );
409
410
411 /*--- Compute MTF values for the selectors. ---*/
412 {
413 UChar pos[BZ_N_GROUPS], ll_i, tmp2, tmp;
414 for (i = 0; i < nGroups; i++) pos[i] = i;
415 for (i = 0; i < nSelectors; i++) {
416 ll_i = s->selector[i];
417 j = 0;
418 tmp = pos[j];
419 while ( ll_i != tmp ) {
420 j++;
421 tmp2 = tmp;
422 tmp = pos[j];
423 pos[j] = tmp2;
424 };
425 pos[0] = tmp;
426 s->selectorMtf[i] = j;
427 }
428 };
429
430 /*--- Assign actual codes for the tables. --*/
431 for (t = 0; t < nGroups; t++) {
432 minLen = 32;
433 maxLen = 0;
434 for (i = 0; i < alphaSize; i++) {
435 if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
436 if (s->len[t][i] < minLen) minLen = s->len[t][i];
437 }
438 AssertH ( !(maxLen > 20), 3004 );
439 AssertH ( !(minLen < 1), 3005 );
440 hbAssignCodes ( &(s->code[t][0]), &(s->len[t][0]),
441 minLen, maxLen, alphaSize );
442 }
443
444 /*--- Transmit the mapping table. ---*/
445 {
446 Bool inUse16[16];
447 for (i = 0; i < 16; i++) {
448 inUse16[i] = False;
449 for (j = 0; j < 16; j++)
450 if (s->inUse[i * 16 + j]) inUse16[i] = True;
451 }
452
453 nBytes = s->numZ;
454 for (i = 0; i < 16; i++)
455 if (inUse16[i]) bsW(s,1,1); else bsW(s,1,0);
456
457 for (i = 0; i < 16; i++)
458 if (inUse16[i])
459 for (j = 0; j < 16; j++) {
460 if (s->inUse[i * 16 + j]) bsW(s,1,1); else bsW(s,1,0);
461 }
462
463 if (s->verbosity >= 3)
464 VPrintf1( " bytes: mapping %d, ", s->numZ-nBytes );
465 }
466
467 /*--- Now the selectors. ---*/
468 nBytes = s->numZ;
469 bsW ( s, 3, nGroups );
470 bsW ( s, 15, nSelectors );
471 for (i = 0; i < nSelectors; i++) {
472 for (j = 0; j < s->selectorMtf[i]; j++) bsW(s,1,1);
473 bsW(s,1,0);
474 }
475 if (s->verbosity >= 3)
476 VPrintf1( "selectors %d, ", s->numZ-nBytes );
477
478 /*--- Now the coding tables. ---*/
479 nBytes = s->numZ;
480
481 for (t = 0; t < nGroups; t++) {
482 Int32 curr = s->len[t][0];
483 bsW ( s, 5, curr );
484 for (i = 0; i < alphaSize; i++) {
485 while (curr < s->len[t][i]) { bsW(s,2,2); curr++; /* 10 */ };
486 while (curr > s->len[t][i]) { bsW(s,2,3); curr--; /* 11 */ };
487 bsW ( s, 1, 0 );
488 }
489 }
490
491 if (s->verbosity >= 3)
492 VPrintf1 ( "code lengths %d, ", s->numZ-nBytes );
493
494 /*--- And finally, the block data proper ---*/
495 nBytes = s->numZ;
496 selCtr = 0;
497 gs = 0;
498 while (True) {
499 if (gs >= s->nMTF) break;
500 ge = gs + BZ_G_SIZE - 1;
501 if (ge >= s->nMTF) ge = s->nMTF-1;
502 for (i = gs; i <= ge; i++) {
503 AssertH ( s->selector[selCtr] < nGroups, 3006 );
504 bsW ( s,
505 s->len [s->selector[selCtr]] [s->szptr[i]],
506 s->code [s->selector[selCtr]] [s->szptr[i]] );
507 }
508
509 gs = ge+1;
510 selCtr++;
511 }
512 AssertH( selCtr == nSelectors, 3007 );
513
514 if (s->verbosity >= 3)
515 VPrintf1( "codes %d\n", s->numZ-nBytes );
516}
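The initial-table step in `sendMTFValues` above walks the MTF symbol frequencies and cuts the alphabet into `nGroups` contiguous ranges of roughly equal total frequency. Table `nPart-1` then gets `BZ_LESSER_ICOST` lengths inside its range and `BZ_GREATER_ICOST` outside it. The following is a minimal standalone sketch of just the range computation. The names (`initial_ranges`, `gsOut`, `geOut`) are hypothetical, and `freq`/`total` stand in for `s->mtfFreq` and `s->nMTF`:

```c
#include <assert.h>

/* Sketch of the initial-table split: symbols 0..alphaSize-1 are cut into
   nGroups contiguous ranges of roughly equal total frequency. Table
   nPart-1 receives [gsOut[nPart-1], geOut[nPart-1]], as in the loop above. */
void initial_ranges(const int *freq, int alphaSize, int nGroups, int total,
                    int *gsOut, int *geOut)
{
    int nPart = nGroups, remF = total, gs = 0;
    assert(nGroups >= 1 && alphaSize >= 1);
    while (nPart > 0) {
        int tFreq = remF / nPart;   /* target share for this range */
        int ge = gs - 1, aFreq = 0;
        while (aFreq < tFreq && ge < alphaSize - 1) {
            ge++;
            aFreq += freq[ge];
        }
        /* On the 2nd, 4th, ... range built (never the last one), give the
           final symbol back so it starts the next range. */
        if (ge > gs && nPart != nGroups && nPart != 1
            && ((nGroups - nPart) % 2 == 1)) {
            aFreq -= freq[ge];
            ge--;
        }
        gsOut[nPart - 1] = gs;
        geOut[nPart - 1] = ge;
        nPart--;
        gs = ge + 1;
        remF -= aFreq;
    }
}
```

With twelve equally frequent symbols and three tables, the ranges come out as [0..3], [4..6] and [7..11]. The second range built loses its last symbol to the next one because of the `(nGroups-nPart) % 2 == 1` adjustment.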
517
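Each selector records the table index used by one group of `BZ_G_SIZE` symbols. Selectors are move-to-front coded and then sent in unary, as `selectorMtf[i]` one-bits followed by a zero. The sketch below shows three pieces: that transform, its inverse (in the style of the decoder's "Undo the MTF values" loop), and the unary form. The function names are hypothetical. `MAX_TABLES` is an assumed bound of six tables, intended to match `BZ_N_GROUPS`:

```c
#include <assert.h>

#define MAX_TABLES 6  /* assumption: same limit as BZ_N_GROUPS */

/* Move-to-front encode table selectors: output the selector's current
   position in a recency list, then move it to the front. */
void selectors_mtf_encode(const unsigned char *sel, int n, int nGroups,
                          unsigned char *mtf)
{
    unsigned char pos[MAX_TABLES];
    assert(nGroups >= 1 && nGroups <= MAX_TABLES);
    for (int i = 0; i < nGroups; i++) pos[i] = (unsigned char)i;
    for (int i = 0; i < n; i++) {
        int j = 0;
        assert(sel[i] < nGroups);
        while (pos[j] != sel[i]) j++;
        for (int k = j; k > 0; k--) pos[k] = pos[k - 1];
        pos[0] = sel[i];
        mtf[i] = (unsigned char)j;
    }
}

/* Inverse transform, in the style of the decoder's "Undo the MTF" loop. */
void selectors_mtf_decode(const unsigned char *mtf, int n, int nGroups,
                          unsigned char *sel)
{
    unsigned char pos[MAX_TABLES];
    assert(nGroups >= 1 && nGroups <= MAX_TABLES);
    for (int i = 0; i < nGroups; i++) pos[i] = (unsigned char)i;
    for (int i = 0; i < n; i++) {
        int v = mtf[i];
        unsigned char tmp;
        assert(v < nGroups);
        tmp = pos[v];
        for (; v > 0; v--) pos[v] = pos[v - 1];
        pos[0] = tmp;
        sel[i] = tmp;
    }
}

/* Unary form written by sendMTFValues: v '1' characters, then a '0'.
   Returns the number of characters written. */
int selector_unary(int v, char *out)
{
    int k = 0;
    while (k < v) out[k++] = '1';
    out[k++] = '0';
    return k;
}
```

Consecutive groups that reuse the same table become MTF zeros, and each zero costs a single bit in the unary form.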
518
519/*---------------------------------------------------*/
520void compressBlock ( EState* s, Bool is_last_block )
521{
522 if (s->nblock > 0) {
523
524 BZ_FINALISE_CRC ( s->blockCRC );
525 s->combinedCRC = (s->combinedCRC << 1) | (s->combinedCRC >> 31);
526 s->combinedCRC ^= s->blockCRC;
527 if (s->blockNo > 1) s->numZ = 0;
528
529 if (s->verbosity >= 2)
530 VPrintf4( " block %d: crc = 0x%8x, "
531 "combined CRC = 0x%8x, size = %d\n",
532 s->blockNo, s->blockCRC, s->combinedCRC, s->nblock );
533
534 blockSort ( s );
535 }
536
537 /*-- If this is the first block, create the stream header. --*/
538 if (s->blockNo == 1) {
539 bsInitWrite ( s );
540 bsPutUChar ( s, 'B' );
541 bsPutUChar ( s, 'Z' );
542 bsPutUChar ( s, 'h' );
543 bsPutUChar ( s, '0' + s->blockSize100k );
544 }
545
546 if (s->nblock > 0) {
547
548 bsPutUChar ( s, 0x31 ); bsPutUChar ( s, 0x41 );
549 bsPutUChar ( s, 0x59 ); bsPutUChar ( s, 0x26 );
550 bsPutUChar ( s, 0x53 ); bsPutUChar ( s, 0x59 );
551
552 /*-- Now the block's CRC, so it is in a known place. --*/
553 bsPutUInt32 ( s, s->blockCRC );
554
555 /*-- Now a single bit indicating randomisation. --*/
556 if (s->blockRandomised) {
557 bsW(s,1,1); s->nBlocksRandomised++;
558 } else
559 bsW(s,1,0);
560
561 bsW ( s, 24, s->origPtr );
562 generateMTFValues ( s );
563 sendMTFValues ( s );
564 }
565
566
567 /*-- If this is the last block, add the stream trailer. --*/
568 if (is_last_block) {
569
570 if (s->verbosity >= 2 && s->nBlocksRandomised > 0)
571 VPrintf2 ( " %d block%s needed randomisation\n",
572 s->nBlocksRandomised,
573 s->nBlocksRandomised == 1 ? "" : "s" );
574
575 bsPutUChar ( s, 0x17 ); bsPutUChar ( s, 0x72 );
576 bsPutUChar ( s, 0x45 ); bsPutUChar ( s, 0x38 );
577 bsPutUChar ( s, 0x50 ); bsPutUChar ( s, 0x90 );
578 bsPutUInt32 ( s, s->combinedCRC );
579 if (s->verbosity >= 2)
580 VPrintf1( " final combined CRC = 0x%x\n ", s->combinedCRC );
581 bsFinishWrite ( s );
582 }
583}
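`compressBlock` folds each block's CRC into `combinedCRC` in two steps: it rotates the running value left by one bit, then XORs in the new block CRC. The result is written after the end-of-stream magic, and the decoder reads it back into `storedCombinedCRC`. The following is a minimal sketch of the fold. It assumes the running value starts at zero, since that initialisation happens outside this excerpt. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Fold one block CRC into the running stream CRC, as compressBlock does:
   rotate left by one bit, then XOR in the block CRC. */
uint32_t combine_crc(uint32_t combined, uint32_t block_crc)
{
    combined = (combined << 1) | (combined >> 31);
    return combined ^ block_crc;
}

/* Stream CRC over a sequence of block CRCs, assuming a zero start value. */
uint32_t stream_crc(const uint32_t *block_crcs, int nblocks)
{
    uint32_t combined = 0;
    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; i++)
        combined = combine_crc(combined, block_crcs[i]);
    return combined;
}
```

A plain XOR would ignore block order, but the rotation makes the result order-dependent. For example, block CRCs {1, 2} fold to 0, while {2, 1} fold to 5.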
584
585
586/*-------------------------------------------------------------*/
587/*--- end compress.c ---*/
588/*-------------------------------------------------------------*/
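`sendMTFValues` sends each coding table's lengths as a 5-bit starting value. Then, for each symbol, it writes `10` (increment) or `11` (decrement) steps followed by a terminating `0`. The decoder reverses this and rejects any length outside 1..20. The following is a minimal round-trip sketch that uses '0'/'1' characters in place of `bsW` and `GET_BIT`. The function names are hypothetical:

```c
#include <assert.h>

/* Append the low nbits of v as '0'/'1' characters, most significant first. */
static int put_bits(char *out, int pos, int nbits, int v)
{
    for (int b = nbits - 1; b >= 0; b--)
        out[pos++] = ((v >> b) & 1) ? '1' : '0';
    return pos;
}

/* Delta-encode code lengths (1..20): 5-bit start value, then per symbol
   "10" (+1) or "11" (-1) steps and a terminating "0".
   Returns the number of characters written (no NUL terminator). */
int encode_lengths(const unsigned char *len, int alphaSize, char *out)
{
    int curr, pos = 0;
    assert(alphaSize >= 1);
    curr = len[0];
    pos = put_bits(out, pos, 5, curr);
    for (int i = 0; i < alphaSize; i++) {
        assert(len[i] >= 1 && len[i] <= 20);
        while (curr < len[i]) { pos = put_bits(out, pos, 2, 2); curr++; }
        while (curr > len[i]) { pos = put_bits(out, pos, 2, 3); curr--; }
        out[pos++] = '0';
    }
    return pos;
}

/* Inverse, checking the range before each step as the decoder does.
   Returns characters consumed, or -1 on bad or truncated input. */
int decode_lengths(const char *in, int n, int alphaSize, unsigned char *len)
{
    int pos = 0, curr = 0;
    if (n < 5) return -1;
    for (int b = 0; b < 5; b++) curr = (curr << 1) | (in[pos++] == '1');
    for (int i = 0; i < alphaSize; i++) {
        for (;;) {
            if (curr < 1 || curr > 20 || pos >= n) return -1;
            if (in[pos++] == '0') break;
            if (pos >= n) return -1;
            curr += (in[pos++] == '0') ? 1 : -1;
        }
        len[i] = (unsigned char)curr;
    }
    return pos;
}
```

For lengths {3, 3, 4, 2} this produces `00011` `0` `0` `100` `11110`, 15 bits in all.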
diff --git a/crctable.c b/crctable.c
new file mode 100644
index 0000000..2f3eacb
--- /dev/null
+++ b/crctable.c
@@ -0,0 +1,144 @@
1
2/*-------------------------------------------------------------*/
3/*--- Table for doing CRCs ---*/
4/*--- crctable.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#include "bzlib_private.h"
63
64/*--
65 I think this is an implementation of the AUTODIN-II,
66 Ethernet & FDDI 32-bit CRC standard. Vaguely derived
67 from code by Rob Warnock, in Section 51 of the
68 comp.compression FAQ.
69--*/
70
71UInt32 crc32Table[256] = {
72
73 /*-- Ugly, innit? --*/
74
75 0x00000000L, 0x04c11db7L, 0x09823b6eL, 0x0d4326d9L,
76 0x130476dcL, 0x17c56b6bL, 0x1a864db2L, 0x1e475005L,
77 0x2608edb8L, 0x22c9f00fL, 0x2f8ad6d6L, 0x2b4bcb61L,
78 0x350c9b64L, 0x31cd86d3L, 0x3c8ea00aL, 0x384fbdbdL,
79 0x4c11db70L, 0x48d0c6c7L, 0x4593e01eL, 0x4152fda9L,
80 0x5f15adacL, 0x5bd4b01bL, 0x569796c2L, 0x52568b75L,
81 0x6a1936c8L, 0x6ed82b7fL, 0x639b0da6L, 0x675a1011L,
82 0x791d4014L, 0x7ddc5da3L, 0x709f7b7aL, 0x745e66cdL,
83 0x9823b6e0L, 0x9ce2ab57L, 0x91a18d8eL, 0x95609039L,
84 0x8b27c03cL, 0x8fe6dd8bL, 0x82a5fb52L, 0x8664e6e5L,
85 0xbe2b5b58L, 0xbaea46efL, 0xb7a96036L, 0xb3687d81L,
86 0xad2f2d84L, 0xa9ee3033L, 0xa4ad16eaL, 0xa06c0b5dL,
87 0xd4326d90L, 0xd0f37027L, 0xddb056feL, 0xd9714b49L,
88 0xc7361b4cL, 0xc3f706fbL, 0xceb42022L, 0xca753d95L,
89 0xf23a8028L, 0xf6fb9d9fL, 0xfbb8bb46L, 0xff79a6f1L,
90 0xe13ef6f4L, 0xe5ffeb43L, 0xe8bccd9aL, 0xec7dd02dL,
91 0x34867077L, 0x30476dc0L, 0x3d044b19L, 0x39c556aeL,
92 0x278206abL, 0x23431b1cL, 0x2e003dc5L, 0x2ac12072L,
93 0x128e9dcfL, 0x164f8078L, 0x1b0ca6a1L, 0x1fcdbb16L,
94 0x018aeb13L, 0x054bf6a4L, 0x0808d07dL, 0x0cc9cdcaL,
95 0x7897ab07L, 0x7c56b6b0L, 0x71159069L, 0x75d48ddeL,
96 0x6b93dddbL, 0x6f52c06cL, 0x6211e6b5L, 0x66d0fb02L,
97 0x5e9f46bfL, 0x5a5e5b08L, 0x571d7dd1L, 0x53dc6066L,
98 0x4d9b3063L, 0x495a2dd4L, 0x44190b0dL, 0x40d816baL,
99 0xaca5c697L, 0xa864db20L, 0xa527fdf9L, 0xa1e6e04eL,
100 0xbfa1b04bL, 0xbb60adfcL, 0xb6238b25L, 0xb2e29692L,
101 0x8aad2b2fL, 0x8e6c3698L, 0x832f1041L, 0x87ee0df6L,
102 0x99a95df3L, 0x9d684044L, 0x902b669dL, 0x94ea7b2aL,
103 0xe0b41de7L, 0xe4750050L, 0xe9362689L, 0xedf73b3eL,
104 0xf3b06b3bL, 0xf771768cL, 0xfa325055L, 0xfef34de2L,
105 0xc6bcf05fL, 0xc27dede8L, 0xcf3ecb31L, 0xcbffd686L,
106 0xd5b88683L, 0xd1799b34L, 0xdc3abdedL, 0xd8fba05aL,
107 0x690ce0eeL, 0x6dcdfd59L, 0x608edb80L, 0x644fc637L,
108 0x7a089632L, 0x7ec98b85L, 0x738aad5cL, 0x774bb0ebL,
109 0x4f040d56L, 0x4bc510e1L, 0x46863638L, 0x42472b8fL,
110 0x5c007b8aL, 0x58c1663dL, 0x558240e4L, 0x51435d53L,
111 0x251d3b9eL, 0x21dc2629L, 0x2c9f00f0L, 0x285e1d47L,
112 0x36194d42L, 0x32d850f5L, 0x3f9b762cL, 0x3b5a6b9bL,
113 0x0315d626L, 0x07d4cb91L, 0x0a97ed48L, 0x0e56f0ffL,
114 0x1011a0faL, 0x14d0bd4dL, 0x19939b94L, 0x1d528623L,
115 0xf12f560eL, 0xf5ee4bb9L, 0xf8ad6d60L, 0xfc6c70d7L,
116 0xe22b20d2L, 0xe6ea3d65L, 0xeba91bbcL, 0xef68060bL,
117 0xd727bbb6L, 0xd3e6a601L, 0xdea580d8L, 0xda649d6fL,
118 0xc423cd6aL, 0xc0e2d0ddL, 0xcda1f604L, 0xc960ebb3L,
119 0xbd3e8d7eL, 0xb9ff90c9L, 0xb4bcb610L, 0xb07daba7L,
120 0xae3afba2L, 0xaafbe615L, 0xa7b8c0ccL, 0xa379dd7bL,
121 0x9b3660c6L, 0x9ff77d71L, 0x92b45ba8L, 0x9675461fL,
122 0x8832161aL, 0x8cf30badL, 0x81b02d74L, 0x857130c3L,
123 0x5d8a9099L, 0x594b8d2eL, 0x5408abf7L, 0x50c9b640L,
124 0x4e8ee645L, 0x4a4ffbf2L, 0x470cdd2bL, 0x43cdc09cL,
125 0x7b827d21L, 0x7f436096L, 0x7200464fL, 0x76c15bf8L,
126 0x68860bfdL, 0x6c47164aL, 0x61043093L, 0x65c52d24L,
127 0x119b4be9L, 0x155a565eL, 0x18197087L, 0x1cd86d30L,
128 0x029f3d35L, 0x065e2082L, 0x0b1d065bL, 0x0fdc1becL,
129 0x3793a651L, 0x3352bbe6L, 0x3e119d3fL, 0x3ad08088L,
130 0x2497d08dL, 0x2056cd3aL, 0x2d15ebe3L, 0x29d4f654L,
131 0xc5a92679L, 0xc1683bceL, 0xcc2b1d17L, 0xc8ea00a0L,
132 0xd6ad50a5L, 0xd26c4d12L, 0xdf2f6bcbL, 0xdbee767cL,
133 0xe3a1cbc1L, 0xe760d676L, 0xea23f0afL, 0xeee2ed18L,
134 0xf0a5bd1dL, 0xf464a0aaL, 0xf9278673L, 0xfde69bc4L,
135 0x89b8fd09L, 0x8d79e0beL, 0x803ac667L, 0x84fbdbd0L,
136 0x9abc8bd5L, 0x9e7d9662L, 0x933eb0bbL, 0x97ffad0cL,
137 0xafb010b1L, 0xab710d06L, 0xa6322bdfL, 0xa2f33668L,
138 0xbcb4666dL, 0xb8757bdaL, 0xb5365d03L, 0xb1f740b4L
139};
140
141
142/*-------------------------------------------------------------*/
143/*--- end crctable.c ---*/
144/*-------------------------------------------------------------*/
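The table above is consistent with the 32-bit polynomial 0x04c11db7 (its entry 1) processed most-significant-bit first. This means the table can be regenerated instead of typed in. The following is a minimal sketch. It assumes bzip2's convention of starting the register at 0xffffffff and inverting it at the end; the CRC macros that do this live in bzlib_private.h, outside this excerpt. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the 256-entry table for the MSB-first CRC-32 with polynomial
   0x04c11db7: entry i is the register after shifting byte i through it. */
void make_crc_table(uint32_t table[256])
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i << 24;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04c11db7u : (crc << 1);
        table[i] = crc;
    }
}

/* CRC of a buffer: start at 0xffffffff, feed each byte into the top of
   the register via the table, invert at the end. */
uint32_t crc32_msb(const uint32_t table[256], const unsigned char *p, size_t n)
{
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < n; i++)
        crc = (crc << 8) ^ table[(crc >> 24) ^ p[i]];
    return ~crc;
}
```

With these conventions, the check string "123456789" gives 0xfc891918. That is the catalogued check value for the CRC-32/BZIP2 variant.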
diff --git a/decompress.c b/decompress.c
new file mode 100644
index 0000000..ac2b0a5
--- /dev/null
+++ b/decompress.c
@@ -0,0 +1,636 @@
1
2/*-------------------------------------------------------------*/
3/*--- Decompression machinery ---*/
4/*--- decompress.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#include "bzlib_private.h"
63
64
65/*---------------------------------------------------*/
66static
67void makeMaps_d ( DState* s )
68{
69 Int32 i;
70 s->nInUse = 0;
71 for (i = 0; i < 256; i++)
72 if (s->inUse[i]) {
73 s->seqToUnseq[s->nInUse] = i;
74 s->nInUse++;
75 }
76}
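The mapping table is a two-level bitmap. The first 16 bits flag which 16-byte ranges contain any used byte, and then 16 bits per flagged range mark the individual bytes. `makeMaps_d` lists the used bytes in ascending order in `seqToUnseq`, and `alphaSize` becomes `nInUse + 2`. The following is a minimal round-trip sketch with '0'/'1' characters standing in for the bit stream. The function names are hypothetical:

```c
#include <assert.h>

/* Encode inUse[256] as sendMTFValues does: 16 range flags, then 16 bits
   for each flagged range. Returns characters written (at most 16 + 256). */
int encode_mapping(const int inUse[256], char *out)
{
    int pos = 0, inUse16[16];
    for (int i = 0; i < 16; i++) {
        inUse16[i] = 0;
        for (int j = 0; j < 16; j++)
            if (inUse[i * 16 + j]) inUse16[i] = 1;
        out[pos++] = inUse16[i] ? '1' : '0';
    }
    for (int i = 0; i < 16; i++)
        if (inUse16[i])
            for (int j = 0; j < 16; j++)
                out[pos++] = inUse[i * 16 + j] ? '1' : '0';
    return pos;
}

/* Decode the bitmap and fill seqToUnseq in ascending byte order, like
   makeMaps_d. Returns nInUse, or -1 if the input is too short. */
int decode_mapping(const char *in, int n, unsigned char seqToUnseq[256])
{
    int pos = 0, nInUse = 0, inUse16[16];
    if (n < 16) return -1;
    for (int i = 0; i < 16; i++) inUse16[i] = (in[pos++] == '1');
    for (int i = 0; i < 16; i++) {
        if (!inUse16[i]) continue;
        if (pos + 16 > n) return -1;
        for (int j = 0; j < 16; j++)
            if (in[pos++] == '1')
                seqToUnseq[nInUse++] = (unsigned char)(i * 16 + j);
    }
    return nInUse;
}
```

For example, the bytes 'a', 'b' and 'z' fall in two ranges, so they cost 16 + 2 * 16 = 48 bits.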
77
78
79/*---------------------------------------------------*/
80#define RETURN(rrr) \
81 { retVal = rrr; goto save_state_and_return; };
82
83#define GET_BITS(lll,vvv,nnn) \
84 case lll: s->state = lll; \
85 while (True) { \
86 if (s->bsLive >= nnn) { \
87 UInt32 v; \
88 v = (s->bsBuff >> \
89 (s->bsLive-nnn)) & ((1 << nnn)-1); \
90 s->bsLive -= nnn; \
91 vvv = v; \
92 break; \
93 } \
94 if (s->strm->avail_in == 0) RETURN(BZ_OK); \
95 s->bsBuff \
96 = (s->bsBuff << 8) | \
97 ((UInt32) \
98 (*((UChar*)(s->strm->next_in)))); \
99 s->bsLive += 8; \
100 s->strm->next_in++; \
101 s->strm->avail_in--; \
102 s->strm->total_in++; \
103 }
104
105#define GET_UCHAR(lll,uuu) \
106 GET_BITS(lll,uuu,8)
107
108#define GET_BIT(lll,uuu) \
109 GET_BITS(lll,uuu,1)
110
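`GET_BITS` keeps the unread bits in the low `bsLive` bits of `bsBuff` and pulls in one byte at a time. It takes the requested count from the top of those live bits. Each use also expands to a `case` label, so `decompress` can return when input runs dry and resume at the same read on the next call. The following is a minimal non-resumable sketch of the same bit order, using a hypothetical `BitReader` type:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader mirroring GET_BITS: the low `live`
   bits of buff are the unread ones, oldest bit highest. */
typedef struct {
    const unsigned char *p;
    size_t avail;
    uint32_t buff;
    int live;
} BitReader;

void br_init(BitReader *br, const unsigned char *p, size_t n)
{
    br->p = p; br->avail = n; br->buff = 0; br->live = 0;
}

/* Read n bits (1..24, so live bits never exceed 31 of the 32-bit buffer).
   Returns 0 and stores the value, or -1 when input runs out; GET_BITS
   would instead return BZ_OK and retry on the next call. */
int br_get(BitReader *br, int n, uint32_t *out)
{
    assert(n >= 1 && n <= 24);
    while (br->live < n) {
        if (br->avail == 0) return -1;
        br->buff = (br->buff << 8) | *br->p++;
        br->avail--;
        br->live += 8;
    }
    *out = (br->buff >> (br->live - n)) & ((1u << n) - 1);
    br->live -= n;
    return 0;
}
```

Reading 8, 8, 4, 4 and 8 bits from the bytes "BZh9" yields 'B', 'Z', 0x6, 0x8 and '9'.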
111/*---------------------------------------------------*/
112#define GET_MTF_VAL(label1,label2,lval) \
113{ \
114 if (groupPos == 0) { \
115 groupNo++; \
116 groupPos = BZ_G_SIZE; \
117 gSel = s->selector[groupNo]; \
118 gMinlen = s->minLens[gSel]; \
119 gLimit = &(s->limit[gSel][0]); \
120 gPerm = &(s->perm[gSel][0]); \
121 gBase = &(s->base[gSel][0]); \
122 } \
123 groupPos--; \
124 zn = gMinlen; \
125 GET_BITS(label1, zvec, zn); \
126 while (zvec > gLimit[zn]) { \
127 zn++; \
128 GET_BIT(label2, zj); \
129 zvec = (zvec << 1) | zj; \
130 }; \
131 lval = gPerm[zvec - gBase[zn]]; \
132}
133
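`GET_MTF_VAL` decodes a symbol in three steps. It reads `gMinlen` bits, extends the code one bit at a time while it exceeds `gLimit[zn]`, and finally maps it through `gPerm[zvec - gBase[zn]]`. The real tables come from `hbCreateDecodeTables` in huffman.c, which is not part of this excerpt. The sketch below is an illustration, not a copy of that file: it builds canonical-code tables that play the same limit/base/perm roles. The names are hypothetical, and the `MAXLEN` bound of 20 is assumed from the length checks above:

```c
#include <assert.h>

#define MAXLEN 20           /* assumption: longest code length accepted */
#define TABSZ  (MAXLEN + 2) /* room for base[len + 1] */

/* Canonical codes: lengths minLen..maxLen in turn, symbols in index
   order within each length. */
void assign_codes(const unsigned char *len, int n, int minLen, int maxLen,
                  int *code)
{
    int vec = 0;
    for (int l = minLen; l <= maxLen; l++) {
        for (int i = 0; i < n; i++)
            if (len[i] == l) code[i] = vec++;
        vec <<= 1;
    }
}

/* limit[l]: largest code value of length l; perm: symbols ordered by
   code; base[l]: subtracted from a length-l code to index perm. */
void make_decode_tables(const unsigned char *len, int n, int minLen,
                        int maxLen, int limit[TABSZ], int base[TABSZ],
                        int *perm)
{
    int pp = 0, vec = 0;
    assert(minLen >= 1 && minLen <= maxLen && maxLen <= MAXLEN);
    for (int l = minLen; l <= maxLen; l++)
        for (int i = 0; i < n; i++)
            if (len[i] == l) perm[pp++] = i;
    for (int l = 0; l < TABSZ; l++) { base[l] = 0; limit[l] = 0; }
    for (int i = 0; i < n; i++) {
        assert(len[i] >= minLen && len[i] <= maxLen);
        base[len[i] + 1]++;
    }
    for (int l = 1; l < TABSZ; l++) base[l] += base[l - 1]; /* # shorter than l */
    for (int l = minLen; l <= maxLen; l++) {
        vec += base[l + 1] - base[l];   /* add the codes of length l */
        limit[l] = vec - 1;
        vec <<= 1;
    }
    for (int l = minLen + 1; l <= maxLen; l++)
        base[l] = ((limit[l - 1] + 1) << 1) - base[l];
}

/* Append a len-bit code as '0'/'1' characters; returns the new position. */
int put_code(char *out, int pos, int code, int len)
{
    for (int b = len - 1; b >= 0; b--) out[pos++] = ((code >> b) & 1) ? '1' : '0';
    return pos;
}

/* Decode one symbol as GET_MTF_VAL does; -1 on malformed input. */
int decode_symbol(const char *bits, int nbits, int *pos, int minLen,
                  int maxLen, const int limit[TABSZ], const int base[TABSZ],
                  const int *perm)
{
    int zn = minLen, zvec = 0;
    if (*pos + zn > nbits) return -1;
    for (int k = 0; k < zn; k++) zvec = (zvec << 1) | (bits[(*pos)++] == '1');
    while (zvec > limit[zn]) {
        if (++zn > maxLen || *pos >= nbits) return -1;
        zvec = (zvec << 1) | (bits[(*pos)++] == '1');
    }
    return perm[zvec - base[zn]];
}
```

For lengths {1, 2, 3, 3}, the canonical codes are 0, 10, 110 and 111. The symbol sequence 3, 0, 1, 2, 0 packs into 10 bits and decodes back unchanged.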
134
135/*---------------------------------------------------*/
136Int32 decompress ( DState* s )
137{
138 UChar uc;
139 Int32 retVal;
140 Int32 minLen, maxLen;
141 bz_stream* strm = s->strm;
142
143 /* stuff that needs to be saved/restored */
144 Int32 i;
145 Int32 j;
146 Int32 t;
147 Int32 alphaSize;
148 Int32 nGroups;
149 Int32 nSelectors;
150 Int32 EOB;
151 Int32 groupNo;
152 Int32 groupPos;
153 Int32 nextSym;
154 Int32 nblockMAX;
155 Int32 nblock;
156 Int32 es;
157 Int32 N;
158 Int32 curr;
159 Int32 zt;
160 Int32 zn;
161 Int32 zvec;
162 Int32 zj;
163 Int32 gSel;
164 Int32 gMinlen;
165 Int32* gLimit;
166 Int32* gBase;
167 Int32* gPerm;
168
169 if (s->state == BZ_X_MAGIC_1) {
170 /*initialise the save area*/
171 s->save_i = 0;
172 s->save_j = 0;
173 s->save_t = 0;
174 s->save_alphaSize = 0;
175 s->save_nGroups = 0;
176 s->save_nSelectors = 0;
177 s->save_EOB = 0;
178 s->save_groupNo = 0;
179 s->save_groupPos = 0;
180 s->save_nextSym = 0;
181 s->save_nblockMAX = 0;
182 s->save_nblock = 0;
183 s->save_es = 0;
184 s->save_N = 0;
185 s->save_curr = 0;
186 s->save_zt = 0;
187 s->save_zn = 0;
188 s->save_zvec = 0;
189 s->save_zj = 0;
190 s->save_gSel = 0;
191 s->save_gMinlen = 0;
192 s->save_gLimit = NULL;
193 s->save_gBase = NULL;
194 s->save_gPerm = NULL;
195 }
196
197 /*restore from the save area*/
198 i = s->save_i;
199 j = s->save_j;
200 t = s->save_t;
201 alphaSize = s->save_alphaSize;
202 nGroups = s->save_nGroups;
203 nSelectors = s->save_nSelectors;
204 EOB = s->save_EOB;
205 groupNo = s->save_groupNo;
206 groupPos = s->save_groupPos;
207 nextSym = s->save_nextSym;
208 nblockMAX = s->save_nblockMAX;
209 nblock = s->save_nblock;
210 es = s->save_es;
211 N = s->save_N;
212 curr = s->save_curr;
213 zt = s->save_zt;
214 zn = s->save_zn;
215 zvec = s->save_zvec;
216 zj = s->save_zj;
217 gSel = s->save_gSel;
218 gMinlen = s->save_gMinlen;
219 gLimit = s->save_gLimit;
220 gBase = s->save_gBase;
221 gPerm = s->save_gPerm;
222
223 retVal = BZ_OK;
224
225 switch (s->state) {
226
227 GET_UCHAR(BZ_X_MAGIC_1, uc);
228 if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC);
229
230 GET_UCHAR(BZ_X_MAGIC_2, uc);
231 if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC);
232
233 GET_UCHAR(BZ_X_MAGIC_3, uc);
234 if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC);
235
236 GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8);
237 if (s->blockSize100k < '1' ||
238 s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC);
239 s->blockSize100k -= '0';
240
241 if (s->smallDecompress) {
242 s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) );
243 s->ll4 = BZALLOC(
244 ((1 + s->blockSize100k * 100000) >> 1) * sizeof(UChar)
245 );
246 if (s->ll16 == NULL || s->ll4 == NULL) RETURN(BZ_MEM_ERROR);
247 } else {
248 s->tt = BZALLOC( s->blockSize100k * 100000 * sizeof(Int32) );
249 if (s->tt == NULL) RETURN(BZ_MEM_ERROR);
250 }
251
252 GET_UCHAR(BZ_X_BLKHDR_1, uc);
253
254 if (uc == 0x17) goto endhdr_2;
255 if (uc != 0x31) RETURN(BZ_DATA_ERROR);
256 GET_UCHAR(BZ_X_BLKHDR_2, uc);
257 if (uc != 0x41) RETURN(BZ_DATA_ERROR);
258 GET_UCHAR(BZ_X_BLKHDR_3, uc);
259 if (uc != 0x59) RETURN(BZ_DATA_ERROR);
260 GET_UCHAR(BZ_X_BLKHDR_4, uc);
261 if (uc != 0x26) RETURN(BZ_DATA_ERROR);
262 GET_UCHAR(BZ_X_BLKHDR_5, uc);
263 if (uc != 0x53) RETURN(BZ_DATA_ERROR);
264 GET_UCHAR(BZ_X_BLKHDR_6, uc);
265 if (uc != 0x59) RETURN(BZ_DATA_ERROR);
266
267 s->currBlockNo++;
268 if (s->verbosity >= 2)
269 VPrintf1 ( "\n [%d: huff+mtf ", s->currBlockNo );
270
271 s->storedBlockCRC = 0;
272 GET_UCHAR(BZ_X_BCRC_1, uc);
273 s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
274 GET_UCHAR(BZ_X_BCRC_2, uc);
275 s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
276 GET_UCHAR(BZ_X_BCRC_3, uc);
277 s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
278 GET_UCHAR(BZ_X_BCRC_4, uc);
279 s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc);
280
281 GET_BITS(BZ_X_RANDBIT, s->blockRandomised, 1);
282
283 s->origPtr = 0;
284 GET_UCHAR(BZ_X_ORIGPTR_1, uc);
285 s->origPtr = (s->origPtr << 8) | ((Int32)uc);
286 GET_UCHAR(BZ_X_ORIGPTR_2, uc);
287 s->origPtr = (s->origPtr << 8) | ((Int32)uc);
288 GET_UCHAR(BZ_X_ORIGPTR_3, uc);
289 s->origPtr = (s->origPtr << 8) | ((Int32)uc);
290
291 /*--- Receive the mapping table ---*/
292 for (i = 0; i < 16; i++) {
293 GET_BIT(BZ_X_MAPPING_1, uc);
294 if (uc == 1)
295 s->inUse16[i] = True; else
296 s->inUse16[i] = False;
297 }
298
299 for (i = 0; i < 256; i++) s->inUse[i] = False;
300
301 for (i = 0; i < 16; i++)
302 if (s->inUse16[i])
303 for (j = 0; j < 16; j++) {
304 GET_BIT(BZ_X_MAPPING_2, uc);
305 if (uc == 1) s->inUse[i * 16 + j] = True;
306 }
307 makeMaps_d ( s );
308 alphaSize = s->nInUse+2;
309
310 /*--- Now the selectors ---*/
311 GET_BITS(BZ_X_SELECTOR_1, nGroups, 3); if (nGroups < 1 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
312 GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
313 for (i = 0; i < nSelectors; i++) {
314 j = 0;
315 while (True) {
316 GET_BIT(BZ_X_SELECTOR_3, uc);
317 if (uc == 0) break;
318 j++;
319 if (j >= nGroups) RETURN(BZ_DATA_ERROR);
320 }
321 s->selectorMtf[i] = j;
322 }
323
324 /*--- Undo the MTF values for the selectors. ---*/
325 {
326 UChar pos[BZ_N_GROUPS], tmp, v;
327 for (v = 0; v < nGroups; v++) pos[v] = v;
328
329 for (i = 0; i < nSelectors; i++) {
330 v = s->selectorMtf[i];
331 tmp = pos[v];
332 while (v > 0) { pos[v] = pos[v-1]; v--; }
333 pos[0] = tmp;
334 s->selector[i] = tmp;
335 }
336 }
337
338 /*--- Now the coding tables ---*/
339 for (t = 0; t < nGroups; t++) {
340 GET_BITS(BZ_X_CODING_1, curr, 5);
341 for (i = 0; i < alphaSize; i++) {
342 while (True) {
343 if (curr < 1 || curr > 20) RETURN(BZ_DATA_ERROR);
344 GET_BIT(BZ_X_CODING_2, uc);
345 if (uc == 0) break;
346 GET_BIT(BZ_X_CODING_3, uc);
347 if (uc == 0) curr++; else curr--;
348 }
349 s->len[t][i] = curr;
350 }
351 }
352
353 /*--- Create the Huffman decoding tables ---*/
354 for (t = 0; t < nGroups; t++) {
355 minLen = 32;
356 maxLen = 0;
357 for (i = 0; i < alphaSize; i++) {
358 if (s->len[t][i] > maxLen) maxLen = s->len[t][i];
359 if (s->len[t][i] < minLen) minLen = s->len[t][i];
360 }
361 hbCreateDecodeTables (
362 &(s->limit[t][0]),
363 &(s->base[t][0]),
364 &(s->perm[t][0]),
365 &(s->len[t][0]),
366 minLen, maxLen, alphaSize
367 );
368 s->minLens[t] = minLen;
369 }
370
371 /*--- Now the MTF values ---*/
372
373 EOB = s->nInUse+1;
374 nblockMAX = 100000 * s->blockSize100k;
375 groupNo = -1;
376 groupPos = 0;
377
378 for (i = 0; i <= 255; i++) s->unzftab[i] = 0;
379
380 /*-- MTF init --*/
381 {
382 Int32 ii, jj, kk;
383 kk = MTFA_SIZE-1;
384 for (ii = 256 / MTFL_SIZE - 1; ii >= 0; ii--) {
385 for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
386 s->mtfa[kk] = (UChar)(ii * MTFL_SIZE + jj);
387 kk--;
388 }
389 s->mtfbase[ii] = kk + 1;
390 }
391 }
392 /*-- end MTF init --*/
393
394 nblock = 0;
395
396 GET_MTF_VAL(BZ_X_MTF_1, BZ_X_MTF_2, nextSym);
397
398 while (True) {
399
400 if (nextSym == EOB) break;
401
402 if (nextSym == BZ_RUNA || nextSym == BZ_RUNB) {
403
404 es = -1;
405 N = 1;
406 do {
407 if (nextSym == BZ_RUNA) es = es + (0+1) * N; else
408 if (nextSym == BZ_RUNB) es = es + (1+1) * N;
409 N = N * 2;
410 GET_MTF_VAL(BZ_X_MTF_3, BZ_X_MTF_4, nextSym);
411 }
412 while (nextSym == BZ_RUNA || nextSym == BZ_RUNB);
413
414 es++;
415 uc = s->seqToUnseq[ s->mtfa[s->mtfbase[0]] ];
416 s->unzftab[uc] += es;
417
418 if (s->smallDecompress)
419 while (es > 0) { if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
420 s->ll16[nblock] = (UInt16)uc;
421 nblock++;
422 es--;
423 }
424 else
425 while (es > 0) { if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
426 s->tt[nblock] = (UInt32)uc;
427 nblock++;
428 es--;
429 };
430
431 if (nblock > nblockMAX) RETURN(BZ_DATA_ERROR);
432 continue;
433
434 } else {
435
436 if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR);
437
438 /*-- uc = MTF ( nextSym-1 ) --*/
439 {
440 Int32 ii, jj, kk, pp, lno, off;
441 UInt32 nn;
442 nn = (UInt32)(nextSym - 1);
443
444 if (nn < MTFL_SIZE) {
445 /* avoid general-case expense */
446 pp = s->mtfbase[0];
447 uc = s->mtfa[pp+nn];
448 while (nn > 3) {
449 Int32 z = pp+nn;
450 s->mtfa[(z) ] = s->mtfa[(z)-1];
451 s->mtfa[(z)-1] = s->mtfa[(z)-2];
452 s->mtfa[(z)-2] = s->mtfa[(z)-3];
453 s->mtfa[(z)-3] = s->mtfa[(z)-4];
454 nn -= 4;
455 }
456 while (nn > 0) {
457 s->mtfa[(pp+nn)] = s->mtfa[(pp+nn)-1]; nn--;
458 };
459 s->mtfa[pp] = uc;
460 } else {
461 /* general case */
462 lno = nn / MTFL_SIZE;
463 off = nn % MTFL_SIZE;
464 pp = s->mtfbase[lno] + off;
465 uc = s->mtfa[pp];
466 while (pp > s->mtfbase[lno]) {
467 s->mtfa[pp] = s->mtfa[pp-1]; pp--;
468 };
469 s->mtfbase[lno]++;
470 while (lno > 0) {
471 s->mtfbase[lno]--;
472 s->mtfa[s->mtfbase[lno]]
473 = s->mtfa[s->mtfbase[lno-1] + MTFL_SIZE - 1];
474 lno--;
475 }
476 s->mtfbase[0]--;
477 s->mtfa[s->mtfbase[0]] = uc;
478 if (s->mtfbase[0] == 0) {
479 kk = MTFA_SIZE-1;
480 for (ii = 256 / MTFL_SIZE-1; ii >= 0; ii--) {
481 for (jj = MTFL_SIZE-1; jj >= 0; jj--) {
482 s->mtfa[kk] = s->mtfa[s->mtfbase[ii] + jj];
483 kk--;
484 }
485 s->mtfbase[ii] = kk + 1;
486 }
487 }
488 }
489 }
490 /*-- end uc = MTF ( nextSym-1 ) --*/
491
492 s->unzftab[s->seqToUnseq[uc]]++;
493 if (s->smallDecompress)
494 s->ll16[nblock] = (UInt16)(s->seqToUnseq[uc]); else
495 s->tt[nblock] = (UInt32)(s->seqToUnseq[uc]);
496 nblock++;
497
498 GET_MTF_VAL(BZ_X_MTF_5, BZ_X_MTF_6, nextSym);
499 continue;
500 }
501 }
502
503 s->state_out_len = 0;
504 s->state_out_ch = 0;
505 BZ_INITIALISE_CRC ( s->calculatedBlockCRC );
506 s->state = BZ_X_OUTPUT;
507 if (s->verbosity >= 2) VPrintf0 ( "rt+rld" );
508
509 /*-- Set up cftab to facilitate generation of T^(-1) --*/
510 s->cftab[0] = 0;
511 for (i = 1; i <= 256; i++) s->cftab[i] = s->unzftab[i-1];
512 for (i = 1; i <= 256; i++) s->cftab[i] += s->cftab[i-1];
513
514 if (s->smallDecompress) {
515
516 /*-- Make a copy of cftab, used in generation of T --*/
517 for (i = 0; i <= 256; i++) s->cftabCopy[i] = s->cftab[i];
518
519 /*-- compute the T vector --*/
520 for (i = 0; i < nblock; i++) {
521 uc = (UChar)(s->ll16[i]);
522 SET_LL(i, s->cftabCopy[uc]);
523 s->cftabCopy[uc]++;
524 }
525
526 /*-- Compute T^(-1) by pointer reversal on T --*/
527 i = s->origPtr;
528 j = GET_LL(i);
529 do {
530 Int32 tmp = GET_LL(j);
531 SET_LL(j, i);
532 i = j;
533 j = tmp;
534 }
535 while (i != s->origPtr);
536
537 s->tPos = s->origPtr;
538 s->nblock_used = 0;
539 if (s->blockRandomised) {
540 BZ_RAND_INIT_MASK;
541 BZ_GET_SMALL(s->k0); s->nblock_used++;
542 BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
543 } else {
544 BZ_GET_SMALL(s->k0); s->nblock_used++;
545 }
546
547 } else {
548
549 /*-- compute the T^(-1) vector --*/
550 for (i = 0; i < nblock; i++) {
551 uc = (UChar)(s->tt[i] & 0xff);
552 s->tt[s->cftab[uc]] |= (i << 8);
553 s->cftab[uc]++;
554 }
555
556 s->tPos = s->tt[s->origPtr] >> 8;
557 s->nblock_used = 0;
558 if (s->blockRandomised) {
559 BZ_RAND_INIT_MASK;
560 BZ_GET_FAST(s->k0); s->nblock_used++;
561 BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK;
562 } else {
563 BZ_GET_FAST(s->k0); s->nblock_used++;
564 }
565
566 }
567
568 RETURN(BZ_OK);
569
570
571
572 endhdr_2:
573
574 GET_UCHAR(BZ_X_ENDHDR_2, uc);
575 if (uc != 0x72) RETURN(BZ_DATA_ERROR);
576 GET_UCHAR(BZ_X_ENDHDR_3, uc);
577 if (uc != 0x45) RETURN(BZ_DATA_ERROR);
578 GET_UCHAR(BZ_X_ENDHDR_4, uc);
579 if (uc != 0x38) RETURN(BZ_DATA_ERROR);
580 GET_UCHAR(BZ_X_ENDHDR_5, uc);
581 if (uc != 0x50) RETURN(BZ_DATA_ERROR);
582 GET_UCHAR(BZ_X_ENDHDR_6, uc);
583 if (uc != 0x90) RETURN(BZ_DATA_ERROR);
584
585 s->storedCombinedCRC = 0;
586 GET_UCHAR(BZ_X_CCRC_1, uc);
587 s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
588 GET_UCHAR(BZ_X_CCRC_2, uc);
589 s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
590 GET_UCHAR(BZ_X_CCRC_3, uc);
591 s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
592 GET_UCHAR(BZ_X_CCRC_4, uc);
593 s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc);
594
595 s->state = BZ_X_IDLE;
596 RETURN(BZ_STREAM_END);
597
598 default: AssertH ( False, 4001 );
599 }
600
601 AssertH ( False, 4002 );
602
603 save_state_and_return:
604
605 s->save_i = i;
606 s->save_j = j;
607 s->save_t = t;
608 s->save_alphaSize = alphaSize;
609 s->save_nGroups = nGroups;
610 s->save_nSelectors = nSelectors;
611 s->save_EOB = EOB;
612 s->save_groupNo = groupNo;
613 s->save_groupPos = groupPos;
614 s->save_nextSym = nextSym;
615 s->save_nblockMAX = nblockMAX;
616 s->save_nblock = nblock;
617 s->save_es = es;
618 s->save_N = N;
619 s->save_curr = curr;
620 s->save_zt = zt;
621 s->save_zn = zn;
622 s->save_zvec = zvec;
623 s->save_zj = zj;
624 s->save_gSel = gSel;
625 s->save_gMinlen = gMinlen;
626 s->save_gLimit = gLimit;
627 s->save_gBase = gBase;
628 s->save_gPerm = gPerm;
629
630 return retVal;
631}
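Runs of the front-of-list byte arrive as RUNA/RUNB digits. Starting from `es = -1` and `N = 1`, each RUNA adds `N`, each RUNB adds `2N`, and `N` doubles after every digit; a final `es++` gives the run length. Equivalently, the run length is written in bijective base 2, least significant digit first, with digit values RUNA = 1 and RUNB = 2. The following is a minimal sketch of the decoder's arithmetic and its inverse. The numeric values given to `RUNA` and `RUNB` here are assumptions, since only their digit weights matter:

```c
#include <assert.h>

enum { RUNA = 0, RUNB = 1 };  /* assumed symbol values */

/* Decode a run of RUNA/RUNB symbols exactly as decompress() does. */
long run_length(const int *syms, int n)
{
    long es = -1, N = 1;
    for (int i = 0; i < n; i++) {
        assert(syms[i] == RUNA || syms[i] == RUNB);
        es += (syms[i] == RUNA ? 1 : 2) * N;
        N *= 2;
    }
    return es + 1;
}

/* Inverse: bijective base-2 digits of run (>= 1), least significant
   first. Returns the number of symbols written. */
int run_symbols(long run, int *syms)
{
    int n = 0;
    assert(run >= 1);
    while (run > 0) {
        if (run & 1) { syms[n++] = RUNA; run = (run - 1) / 2; }
        else         { syms[n++] = RUNB; run = (run - 2) / 2; }
    }
    return n;
}
```

For example, RUNB followed by RUNA decodes to 2*1 + 1*2 = 4 repeats.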
632
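The `uc = MTF ( nextSym-1 )` block splits the 256-entry list into `MTFL_SIZE`-sized pieces (`mtfa`/`mtfbase`) so that moving a far entry to the front is cheaper. The result is still plain move-to-front decoding. The following is a minimal sketch of the plain version over raw byte values. It omits the `seqToUnseq` indirection and the run symbols, and the function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Plain move-to-front decoding over a 256-entry list. */
void mtf_decode(const unsigned char *idx, int n, unsigned char *out)
{
    unsigned char list[256];
    for (int i = 0; i < 256; i++) list[i] = (unsigned char)i;
    for (int i = 0; i < n; i++) {
        unsigned char nn = idx[i];
        unsigned char uc = list[nn];
        memmove(list + 1, list, nn);   /* shift entries 0..nn-1 down one */
        list[0] = uc;
        out[i] = uc;
    }
}

/* Matching encoder: emit each byte's current position, then move it
   to the front. */
void mtf_encode(const unsigned char *in, int n, unsigned char *idx)
{
    unsigned char list[256];
    for (int i = 0; i < 256; i++) list[i] = (unsigned char)i;
    for (int i = 0; i < n; i++) {
        int j = 0;
        while (list[j] != in[i]) j++;
        memmove(list + 1, list, (size_t)j);
        list[0] = in[i];
        idx[i] = (unsigned char)j;
    }
}
```

Encoding "banana" gives 98, 98, 110, 1, 1, 1. Recently seen bytes become small numbers, which the Huffman stage can code cheaply.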
633
634/*-------------------------------------------------------------*/
635/*--- end decompress.c ---*/
636/*-------------------------------------------------------------*/
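After the MTF stage, the fast path of `decompress` stores each byte of the last column L in the low 8 bits of `tt[i]`. It uses `cftab` to find where that byte sits in the sorted first column and stores `i` in the upper 24 bits of that entry. Following these links from `origPtr` replays the original block. The following is a minimal sketch of the same idea. It assumes the per-character step `tPos = tt[tPos]; ch = tPos & 0xff; tPos >>= 8`; the `BZ_GET_FAST` macro that does this is outside this excerpt. A naive forward transform is included for testing. The names are hypothetical, and the quadratic rotation sort is for illustration only, since bzip2's own sorting lives in blocksort.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static const unsigned char *rot_src;
static int rot_n;

/* Compare two cyclic rotations byte by byte (unsigned). */
static int cmp_rot(const void *a, const void *b)
{
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < rot_n; k++) {
        unsigned char ci = rot_src[(i + k) % rot_n];
        unsigned char cj = rot_src[(j + k) % rot_n];
        if (ci != cj) return ci < cj ? -1 : 1;
    }
    return (i > j) - (i < j);
}

/* Naive forward BWT: writes the last column to L, returns origPtr. */
int naive_bwt(const unsigned char *src, int n, unsigned char *L)
{
    int origPtr = -1;
    int *rows = malloc((size_t)n * sizeof *rows);
    assert(rows != NULL && n > 0);
    for (int i = 0; i < n; i++) rows[i] = i;
    rot_src = src; rot_n = n;
    qsort(rows, (size_t)n, sizeof *rows, cmp_rot);
    for (int r = 0; r < n; r++) {
        L[r] = src[(rows[r] + n - 1) % n];
        if (rows[r] == 0) origPtr = r;
    }
    free(rows);
    return origPtr;
}

/* Inverse BWT with the tt/cftab linking used by the fast path above. */
void inverse_bwt(const unsigned char *L, int n, int origPtr, unsigned char *out)
{
    int cftab[257] = {0};
    uint32_t *tt = malloc((size_t)n * sizeof *tt);
    assert(tt != NULL && n > 0 && n < (1 << 24) && origPtr >= 0 && origPtr < n);
    for (int i = 0; i < n; i++) { tt[i] = L[i]; cftab[L[i] + 1]++; }
    for (int c = 1; c <= 256; c++) cftab[c] += cftab[c - 1]; /* first-column starts */
    for (int i = 0; i < n; i++) tt[cftab[L[i]]++] |= (uint32_t)i << 8;
    uint32_t tPos = tt[origPtr] >> 8;
    for (int k = 0; k < n; k++) {
        tPos = tt[tPos];
        out[k] = (unsigned char)(tPos & 0xff);
        tPos >>= 8;
    }
    free(tt);
}
```

For "banana", the forward transform gives L = "nnbaaa" with origPtr = 3, and the inverse recovers the input.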
diff --git a/dlltest.c b/dlltest.c
new file mode 100644
index 0000000..ee81bcd
--- /dev/null
+++ b/dlltest.c
@@ -0,0 +1,163 @@
1/*
2 minibz2
3 libbz2.dll test program.
4 by Yoshioka Tsuneo(QWF00133@nifty.ne.jp/tsuneo-y@is.aist-nara.ac.jp)
5 This file is Public Domain.
6 Any email is welcome.
7
8 usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]
9*/
10
11#define BZ_IMPORT
12#include "bzlib.h"
13#include <stdio.h>
14#include <stdlib.h>
15#ifdef _WIN32
16#include <io.h>
17#endif
18
19
20#ifdef _WIN32
21
22#include <windows.h>
23static int BZ2DLLLoaded = 0;
24static HINSTANCE BZ2DLLhLib;
25int BZ2DLLLoadLibrary(void)
26{
27 HINSTANCE hLib;
28
29 if(BZ2DLLLoaded==1){return 0;}
30 hLib=LoadLibrary("libbz2.dll");
31 if(hLib == NULL){
32 puts("Can't load libbz2.dll");
33 return -1;
34 }
35 BZ2DLLLoaded=1;
36 BZ2DLLhLib=hLib;
37 bzlibVersion=GetProcAddress(hLib,"bzlibVersion");
38 bzopen=GetProcAddress(hLib,"bzopen");
39 bzdopen=GetProcAddress(hLib,"bzdopen");
40 bzread=GetProcAddress(hLib,"bzread");
41 bzwrite=GetProcAddress(hLib,"bzwrite");
42 bzflush=GetProcAddress(hLib,"bzflush");
43 bzclose=GetProcAddress(hLib,"bzclose");
44 bzerror=GetProcAddress(hLib,"bzerror");
45 return 0;
46
47}
48int BZ2DLLFreeLibrary(void)
49{
50 if(BZ2DLLLoaded==0){return 0;}
51 FreeLibrary(BZ2DLLhLib);
52 BZ2DLLLoaded=0; return 0;
53}
54#endif /* WIN32 */
55
56void usage(void)
57{
58 puts("usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]");
59}
60
61int main(int argc,char *argv[])
62{
63 int decompress = 0;
64 int level = 9;
65 char *fn_r,*fn_w;
66
67#ifdef _WIN32
68 if(BZ2DLLLoadLibrary()<0){
69 puts("can't load dll");
70 exit(1);
71 }
72#endif
73 while(++argv,--argc){
74 if(**argv =='-' || **argv=='/'){
75 char *p;
76
77 for(p=*argv+1;*p;p++){
78 if(*p=='d'){
79 decompress = 1;
80 }else if('1'<=*p && *p<='9'){
81 level = *p - '0';
82 }else{
83 usage();
84 exit(1);
85 }
86 }
87 }else{
88 break;
89 }
90 }
91 if(argc>=1){
92 fn_r = *argv;
93 argc--;argv++;
94 }else{
95 fn_r = NULL;
96 }
97 if(argc>=1){
98 fn_w = *argv;
99 argc--;argv++;
100 }else{
101 fn_w = NULL;
102 }
103 {
104 int len;
105 char buff[0x1000];
106 char mode[10];
107
108 if(decompress){
109 BZFILE *BZ2fp_r;
110 FILE *fp_w;
111
112 if(fn_w){
113 if((fp_w = fopen(fn_w,"wb"))==NULL){
114 printf("can't open [%s]\n",fn_w);
115 perror("reason:");
116 exit(1);
117 }
118 }else{
119 fp_w = stdout;
120 }
121 if((fn_r == NULL && (BZ2fp_r = bzdopen(fileno(stdin),"rb"))==NULL)
122 || (fn_r != NULL && (BZ2fp_r = bzopen(fn_r,"rb"))==NULL)){
123 printf("can't bz2openstream\n");
124 exit(1);
125 }
126 while((len=bzread(BZ2fp_r,buff,0x1000))>0){
127 fwrite(buff,1,len,fp_w);
128 }
129 bzclose(BZ2fp_r);
130 if(fp_w != stdout) fclose(fp_w);
131 }else{
132 BZFILE *BZ2fp_w;
133 FILE *fp_r;
134
135 if(fn_r){
136 if((fp_r = fopen(fn_r,"rb"))==NULL){
137 printf("can't open [%s]\n",fn_r);
138 perror("reason:");
139 exit(1);
140 }
141 }else{
142 fp_r = stdin;
143 }
144 mode[0]='w';
145 mode[1] = '0' + level;
146 mode[2] = '\0';
147
148 if((fn_w == NULL && (BZ2fp_w = bzdopen(fileno(stdout),mode))==NULL)
149 || (fn_w !=NULL && (BZ2fp_w = bzopen(fn_w,mode))==NULL)){
150 printf("can't bz2openstream\n");
151 exit(1);
152 }
153 while((len=fread(buff,1,0x1000,fp_r))>0){
154 bzwrite(BZ2fp_w,buff,len);
155 }
156 bzclose(BZ2fp_w);
157 if(fp_r!=stdin)fclose(fp_r);
158 }
159 }
160#ifdef _WIN32
161 BZ2DLLFreeLibrary();
162#endif
163}
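The option loop in `main()` above handles two settings. A `d` flag selects decompression, and a digit `1`..`9` selects the block size, which is then handed to `bzopen()`/`bzdopen()` inside a mode string such as `"w9"`. Below is a minimal sketch that pulls that logic out so it can be tested without loading the DLL. The names `parse_flag_arg` and `make_write_mode` are hypothetical and are not part of libbzip2.

```c
/* Hypothetical helper mirroring the flag loop in dlltest.c's main().
   Scans one "-d9"- or "/d9"-style argument and updates *decompress
   and *level.
   Returns 0 on success, 1 if arg is not an option (main() stops
   scanning there), and -1 on an unrecognised flag character (main()
   prints usage and exits). */
static int parse_flag_arg(const char *arg, int *decompress, int *level)
{
    const char *p;

    if (arg[0] != '-' && arg[0] != '/')
        return 1;                       /* a file name, not an option */
    for (p = arg + 1; *p; p++) {
        if (*p == 'd') {
            *decompress = 1;
        } else if ('1' <= *p && *p <= '9') {
            *level = *p - '0';          /* block size = level x 100k */
        } else {
            return -1;
        }
    }
    return 0;
}

/* Builds the write-mode string used when compressing,
   e.g. level 9 -> "w9". mode must hold at least 3 chars. */
static void make_write_mode(char *mode, int level)
{
    mode[0] = 'w';
    mode[1] = (char)('0' + level);
    mode[2] = '\0';
}
```

Keeping the parsing in a pure function separates it from file and DLL handling, so each flag combination can be checked in isolation.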
diff --git a/dlltest.dsp b/dlltest.dsp
new file mode 100644
index 0000000..4b1615e
--- /dev/null
+++ b/dlltest.dsp
@@ -0,0 +1,93 @@
1# Microsoft Developer Studio Project File - Name="dlltest" - Package Owner=<4>
2# Microsoft Developer Studio Generated Build File, Format Version 5.00
3# ** DO NOT EDIT **
4
5# TARGTYPE "Win32 (x86) Console Application" 0x0103
6
7CFG=dlltest - Win32 Debug
8!MESSAGE This is not a valid makefile. To build this project using NMAKE,
9!MESSAGE use the Export Makefile command and run
10!MESSAGE
11!MESSAGE NMAKE /f "dlltest.mak".
12!MESSAGE
13!MESSAGE You can specify a configuration when running NMAKE
14!MESSAGE by defining the macro CFG on the command line. For example:
15!MESSAGE
16!MESSAGE NMAKE /f "dlltest.mak" CFG="dlltest - Win32 Debug"
17!MESSAGE
18!MESSAGE Possible choices for configuration are:
19!MESSAGE
20!MESSAGE "dlltest - Win32 Release" (based on "Win32 (x86) Console Application")
21!MESSAGE "dlltest - Win32 Debug" (based on "Win32 (x86) Console Application")
22!MESSAGE
23
24# Begin Project
25# PROP Scc_ProjName ""
26# PROP Scc_LocalPath ""
27CPP=cl.exe
28RSC=rc.exe
29
30!IF "$(CFG)" == "dlltest - Win32 Release"
31
32# PROP BASE Use_MFC 0
33# PROP BASE Use_Debug_Libraries 0
34# PROP BASE Output_Dir "Release"
35# PROP BASE Intermediate_Dir "Release"
36# PROP BASE Target_Dir ""
37# PROP Use_MFC 0
38# PROP Use_Debug_Libraries 0
39# PROP Output_Dir "Release"
40# PROP Intermediate_Dir "Release"
41# PROP Ignore_Export_Lib 0
42# PROP Target_Dir ""
43# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
44# ADD CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
45# ADD BASE RSC /l 0x411 /d "NDEBUG"
46# ADD RSC /l 0x411 /d "NDEBUG"
47BSC32=bscmake.exe
48# ADD BASE BSC32 /nologo
49# ADD BSC32 /nologo
50LINK32=link.exe
51# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386
52# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 /out:"minibz2.exe"
53
54!ELSEIF "$(CFG)" == "dlltest - Win32 Debug"
55
56# PROP BASE Use_MFC 0
57# PROP BASE Use_Debug_Libraries 1
58# PROP BASE Output_Dir "dlltest_"
59# PROP BASE Intermediate_Dir "dlltest_"
60# PROP BASE Target_Dir ""
61# PROP Use_MFC 0
62# PROP Use_Debug_Libraries 1
63# PROP Output_Dir "dlltest_"
64# PROP Intermediate_Dir "dlltest_"
65# PROP Ignore_Export_Lib 0
66# PROP Target_Dir ""
67# ADD BASE CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
68# ADD CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c
69# ADD BASE RSC /l 0x411 /d "_DEBUG"
70# ADD RSC /l 0x411 /d "_DEBUG"
71BSC32=bscmake.exe
72# ADD BASE BSC32 /nologo
73# ADD BSC32 /nologo
74LINK32=link.exe
75# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept
76# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /out:"minibz2.exe" /pdbtype:sept
77
78!ENDIF
79
80# Begin Target
81
82# Name "dlltest - Win32 Release"
83# Name "dlltest - Win32 Debug"
84# Begin Source File
85
86SOURCE=.\bzlib.h
87# End Source File
88# Begin Source File
89
90SOURCE=.\dlltest.c
91# End Source File
92# End Target
93# End Project
diff --git a/howbig.c b/howbig.c
new file mode 100644
index 0000000..9f2ad7c
--- /dev/null
+++ b/howbig.c
@@ -0,0 +1,37 @@
1
2#include <stdio.h>
#include <stdlib.h> /* random() */
3#include <assert.h>
4#include "bzlib.h"
5
6unsigned char ibuff[1000000];
7unsigned char obuff[1000000];
8
9void doone ( int n )
10{
11 int i, j, k, q, nobuff;
12 q = 0;
13
14 for (k = 0; k < 1; k++) {
15 for (i = 0; i < n; i++)
16 ibuff[i] = ((unsigned long)(random())) & 0xff;
17 nobuff = 1000000;
18 j = bzBuffToBuffCompress ( obuff, &nobuff, ibuff, n, 9,0,0 );
19 assert (j == BZ_OK);
20 if (nobuff > q) q = nobuff;
21 }
22 printf ( "%d %d(%d)\n", n, q, (int)((float)n * 1.01 - (float)q) );
23}
24
25int main ( int argc, char** argv )
26{
27 int i;
28 i = 0;
29 while (1) {
30 if (i >= 900000) break;
31 doone(i);
32 if ( (int)(1.10 * i) > i )
33 i = (int)(1.10 * i); else i++;
34 }
35
36 return 0;
37} \ No newline at end of file
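`howbig.c` compresses buffers of pseudo-random bytes of increasing size with `bzBuffToBuffCompress`. For each size `n` it prints the compressed size `q` and the slack `n * 1.01 - q`. The sketch below restates its size schedule in plain C, together with a buffer-sizing rule. The "1% plus 600 bytes" bound is an assumption taken from the documentation of later libbzip2 releases. This file does not establish it for 0.9.0c. `compress_bound_guess` and `next_test_size` are hypothetical names.

```c
/* Assumed output-buffer size for one-shot compression of n bytes:
   n plus 1% plus 600 bytes of fixed overhead (rule of thumb from
   later libbzip2 documentation, not verified for 0.9.0c). */
static unsigned int compress_bound_guess(unsigned int n)
{
    return n + n / 100 + 600;
}

/* Mirrors howbig.c's loop: grow the test size by about 10% per step,
   but always advance by at least one byte so every small size is
   visited. */
static int next_test_size(int i)
{
    int grown = (int)(1.10 * i);
    return grown > i ? grown : i + 1;
}
```

Stepping geometrically keeps the run short while still probing sizes up to 900,000 bytes. The `i + 1` fallback is needed because `(int)(1.10 * i)` equals `i` for every size below 10.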
diff --git a/huffman.c b/huffman.c
new file mode 100644
index 0000000..8254990
--- /dev/null
+++ b/huffman.c
@@ -0,0 +1,228 @@
1
2/*-------------------------------------------------------------*/
3/*--- Huffman coding low-level stuff ---*/
4/*--- huffman.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#include "bzlib_private.h"
63
64/*---------------------------------------------------*/
65#define WEIGHTOF(zz0) ((zz0) & 0xffffff00)
66#define DEPTHOF(zz1) ((zz1) & 0x000000ff)
67#define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3))
68
69#define ADDWEIGHTS(zw1,zw2) \
70 (WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \
71 (1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2)))
72
73#define UPHEAP(z) \
74{ \
75 Int32 zz, tmp; \
76 zz = z; tmp = heap[zz]; \
77 while (weight[tmp] < weight[heap[zz >> 1]]) { \
78 heap[zz] = heap[zz >> 1]; \
79 zz >>= 1; \
80 } \
81 heap[zz] = tmp; \
82}
83
84#define DOWNHEAP(z) \
85{ \
86 Int32 zz, yy, tmp; \
87 zz = z; tmp = heap[zz]; \
88 while (True) { \
89 yy = zz << 1; \
90 if (yy > nHeap) break; \
91 if (yy < nHeap && \
92 weight[heap[yy+1]] < weight[heap[yy]]) \
93 yy++; \
94 if (weight[tmp] < weight[heap[yy]]) break; \
95 heap[zz] = heap[yy]; \
96 zz = yy; \
97 } \
98 heap[zz] = tmp; \
99}
100
101
102/*---------------------------------------------------*/
103void hbMakeCodeLengths ( UChar *len,
104 Int32 *freq,
105 Int32 alphaSize,
106 Int32 maxLen )
107{
108 /*--
109 Nodes and heap entries run from 1. Entry 0
110 for both the heap and nodes is a sentinel.
111 --*/
112 Int32 nNodes, nHeap, n1, n2, i, j, k;
113 Bool tooLong;
114
115 Int32 heap [ BZ_MAX_ALPHA_SIZE + 2 ];
116 Int32 weight [ BZ_MAX_ALPHA_SIZE * 2 ];
117 Int32 parent [ BZ_MAX_ALPHA_SIZE * 2 ];
118
119 for (i = 0; i < alphaSize; i++)
120 weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8;
121
122 while (True) {
123
124 nNodes = alphaSize;
125 nHeap = 0;
126
127 heap[0] = 0;
128 weight[0] = 0;
129 parent[0] = -2;
130
131 for (i = 1; i <= alphaSize; i++) {
132 parent[i] = -1;
133 nHeap++;
134 heap[nHeap] = i;
135 UPHEAP(nHeap);
136 }
137
138 AssertH( nHeap < (BZ_MAX_ALPHA_SIZE+2), 2001 );
139
140 while (nHeap > 1) {
141 n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
142 n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1);
143 nNodes++;
144 parent[n1] = parent[n2] = nNodes;
145 weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]);
146 parent[nNodes] = -1;
147 nHeap++;
148 heap[nHeap] = nNodes;
149 UPHEAP(nHeap);
150 }
151
152 AssertH( nNodes < (BZ_MAX_ALPHA_SIZE * 2), 2002 );
153
154 tooLong = False;
155 for (i = 1; i <= alphaSize; i++) {
156 j = 0;
157 k = i;
158 while (parent[k] >= 0) { k = parent[k]; j++; }
159 len[i-1] = j;
160 if (j > maxLen) tooLong = True;
161 }
162
163 if (! tooLong) break;
164
165 for (i = 1; i <= alphaSize; i++) {
166 j = weight[i] >> 8;
167 j = 1 + (j / 2);
168 weight[i] = j << 8;
169 }
170 }
171}
172
173
174/*---------------------------------------------------*/
175void hbAssignCodes ( Int32 *code,
176 UChar *length,
177 Int32 minLen,
178 Int32 maxLen,
179 Int32 alphaSize )
180{
181 Int32 n, vec, i;
182
183 vec = 0;
184 for (n = minLen; n <= maxLen; n++) {
185 for (i = 0; i < alphaSize; i++)
186 if (length[i] == n) { code[i] = vec; vec++; };
187 vec <<= 1;
188 }
189}
190
191
192/*---------------------------------------------------*/
193void hbCreateDecodeTables ( Int32 *limit,
194 Int32 *base,
195 Int32 *perm,
196 UChar *length,
197 Int32 minLen,
198 Int32 maxLen,
199 Int32 alphaSize )
200{
201 Int32 pp, i, j, vec;
202
203 pp = 0;
204 for (i = minLen; i <= maxLen; i++)
205 for (j = 0; j < alphaSize; j++)
206 if (length[j] == i) { perm[pp] = j; pp++; };
207
208 for (i = 0; i < BZ_MAX_CODE_LEN; i++) base[i] = 0;
209 for (i = 0; i < alphaSize; i++) base[length[i]+1]++;
210
211 for (i = 1; i < BZ_MAX_CODE_LEN; i++) base[i] += base[i-1];
212
213 for (i = 0; i < BZ_MAX_CODE_LEN; i++) limit[i] = 0;
214 vec = 0;
215
216 for (i = minLen; i <= maxLen; i++) {
217 vec += (base[i+1] - base[i]);
218 limit[i] = vec-1;
219 vec <<= 1;
220 }
221 for (i = minLen + 1; i <= maxLen; i++)
222 base[i] = ((limit[i-1] + 1) << 1) - base[i];
223}
224
225
226/*-------------------------------------------------------------*/
227/*--- end huffman.c ---*/
228/*-------------------------------------------------------------*/
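`hbAssignCodes` and `hbCreateDecodeTables` build canonical Huffman codes. Symbols are numbered in order of increasing code length, so a decoder needs only three tables: the largest code value per length (`limit`), an offset per length (`base`), and the length-sorted symbol list (`perm`). The sketch below copies both functions using standard C types. `SKETCH_MAX_CODE_LEN` stands in for `BZ_MAX_CODE_LEN` and is assumed to be 23. The sketch also adds a hypothetical `decode_symbol` that keeps reading bits until the accumulated value is no larger than `limit[zn]`. For illustration only, it reads bits from a `'0'`/`'1'` string. Its only error check rejects codes longer than `maxLen`.

```c
#define SKETCH_MAX_CODE_LEN 23  /* assumed to match BZ_MAX_CODE_LEN */

/* Same logic as hbAssignCodes(): canonical codes, shortest first. */
static void assign_codes(int *code, const unsigned char *length,
                         int minLen, int maxLen, int alphaSize)
{
    int n, i, vec = 0;
    for (n = minLen; n <= maxLen; n++) {
        for (i = 0; i < alphaSize; i++)
            if (length[i] == n) code[i] = vec++;
        vec <<= 1;                      /* next length: append a 0 bit */
    }
}

/* Same logic as hbCreateDecodeTables(). */
static void create_decode_tables(int *limit, int *base, int *perm,
                                 const unsigned char *length,
                                 int minLen, int maxLen, int alphaSize)
{
    int pp = 0, i, j, vec = 0;
    for (i = minLen; i <= maxLen; i++)          /* symbols sorted by length */
        for (j = 0; j < alphaSize; j++)
            if (length[j] == i) perm[pp++] = j;
    for (i = 0; i < SKETCH_MAX_CODE_LEN; i++) base[i] = 0;
    for (i = 0; i < alphaSize; i++) base[length[i] + 1]++;
    for (i = 1; i < SKETCH_MAX_CODE_LEN; i++) base[i] += base[i - 1];
    for (i = 0; i < SKETCH_MAX_CODE_LEN; i++) limit[i] = 0;
    for (i = minLen; i <= maxLen; i++) {
        vec += base[i + 1] - base[i];           /* count of codes of length i */
        limit[i] = vec - 1;                     /* largest code of length i */
        vec <<= 1;
    }
    for (i = minLen + 1; i <= maxLen; i++)
        base[i] = ((limit[i - 1] + 1) << 1) - base[i];
}

/* Hypothetical decoder: consumes bits MSB-first from *bits and
   returns the decoded symbol, or -1 if no code of length <= maxLen
   matches. Assumes the string holds enough bits. */
static int decode_symbol(const char **bits, const int *limit,
                         const int *base, const int *perm,
                         int minLen, int maxLen)
{
    int zn = minLen, zvec = 0, k;
    for (k = 0; k < minLen; k++)
        zvec = (zvec << 1) | (*(*bits)++ - '0');
    while (zvec > limit[zn]) {
        if (++zn > maxLen) return -1;
        zvec = (zvec << 1) | (*(*bits)++ - '0');
    }
    return perm[zvec - base[zn]];
}
```

With code lengths {2, 1, 3, 3}, symbol 1 gets code `0` and symbol 0 gets `10`. Symbols 2 and 3 get `110` and `111`. The bit string `0101101110` therefore decodes to symbols 1, 0, 2, 3, 1.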
diff --git a/libbz2.def b/libbz2.def
new file mode 100644
index 0000000..ba0f54e
--- /dev/null
+++ b/libbz2.def
@@ -0,0 +1,25 @@
1LIBRARY LIBBZ2
2DESCRIPTION "libbzip2: library for data compression"
3EXPORTS
4 bzCompressInit
5 bzCompress
6 bzCompressEnd
7 bzDecompressInit
8 bzDecompress
9 bzDecompressEnd
10 bzReadOpen
11 bzReadClose
12 bzReadGetUnused
13 bzRead
14 bzWriteOpen
15 bzWrite
16 bzWriteClose
17 bzBuffToBuffCompress
18 bzBuffToBuffDecompress
19 bzlibVersion
20 bzopen
21 bzdopen
22 bzread
23 bzwrite
24 bzflush
25 bzclose
diff --git a/libbz2.dsp b/libbz2.dsp
new file mode 100644
index 0000000..a21a20f
--- /dev/null
+++ b/libbz2.dsp
@@ -0,0 +1,130 @@
1# Microsoft Developer Studio Project File - Name="libbz2" - Package Owner=<4>
2# Microsoft Developer Studio Generated Build File, Format Version 5.00
3# ** DO NOT EDIT **
4
5# TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102
6
7CFG=libbz2 - Win32 Debug
8!MESSAGE This is not a valid makefile. To build this project using NMAKE,
9!MESSAGE use the Export Makefile command and run
10!MESSAGE
11!MESSAGE NMAKE /f "libbz2.mak".
12!MESSAGE
13!MESSAGE You can specify a configuration when running NMAKE
14!MESSAGE by defining the macro CFG on the command line. For example:
15!MESSAGE
16!MESSAGE NMAKE /f "libbz2.mak" CFG="libbz2 - Win32 Debug"
17!MESSAGE
18!MESSAGE Possible choices for configuration are:
19!MESSAGE
20!MESSAGE "libbz2 - Win32 Release" (based on "Win32 (x86) Dynamic-Link Library")
21!MESSAGE "libbz2 - Win32 Debug" (based on "Win32 (x86) Dynamic-Link Library")
22!MESSAGE
23
24# Begin Project
25# PROP Scc_ProjName ""
26# PROP Scc_LocalPath ""
27CPP=cl.exe
28MTL=midl.exe
29RSC=rc.exe
30
31!IF "$(CFG)" == "libbz2 - Win32 Release"
32
33# PROP BASE Use_MFC 0
34# PROP BASE Use_Debug_Libraries 0
35# PROP BASE Output_Dir "Release"
36# PROP BASE Intermediate_Dir "Release"
37# PROP BASE Target_Dir ""
38# PROP Use_MFC 0
39# PROP Use_Debug_Libraries 0
40# PROP Output_Dir "Release"
41# PROP Intermediate_Dir "Release"
42# PROP Ignore_Export_Lib 0
43# PROP Target_Dir ""
44# ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
45# ADD CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c
46# ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
47# ADD MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32
48# ADD BASE RSC /l 0x411 /d "NDEBUG"
49# ADD RSC /l 0x411 /d "NDEBUG"
50BSC32=bscmake.exe
51# ADD BASE BSC32 /nologo
52# ADD BSC32 /nologo
53LINK32=link.exe
54# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386
55# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 /out:"libbz2.dll"
56
57!ELSEIF "$(CFG)" == "libbz2 - Win32 Debug"
58
59# PROP BASE Use_MFC 0
60# PROP BASE Use_Debug_Libraries 1
61# PROP BASE Output_Dir "Debug"
62# PROP BASE Intermediate_Dir "Debug"
63# PROP BASE Target_Dir ""
64# PROP Use_MFC 0
65# PROP Use_Debug_Libraries 1
66# PROP Output_Dir "Debug"
67# PROP Intermediate_Dir "Debug"
68# PROP Ignore_Export_Lib 0
69# PROP Target_Dir ""
70# ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
71# ADD CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c
72# ADD BASE MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
73# ADD MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32
74# ADD BASE RSC /l 0x411 /d "_DEBUG"
75# ADD RSC /l 0x411 /d "_DEBUG"
76BSC32=bscmake.exe
77# ADD BASE BSC32 /nologo
78# ADD BSC32 /nologo
79LINK32=link.exe
80# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /pdbtype:sept
81# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /out:"libbz2.dll" /pdbtype:sept
82
83!ENDIF
84
85# Begin Target
86
87# Name "libbz2 - Win32 Release"
88# Name "libbz2 - Win32 Debug"
89# Begin Source File
90
91SOURCE=.\blocksort.c
92# End Source File
93# Begin Source File
94
95SOURCE=.\bzlib.c
96# End Source File
97# Begin Source File
98
99SOURCE=.\bzlib.h
100# End Source File
101# Begin Source File
102
103SOURCE=.\bzlib_private.h
104# End Source File
105# Begin Source File
106
107SOURCE=.\compress.c
108# End Source File
109# Begin Source File
110
111SOURCE=.\crctable.c
112# End Source File
113# Begin Source File
114
115SOURCE=.\decompress.c
116# End Source File
117# Begin Source File
118
119SOURCE=.\huffman.c
120# End Source File
121# Begin Source File
122
123SOURCE=.\libbz2.def
124# End Source File
125# Begin Source File
126
127SOURCE=.\randtable.c
128# End Source File
129# End Target
130# End Project
diff --git a/manual.texi b/manual.texi
new file mode 100644
index 0000000..99ce661
--- /dev/null
+++ b/manual.texi
@@ -0,0 +1,2100 @@
1\input texinfo @c -*- Texinfo -*-
2@setfilename bzip2.info
3
4@ignore
5This file documents bzip2 version 0.9.0c, and associated library
6libbzip2, written by Julian Seward (jseward@acm.org).
7
8Copyright (C) 1996-1998 Julian R Seward
9
10Permission is granted to make and distribute verbatim copies of
11this manual provided the copyright notice and this permission notice
12are preserved on all copies.
13
14Permission is granted to copy and distribute translations of this manual
15into another language, under the above conditions for verbatim copies.
16@end ignore
17
18@ifinfo
19@format
20START-INFO-DIR-ENTRY
21* Bzip2: (bzip2). A program and library for data compression.
22END-INFO-DIR-ENTRY
23@end format
24
25@end ifinfo
26
27@iftex
28@c @finalout
29@settitle bzip2 and libbzip2
30@titlepage
31@title bzip2 and libbzip2
32@subtitle a program and library for data compression
33@subtitle copyright (C) 1996-1998 Julian Seward
34@subtitle version 0.9.0c of 18 October 1998
35@author Julian Seward
36
37@end titlepage
38@end iftex
39
40
41@parindent 0mm
42@parskip 2mm
43
44
45This program, @code{bzip2},
46and associated library @code{libbzip2}, are
47Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
48
49Redistribution and use in source and binary forms, with or without
50modification, are permitted provided that the following conditions
51are met:
52@itemize @bullet
53@item
54 Redistributions of source code must retain the above copyright
55 notice, this list of conditions and the following disclaimer.
56@item
57 The origin of this software must not be misrepresented; you must
58 not claim that you wrote the original software. If you use this
59 software in a product, an acknowledgment in the product
60 documentation would be appreciated but is not required.
61@item
62 Altered source versions must be plainly marked as such, and must
63 not be misrepresented as being the original software.
64@item
65 The name of the author may not be used to endorse or promote
66 products derived from this software without specific prior written
67 permission.
68@end itemize
69THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
70OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
71WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
72ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
73DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
74DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
75GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
76INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
77WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
78NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
79SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
80
81Julian Seward, Guildford, Surrey, UK.
82
83@code{jseward@@acm.org}
84
85@code{http://www.muraroa.demon.co.uk}
86
87@code{bzip2}/@code{libbzip2} version 0.9.0c of 18 October 1998.
88
89PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
90algorithms. However, I do not have the resources available to carry out
91a full patent search. Therefore I cannot give any guarantee of the
92above statement.
93
94
95
96
97
98
99
100@node Overview, Implementation, Top, Top
101@chapter Introduction
102
103@code{bzip2} compresses files using the Burrows-Wheeler
104block-sorting text compression algorithm, and Huffman coding.
105Compression is generally considerably better than that
106achieved by more conventional LZ77/LZ78-based compressors,
107and approaches the performance of the PPM family of statistical compressors.
108
109@code{bzip2} is built on top of @code{libbzip2}, a flexible library
110for handling compressed data in the @code{bzip2} format. This manual
111describes both how to use the program and
112how to work with the library interface. Most of the
113manual is devoted to this library, not the program,
114which is good news if your interest is only in the program.
115
116Chapter 2 describes how to use @code{bzip2}; this is the only part
117you need to read if you just want to know how to operate the program.
118Chapter 3 describes the programming interfaces in detail, and
119Chapter 4 records some miscellaneous notes which I thought
120ought to be recorded somewhere.
121
122
123@chapter How to use @code{bzip2}
124
125This chapter contains a copy of the @code{bzip2} man page,
126and nothing else.
127@example
128NAME
129 bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
130 bzcat - decompresses files to stdout
131 bzip2recover - recovers data from damaged bzip2 files
132
133
134SYNOPSIS
135 bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ]
136 bunzip2 [ -fkvsVL ] [ filenames ... ]
137 bzcat [ -s ] [ filenames ... ]
138 bzip2recover filename
139
140
141DESCRIPTION
142 bzip2 compresses files using the Burrows-Wheeler block-
143 sorting text compression algorithm, and Huffman coding.
144 Compression is generally considerably better than that
145 achieved by more conventional LZ77/LZ78-based compressors,
146 and approaches the performance of the PPM family of sta-
147 tistical compressors.
148
149 The command-line options are deliberately very similar to
150 those of GNU Gzip, but they are not identical.
151
152 bzip2 expects a list of file names to accompany the com-
153 mand-line flags. Each file is replaced by a compressed
154 version of itself, with the name "original_name.bz2".
155 Each compressed file has the same modification date and
156 permissions as the corresponding original, so that these
157 properties can be correctly restored at decompression
158 time. File name handling is naive in the sense that there
159 is no mechanism for preserving original file names, per-
160 missions and dates in filesystems which lack these con-
161 cepts, or have serious file name length restrictions, such
162 as MS-DOS.
163
164 bzip2 and bunzip2 will by default not overwrite existing
165 files; if you want this to happen, specify the -f flag.
166
167 If no file names are specified, bzip2 compresses from
168 standard input to standard output. In this case, bzip2
169 will decline to write compressed output to a terminal, as
170 this would be entirely incomprehensible and therefore
171 pointless.
172
173 bunzip2 (or bzip2 -d ) decompresses and restores all spec-
174 ified files whose names end in ".bz2". Files without this
175 suffix are ignored. Again, supplying no filenames causes
176 decompression from standard input to standard output.
177
178 bunzip2 will correctly decompress a file which is the con-
179 catenation of two or more compressed files. The result is
180 the concatenation of the corresponding uncompressed files.
181 Integrity testing (-t) of concatenated compressed files is
182 also supported.
183
184 You can also compress or decompress files to the standard
185 output by giving the -c flag. Multiple files may be com-
186 pressed and decompressed like this. The resulting outputs
187 are fed sequentially to stdout. Compression of multiple
188 files in this manner generates a stream containing multi-
189 ple compressed file representations. Such a stream can be
190 decompressed correctly only by bzip2 version 0.9.0 or
191 later. Earlier versions of bzip2 will stop after decom-
192 pressing the first file in the stream.
193
194 bzcat (or bzip2 -dc ) decompresses all specified files to
195 the standard output.
196
197 Compression is always performed, even if the compressed
198 file is slightly larger than the original. Files of less
199 than about one hundred bytes tend to get larger, since the
200 compression mechanism has a constant overhead in the
201 region of 50 bytes. Random data (including the output of
202 most file compressors) is coded at about 8.05 bits per
203 byte, giving an expansion of around 0.5%.
204
205 As a self-check for your protection, bzip2 uses 32-bit
206 CRCs to make sure that the decompressed version of a file
207 is identical to the original. This guards against corrup-
208 tion of the compressed data, and against undetected bugs
209 in bzip2 (hopefully very unlikely). The chances of data
210 corruption going undetected are microscopic, about one
211 chance in four billion for each file processed. Be aware,
212 though, that the check occurs upon decompression, so it
213 can only tell you that something is wrong. It can't
214 help you recover the original uncompressed data. You can
215 use bzip2recover to try to recover data from damaged
216 files.
217
218 Return values: 0 for a normal exit, 1 for environmental
219 problems (file not found, invalid flags, I/O errors, &c),
220 2 to indicate a corrupt compressed file, 3 for an internal
221 consistency error (eg, bug) which caused bzip2 to panic.
222
223
224MEMORY MANAGEMENT
225 Bzip2 compresses large files in blocks. The block size
226 affects both the compression ratio achieved, and the
227 amount of memory needed both for compression and decom-
228 pression. The flags -1 through -9 specify the block size
229 to be 100,000 bytes through 900,000 bytes (the default)
230 respectively. At decompression-time, the block size used
231 for compression is read from the header of the compressed
232 file, and bunzip2 then allocates itself just enough memory
233 to decompress the file. Since block sizes are stored in
234 compressed files, it follows that the flags -1 to -9 are
235 irrelevant to and so ignored during decompression.
236
237 Compression and decompression requirements, in bytes, can
238 be estimated as:
239
240 Compression: 400k + ( 7 x block size )
241
242 Decompression: 100k + ( 4 x block size ), or
243 100k + ( 2.5 x block size ) with -s
244
245 Larger block sizes give rapidly diminishing marginal
246 returns; most of the compression comes from the first two
247 or three hundred k of block size, a fact worth bearing in
248 mind when using bzip2 on small machines. It is also
249 important to appreciate that the decompression memory
250 requirement is set at compression-time by the choice of
251 block size.
252
253 For files compressed with the default 900k block size,
254 bunzip2 will require about 3700 kbytes to decompress. To
255 support decompression of any file on a 4 megabyte machine,
256 bunzip2 has an option to decompress using approximately
257 half this amount of memory, about 2300 kbytes. Decompres-
258 sion speed is also halved, so you should use this option
259 only where necessary. The relevant flag is -s.
260
261 In general, try to use the largest block size memory con-
262 straints allow, since that maximises the compression
263 achieved. Compression and decompression speed are virtu-
264 ally unaffected by block size.
265
266 Another significant point applies to files which fit in a
267 single block -- that means most files you'd encounter
268 using a large block size. The amount of real memory
269 touched is proportional to the size of the file, since the
270 file is smaller than a block. For example, compressing a
271 file 20,000 bytes long with the flag -9 will cause the
272 compressor to allocate around 6700k of memory, but only
273 touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the
274 decompressor will allocate 3700k but only touch 100k +
275 20000 * 4 = 180 kbytes.
276
277 Here is a table which summarises the maximum memory usage
278 for different block sizes. Also recorded is the total
279 compressed size for 14 files of the Calgary Text Compres-
280 sion Corpus totalling 3,141,622 bytes. This column gives
281 some feel for how compression varies with block size.
282 These figures tend to understate the advantage of larger
283 block sizes for larger files, since the Corpus is domi-
284 nated by smaller files.
285
286 Compress Decompress Decompress Corpus
287 Flag usage usage -s usage Size
288
289 -1 1100k 500k 350k 914704
290 -2 1800k 900k 600k 877703
291 -3 2500k 1300k 850k 860338
292 -4 3200k 1700k 1100k 846899
293 -5 3900k 2100k 1350k 845160
294 -6 4600k 2500k 1600k 838626
295 -7 5400k 2900k 1850k 834096
296 -8 6000k 3300k 2100k 828642
297 -9 6700k 3700k 2350k 828642
298
299
300OPTIONS
301 -c --stdout
302 Compress or decompress to standard output. -c will
303 decompress multiple files to stdout, but will only
304 compress a single file to stdout.
305
306 -d --decompress
307 Force decompression. bzip2, bunzip2 and bzcat are
308 really the same program, and the decision about
309 what actions to take is done on the basis of which
310 name is used. This flag overrides that mechanism,
311 and forces bzip2 to decompress.
312
313 -z --compress
314 The complement to -d: forces compression, regard-
315 less of the invocation name.
316
317 -t --test
318 Check integrity of the specified file(s), but don't
319 decompress them. This really performs a trial
320 decompression and throws away the result.
321
322 -f --force
323 Force overwrite of output files. Normally, bzip2
324 will not overwrite existing output files.
325
326 -k --keep
327 Keep (don't delete) input files during compression
328 or decompression.
329
330 -s --small
331 Reduce memory usage, for compression, decompression
332 and testing. Files are decompressed and tested
333 using a modified algorithm which only requires 2.5
334 bytes per block byte. This means any file can be
335 decompressed in 2300k of memory, albeit at about
336 half the normal speed.
337
338 During compression, -s selects a block size of
339 200k, which limits memory use to around the same
340 figure, at the expense of your compression ratio.
341 In short, if your machine is low on memory (8
342 megabytes or less), use -s for everything. See
343 MEMORY MANAGEMENT above.
344
345 -v --verbose
346 Verbose mode -- show the compression ratio for each
347 file processed. Further -v's increase the ver-
348 bosity level, spewing out lots of information which
349 is primarily of interest for diagnostic purposes.
350
351 -L --license -V --version
352 Display the software version, license terms and
353 conditions.
354
355 -1 to -9
356 Set the block size to 100 k, 200 k .. 900 k when
357 compressing. Has no effect when decompressing.
358 See MEMORY MANAGEMENT above.
359
360 --repetitive-fast
361 bzip2 injects some small pseudo-random variations
362 into very repetitive blocks to limit worst-case
363 performance during compression. If sorting runs
364 into difficulties, the block is randomised, and
365 sorting is restarted. Very roughly, bzip2 persists
366 for three times as long as a well-behaved input
367 would take before resorting to randomisation. This
368 flag makes it give up much sooner.
369
370 --repetitive-best
371 Opposite of --repetitive-fast; try a lot harder
372 before resorting to randomisation.
373
374
375RECOVERING DATA FROM DAMAGED FILES
376 bzip2 compresses files in blocks, usually 900kbytes long.
377 Each block is handled independently. If a media or trans-
378 mission error causes a multi-block .bz2 file to become
379 damaged, it may be possible to recover data from the
380 undamaged blocks in the file.
381
382 The compressed representation of each block is delimited
383 by a 48-bit pattern, which makes it possible to find the
384 block boundaries with reasonable certainty. Each block
385 also carries its own 32-bit CRC, so damaged blocks can be
386 distinguished from undamaged ones.
387
388 bzip2recover is a simple program whose purpose is to
389 search for blocks in .bz2 files, and write each block out
390 into its own .bz2 file. You can then use bzip2 -t to test
391 the integrity of the resulting files, and decompress those
392 which are undamaged.
393
394 bzip2recover takes a single argument, the name of the dam-
395 aged file, and writes a number of files "rec0001file.bz2",
396 "rec0002file.bz2", etc, containing the extracted blocks.
397 The output filenames are designed so that the use of
398 wildcards in subsequent processing -- for example, "bzip2
399 -dc rec*file.bz2 > recovered_data" -- lists the files in
400 the "right" order.
401
402 bzip2recover should be of most use dealing with large .bz2
403 files, as these will contain many blocks. It is clearly
404 futile to use it on damaged single-block files, since a
405 damaged block cannot be recovered. If you wish to min-
406 imise any potential data loss through media or transmis-
407 sion errors, you might consider compressing with a smaller
408 block size.
409
410
411PERFORMANCE NOTES
412 The sorting phase of compression gathers together similar
413 strings in the file. Because of this, files containing
414 very long runs of repeated symbols, like "aabaabaabaab
415 ..." (repeated several hundred times) may compress
416 extraordinarily slowly. You can use the -vvvvv option to
417 monitor progress in great detail, if you want. Decompres-
418 sion speed is unaffected.
419
420 Such pathological cases seem rare in practice, appearing
421 mostly in artificially-constructed test files, and in low-
422 level disk images. It may be inadvisable to use bzip2 to
423 compress the latter. If you do get a file which causes
424 severe slowness in compression, try making the block size
425 as small as possible, with flag -1.
426
427 bzip2 usually allocates several megabytes of memory to
428 operate in, and then charges all over it in a fairly ran-
429 dom fashion. This means that performance, both for com-
430 pressing and decompressing, is largely determined by the
431 speed at which your machine can service cache misses.
432 Because of this, small changes to the code to reduce the
433 miss rate have been observed to give disproportionately
434 large performance improvements. I imagine bzip2 will per-
435 form best on machines with very large caches.
436
437
438CAVEATS
439 I/O error messages are not as helpful as they could be.
440 Bzip2 tries hard to detect I/O errors and exit cleanly,
441 but the details of what the problem is sometimes seem
442 rather misleading.
443
444 This manual page pertains to version 0.9.0 of bzip2. Com-
445 pressed data created by this version is entirely forwards
446 and backwards compatible with the previous public release,
447 version 0.1pl2, but with the following exception: 0.9.0
448 can correctly decompress multiple concatenated compressed
449 files. 0.1pl2 cannot do this; it will stop after decom-
450 pressing just the first file in the stream.
451
452 Wildcard expansion for Windows 95 and NT is flaky.
453
454 bzip2recover uses 32-bit integers to represent bit posi-
455 tions in compressed files, so it cannot handle compressed
456 files more than 512 megabytes long. This could easily be
457 fixed.
458
459
460AUTHOR
461 Julian Seward, jseward@@acm.org.
462
463 The ideas embodied in bzip2 are due to (at least) the fol-
464 lowing people: Michael Burrows and David Wheeler (for the
465 block sorting transformation), David Wheeler (again, for
466 the Huffman coder), Peter Fenwick (for the structured cod-
467 ing model in the original bzip, and many refinements), and
468 Alistair Moffat, Radford Neal and Ian Witten (for the
469 arithmetic coder in the original bzip). I am much
470 indebted for their help, support and advice. See the man-
471 ual in the source distribution for pointers to sources of
472 documentation. Christian von Roques encouraged me to look
473 for faster sorting algorithms, so as to speed up compres-
474 sion. Bela Lubkin encouraged me to improve the worst-case
475 compression performance. Many people sent patches, helped
476 with portability problems, lent machines, gave advice and
477 were generally helpful.
478@end example
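The zero-padded numbering of the @code{bzip2recover} output files is what lets a wildcard such as @code{rec*file.bz2} expand in block order. The sketch below illustrates the idea.

The helper names @code{recovery_name} and @code{sorts_before} are hypothetical. The exact @code{rec%04d} layout is an assumption inferred from the example file names in the manual page above, not taken from the @code{bzip2recover} source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a name such as "rec0001file.bz2" for block
   number blockNo extracted from "file.bz2". The "rec" + 4-digit
   zero-padded number + original name layout is an assumption based on
   the example names in the manual page. */
void recovery_name(char *out, size_t outsz, int blockNo, const char *orig)
{
    snprintf(out, outsz, "rec%04d%s", blockNo, orig);
}

/* Returns nonzero if name a sorts before name b in plain byte order,
   which is how a shell wildcard typically orders digit-only differences. */
int sorts_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

Without the padding, @code{rec10file.bz2} would sort before @code{rec2file.bz2} in plain byte order, and the recovered blocks would be concatenated out of order.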
479
480
481
482
483
484@chapter Programming with @code{libbzip2}
485
486This chapter describes the programming interface to @code{libbzip2}.
487
488For general background information, particularly about memory
489use and performance aspects, you'd be well advised to read Chapter 2
490as well.
491
492@section Top-level structure
493
494@code{libbzip2} is a flexible library for compressing and decompressing
495data in the @code{bzip2} data format. Although packaged as a single
496entity, it helps to regard the library as three separate parts: the low-level
497interface, the high-level interface, and some utility
498functions.
499
500The structure of @code{libbzip2}'s interfaces is similar to
501that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib}
502library.
503
504@subsection Low-level summary
505
506This interface provides services for compressing and decompressing
507data in memory. There's no provision for dealing with files, streams
508or any other I/O mechanisms, just straight memory-to-memory work.
509In fact, this part of the library can be compiled without inclusion
510of @code{stdio.h}, which may be helpful for embedded applications.
511
512The low-level part of the library has no global variables and
513is therefore thread-safe.
514
515Six routines make up the low level interface:
516@code{bzCompressInit}, @code{bzCompress}, and @* @code{bzCompressEnd}
517for compression,
518and a corresponding trio @code{bzDecompressInit}, @* @code{bzDecompress}
519and @code{bzDecompressEnd} for decompression.
520The @code{*Init} functions allocate
521memory for compression/decompression and do other
522initialisations, whilst the @code{*End} functions close down operations
523and release memory.
524
525The real work is done by @code{bzCompress} and @code{bzDecompress}.
526These compress/decompress data from a user-supplied input buffer
527to a user-supplied output buffer. These buffers can be any size;
528arbitrary quantities of data are handled by making repeated calls
529to these functions. This is a flexible mechanism allowing a
530consumer-pull style of activity, or producer-push, or a mixture of
531both.
532
533
534
535@subsection High-level summary
536
537This interface provides some handy wrappers around the low-level
538interface to facilitate reading and writing @code{bzip2} format
539files (@code{.bz2} files). The routines provide hooks to facilitate
540reading files in which the @code{bzip2} data stream is embedded
541within some larger-scale file structure, or where there are
542multiple @code{bzip2} data streams concatenated end-to-end.
543
544For reading files, @code{bzReadOpen}, @code{bzRead}, @code{bzReadClose}
545and @code{bzReadGetUnused} are supplied. For writing files,
546@code{bzWriteOpen}, @code{bzWrite} and @code{bzWriteClose} are
547available.
548
549As with the low-level library, no global variables are used
550so the library is per se thread-safe. However, if I/O errors
551occur whilst reading or writing the underlying compressed files,
552you may have to consult @code{errno} to determine the cause of
553the error. In that case, you'd need a C library which correctly
554supports @code{errno} in a multithreaded environment.
555
556To make the library a little simpler and more portable,
557@code{bzReadOpen} and @code{bzWriteOpen} require you to pass them file
558handles (@code{FILE*}s) which have previously been opened for reading or
559writing respectively. That avoids portability problems associated with
560file operations and file attributes, whilst not being much of an
561imposition on the programmer.
562
563
564
565@subsection Utility functions summary
566For very simple needs, @code{bzBuffToBuffCompress} and
567@code{bzBuffToBuffDecompress} are provided. These compress
568data in memory from one buffer to another buffer in a single
569function call. You should assess whether these functions
570fulfill your memory-to-memory compression/decompression
571requirements before investing effort in understanding the more
572general but more complex low-level interface.
573
574Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} /
575@code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to
576give better @code{zlib} compatibility. These functions are
577@code{bzopen}, @code{bzread}, @code{bzwrite}, @code{bzflush},
578@code{bzclose},
579@code{bzerror} and @code{bzlibVersion}. You may find these functions
580more convenient for simple file reading and writing than those in the
581high-level interface. These functions are not (yet) officially part of
582the library, and are not further documented here. If they break, you
583get to keep all the pieces. I hope to document them properly when time
584permits.
585
586Yoshioka also contributed modifications to allow the library to be
587built as a Windows DLL.
588
589
590@section Error handling
591
592The library is designed to recover cleanly in all situations, including
593the worst-case situation of decompressing random data. I'm not
594100% sure that it can always do this, so you might want to add
595a signal handler to catch segmentation violations during decompression
596if you are feeling especially paranoid. I would be interested in
597hearing more about the robustness of the library to corrupted
598compressed data.
599
600The file @code{bzlib.h} contains all definitions needed to use
601the library. In particular, you should definitely not include
602@code{bzlib_private.h}.
603
604In @code{bzlib.h}, the various return values are defined. The following
605list is not intended as an exhaustive description of the circumstances
606in which a given value may be returned -- those descriptions are given
607later. Rather, it is intended to convey the rough meaning of each
608return value. The first five return values are normal and are not intended to
609denote an error situation.
610@table @code
611@item BZ_OK
612The requested action was completed successfully.
613@item BZ_RUN_OK
614@itemx BZ_FLUSH_OK
615@itemx BZ_FINISH_OK
616In @code{bzCompress}, the requested flush/finish/nothing-special action
617was completed successfully.
618@item BZ_STREAM_END
619Compression of data was completed, or the logical stream end was
620detected during decompression.
621@end table
622
623The following return values indicate an error of some kind.
624@table @code
625@item BZ_SEQUENCE_ERROR
626When using the library, it is important to call the functions in the
627correct sequence and with data structures (buffers etc) in the correct
628states. @code{libbzip2} checks as much as it can to ensure this is
629happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which
630complies precisely with the function semantics, as detailed below,
631should never receive this value; such an event denotes buggy code
632which you should investigate.
633@item BZ_PARAM_ERROR
634Returned when a parameter to a function call is out of range
635or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR},
636this denotes a bug in the client code. The distinction between
637@code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth
638making.
639@item BZ_MEM_ERROR
640Returned when a request to allocate memory failed. Note that the
641quantity of memory needed to decompress a stream cannot be determined
642until the stream's header has been read. So @code{bzDecompress} and
643@code{bzRead} may return @code{BZ_MEM_ERROR} even though some of
644the compressed data has been read. The same is not true for
645compression; once @code{bzCompressInit} or @code{bzWriteOpen} have
646successfully completed, @code{BZ_MEM_ERROR} cannot occur.
647@item BZ_DATA_ERROR
648Returned when a data integrity error is detected during decompression.
649Most importantly, this means when stored and computed CRCs for the
650data do not match. This value is also returned upon detection of any
651other anomaly in the compressed data.
652@item BZ_DATA_ERROR_MAGIC
653As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to
654know when the compressed stream does not start with the correct
655magic bytes (@code{'B' 'Z' 'h'}).
656@item BZ_IO_ERROR
657Returned by @code{bzRead} and @code{bzWrite} when there is an error
658reading or writing in the compressed file, and by @code{bzReadOpen}
659and @code{bzWriteOpen} for attempts to use a file for which the
660error indicator (viz, @code{ferror(f)}) is set.
661On receipt of @code{BZ_IO_ERROR}, the caller should consult
662@code{errno} and/or @code{perror} to acquire operating-system
663specific information about the problem.
664@item BZ_UNEXPECTED_EOF
665Returned by @code{bzRead} when the compressed file finishes
666before the logical end of stream is detected.
667@item BZ_OUTBUFF_FULL
668Returned by @code{bzBuffToBuffCompress} and
669@code{bzBuffToBuffDecompress} to indicate that the output data
670will not fit into the output buffer provided.
671@end table
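@code{BZ_DATA_ERROR_MAGIC} concerns only the first few bytes of a stream, so a caller can cheaply pre-screen data before handing it to the library. A minimal sketch follows.

@code{has_bz_magic} is a hypothetical helper, not part of @code{libbzip2}. It checks only the three magic bytes listed above.

```c
#include <stddef.h>

/* Hypothetical pre-check: returns 1 if buf starts with the magic bytes
   'B' 'Z' 'h', 0 otherwise (including when fewer than 3 bytes are given).
   A stream failing this test corresponds to the BZ_DATA_ERROR_MAGIC case.
   Real streams carry further header data after these bytes, which this
   sketch does not examine. */
int has_bz_magic(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h';
}
```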
672
673
674
675@section Low-level interface
676
677@subsection @code{bzCompressInit}
678@example
679typedef
680 struct @{
681 char *next_in;
682 unsigned int avail_in;
683 unsigned int total_in;
684
685 char *next_out;
686 unsigned int avail_out;
687 unsigned int total_out;
688
689 void *state;
690
691 void *(*bzalloc)(void *,int,int);
692 void (*bzfree)(void *,void *);
693 void *opaque;
694 @}
695 bz_stream;
696
697int bzCompressInit ( bz_stream *strm,
698 int blockSize100k,
699 int verbosity,
700 int workFactor );
701
702@end example
703
704Prepares for compression. The @code{bz_stream} structure
705holds all data pertaining to the compression activity.
706A @code{bz_stream} structure should be allocated and initialised
707prior to the call.
708The fields of @code{bz_stream}
709comprise the entirety of the user-visible data. @code{state}
710is a pointer to the private data structures required for compression.
711
712Custom memory allocators are supported, via fields @code{bzalloc},
713@code{bzfree},
714and @code{opaque}. The value
715@code{opaque} is passed to as the first argument to
716all calls to @code{bzalloc} and @code{bzfree}, but is
717otherwise ignored by the library.
718The call @code{bzalloc ( opaque, n, m )} is expected to return a
719pointer @code{p} to
720@code{n * m} bytes of memory, and @code{bzfree ( opaque, p )}
721should free
722that memory.
723
724If you don't want to use a custom memory allocator, set @code{bzalloc},
725@code{bzfree} and
726@code{opaque} to @code{NULL},
727and the library will then use the standard @code{malloc}/@code{free}
728routines.
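As an illustration of the @code{bzalloc}/@code{bzfree}/@code{opaque} protocol, here is a sketch of an allocator pair that counts calls.

The names @code{AllocStats}, @code{counting_alloc} and @code{counting_free} are hypothetical. Only the function signatures are taken from the @code{bz_stream} declaration above.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
typedef struct {
    long nAllocs;   /* number of successful allocation calls */
    long nFrees;    /* number of free calls with a non-NULL pointer */
} AllocStats;

/* Has the type of the bzalloc field: void *(*)(void *, int, int).
   Returns a pointer to n * m bytes, or NULL if malloc fails. */
void *counting_alloc(void *opaque, int n, int m)
{
    AllocStats *st = (AllocStats *)opaque;
    void *p = malloc((size_t)n * (size_t)m);
    if (p != NULL && st != NULL) st->nAllocs++;
    return p;
}

/* Has the type of the bzfree field: void (*)(void *, void *). */
void counting_free(void *opaque, void *p)
{
    AllocStats *st = (AllocStats *)opaque;
    if (p == NULL) return;
    if (st != NULL) st->nFrees++;
    free(p);
}
```

To use it, one would do the following before calling @code{bzCompressInit} or @code{bzDecompressInit}:
@itemize @bullet
@item set @code{strm.bzalloc = counting_alloc};
@item set @code{strm.bzfree = counting_free};
@item set @code{strm.opaque = &stats}.
@end itemize
After the matching @code{*End} call, which releases all memory, one would expect @code{stats.nAllocs} and @code{stats.nFrees} to agree.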
729
730Before calling @code{bzCompressInit}, fields @code{bzalloc},
731@code{bzfree} and @code{opaque} should
732be filled appropriately, as just described. Upon return, the internal
733state will have been allocated and initialised, and @code{total_in} and
734@code{total_out} will have been set to zero.
735These last two fields are used by the library
736to inform the caller of the total amount of data passed into and out of
737the library, respectively. You should not try to change them.
738
739Parameter @code{blockSize100k} specifies the block size to be used for
740compression. It should be a value between 1 and 9 inclusive, and the
741actual block size used is 100000 x this figure. 9 gives the best
742compression but takes the most memory.
743
744Parameter @code{verbosity} should be set to a number between 0 and 4
745inclusive. 0 is silent, and greater numbers give increasingly verbose
746monitoring/debugging output. If the library has been compiled with
747@code{-DBZ_NO_STDIO}, no such output will appear for any verbosity
748setting.
749
750Parameter @code{workFactor} controls how the compression phase behaves
751when presented with worst case, highly repetitive, input data.
752If compression runs into difficulties caused by repetitive data,
753some pseudo-random variations are inserted into the block, and
754compression is restarted. Lower values of @code{workFactor}
755reduce the tolerance of compression to repetitive data.
756You should set this parameter carefully: too low, and the
757compression ratio suffers; too high, and your average-to-worst-case
758compression times can become very large.
759The default value of 30
760gives reasonable behaviour over a wide range of circumstances.
761
762Allowable values range from 0 to 250 inclusive. 0 is a special
763case, equivalent to using the default value of 30.
764
765Note that the randomisation process is entirely transparent.
766If the library decides to randomise and restart compression on a
767block, it does so without comment. Randomised blocks are
768automatically de-randomised during decompression, so data
769integrity is never compromised.
770
771Possible return values:
772@display
773 @code{BZ_PARAM_ERROR}
774 if @code{strm} is @code{NULL}
775 or @code{blockSize100k} < 1 or @code{blockSize100k} > 9
776 or @code{verbosity} < 0 or @code{verbosity} > 4
777 or @code{workFactor} < 0 or @code{workFactor} > 250
778 @code{BZ_MEM_ERROR}
779 if not enough memory is available
780 @code{BZ_OK}
781 otherwise
782@end display
783Allowable next actions:
784@display
785 @code{bzCompress}
786 if @code{BZ_OK} is returned
787 no specific action needed in case of error
788@end display
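The parameter rules for @code{bzCompressInit} can be restated as code. The sketch below is hypothetical: @code{check_compress_params} is not a library function.

It returns 1 when the documented ranges are satisfied, and 0 in the cases where @code{BZ_PARAM_ERROR} is listed above. It also shows how @code{blockSize100k}, and a @code{workFactor} of 0, translate into the values actually used.

```c
#include <stddef.h>

/* Hypothetical restatement of the documented BZ_PARAM_ERROR conditions
   for bzCompressInit. strm stands in for the bz_stream pointer.
   Returns 1 if all parameters are acceptable, 0 otherwise. On success,
   stores the block size in bytes and the effective workFactor
   (0 is a special case meaning the default of 30). */
int check_compress_params(const void *strm, int blockSize100k,
                          int verbosity, int workFactor,
                          int *blockBytes, int *effWorkFactor)
{
    if (strm == NULL) return 0;
    if (blockSize100k < 1 || blockSize100k > 9) return 0;
    if (verbosity < 0 || verbosity > 4) return 0;
    if (workFactor < 0 || workFactor > 250) return 0;

    *blockBytes = 100000 * blockSize100k;            /* 100000 x blockSize100k */
    *effWorkFactor = (workFactor == 0) ? 30 : workFactor;
    return 1;
}
```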
789
790@subsection @code{bzCompress}
791@example
792 int bzCompress ( bz_stream *strm, int action );
793@end example
794Provides more input and/or output buffer space for the library. The
795caller maintains input and output buffers, and calls @code{bzCompress} to
796transfer data between them.
797
798Before each call to @code{bzCompress}, @code{next_in} should point at
799the data to be compressed, and @code{avail_in} should indicate how many
800bytes the library may read. @code{bzCompress} updates @code{next_in},
801@code{avail_in} and @code{total_in} to reflect the number of bytes it
802has read.
803
804Similarly, @code{next_out} should point to a buffer in which the
805compressed data is to be placed, with @code{avail_out} indicating how
806much output space is available. @code{bzCompress} updates
807@code{next_out}, @code{avail_out} and @code{total_out} to reflect the
808number of bytes output.
809
810You may provide and remove as little or as much data as you like on each
811call of @code{bzCompress}. In the limit, it is acceptable to supply and
812remove data one byte at a time, although this would be terribly
813inefficient. You should always ensure that at least one byte of output
814space is available at each call.
815
816A second purpose of @code{bzCompress} is to request a change of mode of the
817compressed stream.
818
819Conceptually, a compressed stream can be in one of four states: IDLE,
820RUNNING, FLUSHING and FINISHING. Before initialisation
821(@code{bzCompressInit}) and after termination (@code{bzCompressEnd}), a
822stream is regarded as IDLE.
823
824Upon initialisation (@code{bzCompressInit}), the stream is placed in the
825RUNNING state. Subsequent calls to @code{bzCompress} should pass
826@code{BZ_RUN} as the requested action until the caller is ready to flush
827or finish; actions not permitted in the current state result in @code{BZ_SEQUENCE_ERROR}.
828
829At some point, the calling program will have provided all the input data
830it wants to. It will then want to finish up -- in effect, asking the
831library to process any data it might have buffered internally. In this
832state, @code{bzCompress} will no longer attempt to read data from
833@code{next_in}, but it will want to write data to @code{next_out}.
834Because the output buffer supplied by the user can be arbitrarily small,
835the finishing-up operation cannot necessarily be done with a single call
836of @code{bzCompress}.
837
838Instead, the calling program passes @code{BZ_FINISH} as an action to
839@code{bzCompress}. This changes the stream's state to FINISHING. Any
840remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and
841transferred to the output buffer. To do this, @code{bzCompress} must be
842called repeatedly until all the output has been consumed. At that
843point, @code{bzCompress} returns @code{BZ_STREAM_END}, and the stream's
844state is set back to IDLE. @code{bzCompressEnd} should then be
845called.
846
847Just to make sure the calling program does not cheat, the library makes
848a note of @code{avail_in} at the time of the first call to
849@code{bzCompress} which has @code{BZ_FINISH} as an action (ie, at the
850time the program has announced its intention to not supply any more
851input). By comparing this value with that of @code{avail_in} over
852subsequent calls to @code{bzCompress}, the library can detect any
853attempts to slip in more data to compress. Any calls for which this is
854detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a
855programming mistake which should be corrected.
856
857Instead of asking to finish, the calling program may ask
858@code{bzCompress} to take all the remaining input, compress it and
859terminate the current (Burrows-Wheeler) compression block. This could
860be useful for error control purposes. The mechanism is analogous to
861that for finishing: call @code{bzCompress} with an action of
862@code{BZ_FLUSH}, remove output data, and persist with the
863@code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned. As
864with finishing, @code{bzCompress} detects any attempt to provide more
865input data once the flush has begun.
866
867Once the flush is complete, the stream returns to the normal RUNNING
868state.
869
870This all sounds pretty complex, but isn't really. Here's a table
871which shows which actions are allowable in each state, what action
872will be taken, what the next state is, and what the non-error return
873values are. Note that you can't explicitly ask what state the
874stream is in, but nor do you need to -- it can be inferred from the
875values returned by @code{bzCompress}.
876@display
877IDLE/@code{any}
878 Illegal. IDLE state only exists after @code{bzCompressEnd} or
879 before @code{bzCompressInit}.
880 Return value = @code{BZ_SEQUENCE_ERROR}
881
882RUNNING/@code{BZ_RUN}
883 Compress from @code{next_in} to @code{next_out} as much as possible.
884 Next state = RUNNING
885 Return value = @code{BZ_RUN_OK}
886
887RUNNING/@code{BZ_FLUSH}
888 Remember current value of @code{avail_in}. Compress from @code{next_in}
889 to @code{next_out} as much as possible, but do not accept any more input.
890 Next state = FLUSHING
891 Return value = @code{BZ_FLUSH_OK}
892
893RUNNING/@code{BZ_FINISH}
894 Remember current value of @code{avail_in}. Compress from @code{next_in}
895 to @code{next_out} as much as possible, but do not accept any more input.
896 Next state = FINISHING
897 Return value = @code{BZ_FINISH_OK}
898
899FLUSHING/@code{BZ_FLUSH}
900 Compress from @code{next_in} to @code{next_out} as much as possible,
901 but do not accept any more input.
902 If all the existing input has been used up and all compressed
903 output has been removed
904 Next state = RUNNING; Return value = @code{BZ_RUN_OK}
905 else
906 Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK}
907
908FLUSHING/other
909 Illegal.
910 Return value = @code{BZ_SEQUENCE_ERROR}
911
912FINISHING/@code{BZ_FINISH}
913 Compress from @code{next_in} to @code{next_out} as much as possible,
914 but do not accept any more input.
915 If all the existing input has been used up and all compressed
916 output has been removed
917 Next state = IDLE; Return value = @code{BZ_STREAM_END}
918 else
919 Next state = FINISHING; Return value = @code{BZ_FINISH_OK}
920
921FINISHING/other
922 Illegal.
923 Return value = @code{BZ_SEQUENCE_ERROR}
924@end display
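The table can also be read as a small state machine. The following sketch encodes it literally and models only the documented transitions and return values; no compression is performed.

It makes two assumptions:
@itemize @bullet
@item Local enumerations stand in for the @code{BZ_*} constants. Their names and values are assumptions, not those of @code{bzlib.h}.
@item A hypothetical @code{drained} flag means that all existing input has been used up and all compressed output has been removed.
@end itemize

```c
/* Hypothetical model of the bzCompress state table above.
   These enums are local stand-ins, not bzlib.h's definitions. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } State;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } Action;
typedef enum {
    R_RUN_OK, R_FLUSH_OK, R_FINISH_OK, R_STREAM_END, R_SEQUENCE_ERROR
} Result;

/* Applies one call with action act in state *st, following the table.
   drained: nonzero if all input is used up and all output removed. */
Result step(State *st, Action act, int drained)
{
    switch (*st) {
    case ST_RUNNING:
        if (act == ACT_RUN) return R_RUN_OK;
        if (act == ACT_FLUSH) { *st = ST_FLUSHING; return R_FLUSH_OK; }
        *st = ST_FINISHING;                  /* act == ACT_FINISH */
        return R_FINISH_OK;
    case ST_FLUSHING:
        if (act != ACT_FLUSH) return R_SEQUENCE_ERROR;
        if (drained) { *st = ST_RUNNING; return R_RUN_OK; }
        return R_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH) return R_SEQUENCE_ERROR;
        if (drained) { *st = ST_IDLE; return R_STREAM_END; }
        return R_FINISH_OK;
    case ST_IDLE:
    default:
        return R_SEQUENCE_ERROR;             /* before Init or after End */
    }
}
```

Note that an illegal call leaves the modelled state unchanged, so the caller can infer the state purely from the sequence of return values, as the text above says.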
925
926That still looks complicated? Well, fair enough. The usual sequence
927of calls for compressing a load of data is:
928@itemize @bullet
929@item Get started with @code{bzCompressInit}.
930@item Shovel data in and shlurp out its compressed form using zero or more
931calls of @code{bzCompress} with action = @code{BZ_RUN}.
932@item Finish up.
933Repeatedly call @code{bzCompress} with action = @code{BZ_FINISH},
934copying out the compressed output, until @code{BZ_STREAM_END} is returned.
935@item Close up and go home. Call @code{bzCompressEnd}.
936@end itemize
937If the data you want to compress fits into your input buffer all
938at once, you can skip the calls of @code{bzCompress ( ..., BZ_RUN )} and
939just do the @code{bzCompress ( ..., BZ_FINISH )} calls.
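Here is a sketch of the single-buffer case just described. All the input is supplied once, and the @code{BZ_FINISH} step is repeated until stream end is reported, emptying a small output chunk after each call.

So that it can run without @code{libbzip2}, it uses two hypothetical stand-ins:
@itemize @bullet
@item @code{MockStream}, standing in for @code{bz_stream};
@item @code{mock_finish}, standing in for @code{bzCompress ( ..., BZ_FINISH )}. It merely copies bytes and does not compress.
@end itemize

With the real library one would also:
@itemize @bullet
@item call @code{bzCompressInit} first;
@item test for @code{BZ_STREAM_END} rather than @code{MOCK_STREAM_END};
@item finish with @code{bzCompressEnd}.
@end itemize

```c
#include <string.h>

/* Hypothetical stand-in for bz_stream: only the buffer fields. */
typedef struct {
    const char  *next_in;  unsigned int avail_in;
    char        *next_out; unsigned int avail_out;
} MockStream;

enum { MOCK_FINISH_OK, MOCK_STREAM_END };

/* Hypothetical stand-in for bzCompress(strm, BZ_FINISH): copies as many
   bytes as fit (no compression), advances the pointers and counters the
   way the real call is documented to, and reports stream end once all
   input has been consumed and its output placed in the buffer. */
static int mock_finish(MockStream *s)
{
    unsigned int n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;
    memcpy(s->next_out, s->next_in, n);
    s->next_in  += n; s->avail_in  -= n;
    s->next_out += n; s->avail_out -= n;
    return s->avail_in == 0 ? MOCK_STREAM_END : MOCK_FINISH_OK;
}

/* Driver loop: supply all input once, then call the finish step
   repeatedly, copying each output chunk into dst, until stream end.
   Returns the number of bytes written to dst, or (size_t)-1 if dstCap
   is too small. */
size_t finish_all(const char *src, unsigned int srcLen,
                  char *dst, size_t dstCap)
{
    MockStream s;
    char chunk[8];               /* deliberately tiny output buffer */
    size_t total = 0;
    int rc;

    s.next_in = src;
    s.avail_in = srcLen;
    do {
        size_t produced;
        s.next_out = chunk;
        s.avail_out = sizeof chunk;  /* always at least one byte of space */
        rc = mock_finish(&s);
        produced = sizeof chunk - s.avail_out;
        if (total + produced > dstCap) return (size_t)-1;
        memcpy(dst + total, chunk, produced);
        total += produced;
    } while (rc != MOCK_STREAM_END);
    return total;
}
```

The 8-byte output chunk shows why the finishing-up operation generally cannot be done in a single call: the loop keeps going until stream end is reported, however small the output buffer.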
940
941All required memory is allocated by @code{bzCompressInit}. The
942compression library can accept any data at all (obviously). So you
943shouldn't get any error return values from the @code{bzCompress} calls.
944If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in
945your programming.
946
947Trivial other possible return values:
948@display
949 @code{BZ_PARAM_ERROR}
950 if @code{strm} is @code{NULL}, or @code{strm->s} is @code{NULL}
951@end display
952
953@subsection @code{bzCompressEnd}
954@example
955int bzCompressEnd ( bz_stream *strm );
956@end example
957Releases all memory associated with a compression stream.
958
959Possible return values:
960@display
961 @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
962 @code{BZ_OK} otherwise
963@end display
964
965
966@subsection @code{bzDecompressInit}
967@example
968int bzDecompressInit ( bz_stream *strm, int verbosity, int small );
969@end example
970Prepares for decompression. As with @code{bzCompressInit}, a
971@code{bz_stream} record should be allocated and initialised before the
972call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be
973set if a custom memory allocator is required, or made @code{NULL} for
974the normal @code{malloc}/@code{free} routines. Upon return, the internal
975state will have been initialised, and @code{total_in} and
976@code{total_out} will be zero.
977
978For the meaning of parameter @code{verbosity}, see @code{bzCompressInit}.
979
980If @code{small} is nonzero, the library will use an alternative
981decompression algorithm which uses less memory but at the cost of
982decompressing more slowly (roughly speaking, half the speed, but the
983maximum memory requirement drops to around 2300k). See Chapter 2 for
984more information on memory management.
985
986Note that the amount of memory needed to decompress
987a stream cannot be determined until the stream's header has been read,
988so even if @code{bzDecompressInit} succeeds, a subsequent
989@code{bzDecompress} could fail with @code{BZ_MEM_ERROR}.
990
991Possible return values:
992@display
993 @code{BZ_PARAM_ERROR}
994 if @code{(small != 0 && small != 1)}
995 or @code{(verbosity < 0 || verbosity > 4)}
996 @code{BZ_MEM_ERROR}
997 if insufficient memory is available
998@end display
999
1000Allowable next actions:
1001@display
1002 @code{bzDecompress}
1003 if @code{BZ_OK} was returned
1004 no specific action required in case of error
1005@end display
1006
1007
1008
1009@subsection @code{bzDecompress}
1010@example
1011int bzDecompress ( bz_stream *strm );
1012@end example
1013Provides more input and/or output buffer space for the library. The
1014caller maintains input and output buffers, and uses @code{bzDecompress}
1015to transfer data between them.
1016
1017Before each call to @code{bzDecompress}, @code{next_in}
1018should point at the compressed data,
1019and @code{avail_in} should indicate how many bytes the library
1020may read. @code{bzDecompress} updates @code{next_in}, @code{avail_in}
1021and @code{total_in}
1022to reflect the number of bytes it has read.
1023
1024Similarly, @code{next_out} should point to a buffer in which the uncompressed
1025output is to be placed, with @code{avail_out} indicating how much output space
1026is available. @code{bzDecompress} updates @code{next_out},
1027@code{avail_out} and @code{total_out} to reflect
1028the number of bytes output.
1029
1030You may provide and remove as little or as much data as you like on
1031each call of @code{bzDecompress}.
1032In the limit, it is acceptable to
1033supply and remove data one byte at a time, although this would be
1034terribly inefficient. You should always ensure that at least one
1035byte of output space is available at each call.
1036
1037Use of @code{bzDecompress} is simpler than @code{bzCompress}.
1038
1039You should provide input and remove output as described above, and
1040repeatedly call @code{bzDecompress} until @code{BZ_STREAM_END} is
1041returned. Appearance of @code{BZ_STREAM_END} denotes that
1042@code{bzDecompress} has detected the logical end of the compressed
1043stream. @code{bzDecompress} will not produce @code{BZ_STREAM_END} until
1044all output data has been placed into the output buffer, so once
1045@code{BZ_STREAM_END} appears, you are guaranteed to have available all
1046the decompressed output, and @code{bzDecompressEnd} can safely be
1047called.
1048
1049In case of an error return value, you should call @code{bzDecompressEnd}
1050to clean up and release memory.
1051
1052Possible return values:
1053@display
1054 @code{BZ_PARAM_ERROR}
1055 if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
1056 or @code{strm->avail_out < 1}
1057 @code{BZ_DATA_ERROR}
1058 if a data integrity error is detected in the compressed stream
1059 @code{BZ_DATA_ERROR_MAGIC}
1060 if the compressed stream doesn't begin with the right magic bytes
1061 @code{BZ_MEM_ERROR}
1062 if there wasn't enough memory available
1063 @code{BZ_STREAM_END}
1064 if the logical end of the data stream was detected and all
1065 output has been consumed, eg @code{strm->avail_out > 0}
1066 @code{BZ_OK}
1067 otherwise
1068@end display
1069Allowable next actions:
1070@display
1071 @code{bzDecompress}
1072 if @code{BZ_OK} was returned
1073 @code{bzDecompressEnd}
1074 otherwise
1075@end display
1076
1077
1078@subsection @code{bzDecompressEnd}
1079@example
1080int bzDecompressEnd ( bz_stream *strm );
1081@end example
1082Releases all memory associated with a decompression stream.
1083
1084Possible return values:
1085@display
1086 @code{BZ_PARAM_ERROR}
1087 if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL}
1088 @code{BZ_OK}
1089 otherwise
1090@end display
1091
1092Allowable next actions:
1093@display
1094 None.
1095@end display
1096
1097
1098@section High-level interface
1099
1100This interface provides functions for reading and writing
1101@code{bzip2} format files. First, some general points.
1102
1103@itemize @bullet
1104@item All of the functions take an @code{int*} first argument,
1105 @code{bzerror}.
1106 After each call, @code{bzerror} should be consulted first to determine
1107 the outcome of the call. If @code{bzerror} is @code{BZ_OK},
1108 the call completed
1109 successfully, and only then should the return value of the function
1110 (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR},
1111 there was an error
1112 reading/writing the underlying compressed file, and you should
1113 then consult @code{errno}/@code{perror} to determine the
1114 cause of the difficulty.
1115 @code{bzerror} may also be set to various other values; precise details are
1116 given on a per-function basis below.
1117@item If @code{bzerror} indicates an error
1118 (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}),
1119 you should immediately call @code{bzReadClose} (or @code{bzWriteClose},
1120 depending on whether you are attempting to read or to write)
1121 to free up all resources associated
1122 with the stream. Once an error has been indicated, behaviour of all calls
1123 except @code{bzReadClose} (@code{bzWriteClose}) is undefined.
1124 The implication is that (1) @code{bzerror} should
1125 be checked after each call, and (2) if @code{bzerror} indicates an error,
1126 @code{bzReadClose} (@code{bzWriteClose}) should then be called to clean up.
1127@item The @code{FILE*} arguments passed to
1128 @code{bzReadOpen}/@code{bzWriteOpen}
1129 should be set to binary mode.
1130 Most Unix systems will do this by default, but other platforms,
1131 including Windows and Mac, will not. If you omit this, you may
1132 encounter problems when moving code to new platforms.
1133@item Memory allocation requests are handled by
1134 @code{malloc}/@code{free}.
1135 At present
1136 there is no facility for user-defined memory allocators in the file I/O
1137 functions (could easily be added, though).
1138@end itemize
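
The rules above reduce to a single test after every call. The following is a minimal sketch. @code{bz_is_error} and @code{bz_report} are hypothetical helpers, not part of libbzip2. The status-code values are assumed to match those in @code{bzlib.h}; a real program would include that header instead of defining them.

```c
#include <stdio.h>

/* Status codes: values assumed to match bzlib.h.  A real program would
   #include "bzlib.h" instead of defining them here. */
#ifndef BZ_OK
#define BZ_OK          0
#define BZ_STREAM_END  4
#define BZ_IO_ERROR    (-6)
#endif

/* Hypothetical helper: nonzero if bzerror reports an error, meaning the
   caller should stop and call bzReadClose or bzWriteClose. */
static int bz_is_error(int bzerror)
{
    return bzerror != BZ_OK && bzerror != BZ_STREAM_END;
}

/* Hypothetical helper: describe an error.  For BZ_IO_ERROR the cause is
   in errno, so perror prints it; other codes are printed numerically. */
static void bz_report(const char *what, int bzerror)
{
    if (bzerror == BZ_IO_ERROR)
        perror(what);
    else
        fprintf(stderr, "%s: libbzip2 error %d\n", what, bzerror);
}
```

After each library call, a caller would test @code{bz_is_error(bzerror)}. If the test succeeds, it would report the error and then call @code{bzReadClose} or @code{bzWriteClose} before doing anything else with the stream.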



@subsection @code{bzReadOpen}
@example
   typedef void BZFILE;

   BZFILE *bzReadOpen ( int *bzerror, FILE *f,
                        int small, int verbosity,
                        void *unused, int nUnused );
@end example
Prepare to read compressed data from file handle @code{f}. @code{f}
should refer to a file which has been opened for reading, and for which
the error indicator (@code{ferror(f)}) is not set. If @code{small} is 1,
the library will try to decompress using less memory, at the expense of
speed.

For reasons explained below, @code{bzRead} will decompress the
@code{nUnused} bytes starting at @code{unused}, before starting to read
from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be
supplied like this. If this facility is not required, you should pass
@code{NULL} and @code{0} for @code{unused} and @code{nUnused}
respectively.

For the meaning of parameters @code{small} and @code{verbosity},
see @code{bzDecompressInit}.

The amount of memory needed to decompress a file cannot be determined
until the file's header has been read. So it is possible that
@code{bzReadOpen} returns @code{BZ_OK} but a subsequent call of
@code{bzRead} will return @code{BZ_MEM_ERROR}.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_PARAM_ERROR}
      if @code{f} is @code{NULL}
      or @code{small} is neither @code{0} nor @code{1}
      or @code{(unused == NULL && nUnused != 0)}
      or @code{(unused != NULL && !(0 <= nUnused && nUnused <= BZ_MAX_UNUSED))}
   @code{BZ_IO_ERROR}
      if @code{ferror(f)} is nonzero
   @code{BZ_MEM_ERROR}
      if insufficient memory is available
   @code{BZ_OK}
      otherwise.
@end display

Possible return values:
@display
   Pointer to an abstract @code{BZFILE}
      if @code{bzerror} is @code{BZ_OK}
   @code{NULL}
      otherwise
@end display

Allowable next actions:
@display
   @code{bzRead}
      if @code{bzerror} is @code{BZ_OK}
   @code{bzReadClose}
      otherwise
@end display
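
The @code{BZ_PARAM_ERROR} conditions for @code{bzReadOpen} can be restated as ordinary C. This is a hypothetical helper, not part of libbzip2. @code{BZ_MAX_UNUSED} is assumed to be 5000, as in @code{bzlib.h}; a real program would take the value from that header. Note that the range test needs two comparisons, because C does not chain @code{0 <= x <= y} the way mathematical notation does.

```c
#include <stdio.h>

/* Assumed to match the value in bzlib.h; a real program would include
   that header instead of defining this. */
#ifndef BZ_MAX_UNUSED
#define BZ_MAX_UNUSED 5000
#endif

/* Hypothetical helper, not part of libbzip2: returns 1 if the arguments
   pass the documented BZ_PARAM_ERROR checks of bzReadOpen, else 0. */
static int bzReadOpen_args_ok(FILE *f, int small,
                              const void *unused, int nUnused)
{
    if (f == NULL)
        return 0;
    if (small != 0 && small != 1)
        return 0;
    if (unused == NULL && nUnused != 0)
        return 0;
    /* Two comparisons: "0 <= nUnused <= BZ_MAX_UNUSED" would not work in C. */
    if (unused != NULL && !(0 <= nUnused && nUnused <= BZ_MAX_UNUSED))
        return 0;
    return 1;
}
```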


@subsection @code{bzRead}
@example
   int bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
@end example
Reads up to @code{len} (uncompressed) bytes from the compressed file
@code{b} into the buffer @code{buf}. If the read was successful,
@code{bzerror} is set to @code{BZ_OK}
and the number of bytes read is returned. If the logical end-of-stream
was detected, @code{bzerror} will be set to @code{BZ_STREAM_END},
and the number of bytes read is returned. All other @code{bzerror}
values denote an error.

@code{bzRead} will supply @code{len} bytes,
unless the logical stream end is detected
or an error occurs. Because of this, it is possible to detect the
stream end by observing when the number of bytes returned is
less than the number requested. Nevertheless, this is regarded as
inadvisable; you should instead check @code{bzerror} after every call
and watch out for @code{BZ_STREAM_END}. Note that the call which
returns @code{BZ_STREAM_END} may itself deliver data, so the bytes it
returns must be processed too.
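
A read loop in this style can be exercised without the library. In the sketch below, @code{MockBz} and @code{mock_bzRead} are hypothetical stand-ins that follow @code{bzRead}'s documented contract over an in-memory buffer. The @code{BZ_OK} and @code{BZ_STREAM_END} values are assumed to match @code{bzlib.h}. The loop collects data on both codes, so the final chunk is not lost.

```c
#include <string.h>

/* Status codes: values assumed to match bzlib.h.  A real program would
   #include "bzlib.h" instead of defining them here. */
#ifndef BZ_OK
#define BZ_OK          0
#define BZ_STREAM_END  4
#endif

/* Hypothetical stand-in for a BZFILE opened for reading: it serves
   bytes that are already "decompressed" from memory. */
typedef struct {
    const char *data;
    int len;
    int pos;
} MockBz;

/* Hypothetical stand-in following bzRead's documented contract: supply
   up to len bytes, and set BZ_STREAM_END on the call that reaches the
   end of the stream (that call may still return data). */
static int mock_bzRead(int *bzerror, MockBz *b, void *buf, int len)
{
    int n = b->len - b->pos;
    if (n > len)
        n = len;
    memcpy(buf, b->data + b->pos, (size_t)n);
    b->pos += n;
    *bzerror = (b->pos == b->len) ? BZ_STREAM_END : BZ_OK;
    return n;
}

/* Copy the whole stream into out[0 .. outCap-1].  Data is collected on
   BZ_OK and on BZ_STREAM_END.  Returns the byte count, or -1 on a
   library error (real code would call bzReadClose) or if out is full. */
static long drain(MockBz *b, char *out, long outCap)
{
    char buf[8];                 /* deliberately small: forces several calls */
    int bzerror = BZ_OK;
    long total = 0;

    while (bzerror == BZ_OK) {
        int n = mock_bzRead(&bzerror, b, buf, (int)sizeof buf);
        if (bzerror != BZ_OK && bzerror != BZ_STREAM_END)
            return -1;
        if (total + n > outCap)
            return -1;
        memcpy(out + total, buf, (size_t)n);
        total += n;
    }
    return total;
}
```

Replacing @code{MockBz} and @code{mock_bzRead} with a real @code{BZFILE} and @code{bzRead} should leave the shape of the loop unchanged.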

Internally, @code{bzRead} copies data from the compressed file in chunks
of size @code{BZ_MAX_UNUSED} bytes
before decompressing it. If the file contains more bytes than strictly
needed to reach the logical end-of-stream, @code{bzRead} will almost certainly
read some of the trailing data before signalling @code{BZ_STREAM_END}.
To collect the read but unused data once @code{BZ_STREAM_END} has
appeared, call @code{bzReadGetUnused} immediately before @code{bzReadClose}.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_PARAM_ERROR}
      if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
   @code{BZ_SEQUENCE_ERROR}
      if @code{b} was opened with @code{bzWriteOpen}
   @code{BZ_IO_ERROR}
      if there is an error reading from the compressed file
   @code{BZ_UNEXPECTED_EOF}
      if the compressed file ended before the logical end-of-stream was detected
   @code{BZ_DATA_ERROR}
      if a data integrity error was detected in the compressed stream
   @code{BZ_DATA_ERROR_MAGIC}
      if the stream does not begin with the requisite header bytes (ie, is not
      a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}.
   @code{BZ_MEM_ERROR}
      if insufficient memory was available
   @code{BZ_STREAM_END}
      if the logical end of stream was detected.
   @code{BZ_OK}
      otherwise.
@end display

Possible return values:
@display
   number of bytes read
      if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END}
   undefined
      otherwise
@end display

Allowable next actions:
@display
   collect data from @code{buf}, then @code{bzRead} or @code{bzReadClose}
      if @code{bzerror} is @code{BZ_OK}
   collect data from @code{buf}, then @code{bzReadClose} or @code{bzReadGetUnused}
      if @code{bzerror} is @code{BZ_STREAM_END}
   @code{bzReadClose}
      otherwise
@end display



@subsection @code{bzReadGetUnused}
@example
   void bzReadGetUnused ( int* bzerror, BZFILE *b,
                          void** unused, int* nUnused );
@end example
Returns data which was read from the compressed file but was not needed
to get to the logical end-of-stream. @code{*unused} is set to the address
of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will
be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive.

This function may only be called once @code{bzRead} has signalled
@code{BZ_STREAM_END} but before @code{bzReadClose}.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_PARAM_ERROR}
      if @code{b} is @code{NULL}
      or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL}
   @code{BZ_SEQUENCE_ERROR}
      if @code{BZ_STREAM_END} has not been signalled
      or if @code{b} was opened with @code{bzWriteOpen}
   @code{BZ_OK}
      otherwise
@end display

Allowable next actions:
@display
   @code{bzReadClose}
@end display
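
@code{bzReadClose} releases all memory pertaining to @code{b}, so the bytes reported by @code{bzReadGetUnused} are presumably no longer valid afterwards. They should therefore be copied somewhere the caller owns first. The following is a minimal sketch. @code{UnusedCarry} and @code{carry_save} are hypothetical names, and @code{BZ_MAX_UNUSED} is assumed to be 5000, as in @code{bzlib.h}.

```c
#include <string.h>

#ifndef BZ_MAX_UNUSED
#define BZ_MAX_UNUSED 5000   /* assumed to match bzlib.h */
#endif

/* Hypothetical caller-owned copy of the data reported by
   bzReadGetUnused, kept so it survives bzReadClose. */
typedef struct {
    char data[BZ_MAX_UNUSED];
    int  n;
} UnusedCarry;

/* Copy nUnused bytes from unused into carry.  Returns 0 on success,
   or -1 (leaving carry untouched) if nUnused is outside the documented
   range 0 .. BZ_MAX_UNUSED. */
static int carry_save(UnusedCarry *carry, const void *unused, int nUnused)
{
    if (nUnused < 0 || nUnused > BZ_MAX_UNUSED)
        return -1;
    if (nUnused > 0)
        memcpy(carry->data, unused, (size_t)nUnused);
    carry->n = nUnused;
    return 0;
}
```

In a real program, the sequence would be as follows. First call @code{bzReadGetUnused(&bzerror, b, &unused, &nUnused)}, then @code{carry_save}, then @code{bzReadClose}. The saved @code{data} and @code{n} fields can later be passed to @code{bzReadOpen} as @code{unused} and @code{nUnused}.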


@subsection @code{bzReadClose}
@example
   void bzReadClose ( int *bzerror, BZFILE *b );
@end example
Releases all memory pertaining to the compressed file @code{b}.
@code{bzReadClose} does not call @code{fclose} on the underlying file
handle, so you should do that yourself if appropriate.
@code{bzReadClose} should be called to clean up after all error
situations.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_SEQUENCE_ERROR}
      if @code{b} was opened with @code{bzWriteOpen}
   @code{BZ_OK}
      otherwise
@end display

Allowable next actions:
@display
   none
@end display



@subsection @code{bzWriteOpen}
@example
   BZFILE *bzWriteOpen ( int *bzerror, FILE *f,
                         int blockSize100k, int verbosity,
                         int workFactor );
@end example
Prepare to write compressed data to file handle @code{f}.
@code{f} should refer to
a file which has been opened for writing, and for which the error
indicator (@code{ferror(f)}) is not set.

For the meaning of parameters @code{blockSize100k},
@code{verbosity} and @code{workFactor}, see
@* @code{bzCompressInit}.

All required memory is allocated at this stage, so if the call
completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a
subsequent call to @code{bzWrite}.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_PARAM_ERROR}
      if @code{f} is @code{NULL}
      or @code{blockSize100k < 1} or @code{blockSize100k > 9}
   @code{BZ_IO_ERROR}
      if @code{ferror(f)} is nonzero
   @code{BZ_MEM_ERROR}
      if insufficient memory is available
   @code{BZ_OK}
      otherwise
@end display

Possible return values:
@display
   Pointer to an abstract @code{BZFILE}
      if @code{bzerror} is @code{BZ_OK}
   @code{NULL}
      otherwise
@end display

Allowable next actions:
@display
   @code{bzWrite}
      if @code{bzerror} is @code{BZ_OK}
      (you could go directly to @code{bzWriteClose}, but this would be pretty pointless)
   @code{bzWriteClose}
      otherwise
@end display



@subsection @code{bzWrite}
@example
   void bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
@end example
Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be
compressed and written to the file.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_PARAM_ERROR}
      if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
   @code{BZ_SEQUENCE_ERROR}
      if @code{b} was opened with @code{bzReadOpen}
   @code{BZ_IO_ERROR}
      if there is an error writing the compressed file
   @code{BZ_OK}
      otherwise
@end display




@subsection @code{bzWriteClose}
@example
   void bzWriteClose ( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );
@end example

Compresses and flushes to the compressed file all data so far supplied
by @code{bzWrite}. The logical end-of-stream markers are also written, so
subsequent calls to @code{bzWrite} are illegal. All memory associated
with the compressed file @code{b} is released.
@code{fflush} is called on the
compressed file, but it is not @code{fclose}'d.

If @code{bzWriteClose} is called to clean up after an error, the only
action is to release the memory. The library records the error codes
issued by previous calls, so this situation will be detected
automatically. There is no attempt to complete the compression
operation, nor to @code{fflush} the compressed file. You can force this
behaviour to happen even in the case of no error, by passing a nonzero
value to @code{abandon}.

If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to the
total volume of uncompressed data handled. Similarly, if @code{nbytes_out}
is non-null, @code{*nbytes_out} will be set to the total volume of
compressed data written.

Possible assignments to @code{bzerror}:
@display
   @code{BZ_SEQUENCE_ERROR}
      if @code{b} was opened with @code{bzReadOpen}
   @code{BZ_IO_ERROR}
      if there is an error writing the compressed file
   @code{BZ_OK}
      otherwise
@end display
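
The two counts reported by @code{bzWriteClose} are enough to compute the usual compression statistics. The following is a minimal sketch. The helper names are hypothetical, and their arguments are the values stored through @code{nbytes_in} and @code{nbytes_out}.

```c
/* Hypothetical helpers: statistics from the counts bzWriteClose stores
   through nbytes_in (uncompressed) and nbytes_out (compressed). */
static double bits_per_byte(unsigned int nbytes_in, unsigned int nbytes_out)
{
    if (nbytes_in == 0)
        return 0.0;              /* empty input: avoid dividing by zero */
    return 8.0 * (double)nbytes_out / (double)nbytes_in;
}

static double percent_saved(unsigned int nbytes_in, unsigned int nbytes_out)
{
    if (nbytes_in == 0)
        return 0.0;
    return 100.0 * (1.0 - (double)nbytes_out / (double)nbytes_in);
}
```

To obtain the counts, pass the addresses of two @code{unsigned int} variables as the last two arguments of @code{bzWriteClose} instead of @code{NULL}.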

@subsection Handling embedded compressed data streams

The high-level library facilitates use of
@code{bzip2} data streams which form some part of a surrounding, larger
data stream.
@itemize @bullet
@item For writing, the library takes an open file handle, writes
compressed data to it, @code{fflush}es it but does not @code{fclose} it.
The calling application can write its own data before and after the
compressed data stream, using that same file handle.
@item Reading is more complex, and the facilities are not as general
as they could be since generality is hard to reconcile with efficiency.
@code{bzRead} reads from the compressed file in blocks of size
@code{BZ_MAX_UNUSED} bytes, and in doing so will probably overshoot
the logical end of compressed stream.
To recover this data once decompression has
ended, call @code{bzReadGetUnused} after the last call of @code{bzRead}
(the one returning @code{BZ_STREAM_END}) but before calling
@code{bzReadClose}.
@end itemize

This mechanism makes it easy to decompress multiple @code{bzip2}
streams placed end-to-end. At the end of one stream, when @code{bzRead}
returns @code{BZ_STREAM_END}, call @code{bzReadGetUnused} to collect the
unused data (copy it into your own buffer somewhere), then call
@code{bzReadClose}.
That data forms the start of the next compressed stream.
To start uncompressing that next stream, call @code{bzReadOpen} again,
feeding in the unused data via the @code{unused}/@code{nUnused}
parameters.
Keep doing this until a @code{BZ_STREAM_END} return coincides with the
physical end of file (@code{feof(f)}). In this situation
@code{bzReadGetUnused}
will of course return no data.
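
The carry-over mechanism can be modelled without the library. The sketch below is a toy, not bzip2, and every name in it is hypothetical:

@itemize @bullet
@item A @code{MockFile} holds bytes in memory.
@item Each "stream" ends with a zero byte, standing in for the real end-of-stream marker.
@item @code{CHUNK} (4) plays the role of @code{BZ_MAX_UNUSED}.
@item @code{decode_stream} consumes the carried-over bytes first, as @code{bzReadOpen} does with @code{unused}/@code{nUnused}. It then hands back its overshoot, as @code{bzReadGetUnused} does.
@item @code{decode_all} repeats until a stream end coincides with the physical end of file and nothing is left over.
@end itemize

```c
#include <stddef.h>
#include <string.h>

/* Tiny stand-in for BZ_MAX_UNUSED, so that reads overshoot often. */
#define CHUNK 4

/* Hypothetical model: the "file" is an in-memory byte array, and each
   "stream" ends with a '\0' byte standing in for the bzip2
   end-of-stream marker.  No real decompression happens. */
typedef struct {
    const char *bytes;
    size_t len, pos;
} MockFile;

/* Decode one stream.  The nCarry bytes in carry (like the unused/nUnused
   arguments to bzReadOpen) are consumed first, then the file is read
   CHUNK bytes at a time.  Payload is appended to out, which must be large
   enough.  Bytes read past the terminator are moved into carry and
   counted in *nCarry (like bzReadGetUnused).  Returns the payload
   length, or -1 if the file ends mid-stream. */
static long decode_stream(MockFile *f, char carry[CHUNK], size_t *nCarry,
                          char *out)
{
    char chunk[CHUNK];
    const char *src = carry;
    size_t n = *nCarry;
    long produced = 0;

    for (;;) {
        const char *end = memchr(src, '\0', n);
        size_t used = end ? (size_t)(end - src) : n;

        memcpy(out + produced, src, used);
        produced += (long)used;
        if (end) {
            *nCarry = n - used - 1;            /* the overshoot */
            memmove(carry, end + 1, *nCarry);
            return produced;
        }
        if (f->pos == f->len)
            return -1;                         /* like BZ_UNEXPECTED_EOF */
        n = f->len - f->pos;
        if (n > CHUNK)
            n = CHUNK;
        memcpy(chunk, f->bytes + f->pos, n);
        f->pos += n;
        src = chunk;
    }
}

/* Decode every stream placed end-to-end, re-"opening" with the carried
   bytes each time, until a stream end coincides with the physical end
   of file and nothing is left over.  Returns total payload or -1. */
static long decode_all(MockFile *f, char *out, int *nStreams)
{
    char carry[CHUNK];
    size_t nCarry = 0;
    long total = 0;

    *nStreams = 0;
    do {
        long got = decode_stream(f, carry, &nCarry, out + total);
        if (got < 0)
            return -1;
        total += got;
        ++*nStreams;
    } while (!(f->pos == f->len && nCarry == 0));
    return total;
}
```

The termination test mirrors the text. Reaching the physical end of the file is not enough on its own, because the last read may already hold the start of another stream.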

This should give some feel for how the high-level interface can be used.
If you require extra flexibility, you'll have to bite the bullet and get
to grips with the low-level interface.

@subsection Standard file-reading/writing code
Here's how you'd write data to a compressed file:
@example
FILE* f;
BZFILE* b;
int nBuf;
char buf[4096];    /* or whatever size you like */
int bzerror;

f = fopen ( "myfile.bz2", "wb" );
if (!f) @{
   /* handle error */
@}
/* blockSize100k = 9, verbosity = 0, workFactor = 0 */
b = bzWriteOpen ( &bzerror, f, 9, 0, 0 );
if (bzerror != BZ_OK) @{
   bzWriteClose ( &bzerror, b, 0, NULL, NULL );
   /* handle error */
@}

while ( /* condition */ ) @{
   /* get data to write into buf, and set nBuf appropriately */
   bzWrite ( &bzerror, b, buf, nBuf );
   if (bzerror == BZ_IO_ERROR) @{
      bzWriteClose ( &bzerror, b, 0, NULL, NULL );
      /* handle error */
   @}
@}

bzWriteClose ( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) @{
   /* handle error */
@}
fclose ( f );      /* bzWriteClose does not close the file */
@end example
And to read from a compressed file:
@example
FILE* f;
BZFILE* b;
int nBuf;
char buf[4096];    /* or whatever size you like */
int bzerror;

f = fopen ( "myfile.bz2", "rb" );
if (!f) @{
   /* handle error */
@}
/* small = 0, verbosity = 0, no unused data to feed in */
b = bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if (bzerror != BZ_OK) @{
   bzReadClose ( &bzerror, b );
   /* handle error */
@}

bzerror = BZ_OK;
while (bzerror == BZ_OK && /* arbitrary other conditions */) @{
   nBuf = bzRead ( &bzerror, b, buf, sizeof(buf) );
   /* the call returning BZ_STREAM_END may also deliver data */
   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) @{
      /* do something with buf[0 .. nBuf-1] */
   @}
@}
if (bzerror != BZ_STREAM_END) @{
   bzReadClose ( &bzerror, b );
   /* handle error */
@} else @{
   bzReadClose ( &bzerror, b );
@}
fclose ( f );      /* bzReadClose does not close the file */
@end example



@section Utility functions
@subsection @code{bzBuffToBuffCompress}
@example
   int bzBuffToBuffCompress( char* dest,
                             unsigned int* destLen,
                             char* source,
                             unsigned int sourceLen,
                             int blockSize100k,
                             int verbosity,
                             int workFactor );
@end example
Attempts to compress the data in @code{source[0 .. sourceLen-1]}
into the destination buffer, @code{dest[0 .. *destLen-1]}.
If the destination buffer is big enough, @code{*destLen} is
set to the size of the compressed data, and @code{BZ_OK} is
returned. If the compressed data won't fit, @code{*destLen}
is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.

Compression in this manner is a one-shot event, done with a single call
to this function. The resulting compressed data is a complete
@code{bzip2} format data stream. There is no mechanism for making
additional calls to provide extra input data. If you want that kind of
mechanism, use the low-level interface.

For the meaning of parameters @code{blockSize100k}, @code{verbosity}
and @code{workFactor}, @* see @code{bzCompressInit}.

To guarantee that the compressed data will fit in its buffer, allocate
an output buffer of size 1% larger than the uncompressed data, plus
six hundred extra bytes.
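
The sizing rule translates directly into code. The following is a minimal sketch. @code{bz_compress_bound} is a hypothetical name. The 1% is rounded up, and the arithmetic is done in @code{unsigned long long} so that it cannot wrap around before the caller has checked the result.

```c
/* Hypothetical helper: a destination size for bzBuffToBuffCompress that
   follows the rule above, i.e. the input length plus 1% (rounded up)
   plus 600 bytes.  unsigned long long keeps the sum from wrapping. */
static unsigned long long bz_compress_bound(unsigned int sourceLen)
{
    unsigned long long n = sourceLen;
    return n + (n + 99) / 100 + 600;
}
```

Before storing the result in the @code{unsigned int} passed as @code{destLen}, the caller should check that it does not exceed @code{UINT_MAX}.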

@code{bzBuffToBuffCompress} will not write data at or
beyond @code{dest[*destLen]}, even in case of buffer overflow.

Possible return values:
@display
   @code{BZ_PARAM_ERROR}
      if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
      or @code{blockSize100k < 1} or @code{blockSize100k > 9}
      or @code{verbosity < 0} or @code{verbosity > 4}
      or @code{workFactor < 0} or @code{workFactor > 250}
   @code{BZ_MEM_ERROR}
      if insufficient memory is available
   @code{BZ_OUTBUFF_FULL}
      if the size of the compressed data exceeds @code{*destLen}
   @code{BZ_OK}
      otherwise
@end display



@subsection @code{bzBuffToBuffDecompress}
@example
   int bzBuffToBuffDecompress ( char* dest,
                                unsigned int* destLen,
                                char* source,
                                unsigned int sourceLen,
                                int small,
                                int verbosity );
@end example
Attempts to decompress the data in @code{source[0 .. sourceLen-1]}
into the destination buffer, @code{dest[0 .. *destLen-1]}.
If the destination buffer is big enough, @code{*destLen} is
set to the size of the uncompressed data, and @code{BZ_OK} is
returned. If the uncompressed data won't fit, @code{*destLen}
is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.

@code{source} is assumed to hold a complete @code{bzip2} format
data stream. @code{bzBuffToBuffDecompress} tries to decompress
the entirety of the stream into the output buffer.

For the meaning of parameters @code{small} and @code{verbosity},
see @code{bzDecompressInit}.

Because the compression ratio of the compressed data cannot be known in
advance, there is no easy way to guarantee that the output buffer will
be big enough. You may of course make arrangements in your code to
record the size of the uncompressed data, but such a mechanism is beyond
the scope of this library.
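
One such arrangement is sketched here as an assumption; neither the library nor the @code{bzip2} format provides it. The idea is to prefix the compressed bytes with the uncompressed length, stored as a 4-byte big-endian header. The reader can then allocate @code{dest} exactly and set @code{*destLen} before calling @code{bzBuffToBuffDecompress}. @code{put_u32be} and @code{get_u32be} are hypothetical helper names.

```c
/* Hypothetical framing, NOT part of the bzip2 format: the uncompressed
   length is stored as 4 big-endian bytes in front of the compressed
   data, so the reader can size the output buffer exactly. */
static void put_u32be(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

static unsigned long get_u32be(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24) | ((unsigned long)in[1] << 16)
         | ((unsigned long)in[2] << 8)  |  (unsigned long)in[3];
}
```

A fixed byte order keeps the header readable on machines of either endianness. Four bytes cover any length that fits in the @code{unsigned int} used for @code{destLen} on 32-bit machines.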

@code{bzBuffToBuffDecompress} will not write data at or
beyond @code{dest[*destLen]}, even in case of buffer overflow.

Possible return values:
@display
   @code{BZ_PARAM_ERROR}
      if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
      or @code{small != 0 && small != 1}
      or @code{verbosity < 0} or @code{verbosity > 4}
   @code{BZ_MEM_ERROR}
      if insufficient memory is available
   @code{BZ_OUTBUFF_FULL}
      if the size of the uncompressed data exceeds @code{*destLen}
   @code{BZ_DATA_ERROR}
      if a data integrity error was detected in the compressed data
   @code{BZ_DATA_ERROR_MAGIC}
      if the compressed data doesn't begin with the right magic bytes
   @code{BZ_UNEXPECTED_EOF}
      if the compressed data ends unexpectedly
   @code{BZ_OK}
      otherwise
@end display



@section Using the library in a @code{stdio}-free environment

@subsection Getting rid of @code{stdio}

In a deeply embedded application, you might want to use just
the memory-to-memory functions. You can do this conveniently
by compiling the library with preprocessor symbol @code{BZ_NO_STDIO}
defined. Doing this gives you a library containing only the following
eight functions:

@code{bzCompressInit}, @code{bzCompress}, @code{bzCompressEnd} @*
@code{bzDecompressInit}, @code{bzDecompress}, @code{bzDecompressEnd} @*
@code{bzBuffToBuffCompress}, @code{bzBuffToBuffDecompress}

When compiled like this, all functions will ignore @code{verbosity}
settings.

@subsection Critical error handling
@code{libbzip2} contains a number of internal assertion checks which
should, needless to say, never be activated. Nevertheless, if an
assertion should fail, behaviour depends on whether or not the library
was compiled with @code{BZ_NO_STDIO} set.

For a normal compile, an assertion failure yields the message
@example
   bzip2/libbzip2, v0.9.0: internal error number N.
   This is a bug in bzip2/libbzip2, v0.9.0. Please report
   it to me at: jseward@@acm.org. If this happened when
   you were using some program which uses libbzip2 as a
   component, you should also report this bug to the author(s)
   of that program. Please make an effort to report this bug;
   timely and accurate bug reports eventually lead to higher
   quality software. Thx. Julian Seward, 27 June 1998.
@end example
where @code{N} is some error code number. @code{exit(3)}
is then called.

For a @code{stdio}-free library, assertion failures result
in a call to a function declared as:
@example
   extern void bz_internal_error ( int errcode );
@end example
The relevant code is passed as a parameter. You should supply
such a function.
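
A @code{stdio}-free application must supply @code{bz_internal_error} itself. One possible shape is sketched here as an assumption. The handler records the code and, if the application has armed a recovery point with @code{setjmp}, uses @code{longjmp} to return to it; otherwise it halts. The names @code{bz_recovery}, @code{bz_recovery_armed} and @code{bz_last_internal_error} are hypothetical. As the text explains, any @code{bz_stream} involved must afterwards be treated as invalid. The sketch also assumes that whatever memory such a stream holds may simply be lost.

```c
#include <setjmp.h>

/* Hypothetical recovery state owned by the application. */
jmp_buf bz_recovery;
volatile int bz_recovery_armed = 0;
volatile int bz_last_internal_error = 0;

/* The function a BZ_NO_STDIO build of the library calls when an internal
   assertion fails.  It records errcode and, if a recovery point has been
   armed with setjmp, jumps back to it; otherwise it halts. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;
    if (bz_recovery_armed) {
        bz_recovery_armed = 0;          /* one-shot: re-arm before the next call */
        longjmp(bz_recovery, 1);
    }
    for (;;) {
        /* No recovery point: stop here (e.g. until a watchdog resets). */
    }
}
```

To use the hook, the application would first call @code{setjmp(bz_recovery)}. When that call returns zero, it sets @code{bz_recovery_armed} to 1, calls into the library, and clears the flag afterwards. When @code{setjmp} returns nonzero, the application would discard the stream and report @code{bz_last_internal_error}.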

In either case, once an assertion failure has occurred, any
@code{bz_stream} records involved can be regarded as invalid.
You should not attempt to resume normal operation with them.

You may, of course, change critical error handling to suit
your needs. As I said above, critical errors indicate bugs
in the library and should not occur. All "normal" error
situations are indicated via error return codes from functions,
and can be recovered from.


@section Making a Windows DLL
Everything related to Windows has been contributed by Yoshioka Tsuneo
@* (@code{QWF00133@@niftyserve.or.jp} /
@code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to
him (but perhaps Cc: me, @code{jseward@@acm.org}).

My vague understanding of what to do is: using Visual C++ 5.0,
open the project file @code{libbz2.dsp}, and build. That's all.

If you can't open the project file for some reason, make a new one,
naming these files:
@code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
@code{randtable.c} and @code{libbz2.def}. You might also need
to name the header files @code{bzlib.h} and @code{bzlib_private.h}.

If you don't use VC++, you may need to define the preprocessor symbol
@code{_WIN32}.

Finally, @code{dlltest.c} is a sample program using the DLL. It has a
project file, @code{dlltest.dsp}.

I haven't tried any of this stuff myself, but it all looks plausible.



@chapter Miscellanea

These are just some random thoughts of mine. Your mileage may
vary.

@section Limitations of the compressed file format
@code{bzip2-0.9.0} uses exactly the same file format as the previous
version, @code{bzip2-0.1}. This decision was made in the interests of
stability. Creating yet another incompatible compressed file format
would create further confusion and disruption for users.

Nevertheless, this is not a painless decision. Development
work since the release of @code{bzip2-0.1} in August 1997
has shown complexities in the file format which slow down
decompression and, in retrospect, are unnecessary. These are:
@itemize @bullet
@item The run-length encoder, which is the first of the
      compression transformations, is entirely irrelevant.
      The original purpose was to protect the sorting algorithm
      from the very worst case input: a string of repeated
      symbols. But algorithm steps Q6a and Q6b in the original
      Burrows-Wheeler technical report (SRC-124) show how
      repeats can be handled without difficulty in block
      sorting.
@item The randomisation mechanism doesn't really need to be
      there. Udi Manber and Gene Myers published a suffix
      array construction algorithm a few years back, which
      can be employed to sort any block, no matter how
      repetitive, in O(N log N) time. Subsequent work by
      Kunihiko Sadakane has produced a derivative O(N (log N)^2)
      algorithm which usually outperforms the Manber-Myers
      algorithm.

      I could have changed to Sadakane's algorithm, but I find
      it to be slower than @code{bzip2}'s existing algorithm for
      most inputs, and the randomisation mechanism protects
      adequately against bad cases. I didn't think it was
      a good tradeoff to make. Partly this is due to the fact
      that I was not flooded with email complaints about
      @code{bzip2-0.1}'s performance on repetitive data, so
      perhaps it isn't a problem for real inputs.

      Probably the best long-term solution
      is to use the existing sorting
      algorithm initially, and fall back to an O(N (log N)^2)
      algorithm if the standard algorithm gets into difficulties.
      This can be done without much difficulty; I made
      a prototype implementation of it some months ago.
@item The compressed file format was never designed to be
      handled by a library, and I have had to jump through
      some hoops to produce an efficient implementation of
      decompression. It's a bit hairy. Try passing
      @code{decompress.c} through the C preprocessor
      and you'll see what I mean. Much of this complexity
      could have been avoided if the compressed size of
      each block of data was recorded in the data stream.
@item An Adler-32 checksum, rather than a CRC32 checksum,
      would be faster to compute.
@end itemize
It would be fair to say that the @code{bzip2} format was frozen
before I properly and fully understood the performance
consequences of doing so.
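
For comparison with CRC32, Adler-32 (the checksum used by zlib, defined in RFC 1950) keeps two running sums modulo 65521, the largest prime below 2^16, and needs no lookup table. This plain sketch takes the modulo on every byte for clarity. Practical implementations such as zlib's defer the modulo over runs of up to 5552 bytes, leaving two additions per byte; that is where the speed advantage comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32: a = 1 + sum of bytes, b = sum of the successive values of a,
   both modulo 65521; the result packs b into the high 16 bits and a into
   the low 16 bits.  Unoptimised: reduces modulo 65521 on every byte. */
static uint32_t adler32(const unsigned char *p, size_t n)
{
    const uint32_t MOD = 65521u;
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}
```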

Improvements which I have been able to incorporate into
0.9.0, despite using the same file format, are:
@itemize @bullet
@item Single array implementation of the inverse BWT. This
      significantly speeds up decompression, presumably
      because it reduces the number of cache misses.
@item Faster inverse MTF transform for large MTF values. The
      new implementation is based on the notion of sliding blocks
      of values.
@item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
      and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
      Duh! I'm embarrassed at my own moronicness (moronicity?) on this
      one.

@end itemize
Further ahead, it would be nice
to be able to do random access into files. This will
require some careful design of compressed file formats.



@section Portability issues
After some consideration, I have decided not to use
GNU @code{autoconf} to configure 0.9.0.

@code{autoconf}, admirable and wonderful though it is,
mainly assists with portability problems between Unix-like
platforms. But @code{bzip2} doesn't have much in the way
of portability problems on Unix; most of the difficulties appear
when porting to the Mac, or to Microsoft's operating systems.
@code{autoconf} doesn't help in those cases, and brings in a
whole load of new complexity.

Most people should be able to compile the library and program
under Unix straight out-of-the-box, so to speak, especially
if you have a version of GNU C available.

There are a couple of @code{__inline__} directives in the code. GNU C
(@code{gcc}) should be able to handle them. If your compiler doesn't
like them, just @code{#define} @code{__inline__} to be null. One
easy way to do this is to compile with the flag @code{-D__inline__=},
which should be understood by most Unix compilers.

If you still have difficulties, try compiling with the macro
@code{BZ_STRICT_ANSI} defined. This should enable you to build the
library in a strictly ANSI compliant environment. Building the program
itself like this is dangerous and not supported, since you remove
@code{bzip2}'s checks against compressing directories, symbolic links,
devices, and other not-really-a-file entities. This could cause
filesystem corruption!

One other thing: if you create a @code{bzip2} binary for public
distribution, please try to link it statically (@code{gcc -static}). This
avoids all sorts of library-version issues that others may encounter
later on.


@section Reporting bugs
I tried pretty hard to make sure @code{bzip2} is
bug free, both by design and by testing. Hopefully
you'll never need to read this section for real.

Nevertheless, if @code{bzip2} dies with a segmentation
fault, a bus error or an internal assertion failure, it
will ask you to email me a bug report. Experience with
version 0.1 shows that almost all these problems can
be traced to either compiler bugs or hardware problems.
@itemize @bullet
@item
Recompile the program with no optimisation, and see if it
works. And/or try a different compiler.
I heard all sorts of stories about various flavours
of GNU C (and other compilers) generating bad code for
@code{bzip2}, and I've run across two such examples myself.

2.7.X versions of GNU C are known to generate bad code from
time to time, at high optimisation levels.
If you get problems, try using the flags
@code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}.
You should specifically @emph{not} use @code{-funroll-loops}.

You may notice that the Makefile runs four tests as part of
the build process. If the program passes all of these, it's
a pretty good (but not 100%) indication that the compiler has
done its job correctly.
@item
If @code{bzip2} crashes randomly, and the crashes are not
repeatable, you may have a flaky memory subsystem. @code{bzip2}
really hammers your memory hierarchy, and if it's a bit marginal,
you may get these problems. Ditto if your disk or I/O subsystem
is slowly failing. Yup, this really does happen.

Try using a different machine of the same type, and see if
you can repeat the problem.
@item This isn't really a bug, but ... If @code{bzip2} tells
you your file is corrupted on decompression, and you
obtained the file via FTP, there is a possibility that you
forgot to tell FTP to do a binary mode transfer. That absolutely
will cause the file to be non-decompressible. You'll have to transfer
it again.
@end itemize

If you've incorporated @code{libbzip2} into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are
correct, and in accordance with what the documentation says
is allowable. I have tried to make the library robust against
such problems, but I'm sure I haven't succeeded.

Finally, if the above comments don't help, you'll have to send
me a bug report. Now, it's just amazing how many people will
send me a bug report saying something like
@display
   bzip2 crashed with segmentation fault on my machine
@end display
and absolutely nothing else. Needless to say, such a report
is @emph{totally, utterly, completely and comprehensively 100% useless;
a waste of your time, my time, and net bandwidth}.
With no details at all, there's no way I can possibly begin
to figure out what the problem is.

The rules of the game are: facts, facts, facts. Don't omit
them because "oh, they won't be relevant". At the bare
minimum:
@display
   Machine type. Operating system version.
   Exact version of @code{bzip2} (do @code{bzip2 -V}).
   Exact version of the compiler used.
   Flags passed to the compiler.
@end display
However, the most important single thing that will help me is
the file that you were trying to compress or decompress at the
time the problem happened. Without that, my ability to do anything
more than speculate about the cause is limited.

Please remember that I connect to the Internet with a modem, so
you should contact me before mailing me huge files.


1939@section Did you get the right package?
1940
1941@code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles
1942and memory. Also, it gives very large latencies. In the worst case, you
1943can feed many megabytes of uncompressed data into the library before
1944getting any compressed output, so this probably rules out applications
1945requiring interactive behaviour.
1946
1947These aren't faults of my implementation, I hope, but rather
1948an intrinsic property of the Burrows-Wheeler transform (unfortunately).
1949Maybe this isn't what you want.
1950
1951If you want a compressor and/or library which is faster, uses less
1952memory but gets pretty good compression, and has minimal latency,
1953consider Jean-loup
1954Gailly's and Mark Adler's work, @code{zlib-1.1.2} and
1955@code{gzip-1.2.4}. Look for them at
1956@code{http://www.cdrom.com/pub/infozip/zlib} and
1957@code{http://www.gzip.org} respectively.
1958
1959For something faster and lighter still, you might try Markus F X J
1960Oberhumer's @code{LZO} real-time compression/decompression library, at
1961@* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}.
1962
1963If you want to use the @code{bzip2} algorithms to compress small blocks
1964of data, 64k bytes or smaller, for example on an on-the-fly disk
1965compressor, you'd be well advised not to use this library. Instead,
1966I've made a special library tuned for that kind of use. It's part of
1967@code{e2compr-0.40}, an on-the-fly disk compressor for the Linux
1968@code{ext2} filesystem. Look at
1969@code{http://www.netspace.net.au/~reiter/e2compr}.
1970
1971
1972
1973@section Testing
1974
1975A record of the tests I've done.
1976
1977First, some data sets:
1978@itemize @bullet
1979@item B: a directory containing 6001 files, one for every length in the
1980 range 0 to 6000 bytes. The files contain random lowercase
1981 letters. 18.7 megabytes.
1982@item H: my home directory tree. Documents, source code, mail files,
1983 compressed data. H contains B, and also a directory of
1984 files designed as boundary cases for the sorting; mostly very
1985 repetitive, nasty files. 445 megabytes.
1986@item A: directory tree holding various applications built from source:
1987 @code{egcs-1.0.2}, @code{gcc-2.8.1}, KDE Beta 4, GTK, Octave, etc.
1988 827 megabytes.
1989@item P: directory tree holding large amounts of source code (@code{.tar}
1990 files) of the entire GNU distribution, plus a couple of
1991 Linux distributions. 2400 megabytes.
1992@end itemize
1993The tests conducted are as follows. Each test means compressing
1994(a copy of) each file in the data set, decompressing it and
1995comparing it against the original.
1996
1997First, a bunch of tests with block sizes, internal buffer
1998sizes and randomisation lengths set very small,
1999to detect any problems with the
2000blocking, buffering and randomisation mechanisms.
2001This required modifying the source code so as to try to
2002break it.
2003@enumerate
2004@item Data set H, with
2005 buffer size of 1 byte, and block size of 23 bytes.
2006@item Data set B, buffer sizes 1 byte, block size 1 byte.
2007@item As (2) but small-mode decompression (first 1700 files).
2008@item As (2) with block size 2 bytes.
2009@item As (2) with block size 3 bytes.
2010@item As (2) with block size 4 bytes.
2011@item As (2) with block size 5 bytes.
2012@item As (2) with block size 6 bytes and small-mode decompression.
2013@item H with normal buffer sizes (5000 bytes), normal block
2014 size (up to 900000 bytes), but with randomisation
2015 mechanism running intensely (randomising approximately every
2016 third byte).
2017@item As (9) with small-mode decompression.
2018@end enumerate
2019Then some tests with unmodified source code.
2020@enumerate
2021@item H, all settings normal.
2022@item As (1), with small-mode decompression.
2023@item H, compress with flag @code{-1}.
2024@item H, compress with flag @code{-s}, decompress with flag @code{-s}.
2025@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
2026 @code{bzip2-0.9.0} decompressing, all settings normal.
2027@item Backwards compatibility: H, @code{bzip2-0.9.0} compressing,
2028 @code{bzip2-0.1pl2} decompressing, all settings normal.
2029@item Bigger tests: A, all settings normal.
2030@item P, all settings normal.
2031@item Misc test: about 100 megabytes of @code{.tar} files with
2032 @code{bzip2} compiled with Purify.
2033@item Misc tests to make sure it builds and runs ok on non-Linux/x86
2034 platforms.
2035@end enumerate
2036These tests were conducted on a 205 MHz Cyrix 6x86MX machine, running
2037Linux 2.0.32. They represent nearly a week of continuous computation.
2038All tests completed successfully.
2039
2040
2041@section Further reading
2042@code{bzip2} is not research work, in the sense that it doesn't present
2043any new ideas. Rather, it's an engineering exercise based on existing
2044ideas.
2045
2046Four documents describe essentially all the ideas behind @code{bzip2}:
2047@example
2048Michael Burrows and D. J. Wheeler:
2049 "A block-sorting lossless data compression algorithm"
2050 10th May 1994.
2051 Digital SRC Research Report 124.
2052 ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
2053 If you have trouble finding it, try searching at the
2054 New Zealand Digital Library, http://www.nzdl.org.
2055
2056Daniel S. Hirschberg and Debra A. Lelewer:
2057 "Efficient Decoding of Prefix Codes"
2058 Communications of the ACM, April 1990, Vol 33, Number 4.
2059 You might be able to get an electronic copy of this
2060 from the ACM Digital Library.
2061
2062David J. Wheeler:
2063 Program bred3.c and accompanying document bred3.ps.
2064 This contains the idea behind the multi-table Huffman
2065 coding scheme.
2066 ftp://ftp.cl.cam.ac.uk/pub/user/djw3/
2067
2068Jon L. Bentley and Robert Sedgewick:
2069 "Fast Algorithms for Sorting and Searching Strings"
2070 Available from Sedgewick's web page,
2071 www.cs.princeton.edu/~rs
2072@end example
2073The following paper gives valuable additional insights into the
2074algorithm, but is not directly the basis of any code
2075used in @code{bzip2}.
2076@example
2077Peter Fenwick:
2078 Block Sorting Text Compression
2079 Proceedings of the 19th Australasian Computer Science Conference,
2080 Melbourne, Australia. Jan 31 - Feb 2, 1996.
2081 ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
2082@end example
2083Kunihiko Sadakane's sorting algorithm, mentioned above,
2084is available from:
2085@example
2086http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
2087@end example
2088The Manber-Myers suffix array construction
2089algorithm is described in a paper
2090available from:
2091@example
2092http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
2093@end example
2094
2095
2096
2097@contents
2098
2099@bye
2100
diff --git a/randtable.c b/randtable.c
new file mode 100644
index 0000000..27b34af
--- /dev/null
+++ b/randtable.c
@@ -0,0 +1,124 @@
1
2/*-------------------------------------------------------------*/
3/*--- Table for randomising repetitive blocks ---*/
4/*--- randtable.c ---*/
5/*-------------------------------------------------------------*/
6
7/*--
8 This file is a part of bzip2 and/or libbzip2, a program and
9 library for lossless, block-sorting data compression.
10
11 Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
12
13 Redistribution and use in source and binary forms, with or without
14 modification, are permitted provided that the following conditions
15 are met:
16
17 1. Redistributions of source code must retain the above copyright
18 notice, this list of conditions and the following disclaimer.
19
20 2. The origin of this software must not be misrepresented; you must
21 not claim that you wrote the original software. If you use this
22 software in a product, an acknowledgment in the product
23 documentation would be appreciated but is not required.
24
25 3. Altered source versions must be plainly marked as such, and must
26 not be misrepresented as being the original software.
27
28 4. The name of the author may not be used to endorse or promote
29 products derived from this software without specific prior written
30 permission.
31
32 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
33 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
34 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
35 ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
36 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
37 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
38 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
39 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
40 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
41 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
42 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43
44 Julian Seward, Guildford, Surrey, UK.
45 jseward@acm.org
46 bzip2/libbzip2 version 0.9.0c of 18 October 1998
47
48 This program is based on (at least) the work of:
49 Mike Burrows
50 David Wheeler
51 Peter Fenwick
52 Alistair Moffat
53 Radford Neal
54 Ian H. Witten
55 Robert Sedgewick
56 Jon L. Bentley
57
58 For more information on these sources, see the manual.
59--*/
60
61
62#include "bzlib_private.h"
63
64
65/*---------------------------------------------*/
66Int32 rNums[512] = {
67 619, 720, 127, 481, 931, 816, 813, 233, 566, 247,
68 985, 724, 205, 454, 863, 491, 741, 242, 949, 214,
69 733, 859, 335, 708, 621, 574, 73, 654, 730, 472,
70 419, 436, 278, 496, 867, 210, 399, 680, 480, 51,
71 878, 465, 811, 169, 869, 675, 611, 697, 867, 561,
72 862, 687, 507, 283, 482, 129, 807, 591, 733, 623,
73 150, 238, 59, 379, 684, 877, 625, 169, 643, 105,
74 170, 607, 520, 932, 727, 476, 693, 425, 174, 647,
75 73, 122, 335, 530, 442, 853, 695, 249, 445, 515,
76 909, 545, 703, 919, 874, 474, 882, 500, 594, 612,
77 641, 801, 220, 162, 819, 984, 589, 513, 495, 799,
78 161, 604, 958, 533, 221, 400, 386, 867, 600, 782,
79 382, 596, 414, 171, 516, 375, 682, 485, 911, 276,
80 98, 553, 163, 354, 666, 933, 424, 341, 533, 870,
81 227, 730, 475, 186, 263, 647, 537, 686, 600, 224,
82 469, 68, 770, 919, 190, 373, 294, 822, 808, 206,
83 184, 943, 795, 384, 383, 461, 404, 758, 839, 887,
84 715, 67, 618, 276, 204, 918, 873, 777, 604, 560,
85 951, 160, 578, 722, 79, 804, 96, 409, 713, 940,
86 652, 934, 970, 447, 318, 353, 859, 672, 112, 785,
87 645, 863, 803, 350, 139, 93, 354, 99, 820, 908,
88 609, 772, 154, 274, 580, 184, 79, 626, 630, 742,
89 653, 282, 762, 623, 680, 81, 927, 626, 789, 125,
90 411, 521, 938, 300, 821, 78, 343, 175, 128, 250,
91 170, 774, 972, 275, 999, 639, 495, 78, 352, 126,
92 857, 956, 358, 619, 580, 124, 737, 594, 701, 612,
93 669, 112, 134, 694, 363, 992, 809, 743, 168, 974,
94 944, 375, 748, 52, 600, 747, 642, 182, 862, 81,
95 344, 805, 988, 739, 511, 655, 814, 334, 249, 515,
96 897, 955, 664, 981, 649, 113, 974, 459, 893, 228,
97 433, 837, 553, 268, 926, 240, 102, 654, 459, 51,
98 686, 754, 806, 760, 493, 403, 415, 394, 687, 700,
99 946, 670, 656, 610, 738, 392, 760, 799, 887, 653,
100 978, 321, 576, 617, 626, 502, 894, 679, 243, 440,
101 680, 879, 194, 572, 640, 724, 926, 56, 204, 700,
102 707, 151, 457, 449, 797, 195, 791, 558, 945, 679,
103 297, 59, 87, 824, 713, 663, 412, 693, 342, 606,
104 134, 108, 571, 364, 631, 212, 174, 643, 304, 329,
105 343, 97, 430, 751, 497, 314, 983, 374, 822, 928,
106 140, 206, 73, 263, 980, 736, 876, 478, 430, 305,
107 170, 514, 364, 692, 829, 82, 855, 953, 676, 246,
108 369, 970, 294, 750, 807, 827, 150, 790, 288, 923,
109 804, 378, 215, 828, 592, 281, 565, 555, 710, 82,
110 896, 831, 547, 261, 524, 462, 293, 465, 502, 56,
111 661, 821, 976, 991, 658, 869, 905, 758, 745, 193,
112 768, 550, 608, 933, 378, 286, 215, 979, 792, 961,
113 61, 688, 793, 644, 986, 403, 106, 366, 905, 644,
114 372, 567, 466, 434, 645, 210, 389, 550, 919, 135,
115 780, 773, 635, 389, 707, 100, 626, 958, 165, 504,
116 920, 176, 193, 713, 857, 265, 203, 50, 668, 108,
117 645, 990, 626, 197, 510, 357, 358, 850, 858, 364,
118 936, 638
119};
120
121
122/*-------------------------------------------------------------*/
123/*--- end randtable.c ---*/
124/*-------------------------------------------------------------*/
diff --git a/test.bat b/test.bat
deleted file mode 100644
index 30b747d..0000000
--- a/test.bat
+++ /dev/null
@@ -1,9 +0,0 @@
1@rem
2@rem MSDOS test driver for bzip2
3@rem
4type words1
5.\bzip2 -1 < sample1.ref > sample1.rbz
6.\bzip2 -2 < sample2.ref > sample2.rbz
7.\bzip2 -dvv < sample1.bz2 > sample1.tst
8.\bzip2 -dvv < sample2.bz2 > sample2.tst
9type words3sh \ No newline at end of file
diff --git a/test.cmd b/test.cmd
deleted file mode 100644
index f7bc866..0000000
--- a/test.cmd
+++ /dev/null
@@ -1,9 +0,0 @@
1@rem
2@rem OS/2 test driver for bzip2
3@rem
4type words1
5.\bzip2 -1 < sample1.ref > sample1.rbz
6.\bzip2 -2 < sample2.ref > sample2.rbz
7.\bzip2 -dvv < sample1.bz2 > sample1.tst
8.\bzip2 -dvv < sample2.bz2 > sample2.tst
9type words3sh \ No newline at end of file
diff --git a/words0 b/words0
deleted file mode 100644
index 527fb43..0000000
--- a/words0
+++ /dev/null
@@ -1,7 +0,0 @@
1***-------------------------------------------------***
2***--------- IMPORTANT: READ WHAT FOLLOWS! ---------***
3***--------- viz: pay attention :-) ---------***
4***-------------------------------------------------***
5
6Compiling bzip2 ...
7
diff --git a/words1 b/words1
index c75293b..a891431 100644
--- a/words1
+++ b/words1
@@ -1,5 +1,4 @@
1 1
2
3Doing 4 tests (2 compress, 2 uncompress) ... 2Doing 4 tests (2 compress, 2 uncompress) ...
4If there's a problem, things might stop at this point. 3If there's a problem, things might stop at this point.
5 4
diff --git a/words2 b/words2
index d3cafb9..203ee39 100644
--- a/words2
+++ b/words2
@@ -1,5 +1,4 @@
1 1
2
3Checking test results. If any of the four "cmp"s which follow 2Checking test results. If any of the four "cmp"s which follow
4report any differences, something is wrong. If you can't easily 3report any differences, something is wrong. If you can't easily
5figure out what, please let me know (jseward@acm.org). 4figure out what, please let me know (jseward@acm.org).
diff --git a/words3 b/words3
index 5739d18..10bb2e9 100644
--- a/words3
+++ b/words3
@@ -1,23 +1,20 @@
1 1
2
3If you got this far and the "cmp"s didn't find anything amiss, looks 2If you got this far and the "cmp"s didn't find anything amiss, looks
4like you're in business. You should install bzip2 and bunzip2: 3like you're in business. You should install bzip2, bunzip2 and bzcat:
5 4
6 copy bzip2 to a public place, maybe /usr/bin. 5 Copy bzip2 and bzip2recover to a public place, maybe /usr/bin.
7 In that public place, make bunzip2 a symbolic link 6 In that public place, make bunzip2 and bzcat be
8 to the bzip2 you just copied there. 7 symbolic links to the bzip2 you just copied there.
9 Put the manual page, bzip2.1, somewhere appropriate; 8 Put the manual page, bzip2.1, somewhere appropriate;
10 perhaps in /usr/man/man1. 9 perhaps in /usr/man/man1.
11 10
12Complete instructions for use are in the preformatted 11Instructions for use are in the preformatted manual page, in the file
13manual page, in the file bzip2.1.preformatted. 12bzip2.txt. For more detailed documentation, read the full manual.
13It is available in Postscript form (manual.ps) and HTML form
14(manual_toc.html).
14 15
15You can also do "bzip2 --help" to see some helpful information. 16You can also do "bzip2 --help" to see some helpful information.
16
17"bzip2 -L" displays the software license. 17"bzip2 -L" displays the software license.
18 18
19Please read the README file carefully. 19Happy compressing. -- JRS, 30 August 1998.
20Finally, note that bzip2 comes with ABSOLUTELY NO WARRANTY.
21
22Happy compressing!
23 20
diff --git a/words3sh b/words3sh
deleted file mode 100644
index 1139177..0000000
--- a/words3sh
+++ /dev/null
@@ -1,12 +0,0 @@
1If you got this far and the "bzip2 -dvv"s give identical
2stored vs computed CRCs, you're probably in business.
3Complete instructions for use are in the preformatted manual page,
4in the file bzip2.txt.
5
6You can also do "bzip2 --help" to see some helpful information.
7"bzip2 -L" displays the software license.
8
9Please read the README file carefully.
10Finally, note that bzip2 comes with ABSOLUTELY NO WARRANTY.
11
12Happy compressing! \ No newline at end of file