| author | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
|---|---|---|
| committer | Julian Seward <jseward@acm.org> | 1998-08-23 22:13:13 +0200 |
| commit | 977101ad5f833f5c0a574bfeea408e5301a6b052 | |
| tree | fc1e8fed202869c116cbf6b8c362456042494a0a | |
| parent | 1eb67a9d8f7f05ae310bc9ef297d176f3a3f8a37 | |
bzip2-0.9.0c
| -rw-r--r-- | ALGORITHMS | 47 | ||||
| -rw-r--r-- | CHANGES | 45 | ||||
| -rw-r--r-- | LICENSE | 360 | ||||
| -rw-r--r-- | Makefile | 52 | ||||
| -rw-r--r-- | README | 230 | ||||
| -rw-r--r-- | README.DOS | 16 | ||||
| -rw-r--r-- | blocksort.c | 709 | ||||
| -rw-r--r-- | bzip2.1 | 191 | ||||
| -rw-r--r-- | bzip2.1.preformatted | 318 | ||||
| -rw-r--r-- | bzip2.c | 3389 | ||||
| -rw-r--r-- | bzip2.txt | 292 | ||||
| -rw-r--r-- | bzip2recover.c | 125 | ||||
| -rw-r--r-- | bzlib.c | 1512 | ||||
| -rw-r--r-- | bzlib.h | 299 | ||||
| -rw-r--r-- | bzlib_private.h | 523 | ||||
| -rw-r--r-- | compress.c | 588 | ||||
| -rw-r--r-- | crctable.c | 144 | ||||
| -rw-r--r-- | decompress.c | 636 | ||||
| -rw-r--r-- | dlltest.c | 163 | ||||
| -rw-r--r-- | dlltest.dsp | 93 | ||||
| -rw-r--r-- | howbig.c | 37 | ||||
| -rw-r--r-- | huffman.c | 228 | ||||
| -rw-r--r-- | libbz2.def | 25 | ||||
| -rw-r--r-- | libbz2.dsp | 130 | ||||
| -rw-r--r-- | manual.texi | 2100 | ||||
| -rw-r--r-- | randtable.c | 124 | ||||
| -rw-r--r-- | test.bat | 9 | ||||
| -rw-r--r-- | test.cmd | 9 | ||||
| -rw-r--r-- | words0 | 7 | ||||
| -rw-r--r-- | words1 | 1 | ||||
| -rw-r--r-- | words2 | 1 | ||||
| -rw-r--r-- | words3 | 21 | ||||
| -rw-r--r-- | words3sh | 12 |
33 files changed, 8332 insertions, 4104 deletions
diff --git a/ALGORITHMS b/ALGORITHMS
deleted file mode 100644
index 7c7d2ca..0000000
--- a/ALGORITHMS
+++ /dev/null
| @@ -1,47 +0,0 @@ | |||
| 1 | |||
| 2 | Bzip2 is not research work, in the sense that it doesn't present any | ||
| 3 | new ideas. Rather, it's an engineering exercise based on existing | ||
| 4 | ideas. | ||
| 5 | |||
| 6 | Four documents describe essentially all the ideas behind bzip2: | ||
| 7 | |||
| 8 | Michael Burrows and D. J. Wheeler: | ||
| 9 | "A block-sorting lossless data compression algorithm" | ||
| 10 | 10th May 1994. | ||
| 11 | Digital SRC Research Report 124. | ||
| 12 | ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz | ||
| 13 | |||
| 14 | Daniel S. Hirschberg and Debra A. LeLewer | ||
| 15 | "Efficient Decoding of Prefix Codes" | ||
| 16 | Communications of the ACM, April 1990, Vol 33, Number 4. | ||
| 17 | You might be able to get an electronic copy of this | ||
| 18 | from the ACM Digital Library. | ||
| 19 | |||
| 20 | David J. Wheeler | ||
| 21 | Program bred3.c and accompanying document bred3.ps. | ||
| 22 | This contains the idea behind the multi-table Huffman | ||
| 23 | coding scheme. | ||
| 24 | ftp://ftp.cl.cam.ac.uk/pub/user/djw3/ | ||
| 25 | |||
| 26 | Jon L. Bentley and Robert Sedgewick | ||
| 27 | "Fast Algorithms for Sorting and Searching Strings" | ||
| 28 | Available from Sedgewick's web page, | ||
| 29 | www.cs.princeton.edu/~rs | ||
| 30 | |||
| 31 | The following paper gives valuable additional insights into the | ||
| 32 | algorithm, but is not immediately the basis of any code | ||
| 33 | used in bzip2. | ||
| 34 | |||
| 35 | Peter Fenwick: | ||
| 36 | Block Sorting Text Compression | ||
| 37 | Proceedings of the 19th Australasian Computer Science Conference, | ||
| 38 | Melbourne, Australia. Jan 31 - Feb 2, 1996. | ||
| 39 | ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps | ||
| 40 | |||
| 41 | All three are well written, and make fascinating reading. If you want | ||
| 42 | to modify bzip2 in any non-trivial way, I strongly suggest you obtain, | ||
| 43 | read and understand these papers. | ||
| 44 | |||
| 45 | I am much indebted to the various authors for their help, support and | ||
| 46 | advice. | ||
| 47 | |||
diff --git a/CHANGES b/CHANGES
new file mode 100644
--- /dev/null
+++ b/CHANGES
| @@ -0,0 +1,45 @@ | |||
| 1 | |||
| 2 | |||
| 3 | 0.9.0 | ||
| 4 | ~~~~~ | ||
| 5 | First version. | ||
| 6 | |||
| 7 | |||
| 8 | 0.9.0a | ||
| 9 | ~~~~~~ | ||
| 10 | Removed 'ranlib' from Makefile, since most modern Unix-es | ||
| 11 | don't need it, or even know about it. | ||
| 12 | |||
| 13 | |||
| 14 | 0.9.0b | ||
| 15 | ~~~~~~ | ||
| 16 | Fixed a problem with error reporting in bzip2.c. This does not affect | ||
| 17 | the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the | ||
| 18 | program proper) compress and decompress correctly, but give misleading | ||
| 19 | error messages (internal panics) when an I/O error occurs, instead of | ||
| 20 | reporting the problem correctly. This shouldn't give any data loss | ||
| 21 | (as far as I can see), but is confusing. | ||
| 22 | |||
| 23 | Made the inline declarations disappear for non-GCC compilers. | ||
| 24 | |||
| 25 | |||
| 26 | 0.9.0c | ||
| 27 | ~~~~~~ | ||
| 28 | Fixed some problems in the library pertaining to some boundary cases. | ||
| 29 | This makes the library behave more correctly in those situations. The | ||
| 30 | fixes apply only to features (calls and parameters) not used by | ||
| 31 | bzip2.c, so the non-fixedness of them in previous versions has no | ||
| 32 | effect on reliability of bzip2.c. | ||
| 33 | |||
| 34 | In bzlib.c: | ||
| 35 | * made zero-length BZ_FLUSH work correctly in bzCompress(). | ||
| 36 | * fixed bzWrite/bzRead to ignore zero-length requests. | ||
| 37 | * fixed bzRead to correctly handle read requests after EOF. | ||
| 38 | * wrong parameter order in call to bzDecompressInit in | ||
| 39 | bzBuffToBuffDecompress. Fixed. | ||
| 40 | |||
| 41 | In compress.c: | ||
| 42 | * changed setting of nGroups in sendMTFValues() so as to | ||
| 43 | do a bit better on small files. This _does_ affect | ||
| 44 | bzip2.c. | ||
| 45 | |||
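
The compress.c entry in CHANGES above says only that the choice of nGroups in sendMTFValues() was retuned for small files. As a rough illustration of what such a rule can look like, the sketch below picks the number of Huffman coding tables from the number of MTF symbols in a block. The function name `choose_ngroups` is hypothetical, and the thresholds are the ones found in later bzip2 1.0.x sources; whether 0.9.0c used exactly these values is an assumption, not something this commit page confirms.

```c
#include <assert.h>

/* Hypothetical helper, not part of bzlib: choose how many Huffman coding
 * tables (nGroups) to use for a block containing nMTF move-to-front symbols.
 * Thresholds mirror later bzip2 1.0.x releases and are only assumed to
 * resemble what 0.9.0c does. */
int choose_ngroups(int nMTF)
{
    assert(nMTF > 0);            /* a block always carries at least one symbol */

    if (nMTF < 200)  return 2;   /* tiny block: keep table overhead minimal */
    if (nMTF < 600)  return 3;
    if (nMTF < 1200) return 4;
    if (nMTF < 2400) return 5;
    return 6;                    /* large block: use the maximum of six tables */
}
```

Under these assumed thresholds, `choose_ngroups(150)` returns 2. Every coding table has to be described in the compressed block, so using fewer tables for a short block reduces the fixed overhead, which is the likely reason a change of this kind helps small files.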
diff --git a/LICENSE b/LICENSE
--- a/LICENSE
+++ b/LICENSE
| @@ -1,339 +1,39 @@ | |||
| 1 | GNU GENERAL PUBLIC LICENSE | ||
| 2 | Version 2, June 1991 | ||
| 3 | 1 | ||
| 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc. | 2 | This program, "bzip2" and associated library "libbzip2", are |
| 5 | 675 Mass Ave, Cambridge, MA 02139, USA | 3 | copyright (C) 1996-1998 Julian R Seward. All rights reserved. |
| 6 | Everyone is permitted to copy and distribute verbatim copies | ||
| 7 | of this license document, but changing it is not allowed. | ||
| 8 | 4 | ||
| 9 | Preamble | 5 | Redistribution and use in source and binary forms, with or without |
| 6 | modification, are permitted provided that the following conditions | ||
| 7 | are met: | ||
| 10 | 8 | ||
| 11 | The licenses for most software are designed to take away your | 9 | 1. Redistributions of source code must retain the above copyright |
| 12 | freedom to share and change it. By contrast, the GNU General Public | 10 | notice, this list of conditions and the following disclaimer. |
| 13 | License is intended to guarantee your freedom to share and change free | ||
| 14 | software--to make sure the software is free for all its users. This | ||
| 15 | General Public License applies to most of the Free Software | ||
| 16 | Foundation's software and to any other program whose authors commit to | ||
| 17 | using it. (Some other Free Software Foundation software is covered by | ||
| 18 | the GNU Library General Public License instead.) You can apply it to | ||
| 19 | your programs, too. | ||
| 20 | 11 | ||
| 21 | When we speak of free software, we are referring to freedom, not | 12 | 2. The origin of this software must not be misrepresented; you must |
| 22 | price. Our General Public Licenses are designed to make sure that you | 13 | not claim that you wrote the original software. If you use this |
| 23 | have the freedom to distribute copies of free software (and charge for | 14 | software in a product, an acknowledgment in the product |
| 24 | this service if you wish), that you receive source code or can get it | 15 | documentation would be appreciated but is not required. |
| 25 | if you want it, that you can change the software or use pieces of it | ||
| 26 | in new free programs; and that you know you can do these things. | ||
| 27 | 16 | ||
| 28 | To protect your rights, we need to make restrictions that forbid | 17 | 3. Altered source versions must be plainly marked as such, and must |
| 29 | anyone to deny you these rights or to ask you to surrender the rights. | 18 | not be misrepresented as being the original software. |
| 30 | These restrictions translate to certain responsibilities for you if you | ||
| 31 | distribute copies of the software, or if you modify it. | ||
| 32 | 19 | ||
| 33 | For example, if you distribute copies of such a program, whether | 20 | 4. The name of the author may not be used to endorse or promote |
| 34 | gratis or for a fee, you must give the recipients all the rights that | 21 | products derived from this software without specific prior written |
| 35 | you have. You must make sure that they, too, receive or can get the | 22 | permission. |
| 36 | source code. And you must show them these terms so they know their | ||
| 37 | rights. | ||
| 38 | 23 | ||
| 39 | We protect your rights with two steps: (1) copyright the software, and | 24 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS |
| 40 | (2) offer you this license which gives you legal permission to copy, | 25 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED |
| 41 | distribute and/or modify the software. | 26 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
| 27 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 28 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 29 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 30 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 31 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 32 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 33 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 34 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 42 | 35 | ||
| 43 | Also, for each author's protection and ours, we want to make certain | 36 | Julian Seward, Guildford, Surrey, UK. |
| 44 | that everyone understands that there is no warranty for this free | 37 | jseward@acm.org |
| 45 | software. If the software is modified by someone else and passed on, we | 38 | bzip2/libbzip2 version 0.9.0 of 28 June 1998 |
| 46 | want its recipients to know that what they have is not the original, so | ||
| 47 | that any problems introduced by others will not reflect on the original | ||
| 48 | authors' reputations. | ||
| 49 | 39 | ||
| 50 | Finally, any free program is threatened constantly by software | ||
| 51 | patents. We wish to avoid the danger that redistributors of a free | ||
| 52 | program will individually obtain patent licenses, in effect making the | ||
| 53 | program proprietary. To prevent this, we have made it clear that any | ||
| 54 | patent must be licensed for everyone's free use or not licensed at all. | ||
| 55 | |||
| 56 | The precise terms and conditions for copying, distribution and | ||
| 57 | modification follow. | ||
| 58 | |||
| 59 | GNU GENERAL PUBLIC LICENSE | ||
| 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION | ||
| 61 | |||
| 62 | 0. This License applies to any program or other work which contains | ||
| 63 | a notice placed by the copyright holder saying it may be distributed | ||
| 64 | under the terms of this General Public License. The "Program", below, | ||
| 65 | refers to any such program or work, and a "work based on the Program" | ||
| 66 | means either the Program or any derivative work under copyright law: | ||
| 67 | that is to say, a work containing the Program or a portion of it, | ||
| 68 | either verbatim or with modifications and/or translated into another | ||
| 69 | language. (Hereinafter, translation is included without limitation in | ||
| 70 | the term "modification".) Each licensee is addressed as "you". | ||
| 71 | |||
| 72 | Activities other than copying, distribution and modification are not | ||
| 73 | covered by this License; they are outside its scope. The act of | ||
| 74 | running the Program is not restricted, and the output from the Program | ||
| 75 | is covered only if its contents constitute a work based on the | ||
| 76 | Program (independent of having been made by running the Program). | ||
| 77 | Whether that is true depends on what the Program does. | ||
| 78 | |||
| 79 | 1. You may copy and distribute verbatim copies of the Program's | ||
| 80 | source code as you receive it, in any medium, provided that you | ||
| 81 | conspicuously and appropriately publish on each copy an appropriate | ||
| 82 | copyright notice and disclaimer of warranty; keep intact all the | ||
| 83 | notices that refer to this License and to the absence of any warranty; | ||
| 84 | and give any other recipients of the Program a copy of this License | ||
| 85 | along with the Program. | ||
| 86 | |||
| 87 | You may charge a fee for the physical act of transferring a copy, and | ||
| 88 | you may at your option offer warranty protection in exchange for a fee. | ||
| 89 | |||
| 90 | 2. You may modify your copy or copies of the Program or any portion | ||
| 91 | of it, thus forming a work based on the Program, and copy and | ||
| 92 | distribute such modifications or work under the terms of Section 1 | ||
| 93 | above, provided that you also meet all of these conditions: | ||
| 94 | |||
| 95 | a) You must cause the modified files to carry prominent notices | ||
| 96 | stating that you changed the files and the date of any change. | ||
| 97 | |||
| 98 | b) You must cause any work that you distribute or publish, that in | ||
| 99 | whole or in part contains or is derived from the Program or any | ||
| 100 | part thereof, to be licensed as a whole at no charge to all third | ||
| 101 | parties under the terms of this License. | ||
| 102 | |||
| 103 | c) If the modified program normally reads commands interactively | ||
| 104 | when run, you must cause it, when started running for such | ||
| 105 | interactive use in the most ordinary way, to print or display an | ||
| 106 | announcement including an appropriate copyright notice and a | ||
| 107 | notice that there is no warranty (or else, saying that you provide | ||
| 108 | a warranty) and that users may redistribute the program under | ||
| 109 | these conditions, and telling the user how to view a copy of this | ||
| 110 | License. (Exception: if the Program itself is interactive but | ||
| 111 | does not normally print such an announcement, your work based on | ||
| 112 | the Program is not required to print an announcement.) | ||
| 113 | |||
| 114 | These requirements apply to the modified work as a whole. If | ||
| 115 | identifiable sections of that work are not derived from the Program, | ||
| 116 | and can be reasonably considered independent and separate works in | ||
| 117 | themselves, then this License, and its terms, do not apply to those | ||
| 118 | sections when you distribute them as separate works. But when you | ||
| 119 | distribute the same sections as part of a whole which is a work based | ||
| 120 | on the Program, the distribution of the whole must be on the terms of | ||
| 121 | this License, whose permissions for other licensees extend to the | ||
| 122 | entire whole, and thus to each and every part regardless of who wrote it. | ||
| 123 | |||
| 124 | Thus, it is not the intent of this section to claim rights or contest | ||
| 125 | your rights to work written entirely by you; rather, the intent is to | ||
| 126 | exercise the right to control the distribution of derivative or | ||
| 127 | collective works based on the Program. | ||
| 128 | |||
| 129 | In addition, mere aggregation of another work not based on the Program | ||
| 130 | with the Program (or with a work based on the Program) on a volume of | ||
| 131 | a storage or distribution medium does not bring the other work under | ||
| 132 | the scope of this License. | ||
| 133 | |||
| 134 | 3. You may copy and distribute the Program (or a work based on it, | ||
| 135 | under Section 2) in object code or executable form under the terms of | ||
| 136 | Sections 1 and 2 above provided that you also do one of the following: | ||
| 137 | |||
| 138 | a) Accompany it with the complete corresponding machine-readable | ||
| 139 | source code, which must be distributed under the terms of Sections | ||
| 140 | 1 and 2 above on a medium customarily used for software interchange; or, | ||
| 141 | |||
| 142 | b) Accompany it with a written offer, valid for at least three | ||
| 143 | years, to give any third party, for a charge no more than your | ||
| 144 | cost of physically performing source distribution, a complete | ||
| 145 | machine-readable copy of the corresponding source code, to be | ||
| 146 | distributed under the terms of Sections 1 and 2 above on a medium | ||
| 147 | customarily used for software interchange; or, | ||
| 148 | |||
| 149 | c) Accompany it with the information you received as to the offer | ||
| 150 | to distribute corresponding source code. (This alternative is | ||
| 151 | allowed only for noncommercial distribution and only if you | ||
| 152 | received the program in object code or executable form with such | ||
| 153 | an offer, in accord with Subsection b above.) | ||
| 154 | |||
| 155 | The source code for a work means the preferred form of the work for | ||
| 156 | making modifications to it. For an executable work, complete source | ||
| 157 | code means all the source code for all modules it contains, plus any | ||
| 158 | associated interface definition files, plus the scripts used to | ||
| 159 | control compilation and installation of the executable. However, as a | ||
| 160 | special exception, the source code distributed need not include | ||
| 161 | anything that is normally distributed (in either source or binary | ||
| 162 | form) with the major components (compiler, kernel, and so on) of the | ||
| 163 | operating system on which the executable runs, unless that component | ||
| 164 | itself accompanies the executable. | ||
| 165 | |||
| 166 | If distribution of executable or object code is made by offering | ||
| 167 | access to copy from a designated place, then offering equivalent | ||
| 168 | access to copy the source code from the same place counts as | ||
| 169 | distribution of the source code, even though third parties are not | ||
| 170 | compelled to copy the source along with the object code. | ||
| 171 | |||
| 172 | 4. You may not copy, modify, sublicense, or distribute the Program | ||
| 173 | except as expressly provided under this License. Any attempt | ||
| 174 | otherwise to copy, modify, sublicense or distribute the Program is | ||
| 175 | void, and will automatically terminate your rights under this License. | ||
| 176 | However, parties who have received copies, or rights, from you under | ||
| 177 | this License will not have their licenses terminated so long as such | ||
| 178 | parties remain in full compliance. | ||
| 179 | |||
| 180 | 5. You are not required to accept this License, since you have not | ||
| 181 | signed it. However, nothing else grants you permission to modify or | ||
| 182 | distribute the Program or its derivative works. These actions are | ||
| 183 | prohibited by law if you do not accept this License. Therefore, by | ||
| 184 | modifying or distributing the Program (or any work based on the | ||
| 185 | Program), you indicate your acceptance of this License to do so, and | ||
| 186 | all its terms and conditions for copying, distributing or modifying | ||
| 187 | the Program or works based on it. | ||
| 188 | |||
| 189 | 6. Each time you redistribute the Program (or any work based on the | ||
| 190 | Program), the recipient automatically receives a license from the | ||
| 191 | original licensor to copy, distribute or modify the Program subject to | ||
| 192 | these terms and conditions. You may not impose any further | ||
| 193 | restrictions on the recipients' exercise of the rights granted herein. | ||
| 194 | You are not responsible for enforcing compliance by third parties to | ||
| 195 | this License. | ||
| 196 | |||
| 197 | 7. If, as a consequence of a court judgment or allegation of patent | ||
| 198 | infringement or for any other reason (not limited to patent issues), | ||
| 199 | conditions are imposed on you (whether by court order, agreement or | ||
| 200 | otherwise) that contradict the conditions of this License, they do not | ||
| 201 | excuse you from the conditions of this License. If you cannot | ||
| 202 | distribute so as to satisfy simultaneously your obligations under this | ||
| 203 | License and any other pertinent obligations, then as a consequence you | ||
| 204 | may not distribute the Program at all. For example, if a patent | ||
| 205 | license would not permit royalty-free redistribution of the Program by | ||
| 206 | all those who receive copies directly or indirectly through you, then | ||
| 207 | the only way you could satisfy both it and this License would be to | ||
| 208 | refrain entirely from distribution of the Program. | ||
| 209 | |||
| 210 | If any portion of this section is held invalid or unenforceable under | ||
| 211 | any particular circumstance, the balance of the section is intended to | ||
| 212 | apply and the section as a whole is intended to apply in other | ||
| 213 | circumstances. | ||
| 214 | |||
| 215 | It is not the purpose of this section to induce you to infringe any | ||
| 216 | patents or other property right claims or to contest validity of any | ||
| 217 | such claims; this section has the sole purpose of protecting the | ||
| 218 | integrity of the free software distribution system, which is | ||
| 219 | implemented by public license practices. Many people have made | ||
| 220 | generous contributions to the wide range of software distributed | ||
| 221 | through that system in reliance on consistent application of that | ||
| 222 | system; it is up to the author/donor to decide if he or she is willing | ||
| 223 | to distribute software through any other system and a licensee cannot | ||
| 224 | impose that choice. | ||
| 225 | |||
| 226 | This section is intended to make thoroughly clear what is believed to | ||
| 227 | be a consequence of the rest of this License. | ||
| 228 | |||
| 229 | 8. If the distribution and/or use of the Program is restricted in | ||
| 230 | certain countries either by patents or by copyrighted interfaces, the | ||
| 231 | original copyright holder who places the Program under this License | ||
| 232 | may add an explicit geographical distribution limitation excluding | ||
| 233 | those countries, so that distribution is permitted only in or among | ||
| 234 | countries not thus excluded. In such case, this License incorporates | ||
| 235 | the limitation as if written in the body of this License. | ||
| 236 | |||
| 237 | 9. The Free Software Foundation may publish revised and/or new versions | ||
| 238 | of the General Public License from time to time. Such new versions will | ||
| 239 | be similar in spirit to the present version, but may differ in detail to | ||
| 240 | address new problems or concerns. | ||
| 241 | |||
| 242 | Each version is given a distinguishing version number. If the Program | ||
| 243 | specifies a version number of this License which applies to it and "any | ||
| 244 | later version", you have the option of following the terms and conditions | ||
| 245 | either of that version or of any later version published by the Free | ||
| 246 | Software Foundation. If the Program does not specify a version number of | ||
| 247 | this License, you may choose any version ever published by the Free Software | ||
| 248 | Foundation. | ||
| 249 | |||
| 250 | 10. If you wish to incorporate parts of the Program into other free | ||
| 251 | programs whose distribution conditions are different, write to the author | ||
| 252 | to ask for permission. For software which is copyrighted by the Free | ||
| 253 | Software Foundation, write to the Free Software Foundation; we sometimes | ||
| 254 | make exceptions for this. Our decision will be guided by the two goals | ||
| 255 | of preserving the free status of all derivatives of our free software and | ||
| 256 | of promoting the sharing and reuse of software generally. | ||
| 257 | |||
| 258 | NO WARRANTY | ||
| 259 | |||
| 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY | ||
| 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN | ||
| 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES | ||
| 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED | ||
| 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF | ||
| 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS | ||
| 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE | ||
| 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, | ||
| 268 | REPAIR OR CORRECTION. | ||
| 269 | |||
| 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING | ||
| 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR | ||
| 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, | ||
| 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING | ||
| 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED | ||
| 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY | ||
| 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER | ||
| 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE | ||
| 278 | POSSIBILITY OF SUCH DAMAGES. | ||
| 279 | |||
| 280 | END OF TERMS AND CONDITIONS | ||
| 281 | |||
| 282 | Appendix: How to Apply These Terms to Your New Programs | ||
| 283 | |||
| 284 | If you develop a new program, and you want it to be of the greatest | ||
| 285 | possible use to the public, the best way to achieve this is to make it | ||
| 286 | free software which everyone can redistribute and change under these terms. | ||
| 287 | |||
| 288 | To do so, attach the following notices to the program. It is safest | ||
| 289 | to attach them to the start of each source file to most effectively | ||
| 290 | convey the exclusion of warranty; and each file should have at least | ||
| 291 | the "copyright" line and a pointer to where the full notice is found. | ||
| 292 | |||
| 293 | <one line to give the program's name and a brief idea of what it does.> | ||
| 294 | Copyright (C) 19yy <name of author> | ||
| 295 | |||
| 296 | This program is free software; you can redistribute it and/or modify | ||
| 297 | it under the terms of the GNU General Public License as published by | ||
| 298 | the Free Software Foundation; either version 2 of the License, or | ||
| 299 | (at your option) any later version. | ||
| 300 | |||
| 301 | This program is distributed in the hope that it will be useful, | ||
| 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| 304 | GNU General Public License for more details. | ||
| 305 | |||
| 306 | You should have received a copy of the GNU General Public License | ||
| 307 | along with this program; if not, write to the Free Software | ||
| 308 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | ||
| 309 | |||
| 310 | Also add information on how to contact you by electronic and paper mail. | ||
| 311 | |||
| 312 | If the program is interactive, make it output a short notice like this | ||
| 313 | when it starts in an interactive mode: | ||
| 314 | |||
| 315 | Gnomovision version 69, Copyright (C) 19yy name of author | ||
| 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. | ||
| 317 | This is free software, and you are welcome to redistribute it | ||
| 318 | under certain conditions; type `show c' for details. | ||
| 319 | |||
| 320 | The hypothetical commands `show w' and `show c' should show the appropriate | ||
| 321 | parts of the General Public License. Of course, the commands you use may | ||
| 322 | be called something other than `show w' and `show c'; they could even be | ||
| 323 | mouse-clicks or menu items--whatever suits your program. | ||
| 324 | |||
| 325 | You should also get your employer (if you work as a programmer) or your | ||
| 326 | school, if any, to sign a "copyright disclaimer" for the program, if | ||
| 327 | necessary. Here is a sample; alter the names: | ||
| 328 | |||
| 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program | ||
| 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. | ||
| 331 | |||
| 332 | <signature of Ty Coon>, 1 April 1989 | ||
| 333 | Ty Coon, President of Vice | ||
| 334 | |||
| 335 | This General Public License does not permit incorporating your program into | ||
| 336 | proprietary programs. If your program is a subroutine library, you may | ||
| 337 | consider it more useful to permit linking proprietary applications with the | ||
| 338 | library. If this is what you want to do, use the GNU Library General | ||
| 339 | Public License instead of this License. | ||
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
| @@ -1,30 +1,46 @@ | |||
| 1 | 1 | ||
| 2 | CC = gcc | 2 | CC=gcc |
| 3 | SH = /bin/sh | 3 | CFLAGS=-Wall -O2 -fomit-frame-pointer -fno-strength-reduce |
| 4 | 4 | ||
| 5 | CFLAGS = -O3 -fomit-frame-pointer -funroll-loops | 5 | OBJS= blocksort.o \ |
| 6 | 6 | huffman.o \ | |
| 7 | crctable.o \ | ||
| 8 | randtable.o \ | ||
| 9 | compress.o \ | ||
| 10 | decompress.o \ | ||
| 11 | bzlib.o | ||
| 12 | |||
| 13 | all: lib bzip2 test | ||
| 14 | |||
| 15 | bzip2: lib | ||
| 16 | $(CC) $(CFLAGS) -c bzip2.c | ||
| 17 | $(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2 | ||
| 18 | $(CC) $(CFLAGS) -o bzip2recover bzip2recover.c | ||
| 7 | 19 | ||
| 20 | lib: $(OBJS) | ||
| 21 | rm -f libbz2.a | ||
| 22 | ar clq libbz2.a $(OBJS) | ||
| 8 | 23 | ||
| 9 | all: | 24 | test: bzip2 |
| 10 | cat words0 | 25 | @cat words1 |
| 11 | $(CC) $(CFLAGS) -o bzip2 bzip2.c | ||
| 12 | $(CC) $(CFLAGS) -o bzip2recover bzip2recover.c | ||
| 13 | rm -f bunzip2 | ||
| 14 | ln -s ./bzip2 ./bunzip2 | ||
| 15 | cat words1 | ||
| 16 | ./bzip2 -1 < sample1.ref > sample1.rb2 | 26 | ./bzip2 -1 < sample1.ref > sample1.rb2 |
| 17 | ./bzip2 -2 < sample2.ref > sample2.rb2 | 27 | ./bzip2 -2 < sample2.ref > sample2.rb2 |
| 18 | ./bunzip2 < sample1.bz2 > sample1.tst | 28 | ./bzip2 -d < sample1.bz2 > sample1.tst |
| 19 | ./bunzip2 < sample2.bz2 > sample2.tst | 29 | ./bzip2 -d < sample2.bz2 > sample2.tst |
| 20 | cat words2 | 30 | @cat words2 |
| 21 | cmp sample1.bz2 sample1.rb2 | 31 | cmp sample1.bz2 sample1.rb2 |
| 22 | cmp sample2.bz2 sample2.rb2 | 32 | cmp sample2.bz2 sample2.rb2 |
| 23 | cmp sample1.tst sample1.ref | 33 | cmp sample1.tst sample1.ref |
| 24 | cmp sample2.tst sample2.ref | 34 | cmp sample2.tst sample2.ref |
| 25 | cat words3 | 35 | @cat words3 |
| 36 | |||
| 37 | |||
| 38 | clean: | ||
| 39 | rm -f *.o libbz2.a bzip2 bzip2recover sample1.rb2 sample2.rb2 sample1.tst sample2.tst | ||
| 26 | 40 | ||
| 41 | .c.o: $*.o bzlib.h bzlib_private.h | ||
| 42 | $(CC) $(CFLAGS) -c $*.c -o $*.o | ||
| 27 | 43 | ||
| 28 | clean: | 44 | tarfile: |
| 29 | rm -f bzip2 bunzip2 bzip2recover sample*.tst sample*.rb2 | 45 | tar cvf interim.tar *.c *.h Makefile manual.texi manual.ps LICENSE bzip2.1 bzip2.1.preformatted bzip2.txt words1 words2 words3 sample1.ref sample2.ref sample1.bz2 sample2.bz2 *.html README CHANGES libbz2.def libbz2.dsp dlltest.dsp |
| 30 | 46 | ||
diff --git a/README b/README
--- a/README
+++ b/README
| @@ -1,194 +1,61 @@ | |||
| 1 | 1 | ||
| 2 | GREETINGS! | ||
| 3 | 2 | ||
| 4 | This is the README for bzip2, my block-sorting file compressor, | 3 | This is the README for bzip2, a block-sorting file compressor, version |
| 5 | version 0.1. | 4 | 0.9.0. This version is fully compatible with the previous public |
| 5 | release, bzip2-0.1pl2. | ||
| 6 | 6 | ||
| 7 | bzip2 is distributed under the GNU General Public License version 2; | 7 | bzip2-0.9.0 is distributed under a BSD-style license. For details, |
| 8 | for details, see the file LICENSE. Pointers to the algorithms used | 8 | see the file LICENSE. |
| 9 | are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted. | ||
| 10 | 9 | ||
| 11 | Please read all of this file carefully. | 10 | Complete documentation is available in Postscript form (manual.ps) |
| 11 | or html (manual_toc.html). A plain-text version of the manual page is | ||
| 12 | available as bzip2.txt. | ||
| 12 | 13 | ||
| 13 | 14 | ||
| 15 | HOW TO BUILD -- UNIX | ||
| 14 | 16 | ||
| 15 | HOW TO BUILD | 17 | Type `make'. |
| 16 | 18 | ||
| 17 | -- for UNIX: | 19 | This creates binaries "bzip2" and "bzip2recover". |
| 18 | 20 | ||
| 19 | Type `make'. (tough, huh? :-) | 21 | It also runs four compress-decompress tests to make sure things are |
| 22 | working properly. If all goes well, you should be up & running. | ||
| 23 | Please be sure to read the output from `make' just to be sure that the | ||
| 24 | tests went ok. | ||
| 20 | 25 | ||
| 21 | This creates binaries "bzip2", and "bunzip2", | 26 | To install bzip2 properly: |
| 22 | which is a symbolic link to "bzip2". | ||
| 23 | 27 | ||
| 24 | It also runs four compress-decompress tests to make sure | 28 | * Copy the binaries "bzip2" and "bzip2recover" to a publically visible |
| 25 | things are working properly. If all goes well, you should be up & | 29 | place, possibly /usr/bin or /usr/local/bin. |
| 26 | running. Please be sure to read the output from `make' | ||
| 27 | just to be sure that the tests went ok. | ||
| 28 | 30 | ||
| 29 | To install bzip2 properly: | 31 | * In that directory, make "bunzip2" and "bzcat" be symbolic links |
| 32 | to "bzip2". | ||
| 30 | 33 | ||
| 31 | -- Copy the binary "bzip2" to a publically visible place, | 34 | * Copy the manual page, bzip2.1, to the relevant place. |
| 32 | possibly /usr/bin, /usr/common/bin or /usr/local/bin. | 35 | Probably the right place is /usr/man/man1/. |
| 33 | |||
| 34 | -- In that directory, make "bunzip2" be a symbolic link | ||
| 35 | to "bzip2". | ||
| 36 | |||
| 37 | -- Copy the manual page, bzip2.1, to the relevant place. | ||
| 38 | Probably the right place is /usr/man/man1/. | ||
| 39 | |||
| 40 | -- for Windows 95 and NT: | ||
| 41 | 36 | ||
| 42 | For a start, do you *really* want to recompile bzip2? | 37 | If you want to program with the library, you'll need to copy libbz2.a |
| 43 | The standard distribution includes a pre-compiled version | 38 | and bzlib.h to /usr/lib and /usr/include respectively. |
| 44 | for Windows 95 and NT, `bzip2.exe'. | 39 | |
| 45 | 40 | ||
| 46 | This executable was created with Jacob Navia's excellent | 41 | HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc. |
| 47 | port to Win32 of Chris Fraser & David Hanson's excellent | ||
| 48 | ANSI C compiler, "lcc". You can get to it at the pages | ||
| 49 | of the CS department of Princeton University, | ||
| 50 | www.cs.princeton.edu. | ||
| 51 | I have not tried to compile this version of bzip2 with | ||
| 52 | a commercial C compiler such as MS Visual C, as I don't | ||
| 53 | have one available. | ||
| 54 | |||
| 55 | Note that lcc is designed primarily to be portable and | ||
| 56 | fast. Code quality is a secondary aim, so bzip2.exe | ||
| 57 | runs perhaps 40% slower than it could if compiled with | ||
| 58 | a good optimising compiler. | ||
| 59 | |||
| 60 | I compiled a previous version of bzip (0.21) with Borland | ||
| 61 | C 5.0, which worked fine, and with MS VC++ 2.0, which | ||
| 62 | didn't. Here is an comment from the README for bzip-0.21. | ||
| 63 | |||
| 64 | MS VC++ 2.0's optimising compiler has a bug which, at | ||
| 65 | maximum optimisation, gives an executable which produces | ||
| 66 | garbage compressed files. Proceed with caution. | ||
| 67 | I do not know whether or not this happens with later | ||
| 68 | versions of VC++. | ||
| 69 | |||
| 70 | Edit the defines starting at line 86 of bzip.c to | ||
| 71 | select your platform/compiler combination, and then compile. | ||
| 72 | Then check that the resulting executable (assumed to be | ||
| 73 | called bzip.exe) works correctly, using the SELFTEST.BAT file. | ||
| 74 | Bearing in mind the previous paragraph, the self-test is | ||
| 75 | important. | ||
| 76 | |||
| 77 | Note that the defines which bzip-0.21 had, to support | ||
| 78 | compilation with VC 2.0 and BC 5.0, are gone. Windows | ||
| 79 | is not my preferred operating system, and I am, for the | ||
| 80 | moment, content with the modestly fast executable created | ||
| 81 | by lcc-win32. | ||
| 82 | |||
| 83 | A manual page is supplied, unformatted (bzip2.1), | ||
| 84 | preformatted (bzip2.1.preformatted), and preformatted | ||
| 85 | and sanitised for MS-DOS (bzip2.txt). | ||
| 86 | |||
| 87 | |||
| 88 | |||
| 89 | COMPILATION NOTES | ||
| 90 | |||
| 91 | bzip2 should work on any 32 or 64-bit machine. It is known to work | ||
| 92 | [meaning: it has compiled and passed self-tests] on the | ||
| 93 | following platform-os combinations: | ||
| 94 | |||
| 95 | Intel i386/i486 running Linux 2.0.21 | ||
| 96 | Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5 | ||
| 97 | Intel i386/i486 running Windows 95 and NT | ||
| 98 | DEC Alpha running Digital Unix 4.0 | ||
| 99 | |||
| 100 | Following the release of bzip-0.21, many people mailed me | ||
| 101 | from around the world to say they had made it work on all sorts | ||
| 102 | of weird and wonderful machines. Chances are, if you have | ||
| 103 | a reasonable ANSI C compiler and a 32-bit machine, you can | ||
| 104 | get it to work. | ||
| 105 | |||
| 106 | The #defines starting at around line 82 of bzip2.c supply some | ||
| 107 | degree of platform-independance. If you configure bzip2 for some | ||
| 108 | new far-out platform which is not covered by the existing definitions, | ||
| 109 | please send me the relevant definitions. | ||
| 110 | |||
| 111 | I recommend GNU C for compilation. The code is standard ANSI C, | ||
| 112 | except for the Unix-specific file handling, so any ANSI C compiler | ||
| 113 | should work. Note however that the many routines marked INLINE | ||
| 114 | should be inlined by your compiler, else performance will be very | ||
| 115 | poor. Asking your compiler to unroll loops gives some | ||
| 116 | small improvement too; for gcc, the relevant flag is | ||
| 117 | -funroll-loops. | ||
| 118 | |||
| 119 | On a 386/486 machines, I'd recommend giving gcc the | ||
| 120 | -fomit-frame-pointer flag; this liberates another register for | ||
| 121 | allocation, which measurably improves performance. | ||
| 122 | |||
| 123 | I used the abovementioned lcc compiler to develop bzip2. | ||
| 124 | I would highly recommend this compiler for day-to-day development; | ||
| 125 | it is fast, reliable, lightweight, has an excellent profiler, | ||
| 126 | and is generally excellent. And it's fun to retarget, if you're | ||
| 127 | into that kind of thing. | ||
| 128 | |||
| 129 | If you compile bzip2 on a new platform or with a new compiler, | ||
| 130 | please be sure to run the four compress-decompress tests, either | ||
| 131 | using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2) | ||
| 132 | files. Some compilers have been seen to introduce subtle bugs | ||
| 133 | when optimising, so this check is important. Ideally you should | ||
| 134 | then go on to test bzip2 on a file several megabytes or even | ||
| 135 | tens of megabytes long, just to be 110% sure. ``Professional | ||
| 136 | programmers are paranoid programmers.'' (anon). | ||
| 137 | 42 | ||
| 43 | It's difficult for me to support compilation on all these platforms. | ||
| 44 | My approach is to collect binaries for these platforms, and put them | ||
| 45 | on my web page (http://www.muraroa.demon.co.uk). Look there. | ||
| 138 | 46 | ||
| 139 | 47 | ||
| 140 | VALIDATION | 48 | VALIDATION |
| 141 | 49 | ||
| 142 | Correct operation, in the sense that a compressed file can always be | 50 | Correct operation, in the sense that a compressed file can always be |
| 143 | decompressed to reproduce the original, is obviously of paramount | 51 | decompressed to reproduce the original, is obviously of paramount |
| 144 | importance. To validate bzip2, I used a modified version of | 52 | importance. To validate bzip2, I used a modified version of Mark |
| 145 | Mark Nelson's churn program. Churn is an automated test driver | 53 | Nelson's churn program. Churn is an automated test driver which |
| 146 | which recursively traverses a directory structure, using bzip2 to | 54 | recursively traverses a directory structure, using bzip2 to compress |
| 147 | compress and then decompress each file it encounters, and checking | 55 | and then decompress each file it encounters, and checking that the |
| 148 | that the decompressed data is the same as the original. As test | 56 | decompressed data is the same as the original. There are more details |
| 149 | material, I used several runs over several filesystems of differing | 57 | in Section 4 of the user guide. |
| 150 | sizes. | ||
| 151 | |||
| 152 | One set of tests was done on my base Linux filesystem, | ||
| 153 | 410 megabytes in 23,000 files. There were several runs over | ||
| 154 | this filesystem, in various configurations designed to break bzip2. | ||
| 155 | That filesystem also contained some specially constructed test | ||
| 156 | files designed to exercise boundary cases in the code. | ||
| 157 | This included files of zero length, various long, highly repetitive | ||
| 158 | files, and some files which generate blocks with all values the same. | ||
| 159 | 58 | ||
| 160 | The other set of tests was done just with the "normal" configuration, | ||
| 161 | but on a much larger quantity of data. | ||
| 162 | |||
| 163 | Tests are: | ||
| 164 | |||
| 165 | Linux FS, 410M, 23000 files | ||
| 166 | |||
| 167 | As above, with --repetitive-fast | ||
| 168 | |||
| 169 | As above, with -1 | ||
| 170 | |||
| 171 | Low level disk image of a disk containing | ||
| 172 | Windows NT4.0; 420M in a single huge file | ||
| 173 | |||
| 174 | Linux distribution, incl Slackware, | ||
| 175 | all GNU sources. 1900M in 2300 files. | ||
| 176 | |||
| 177 | Approx ~100M compiler sources and related | ||
| 178 | programming tools, running under Purify. | ||
| 179 | |||
| 180 | About 500M of data in 120 files of around | ||
| 181 | 4 M each. This is raw data from a | ||
| 182 | biomagnetometer (SQUID-based thing). | ||
| 183 | |||
| 184 | Overall, total volume of test data is about | ||
| 185 | 3300 megabytes in 25000 files. | ||
| 186 | |||
| 187 | The distribution does four tests after building bzip. These tests | ||
| 188 | include test decompressions of pre-supplied compressed files, so | ||
| 189 | they not only test that bzip works correctly on the machine it was | ||
| 190 | built on, but can also decompress files compressed on a different | ||
| 191 | machine. This guards against unforseen interoperability problems. | ||
| 192 | 59 | ||
| 193 | 60 | ||
| 194 | Please read and be aware of the following: | 61 | Please read and be aware of the following: |
| @@ -234,14 +101,30 @@ PATENTS: | |||
| 234 | End of legalities. | 101 | End of legalities. |
| 235 | 102 | ||
| 236 | 103 | ||
| 104 | WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ? | ||
| 105 | |||
| 106 | * Approx 10% faster compression, 30% faster decompression | ||
| 107 | * -t (test mode) is a lot quicker | ||
| 108 | * Can decompress concatenated compressed files | ||
| 109 | * Programming interface, so programs can directly read/write .bz2 files | ||
| 110 | * Less restrictive (BSD-style) licensing | ||
| 111 | * Flag handling more compatible with GNU gzip | ||
| 112 | * Much more documentation, i.e., a proper user manual | ||
| 113 | * Hopefully, improved portability (at least of the library) | ||
| 114 | |||
| 115 | |||
| 237 | I hope you find bzip2 useful. Feel free to contact me at | 116 | I hope you find bzip2 useful. Feel free to contact me at |
| 238 | jseward@acm.org | 117 | jseward@acm.org |
| 239 | if you have any suggestions or queries. Many people mailed me with | 118 | if you have any suggestions or queries. Many people mailed me with |
| 240 | comments, suggestions and patches after the releases of 0.15 and 0.21, | 119 | comments, suggestions and patches after the releases of bzip-0.15, |
| 241 | and the changes in bzip2 are largely a result of this feedback. | 120 | bzip-0.21 and bzip2-0.1pl2, and the changes in bzip2 are largely a |
| 242 | I thank you for your comments. | 121 | result of this feedback. I thank you for your comments. |
| 122 | |||
| 123 | At least for the time being, bzip2's "home" is | ||
| 124 | http://www.muraroa.demon.co.uk. | ||
| 243 | 125 | ||
| 244 | Julian Seward | 126 | Julian Seward |
| 127 | jseward@acm.org | ||
| 245 | 128 | ||
| 246 | Manchester, UK | 129 | Manchester, UK |
| 247 | 18 July 1996 (version 0.15) | 130 | 18 July 1996 (version 0.15) |
| @@ -250,4 +133,5 @@ Manchester, UK | |||
| 250 | Guildford, Surrey, UK | 133 | Guildford, Surrey, UK |
| 251 | 7 August 1997 (bzip2, version 0.1) | 134 | 7 August 1997 (bzip2, version 0.1) |
| 252 | 29 August 1997 (bzip2, version 0.1pl2) | 135 | 29 August 1997 (bzip2, version 0.1pl2) |
| 136 | 23 August 1998 (bzip2, version 0.9.0) | ||
| 253 | 137 | ||
diff --git a/README.DOS b/README.DOS deleted file mode 100644 index 048de8c..0000000 --- a/README.DOS +++ /dev/null | |||
| @@ -1,16 +0,0 @@ | |||
| 1 | |||
| 2 | As of today (3 March 1998) I've removed the | ||
| 3 | Win95/NT executables from this distribution, sorry. | ||
| 4 | |||
| 5 | You can still get an executable from | ||
| 6 | http://www.muraroa.demon.co.uk, or (as a last | ||
| 7 | resort) by mailing me at jseward@acm.org. | ||
| 8 | |||
| 9 | The reason for this change of packaging is that it | ||
| 10 | makes it easier for me to fix problems with specific | ||
| 11 | executables if they are not included in the main | ||
| 12 | distribution. | ||
| 13 | |||
| 14 | J | ||
| 15 | |||
| 16 | |||
diff --git a/blocksort.c b/blocksort.c new file mode 100644 index 0000000..d8bb26a --- /dev/null +++ b/blocksort.c | |||
| @@ -0,0 +1,709 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Block sorting machinery ---*/ | ||
| 4 | /*--- blocksort.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #include "bzlib_private.h" | ||
| 63 | |||
| 64 | /*---------------------------------------------*/ | ||
| 65 | /*-- | ||
| 66 | Compare two strings in block. We assume (see | ||
| 67 | discussion above) that i1 and i2 have a max | ||
| 68 | offset of 10 on entry, and that the first | ||
| 69 | bytes of both block and quadrant have been | ||
| 70 | copied into the "overshoot area", ie | ||
| 71 | into the subscript range | ||
| 72 | [nblock .. nblock+NUM_OVERSHOOT_BYTES-1]. | ||
| 73 | --*/ | ||
| 74 | static __inline__ Bool fullGtU ( UChar* block, | ||
| 75 | UInt16* quadrant, | ||
| 76 | UInt32 nblock, | ||
| 77 | Int32* workDone, | ||
| 78 | Int32 i1, | ||
| 79 | Int32 i2 | ||
| 80 | ) | ||
| 81 | { | ||
| 82 | Int32 k; | ||
| 83 | UChar c1, c2; | ||
| 84 | UInt16 s1, s2; | ||
| 85 | |||
| 86 | AssertD ( i1 != i2, "fullGtU(1)" ); | ||
| 87 | |||
| 88 | c1 = block[i1]; | ||
| 89 | c2 = block[i2]; | ||
| 90 | if (c1 != c2) return (c1 > c2); | ||
| 91 | i1++; i2++; | ||
| 92 | |||
| 93 | c1 = block[i1]; | ||
| 94 | c2 = block[i2]; | ||
| 95 | if (c1 != c2) return (c1 > c2); | ||
| 96 | i1++; i2++; | ||
| 97 | |||
| 98 | c1 = block[i1]; | ||
| 99 | c2 = block[i2]; | ||
| 100 | if (c1 != c2) return (c1 > c2); | ||
| 101 | i1++; i2++; | ||
| 102 | |||
| 103 | c1 = block[i1]; | ||
| 104 | c2 = block[i2]; | ||
| 105 | if (c1 != c2) return (c1 > c2); | ||
| 106 | i1++; i2++; | ||
| 107 | |||
| 108 | c1 = block[i1]; | ||
| 109 | c2 = block[i2]; | ||
| 110 | if (c1 != c2) return (c1 > c2); | ||
| 111 | i1++; i2++; | ||
| 112 | |||
| 113 | c1 = block[i1]; | ||
| 114 | c2 = block[i2]; | ||
| 115 | if (c1 != c2) return (c1 > c2); | ||
| 116 | i1++; i2++; | ||
| 117 | |||
| 118 | k = nblock; | ||
| 119 | |||
| 120 | do { | ||
| 121 | |||
| 122 | c1 = block[i1]; | ||
| 123 | c2 = block[i2]; | ||
| 124 | if (c1 != c2) return (c1 > c2); | ||
| 125 | s1 = quadrant[i1]; | ||
| 126 | s2 = quadrant[i2]; | ||
| 127 | if (s1 != s2) return (s1 > s2); | ||
| 128 | i1++; i2++; | ||
| 129 | |||
| 130 | c1 = block[i1]; | ||
| 131 | c2 = block[i2]; | ||
| 132 | if (c1 != c2) return (c1 > c2); | ||
| 133 | s1 = quadrant[i1]; | ||
| 134 | s2 = quadrant[i2]; | ||
| 135 | if (s1 != s2) return (s1 > s2); | ||
| 136 | i1++; i2++; | ||
| 137 | |||
| 138 | c1 = block[i1]; | ||
| 139 | c2 = block[i2]; | ||
| 140 | if (c1 != c2) return (c1 > c2); | ||
| 141 | s1 = quadrant[i1]; | ||
| 142 | s2 = quadrant[i2]; | ||
| 143 | if (s1 != s2) return (s1 > s2); | ||
| 144 | i1++; i2++; | ||
| 145 | |||
| 146 | c1 = block[i1]; | ||
| 147 | c2 = block[i2]; | ||
| 148 | if (c1 != c2) return (c1 > c2); | ||
| 149 | s1 = quadrant[i1]; | ||
| 150 | s2 = quadrant[i2]; | ||
| 151 | if (s1 != s2) return (s1 > s2); | ||
| 152 | i1++; i2++; | ||
| 153 | |||
| 154 | if (i1 >= nblock) i1 -= nblock; | ||
| 155 | if (i2 >= nblock) i2 -= nblock; | ||
| 156 | |||
| 157 | k -= 4; | ||
| 158 | (*workDone)++; | ||
| 159 | } | ||
| 160 | while (k >= 0); | ||
| 161 | |||
| 162 | return False; | ||
| 163 | } | ||
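
The comparison that fullGtU performs can be sketched in a much simpler form. The version below is an illustration only, not the code from this commit: it drops the six-way unrolling, the quadrant array, the overshoot area and the work counter, and simply compares two rotations of the block byte by byte with wraparound. The name `rotation_greater` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of bzip2.
   Returns 1 if the rotation of block[0..nblock-1] that starts at i1
   sorts strictly after the rotation that starts at i2, else 0.
   Both indices must be < nblock. */
static int rotation_greater(const unsigned char *block, size_t nblock,
                            size_t i1, size_t i2)
{
    size_t k;
    for (k = 0; k < nblock; k++) {
        unsigned char c1 = block[i1];
        unsigned char c2 = block[i2];
        if (c1 != c2) return c1 > c2;
        /* wrap around explicitly instead of reading an overshoot area */
        if (++i1 == nblock) i1 = 0;
        if (++i2 == nblock) i2 = 0;
    }
    return 0;   /* the two rotations are identical */
}
```

With the block "banana", the rotation starting at index 1 ("ananab") sorts after the one starting at index 3 ("anaban"), so `rotation_greater(block, 6, 1, 3)` returns 1.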
| 164 | |||
| 165 | /*---------------------------------------------*/ | ||
| 166 | /*-- | ||
| 167 | Knuth's increments seem to work better | ||
| 168 | than Incerpi-Sedgewick here. Possibly | ||
| 169 | because the number of elems to sort is | ||
| 170 | usually small, typically <= 20. | ||
| 171 | --*/ | ||
| 172 | static Int32 incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280, | ||
| 173 | 9841, 29524, 88573, 265720, | ||
| 174 | 797161, 2391484 }; | ||
| 175 | |||
| 176 | static void simpleSort ( EState* s, Int32 lo, Int32 hi, Int32 d ) | ||
| 177 | { | ||
| 178 | Int32 i, j, h, bigN, hp; | ||
| 179 | Int32 v; | ||
| 180 | |||
| 181 | UChar* block = s->block; | ||
| 182 | UInt32* zptr = s->zptr; | ||
| 183 | UInt16* quadrant = s->quadrant; | ||
| 184 | Int32* workDone = &(s->workDone); | ||
| 185 | Int32 nblock = s->nblock; | ||
| 186 | Int32 workLimit = s->workLimit; | ||
| 187 | Bool firstAttempt = s->firstAttempt; | ||
| 188 | |||
| 189 | bigN = hi - lo + 1; | ||
| 190 | if (bigN < 2) return; | ||
| 191 | |||
| 192 | hp = 0; | ||
| 193 | while (incs[hp] < bigN) hp++; | ||
| 194 | hp--; | ||
| 195 | |||
| 196 | for (; hp >= 0; hp--) { | ||
| 197 | h = incs[hp]; | ||
| 198 | i = lo + h; | ||
| 199 | while (True) { | ||
| 200 | |||
| 201 | /*-- copy 1 --*/ | ||
| 202 | if (i > hi) break; | ||
| 203 | v = zptr[i]; | ||
| 204 | j = i; | ||
| 205 | while ( fullGtU ( block, quadrant, nblock, workDone, | ||
| 206 | zptr[j-h]+d, v+d ) ) { | ||
| 207 | zptr[j] = zptr[j-h]; | ||
| 208 | j = j - h; | ||
| 209 | if (j <= (lo + h - 1)) break; | ||
| 210 | } | ||
| 211 | zptr[j] = v; | ||
| 212 | i++; | ||
| 213 | |||
| 214 | /*-- copy 2 --*/ | ||
| 215 | if (i > hi) break; | ||
| 216 | v = zptr[i]; | ||
| 217 | j = i; | ||
| 218 | while ( fullGtU ( block, quadrant, nblock, workDone, | ||
| 219 | zptr[j-h]+d, v+d ) ) { | ||
| 220 | zptr[j] = zptr[j-h]; | ||
| 221 | j = j - h; | ||
| 222 | if (j <= (lo + h - 1)) break; | ||
| 223 | } | ||
| 224 | zptr[j] = v; | ||
| 225 | i++; | ||
| 226 | |||
| 227 | /*-- copy 3 --*/ | ||
| 228 | if (i > hi) break; | ||
| 229 | v = zptr[i]; | ||
| 230 | j = i; | ||
| 231 | while ( fullGtU ( block, quadrant, nblock, workDone, | ||
| 232 | zptr[j-h]+d, v+d ) ) { | ||
| 233 | zptr[j] = zptr[j-h]; | ||
| 234 | j = j - h; | ||
| 235 | if (j <= (lo + h - 1)) break; | ||
| 236 | } | ||
| 237 | zptr[j] = v; | ||
| 238 | i++; | ||
| 239 | |||
| 240 | if (*workDone > workLimit && firstAttempt) return; | ||
| 241 | } | ||
| 242 | } | ||
| 243 | } | ||
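
The increment table used by simpleSort follows the recurrence h = 3h + 1 (1, 4, 13, 40, ...). As a minimal sketch of the same idea on plain integers, assuming nothing about bzip2's data structures, a Shell sort with these increments might look like the following. The name `shell_sort_knuth` is hypothetical, and the increments are computed on the fly rather than read from a table.

```c
#include <stddef.h>

/* Hypothetical illustration, not part of bzip2.
   Shell sort of a[0..n-1] using Knuth's increments 1, 4, 13, 40, ... */
static void shell_sort_knuth(int *a, size_t n)
{
    size_t h = 1, i, j;
    int v;

    if (n < 2) return;

    /* pick the largest increment of the sequence that is still < n */
    while (3 * h + 1 < n) h = 3 * h + 1;

    /* (3k+1)/3 == k in integer arithmetic, so h /= 3 steps back
       through the sequence and reaches 0 after the h == 1 pass */
    for (; h > 0; h /= 3) {
        for (i = h; i < n; i++) {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
    }
}
```

The final pass with h = 1 is an ordinary insertion sort, so the result is fully sorted regardless of how much the earlier passes achieved.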
| 244 | |||
| 245 | |||
| 246 | /*---------------------------------------------*/ | ||
| 247 | /*-- | ||
| 248 | The following is an implementation of | ||
| 249 | an elegant 3-way quicksort for strings, | ||
| 250 | described in a paper "Fast Algorithms for | ||
| 251 | Sorting and Searching Strings", by Robert | ||
| 252 | Sedgewick and Jon L. Bentley. | ||
| 253 | --*/ | ||
| 254 | |||
| 255 | #define swap(lv1, lv2) \ | ||
| 256 | { Int32 tmp = lv1; lv1 = lv2; lv2 = tmp; } | ||
| 257 | |||
| 258 | static void vswap ( UInt32* zptr, Int32 p1, Int32 p2, Int32 n ) | ||
| 259 | { | ||
| 260 | while (n > 0) { | ||
| 261 | swap(zptr[p1], zptr[p2]); | ||
| 262 | p1++; p2++; n--; | ||
| 263 | } | ||
| 264 | } | ||
| 265 | |||
| 266 | static UChar med3 ( UChar a, UChar b, UChar c ) | ||
| 267 | { | ||
| 268 | UChar t; | ||
| 269 | if (a > b) { t = a; a = b; b = t; }; | ||
| 270 | if (b > c) { t = b; b = c; c = t; }; | ||
| 271 | if (a > b) b = a; | ||
| 272 | return b; | ||
| 273 | } | ||
| 274 | |||
| 275 | |||
| 276 | #define min(a,b) ((a) < (b)) ? (a) : (b) | ||
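
A side note on the min macro above: its expansion has no outer parentheses. That appears harmless for the two uses in qSort3, where the macro is the whole right-hand side of an assignment, but it would misparse inside a larger expression. The sketch below contrasts the two forms; the names `MIN_UNSAFE` and `MIN_SAFE` are hypothetical and are not in the source.

```c
/* Hypothetical names, for illustration only. */

/* Same shape as the macro in the diff: no outer parentheses. */
#define MIN_UNSAFE(a,b) ((a) < (b)) ? (a) : (b)

/* Fully parenthesised variant. */
#define MIN_SAFE(a,b) (((a) < (b)) ? (a) : (b))

/* As a complete right-hand side both behave the same:
       n = MIN_UNSAFE(2, 3);      n is 2
       n = MIN_SAFE(2, 3);        n is 2
   Inside a larger expression they differ, because + binds tighter
   than ?: does:
       1 + MIN_UNSAFE(5, 3)   parses as (1 + (5 < 3)) ? 5 : 3, giving 5
       1 + MIN_SAFE(5, 3)     gives 1 + 3, i.e. 4 */
```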
| 277 | |||
| 278 | typedef | ||
| 279 | struct { Int32 ll; Int32 hh; Int32 dd; } | ||
| 280 | StackElem; | ||
| 281 | |||
| 282 | #define push(lz,hz,dz) { stack[sp].ll = lz; \ | ||
| 283 | stack[sp].hh = hz; \ | ||
| 284 | stack[sp].dd = dz; \ | ||
| 285 | sp++; } | ||
| 286 | |||
| 287 | #define pop(lz,hz,dz) { sp--; \ | ||
| 288 | lz = stack[sp].ll; \ | ||
| 289 | hz = stack[sp].hh; \ | ||
| 290 | dz = stack[sp].dd; } | ||
| 291 | |||
| 292 | #define SMALL_THRESH 20 | ||
| 293 | #define DEPTH_THRESH 10 | ||
| 294 | |||
| 295 | /*-- | ||
| 296 | If you are ever unlucky/improbable enough | ||
| 297 | to get a stack overflow whilst sorting, | ||
| 298 | increase the following constant and try | ||
| 299 | again. In practice I have never seen the | ||
| 300 | stack go above 27 elems, so the following | ||
| 301 | limit seems very generous. | ||
| 302 | --*/ | ||
| 303 | #define QSORT_STACK_SIZE 1000 | ||
| 304 | |||
| 305 | |||
| 306 | static void qSort3 ( EState* s, Int32 loSt, Int32 hiSt, Int32 dSt ) | ||
| 307 | { | ||
| 308 | Int32 unLo, unHi, ltLo, gtHi, med, n, m; | ||
| 309 | Int32 sp, lo, hi, d; | ||
| 310 | StackElem stack[QSORT_STACK_SIZE]; | ||
| 311 | |||
| 312 | UChar* block = s->block; | ||
| 313 | UInt32* zptr = s->zptr; | ||
| 314 | Int32* workDone = &(s->workDone); | ||
| 315 | Int32 workLimit = s->workLimit; | ||
| 316 | Bool firstAttempt = s->firstAttempt; | ||
| 317 | |||
| 318 | sp = 0; | ||
| 319 | push ( loSt, hiSt, dSt ); | ||
| 320 | |||
| 321 | while (sp > 0) { | ||
| 322 | |||
| 323 | AssertH ( sp < QSORT_STACK_SIZE, 1001 ); | ||
| 324 | |||
| 325 | pop ( lo, hi, d ); | ||
| 326 | |||
| 327 | if (hi - lo < SMALL_THRESH || d > DEPTH_THRESH) { | ||
| 328 | simpleSort ( s, lo, hi, d ); | ||
| 329 | if (*workDone > workLimit && firstAttempt) return; | ||
| 330 | continue; | ||
| 331 | } | ||
| 332 | |||
| 333 | med = med3 ( block[zptr[ lo ]+d], | ||
| 334 | block[zptr[ hi ]+d], | ||
| 335 | block[zptr[ (lo+hi)>>1 ]+d] ); | ||
| 336 | |||
| 337 | unLo = ltLo = lo; | ||
| 338 | unHi = gtHi = hi; | ||
| 339 | |||
| 340 | while (True) { | ||
| 341 | while (True) { | ||
| 342 | if (unLo > unHi) break; | ||
| 343 | n = ((Int32)block[zptr[unLo]+d]) - med; | ||
| 344 | if (n == 0) { swap(zptr[unLo], zptr[ltLo]); ltLo++; unLo++; continue; }; | ||
| 345 | if (n > 0) break; | ||
| 346 | unLo++; | ||
| 347 | } | ||
| 348 | while (True) { | ||
| 349 | if (unLo > unHi) break; | ||
| 350 | n = ((Int32)block[zptr[unHi]+d]) - med; | ||
| 351 | if (n == 0) { swap(zptr[unHi], zptr[gtHi]); gtHi--; unHi--; continue; }; | ||
| 352 | if (n < 0) break; | ||
| 353 | unHi--; | ||
| 354 | } | ||
| 355 | if (unLo > unHi) break; | ||
| 356 | swap(zptr[unLo], zptr[unHi]); unLo++; unHi--; | ||
| 357 | } | ||
| 358 | |||
| 359 | AssertD ( unHi == unLo-1, "bad termination in qSort3" ); | ||
| 360 | |||
| 361 | if (gtHi < ltLo) { | ||
| 362 | push(lo, hi, d+1 ); | ||
| 363 | continue; | ||
| 364 | } | ||
| 365 | |||
| 366 | n = min(ltLo-lo, unLo-ltLo); vswap(zptr, lo, unLo-n, n); | ||
| 367 | m = min(hi-gtHi, gtHi-unHi); vswap(zptr, unLo, hi-m+1, m); | ||
| 368 | |||
| 369 | n = lo + unLo - ltLo - 1; | ||
| 370 | m = hi - (gtHi - unHi) + 1; | ||
| 371 | |||
| 372 | push ( lo, n, d ); | ||
| 373 | push ( n+1, m-1, d+1 ); | ||
| 374 | push ( m, hi, d ); | ||
| 375 | } | ||
| 376 | } | ||
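
For readers unfamiliar with the 3-way string quicksort cited above, here is a minimal recursive sketch on NUL-terminated C strings. It is not the code from this commit: qSort3 works on block offsets with an explicit stack, a median-of-three pivot, the split-end partitioning from the Bentley and Sedgewick paper, and a fallback to simpleSort, whereas this sketch uses the simpler single-pass three-way partition with the first element as pivot. The names `swap_ptr` and `multikey_qsort` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration, not part of bzip2. */
static void swap_ptr(const char **x, const char **y)
{
    const char *t = *x; *x = *y; *y = t;
}

/* Sort a[lo..hi] (inclusive), assuming every string in that range
   agrees on its first d characters. */
static void multikey_qsort(const char **a, ptrdiff_t lo, ptrdiff_t hi,
                           size_t d)
{
    ptrdiff_t lt, gt, i;
    int pivot;

    if (hi <= lo) return;

    lt = lo; gt = hi; i = lo + 1;
    pivot = (unsigned char)a[lo][d];

    /* partition on character d: [lo..lt-1] < pivot,
       [lt..gt] == pivot, [gt+1..hi] > pivot */
    while (i <= gt) {
        int c = (unsigned char)a[i][d];
        if (c < pivot)      swap_ptr(&a[lt++], &a[i++]);
        else if (c > pivot) swap_ptr(&a[i], &a[gt--]);
        else                i++;
    }

    multikey_qsort(a, lo, lt - 1, d);
    /* only the "equal" part advances to the next character; stop at
       the terminating NUL, where the strings are fully equal */
    if (pivot != 0) multikey_qsort(a, lt, gt, d + 1);
    multikey_qsort(a, gt + 1, hi, d);
}
```

The three recursive calls correspond to the three push operations at the end of qSort3: the less-than and greater-than parts keep depth d, and the equal part moves on to d+1.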
| 377 | |||
| 378 | |||
| 379 | /*---------------------------------------------*/ | ||
| 380 | |||
| 381 | #define BIGFREQ(b) (ftab[((b)+1) << 8] - ftab[(b) << 8]) | ||
| 382 | |||
| 383 | #define SETMASK (1 << 21) | ||
| 384 | #define CLEARMASK (~(SETMASK)) | ||
| 385 | |||
| 386 | static void sortMain ( EState* s ) | ||
| 387 | { | ||
| 388 | Int32 i, j, k, ss, sb; | ||
| 389 | Int32 runningOrder[256]; | ||
| 390 | Int32 copy[256]; | ||
| 391 | Bool bigDone[256]; | ||
| 392 | UChar c1, c2; | ||
| 393 | Int32 numQSorted; | ||
| 394 | |||
| 395 | UChar* block = s->block; | ||
| 396 | UInt32* zptr = s->zptr; | ||
| 397 | UInt16* quadrant = s->quadrant; | ||
| 398 | Int32* ftab = s->ftab; | ||
| 399 | Int32* workDone = &(s->workDone); | ||
| 400 | Int32 nblock = s->nblock; | ||
| 401 | Int32 workLimit = s->workLimit; | ||
| 402 | Bool firstAttempt = s->firstAttempt; | ||
| 403 | |||
| 404 | /*-- | ||
| 405 | In the various block-sized structures, live data runs | ||
| 406 | from 0 to last+NUM_OVERSHOOT_BYTES inclusive. First, | ||
| 407 | set up the overshoot area for block. | ||
| 408 | --*/ | ||
| 409 | |||
| 410 | if (s->verbosity >= 4) | ||
| 411 | VPrintf0( " sort initialise ...\n" ); | ||
| 412 | |||
| 413 | for (i = 0; i < BZ_NUM_OVERSHOOT_BYTES; i++) | ||
| 414 | block[nblock+i] = block[i % nblock]; | ||
| 415 | for (i = 0; i < nblock+BZ_NUM_OVERSHOOT_BYTES; i++) | ||
| 416 | quadrant[i] = 0; | ||
| 417 | |||
| 418 | |||
| 419 | if (nblock <= 4000) { | ||
| 420 | |||
| 421 | /*-- | ||
| 422 | Use simpleSort(), since the full sorting mechanism | ||
| 423 | has quite a large constant overhead. | ||
| 424 | --*/ | ||
| 425 | if (s->verbosity >= 4) VPrintf0( " simpleSort ...\n" ); | ||
| 426 | for (i = 0; i < nblock; i++) zptr[i] = i; | ||
| 427 | firstAttempt = False; | ||
| 428 | *workDone = workLimit = 0; | ||
| 429 | simpleSort ( s, 0, nblock-1, 0 ); | ||
| 430 | if (s->verbosity >= 4) VPrintf0( " simpleSort done.\n" ); | ||
| 431 | |||
| 432 | } else { | ||
| 433 | |||
| 434 | numQSorted = 0; | ||
| 435 | for (i = 0; i <= 255; i++) bigDone[i] = False; | ||
| 436 | |||
| 437 | if (s->verbosity >= 4) VPrintf0( " bucket sorting ...\n" ); | ||
| 438 | |||
| 439 | for (i = 0; i <= 65536; i++) ftab[i] = 0; | ||
| 440 | |||
| 441 | c1 = block[nblock-1]; | ||
| 442 | for (i = 0; i < nblock; i++) { | ||
| 443 | c2 = block[i]; | ||
| 444 | ftab[(c1 << 8) + c2]++; | ||
| 445 | c1 = c2; | ||
| 446 | } | ||
| 447 | |||
| 448 | for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1]; | ||
| 449 | |||
| 450 | c1 = block[0]; | ||
| 451 | for (i = 0; i < nblock-1; i++) { | ||
| 452 | c2 = block[i+1]; | ||
| 453 | j = (c1 << 8) + c2; | ||
| 454 | c1 = c2; | ||
| 455 | ftab[j]--; | ||
| 456 | zptr[ftab[j]] = i; | ||
| 457 | } | ||
| 458 | j = (block[nblock-1] << 8) + block[0]; | ||
| 459 | ftab[j]--; | ||
| 460 | zptr[ftab[j]] = nblock-1; | ||
| 461 | |||
| 462 | /*-- | ||
| 463 | Now ftab contains the first loc of every small bucket. | ||
| 464 | Calculate the running order, from smallest to largest | ||
| 465 | big bucket. | ||
| 466 | --*/ | ||
| 467 | |||
| 468 | for (i = 0; i <= 255; i++) runningOrder[i] = i; | ||
| 469 | |||
| 470 | { | ||
| 471 | Int32 vv; | ||
| 472 | Int32 h = 1; | ||
| 473 | do h = 3 * h + 1; while (h <= 256); | ||
| 474 | do { | ||
| 475 | h = h / 3; | ||
| 476 | for (i = h; i <= 255; i++) { | ||
| 477 | vv = runningOrder[i]; | ||
| 478 | j = i; | ||
| 479 | while ( BIGFREQ(runningOrder[j-h]) > BIGFREQ(vv) ) { | ||
| 480 | runningOrder[j] = runningOrder[j-h]; | ||
| 481 | j = j - h; | ||
| 482 | if (j <= (h - 1)) goto zero; | ||
| 483 | } | ||
| 484 | zero: | ||
| 485 | runningOrder[j] = vv; | ||
| 486 | } | ||
| 487 | } while (h != 1); | ||
| 488 | } | ||
| 489 | |||
| 490 | /*-- | ||
| 491 | The main sorting loop. | ||
| 492 | --*/ | ||
| 493 | |||
| 494 | for (i = 0; i <= 255; i++) { | ||
| 495 | |||
| 496 | /*-- | ||
| 497 | Process big buckets, starting with the least full. | ||
| 498 | Basically this is a 4-step process in which we call | ||
| 499 | qSort3 to sort the small buckets [ss, j], but | ||
| 500 | also make a big effort to avoid the calls if we can. | ||
| 501 | --*/ | ||
| 502 | ss = runningOrder[i]; | ||
| 503 | |||
| 504 | /*-- | ||
| 505 | Step 1: | ||
| 506 | Complete the big bucket [ss] by quicksorting | ||
| 507 | any unsorted small buckets [ss, j], for j != ss. | ||
| 508 | Hopefully previous pointer-scanning phases have already | ||
| 509 | completed many of the small buckets [ss, j], so | ||
| 510 | we don't have to sort them at all. | ||
| 511 | --*/ | ||
| 512 | for (j = 0; j <= 255; j++) { | ||
| 513 | if (j != ss) { | ||
| 514 | sb = (ss << 8) + j; | ||
| 515 | if ( ! (ftab[sb] & SETMASK) ) { | ||
| 516 | Int32 lo = ftab[sb] & CLEARMASK; | ||
| 517 | Int32 hi = (ftab[sb+1] & CLEARMASK) - 1; | ||
| 518 | if (hi > lo) { | ||
| 519 | if (s->verbosity >= 4) | ||
| 520 | VPrintf4( " qsort [0x%x, 0x%x] done %d this %d\n", | ||
| 521 | ss, j, numQSorted, hi - lo + 1 ); | ||
| 522 | qSort3 ( s, lo, hi, 2 ); | ||
| 523 | numQSorted += ( hi - lo + 1 ); | ||
| 524 | if (*workDone > workLimit && firstAttempt) return; | ||
| 525 | } | ||
| 526 | } | ||
| 527 | ftab[sb] |= SETMASK; | ||
| 528 | } | ||
| 529 | } | ||
| 530 | |||
| 531 | /*-- | ||
| 532 | Step 2: | ||
| 533 | Deal specially with case [ss, ss]. This establishes the | ||
| 534 | sorted order for [ss, ss] without any comparisons. | ||
| 535 | A clever trick, cryptically described as steps Q6b and Q6c | ||
| 536 | in SRC-124 (aka BW94). This makes it entirely practical to | ||
| 537 | not use a preliminary run-length coder, but unfortunately | ||
| 538 | we are now stuck with the .bz2 file format. | ||
| 539 | --*/ | ||
| 540 | { | ||
| 541 | Int32 put0, get0, put1, get1; | ||
| 542 | Int32 sbn = (ss << 8) + ss; | ||
| 543 | Int32 lo = ftab[sbn] & CLEARMASK; | ||
| 544 | Int32 hi = (ftab[sbn+1] & CLEARMASK) - 1; | ||
| 545 | UChar ssc = (UChar)ss; | ||
| 546 | put0 = lo; | ||
| 547 | get0 = ftab[ss << 8] & CLEARMASK; | ||
| 548 | put1 = hi; | ||
| 549 | get1 = (ftab[(ss+1) << 8] & CLEARMASK) - 1; | ||
| 550 | while (get0 < put0) { | ||
| 551 | j = zptr[get0]-1; if (j < 0) j += nblock; | ||
| 552 | c1 = block[j]; | ||
| 553 | if (c1 == ssc) { zptr[put0] = j; put0++; }; | ||
| 554 | get0++; | ||
| 555 | } | ||
| 556 | while (get1 > put1) { | ||
| 557 | j = zptr[get1]-1; if (j < 0) j += nblock; | ||
| 558 | c1 = block[j]; | ||
| 559 | if (c1 == ssc) { zptr[put1] = j; put1--; }; | ||
| 560 | get1--; | ||
| 561 | } | ||
| 562 | ftab[sbn] |= SETMASK; | ||
| 563 | } | ||
| 564 | |||
| 565 | /*-- | ||
| 566 | Step 3: | ||
| 567 | The [ss] big bucket is now done. Record this fact, | ||
| 568 | and update the quadrant descriptors. Remember to | ||
| 569 | update quadrants in the overshoot area too, if | ||
| 570 | necessary. The "if (i < 255)" test merely skips | ||
| 571 | this updating for the last bucket processed, since | ||
| 572 | updating for the last bucket is pointless. | ||
| 573 | |||
| 574 | The quadrant array provides a way to incrementally | ||
| 575 | cache sort orderings, as they appear, so as to | ||
| 576 | make subsequent comparisons in fullGtU() complete | ||
| 577 | faster. For repetitive blocks this makes a big | ||
| 578 | difference (but not big enough to be able to avoid | ||
| 579 | randomisation for very repetitive data.) | ||
| 580 | |||
| 581 | The precise meaning is: at all times: | ||
| 582 | |||
| 583 | for 0 <= i < nblock and 0 <= j <= nblock | ||
| 584 | |||
| 585 | if block[i] != block[j], | ||
| 586 | |||
| 587 | then the relative values of quadrant[i] and | ||
| 588 | quadrant[j] are meaningless. | ||
| 589 | |||
| 590 | else { | ||
| 591 | if quadrant[i] < quadrant[j] | ||
| 592 | then the string starting at i lexicographically | ||
| 593 | precedes the string starting at j | ||
| 594 | |||
| 595 | else if quadrant[i] > quadrant[j] | ||
| 596 | then the string starting at j lexicographically | ||
| 597 | precedes the string starting at i | ||
| 598 | |||
| 599 | else | ||
| 600 | the relative ordering of the strings starting | ||
| 601 | at i and j has not yet been determined. | ||
| 602 | } | ||
| 603 | --*/ | ||
| 604 | bigDone[ss] = True; | ||
| 605 | |||
| 606 | if (i < 255) { | ||
| 607 | Int32 bbStart = ftab[ss << 8] & CLEARMASK; | ||
| 608 | Int32 bbSize = (ftab[(ss+1) << 8] & CLEARMASK) - bbStart; | ||
| 609 | Int32 shifts = 0; | ||
| 610 | |||
| 611 | while ((bbSize >> shifts) > 65534) shifts++; | ||
| 612 | |||
| 613 | for (j = 0; j < bbSize; j++) { | ||
| 614 | Int32 a2update = zptr[bbStart + j]; | ||
| 615 | UInt16 qVal = (UInt16)(j >> shifts); | ||
| 616 | quadrant[a2update] = qVal; | ||
| 617 | if (a2update < BZ_NUM_OVERSHOOT_BYTES) | ||
| 618 | quadrant[a2update + nblock] = qVal; | ||
| 619 | } | ||
| 620 | |||
| 621 | AssertH ( ( ((bbSize-1) >> shifts) <= 65535 ), 1002 ); | ||
| 622 | } | ||
| 623 | |||
| 624 | /*-- | ||
| 625 | Step 4: | ||
| 626 | Now scan this big bucket [ss] so as to synthesise the | ||
| 627 | sorted order for small buckets [t, ss] for all t != ss. | ||
| 628 | This will avoid doing Real Work in subsequent Step 1's. | ||
| 629 | --*/ | ||
| 630 | for (j = 0; j <= 255; j++) | ||
| 631 | copy[j] = ftab[(j << 8) + ss] & CLEARMASK; | ||
| 632 | |||
| 633 | for (j = ftab[ss << 8] & CLEARMASK; | ||
| 634 | j < (ftab[(ss+1) << 8] & CLEARMASK); | ||
| 635 | j++) { | ||
| 636 | k = zptr[j]-1; if (k < 0) k += nblock; | ||
| 637 | c1 = block[k]; | ||
| 638 | if ( ! bigDone[c1] ) { | ||
| 639 | zptr[copy[c1]] = k; | ||
| 640 | copy[c1] ++; | ||
| 641 | } | ||
| 642 | } | ||
| 643 | |||
| 644 | for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK; | ||
| 645 | } | ||
| 646 | if (s->verbosity >= 4) | ||
| 647 | VPrintf3( " %d pointers, %d sorted, %d scanned\n", | ||
| 648 | nblock, numQSorted, nblock - numQSorted ); | ||
| 649 | } | ||
| 650 | } | ||
| 651 | |||
| 652 | |||
| 653 | /*---------------------------------------------*/ | ||
| 654 | static void randomiseBlock ( EState* s ) | ||
| 655 | { | ||
| 656 | Int32 i; | ||
| 657 | BZ_RAND_INIT_MASK; | ||
| 658 | for (i = 0; i < 256; i++) s->inUse[i] = False; | ||
| 659 | |||
| 660 | for (i = 0; i < s->nblock; i++) { | ||
| 661 | BZ_RAND_UPD_MASK; | ||
| 662 | s->block[i] ^= BZ_RAND_MASK; | ||
| 663 | s->inUse[s->block[i]] = True; | ||
| 664 | } | ||
| 665 | } | ||
| 666 | |||
| 667 | |||
| 668 | /*---------------------------------------------*/ | ||
| 669 | void blockSort ( EState* s ) | ||
| 670 | { | ||
| 671 | Int32 i; | ||
| 672 | |||
| 673 | s->workLimit = s->workFactor * (s->nblock - 1); | ||
| 674 | s->workDone = 0; | ||
| 675 | s->blockRandomised = False; | ||
| 676 | s->firstAttempt = True; | ||
| 677 | |||
| 678 | sortMain ( s ); | ||
| 679 | |||
| 680 | if (s->verbosity >= 3) | ||
| 681 | VPrintf3( " %d work, %d block, ratio %5.2f\n", | ||
| 682 | s->workDone, s->nblock-1, | ||
| 683 | (float)(s->workDone) / (float)(s->nblock-1) ); | ||
| 684 | |||
| 685 | if (s->workDone > s->workLimit && s->firstAttempt) { | ||
| 686 | if (s->verbosity >= 2) | ||
| 687 | VPrintf0( " sorting aborted; randomising block\n" ); | ||
| 688 | randomiseBlock ( s ); | ||
| 689 | s->workLimit = s->workDone = 0; | ||
| 690 | s->blockRandomised = True; | ||
| 691 | s->firstAttempt = False; | ||
| 692 | sortMain ( s ); | ||
| 693 | if (s->verbosity >= 3) | ||
| 694 | VPrintf3( " %d work, %d block, ratio %f\n", | ||
| 695 | s->workDone, s->nblock-1, | ||
| 696 | (float)(s->workDone) / (float)(s->nblock-1) ); | ||
| 697 | } | ||
| 698 | |||
| 699 | s->origPtr = -1; | ||
| 700 | for (i = 0; i < s->nblock; i++) | ||
| 701 | if (s->zptr[i] == 0) | ||
| 702 | { s->origPtr = i; break; }; | ||
| 703 | |||
| 704 | AssertH( s->origPtr != -1, 1003 ); | ||
| 705 | } | ||
| 706 | |||
| 707 | /*-------------------------------------------------------------*/ | ||
| 708 | /*--- end blocksort.c ---*/ | ||
| 709 | /*-------------------------------------------------------------*/ | ||
| @@ -1,21 +1,29 @@ | |||
| 1 | .PU | 1 | .PU |
| 2 | .TH bzip2 1 | 2 | .TH bzip2 1 |
| 3 | .SH NAME | 3 | .SH NAME |
| 4 | bzip2, bunzip2 \- a block-sorting file compressor, v0.1 | 4 | bzip2, bunzip2 \- a block-sorting file compressor, v0.9.0 |
| 5 | .br | ||
| 6 | bzcat \- decompresses files to stdout | ||
| 5 | .br | 7 | .br |
| 6 | bzip2recover \- recovers data from damaged bzip2 files | 8 | bzip2recover \- recovers data from damaged bzip2 files |
| 7 | 9 | ||
| 8 | .SH SYNOPSIS | 10 | .SH SYNOPSIS |
| 9 | .ll +8 | 11 | .ll +8 |
| 10 | .B bzip2 | 12 | .B bzip2 |
| 11 | .RB [ " \-cdfkstvVL123456789 " ] | 13 | .RB [ " \-cdfkstvzVL123456789 " ] |
| 12 | [ | 14 | [ |
| 13 | .I "filenames \&..." | 15 | .I "filenames \&..." |
| 14 | ] | 16 | ] |
| 15 | .ll -8 | 17 | .ll -8 |
| 16 | .br | 18 | .br |
| 17 | .B bunzip2 | 19 | .B bunzip2 |
| 18 | .RB [ " \-kvsVL " ] | 20 | .RB [ " \-fkvsVL " ] |
| 21 | [ | ||
| 22 | .I "filenames \&..." | ||
| 23 | ] | ||
| 24 | .br | ||
| 25 | .B bzcat | ||
| 26 | .RB [ " \-s " ] | ||
| 19 | [ | 27 | [ |
| 20 | .I "filenames \&..." | 28 | .I "filenames \&..." |
| 21 | ] | 29 | ] |
| @@ -24,7 +32,7 @@ bzip2recover \- recovers data from damaged bzip2 files | |||
| 24 | .I "filename" | 32 | .I "filename" |
| 25 | 33 | ||
| 26 | .SH DESCRIPTION | 34 | .SH DESCRIPTION |
| 27 | .I Bzip2 | 35 | .I bzip2 |
| 28 | compresses files using the Burrows-Wheeler block-sorting | 36 | compresses files using the Burrows-Wheeler block-sorting |
| 29 | text compression algorithm, and Huffman coding. | 37 | text compression algorithm, and Huffman coding. |
| 30 | Compression is generally considerably | 38 | Compression is generally considerably |
| @@ -38,7 +46,7 @@ those of | |||
| 38 | .I GNU Gzip, | 46 | .I GNU Gzip, |
| 39 | but they are not identical. | 47 | but they are not identical. |
| 40 | 48 | ||
| 41 | .I Bzip2 | 49 | .I bzip2 |
| 42 | expects a list of file names to accompany the command-line flags. | 50 | expects a list of file names to accompany the command-line flags. |
| 43 | Each file is replaced by a compressed version of itself, | 51 | Each file is replaced by a compressed version of itself, |
| 44 | with the name "original_name.bz2". | 52 | with the name "original_name.bz2". |
| @@ -50,11 +58,11 @@ original file names, permissions and dates in filesystems | |||
| 50 | which lack these concepts, or have serious file name length | 58 | which lack these concepts, or have serious file name length |
| 51 | restrictions, such as MS-DOS. | 59 | restrictions, such as MS-DOS. |
| 52 | 60 | ||
| 53 | .I Bzip2 | 61 | .I bzip2 |
| 54 | and | 62 | and |
| 55 | .I bunzip2 | 63 | .I bunzip2 |
| 56 | will not overwrite existing files; if you want this to happen, | 64 | will by default not overwrite existing files; |
| 57 | you should delete them first. | 65 | if you want this to happen, specify the \-f flag. |
| 58 | 66 | ||
| 59 | If no file names are specified, | 67 | If no file names are specified, |
| 60 | .I bzip2 | 68 | .I bzip2 |
| @@ -64,7 +72,7 @@ In this case, | |||
| 64 | will decline to write compressed output to a terminal, as | 72 | will decline to write compressed output to a terminal, as |
| 65 | this would be entirely incomprehensible and therefore pointless. | 73 | this would be entirely incomprehensible and therefore pointless. |
| 66 | 74 | ||
| 67 | .I Bunzip2 | 75 | .I bunzip2 |
| 68 | (or | 76 | (or |
| 69 | .I bzip2 \-d | 77 | .I bzip2 \-d |
| 70 | ) decompresses and restores all specified files whose names | 78 | ) decompresses and restores all specified files whose names |
| @@ -73,12 +81,28 @@ Files without this suffix are ignored. | |||
| 73 | Again, supplying no filenames | 81 | Again, supplying no filenames |
| 74 | causes decompression from standard input to standard output. | 82 | causes decompression from standard input to standard output. |
| 75 | 83 | ||
| 84 | .I bunzip2 | ||
| 85 | will correctly decompress a file which is the concatenation | ||
| 86 | of two or more compressed files. The result is the concatenation | ||
| 87 | of the corresponding uncompressed files. Integrity testing | ||
| 88 | (\-t) of concatenated compressed files is also supported. | ||
| 89 | |||
| 76 | You can also compress or decompress files to | 90 | You can also compress or decompress files to |
| 77 | the standard output by giving the \-c flag. | 91 | the standard output by giving the \-c flag. |
| 78 | You can decompress multiple files like this, but you may | 92 | Multiple files may be compressed and decompressed like this. |
| 79 | only compress a single file this way, since it would otherwise | 93 | The resulting outputs are fed sequentially to stdout. |
| 80 | be difficult to separate out the compressed representations of | 94 | Compression of multiple files in this manner generates |
| 81 | the original files. | 95 | a stream containing multiple compressed file representations. |
| 96 | Such a stream can be decompressed correctly only by | ||
| 97 | .I bzip2 | ||
| 98 | version 0.9.0 or later. Earlier versions of | ||
| 99 | .I bzip2 | ||
| 100 | will stop after decompressing the first file in the stream. | ||
| 101 | |||
| 102 | .I bzcat | ||
| 103 | (or | ||
| 104 | .I bzip2 \-dc | ||
| 105 | ) decompresses all specified files to the standard output. | ||
| 82 | 106 | ||
| 83 | Compression is always performed, even if the compressed file is | 107 | Compression is always performed, even if the compressed file is |
| 84 | slightly larger than the original. Files of less than about | 108 | slightly larger than the original. Files of less than about |
| @@ -132,7 +156,7 @@ Compression and decompression requirements, in bytes, can be estimated as: | |||
| 132 | 156 | ||
| 133 | Compression: 400k + ( 7 x block size ) | 157 | Compression: 400k + ( 7 x block size ) |
| 134 | 158 | ||
| 135 | Decompression: 100k + ( 5 x block size ), or | 159 | Decompression: 100k + ( 4 x block size ), or |
| 136 | .br | 160 | .br |
| 137 | 100k + ( 2.5 x block size ) | 161 | 100k + ( 2.5 x block size ) |
| 138 | 162 | ||
| @@ -147,7 +171,7 @@ choice of block size. | |||
| 147 | 171 | ||
| 148 | For files compressed with the default 900k block size, | 172 | For files compressed with the default 900k block size, |
| 149 | .I bunzip2 | 173 | .I bunzip2 |
| 150 | will require about 4600 kbytes to decompress. | 174 | will require about 3700 kbytes to decompress. |
| 151 | To support decompression of any file on a 4 megabyte machine, | 175 | To support decompression of any file on a 4 megabyte machine, |
| 152 | .I bunzip2 | 176 | .I bunzip2 |
| 153 | has an option to decompress using approximately half this | 177 | has an option to decompress using approximately half this |
| @@ -168,8 +192,8 @@ For example, compressing a file 20,000 bytes long with the flag | |||
| 168 | \-9 | 192 | \-9 |
| 169 | will cause the compressor to allocate around | 193 | will cause the compressor to allocate around |
| 170 | 6700k of memory, but only touch 400k + 20000 * 7 = 540 | 194 | 6700k of memory, but only touch 400k + 20000 * 7 = 540 |
| 171 | kbytes of it. Similarly, the decompressor will allocate 4600k but | 195 | kbytes of it. Similarly, the decompressor will allocate 3700k but |
| 172 | only touch 100k + 20000 * 5 = 200 kbytes. | 196 | only touch 100k + 20000 * 4 = 180 kbytes. |
| 173 | 197 | ||
| 174 | Here is a table which summarises the maximum memory usage for | 198 | Here is a table which summarises the maximum memory usage for |
| 175 | different block sizes. Also recorded is the total compressed | 199 | different block sizes. Also recorded is the total compressed |
| @@ -182,71 +206,73 @@ Corpus is dominated by smaller files. | |||
| 182 | Compress Decompress Decompress Corpus | 206 | Compress Decompress Decompress Corpus |
| 183 | Flag usage usage -s usage Size | 207 | Flag usage usage -s usage Size |
| 184 | 208 | ||
| 185 | -1 1100k 600k 350k 914704 | 209 | -1 1100k 500k 350k 914704 |
| 186 | -2 1800k 1100k 600k 877703 | 210 | -2 1800k 900k 600k 877703 |
| 187 | -3 2500k 1600k 850k 860338 | 211 | -3 2500k 1300k 850k 860338 |
| 188 | -4 3200k 2100k 1100k 846899 | 212 | -4 3200k 1700k 1100k 846899 |
| 189 | -5 3900k 2600k 1350k 845160 | 213 | -5 3900k 2100k 1350k 845160 |
| 190 | -6 4600k 3100k 1600k 838626 | 214 | -6 4600k 2500k 1600k 838626 |
| 191 | -7 5400k 3600k 1850k 834096 | 215 | -7 5400k 2900k 1850k 834096 |
| 192 | -8 6000k 4100k 2100k 828642 | 216 | -8 6000k 3300k 2100k 828642 |
| 193 | -9 6700k 4600k 2350k 828642 | 217 | -9 6700k 3700k 2350k 828642 |
| 194 | 218 | ||
| 195 | .SH OPTIONS | 219 | .SH OPTIONS |
| 196 | .TP | 220 | .TP |
| 197 | .B \-c --stdout | 221 | .B \-c --stdout |
| 198 | Compress or decompress to standard output. \-c will decompress | 222 | Compress or decompress to standard output. \-c will decompress |
| 199 | multiple files to stdout, but will only compress a single file to | 223 | multiple files to stdout, but will only compress a single file to |
| 200 | stdout. | 224 | stdout. |
| 201 | .TP | 225 | .TP |
| 202 | .B \-d --decompress | 226 | .B \-d --decompress |
| 203 | Force decompression. | 227 | Force decompression. |
| 204 | .I Bzip2 | 228 | .I bzip2, |
| 205 | and | ||
| 206 | .I bunzip2 | 229 | .I bunzip2 |
| 207 | are really the same program, and the decision about whether to | 230 | and |
| 208 | compress or decompress is done on the basis of which name is | 231 | .I bzcat |
| 232 | are really the same program, and the decision about what actions | ||
| 233 | to take is done on the basis of which name is | ||
| 209 | used. This flag overrides that mechanism, and forces | 234 | used. This flag overrides that mechanism, and forces |
| 210 | .I bzip2 | 235 | .I bzip2 |
| 211 | to decompress. | 236 | to decompress. |
| 212 | .TP | 237 | .TP |
| 213 | .B \-f --compress | 238 | .B \-z --compress |
| 214 | The complement to \-d: forces compression, regardless of the invocation | 239 | The complement to \-d: forces compression, regardless of the invocation |
| 215 | name. | 240 | name. |
| 216 | .TP | 241 | .TP |
| 217 | .B \-t --test | 242 | .B \-t --test |
| 218 | Check integrity of the specified file(s), but don't decompress them. | 243 | Check integrity of the specified file(s), but don't decompress them. |
| 219 | This really performs a trial decompression and throws away the result, | 244 | This really performs a trial decompression and throws away the result. |
| 220 | using the low-memory decompression algorithm (see \-s). | 245 | .TP |
| 246 | .B \-f --force | ||
| 247 | Force overwrite of output files. Normally, | ||
| 248 | .I bzip2 | ||
| 249 | will not overwrite existing output files. | ||
| 221 | .TP | 250 | .TP |
| 222 | .B \-k --keep | 251 | .B \-k --keep |
| 223 | Keep (don't delete) input files during compression or decompression. | 252 | Keep (don't delete) input files during compression or decompression. |
| 224 | .TP | 253 | .TP |
| 225 | .B \-s --small | 254 | .B \-s --small |
| 226 | Reduce memory usage, both for compression and decompression. | 255 | Reduce memory usage, for compression, decompression and |
| 227 | Files are decompressed using a modified algorithm which only | 256 | testing. |
| 257 | Files are decompressed and tested using a modified algorithm which only | ||
| 228 | requires 2.5 bytes per block byte. This means any file can be | 258 | requires 2.5 bytes per block byte. This means any file can be |
| 229 | decompressed in 2300k of memory, albeit somewhat more slowly than | 259 | decompressed in 2300k of memory, albeit at about half the normal |
| 230 | usual. | 260 | speed. |
| 231 | 261 | ||
| 232 | During compression, -s selects a block size of 200k, which limits | 262 | During compression, -s selects a block size of 200k, which limits |
| 233 | memory use to around the same figure, at the expense of your | 263 | memory use to around the same figure, at the expense of your |
| 234 | compression ratio. In short, if your machine is low on memory | 264 | compression ratio. In short, if your machine is low on memory |
| 235 | (8 megabytes or less), use -s for everything. See | 265 | (8 megabytes or less), use -s for everything. See |
| 236 | MEMORY MANAGEMENT above. | 266 | MEMORY MANAGEMENT above. |
| 237 | |||
| 238 | .TP | 267 | .TP |
| 239 | .B \-v --verbose | 268 | .B \-v --verbose |
| 240 | Verbose mode -- show the compression ratio for each file processed. | 269 | Verbose mode -- show the compression ratio for each file processed. |
| 241 | Further \-v's increase the verbosity level, spewing out lots of | 270 | Further \-v's increase the verbosity level, spewing out lots of |
| 242 | information which is primarily of interest for diagnostic purposes. | 271 | information which is primarily of interest for diagnostic purposes. |
| 243 | .TP | 272 | .TP |
| 244 | .B \-L --license | 273 | .B \-L --license -V --version |
| 245 | Display the software version, license terms and conditions. | 274 | Display the software version, license terms and conditions. |
| 246 | .TP | 275 | .TP |
| 247 | .B \-V --version | ||
| 248 | Same as \-L. | ||
| 249 | .TP | ||
| 250 | .B \-1 to \-9 | 276 | .B \-1 to \-9 |
| 251 | Set the block size to 100 k, 200 k .. 900 k when | 277 | Set the block size to 100 k, 200 k .. 900 k when |
| 252 | compressing. Has no effect when decompressing. | 278 | compressing. Has no effect when decompressing. |
| @@ -329,10 +355,6 @@ to compress the latter. | |||
| 329 | If you do get a file which causes severe slowness in compression, | 355 | If you do get a file which causes severe slowness in compression, |
| 330 | try making the block size as small as possible, with flag \-1. | 356 | try making the block size as small as possible, with flag \-1. |
| 331 | 357 | ||
| 332 | Incompressible or virtually-incompressible data may decompress | ||
| 333 | rather more slowly than one would hope. This is due to | ||
| 334 | a naive implementation of the move-to-front coder. | ||
| 335 | |||
| 336 | .I bzip2 | 358 | .I bzip2 |
| 337 | usually allocates several megabytes of memory to operate in, | 359 | usually allocates several megabytes of memory to operate in, |
| 338 | and then charges all over it in a fairly random fashion. This | 360 | and then charges all over it in a fairly random fashion. This |
| @@ -346,28 +368,19 @@ I imagine | |||
| 346 | .I bzip2 | 368 | .I bzip2 |
| 347 | will perform best on machines with very large caches. | 369 | will perform best on machines with very large caches. |
| 348 | 370 | ||
| 349 | Test mode (\-t) uses the low-memory decompression algorithm | ||
| 350 | (\-s). This means test mode does not run as fast as it could; | ||
| 351 | it could run as fast as the normal decompression machinery. | ||
| 352 | This could easily be fixed at the cost of some code bloat. | ||
| 353 | |||
| 354 | .SH CAVEATS | 371 | .SH CAVEATS |
| 355 | I/O error messages are not as helpful as they could be. | 372 | I/O error messages are not as helpful as they could be. |
| 356 | .I Bzip2 | 373 | .I Bzip2 |
| 357 | tries hard to detect I/O errors and exit cleanly, but the | 374 | tries hard to detect I/O errors and exit cleanly, but the |
| 358 | details of what the problem is sometimes seem rather misleading. | 375 | details of what the problem is sometimes seem rather misleading. |
| 359 | 376 | ||
| 360 | This manual page pertains to version 0.1 of | 377 | This manual page pertains to version 0.9.0 of |
| 361 | .I bzip2. | 378 | .I bzip2. |
| 362 | It may well happen that some future version will | 379 | Compressed data created by this version is entirely forwards and |
| 363 | use a different compressed file format. If you try to | 380 | backwards compatible with the previous public release, version 0.1pl2, |
| 364 | decompress, using 0.1, a .bz2 file created with some | 381 | but with the following exception: 0.9.0 can correctly decompress |
| 365 | future version which uses a different compressed file format, | 382 | multiple concatenated compressed files. 0.1pl2 cannot do this; it |
| 366 | 0.1 will complain that your file "is not a bzip2 file". | 383 | will stop after decompressing just the first file in the stream. |
| 367 | If that happens, you should obtain a more recent version | ||
| 368 | of | ||
| 369 | .I bzip2 | ||
| 370 | and use that to decompress the file. | ||
| 371 | 384 | ||
| 372 | Wildcard expansion for Windows 95 and NT | 385 | Wildcard expansion for Windows 95 and NT |
| 373 | is flaky. | 386 | is flaky. |
| @@ -377,63 +390,25 @@ uses 32-bit integers to represent bit positions in | |||
| 377 | compressed files, so it cannot handle compressed files | 390 | compressed files, so it cannot handle compressed files |
| 378 | more than 512 megabytes long. This could easily be fixed. | 391 | more than 512 megabytes long. This could easily be fixed. |
| 379 | 392 | ||
| 380 | .I bzip2recover | ||
| 381 | sometimes reports a very small, incomplete final block. | ||
| 382 | This is spurious and can be safely ignored. | ||
| 383 | |||
| 384 | .SH RELATIONSHIP TO bzip-0.21 | ||
| 385 | This program is a descendant of the | ||
| 386 | .I bzip | ||
| 387 | program, version 0.21, which I released in August 1996. | ||
| 388 | The primary difference of | ||
| 389 | .I bzip2 | ||
| 390 | is its avoidance of the possibly patented algorithms | ||
| 391 | which were used in 0.21. | ||
| 392 | .I bzip2 | ||
| 393 | also brings various useful refinements (\-s, \-t), | ||
| 394 | uses less memory, decompresses significantly faster, and | ||
| 395 | has support for recovering data from damaged files. | ||
| 396 | |||
| 397 | Because | ||
| 398 | .I bzip2 | ||
| 399 | uses Huffman coding to construct the compressed bitstream, | ||
| 400 | rather than the arithmetic coding used in 0.21, | ||
| 401 | the compressed representations generated by the two programs | ||
| 402 | are incompatible, and they will not interoperate. The change | ||
| 403 | in suffix from .bz to .bz2 reflects this. It would have been | ||
| 404 | helpful to at least allow | ||
| 405 | .I bzip2 | ||
| 406 | to decompress files created by 0.21, but this would | ||
| 407 | defeat the primary aim of having a patent-free compressor. | ||
| 408 | |||
| 409 | For a more precise statement about patent issues in | ||
| 410 | bzip2, please see the README file in the distribution. | ||
| 411 | |||
| 412 | Huffman coding necessarily involves some coding inefficiency | ||
| 413 | compared to arithmetic coding. This means that | ||
| 414 | .I bzip2 | ||
| 415 | compresses about 1% worse than 0.21, an unfortunate but | ||
| 416 | unavoidable fact-of-life. On the other hand, decompression | ||
| 417 | is approximately 50% faster for the same reason, and the | ||
| 418 | change in file format gave an opportunity to add data-recovery | ||
| 419 | features. So it is not all bad. | ||
| 420 | |||
| 421 | .SH AUTHOR | 393 | .SH AUTHOR |
| 422 | Julian Seward, jseward@acm.org. | 394 | Julian Seward, jseward@acm.org. |
| 423 | 395 | ||
| 396 | http://www.muraroa.demon.co.uk | ||
| 397 | |||
| 424 | The ideas embodied in | 398 | The ideas embodied in |
| 425 | .I bzip | ||
| 426 | and | ||
| 427 | .I bzip2 | 399 | .I bzip2 |
| 428 | are due to (at least) the following people: | 400 | are due to (at least) the following people: |
| 429 | Michael Burrows and David Wheeler (for the block sorting | 401 | Michael Burrows and David Wheeler (for the block sorting |
| 430 | transformation), David Wheeler (again, for the Huffman coder), | 402 | transformation), David Wheeler (again, for the Huffman coder), |
| 431 | Peter Fenwick (for the structured coding model in 0.21, | 403 | Peter Fenwick (for the structured coding model in the original |
| 404 | .I bzip, | ||
| 432 | and many refinements), | 405 | and many refinements), |
| 433 | and | 406 | and |
| 434 | Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic | 407 | Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic |
| 435 | coder in 0.21). I am much indebted for their help, support and advice. | 408 | coder in the original |
| 436 | See the file ALGORITHMS in the source distribution for pointers to | 409 | .I bzip). |
| 410 | I am much indebted for their help, support and advice. | ||
| 411 | See the manual in the source distribution for pointers to | ||
| 437 | sources of documentation. | 412 | sources of documentation. |
| 438 | Christian von Roques encouraged me to look for faster | 413 | Christian von Roques encouraged me to look for faster |
| 439 | sorting algorithms, so as to speed up compression. | 414 | sorting algorithms, so as to speed up compression. |
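The memory estimates quoted in the bzip2.1 hunks above (compression: 400k + 7 x block size; decompression: 100k + 4 x block size, or 100k + 2.5 x block size with \-s) can be checked with a little arithmetic. The following is a minimal sketch, not code from the bzip2 sources: the function names are hypothetical, and it assumes that flags \-1 to \-9 select block sizes of 100k to 900k, as the man page states.

```c
#include <assert.h>
#include <stdio.h>

/* Block size in kbytes for flags -1 .. -9 (100k .. 900k). */
static int block_kb(int level)
{
   assert(level >= 1 && level <= 9);
   return 100 * level;
}

/* Compression estimate: 400k + ( 7 x block size ). */
static int compress_kb(int level)
{
   return 400 + 7 * block_kb(level);
}

/* Normal decompression estimate: 100k + ( 4 x block size ). */
static int decompress_kb(int level)
{
   return 100 + 4 * block_kb(level);
}

/* -s decompression estimate: 100k + ( 2.5 x block size ). */
static int decompress_small_kb(int level)
{
   return 100 + (5 * block_kb(level)) / 2;
}

/* Hypothetical helper: print the estimates for every block-size flag. */
void print_memory_estimates(void)
{
   int level;
   printf("Flag  Compress  Decompress  Decompress -s\n");
   for (level = 1; level <= 9; level++)
      printf(" -%d   %6dk    %6dk      %6dk\n",
             level, compress_kb(level),
             decompress_kb(level), decompress_small_kb(level));
}
```

Calling `print_memory_estimates()` from a `main` reproduces the 0.9.0 table for most flags, for example 6700k, 3700k and 2350k for \-9. For \-7 the compression formula gives 5300k where the table lists 5400k, so the table figures are best read as approximate.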
diff --git a/bzip2.1.preformatted b/bzip2.1.preformatted index 5206e05..8c4fab1 100644 --- a/bzip2.1.preformatted +++ b/bzip2.1.preformatted | |||
| @@ -5,18 +5,20 @@ bzip2(1) bzip2(1) | |||
| 5 | 5 | ||
| 6 | 6 | ||
| 7 | NAME | 7 | NAME |
| 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.1 | 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 |
| 9 | bzcat - decompresses files to stdout | ||
| 9 | bzip2recover - recovers data from damaged bzip2 files | 10 | bzip2recover - recovers data from damaged bzip2 files |
| 10 | 11 | ||
| 11 | 12 | ||
| 12 | SYNOPSIS | 13 | SYNOPSIS |
| 13 | bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ] | 14 | bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ] |
| 14 | bunzip2 [ -kvsVL ] [ filenames ... ] | 15 | bunzip2 [ -fkvsVL ] [ filenames ... ] |
| 16 | bzcat [ -s ] [ filenames ... ] | ||
| 15 | bzip2recover filename | 17 | bzip2recover filename |
| 16 | 18 | ||
| 17 | 19 | ||
| 18 | DESCRIPTION | 20 | DESCRIPTION |
| 19 | Bzip2 compresses files using the Burrows-Wheeler block- | 21 | bzip2 compresses files using the Burrows-Wheeler block- |
| 20 | sorting text compression algorithm, and Huffman coding. | 22 | sorting text compression algorithm, and Huffman coding. |
| 21 | Compression is generally considerably better than that | 23 | Compression is generally considerably better than that |
| 22 | achieved by more conventional LZ77/LZ78-based compressors, | 24 | achieved by more conventional LZ77/LZ78-based compressors, |
| @@ -26,7 +28,7 @@ DESCRIPTION | |||
| 26 | The command-line options are deliberately very similar to | 28 | The command-line options are deliberately very similar to |
| 27 | those of GNU Gzip, but they are not identical. | 29 | those of GNU Gzip, but they are not identical. |
| 28 | 30 | ||
| 29 | Bzip2 expects a list of file names to accompany the com- | 31 | bzip2 expects a list of file names to accompany the com- |
| 30 | mand-line flags. Each file is replaced by a compressed | 32 | mand-line flags. Each file is replaced by a compressed |
| 31 | version of itself, with the name "original_name.bz2". | 33 | version of itself, with the name "original_name.bz2". |
| 32 | Each compressed file has the same modification date and | 34 | Each compressed file has the same modification date and |
| @@ -38,8 +40,8 @@ DESCRIPTION | |||
| 38 | cepts, or have serious file name length restrictions, such | 40 | cepts, or have serious file name length restrictions, such |
| 39 | as MS-DOS. | 41 | as MS-DOS. |
| 40 | 42 | ||
| 41 | Bzip2 and bunzip2 will not overwrite existing files; if | 43 | bzip2 and bunzip2 will by default not overwrite existing |
| 42 | you want this to happen, you should delete them first. | 44 | files; if you want this to happen, specify the -f flag. |
| 43 | 45 | ||
| 44 | If no file names are specified, bzip2 compresses from | 46 | If no file names are specified, bzip2 compresses from |
| 45 | standard input to standard output. In this case, bzip2 | 47 | standard input to standard output. In this case, bzip2 |
| @@ -47,17 +49,15 @@ DESCRIPTION | |||
| 47 | this would be entirely incomprehensible and therefore | 49 | this would be entirely incomprehensible and therefore |
| 48 | pointless. | 50 | pointless. |
| 49 | 51 | ||
| 50 | Bunzip2 (or bzip2 -d ) decompresses and restores all spec- | 52 | bunzip2 (or bzip2 -d ) decompresses and restores all spec- |
| 51 | ified files whose names end in ".bz2". Files without this | 53 | ified files whose names end in ".bz2". Files without this |
| 52 | suffix are ignored. Again, supplying no filenames causes | 54 | suffix are ignored. Again, supplying no filenames causes |
| 53 | decompression from standard input to standard output. | 55 | decompression from standard input to standard output. |
| 54 | 56 | ||
| 55 | You can also compress or decompress files to the standard | 57 | bunzip2 will correctly decompress a file which is the con- |
| 56 | output by giving the -c flag. You can decompress multiple | 58 | catenation of two or more compressed files. The result is |
| 57 | files like this, but you may only compress a single file | 59 | the concatenation of the corresponding uncompressed files. |
| 58 | this way, since it would otherwise be difficult to sepa- | 60 | Integrity testing (-t) of concatenated compressed files is |
| 59 | rate out the compressed representations of the original | ||
| 60 | files. | ||
| 61 | 61 | ||
| 62 | 62 | ||
| 63 | 63 | ||
| @@ -70,6 +70,21 @@ DESCRIPTION | |||
| 70 | bzip2(1) bzip2(1) | 70 | bzip2(1) bzip2(1) |
| 71 | 71 | ||
| 72 | 72 | ||
| 73 | also supported. | ||
| 74 | |||
| 75 | You can also compress or decompress files to the standard | ||
| 76 | output by giving the -c flag. Multiple files may be com- | ||
| 77 | pressed and decompressed like this. The resulting outputs | ||
| 78 | are fed sequentially to stdout. Compression of multiple | ||
| 79 | files in this manner generates a stream containing multi- | ||
| 80 | ple compressed file representations. Such a stream can be | ||
| 81 | decompressed correctly only by bzip2 version 0.9.0 or | ||
| 82 | later. Earlier versions of bzip2 will stop after decom- | ||
| 83 | pressing the first file in the stream. | ||
| 84 | |||
| 85 | bzcat (or bzip2 -dc ) decompresses all specified files to | ||
| 86 | the standard output. | ||
| 87 | |||
| 73 | Compression is always performed, even if the compressed | 88 | Compression is always performed, even if the compressed |
| 74 | file is slightly larger than the original. Files of less | 89 | file is slightly larger than the original. Files of less |
| 75 | than about one hundred bytes tend to get larger, since the | 90 | than about one hundred bytes tend to get larger, since the |
| @@ -108,36 +123,37 @@ MEMORY MANAGEMENT | |||
| 108 | file, and bunzip2 then allocates itself just enough memory | 123 | file, and bunzip2 then allocates itself just enough memory |
| 109 | to decompress the file. Since block sizes are stored in | 124 | to decompress the file. Since block sizes are stored in |
| 110 | compressed files, it follows that the flags -1 to -9 are | 125 | compressed files, it follows that the flags -1 to -9 are |
| 111 | irrelevant to and so ignored during decompression. Com- | 126 | irrelevant to and so ignored during decompression. |
| 112 | pression and decompression requirements, in bytes, can be | ||
| 113 | estimated as: | ||
| 114 | 127 | ||
| 115 | Compression: 400k + ( 7 x block size ) | ||
| 116 | 128 | ||
| 117 | Decompression: 100k + ( 5 x block size ), or | ||
| 118 | 100k + ( 2.5 x block size ) | ||
| 119 | 129 | ||
| 120 | Larger block sizes give rapidly diminishing marginal | 130 | 2 |
| 121 | returns; most of the compression comes from the first two | ||
| 122 | or three hundred k of block size, a fact worth bearing in | ||
| 123 | mind when using bzip2 on small machines. It is also | ||
| 124 | important to appreciate that the decompression memory | ||
| 125 | requirement is set at compression-time by the choice of | ||
| 126 | block size. | ||
| 127 | 131 | ||
| 128 | 132 | ||
| 129 | 133 | ||
| 130 | 2 | ||
| 131 | 134 | ||
| 132 | 135 | ||
| 136 | bzip2(1) bzip2(1) | ||
| 133 | 137 | ||
| 134 | 138 | ||
| 139 | Compression and decompression requirements, in bytes, can | ||
| 140 | be estimated as: | ||
| 135 | 141 | ||
| 136 | bzip2(1) bzip2(1) | 142 | Compression: 400k + ( 7 x block size ) |
| 137 | 143 | ||
| 144 | Decompression: 100k + ( 4 x block size ), or | ||
| 145 | 100k + ( 2.5 x block size ) | ||
| 146 | |||
| 147 | Larger block sizes give rapidly diminishing marginal | ||
| 148 | returns; most of the compression comes from the first two | ||
| 149 | or three hundred k of block size, a fact worth bearing in | ||
| 150 | mind when using bzip2 on small machines. It is also | ||
| 151 | important to appreciate that the decompression memory | ||
| 152 | requirement is set at compression-time by the choice of | ||
| 153 | block size. | ||
| 138 | 154 | ||
| 139 | For files compressed with the default 900k block size, | 155 | For files compressed with the default 900k block size, |
| 140 | bunzip2 will require about 4600 kbytes to decompress. To | 156 | bunzip2 will require about 3700 kbytes to decompress. To |
| 141 | support decompression of any file on a 4 megabyte machine, | 157 | support decompression of any file on a 4 megabyte machine, |
| 142 | bunzip2 has an option to decompress using approximately | 158 | bunzip2 has an option to decompress using approximately |
| 143 | half this amount of memory, about 2300 kbytes. Decompres- | 159 | half this amount of memory, about 2300 kbytes. Decompres- |
| @@ -157,8 +173,8 @@ bzip2(1) bzip2(1) | |||
| 157 | file 20,000 bytes long with the flag -9 will cause the | 173 | file 20,000 bytes long with the flag -9 will cause the |
| 158 | compressor to allocate around 6700k of memory, but only | 174 | compressor to allocate around 6700k of memory, but only |
| 159 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the | 175 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the |
| 160 | decompressor will allocate 4600k but only touch 100k + | 176 | decompressor will allocate 3700k but only touch 100k + |
| 161 | 20000 * 5 = 200 kbytes. | 177 | 20000 * 4 = 180 kbytes. |
| 162 | 178 | ||
| 163 | Here is a table which summarises the maximum memory usage | 179 | Here is a table which summarises the maximum memory usage |
| 164 | for different block sizes. Also recorded is the total | 180 | for different block sizes. Also recorded is the total |
| @@ -172,64 +188,66 @@ bzip2(1) bzip2(1) | |||
| 172 | Compress Decompress Decompress Corpus | 188 | Compress Decompress Decompress Corpus |
| 173 | Flag usage usage -s usage Size | 189 | Flag usage usage -s usage Size |
| 174 | 190 | ||
| 175 | -1 1100k 600k 350k 914704 | 191 | -1 1100k 500k 350k 914704 |
| 176 | -2 1800k 1100k 600k 877703 | 192 | -2 1800k 900k 600k 877703 |
| 177 | -3 2500k 1600k 850k 860338 | ||
| 178 | -4 3200k 2100k 1100k 846899 | ||
| 179 | -5 3900k 2600k 1350k 845160 | ||
| 180 | -6 4600k 3100k 1600k 838626 | ||
| 181 | -7 5400k 3600k 1850k 834096 | ||
| 182 | -8 6000k 4100k 2100k 828642 | ||
| 183 | -9 6700k 4600k 2350k 828642 | ||
| 184 | 193 | ||
| 185 | 194 | ||
| 186 | OPTIONS | ||
| 187 | -c --stdout | ||
| 188 | Compress or decompress to standard output. -c will | ||
| 189 | decompress multiple files to stdout, but will only | ||
| 190 | compress a single file to stdout. | ||
| 191 | |||
| 192 | 195 | ||
| 196 | 3 | ||
| 193 | 197 | ||
| 194 | 198 | ||
| 195 | 199 | ||
| 196 | 3 | ||
| 197 | 200 | ||
| 198 | 201 | ||
| 202 | bzip2(1) bzip2(1) | ||
| 199 | 203 | ||
| 200 | 204 | ||
| 205 | -3 2500k 1300k 850k 860338 | ||
| 206 | -4 3200k 1700k 1100k 846899 | ||
| 207 | -5 3900k 2100k 1350k 845160 | ||
| 208 | -6 4600k 2500k 1600k 838626 | ||
| 209 | -7 5400k 2900k 1850k 834096 | ||
| 210 | -8 6000k 3300k 2100k 828642 | ||
| 211 | -9 6700k 3700k 2350k 828642 | ||
| 201 | 212 | ||
| 202 | bzip2(1) bzip2(1) | ||
| 203 | 213 | ||
| 214 | OPTIONS | ||
| 215 | -c --stdout | ||
| 216 | Compress or decompress to standard output. -c will | ||
| 217 | decompress multiple files to stdout, but will only | ||
| 218 | compress a single file to stdout. | ||
| 204 | 219 | ||
| 205 | -d --decompress | 220 | -d --decompress |
| 206 | Force decompression. Bzip2 and bunzip2 are really | 221 | Force decompression. bzip2, bunzip2 and bzcat are |
| 207 | the same program, and the decision about whether to | 222 | really the same program, and the decision about |
| 208 | compress or decompress is done on the basis of | 223 | what actions to take is done on the basis of which |
| 209 | which name is used. This flag overrides that mech- | 224 | name is used. This flag overrides that mechanism, |
| 210 | anism, and forces bzip2 to decompress. | 225 | and forces bzip2 to decompress. |
| 211 | 226 | ||
| 212 | -f --compress | 227 | -z --compress |
| 213 | The complement to -d: forces compression, regard- | 228 | The complement to -d: forces compression, regard- |
| 214 | less of the invokation name. | 229 | less of the invokation name. |
| 215 | 230 | ||
| 216 | -t --test | 231 | -t --test |
| 217 | Check integrity of the specified file(s), but don't | 232 | Check integrity of the specified file(s), but don't |
| 218 | decompress them. This really performs a trial | 233 | decompress them. This really performs a trial |
| 219 | decompression and throws away the result, using the | 234 | decompression and throws away the result. |
| 220 | low-memory decompression algorithm (see -s). | 235 | |
| 236 | -f --force | ||
| 237 | Force overwrite of output files. Normally, bzip2 | ||
| 238 | will not overwrite existing output files. | ||
| 221 | 239 | ||
| 222 | -k --keep | 240 | -k --keep |
| 223 | Keep (don't delete) input files during compression | 241 | Keep (don't delete) input files during compression |
| 224 | or decompression. | 242 | or decompression. |
| 225 | 243 | ||
| 226 | -s --small | 244 | -s --small |
| 227 | Reduce memory usage, both for compression and | 245 | Reduce memory usage, for compression, decompression |
| 228 | decompression. Files are decompressed using a mod- | 246 | and testing. Files are decompressed and tested |
| 229 | ified algorithm which only requires 2.5 bytes per | 247 | using a modified algorithm which only requires 2.5 |
| 230 | block byte. This means any file can be decom- | 248 | bytes per block byte. This means any file can be |
| 231 | pressed in 2300k of memory, albeit somewhat more | 249 | decompressed in 2300k of memory, albeit at about |
| 232 | slowly than usual. | 250 | half the normal speed. |
| 233 | 251 | ||
| 234 | During compression, -s selects a block size of | 252 | During compression, -s selects a block size of |
| 235 | 200k, which limits memory use to around the same | 253 | 200k, which limits memory use to around the same |
| @@ -239,35 +257,32 @@ bzip2(1) bzip2(1) | |||
| 239 | MEMORY MANAGEMENT above. | 257 | MEMORY MANAGEMENT above. |
| 240 | 258 | ||
| 241 | 259 | ||
| 260 | |||
| 261 | |||
| 262 | 4 | ||
| 263 | |||
| 264 | |||
| 265 | |||
| 266 | |||
| 267 | |||
| 268 | bzip2(1) bzip2(1) | ||
| 269 | |||
| 270 | |||
| 242 | -v --verbose | 271 | -v --verbose |
| 243 | Verbose mode -- show the compression ratio for each | 272 | Verbose mode -- show the compression ratio for each |
| 244 | file processed. Further -v's increase the ver- | 273 | file processed. Further -v's increase the ver- |
| 245 | bosity level, spewing out lots of information which | 274 | bosity level, spewing out lots of information which |
| 246 | is primarily of interest for diagnostic purposes. | 275 | is primarily of interest for diagnostic purposes. |
| 247 | 276 | ||
| 248 | -L --license | 277 | -L --license -V --version |
| 249 | Display the software version, license terms and | 278 | Display the software version, license terms and |
| 250 | conditions. | 279 | conditions. |
| 251 | 280 | ||
| 252 | -V --version | ||
| 253 | Same as -L. | ||
| 254 | |||
| 255 | -1 to -9 | 281 | -1 to -9 |
| 256 | Set the block size to 100 k, 200 k .. 900 k when | 282 | Set the block size to 100 k, 200 k .. 900 k when |
| 257 | compressing. Has no effect when decompressing. | 283 | compressing. Has no effect when decompressing. |
| 258 | See MEMORY MANAGEMENT above. | 284 | See MEMORY MANAGEMENT above. |
| 259 | 285 | ||
| 260 | |||
| 261 | |||
| 262 | 4 | ||
| 263 | |||
| 264 | |||
| 265 | |||
| 266 | |||
| 267 | |||
| 268 | bzip2(1) bzip2(1) | ||
| 269 | |||
| 270 | |||
| 271 | --repetitive-fast | 286 | --repetitive-fast |
| 272 | bzip2 injects some small pseudo-random variations | 287 | bzip2 injects some small pseudo-random variations |
| 273 | into very repetitive blocks to limit worst-case | 288 | into very repetitive blocks to limit worst-case |
| @@ -306,34 +321,34 @@ RECOVERING DATA FROM DAMAGED F | |||
| 306 | bzip2recover takes a single argument, the name of the dam- | 321 | bzip2recover takes a single argument, the name of the dam- |
| 307 | aged file, and writes a number of files "rec0001file.bz2", | 322 | aged file, and writes a number of files "rec0001file.bz2", |
| 308 | "rec0002file.bz2", etc, containing the extracted blocks. | 323 | "rec0002file.bz2", etc, containing the extracted blocks. |
| 309 | The output filenames are designed so that the use of wild- | 324 | The output filenames are designed so that the use of |
| 310 | cards in subsequent processing -- for example, "bzip2 -dc | ||
| 311 | rec*file.bz2 > recovered_data" -- lists the files in the | ||
| 312 | "right" order. | ||
| 313 | 325 | ||
| 314 | bzip2recover should be of most use dealing with large .bz2 | ||
| 315 | files, as these will contain many blocks. It is clearly | ||
| 316 | futile to use it on damaged single-block files, since a | ||
| 317 | damaged block cannot be recovered. If you wish to min- | ||
| 318 | imise any potential data loss through media or transmis- | ||
| 319 | sion errors, you might consider compressing with a smaller | ||
| 320 | block size. | ||
| 321 | 326 | ||
| 322 | 327 | ||
| 323 | PERFORMANCE NOTES | 328 | 5 |
| 324 | The sorting phase of compression gathers together similar | ||
| 325 | 329 | ||
| 326 | 330 | ||
| 327 | 331 | ||
| 328 | 5 | ||
| 329 | 332 | ||
| 330 | 333 | ||
| 334 | bzip2(1) bzip2(1) | ||
| 331 | 335 | ||
| 332 | 336 | ||
| 337 | wildcards in subsequent processing -- for example, "bzip2 | ||
| 338 | -dc rec*file.bz2 > recovered_data" -- lists the files in | ||
| 339 | the "right" order. | ||
| 333 | 340 | ||
| 334 | bzip2(1) bzip2(1) | 341 | bzip2recover should be of most use dealing with large .bz2 |
| 342 | files, as these will contain many blocks. It is clearly | ||
| 343 | futile to use it on damaged single-block files, since a | ||
| 344 | damaged block cannot be recovered. If you wish to min- | ||
| 345 | imise any potential data loss through media or transmis- | ||
| 346 | sion errors, you might consider compressing with a smaller | ||
| 347 | block size. | ||
| 335 | 348 | ||
| 336 | 349 | ||
| 350 | PERFORMANCE NOTES | ||
| 351 | The sorting phase of compression gathers together similar | ||
| 337 | strings in the file. Because of this, files containing | 352 | strings in the file. Because of this, files containing |
| 338 | very long runs of repeated symbols, like "aabaabaabaab | 353 | very long runs of repeated symbols, like "aabaabaabaab |
| 339 | ..." (repeated several hundred times) may compress | 354 | ..." (repeated several hundred times) may compress |
| @@ -348,10 +363,6 @@ bzip2(1) bzip2(1) | |||
| 348 | severe slowness in compression, try making the block size | 363 | severe slowness in compression, try making the block size |
| 349 | as small as possible, with flag -1. | 364 | as small as possible, with flag -1. |
| 350 | 365 | ||
| 351 | Incompressible or virtually-incompressible data may decom- | ||
| 352 | press rather more slowly than one would hope. This is due | ||
| 353 | to a naive implementation of the move-to-front coder. | ||
| 354 | |||
| 355 | bzip2 usually allocates several megabytes of memory to | 366 | bzip2 usually allocates several megabytes of memory to |
| 356 | operate in, and then charges all over it in a fairly ran- | 367 | operate in, and then charges all over it in a fairly ran- |
| 357 | dom fashion. This means that performance, both for com- | 368 | dom fashion. This means that performance, both for com- |
| @@ -362,12 +373,6 @@ bzip2(1) bzip2(1) | |||
| 362 | large performance improvements. I imagine bzip2 will per- | 373 | large performance improvements. I imagine bzip2 will per- |
| 363 | form best on machines with very large caches. | 374 | form best on machines with very large caches. |
| 364 | 375 | ||
| 365 | Test mode (-t) uses the low-memory decompression algorithm | ||
| 366 | (-s). This means test mode does not run as fast as it | ||
| 367 | could; it could run as fast as the normal decompression | ||
| 368 | machinery. This could easily be fixed at the cost of some | ||
| 369 | code bloat. | ||
| 370 | |||
| 371 | 376 | ||
| 372 | CAVEATS | 377 | CAVEATS |
| 373 | I/O error messages are not as helpful as they could be. | 378 | I/O error messages are not as helpful as they could be. |
| @@ -375,19 +380,14 @@ CAVEATS | |||
| 375 | but the details of what the problem is sometimes seem | 380 | but the details of what the problem is sometimes seem |
| 376 | rather misleading. | 381 | rather misleading. |
| 377 | 382 | ||
| 378 | This manual page pertains to version 0.1 of bzip2. It may | 383 | This manual page pertains to version 0.9.0 of bzip2. Com- |
| 379 | well happen that some future version will use a different | 384 | pressed data created by this version is entirely forwards |
| 380 | compressed file format. If you try to decompress, using | 385 | and backwards compatible with the previous public release, |
| 381 | 0.1, a .bz2 file created with some future version which | 386 | version 0.1pl2, but with the following exception: 0.9.0 |
| 382 | uses a different compressed file format, 0.1 will complain | 387 | can correctly decompress multiple concatenated compressed |
| 383 | that your file "is not a bzip2 file". If that happens, | 388 | files. 0.1pl2 cannot do this; it will stop after decom- |
| 384 | you should obtain a more recent version of bzip2 and use | 389 | pressing just the first file in the stream. |
| 385 | that to decompress the file. | ||
| 386 | 390 | ||
| 387 | Wildcard expansion for Windows 95 and NT is flaky. | ||
| 388 | |||
| 389 | bzip2recover uses 32-bit integers to represent bit posi- | ||
| 390 | tions in compressed files, so it cannot handle compressed | ||
| 391 | 391 | ||
| 392 | 392 | ||
| 393 | 393 | ||
| @@ -400,61 +400,59 @@ CAVEATS | |||
| 400 | bzip2(1) bzip2(1) | 400 | bzip2(1) bzip2(1) |
| 401 | 401 | ||
| 402 | 402 | ||
| 403 | files more than 512 megabytes long. This could easily be | 403 | Wildcard expansion for Windows 95 and NT is flaky. |
| 404 | |||
| 405 | bzip2recover uses 32-bit integers to represent bit posi- | ||
| 406 | tions in compressed files, so it cannot handle compressed | ||
| 407 | files more than 512 megabytes long. This could easily be | ||
| 404 | fixed. | 408 | fixed. |
| 405 | 409 | ||
| 406 | bzip2recover sometimes reports a very small, incomplete | 410 | |
| 407 | final block. This is spurious and can be safely ignored. | 411 | AUTHOR |
| 412 | Julian Seward, jseward@acm.org. | ||
| 413 | http://www.muraroa.demon.co.uk | ||
| 414 | |||
| 415 | The ideas embodied in bzip2 are due to (at least) the fol- | ||
| 416 | lowing people: Michael Burrows and David Wheeler (for the | ||
| 417 | block sorting transformation), David Wheeler (again, for | ||
| 418 | the Huffman coder), Peter Fenwick (for the structured cod- | ||
| 419 | ing model in the original bzip, and many refinements), and | ||
| 420 | Alistair Moffat, Radford Neal and Ian Witten (for the | ||
| 421 | arithmetic coder in the original bzip). I am much | ||
| 422 | indebted for their help, support and advice. See the man- | ||
| 423 | ual in the source distribution for pointers to sources of | ||
| 424 | documentation. Christian von Roques encouraged me to look | ||
| 425 | for faster sorting algorithms, so as to speed up compres- | ||
| 426 | sion. Bela Lubkin encouraged me to improve the worst-case | ||
| 427 | compression performance. Many people sent patches, helped | ||
| 428 | with portability problems, lent machines, gave advice and | ||
| 429 | were generally helpful. | ||
| 430 | |||
| 431 | |||
| 432 | |||
| 433 | |||
| 434 | |||
| 435 | |||
| 436 | |||
| 437 | |||
| 438 | |||
| 439 | |||
| 440 | |||
| 441 | |||
| 442 | |||
| 443 | |||
| 444 | |||
| 445 | |||
| 446 | |||
| 447 | |||
| 408 | 448 | ||
| 409 | 449 | ||
| 410 | RELATIONSHIP TO bzip-0.21 | ||
| 411 | This program is a descendant of the bzip program, version | ||
| 412 | 0.21, which I released in August 1996. The primary dif- | ||
| 413 | ference of bzip2 is its avoidance of the possibly patented | ||
| 414 | algorithms which were used in 0.21. bzip2 also brings | ||
| 415 | various useful refinements (-s, -t), uses less memory, | ||
| 416 | decompresses significantly faster, and has support for | ||
| 417 | recovering data from damaged files. | ||
| 418 | 450 | ||
| 419 | Because bzip2 uses Huffman coding to construct the com- | ||
| 420 | pressed bitstream, rather than the arithmetic coding used | ||
| 421 | in 0.21, the compressed representations generated by the | ||
| 422 | two programs are incompatible, and they will not interop- | ||
| 423 | erate. The change in suffix from .bz to .bz2 reflects | ||
| 424 | this. It would have been helpful to at least allow bzip2 | ||
| 425 | to decompress files created by 0.21, but this would defeat | ||
| 426 | the primary aim of having a patent-free compressor. | ||
| 427 | 451 | ||
| 428 | For a more precise statement about patent issues in bzip2, | ||
| 429 | please see the README file in the distribution. | ||
| 430 | 452 | ||
| 431 | Huffman coding necessarily involves some coding ineffi- | ||
| 432 | ciency compared to arithmetic coding. This means that | ||
| 433 | bzip2 compresses about 1% worse than 0.21, an unfortunate | ||
| 434 | but unavoidable fact-of-life. On the other hand, decom- | ||
| 435 | pression is approximately 50% faster for the same reason, | ||
| 436 | and the change in file format gave an opportunity to add | ||
| 437 | data-recovery features. So it is not all bad. | ||
| 438 | 453 | ||
| 439 | 454 | ||
| 440 | AUTHOR | ||
| 441 | Julian Seward, jseward@acm.org. | ||
| 442 | 455 | ||
| 443 | The ideas embodied in bzip and bzip2 are due to (at least) | ||
| 444 | the following people: Michael Burrows and David Wheeler | ||
| 445 | (for the block sorting transformation), David Wheeler | ||
| 446 | (again, for the Huffman coder), Peter Fenwick (for the | ||
| 447 | structured coding model in 0.21, and many refinements), | ||
| 448 | and Alistair Moffat, Radford Neal and Ian Witten (for the | ||
| 449 | arithmetic coder in 0.21). I am much indebted for their | ||
| 450 | help, support and advice. See the file ALGORITHMS in the | ||
| 451 | source distribution for pointers to sources of documenta- | ||
| 452 | tion. Christian von Roques encouraged me to look for | ||
| 453 | faster sorting algorithms, so as to speed up compression. | ||
| 454 | Bela Lubkin encouraged me to improve the worst-case com- | ||
| 455 | pression performance. Many people sent patches, helped | ||
| 456 | with portability problems, lent machines, gave advice and | ||
| 457 | were generally helpful. | ||
| 458 | 456 | ||
| 459 | 457 | ||
| 460 | 458 | ||
| @@ -4,28 +4,45 @@ | |||
| 4 | /*-----------------------------------------------------------*/ | 4 | /*-----------------------------------------------------------*/ |
| 5 | 5 | ||
| 6 | /*-- | 6 | /*-- |
| 7 | This program is bzip2, a lossless, block-sorting data compressor, | 7 | This file is a part of bzip2 and/or libbzip2, a program and |
| 8 | version 0.1pl2, dated 29-Aug-1997. | 8 | library for lossless, block-sorting data compression. |
| 9 | 9 | ||
| 10 | Copyright (C) 1996, 1997 by Julian Seward. | 10 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. |
| 11 | Guildford, Surrey, UK | 11 | |
| 12 | email: jseward@acm.org | 12 | Redistribution and use in source and binary forms, with or without |
| 13 | 13 | modification, are permitted provided that the following conditions | |
| 14 | This program is free software; you can redistribute it and/or modify | 14 | are met: |
| 15 | it under the terms of the GNU General Public License as published by | 15 | |
| 16 | the Free Software Foundation; either version 2 of the License, or | 16 | 1. Redistributions of source code must retain the above copyright |
| 17 | (at your option) any later version. | 17 | notice, this list of conditions and the following disclaimer. |
| 18 | 18 | ||
| 19 | This program is distributed in the hope that it will be useful, | 19 | 2. The origin of this software must not be misrepresented; you must |
| 20 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 20 | not claim that you wrote the original software. If you use this |
| 21 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 21 | software in a product, an acknowledgment in the product |
| 22 | GNU General Public License for more details. | 22 | documentation would be appreciated but is not required. |
| 23 | 23 | ||
| 24 | You should have received a copy of the GNU General Public License | 24 | 3. Altered source versions must be plainly marked as such, and must |
| 25 | along with this program; if not, write to the Free Software | 25 | not be misrepresented as being the original software. |
| 26 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | 26 | |
| 27 | 27 | 4. The name of the author may not be used to endorse or promote | |
| 28 | The GNU General Public License is contained in the file LICENSE. | 28 | products derived from this software without specific prior written |
| 29 | permission. | ||
| 30 | |||
| 31 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 32 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 33 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 34 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 35 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 36 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 37 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 38 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 39 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 40 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 41 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 42 | |||
| 43 | Julian Seward, Guildford, Surrey, UK. | ||
| 44 | jseward@acm.org | ||
| 45 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 29 | 46 | ||
| 30 | This program is based on (at least) the work of: | 47 | This program is based on (at least) the work of: |
| 31 | Mike Burrows | 48 | Mike Burrows |
| @@ -37,21 +54,23 @@ | |||
| 37 | Robert Sedgewick | 54 | Robert Sedgewick |
| 38 | Jon L. Bentley | 55 | Jon L. Bentley |
| 39 | 56 | ||
| 40 | For more information on these sources, see the file ALGORITHMS. | 57 | For more information on these sources, see the manual. |
| 41 | --*/ | 58 | --*/ |
| 42 | 59 | ||
| 60 | |||
| 43 | /*----------------------------------------------------*/ | 61 | /*----------------------------------------------------*/ |
| 44 | /*--- IMPORTANT ---*/ | 62 | /*--- IMPORTANT ---*/ |
| 45 | /*----------------------------------------------------*/ | 63 | /*----------------------------------------------------*/ |
| 46 | 64 | ||
| 47 | /*-- | 65 | /*-- |
| 48 | WARNING: | 66 | WARNING: |
| 49 | This program (attempts to) compress data by performing several | 67 | This program and library (attempts to) compress data by |
| 50 | non-trivial transformations on it. Unless you are 100% familiar | 68 | performing several non-trivial transformations on it. |
| 51 | with *all* the algorithms contained herein, and with the | 69 | Unless you are 100% familiar with *all* the algorithms |
| 52 | consequences of modifying them, you should NOT meddle with the | 70 | contained herein, and with the consequences of modifying them, |
| 53 | compression or decompression machinery. Incorrect changes can | 71 | you should NOT meddle with the compression or decompression |
| 54 | and very likely *will* lead to disasterous loss of data. | 72 | machinery. Incorrect changes can and very likely *will* |
| 73 | lead to disasterous loss of data. | ||
| 55 | 74 | ||
| 56 | DISCLAIMER: | 75 | DISCLAIMER: |
| 57 | I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE | 76 | I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE |
| @@ -65,18 +84,19 @@ | |||
| 65 | of various special cases in the code which occur with very low | 84 | of various special cases in the code which occur with very low |
| 66 | but non-zero probability make it impossible to rule out the | 85 | but non-zero probability make it impossible to rule out the |
| 67 | possibility of bugs remaining in the program. DO NOT COMPRESS | 86 | possibility of bugs remaining in the program. DO NOT COMPRESS |
| 68 | ANY DATA WITH THIS PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE | 87 | ANY DATA WITH THIS PROGRAM AND/OR LIBRARY UNLESS YOU ARE PREPARED |
| 69 | POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL NOT BE RECOVERABLE. | 88 | TO ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL |
| 89 | NOT BE RECOVERABLE. | ||
| 70 | 90 | ||
| 71 | That is not to say this program is inherently unreliable. | 91 | That is not to say this program is inherently unreliable. |
| 72 | Indeed, I very much hope the opposite is true. bzip2 has been | 92 | Indeed, I very much hope the opposite is true. bzip2/libbzip2 |
| 73 | carefully constructed and extensively tested. | 93 | has been carefully constructed and extensively tested. |
| 74 | 94 | ||
| 75 | PATENTS: | 95 | PATENTS: |
| 76 | To the best of my knowledge, bzip2 does not use any patented | 96 | To the best of my knowledge, bzip2/libbzip2 does not use any |
| 77 | algorithms. However, I do not have the resources available to | 97 | patented algorithms. However, I do not have the resources |
| 78 | carry out a full patent search. Therefore I cannot give any | 98 | available to carry out a full patent search. Therefore I cannot |
| 79 | guarantee of the above statement. | 99 | give any guarantee of the above statement. |
| 80 | --*/ | 100 | --*/ |
| 81 | 101 | ||
| 82 | 102 | ||
| @@ -103,6 +123,10 @@ | |||
| 103 | --*/ | 123 | --*/ |
| 104 | #define BZ_LCCWIN32 0 | 124 | #define BZ_LCCWIN32 0 |
| 105 | 125 | ||
| 126 | #ifdef _WIN32 | ||
| 127 | #define BZ_LCCWIN32 1 | ||
| 128 | #define BZ_UNIX 0 | ||
| 129 | #endif | ||
| 106 | 130 | ||
| 107 | 131 | ||
| 108 | /*---------------------------------------------*/ | 132 | /*---------------------------------------------*/ |
| @@ -112,12 +136,10 @@ | |||
| 112 | 136 | ||
| 113 | #include <stdio.h> | 137 | #include <stdio.h> |
| 114 | #include <stdlib.h> | 138 | #include <stdlib.h> |
| 115 | #if DEBUG | ||
| 116 | #include <assert.h> | ||
| 117 | #endif | ||
| 118 | #include <string.h> | 139 | #include <string.h> |
| 119 | #include <signal.h> | 140 | #include <signal.h> |
| 120 | #include <math.h> | 141 | #include <math.h> |
| 142 | #include "bzlib.h" | ||
| 121 | 143 | ||
| 122 | #define ERROR_IF_EOF(i) { if ((i) == EOF) ioError(); } | 144 | #define ERROR_IF_EOF(i) { if ((i) == EOF) ioError(); } |
| 123 | #define ERROR_IF_NOT_ZERO(i) { if ((i) != 0) ioError(); } | 145 | #define ERROR_IF_NOT_ZERO(i) { if ((i) != 0) ioError(); } |
| @@ -130,68 +152,45 @@ | |||
| 130 | --*/ | 152 | --*/ |
| 131 | 153 | ||
| 132 | #if BZ_UNIX | 154 | #if BZ_UNIX |
| 133 | #include <sys/types.h> | 155 | # include <sys/types.h> |
| 134 | #include <utime.h> | 156 | # include <utime.h> |
| 135 | #include <unistd.h> | 157 | # include <unistd.h> |
| 136 | #include <malloc.h> | 158 | # include <sys/stat.h> |
| 137 | #include <sys/stat.h> | 159 | # include <sys/times.h> |
| 138 | #include <sys/times.h> | 160 | |
| 139 | 161 | # define PATH_SEP '/' | |
| 140 | #define Int32 int | 162 | # define MY_LSTAT lstat |
| 141 | #define UInt32 unsigned int | 163 | # define MY_S_IFREG S_ISREG |
| 142 | #define Char char | 164 | # define MY_STAT stat |
| 143 | #define UChar unsigned char | 165 | |
| 144 | #define Int16 short | 166 | # define APPEND_FILESPEC(root, name) \ |
| 145 | #define UInt16 unsigned short | ||
| 146 | |||
| 147 | #define PATH_SEP '/' | ||
| 148 | #define MY_LSTAT lstat | ||
| 149 | #define MY_S_IFREG S_ISREG | ||
| 150 | #define MY_STAT stat | ||
| 151 | |||
| 152 | #define APPEND_FILESPEC(root, name) \ | ||
| 153 | root=snocString((root), (name)) | 167 | root=snocString((root), (name)) |
| 154 | 168 | ||
| 155 | #define SET_BINARY_MODE(fd) /**/ | 169 | # define SET_BINARY_MODE(fd) /**/ |
| 156 | 170 | ||
| 157 | /*-- | 171 | # ifdef __GNUC__ |
| 158 | You should try very hard to persuade your C compiler | 172 | # define NORETURN __attribute__ ((noreturn)) |
| 159 | to inline the bits marked INLINE. Otherwise bzip2 will | 173 | # else |
| 160 | run rather slowly. gcc version 2.x is recommended. | 174 | # define NORETURN /**/ |
| 161 | --*/ | 175 | # endif |
| 162 | #ifdef __GNUC__ | ||
| 163 | #define INLINE inline | ||
| 164 | #define NORETURN __attribute__ ((noreturn)) | ||
| 165 | #else | ||
| 166 | #define INLINE /**/ | ||
| 167 | #define NORETURN /**/ | ||
| 168 | #endif | ||
| 169 | #endif | 176 | #endif |
| 170 | 177 | ||
| 171 | 178 | ||
| 172 | 179 | ||
| 173 | #if BZ_LCCWIN32 | 180 | #if BZ_LCCWIN32 |
| 174 | #include <io.h> | 181 | # include <io.h> |
| 175 | #include <fcntl.h> | 182 | # include <fcntl.h> |
| 176 | #include <sys\stat.h> | 183 | # include <sys\stat.h> |
| 177 | 184 | ||
| 178 | #define Int32 int | 185 | # define NORETURN /**/ |
| 179 | #define UInt32 unsigned int | 186 | # define PATH_SEP '\\' |
| 180 | #define Int16 short | 187 | # define MY_LSTAT _stat |
| 181 | #define UInt16 unsigned short | 188 | # define MY_STAT _stat |
| 182 | #define Char char | 189 | # define MY_S_IFREG(x) ((x) & _S_IFREG) |
| 183 | #define UChar unsigned char | 190 | |
| 184 | 191 | # if 0 | |
| 185 | #define INLINE /**/ | ||
| 186 | #define NORETURN /**/ | ||
| 187 | #define PATH_SEP '\\' | ||
| 188 | #define MY_LSTAT _stat | ||
| 189 | #define MY_STAT _stat | ||
| 190 | #define MY_S_IFREG(x) ((x) & _S_IFREG) | ||
| 191 | |||
| 192 | #if 0 | ||
| 193 | /*-- lcc-win32 seems to expand wildcards itself --*/ | 192 | /*-- lcc-win32 seems to expand wildcards itself --*/ |
| 194 | #define APPEND_FILESPEC(root, spec) \ | 193 | # define APPEND_FILESPEC(root, spec) \ |
| 195 | do { \ | 194 | do { \ |
| 196 | if ((spec)[0] == '-') { \ | 195 | if ((spec)[0] == '-') { \ |
| 197 | root = snocString((root), (spec)); \ | 196 | root = snocString((root), (spec)); \ |
| @@ -211,12 +210,12 @@ | |||
| 211 | } \ | 210 | } \ |
| 212 | } \ | 211 | } \ |
| 213 | } while ( 0 ) | 212 | } while ( 0 ) |
| 214 | #else | 213 | # else |
| 215 | #define APPEND_FILESPEC(root, name) \ | 214 | # define APPEND_FILESPEC(root, name) \ |
| 216 | root = snocString ((root), (name)) | 215 | root = snocString ((root), (name)) |
| 217 | #endif | 216 | # endif |
| 218 | 217 | ||
| 219 | #define SET_BINARY_MODE(fd) \ | 218 | # define SET_BINARY_MODE(fd) \ |
| 220 | do { \ | 219 | do { \ |
| 221 | int retVal = setmode ( fileno ( fd ), \ | 220 | int retVal = setmode ( fileno ( fd ), \ |
| 222 | O_BINARY ); \ | 221 | O_BINARY ); \ |
| @@ -231,111 +230,32 @@ | |||
| 231 | Some more stuff for all platforms :-) | 230 | Some more stuff for all platforms :-) |
| 232 | --*/ | 231 | --*/ |
| 233 | 232 | ||
| 234 | #define Bool unsigned char | 233 | typedef char Char; |
| 235 | #define True 1 | 234 | typedef unsigned char Bool; |
| 236 | #define False 0 | 235 | typedef unsigned char UChar; |
| 236 | typedef int Int32; | ||
| 237 | typedef unsigned int UInt32; | ||
| 238 | typedef short Int16; | ||
| 239 | typedef unsigned short UInt16; | ||
| 240 | |||
| 241 | #define True ((Bool)1) | ||
| 242 | #define False ((Bool)0) | ||
| 237 | 243 | ||
| 238 | /*-- | 244 | /*-- |
| 239 | IntNative is your platform's `native' int size. | 245 | IntNative is your platform's `native' int size. |
| 240 | Only here to avoid probs with 64-bit platforms. | 246 | Only here to avoid probs with 64-bit platforms. |
| 241 | --*/ | 247 | --*/ |
| 242 | #define IntNative int | 248 | typedef int IntNative; |
| 243 | |||
| 244 | |||
| 245 | /*-- | ||
| 246 | change to 1, or compile with -DDEBUG=1 to debug | ||
| 247 | --*/ | ||
| 248 | #ifndef DEBUG | ||
| 249 | #define DEBUG 0 | ||
| 250 | #endif | ||
| 251 | |||
| 252 | |||
| 253 | /*---------------------------------------------------*/ | ||
| 254 | /*--- ---*/ | ||
| 255 | /*---------------------------------------------------*/ | ||
| 256 | |||
| 257 | /*-- | ||
| 258 | Implementation notes, July 1997 | ||
| 259 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 260 | |||
| 261 | Memory allocation | ||
| 262 | ~~~~~~~~~~~~~~~~~ | ||
| 263 | All large data structures are allocated on the C heap, | ||
| 264 | for better or for worse. That includes the various | ||
| 265 | arrays of pointers, striped words, bytes, frequency | ||
| 266 | tables and buffers for compression and decompression. | ||
| 267 | |||
| 268 | bzip2 can operate at various block-sizes, ranging from | ||
| 269 | 100k to 900k in 100k steps, and it allocates only as | ||
| 270 | much as it needs to. When compressing, we know from the | ||
| 271 | command-line options what the block-size is going to be, | ||
| 272 | so all allocation can be done at start-up; if that | ||
| 273 | succeeds, there can be no further allocation problems. | ||
| 274 | |||
| 275 | Decompression is more complicated. Each compressed file | ||
| 276 | contains, in its header, a byte indicating the block | ||
| 277 | size used for compression. This means bzip2 potentially | ||
| 278 | needs to reallocate memory for each file it deals with, | ||
| 279 | which in turn opens the possibility for a memory allocation | ||
| 280 | failure part way through a run of files, by encountering | ||
| 281 | a file requiring a much larger block size than all the | ||
| 282 | ones preceding it. | ||
| 283 | |||
| 284 | The policy is to simply give up if a memory allocation | ||
| 285 | failure occurs. During decompression, it would be | ||
| 286 | possible to move on to subsequent files in the hope that | ||
| 287 | some might ask for a smaller block size, but the | ||
| 288 | complications for doing this seem more trouble than they | ||
| 289 | are worth. | ||
| 290 | |||
| 291 | |||
| 292 | Compressed file formats | ||
| 293 | ~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 294 | [This is now entirely different from both 0.21, and from | ||
| 295 | any previous Huffman-coded variant of bzip. | ||
| 296 | See the associated file bzip2.txt for details.] | ||
| 297 | |||
| 298 | |||
| 299 | Error conditions | ||
| 300 | ~~~~~~~~~~~~~~~~ | ||
| 301 | Dealing with error conditions is the least satisfactory | ||
| 302 | aspect of bzip2. The policy is to try and leave the | ||
| 303 | filesystem in a consistent state, then quit, even if it | ||
| 304 | means not processing some of the files mentioned in the | ||
| 305 | command line. `A consistent state' means that a file | ||
| 306 | exists either in its compressed or uncompressed form, | ||
| 307 | but not both. This boils down to the rule `delete the | ||
| 308 | output file if an error condition occurs, leaving the | ||
| 309 | input intact'. Input files are only deleted when we can | ||
| 310 | be pretty sure the output file has been written and | ||
| 311 | closed successfully. | ||
| 312 | |||
| 313 | Errors are a dog because there's so many things to | ||
| 314 | deal with. The following can happen mid-file, and | ||
| 315 | require cleaning up. | ||
| 316 | |||
| 317 | internal `panics' -- indicating a bug | ||
| 318 | corrupted or inconsistent compressed file | ||
| 319 | can't allocate enough memory to decompress this file | ||
| 320 | I/O error reading/writing/opening/closing | ||
| 321 | signal catches -- Control-C, SIGTERM, SIGHUP. | ||
| 322 | |||
| 323 | Other conditions, primarily pertaining to file names, | ||
| 324 | can be checked in-between files, which makes dealing | ||
| 325 | with them easier. | ||
| 326 | --*/ | ||
| 327 | |||
| 328 | 249 | ||
| 329 | 250 | ||
| 330 | /*---------------------------------------------------*/ | 251 | /*---------------------------------------------------*/ |
| 331 | /*--- Misc (file handling) data decls ---*/ | 252 | /*--- Misc (file handling) data decls ---*/ |
| 332 | /*---------------------------------------------------*/ | 253 | /*---------------------------------------------------*/ |
| 333 | 254 | ||
| 334 | UInt32 bytesIn, bytesOut; | ||
| 335 | Int32 verbosity; | 255 | Int32 verbosity; |
| 336 | Bool keepInputFiles, smallMode, testFailsExist; | 256 | Bool keepInputFiles, smallMode; |
| 337 | UInt32 globalCrc; | 257 | Bool forceOverwrite, testFailsExist; |
| 338 | Int32 numFileNames, numFilesProcessed; | 258 | Int32 numFileNames, numFilesProcessed, blockSize100k; |
| 339 | 259 | ||
| 340 | 260 | ||
| 341 | /*-- source modes; F==file, I==stdin, O==stdout --*/ | 261 | /*-- source modes; F==file, I==stdin, O==stdout --*/ |
| @@ -351,2691 +271,304 @@ Int32 numFileNames, numFilesProcessed; | |||
| 351 | Int32 opMode; | 271 | Int32 opMode; |
| 352 | Int32 srcMode; | 272 | Int32 srcMode; |
| 353 | 273 | ||
| 274 | #define FILE_NAME_LEN 1034 | ||
| 354 | 275 | ||
| 355 | Int32 longestFileName; | 276 | Int32 longestFileName; |
| 356 | Char inName[1024]; | 277 | Char inName[FILE_NAME_LEN]; |
| 357 | Char outName[1024]; | 278 | Char outName[FILE_NAME_LEN]; |
| 358 | Char *progName; | 279 | Char *progName; |
| 359 | Char progNameReally[1024]; | 280 | Char progNameReally[FILE_NAME_LEN]; |
| 360 | FILE *outputHandleJustInCase; | 281 | FILE *outputHandleJustInCase; |
| 361 | 282 | Int32 workFactor; | |
| 362 | void panic ( Char* ) NORETURN; | 283 | |
| 363 | void ioError ( void ) NORETURN; | 284 | void panic ( Char* ) NORETURN; |
| 364 | void compressOutOfMemory ( Int32, Int32 ) NORETURN; | 285 | void ioError ( void ) NORETURN; |
| 365 | void uncompressOutOfMemory ( Int32, Int32 ) NORETURN; | 286 | void outOfMemory ( void ) NORETURN; |
| 366 | void blockOverrun ( void ) NORETURN; | 287 | void blockOverrun ( void ) NORETURN; |
| 367 | void badBlockHeader ( void ) NORETURN; | 288 | void badBlockHeader ( void ) NORETURN; |
| 368 | void badBGLengths ( void ) NORETURN; | 289 | void badBGLengths ( void ) NORETURN; |
| 369 | void crcError ( UInt32, UInt32 ) NORETURN; | 290 | void crcError ( void ) NORETURN; |
| 370 | void bitStreamEOF ( void ) NORETURN; | 291 | void bitStreamEOF ( void ) NORETURN; |
| 371 | void cleanUpAndFail ( Int32 ) NORETURN; | 292 | void cleanUpAndFail ( Int32 ) NORETURN; |
| 372 | void compressedStreamEOF ( void ) NORETURN; | 293 | void compressedStreamEOF ( void ) NORETURN; |
| 373 | 294 | ||
| 295 | void copyFileName ( Char*, Char* ); | ||
| 374 | void* myMalloc ( Int32 ); | 296 | void* myMalloc ( Int32 ); |
| 375 | 297 | ||
| 376 | 298 | ||
| 377 | 299 | ||
| 378 | /*---------------------------------------------------*/ | 300 | /*---------------------------------------------------*/ |
| 379 | /*--- Data decls for the front end ---*/ | 301 | /*--- Processing of complete files and streams ---*/ |
| 380 | /*---------------------------------------------------*/ | ||
| 381 | |||
| 382 | /*-- | ||
| 383 | The overshoot bytes allow us to avoid most of | ||
| 384 | the cost of pointer renormalisation during | ||
| 385 | comparison of rotations in sorting. | ||
| 386 | The figure of 20 is derived as follows: | ||
| 387 | qSort3 allows an overshoot of up to 10. | ||
| 388 | It then calls simpleSort, which calls | ||
| 389 | fullGtU, also with max overshoot 10. | ||
| 390 | fullGtU does up to 10 comparisons without | ||
| 391 | renormalising, giving 10+10 == 20. | ||
| 392 | --*/ | ||
| 393 | #define NUM_OVERSHOOT_BYTES 20 | ||
| 394 | |||
| 395 | /*-- | ||
| 396 | These are the main data structures for | ||
| 397 | the Burrows-Wheeler transform. | ||
| 398 | --*/ | ||
| 399 | |||
| 400 | /*-- | ||
| 401 | Pointers to compression and decompression | ||
| 402 | structures. Set by | ||
| 403 | allocateCompressStructures and | ||
| 404 | setDecompressStructureSizes | ||
| 405 | |||
| 406 | The structures are always set to be suitable | ||
| 407 | for a block of size 100000 * blockSize100k. | ||
| 408 | --*/ | ||
| 409 | UChar *block; /*-- compress --*/ | ||
| 410 | UInt16 *quadrant; /*-- compress --*/ | ||
| 411 | Int32 *zptr; /*-- compress --*/ | ||
| 412 | UInt16 *szptr; /*-- overlays zptr ---*/ | ||
| 413 | Int32 *ftab; /*-- compress --*/ | ||
| 414 | |||
| 415 | UInt16 *ll16; /*-- small decompress --*/ | ||
| 416 | UChar *ll4; /*-- small decompress --*/ | ||
| 417 | |||
| 418 | Int32 *tt; /*-- fast decompress --*/ | ||
| 419 | UChar *ll8; /*-- fast decompress --*/ | ||
| 420 | |||
| 421 | |||
| 422 | /*-- | ||
| 423 | freq table collected to save a pass over the data | ||
| 424 | during decompression. | ||
| 425 | --*/ | ||
| 426 | Int32 unzftab[256]; | ||
| 427 | |||
| 428 | |||
| 429 | /*-- | ||
| 430 | index of the last char in the block, so | ||
| 431 | the block size == last + 1. | ||
| 432 | --*/ | ||
| 433 | Int32 last; | ||
| 434 | |||
| 435 | |||
| 436 | /*-- | ||
| 437 | index in zptr[] of original string after sorting. | ||
| 438 | --*/ | ||
| 439 | Int32 origPtr; | ||
| 440 | |||
| 441 | |||
| 442 | /*-- | ||
| 443 | always: in the range 0 .. 9. | ||
| 444 | The current block size is 100000 * this number. | ||
| 445 | --*/ | ||
| 446 | Int32 blockSize100k; | ||
| 447 | |||
| 448 | |||
| 449 | /*-- | ||
| 450 | Used when sorting. If too many long comparisons | ||
| 451 | happen, we stop sorting, randomise the block | ||
| 452 | slightly, and try again. | ||
| 453 | --*/ | ||
| 454 | |||
| 455 | Int32 workFactor; | ||
| 456 | Int32 workDone; | ||
| 457 | Int32 workLimit; | ||
| 458 | Bool blockRandomised; | ||
| 459 | Bool firstAttempt; | ||
| 460 | Int32 nBlocksRandomised; | ||
| 461 | |||
| 462 | |||
| 463 | |||
| 464 | /*---------------------------------------------------*/ | ||
| 465 | /*--- Data decls for the back end ---*/ | ||
| 466 | /*---------------------------------------------------*/ | ||
| 467 | |||
| 468 | #define MAX_ALPHA_SIZE 258 | ||
| 469 | #define MAX_CODE_LEN 23 | ||
| 470 | |||
| 471 | #define RUNA 0 | ||
| 472 | #define RUNB 1 | ||
| 473 | |||
| 474 | #define N_GROUPS 6 | ||
| 475 | #define G_SIZE 50 | ||
| 476 | #define N_ITERS 4 | ||
| 477 | |||
| 478 | #define MAX_SELECTORS (2 + (900000 / G_SIZE)) | ||
| 479 | |||
| 480 | Bool inUse[256]; | ||
| 481 | Int32 nInUse; | ||
| 482 | |||
| 483 | UChar seqToUnseq[256]; | ||
| 484 | UChar unseqToSeq[256]; | ||
| 485 | |||
| 486 | UChar selector [MAX_SELECTORS]; | ||
| 487 | UChar selectorMtf[MAX_SELECTORS]; | ||
| 488 | |||
| 489 | Int32 nMTF; | ||
| 490 | |||
| 491 | Int32 mtfFreq[MAX_ALPHA_SIZE]; | ||
| 492 | |||
| 493 | UChar len [N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 494 | |||
| 495 | /*-- decompress only --*/ | ||
| 496 | Int32 limit [N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 497 | Int32 base [N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 498 | Int32 perm [N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 499 | Int32 minLens[N_GROUPS]; | ||
| 500 | |||
| 501 | /*-- compress only --*/ | ||
| 502 | Int32 code [N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 503 | Int32 rfreq[N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 504 | |||
| 505 | |||
| 506 | /*---------------------------------------------------*/ | ||
| 507 | /*--- 32-bit CRC grunge ---*/ | ||
| 508 | /*---------------------------------------------------*/ | ||
| 509 | |||
| 510 | /*-- | ||
| 511 | I think this is an implementation of the AUTODIN-II, | ||
| 512 | Ethernet & FDDI 32-bit CRC standard. Vaguely derived | ||
| 513 | from code by Rob Warnock, in Section 51 of the | ||
| 514 | comp.compression FAQ. | ||
| 515 | --*/ | ||
| 516 | |||
| 517 | UInt32 crc32Table[256] = { | ||
| 518 | |||
| 519 | /*-- Ugly, innit? --*/ | ||
| 520 | |||
| 521 | 0x00000000UL, 0x04c11db7UL, 0x09823b6eUL, 0x0d4326d9UL, | ||
| 522 | 0x130476dcUL, 0x17c56b6bUL, 0x1a864db2UL, 0x1e475005UL, | ||
| 523 | 0x2608edb8UL, 0x22c9f00fUL, 0x2f8ad6d6UL, 0x2b4bcb61UL, | ||
| 524 | 0x350c9b64UL, 0x31cd86d3UL, 0x3c8ea00aUL, 0x384fbdbdUL, | ||
| 525 | 0x4c11db70UL, 0x48d0c6c7UL, 0x4593e01eUL, 0x4152fda9UL, | ||
| 526 | 0x5f15adacUL, 0x5bd4b01bUL, 0x569796c2UL, 0x52568b75UL, | ||
| 527 | 0x6a1936c8UL, 0x6ed82b7fUL, 0x639b0da6UL, 0x675a1011UL, | ||
| 528 | 0x791d4014UL, 0x7ddc5da3UL, 0x709f7b7aUL, 0x745e66cdUL, | ||
| 529 | 0x9823b6e0UL, 0x9ce2ab57UL, 0x91a18d8eUL, 0x95609039UL, | ||
| 530 | 0x8b27c03cUL, 0x8fe6dd8bUL, 0x82a5fb52UL, 0x8664e6e5UL, | ||
| 531 | 0xbe2b5b58UL, 0xbaea46efUL, 0xb7a96036UL, 0xb3687d81UL, | ||
| 532 | 0xad2f2d84UL, 0xa9ee3033UL, 0xa4ad16eaUL, 0xa06c0b5dUL, | ||
| 533 | 0xd4326d90UL, 0xd0f37027UL, 0xddb056feUL, 0xd9714b49UL, | ||
| 534 | 0xc7361b4cUL, 0xc3f706fbUL, 0xceb42022UL, 0xca753d95UL, | ||
| 535 | 0xf23a8028UL, 0xf6fb9d9fUL, 0xfbb8bb46UL, 0xff79a6f1UL, | ||
| 536 | 0xe13ef6f4UL, 0xe5ffeb43UL, 0xe8bccd9aUL, 0xec7dd02dUL, | ||
| 537 | 0x34867077UL, 0x30476dc0UL, 0x3d044b19UL, 0x39c556aeUL, | ||
| 538 | 0x278206abUL, 0x23431b1cUL, 0x2e003dc5UL, 0x2ac12072UL, | ||
| 539 | 0x128e9dcfUL, 0x164f8078UL, 0x1b0ca6a1UL, 0x1fcdbb16UL, | ||
| 540 | 0x018aeb13UL, 0x054bf6a4UL, 0x0808d07dUL, 0x0cc9cdcaUL, | ||
| 541 | 0x7897ab07UL, 0x7c56b6b0UL, 0x71159069UL, 0x75d48ddeUL, | ||
| 542 | 0x6b93dddbUL, 0x6f52c06cUL, 0x6211e6b5UL, 0x66d0fb02UL, | ||
| 543 | 0x5e9f46bfUL, 0x5a5e5b08UL, 0x571d7dd1UL, 0x53dc6066UL, | ||
| 544 | 0x4d9b3063UL, 0x495a2dd4UL, 0x44190b0dUL, 0x40d816baUL, | ||
| 545 | 0xaca5c697UL, 0xa864db20UL, 0xa527fdf9UL, 0xa1e6e04eUL, | ||
| 546 | 0xbfa1b04bUL, 0xbb60adfcUL, 0xb6238b25UL, 0xb2e29692UL, | ||
| 547 | 0x8aad2b2fUL, 0x8e6c3698UL, 0x832f1041UL, 0x87ee0df6UL, | ||
| 548 | 0x99a95df3UL, 0x9d684044UL, 0x902b669dUL, 0x94ea7b2aUL, | ||
| 549 | 0xe0b41de7UL, 0xe4750050UL, 0xe9362689UL, 0xedf73b3eUL, | ||
| 550 | 0xf3b06b3bUL, 0xf771768cUL, 0xfa325055UL, 0xfef34de2UL, | ||
| 551 | 0xc6bcf05fUL, 0xc27dede8UL, 0xcf3ecb31UL, 0xcbffd686UL, | ||
| 552 | 0xd5b88683UL, 0xd1799b34UL, 0xdc3abdedUL, 0xd8fba05aUL, | ||
| 553 | 0x690ce0eeUL, 0x6dcdfd59UL, 0x608edb80UL, 0x644fc637UL, | ||
| 554 | 0x7a089632UL, 0x7ec98b85UL, 0x738aad5cUL, 0x774bb0ebUL, | ||
| 555 | 0x4f040d56UL, 0x4bc510e1UL, 0x46863638UL, 0x42472b8fUL, | ||
| 556 | 0x5c007b8aUL, 0x58c1663dUL, 0x558240e4UL, 0x51435d53UL, | ||
| 557 | 0x251d3b9eUL, 0x21dc2629UL, 0x2c9f00f0UL, 0x285e1d47UL, | ||
| 558 | 0x36194d42UL, 0x32d850f5UL, 0x3f9b762cUL, 0x3b5a6b9bUL, | ||
| 559 | 0x0315d626UL, 0x07d4cb91UL, 0x0a97ed48UL, 0x0e56f0ffUL, | ||
| 560 | 0x1011a0faUL, 0x14d0bd4dUL, 0x19939b94UL, 0x1d528623UL, | ||
| 561 | 0xf12f560eUL, 0xf5ee4bb9UL, 0xf8ad6d60UL, 0xfc6c70d7UL, | ||
| 562 | 0xe22b20d2UL, 0xe6ea3d65UL, 0xeba91bbcUL, 0xef68060bUL, | ||
| 563 | 0xd727bbb6UL, 0xd3e6a601UL, 0xdea580d8UL, 0xda649d6fUL, | ||
| 564 | 0xc423cd6aUL, 0xc0e2d0ddUL, 0xcda1f604UL, 0xc960ebb3UL, | ||
| 565 | 0xbd3e8d7eUL, 0xb9ff90c9UL, 0xb4bcb610UL, 0xb07daba7UL, | ||
| 566 | 0xae3afba2UL, 0xaafbe615UL, 0xa7b8c0ccUL, 0xa379dd7bUL, | ||
| 567 | 0x9b3660c6UL, 0x9ff77d71UL, 0x92b45ba8UL, 0x9675461fUL, | ||
| 568 | 0x8832161aUL, 0x8cf30badUL, 0x81b02d74UL, 0x857130c3UL, | ||
| 569 | 0x5d8a9099UL, 0x594b8d2eUL, 0x5408abf7UL, 0x50c9b640UL, | ||
| 570 | 0x4e8ee645UL, 0x4a4ffbf2UL, 0x470cdd2bUL, 0x43cdc09cUL, | ||
| 571 | 0x7b827d21UL, 0x7f436096UL, 0x7200464fUL, 0x76c15bf8UL, | ||
| 572 | 0x68860bfdUL, 0x6c47164aUL, 0x61043093UL, 0x65c52d24UL, | ||
| 573 | 0x119b4be9UL, 0x155a565eUL, 0x18197087UL, 0x1cd86d30UL, | ||
| 574 | 0x029f3d35UL, 0x065e2082UL, 0x0b1d065bUL, 0x0fdc1becUL, | ||
| 575 | 0x3793a651UL, 0x3352bbe6UL, 0x3e119d3fUL, 0x3ad08088UL, | ||
| 576 | 0x2497d08dUL, 0x2056cd3aUL, 0x2d15ebe3UL, 0x29d4f654UL, | ||
| 577 | 0xc5a92679UL, 0xc1683bceUL, 0xcc2b1d17UL, 0xc8ea00a0UL, | ||
| 578 | 0xd6ad50a5UL, 0xd26c4d12UL, 0xdf2f6bcbUL, 0xdbee767cUL, | ||
| 579 | 0xe3a1cbc1UL, 0xe760d676UL, 0xea23f0afUL, 0xeee2ed18UL, | ||
| 580 | 0xf0a5bd1dUL, 0xf464a0aaUL, 0xf9278673UL, 0xfde69bc4UL, | ||
| 581 | 0x89b8fd09UL, 0x8d79e0beUL, 0x803ac667UL, 0x84fbdbd0UL, | ||
| 582 | 0x9abc8bd5UL, 0x9e7d9662UL, 0x933eb0bbUL, 0x97ffad0cUL, | ||
| 583 | 0xafb010b1UL, 0xab710d06UL, 0xa6322bdfUL, 0xa2f33668UL, | ||
| 584 | 0xbcb4666dUL, 0xb8757bdaUL, 0xb5365d03UL, 0xb1f740b4UL | ||
| 585 | }; | ||
| 586 | |||
| 587 | |||
| 588 | /*---------------------------------------------*/ | ||
| 589 | void initialiseCRC ( void ) | ||
| 590 | { | ||
| 591 | globalCrc = 0xffffffffUL; | ||
| 592 | } | ||
| 593 | |||
| 594 | |||
| 595 | /*---------------------------------------------*/ | ||
| 596 | UInt32 getFinalCRC ( void ) | ||
| 597 | { | ||
| 598 | return ~globalCrc; | ||
| 599 | } | ||
| 600 | |||
| 601 | |||
| 602 | /*---------------------------------------------*/ | ||
| 603 | UInt32 getGlobalCRC ( void ) | ||
| 604 | { | ||
| 605 | return globalCrc; | ||
| 606 | } | ||
| 607 | |||
| 608 | |||
| 609 | /*---------------------------------------------*/ | ||
| 610 | void setGlobalCRC ( UInt32 newCrc ) | ||
| 611 | { | ||
| 612 | globalCrc = newCrc; | ||
| 613 | } | ||
| 614 | |||
| 615 | |||
| 616 | /*---------------------------------------------*/ | ||
| 617 | #define UPDATE_CRC(crcVar,cha) \ | ||
| 618 | { \ | ||
| 619 | crcVar = (crcVar << 8) ^ \ | ||
| 620 | crc32Table[(crcVar >> 24) ^ \ | ||
| 621 | ((UChar)cha)]; \ | ||
| 622 | } | ||
| 623 | |||
| 624 | |||
| 625 | /*---------------------------------------------------*/ | ||
| 626 | /*--- Bit stream I/O ---*/ | ||
| 627 | /*---------------------------------------------------*/ | 302 | /*---------------------------------------------------*/ |
| 628 | 303 | ||
| 629 | |||
| 630 | UInt32 bsBuff; | ||
| 631 | Int32 bsLive; | ||
| 632 | FILE* bsStream; | ||
| 633 | Bool bsWriting; | ||
| 634 | |||
| 635 | |||
| 636 | /*---------------------------------------------*/ | ||
| 637 | void bsSetStream ( FILE* f, Bool wr ) | ||
| 638 | { | ||
| 639 | if (bsStream != NULL) panic ( "bsSetStream" ); | ||
| 640 | bsStream = f; | ||
| 641 | bsLive = 0; | ||
| 642 | bsBuff = 0; | ||
| 643 | bytesOut = 0; | ||
| 644 | bytesIn = 0; | ||
| 645 | bsWriting = wr; | ||
| 646 | } | ||
| 647 | |||
| 648 | |||
| 649 | /*---------------------------------------------*/ | ||
| 650 | void bsFinishedWithStream ( void ) | ||
| 651 | { | ||
| 652 | if (bsWriting) | ||
| 653 | while (bsLive > 0) { | ||
| 654 | fputc ( (UChar)(bsBuff >> 24), bsStream ); | ||
| 655 | bsBuff <<= 8; | ||
| 656 | bsLive -= 8; | ||
| 657 | bytesOut++; | ||
| 658 | } | ||
| 659 | bsStream = NULL; | ||
| 660 | } | ||
| 661 | |||
| 662 | |||
| 663 | /*---------------------------------------------*/ | ||
| 664 | #define bsNEEDR(nz) \ | ||
| 665 | { \ | ||
| 666 | while (bsLive < nz) { \ | ||
| 667 | Int32 zzi = fgetc ( bsStream ); \ | ||
| 668 | if (zzi == EOF) compressedStreamEOF(); \ | ||
| 669 | bsBuff = (bsBuff << 8) | (zzi & 0xffL); \ | ||
| 670 | bsLive += 8; \ | ||
| 671 | } \ | ||
| 672 | } | ||
| 673 | |||
| 674 | |||
| 675 | /*---------------------------------------------*/ | ||
| 676 | #define bsNEEDW(nz) \ | ||
| 677 | { \ | ||
| 678 | while (bsLive >= 8) { \ | ||
| 679 | fputc ( (UChar)(bsBuff >> 24), \ | ||
| 680 | bsStream ); \ | ||
| 681 | bsBuff <<= 8; \ | ||
| 682 | bsLive -= 8; \ | ||
| 683 | bytesOut++; \ | ||
| 684 | } \ | ||
| 685 | } | ||
| 686 | |||
| 687 | |||
| 688 | /*---------------------------------------------*/ | ||
| 689 | #define bsR1(vz) \ | ||
| 690 | { \ | ||
| 691 | bsNEEDR(1); \ | ||
| 692 | vz = (bsBuff >> (bsLive-1)) & 1; \ | ||
| 693 | bsLive--; \ | ||
| 694 | } | ||
| 695 | |||
| 696 | |||
| 697 | /*---------------------------------------------*/ | ||
| 698 | INLINE UInt32 bsR ( Int32 n ) | ||
| 699 | { | ||
| 700 | UInt32 v; | ||
| 701 | bsNEEDR ( n ); | ||
| 702 | v = (bsBuff >> (bsLive-n)) & ((1 << n)-1); | ||
| 703 | bsLive -= n; | ||
| 704 | return v; | ||
| 705 | } | ||
| 706 | |||
| 707 | |||
| 708 | /*---------------------------------------------*/ | ||
| 709 | INLINE void bsW ( Int32 n, UInt32 v ) | ||
| 710 | { | ||
| 711 | bsNEEDW ( n ); | ||
| 712 | bsBuff |= (v << (32 - bsLive - n)); | ||
| 713 | bsLive += n; | ||
| 714 | } | ||
| 715 | |||
| 716 | |||
| 717 | /*---------------------------------------------*/ | ||
| 718 | UChar bsGetUChar ( void ) | ||
| 719 | { | ||
| 720 | return (UChar)bsR(8); | ||
| 721 | } | ||
| 722 | |||
| 723 | |||
| 724 | /*---------------------------------------------*/ | ||
| 725 | void bsPutUChar ( UChar c ) | ||
| 726 | { | ||
| 727 | bsW(8, (UInt32)c ); | ||
| 728 | } | ||
| 729 | |||
| 730 | |||
| 731 | /*---------------------------------------------*/ | 304 | /*---------------------------------------------*/ |
| 732 | Int32 bsGetUInt32 ( void ) | 305 | Bool myfeof ( FILE* f ) |
| 733 | { | 306 | { |
| 734 | UInt32 u; | 307 | Int32 c = fgetc ( f ); |
| 735 | u = 0; | 308 | if (c == EOF) return True; |
| 736 | u = (u << 8) | bsR(8); | 309 | ungetc ( c, f ); |
| 737 | u = (u << 8) | bsR(8); | 310 | return False; |
| 738 | u = (u << 8) | bsR(8); | ||
| 739 | u = (u << 8) | bsR(8); | ||
| 740 | return u; | ||
| 741 | } | ||
| 742 | |||
| 743 | |||
| 744 | /*---------------------------------------------*/ | ||
| 745 | UInt32 bsGetIntVS ( UInt32 numBits ) | ||
| 746 | { | ||
| 747 | return (UInt32)bsR(numBits); | ||
| 748 | } | ||
| 749 | |||
| 750 | |||
| 751 | /*---------------------------------------------*/ | ||
| 752 | UInt32 bsGetInt32 ( void ) | ||
| 753 | { | ||
| 754 | return (Int32)bsGetUInt32(); | ||
| 755 | } | ||
| 756 | |||
| 757 | |||
| 758 | /*---------------------------------------------*/ | ||
| 759 | void bsPutUInt32 ( UInt32 u ) | ||
| 760 | { | ||
| 761 | bsW ( 8, (u >> 24) & 0xffL ); | ||
| 762 | bsW ( 8, (u >> 16) & 0xffL ); | ||
| 763 | bsW ( 8, (u >> 8) & 0xffL ); | ||
| 764 | bsW ( 8, u & 0xffL ); | ||
| 765 | } | ||
| 766 | |||
| 767 | |||
| 768 | /*---------------------------------------------*/ | ||
| 769 | void bsPutInt32 ( Int32 c ) | ||
| 770 | { | ||
| 771 | bsPutUInt32 ( (UInt32)c ); | ||
| 772 | } | 311 | } |
| 773 | 312 | ||
| 774 | 313 | ||
| 775 | /*---------------------------------------------*/ | 314 | /*---------------------------------------------*/ |
| 776 | void bsPutIntVS ( Int32 numBits, UInt32 c ) | 315 | void compressStream ( FILE *stream, FILE *zStream ) |
| 777 | { | 316 | { |
| 778 | bsW ( numBits, c ); | 317 | BZFILE* bzf = NULL; |
| 779 | } | 318 | UChar ibuf[5000]; |
| 780 | 319 | Int32 nIbuf; | |
| 781 | 320 | UInt32 nbytes_in, nbytes_out; | |
| 782 | /*---------------------------------------------------*/ | 321 | Int32 bzerr, bzerr_dummy, ret; |
| 783 | /*--- Huffman coding low-level stuff ---*/ | ||
| 784 | /*---------------------------------------------------*/ | ||
| 785 | |||
| 786 | #define WEIGHTOF(zz0) ((zz0) & 0xffffff00) | ||
| 787 | #define DEPTHOF(zz1) ((zz1) & 0x000000ff) | ||
| 788 | #define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3)) | ||
| 789 | |||
| 790 | #define ADDWEIGHTS(zw1,zw2) \ | ||
| 791 | (WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \ | ||
| 792 | (1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2))) | ||
| 793 | |||
| 794 | #define UPHEAP(z) \ | ||
| 795 | { \ | ||
| 796 | Int32 zz, tmp; \ | ||
| 797 | zz = z; tmp = heap[zz]; \ | ||
| 798 | while (weight[tmp] < weight[heap[zz >> 1]]) { \ | ||
| 799 | heap[zz] = heap[zz >> 1]; \ | ||
| 800 | zz >>= 1; \ | ||
| 801 | } \ | ||
| 802 | heap[zz] = tmp; \ | ||
| 803 | } | ||
| 804 | |||
| 805 | #define DOWNHEAP(z) \ | ||
| 806 | { \ | ||
| 807 | Int32 zz, yy, tmp; \ | ||
| 808 | zz = z; tmp = heap[zz]; \ | ||
| 809 | while (True) { \ | ||
| 810 | yy = zz << 1; \ | ||
| 811 | if (yy > nHeap) break; \ | ||
| 812 | if (yy < nHeap && \ | ||
| 813 | weight[heap[yy+1]] < weight[heap[yy]]) \ | ||
| 814 | yy++; \ | ||
| 815 | if (weight[tmp] < weight[heap[yy]]) break; \ | ||
| 816 | heap[zz] = heap[yy]; \ | ||
| 817 | zz = yy; \ | ||
| 818 | } \ | ||
| 819 | heap[zz] = tmp; \ | ||
| 820 | } | ||
| 821 | 322 | ||
| 323 | SET_BINARY_MODE(stream); | ||
| 324 | SET_BINARY_MODE(zStream); | ||
| 822 | 325 | ||
| 823 | /*---------------------------------------------*/ | 326 | if (ferror(stream)) goto errhandler_io; |
| 824 | void hbMakeCodeLengths ( UChar *len, | 327 | if (ferror(zStream)) goto errhandler_io; |
| 825 | Int32 *freq, | ||
| 826 | Int32 alphaSize, | ||
| 827 | Int32 maxLen ) | ||
| 828 | { | ||
| 829 | /*-- | ||
| 830 | Nodes and heap entries run from 1. Entry 0 | ||
| 831 | for both the heap and nodes is a sentinel. | ||
| 832 | --*/ | ||
| 833 | Int32 nNodes, nHeap, n1, n2, i, j, k; | ||
| 834 | Bool tooLong; | ||
| 835 | 328 | ||
| 836 | Int32 heap [ MAX_ALPHA_SIZE + 2 ]; | 329 | bzf = bzWriteOpen ( &bzerr, zStream, |
| 837 | Int32 weight [ MAX_ALPHA_SIZE * 2 ]; | 330 | blockSize100k, verbosity, workFactor ); |
| 838 | Int32 parent [ MAX_ALPHA_SIZE * 2 ]; | 331 | if (bzerr != BZ_OK) goto errhandler; |
| 839 | 332 | ||
| 840 | for (i = 0; i < alphaSize; i++) | 333 | if (verbosity >= 2) fprintf ( stderr, "\n" ); |
| 841 | weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8; | ||
| 842 | 334 | ||
| 843 | while (True) { | 335 | while (True) { |
| 844 | 336 | ||
| 845 | nNodes = alphaSize; | 337 | if (myfeof(stream)) break; |
| 846 | nHeap = 0; | 338 | nIbuf = fread ( ibuf, sizeof(UChar), 5000, stream ); |
| 847 | 339 | if (ferror(stream)) goto errhandler_io; | |
| 848 | heap[0] = 0; | 340 | if (nIbuf > 0) bzWrite ( &bzerr, bzf, (void*)ibuf, nIbuf ); |
| 849 | weight[0] = 0; | 341 | if (bzerr != BZ_OK) goto errhandler; |
| 850 | parent[0] = -2; | ||
| 851 | |||
| 852 | for (i = 1; i <= alphaSize; i++) { | ||
| 853 | parent[i] = -1; | ||
| 854 | nHeap++; | ||
| 855 | heap[nHeap] = i; | ||
| 856 | UPHEAP(nHeap); | ||
| 857 | } | ||
| 858 | if (!(nHeap < (MAX_ALPHA_SIZE+2))) | ||
| 859 | panic ( "hbMakeCodeLengths(1)" ); | ||
| 860 | |||
| 861 | while (nHeap > 1) { | ||
| 862 | n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1); | ||
| 863 | n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1); | ||
| 864 | nNodes++; | ||
| 865 | parent[n1] = parent[n2] = nNodes; | ||
| 866 | weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]); | ||
| 867 | parent[nNodes] = -1; | ||
| 868 | nHeap++; | ||
| 869 | heap[nHeap] = nNodes; | ||
| 870 | UPHEAP(nHeap); | ||
| 871 | } | ||
| 872 | if (!(nNodes < (MAX_ALPHA_SIZE * 2))) | ||
| 873 | panic ( "hbMakeCodeLengths(2)" ); | ||
| 874 | |||
| 875 | tooLong = False; | ||
| 876 | for (i = 1; i <= alphaSize; i++) { | ||
| 877 | j = 0; | ||
| 878 | k = i; | ||
| 879 | while (parent[k] >= 0) { k = parent[k]; j++; } | ||
| 880 | len[i-1] = j; | ||
| 881 | if (j > maxLen) tooLong = True; | ||
| 882 | } | ||
| 883 | |||
| 884 | if (! tooLong) break; | ||
| 885 | 342 | ||
| 886 | for (i = 1; i < alphaSize; i++) { | ||
| 887 | j = weight[i] >> 8; | ||
| 888 | j = 1 + (j / 2); | ||
| 889 | weight[i] = j << 8; | ||
| 890 | } | ||
| 891 | } | 343 | } |
| 892 | } | ||
| 893 | |||
| 894 | 344 | ||
| 895 | /*---------------------------------------------*/ | 345 | bzWriteClose ( &bzerr, bzf, 0, &nbytes_in, &nbytes_out ); |
| 896 | void hbAssignCodes ( Int32 *code, | 346 | if (bzerr != BZ_OK) goto errhandler; |
| 897 | UChar *length, | ||
| 898 | Int32 minLen, | ||
| 899 | Int32 maxLen, | ||
| 900 | Int32 alphaSize ) | ||
| 901 | { | ||
| 902 | Int32 n, vec, i; | ||
| 903 | 347 | ||
| 904 | vec = 0; | 348 | if (ferror(zStream)) goto errhandler_io; |
| 905 | for (n = minLen; n <= maxLen; n++) { | 349 | ret = fflush ( zStream ); |
| 906 | for (i = 0; i < alphaSize; i++) | 350 | if (ret == EOF) goto errhandler_io; |
| 907 | if (length[i] == n) { code[i] = vec; vec++; }; | 351 | if (zStream != stdout) { |
| 908 | vec <<= 1; | 352 | ret = fclose ( zStream ); |
| | | 353 | if (ret == EOF) goto errhandler_io; |
| 909 | } | 354 | } |
| 910 | } | 355 | if (ferror(stream)) goto errhandler_io; |
| 911 | | 356 | ret = fclose ( stream ); |
| 912 | | 357 | if (ret == EOF) goto errhandler_io; |
| 913 | /*---------------------------------------------*/ | ||
| 914 | void hbCreateDecodeTables ( Int32 *limit, | ||
| 915 | Int32 *base, | ||
| 916 | Int32 *perm, | ||
| 917 | UChar *length, | ||
| 918 | Int32 minLen, | ||
| 919 | Int32 maxLen, | ||
| 920 | Int32 alphaSize ) | ||
| 921 | { | ||
| 922 | Int32 pp, i, j, vec; | ||
| 923 | |||
| 924 | pp = 0; | ||
| 925 | for (i = minLen; i <= maxLen; i++) | ||
| 926 | for (j = 0; j < alphaSize; j++) | ||
| 927 | if (length[j] == i) { perm[pp] = j; pp++; }; | ||
| 928 | |||
| 929 | for (i = 0; i < MAX_CODE_LEN; i++) base[i] = 0; | ||
| 930 | for (i = 0; i < alphaSize; i++) base[length[i]+1]++; | ||
| 931 | | 358 | |
| 932 | for (i = 1; i < MAX_CODE_LEN; i++) base[i] += base[i-1]; | 359 | if (nbytes_in == 0) nbytes_in = 1; |
| 933 | | 360 | |
| 934 | for (i = 0; i < MAX_CODE_LEN; i++) limit[i] = 0; | 361 | if (verbosity >= 1) |
| 935 | vec = 0; | 362 | fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, " |
| 936 | | 363 | "%5.2f%% saved, %d in, %d out.\n", |
| 937 | for (i = minLen; i <= maxLen; i++) { | 364 | (float)nbytes_in / (float)nbytes_out, |
| 938 | vec += (base[i+1] - base[i]); | 365 | (8.0 * (float)nbytes_out) / (float)nbytes_in, |
| 939 | limit[i] = vec-1; | 366 | 100.0 * (1.0 - (float)nbytes_out / (float)nbytes_in), |
| 940 | vec <<= 1; | 367 | nbytes_in, |
| 941 | } | 368 | nbytes_out |
| 942 | for (i = minLen + 1; i <= maxLen; i++) | 369 | ); |
| 943 | base[i] = ((limit[i-1] + 1) << 1) - base[i]; | ||
| 944 | } | ||
| 945 | |||
| 946 | |||
| 947 | |||
| 948 | /*---------------------------------------------------*/ | ||
| 949 | /*--- Undoing the reversible transformation ---*/ | ||
| 950 | /*---------------------------------------------------*/ | ||
| 951 | |||
| 952 | /*---------------------------------------------*/ | ||
| 953 | #define SET_LL4(i,n) \ | ||
| 954 | { if (((i) & 0x1) == 0) \ | ||
| 955 | ll4[(i) >> 1] = (ll4[(i) >> 1] & 0xf0) | (n); else \ | ||
| 956 | ll4[(i) >> 1] = (ll4[(i) >> 1] & 0x0f) | ((n) << 4); \ | ||
| 957 | } | ||
| 958 | |||
| 959 | #define GET_LL4(i) \ | ||
| 960 | (((UInt32)(ll4[(i) >> 1])) >> (((i) << 2) & 0x4) & 0xF) | ||
| 961 | |||
| 962 | #define SET_LL(i,n) \ | ||
| 963 | { ll16[i] = (UInt16)(n & 0x0000ffff); \ | ||
| 964 | SET_LL4(i, n >> 16); \ | ||
| 965 | } | ||
| 966 | |||
| 967 | #define GET_LL(i) \ | ||
| 968 | (((UInt32)ll16[i]) | (GET_LL4(i) << 16)) | ||
| 969 | |||
| 970 | |||
| 971 | /*---------------------------------------------*/ | ||
| 972 | /*-- | ||
| 973 | Manage memory for compression/decompression. | ||
| 974 | When compressing, a single block size applies to | ||
| 975 | all files processed, and that's set when the | ||
| 976 | program starts. But when decompressing, each file | ||
| 977 | processed could have been compressed with a | ||
| 978 | different block size, so we may have to free | ||
| 979 | and reallocate on a per-file basis. | ||
| 980 | |||
| 981 | A call with argument of zero means | ||
| 982 | `free up everything.' And a value of zero for | ||
| 983 | blockSize100k means no memory is currently allocated. | ||
| 984 | --*/ | ||
| 985 | | 370 | |
| | | 371 | return; |
| 986 | | 372 | |
| 987 | /*---------------------------------------------*/ | 373 | errhandler: |
| 988 | void allocateCompressStructures ( void ) | 374 | bzWriteClose ( &bzerr_dummy, bzf, 1, &nbytes_in, &nbytes_out ); |
| 989 | { | 375 | switch (bzerr) { |
| 990 | Int32 n = 100000 * blockSize100k; | 376 | case BZ_MEM_ERROR: |
| 991 | block = malloc ( (n + 1 + NUM_OVERSHOOT_BYTES) * sizeof(UChar) ); | 377 | outOfMemory (); |
| 992 | quadrant = malloc ( (n + NUM_OVERSHOOT_BYTES) * sizeof(Int16) ); | 378 | case BZ_IO_ERROR: |
| 993 | zptr = malloc ( n * sizeof(Int32) ); | 379 | errhandler_io: |
| 994 | ftab = malloc ( 65537 * sizeof(Int32) ); | 380 | ioError(); break; |
| 995 | | 381 | default: |
| 996 | if (block == NULL || quadrant == NULL || | 382 | panic ( "compress:unexpected error" ); |
| 997 | zptr == NULL || ftab == NULL) { | ||
| 998 | Int32 totalDraw | ||
| 999 | = (n + 1 + NUM_OVERSHOOT_BYTES) * sizeof(UChar) + | ||
| 1000 | (n + NUM_OVERSHOOT_BYTES) * sizeof(Int16) + | ||
| 1001 | n * sizeof(Int32) + | ||
| 1002 | 65537 * sizeof(Int32); | ||
| 1003 | |||
| 1004 | compressOutOfMemory ( totalDraw, n ); | ||
| 1005 | } | 383 | } |
| 1006 | | 384 | |
| 1007 | /*-- | 385 | panic ( "compress:end" ); |
| 1008 | Since we want valid indexes for block of | 386 | /*notreached*/ |
| 1009 | -1 to n + NUM_OVERSHOOT_BYTES - 1 | ||
| 1010 | inclusive. | ||
| 1011 | --*/ | ||
| 1012 | block++; | ||
| 1013 | |||
| 1014 | /*-- | ||
| 1015 | The back end needs a place to store the MTF values | ||
| 1016 | whilst it calculates the coding tables. We could | ||
| 1017 | put them in the zptr array. However, these values | ||
| 1018 | will fit in a short, so we overlay szptr at the | ||
| 1019 | start of zptr, in the hope of reducing the number | ||
| 1020 | of cache misses induced by the multiple traversals | ||
| 1021 | of the MTF values when calculating coding tables. | ||
| 1022 | Seems to improve compression speed by about 1%. | ||
| 1023 | --*/ | ||
| 1024 | szptr = (UInt16*)zptr; | ||
| 1025 | } | ||
| 1026 | |||
| 1027 | |||
| 1028 | /*---------------------------------------------*/ | ||
| 1029 | void setDecompressStructureSizes ( Int32 newSize100k ) | ||
| 1030 | { | ||
| 1031 | if (! (0 <= newSize100k && newSize100k <= 9 && | ||
| 1032 | 0 <= blockSize100k && blockSize100k <= 9)) | ||
| 1033 | panic ( "setDecompressStructureSizes" ); | ||
| 1034 | |||
| 1035 | if (newSize100k == blockSize100k) return; | ||
| 1036 | |||
| 1037 | blockSize100k = newSize100k; | ||
| 1038 | |||
| 1039 | if (ll16 != NULL) free ( ll16 ); | ||
| 1040 | if (ll4 != NULL) free ( ll4 ); | ||
| 1041 | if (ll8 != NULL) free ( ll8 ); | ||
| 1042 | if (tt != NULL) free ( tt ); | ||
| 1043 | |||
| 1044 | if (newSize100k == 0) return; | ||
| 1045 | |||
| 1046 | if (smallMode) { | ||
| 1047 | |||
| 1048 | Int32 n = 100000 * newSize100k; | ||
| 1049 | ll16 = malloc ( n * sizeof(UInt16) ); | ||
| 1050 | ll4 = malloc ( ((n+1) >> 1) * sizeof(UChar) ); | ||
| 1051 | |||
| 1052 | if (ll4 == NULL || ll16 == NULL) { | ||
| 1053 | Int32 totalDraw | ||
| 1054 | = n * sizeof(Int16) + ((n+1) >> 1) * sizeof(UChar); | ||
| 1055 | uncompressOutOfMemory ( totalDraw, n ); | ||
| 1056 | } | ||
| 1057 | |||
| 1058 | } else { | ||
| 1059 | |||
| 1060 | Int32 n = 100000 * newSize100k; | ||
| 1061 | ll8 = malloc ( n * sizeof(UChar) ); | ||
| 1062 | tt = malloc ( n * sizeof(Int32) ); | ||
| 1063 | |||
| 1064 | if (ll8 == NULL || tt == NULL) { | ||
| 1065 | Int32 totalDraw | ||
| 1066 | = n * sizeof(UChar) + n * sizeof(UInt32); | ||
| 1067 | uncompressOutOfMemory ( totalDraw, n ); | ||
| 1068 | } | ||
| 1069 | |||
| 1070 | } | ||
| 1071 | } | 387 | } |
| 1072 | | 388 | |
| 1073 | | 389 | |
| 1074 | | 390 | |
| 1075 | /*---------------------------------------------------*/ | ||
| 1076 | /*--- The new back end ---*/ | ||
| 1077 | /*---------------------------------------------------*/ | ||
| 1078 | |||
| 1079 | /*---------------------------------------------*/ | ||
| 1080 | void makeMaps ( void ) | ||
| 1081 | { | ||
| 1082 | Int32 i; | ||
| 1083 | nInUse = 0; | ||
| 1084 | for (i = 0; i < 256; i++) | ||
| 1085 | if (inUse[i]) { | ||
| 1086 | seqToUnseq[nInUse] = i; | ||
| 1087 | unseqToSeq[i] = nInUse; | ||
| 1088 | nInUse++; | ||
| 1089 | } | ||
| 1090 | } | ||
| 1091 | |||
| 1092 | |||
| 1093 | /*---------------------------------------------*/ | 391 | /*---------------------------------------------*/ |
| 1094 | void generateMTFValues ( void ) | 392 | Bool uncompressStream ( FILE *zStream, FILE *stream ) |
| 1095 | { | ||
| 1096 | UChar yy[256]; | ||
| 1097 | Int32 i, j; | ||
| 1098 | UChar tmp; | ||
| 1099 | UChar tmp2; | ||
| 1100 | Int32 zPend; | ||
| 1101 | Int32 wr; | ||
| 1102 | Int32 EOB; | ||
| 1103 | |||
| 1104 | makeMaps(); | ||
| 1105 | EOB = nInUse+1; | ||
| 1106 | |||
| 1107 | for (i = 0; i <= EOB; i++) mtfFreq[i] = 0; | ||
| 1108 | |||
| 1109 | wr = 0; | ||
| 1110 | zPend = 0; | ||
| 1111 | for (i = 0; i < nInUse; i++) yy[i] = (UChar) i; | ||
| 1112 | |||
| 1113 | |||
| 1114 | for (i = 0; i <= last; i++) { | ||
| 1115 | UChar ll_i; | ||
| 1116 | |||
| 1117 | #if DEBUG | ||
| 1118 | assert (wr <= i); | ||
| 1119 | #endif | ||
| 1120 | |||
| 1121 | ll_i = unseqToSeq[block[zptr[i] - 1]]; | ||
| 1122 | #if DEBUG | ||
| 1123 | assert (ll_i < nInUse); | ||
| 1124 | #endif | ||
| 1125 | |||
| 1126 | j = 0; | ||
| 1127 | tmp = yy[j]; | ||
| 1128 | while ( ll_i != tmp ) { | ||
| 1129 | j++; | ||
| 1130 | tmp2 = tmp; | ||
| 1131 | tmp = yy[j]; | ||
| 1132 | yy[j] = tmp2; | ||
| 1133 | }; | ||
| 1134 | yy[0] = tmp; | ||
| 1135 | |||
| 1136 | if (j == 0) { | ||
| 1137 | zPend++; | ||
| 1138 | } else { | ||
| 1139 | if (zPend > 0) { | ||
| 1140 | zPend--; | ||
| 1141 | while (True) { | ||
| 1142 | switch (zPend % 2) { | ||
| 1143 | case 0: szptr[wr] = RUNA; wr++; mtfFreq[RUNA]++; break; | ||
| 1144 | case 1: szptr[wr] = RUNB; wr++; mtfFreq[RUNB]++; break; | ||
| 1145 | }; | ||
| 1146 | if (zPend < 2) break; | ||
| 1147 | zPend = (zPend - 2) / 2; | ||
| 1148 | }; | ||
| 1149 | zPend = 0; | ||
| 1150 | } | ||
| 1151 | szptr[wr] = j+1; wr++; mtfFreq[j+1]++; | ||
| 1152 | } | ||
| 1153 | } | ||
| 1154 | |||
| 1155 | if (zPend > 0) { | ||
| 1156 | zPend--; | ||
| 1157 | while (True) { | ||
| 1158 | switch (zPend % 2) { | ||
| 1159 | case 0: szptr[wr] = RUNA; wr++; mtfFreq[RUNA]++; break; | ||
| 1160 | case 1: szptr[wr] = RUNB; wr++; mtfFreq[RUNB]++; break; | ||
| 1161 | }; | ||
| 1162 | if (zPend < 2) break; | ||
| 1163 | zPend = (zPend - 2) / 2; | ||
| 1164 | }; | ||
| 1165 | } | ||
| 1166 | |||
| 1167 | szptr[wr] = EOB; wr++; mtfFreq[EOB]++; | ||
| 1168 | |||
| 1169 | nMTF = wr; | ||
| 1170 | } | ||
| 1171 | |||
| 1172 | |||
| 1173 | /*---------------------------------------------*/ | ||
| 1174 | #define LESSER_ICOST 0 | ||
| 1175 | #define GREATER_ICOST 15 | ||
| 1176 | |||
| 1177 | void sendMTFValues ( void ) | ||
| 1178 | { | 393 | { |
| 1179 | Int32 v, t, i, j, gs, ge, totc, bt, bc, iter; | 394 | BZFILE* bzf = NULL; |
| 1180 | Int32 nSelectors, alphaSize, minLen, maxLen, selCtr; | 395 | Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i; |
| 1181 | Int32 nGroups, nBytes; | 396 | UChar obuf[5000]; |
| 1182 | | 397 | UChar unused[BZ_MAX_UNUSED]; |
| 1183 | /*-- | 398 | Int32 nUnused; |
| 1184 | UChar len [N_GROUPS][MAX_ALPHA_SIZE]; | 399 | UChar* unusedTmp; |
| 1185 | is a global since the decoder also needs it. | ||
| 1186 | |||
| 1187 | Int32 code[N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 1188 | Int32 rfreq[N_GROUPS][MAX_ALPHA_SIZE]; | ||
| 1189 | are also globals only used in this proc. | ||
| 1190 | Made global to keep stack frame size small. | ||
| 1191 | --*/ | ||
| 1192 | |||
| 1193 | |||
| 1194 | UInt16 cost[N_GROUPS]; | ||
| 1195 | Int32 fave[N_GROUPS]; | ||
| 1196 | |||
| 1197 | if (verbosity >= 3) | ||
| 1198 | fprintf ( stderr, | ||
| 1199 | " %d in block, %d after MTF & 1-2 coding, %d+2 syms in use\n", | ||
| 1200 | last+1, nMTF, nInUse ); | ||
| 1201 | |||
| 1202 | alphaSize = nInUse+2; | ||
| 1203 | for (t = 0; t < N_GROUPS; t++) | ||
| 1204 | for (v = 0; v < alphaSize; v++) | ||
| 1205 | len[t][v] = GREATER_ICOST; | ||
| 1206 | |||
| 1207 | /*--- Decide how many coding tables to use ---*/ | ||
| 1208 | if (nMTF <= 0) panic ( "sendMTFValues(0)" ); | ||
| 1209 | if (nMTF < 200) nGroups = 2; else | ||
| 1210 | if (nMTF < 800) nGroups = 4; else | ||
| 1211 | nGroups = 6; | ||
| 1212 | |||
| 1213 | /*--- Generate an initial set of coding tables ---*/ | ||
| 1214 | { | ||
| 1215 | Int32 nPart, remF, tFreq, aFreq; | ||
| 1216 | |||
| 1217 | nPart = nGroups; | ||
| 1218 | remF = nMTF; | ||
| 1219 | gs = 0; | ||
| 1220 | while (nPart > 0) { | ||
| 1221 | tFreq = remF / nPart; | ||
| 1222 | ge = gs-1; | ||
| 1223 | aFreq = 0; | ||
| 1224 | while (aFreq < tFreq && ge < alphaSize-1) { | ||
| 1225 | ge++; | ||
| 1226 | aFreq += mtfFreq[ge]; | ||
| 1227 | } | ||
| 1228 | |||
| 1229 | if (ge > gs | ||
| 1230 | && nPart != nGroups && nPart != 1 | ||
| 1231 | && ((nGroups-nPart) % 2 == 1)) { | ||
| 1232 | aFreq -= mtfFreq[ge]; | ||
| 1233 | ge--; | ||
| 1234 | } | ||
| 1235 | | 400 | |
| 1236 | if (verbosity >= 3) | 401 | nUnused = 0; |
| 1237 | fprintf ( stderr, | 402 | streamNo = 0; |
| 1238 | " initial group %d, [%d .. %d], has %d syms (%4.1f%%)\n", | ||
| 1239 | nPart, gs, ge, aFreq, | ||
| 1240 | (100.0 * (float)aFreq) / (float)nMTF ); | ||
| 1241 | |||
| 1242 | for (v = 0; v < alphaSize; v++) | ||
| 1243 | if (v >= gs && v <= ge) | ||
| 1244 | len[nPart-1][v] = LESSER_ICOST; else | ||
| 1245 | len[nPart-1][v] = GREATER_ICOST; | ||
| 1246 | |||
| 1247 | nPart--; | ||
| 1248 | gs = ge+1; | ||
| 1249 | remF -= aFreq; | ||
| 1250 | } | ||
| 1251 | } | ||
| 1252 | |||
| 1253 | /*--- | ||
| 1254 | Iterate up to N_ITERS times to improve the tables. | ||
| 1255 | ---*/ | ||
| 1256 | for (iter = 0; iter < N_ITERS; iter++) { | ||
| 1257 | |||
| 1258 | for (t = 0; t < nGroups; t++) fave[t] = 0; | ||
| 1259 | |||
| 1260 | for (t = 0; t < nGroups; t++) | ||
| 1261 | for (v = 0; v < alphaSize; v++) | ||
| 1262 | rfreq[t][v] = 0; | ||
| 1263 | |||
| 1264 | nSelectors = 0; | ||
| 1265 | totc = 0; | ||
| 1266 | gs = 0; | ||
| 1267 | while (True) { | ||
| 1268 | |||
| 1269 | /*--- Set group start & end marks. --*/ | ||
| 1270 | if (gs >= nMTF) break; | ||
| 1271 | ge = gs + G_SIZE - 1; | ||
| 1272 | if (ge >= nMTF) ge = nMTF-1; | ||
| 1273 | |||
| 1274 | /*-- | ||
| 1275 | Calculate the cost of this group as coded | ||
| 1276 | by each of the coding tables. | ||
| 1277 | --*/ | ||
| 1278 | for (t = 0; t < nGroups; t++) cost[t] = 0; | ||
| 1279 | |||
| 1280 | if (nGroups == 6) { | ||
| 1281 | register UInt16 cost0, cost1, cost2, cost3, cost4, cost5; | ||
| 1282 | cost0 = cost1 = cost2 = cost3 = cost4 = cost5 = 0; | ||
| 1283 | for (i = gs; i <= ge; i++) { | ||
| 1284 | UInt16 icv = szptr[i]; | ||
| 1285 | cost0 += len[0][icv]; | ||
| 1286 | cost1 += len[1][icv]; | ||
| 1287 | cost2 += len[2][icv]; | ||
| 1288 | cost3 += len[3][icv]; | ||
| 1289 | cost4 += len[4][icv]; | ||
| 1290 | cost5 += len[5][icv]; | ||
| 1291 | } | ||
| 1292 | cost[0] = cost0; cost[1] = cost1; cost[2] = cost2; | ||
| 1293 | cost[3] = cost3; cost[4] = cost4; cost[5] = cost5; | ||
| 1294 | } else { | ||
| 1295 | for (i = gs; i <= ge; i++) { | ||
| 1296 | UInt16 icv = szptr[i]; | ||
| 1297 | for (t = 0; t < nGroups; t++) cost[t] += len[t][icv]; | ||
| 1298 | } | ||
| 1299 | } | ||
| 1300 | |||
| 1301 | /*-- | ||
| 1302 | Find the coding table which is best for this group, | ||
| 1303 | and record its identity in the selector table. | ||
| 1304 | --*/ | ||
| 1305 | bc = 999999999; bt = -1; | ||
| 1306 | for (t = 0; t < nGroups; t++) | ||
| 1307 | if (cost[t] < bc) { bc = cost[t]; bt = t; }; | ||
| 1308 | totc += bc; | ||
| 1309 | fave[bt]++; | ||
| 1310 | selector[nSelectors] = bt; | ||
| 1311 | nSelectors++; | ||
| 1312 | |||
| 1313 | /*-- | ||
| 1314 | Increment the symbol frequencies for the selected table. | ||
| 1315 | --*/ | ||
| 1316 | for (i = gs; i <= ge; i++) | ||
| 1317 | rfreq[bt][ szptr[i] ]++; | ||
| 1318 | |||
| 1319 | gs = ge+1; | ||
| 1320 | } | ||
| 1321 | if (verbosity >= 3) { | ||
| 1322 | fprintf ( stderr, | ||
| 1323 | " pass %d: size is %d, grp uses are ", | ||
| 1324 | iter+1, totc/8 ); | ||
| 1325 | for (t = 0; t < nGroups; t++) | ||
| 1326 | fprintf ( stderr, "%d ", fave[t] ); | ||
| 1327 | fprintf ( stderr, "\n" ); | ||
| 1328 | } | ||
| 1329 | |||
| 1330 | /*-- | ||
| 1331 | Recompute the tables based on the accumulated frequencies. | ||
| 1332 | --*/ | ||
| 1333 | for (t = 0; t < nGroups; t++) | ||
| 1334 | hbMakeCodeLengths ( &len[t][0], &rfreq[t][0], alphaSize, 20 ); | ||
| 1335 | } | ||
| 1336 | | 403 | |
| | | 404 | SET_BINARY_MODE(stream); |
| | | 405 | SET_BINARY_MODE(zStream); |
| 1337 | | 406 | |
| 1338 | if (!(nGroups < 8)) panic ( "sendMTFValues(1)" ); | 407 | if (ferror(stream)) goto errhandler_io; |
| 1339 | if (!(nSelectors < 32768 && | 408 | if (ferror(zStream)) goto errhandler_io; |
| 1340 | nSelectors <= (2 + (900000 / G_SIZE)))) | ||
| 1341 | panic ( "sendMTFValues(2)" ); | ||
| 1342 | |||
| 1343 | |||
| 1344 | /*--- Compute MTF values for the selectors. ---*/ | ||
| 1345 | { | ||
| 1346 | UChar pos[N_GROUPS], ll_i, tmp2, tmp; | ||
| 1347 | for (i = 0; i < nGroups; i++) pos[i] = i; | ||
| 1348 | for (i = 0; i < nSelectors; i++) { | ||
| 1349 | ll_i = selector[i]; | ||
| 1350 | j = 0; | ||
| 1351 | tmp = pos[j]; | ||
| 1352 | while ( ll_i != tmp ) { | ||
| 1353 | j++; | ||
| 1354 | tmp2 = tmp; | ||
| 1355 | tmp = pos[j]; | ||
| 1356 | pos[j] = tmp2; | ||
| 1357 | }; | ||
| 1358 | pos[0] = tmp; | ||
| 1359 | selectorMtf[i] = j; | ||
| 1360 | } | ||
| 1361 | }; | ||
| 1362 | |||
| 1363 | /*--- Assign actual codes for the tables. --*/ | ||
| 1364 | for (t = 0; t < nGroups; t++) { | ||
| 1365 | minLen = 32; | ||
| 1366 | maxLen = 0; | ||
| 1367 | for (i = 0; i < alphaSize; i++) { | ||
| 1368 | if (len[t][i] > maxLen) maxLen = len[t][i]; | ||
| 1369 | if (len[t][i] < minLen) minLen = len[t][i]; | ||
| 1370 | } | ||
| 1371 | if (maxLen > 20) panic ( "sendMTFValues(3)" ); | ||
| 1372 | if (minLen < 1) panic ( "sendMTFValues(4)" ); | ||
| 1373 | hbAssignCodes ( &code[t][0], &len[t][0], | ||
| 1374 | minLen, maxLen, alphaSize ); | ||
| 1375 | } | ||
| 1376 | |||
| 1377 | /*--- Transmit the mapping table. ---*/ | ||
| 1378 | { | ||
| 1379 | Bool inUse16[16]; | ||
| 1380 | for (i = 0; i < 16; i++) { | ||
| 1381 | inUse16[i] = False; | ||
| 1382 | for (j = 0; j < 16; j++) | ||
| 1383 | if (inUse[i * 16 + j]) inUse16[i] = True; | ||
| 1384 | } | ||
| 1385 | |||
| 1386 | nBytes = bytesOut; | ||
| 1387 | for (i = 0; i < 16; i++) | ||
| 1388 | if (inUse16[i]) bsW(1,1); else bsW(1,0); | ||
| 1389 | |||
| 1390 | for (i = 0; i < 16; i++) | ||
| 1391 | if (inUse16[i]) | ||
| 1392 | for (j = 0; j < 16; j++) | ||
| 1393 | if (inUse[i * 16 + j]) bsW(1,1); else bsW(1,0); | ||
| 1394 | |||
| 1395 | if (verbosity >= 3) | ||
| 1396 | fprintf ( stderr, " bytes: mapping %d, ", bytesOut-nBytes ); | ||
| 1397 | } | ||
| 1398 | |||
| 1399 | /*--- Now the selectors. ---*/ | ||
| 1400 | nBytes = bytesOut; | ||
| 1401 | bsW ( 3, nGroups ); | ||
| 1402 | bsW ( 15, nSelectors ); | ||
| 1403 | for (i = 0; i < nSelectors; i++) { | ||
| 1404 | for (j = 0; j < selectorMtf[i]; j++) bsW(1,1); | ||
| 1405 | bsW(1,0); | ||
| 1406 | } | ||
| 1407 | if (verbosity >= 3) | ||
| 1408 | fprintf ( stderr, "selectors %d, ", bytesOut-nBytes ); | ||
| 1409 | |||
| 1410 | /*--- Now the coding tables. ---*/ | ||
| 1411 | nBytes = bytesOut; | ||
| 1412 | |||
| 1413 | for (t = 0; t < nGroups; t++) { | ||
| 1414 | Int32 curr = len[t][0]; | ||
| 1415 | bsW ( 5, curr ); | ||
| 1416 | for (i = 0; i < alphaSize; i++) { | ||
| 1417 | while (curr < len[t][i]) { bsW(2,2); curr++; /* 10 */ }; | ||
| 1418 | while (curr > len[t][i]) { bsW(2,3); curr--; /* 11 */ }; | ||
| 1419 | bsW ( 1, 0 ); | ||
| 1420 | } | ||
| 1421 | } | ||
| 1422 | |||
| 1423 | if (verbosity >= 3) | ||
| 1424 | fprintf ( stderr, "code lengths %d, ", bytesOut-nBytes ); | ||
| 1425 | | 409 | |
| 1426 | /*--- And finally, the block data proper ---*/ | ||
| 1427 | nBytes = bytesOut; | ||
| 1428 | selCtr = 0; | ||
| 1429 | gs = 0; | ||
| 1430 | while (True) { | 410 | while (True) { |
| 1431 | if (gs >= nMTF) break; | ||
| 1432 | ge = gs + G_SIZE - 1; | ||
| 1433 | if (ge >= nMTF) ge = nMTF-1; | ||
| 1434 | for (i = gs; i <= ge; i++) { | ||
| 1435 | #if DEBUG | ||
| 1436 | assert (selector[selCtr] < nGroups); | ||
| 1437 | #endif | ||
| 1438 | bsW ( len [selector[selCtr]] [szptr[i]], | ||
| 1439 | code [selector[selCtr]] [szptr[i]] ); | ||
| 1440 | } | ||
| 1441 | | 411 | |
| 1442 | gs = ge+1; | 412 | bzf = bzReadOpen ( |
| 1443 | selCtr++; | 413 | &bzerr, zStream, verbosity, |
| 1444 | } | 414 | (int)smallMode, unused, nUnused |
| 1445 | if (!(selCtr == nSelectors)) panic ( "sendMTFValues(5)" ); | 415 | ); |
| 1446 | | 416 | if (bzf == NULL || bzerr != BZ_OK) goto errhandler; |
| 1447 | if (verbosity >= 3) | 417 | streamNo++; |
| 1448 | fprintf ( stderr, "codes %d\n", bytesOut-nBytes ); | 418 | |
| 1449 | } | 419 | while (bzerr == BZ_OK) { |
| 1450 | | 420 | nread = bzRead ( &bzerr, bzf, obuf, 5000 ); |
| 1451 | | 421 | if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler; |
| 1452 | /*---------------------------------------------*/ | 422 | if ((bzerr == BZ_OK || bzerr == BZ_STREAM_END) && nread > 0) |
| 1453 | void moveToFrontCodeAndSend ( void ) | 423 | fwrite ( obuf, sizeof(UChar), nread, stream ); |
| 1454 | { | 424 | if (ferror(stream)) goto errhandler_io; |
| 1455 | bsPutIntVS ( 24, origPtr ); | ||
| 1456 | generateMTFValues(); | ||
| 1457 | sendMTFValues(); | ||
| 1458 | } | ||
| 1459 | |||
| 1460 | |||
| 1461 | /*---------------------------------------------*/ | ||
| 1462 | void recvDecodingTables ( void ) | ||
| 1463 | { | ||
| 1464 | Int32 i, j, t, nGroups, nSelectors, alphaSize; | ||
| 1465 | Int32 minLen, maxLen; | ||
| 1466 | Bool inUse16[16]; | ||
| 1467 | |||
| 1468 | /*--- Receive the mapping table ---*/ | ||
| 1469 | for (i = 0; i < 16; i++) | ||
| 1470 | if (bsR(1) == 1) | ||
| 1471 | inUse16[i] = True; else | ||
| 1472 | inUse16[i] = False; | ||
| 1473 | |||
| 1474 | for (i = 0; i < 256; i++) inUse[i] = False; | ||
| 1475 | |||
| 1476 | for (i = 0; i < 16; i++) | ||
| 1477 | if (inUse16[i]) | ||
| 1478 | for (j = 0; j < 16; j++) | ||
| 1479 | if (bsR(1) == 1) inUse[i * 16 + j] = True; | ||
| 1480 | |||
| 1481 | makeMaps(); | ||
| 1482 | alphaSize = nInUse+2; | ||
| 1483 | |||
| 1484 | /*--- Now the selectors ---*/ | ||
| 1485 | nGroups = bsR ( 3 ); | ||
| 1486 | nSelectors = bsR ( 15 ); | ||
| 1487 | for (i = 0; i < nSelectors; i++) { | ||
| 1488 | j = 0; | ||
| 1489 | while (bsR(1) == 1) j++; | ||
| 1490 | selectorMtf[i] = j; | ||
| 1491 | } | ||
| 1492 | |||
| 1493 | /*--- Undo the MTF values for the selectors. ---*/ | ||
| 1494 | { | ||
| 1495 | UChar pos[N_GROUPS], tmp, v; | ||
| 1496 | for (v = 0; v < nGroups; v++) pos[v] = v; | ||
| 1497 | |||
| 1498 | for (i = 0; i < nSelectors; i++) { | ||
| 1499 | v = selectorMtf[i]; | ||
| 1500 | tmp = pos[v]; | ||
| 1501 | while (v > 0) { pos[v] = pos[v-1]; v--; } | ||
| 1502 | pos[0] = tmp; | ||
| 1503 | selector[i] = tmp; | ||
| 1504 | } | 425 | } |
| 1505 | } | 426 | if (bzerr != BZ_STREAM_END) goto errhandler; |
| 1506 | |||
| 1507 | /*--- Now the coding tables ---*/ | ||
| 1508 | for (t = 0; t < nGroups; t++) { | ||
| 1509 | Int32 curr = bsR ( 5 ); | ||
| 1510 | for (i = 0; i < alphaSize; i++) { | ||
| 1511 | while (bsR(1) == 1) { | ||
| 1512 | if (bsR(1) == 0) curr++; else curr--; | ||
| 1513 | } | ||
| 1514 | len[t][i] = curr; | ||
| 1515 | } | ||
| 1516 | } | ||
| 1517 | |||
| 1518 | /*--- Create the Huffman decoding tables ---*/ | ||
| 1519 | for (t = 0; t < nGroups; t++) { | ||
| 1520 | minLen = 32; | ||
| 1521 | maxLen = 0; | ||
| 1522 | for (i = 0; i < alphaSize; i++) { | ||
| 1523 | if (len[t][i] > maxLen) maxLen = len[t][i]; | ||
| 1524 | if (len[t][i] < minLen) minLen = len[t][i]; | ||
| 1525 | } | ||
| 1526 | hbCreateDecodeTables ( | ||
| 1527 | &limit[t][0], &base[t][0], &perm[t][0], &len[t][0], | ||
| 1528 | minLen, maxLen, alphaSize | ||
| 1529 | ); | ||
| 1530 | minLens[t] = minLen; | ||
| 1531 | } | ||
| 1532 | } | ||
| 1533 | |||
| 1534 | |||
| 1535 | /*---------------------------------------------*/ | ||
| 1536 | #define GET_MTF_VAL(lval) \ | ||
| 1537 | { \ | ||
| 1538 | Int32 zt, zn, zvec, zj; \ | ||
| 1539 | if (groupPos == 0) { \ | ||
| 1540 | groupNo++; \ | ||
| 1541 | groupPos = G_SIZE; \ | ||
| 1542 | } \ | ||
| 1543 | groupPos--; \ | ||
| 1544 | zt = selector[groupNo]; \ | ||
| 1545 | zn = minLens[zt]; \ | ||
| 1546 | zvec = bsR ( zn ); \ | ||
| 1547 | while (zvec > limit[zt][zn]) { \ | ||
| 1548 | zn++; bsR1(zj); \ | ||
| 1549 | zvec = (zvec << 1) | zj; \ | ||
| 1550 | }; \ | ||
| 1551 | lval = perm[zt][zvec - base[zt][zn]]; \ | ||
| 1552 | } | ||
| 1553 | |||
| 1554 | |||
| 1555 | /*---------------------------------------------*/ | ||
| 1556 | void getAndMoveToFrontDecode ( void ) | ||
| 1557 | { | ||
| 1558 | UChar yy[256]; | ||
| 1559 | Int32 i, j, nextSym, limitLast; | ||
| 1560 | Int32 EOB, groupNo, groupPos; | ||
| 1561 | |||
| 1562 | limitLast = 100000 * blockSize100k; | ||
| 1563 | origPtr = bsGetIntVS ( 24 ); | ||
| 1564 | |||
| 1565 | recvDecodingTables(); | ||
| 1566 | EOB = nInUse+1; | ||
| 1567 | groupNo = -1; | ||
| 1568 | groupPos = 0; | ||
| 1569 | |||
| 1570 | /*-- | ||
| 1571 | Setting up the unzftab entries here is not strictly | ||
| 1572 | necessary, but it does save having to do it later | ||
| 1573 | in a separate pass, and so saves a block's worth of | ||
| 1574 | cache misses. | ||
| 1575 | --*/ | ||
| 1576 | for (i = 0; i <= 255; i++) unzftab[i] = 0; | ||
| 1577 | |||
| 1578 | for (i = 0; i <= 255; i++) yy[i] = (UChar) i; | ||
| 1579 | |||
| 1580 | last = -1; | ||
| 1581 | | 427 | |
| 1582 | GET_MTF_VAL(nextSym); | 428 | bzReadGetUnused ( &bzerr, bzf, (void**)(&unusedTmp), &nUnused ); |
| | | 429 | if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" ); |
| 1583 | | 430 | |
| 1584 | while (True) { | 431 | for (i = 0; i < nUnused; i++) unused[i] = unusedTmp[i]; |
| 1585 | |||
| 1586 | if (nextSym == EOB) break; | ||
| 1587 | |||
| 1588 | if (nextSym == RUNA || nextSym == RUNB) { | ||
| 1589 | UChar ch; | ||
| 1590 | Int32 s = -1; | ||
| 1591 | Int32 N = 1; | ||
| 1592 | do { | ||
| 1593 | if (nextSym == RUNA) s = s + (0+1) * N; else | ||
| 1594 | if (nextSym == RUNB) s = s + (1+1) * N; | ||
| 1595 | N = N * 2; | ||
| 1596 | GET_MTF_VAL(nextSym); | ||
| 1597 | } | ||
| 1598 | while (nextSym == RUNA || nextSym == RUNB); | ||
| 1599 | | 432 | |
| 1600 | s++; | 433 | bzReadClose ( &bzerr, bzf ); |
| 1601 | ch = seqToUnseq[yy[0]]; | 434 | if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" ); |
| 1602 | unzftab[ch] += s; | ||
| 1603 | | 435 | |
| 1604 | if (smallMode) | 436 | if (nUnused == 0 && myfeof(zStream)) break; |
| 1605 | while (s > 0) { | ||
| 1606 | last++; | ||
| 1607 | ll16[last] = ch; | ||
| 1608 | s--; | ||
| 1609 | } | ||
| 1610 | else | ||
| 1611 | while (s > 0) { | ||
| 1612 | last++; | ||
| 1613 | ll8[last] = ch; | ||
| 1614 | s--; | ||
| 1615 | }; | ||
| 1616 | |||
| 1617 | if (last >= limitLast) blockOverrun(); | ||
| 1618 | continue; | ||
| 1619 | |||
| 1620 | } else { | ||
| 1621 | |||
| 1622 | UChar tmp; | ||
| 1623 | last++; if (last >= limitLast) blockOverrun(); | ||
| 1624 | |||
| 1625 | tmp = yy[nextSym-1]; | ||
| 1626 | unzftab[seqToUnseq[tmp]]++; | ||
| 1627 | if (smallMode) | ||
| 1628 | ll16[last] = seqToUnseq[tmp]; else | ||
| 1629 | ll8[last] = seqToUnseq[tmp]; | ||
| 1630 | |||
| 1631 | /*-- | ||
| 1632 | This loop is hammered during decompression, | ||
| 1633 | hence the unrolling. | ||
| 1634 | |||
| 1635 | for (j = nextSym-1; j > 0; j--) yy[j] = yy[j-1]; | ||
| 1636 | --*/ | ||
| 1637 | |||
| 1638 | j = nextSym-1; | ||
| 1639 | for (; j > 3; j -= 4) { | ||
| 1640 | yy[j] = yy[j-1]; | ||
| 1641 | yy[j-1] = yy[j-2]; | ||
| 1642 | yy[j-2] = yy[j-3]; | ||
| 1643 | yy[j-3] = yy[j-4]; | ||
| 1644 | } | ||
| 1645 | for (; j > 0; j--) yy[j] = yy[j-1]; | ||
| 1646 | | 437 | |
| 1647 | yy[0] = tmp; | ||
| 1648 | GET_MTF_VAL(nextSym); | ||
| 1649 | continue; | ||
| 1650 | } | ||
| 1651 | } | 438 | } |
| 1652 | } | ||
| 1653 | |||
| 1654 | |||
| 1655 | /*---------------------------------------------------*/ | ||
| 1656 | /*--- Block-sorting machinery ---*/ | ||
| 1657 | /*---------------------------------------------------*/ | ||
| 1658 | | 439 | |
| 1659 | /*---------------------------------------------*/ | 440 | if (ferror(zStream)) goto errhandler_io; |
| 1660 | /*-- | 441 | ret = fclose ( zStream ); |
| 1661 | Compare two strings in block. We assume (see | 442 | if (ret == EOF) goto errhandler_io; |
| 1662 | discussion above) that i1 and i2 have a max | ||
| 1663 | offset of 10 on entry, and that the first | ||
| 1664 | bytes of both block and quadrant have been | ||
| 1665 | copied into the "overshoot area", ie | ||
| 1666 | into the subscript range | ||
| 1667 | [last+1 .. last+NUM_OVERSHOOT_BYTES]. | ||
| 1668 | --*/ | ||
| 1669 | INLINE Bool fullGtU ( Int32 i1, Int32 i2 ) | ||
| 1670 | { | ||
| 1671 | Int32 k; | ||
| 1672 | UChar c1, c2; | ||
| 1673 | UInt16 s1, s2; | ||
| 1674 | |||
| 1675 | #if DEBUG | ||
| 1676 | /*-- | ||
| 1677 | shellsort shouldn't ask to compare | ||
| 1678 | something with itself. | ||
| 1679 | --*/ | ||
| 1680 | assert (i1 != i2); | ||
| 1681 | #endif | ||
| 1682 | |||
| 1683 | c1 = block[i1]; | ||
| 1684 | c2 = block[i2]; | ||
| 1685 | if (c1 != c2) return (c1 > c2); | ||
| 1686 | i1++; i2++; | ||
| 1687 | |||
| 1688 | c1 = block[i1]; | ||
| 1689 | c2 = block[i2]; | ||
| 1690 | if (c1 != c2) return (c1 > c2); | ||
| 1691 | i1++; i2++; | ||
| 1692 | |||
| 1693 | c1 = block[i1]; | ||
| 1694 | c2 = block[i2]; | ||
| 1695 | if (c1 != c2) return (c1 > c2); | ||
| 1696 | i1++; i2++; | ||
| 1697 | |||
| 1698 | c1 = block[i1]; | ||
| 1699 | c2 = block[i2]; | ||
| 1700 | if (c1 != c2) return (c1 > c2); | ||
| 1701 | i1++; i2++; | ||
| 1702 | |||
| 1703 | c1 = block[i1]; | ||
| 1704 | c2 = block[i2]; | ||
| 1705 | if (c1 != c2) return (c1 > c2); | ||
| 1706 | i1++; i2++; | ||
| 1707 | |||
| 1708 | c1 = block[i1]; | ||
| 1709 | c2 = block[i2]; | ||
| 1710 | if (c1 != c2) return (c1 > c2); | ||
| 1711 | i1++; i2++; | ||
| 1712 | |||
| 1713 | k = last + 1; | ||
| 1714 | |||
| 1715 | do { | ||
| 1716 | |||
| 1717 | c1 = block[i1]; | ||
| 1718 | c2 = block[i2]; | ||
| 1719 | if (c1 != c2) return (c1 > c2); | ||
| 1720 | s1 = quadrant[i1]; | ||
| 1721 | s2 = quadrant[i2]; | ||
| 1722 | if (s1 != s2) return (s1 > s2); | ||
| 1723 | i1++; i2++; | ||
| 1724 | |||
| 1725 | c1 = block[i1]; | ||
| 1726 | c2 = block[i2]; | ||
| 1727 | if (c1 != c2) return (c1 > c2); | ||
| 1728 | s1 = quadrant[i1]; | ||
| 1729 | s2 = quadrant[i2]; | ||
| 1730 | if (s1 != s2) return (s1 > s2); | ||
| 1731 | i1++; i2++; | ||
| 1732 | |||
| 1733 | c1 = block[i1]; | ||
| 1734 | c2 = block[i2]; | ||
| 1735 | if (c1 != c2) return (c1 > c2); | ||
| 1736 | s1 = quadrant[i1]; | ||
| 1737 | s2 = quadrant[i2]; | ||
| 1738 | if (s1 != s2) return (s1 > s2); | ||
| 1739 | i1++; i2++; | ||
| 1740 | |||
| 1741 | c1 = block[i1]; | ||
| 1742 | c2 = block[i2]; | ||
| 1743 | if (c1 != c2) return (c1 > c2); | ||
| 1744 | s1 = quadrant[i1]; | ||
| 1745 | s2 = quadrant[i2]; | ||
| 1746 | if (s1 != s2) return (s1 > s2); | ||
| 1747 | i1++; i2++; | ||
| 1748 | |||
| 1749 | if (i1 > last) { i1 -= last; i1--; }; | ||
| 1750 | if (i2 > last) { i2 -= last; i2--; }; | ||
| 1751 | |||
| 1752 | k -= 4; | ||
| 1753 | workDone++; | ||
| 1754 | } | ||
| 1755 | while (k >= 0); | ||
| 1756 | | 443 | |
| 1757 | return False; | 444 | if (ferror(stream)) goto errhandler_io; |
| 1758 | } | 445 | ret = fflush ( stream ); |
| 1759 | | 446 | if (ret != 0) goto errhandler_io; |
| 1760 | /*---------------------------------------------*/ | 447 | if (stream != stdout) { |
| 1761 | /*-- | 448 | ret = fclose ( stream ); |
| 1762 | Knuth's increments seem to work better | 449 | if (ret == EOF) goto errhandler_io; |
| 1763 | than Incerpi-Sedgewick here. Possibly | ||
| 1764 | because the number of elems to sort is | ||
| 1765 | usually small, typically <= 20. | ||
| 1766 | --*/ | ||
| 1767 | Int32 incs[14] = { 1, 4, 13, 40, 121, 364, 1093, 3280, | ||
| 1768 | 9841, 29524, 88573, 265720, | ||
| 1769 | 797161, 2391484 }; | ||
| 1770 | |||
| 1771 | void simpleSort ( Int32 lo, Int32 hi, Int32 d ) | ||
| 1772 | { | ||
| 1773 | Int32 i, j, h, bigN, hp; | ||
| 1774 | Int32 v; | ||
| 1775 | |||
| 1776 | bigN = hi - lo + 1; | ||
| 1777 | if (bigN < 2) return; | ||
| 1778 | |||
| 1779 | hp = 0; | ||
| 1780 | while (incs[hp] < bigN) hp++; | ||
| 1781 | hp--; | ||
| 1782 | |||
| 1783 | for (; hp >= 0; hp--) { | ||
| 1784 | h = incs[hp]; | ||
| 1785 | if (verbosity >= 5) | ||
| 1786 | fprintf ( stderr, " shell increment %d\n", h ); | ||
| 1787 | |||
| 1788 | i = lo + h; | ||
| 1789 | while (True) { | ||
| 1790 | |||
| 1791 | /*-- copy 1 --*/ | ||
| 1792 | if (i > hi) break; | ||
| 1793 | v = zptr[i]; | ||
| 1794 | j = i; | ||
| 1795 | while ( fullGtU ( zptr[j-h]+d, v+d ) ) { | ||
| 1796 | zptr[j] = zptr[j-h]; | ||
| 1797 | j = j - h; | ||
| 1798 | if (j <= (lo + h - 1)) break; | ||
| 1799 | } | ||
| 1800 | zptr[j] = v; | ||
| 1801 | i++; | ||
| 1802 | |||
| 1803 | /*-- copy 2 --*/ | ||
| 1804 | if (i > hi) break; | ||
| 1805 | v = zptr[i]; | ||
| 1806 | j = i; | ||
| 1807 | while ( fullGtU ( zptr[j-h]+d, v+d ) ) { | ||
| 1808 | zptr[j] = zptr[j-h]; | ||
| 1809 | j = j - h; | ||
| 1810 | if (j <= (lo + h - 1)) break; | ||
| 1811 | } | ||
| 1812 | zptr[j] = v; | ||
| 1813 | i++; | ||
| 1814 | |||
| 1815 | /*-- copy 3 --*/ | ||
| 1816 | if (i > hi) break; | ||
| 1817 | v = zptr[i]; | ||
| 1818 | j = i; | ||
| 1819 | while ( fullGtU ( zptr[j-h]+d, v+d ) ) { | ||
| 1820 | zptr[j] = zptr[j-h]; | ||
| 1821 | j = j - h; | ||
| 1822 | if (j <= (lo + h - 1)) break; | ||
| 1823 | } | ||
| 1824 | zptr[j] = v; | ||
| 1825 | i++; | ||
| 1826 | |||
| 1827 | if (workDone > workLimit && firstAttempt) return; | ||
| 1828 | } | ||
| 1829 | } | ||
| 1830 | } | ||
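
The removed `simpleSort` above is a Shell sort over suffix pointers, driven by Knuth's `3h+1` increments and unrolled three times. As a reading aid, here is a minimal standalone sketch of the same increment scheme on a plain `int` array. The name `shell_sort_ints` and the table bound check are assumptions made for this illustration; the original compares suffixes with `fullGtU` and has no such bound check.

```c
#include <assert.h>
#include <stddef.h>

/* Knuth's increments (h -> 3h+1), the same values as the incs[] table. */
static const int knuth_incs[] = { 1, 4, 13, 40, 121, 364, 1093, 3280,
                                  9841, 29524, 88573, 265720,
                                  797161, 2391484 };
#define N_INCS ((int)(sizeof knuth_incs / sizeof knuth_incs[0]))

/* Hypothetical helper: sort a[lo..hi] (inclusive) into ascending order. */
void shell_sort_ints(int *a, int lo, int hi)
{
    int n = hi - lo + 1;
    int hp, h, i, j, v;

    if (n < 2) return;
    assert(a != NULL);

    /* Find the largest increment strictly smaller than n. */
    hp = 0;
    while (hp < N_INCS && knuth_incs[hp] < n) hp++;
    hp--;

    /* One insertion-sort pass per increment, largest first. */
    for (; hp >= 0; hp--) {
        h = knuth_incs[hp];
        for (i = lo + h; i <= hi; i++) {
            v = a[i];
            j = i;
            while (j >= lo + h && a[j - h] > v) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
    }
}
```

Calling `shell_sort_ints(a, 0, n - 1)` sorts a whole array of `n` elements; passing a narrower `lo`/`hi` leaves the elements outside that range untouched, which mirrors how `simpleSort` is handed sub-ranges of `zptr`.
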
| 1831 | |||
| 1832 | |||
| 1833 | /*---------------------------------------------*/ | ||
| 1834 | /*-- | ||
| 1835 | The following is an implementation of | ||
| 1836 | an elegant 3-way quicksort for strings, | ||
| 1837 | described in a paper "Fast Algorithms for | ||
| 1838 | Sorting and Searching Strings", by Robert | ||
| 1839 | Sedgewick and Jon L. Bentley. | ||
| 1840 | --*/ | ||
| 1841 | |||
| 1842 | #define swap(lv1, lv2) \ | ||
| 1843 | { Int32 tmp = lv1; lv1 = lv2; lv2 = tmp; } | ||
| 1844 | |||
| 1845 | INLINE void vswap ( Int32 p1, Int32 p2, Int32 n ) | ||
| 1846 | { | ||
| 1847 | while (n > 0) { | ||
| 1848 | swap(zptr[p1], zptr[p2]); | ||
| 1849 | p1++; p2++; n--; | ||
| 1850 | } | ||
| 1851 | } | ||
| 1852 | |||
| 1853 | INLINE UChar med3 ( UChar a, UChar b, UChar c ) | ||
| 1854 | { | ||
| 1855 | UChar t; | ||
| 1856 | if (a > b) { t = a; a = b; b = t; }; | ||
| 1857 | if (b > c) { t = b; b = c; c = t; }; | ||
| 1858 | if (a > b) b = a; | ||
| 1859 | return b; | ||
| 1860 | } | ||
| 1861 | |||
| 1862 | |||
| 1863 | #define min(a,b) ((a) < (b)) ? (a) : (b) | ||
| 1864 | |||
| 1865 | typedef | ||
| 1866 | struct { Int32 ll; Int32 hh; Int32 dd; } | ||
| 1867 | StackElem; | ||
| 1868 | |||
| 1869 | #define push(lz,hz,dz) { stack[sp].ll = lz; \ | ||
| 1870 | stack[sp].hh = hz; \ | ||
| 1871 | stack[sp].dd = dz; \ | ||
| 1872 | sp++; } | ||
| 1873 | |||
| 1874 | #define pop(lz,hz,dz) { sp--; \ | ||
| 1875 | lz = stack[sp].ll; \ | ||
| 1876 | hz = stack[sp].hh; \ | ||
| 1877 | dz = stack[sp].dd; } | ||
| 1878 | |||
| 1879 | #define SMALL_THRESH 20 | ||
| 1880 | #define DEPTH_THRESH 10 | ||
| 1881 | |||
| 1882 | /*-- | ||
| 1883 | If you are ever unlucky/improbable enough | ||
| 1884 | to get a stack overflow whilst sorting, | ||
| 1885 | increase the following constant and try | ||
| 1886 | again. In practice I have never seen the | ||
| 1887 | stack go above 27 elems, so the following | ||
| 1888 | limit seems very generous. | ||
| 1889 | --*/ | ||
| 1890 | #define QSORT_STACK_SIZE 1000 | ||
| 1891 | |||
| 1892 | |||
| 1893 | void qSort3 ( Int32 loSt, Int32 hiSt, Int32 dSt ) | ||
| 1894 | { | ||
| 1895 | Int32 unLo, unHi, ltLo, gtHi, med, n, m; | ||
| 1896 | Int32 sp, lo, hi, d; | ||
| 1897 | StackElem stack[QSORT_STACK_SIZE]; | ||
| 1898 | |||
| 1899 | sp = 0; | ||
| 1900 | push ( loSt, hiSt, dSt ); | ||
| 1901 | |||
| 1902 | while (sp > 0) { | ||
| 1903 | |||
| 1904 | if (sp >= QSORT_STACK_SIZE) panic ( "stack overflow in qSort3" ); | ||
| 1905 | |||
| 1906 | pop ( lo, hi, d ); | ||
| 1907 | |||
| 1908 | if (hi - lo < SMALL_THRESH || d > DEPTH_THRESH) { | ||
| 1909 | simpleSort ( lo, hi, d ); | ||
| 1910 | if (workDone > workLimit && firstAttempt) return; | ||
| 1911 | continue; | ||
| 1912 | } | ||
| 1913 | |||
| 1914 | med = med3 ( block[zptr[ lo ]+d], | ||
| 1915 | block[zptr[ hi ]+d], | ||
| 1916 | block[zptr[ (lo+hi)>>1 ]+d] ); | ||
| 1917 | |||
| 1918 | unLo = ltLo = lo; | ||
| 1919 | unHi = gtHi = hi; | ||
| 1920 | |||
| 1921 | while (True) { | ||
| 1922 | while (True) { | ||
| 1923 | if (unLo > unHi) break; | ||
| 1924 | n = ((Int32)block[zptr[unLo]+d]) - med; | ||
| 1925 | if (n == 0) { swap(zptr[unLo], zptr[ltLo]); ltLo++; unLo++; continue; }; | ||
| 1926 | if (n > 0) break; | ||
| 1927 | unLo++; | ||
| 1928 | } | ||
| 1929 | while (True) { | ||
| 1930 | if (unLo > unHi) break; | ||
| 1931 | n = ((Int32)block[zptr[unHi]+d]) - med; | ||
| 1932 | if (n == 0) { swap(zptr[unHi], zptr[gtHi]); gtHi--; unHi--; continue; }; | ||
| 1933 | if (n < 0) break; | ||
| 1934 | unHi--; | ||
| 1935 | } | ||
| 1936 | if (unLo > unHi) break; | ||
| 1937 | swap(zptr[unLo], zptr[unHi]); unLo++; unHi--; | ||
| 1938 | } | ||
| 1939 | #if DEBUG | ||
| 1940 | assert (unHi == unLo-1); | ||
| 1941 | #endif | ||
| 1942 | |||
| 1943 | if (gtHi < ltLo) { | ||
| 1944 | push(lo, hi, d+1 ); | ||
| 1945 | continue; | ||
| 1946 | } | ||
| 1947 | |||
| 1948 | n = min(ltLo-lo, unLo-ltLo); vswap(lo, unLo-n, n); | ||
| 1949 | m = min(hi-gtHi, gtHi-unHi); vswap(unLo, hi-m+1, m); | ||
| 1950 | |||
| 1951 | n = lo + unLo - ltLo - 1; | ||
| 1952 | m = hi - (gtHi - unHi) + 1; | ||
| 1953 | |||
| 1954 | push ( lo, n, d ); | ||
| 1955 | push ( n+1, m-1, d+1 ); | ||
| 1956 | push ( m, hi, d ); | ||
| 1957 | } | ||
| 1958 | } | ||
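
The removed `qSort3` above is an explicit-stack version of the Bentley-Sedgewick 3-way string quicksort, working on suffix pointers into `block`. The sketch below shows the same idea in its simplest recursive form on NUL-terminated strings. It is an illustration under assumptions, not the removed code: the names are hypothetical, the pivot is the middle element rather than a median of three, and the partition is a plain Dijkstra 3-way split instead of the swap-to-the-ends scheme with `vswap`.

```c
#include <assert.h>
#include <stddef.h>

static void swap_str(const char **s, int i, int j)
{
    const char *t = s[i]; s[i] = s[j]; s[j] = t;
}

/* Byte of s at depth d as an unsigned value; the NUL terminator (0)
   therefore sorts before every other byte. */
static int byte_at(const char *s, int d)
{
    return (unsigned char)s[d];
}

/* Hypothetical helper: sort s[lo..hi] (inclusive), given that all of
   these strings already agree on their first d bytes. */
void multikey_qsort(const char **s, int lo, int hi, int d)
{
    int lt, gt, i, pivot;

    if (hi <= lo) return;
    assert(s != NULL && d >= 0);

    pivot = byte_at(s[lo + (hi - lo) / 2], d);
    lt = lo; gt = hi; i = lo;
    while (i <= gt) {
        int c = byte_at(s[i], d);
        if (c < pivot)      swap_str(s, lt++, i++);
        else if (c > pivot) swap_str(s, i, gt--);
        else                i++;
    }

    /* s[lo..lt-1] < pivot, s[lt..gt] == pivot, s[gt+1..hi] > pivot,
       all judged at depth d only. */
    multikey_qsort(s, lo, lt - 1, d);
    if (pivot != 0)                      /* equal strings have ended */
        multikey_qsort(s, lt, gt, d + 1);
    multikey_qsort(s, gt + 1, hi, d);
}
```

The three recursive calls correspond to the three `push` operations at the end of `qSort3`: only the middle ("equal") partition advances to depth `d + 1`.
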
| 1959 | |||
| 1960 | |||
| 1961 | /*---------------------------------------------*/ | ||
| 1962 | |||
| 1963 | #define BIGFREQ(b) (ftab[((b)+1) << 8] - ftab[(b) << 8]) | ||
| 1964 | |||
| 1965 | #define SETMASK (1 << 21) | ||
| 1966 | #define CLEARMASK (~(SETMASK)) | ||
| 1967 | |||
| 1968 | void sortIt ( void ) | ||
| 1969 | { | ||
| 1970 | Int32 i, j, ss, sb; | ||
| 1971 | Int32 runningOrder[256]; | ||
| 1972 | Int32 copy[256]; | ||
| 1973 | Bool bigDone[256]; | ||
| 1974 | UChar c1, c2; | ||
| 1975 | Int32 numQSorted; | ||
| 1976 | |||
| 1977 | /*-- | ||
| 1978 | In the various block-sized structures, live data runs | ||
| 1979 | from 0 to last+NUM_OVERSHOOT_BYTES inclusive. First, | ||
| 1980 | set up the overshoot area for block. | ||
| 1981 | --*/ | ||
| 1982 | |||
| 1983 | if (verbosity >= 4) fprintf ( stderr, " sort initialise ...\n" ); | ||
| 1984 | for (i = 0; i < NUM_OVERSHOOT_BYTES; i++) | ||
| 1985 | block[last+i+1] = block[i % (last+1)]; | ||
| 1986 | for (i = 0; i <= last+NUM_OVERSHOOT_BYTES; i++) | ||
| 1987 | quadrant[i] = 0; | ||
| 1988 | |||
| 1989 | block[-1] = block[last]; | ||
| 1990 | |||
| 1991 | if (last < 4000) { | ||
| 1992 | |||
| 1993 | /*-- | ||
| 1994 | Use simpleSort(), since the full sorting mechanism | ||
| 1995 | has quite a large constant overhead. | ||
| 1996 | --*/ | ||
| 1997 | if (verbosity >= 4) fprintf ( stderr, " simpleSort ...\n" ); | ||
| 1998 | for (i = 0; i <= last; i++) zptr[i] = i; | ||
| 1999 | firstAttempt = False; | ||
| 2000 | workDone = workLimit = 0; | ||
| 2001 | simpleSort ( 0, last, 0 ); | ||
| 2002 | if (verbosity >= 4) fprintf ( stderr, " simpleSort done.\n" ); | ||
| 2003 | |||
| 2004 | } else { | ||
| 2005 | |||
| 2006 | numQSorted = 0; | ||
| 2007 | for (i = 0; i <= 255; i++) bigDone[i] = False; | ||
| 2008 | |||
| 2009 | if (verbosity >= 4) fprintf ( stderr, " bucket sorting ...\n" ); | ||
| 2010 | |||
| 2011 | for (i = 0; i <= 65536; i++) ftab[i] = 0; | ||
| 2012 | |||
| 2013 | c1 = block[-1]; | ||
| 2014 | for (i = 0; i <= last; i++) { | ||
| 2015 | c2 = block[i]; | ||
| 2016 | ftab[(c1 << 8) + c2]++; | ||
| 2017 | c1 = c2; | ||
| 2018 | } | ||
| 2019 | |||
| 2020 | for (i = 1; i <= 65536; i++) ftab[i] += ftab[i-1]; | ||
| 2021 | |||
| 2022 | c1 = block[0]; | ||
| 2023 | for (i = 0; i < last; i++) { | ||
| 2024 | c2 = block[i+1]; | ||
| 2025 | j = (c1 << 8) + c2; | ||
| 2026 | c1 = c2; | ||
| 2027 | ftab[j]--; | ||
| 2028 | zptr[ftab[j]] = i; | ||
| 2029 | } | ||
| 2030 | j = (block[last] << 8) + block[0]; | ||
| 2031 | ftab[j]--; | ||
| 2032 | zptr[ftab[j]] = last; | ||
| 2033 | |||
| 2034 | /*-- | ||
| 2035 | Now ftab contains the first loc of every small bucket. | ||
| 2036 | Calculate the running order, from smallest to largest | ||
| 2037 | big bucket. | ||
| 2038 | --*/ | ||
| 2039 | |||
| 2040 | for (i = 0; i <= 255; i++) runningOrder[i] = i; | ||
| 2041 | |||
| 2042 | { | ||
| 2043 | Int32 vv; | ||
| 2044 | Int32 h = 1; | ||
| 2045 | do h = 3 * h + 1; while (h <= 256); | ||
| 2046 | do { | ||
| 2047 | h = h / 3; | ||
| 2048 | for (i = h; i <= 255; i++) { | ||
| 2049 | vv = runningOrder[i]; | ||
| 2050 | j = i; | ||
| 2051 | while ( BIGFREQ(runningOrder[j-h]) > BIGFREQ(vv) ) { | ||
| 2052 | runningOrder[j] = runningOrder[j-h]; | ||
| 2053 | j = j - h; | ||
| 2054 | if (j <= (h - 1)) goto zero; | ||
| 2055 | } | ||
| 2056 | zero: | ||
| 2057 | runningOrder[j] = vv; | ||
| 2058 | } | ||
| 2059 | } while (h != 1); | ||
| 2060 | } | ||
| 2061 | |||
| 2062 | /*-- | ||
| 2063 | The main sorting loop. | ||
| 2064 | --*/ | ||
| 2065 | |||
| 2066 | for (i = 0; i <= 255; i++) { | ||
| 2067 | |||
| 2068 | /*-- | ||
| 2069 | Process big buckets, starting with the least full. | ||
| 2070 | --*/ | ||
| 2071 | ss = runningOrder[i]; | ||
| 2072 | |||
| 2073 | /*-- | ||
| 2074 | Complete the big bucket [ss] by quicksorting | ||
| 2075 | any unsorted small buckets [ss, j]. Hopefully | ||
| 2076 | previous pointer-scanning phases have already | ||
| 2077 | completed many of the small buckets [ss, j], so | ||
| 2078 | we don't have to sort them at all. | ||
| 2079 | --*/ | ||
| 2080 | for (j = 0; j <= 255; j++) { | ||
| 2081 | sb = (ss << 8) + j; | ||
| 2082 | if ( ! (ftab[sb] & SETMASK) ) { | ||
| 2083 | Int32 lo = ftab[sb] & CLEARMASK; | ||
| 2084 | Int32 hi = (ftab[sb+1] & CLEARMASK) - 1; | ||
| 2085 | if (hi > lo) { | ||
| 2086 | if (verbosity >= 4) | ||
| 2087 | fprintf ( stderr, | ||
| 2088 | " qsort [0x%x, 0x%x] done %d this %d\n", | ||
| 2089 | ss, j, numQSorted, hi - lo + 1 ); | ||
| 2090 | qSort3 ( lo, hi, 2 ); | ||
| 2091 | numQSorted += ( hi - lo + 1 ); | ||
| 2092 | if (workDone > workLimit && firstAttempt) return; | ||
| 2093 | } | ||
| 2094 | ftab[sb] |= SETMASK; | ||
| 2095 | } | ||
| 2096 | } | ||
| 2097 | |||
| 2098 | /*-- | ||
| 2099 | The ss big bucket is now done. Record this fact, | ||
| 2100 | and update the quadrant descriptors. Remember to | ||
| 2101 | update quadrants in the overshoot area too, if | ||
| 2102 | necessary. The "if (i < 255)" test merely skips | ||
| 2103 | this updating for the last bucket processed, since | ||
| 2104 | updating for the last bucket is pointless. | ||
| 2105 | --*/ | ||
| 2106 | bigDone[ss] = True; | ||
| 2107 | |||
| 2108 | if (i < 255) { | ||
| 2109 | Int32 bbStart = ftab[ss << 8] & CLEARMASK; | ||
| 2110 | Int32 bbSize = (ftab[(ss+1) << 8] & CLEARMASK) - bbStart; | ||
| 2111 | Int32 shifts = 0; | ||
| 2112 | |||
| 2113 | while ((bbSize >> shifts) > 65534) shifts++; | ||
| 2114 | |||
| 2115 | for (j = 0; j < bbSize; j++) { | ||
| 2116 | Int32 a2update = zptr[bbStart + j]; | ||
| 2117 | UInt16 qVal = (UInt16)(j >> shifts); | ||
| 2118 | quadrant[a2update] = qVal; | ||
| 2119 | if (a2update < NUM_OVERSHOOT_BYTES) | ||
| 2120 | quadrant[a2update + last + 1] = qVal; | ||
| 2121 | } | ||
| 2122 | |||
| 2123 | if (! ( ((bbSize-1) >> shifts) <= 65535 )) panic ( "sortIt" ); | ||
| 2124 | } | ||
| 2125 | |||
| 2126 | /*-- | ||
| 2127 | Now scan this big bucket so as to synthesise the | ||
| 2128 | sorted order for small buckets [t, ss] for all t != ss. | ||
| 2129 | --*/ | ||
| 2130 | for (j = 0; j <= 255; j++) | ||
| 2131 | copy[j] = ftab[(j << 8) + ss] & CLEARMASK; | ||
| 2132 | |||
| 2133 | for (j = ftab[ss << 8] & CLEARMASK; | ||
| 2134 | j < (ftab[(ss+1) << 8] & CLEARMASK); | ||
| 2135 | j++) { | ||
| 2136 | c1 = block[zptr[j]-1]; | ||
| 2137 | if ( ! bigDone[c1] ) { | ||
| 2138 | zptr[copy[c1]] = zptr[j] == 0 ? last : zptr[j] - 1; | ||
| 2139 | copy[c1] ++; | ||
| 2140 | } | ||
| 2141 | } | ||
| 2142 | |||
| 2143 | for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK; | ||
| 2144 | } | ||
| 2145 | if (verbosity >= 4) | ||
| 2146 | fprintf ( stderr, " %d pointers, %d sorted, %d scanned\n", | ||
| 2147 | last+1, numQSorted, (last+1) - numQSorted ); | ||
| 2148 | } | ||
| 2149 | } | ||
| 2150 | |||
| 2151 | |||
| 2152 | /*---------------------------------------------------*/ | ||
| 2153 | /*--- Stuff for randomising repetitive blocks ---*/ | ||
| 2154 | /*---------------------------------------------------*/ | ||
| 2155 | |||
| 2156 | /*---------------------------------------------*/ | ||
| 2157 | Int32 rNums[512] = { | ||
| 2158 | 619, 720, 127, 481, 931, 816, 813, 233, 566, 247, | ||
| 2159 | 985, 724, 205, 454, 863, 491, 741, 242, 949, 214, | ||
| 2160 | 733, 859, 335, 708, 621, 574, 73, 654, 730, 472, | ||
| 2161 | 419, 436, 278, 496, 867, 210, 399, 680, 480, 51, | ||
| 2162 | 878, 465, 811, 169, 869, 675, 611, 697, 867, 561, | ||
| 2163 | 862, 687, 507, 283, 482, 129, 807, 591, 733, 623, | ||
| 2164 | 150, 238, 59, 379, 684, 877, 625, 169, 643, 105, | ||
| 2165 | 170, 607, 520, 932, 727, 476, 693, 425, 174, 647, | ||
| 2166 | 73, 122, 335, 530, 442, 853, 695, 249, 445, 515, | ||
| 2167 | 909, 545, 703, 919, 874, 474, 882, 500, 594, 612, | ||
| 2168 | 641, 801, 220, 162, 819, 984, 589, 513, 495, 799, | ||
| 2169 | 161, 604, 958, 533, 221, 400, 386, 867, 600, 782, | ||
| 2170 | 382, 596, 414, 171, 516, 375, 682, 485, 911, 276, | ||
| 2171 | 98, 553, 163, 354, 666, 933, 424, 341, 533, 870, | ||
| 2172 | 227, 730, 475, 186, 263, 647, 537, 686, 600, 224, | ||
| 2173 | 469, 68, 770, 919, 190, 373, 294, 822, 808, 206, | ||
| 2174 | 184, 943, 795, 384, 383, 461, 404, 758, 839, 887, | ||
| 2175 | 715, 67, 618, 276, 204, 918, 873, 777, 604, 560, | ||
| 2176 | 951, 160, 578, 722, 79, 804, 96, 409, 713, 940, | ||
| 2177 | 652, 934, 970, 447, 318, 353, 859, 672, 112, 785, | ||
| 2178 | 645, 863, 803, 350, 139, 93, 354, 99, 820, 908, | ||
| 2179 | 609, 772, 154, 274, 580, 184, 79, 626, 630, 742, | ||
| 2180 | 653, 282, 762, 623, 680, 81, 927, 626, 789, 125, | ||
| 2181 | 411, 521, 938, 300, 821, 78, 343, 175, 128, 250, | ||
| 2182 | 170, 774, 972, 275, 999, 639, 495, 78, 352, 126, | ||
| 2183 | 857, 956, 358, 619, 580, 124, 737, 594, 701, 612, | ||
| 2184 | 669, 112, 134, 694, 363, 992, 809, 743, 168, 974, | ||
| 2185 | 944, 375, 748, 52, 600, 747, 642, 182, 862, 81, | ||
| 2186 | 344, 805, 988, 739, 511, 655, 814, 334, 249, 515, | ||
| 2187 | 897, 955, 664, 981, 649, 113, 974, 459, 893, 228, | ||
| 2188 | 433, 837, 553, 268, 926, 240, 102, 654, 459, 51, | ||
| 2189 | 686, 754, 806, 760, 493, 403, 415, 394, 687, 700, | ||
| 2190 | 946, 670, 656, 610, 738, 392, 760, 799, 887, 653, | ||
| 2191 | 978, 321, 576, 617, 626, 502, 894, 679, 243, 440, | ||
| 2192 | 680, 879, 194, 572, 640, 724, 926, 56, 204, 700, | ||
| 2193 | 707, 151, 457, 449, 797, 195, 791, 558, 945, 679, | ||
| 2194 | 297, 59, 87, 824, 713, 663, 412, 693, 342, 606, | ||
| 2195 | 134, 108, 571, 364, 631, 212, 174, 643, 304, 329, | ||
| 2196 | 343, 97, 430, 751, 497, 314, 983, 374, 822, 928, | ||
| 2197 | 140, 206, 73, 263, 980, 736, 876, 478, 430, 305, | ||
| 2198 | 170, 514, 364, 692, 829, 82, 855, 953, 676, 246, | ||
| 2199 | 369, 970, 294, 750, 807, 827, 150, 790, 288, 923, | ||
| 2200 | 804, 378, 215, 828, 592, 281, 565, 555, 710, 82, | ||
| 2201 | 896, 831, 547, 261, 524, 462, 293, 465, 502, 56, | ||
| 2202 | 661, 821, 976, 991, 658, 869, 905, 758, 745, 193, | ||
| 2203 | 768, 550, 608, 933, 378, 286, 215, 979, 792, 961, | ||
| 2204 | 61, 688, 793, 644, 986, 403, 106, 366, 905, 644, | ||
| 2205 | 372, 567, 466, 434, 645, 210, 389, 550, 919, 135, | ||
| 2206 | 780, 773, 635, 389, 707, 100, 626, 958, 165, 504, | ||
| 2207 | 920, 176, 193, 713, 857, 265, 203, 50, 668, 108, | ||
| 2208 | 645, 990, 626, 197, 510, 357, 358, 850, 858, 364, | ||
| 2209 | 936, 638 | ||
| 2210 | }; | ||
| 2211 | |||
| 2212 | |||
| 2213 | #define RAND_DECLS \ | ||
| 2214 | Int32 rNToGo = 0; \ | ||
| 2215 | Int32 rTPos = 0; \ | ||
| 2216 | |||
| 2217 | #define RAND_MASK ((rNToGo == 1) ? 1 : 0) | ||
| 2218 | |||
| 2219 | #define RAND_UPD_MASK \ | ||
| 2220 | if (rNToGo == 0) { \ | ||
| 2221 | rNToGo = rNums[rTPos]; \ | ||
| 2222 | rTPos++; if (rTPos == 512) rTPos = 0; \ | ||
| 2223 | } \ | ||
| 2224 | rNToGo--; | ||
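
The `RAND_UPD_MASK` / `RAND_MASK` macros above flip the low bit of one byte every so often, with the gaps read cyclically from `rNums[]`. Because the operation is an XOR driven by a fixed table, applying it a second time restores the block, which is what lets the decoder undo it. The sketch below illustrates this with a short stand-in table; `gaps[]` and `toggle_block` are hypothetical names, and the four gap values are made up for the example rather than taken from `rNums[]`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical short gap table standing in for the 512-entry rNums[]. */
static const int gaps[] = { 3, 5, 2, 7 };
#define N_GAPS ((int)(sizeof gaps / sizeof gaps[0]))

/* XOR 1 into selected bytes of buf[0..len-1]. Calling this twice on
   the same buffer restores the original contents. */
void toggle_block(unsigned char *buf, size_t len)
{
    int to_go = 0;   /* plays the role of rNToGo */
    int pos = 0;     /* plays the role of rTPos  */
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        /* Equivalent of RAND_UPD_MASK: reload the countdown when it
           reaches zero, then step it. */
        if (to_go == 0) {
            to_go = gaps[pos];
            pos++;
            if (pos == N_GAPS) pos = 0;
        }
        to_go--;

        /* Equivalent of RAND_MASK: the mask is 1 only when the
           countdown stands at exactly 1. */
        if (to_go == 1) buf[i] ^= 1;
    }
}
```

With this stand-in table and a 16-byte buffer, the bytes at indices 1, 6, 8 and 15 are the ones toggled.
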
| 2225 | |||
| 2226 | |||
| 2227 | |||
| 2228 | /*---------------------------------------------------*/ | ||
| 2229 | /*--- The Reversible Transformation (tm) ---*/ | ||
| 2230 | /*---------------------------------------------------*/ | ||
| 2231 | |||
| 2232 | /*---------------------------------------------*/ | ||
| 2233 | void randomiseBlock ( void ) | ||
| 2234 | { | ||
| 2235 | Int32 i; | ||
| 2236 | RAND_DECLS; | ||
| 2237 | for (i = 0; i < 256; i++) inUse[i] = False; | ||
| 2238 | |||
| 2239 | for (i = 0; i <= last; i++) { | ||
| 2240 | RAND_UPD_MASK; | ||
| 2241 | block[i] ^= RAND_MASK; | ||
| 2242 | inUse[block[i]] = True; | ||
| 2243 | } | ||
| 2244 | } | ||
| 2245 | |||
| 2246 | |||
| 2247 | /*---------------------------------------------*/ | ||
| 2248 | void doReversibleTransformation ( void ) | ||
| 2249 | { | ||
| 2250 | Int32 i; | ||
| 2251 | |||
| 2252 | if (verbosity >= 2) fprintf ( stderr, "\n" ); | ||
| 2253 | |||
| 2254 | workLimit = workFactor * last; | ||
| 2255 | workDone = 0; | ||
| 2256 | blockRandomised = False; | ||
| 2257 | firstAttempt = True; | ||
| 2258 | |||
| 2259 | sortIt (); | ||
| 2260 | |||
| 2261 | if (verbosity >= 3) | ||
| 2262 | fprintf ( stderr, " %d work, %d block, ratio %5.2f\n", | ||
| 2263 | workDone, last, (float)workDone / (float)(last) ); | ||
| 2264 | |||
| 2265 | if (workDone > workLimit && firstAttempt) { | ||
| 2266 | if (verbosity >= 2) | ||
| 2267 | fprintf ( stderr, " sorting aborted; randomising block\n" ); | ||
| 2268 | randomiseBlock (); | ||
| 2269 | workLimit = workDone = 0; | ||
| 2270 | blockRandomised = True; | ||
| 2271 | firstAttempt = False; | ||
| 2272 | sortIt(); | ||
| 2273 | if (verbosity >= 3) | ||
| 2274 | fprintf ( stderr, " %d work, %d block, ratio %f\n", | ||
| 2275 | workDone, last, (float)workDone / (float)(last) ); | ||
| 2276 | } | ||
| 2277 | |||
| 2278 | origPtr = -1; | ||
| 2279 | for (i = 0; i <= last; i++) | ||
| 2280 | if (zptr[i] == 0) | ||
| 2281 | { origPtr = i; break; }; | ||
| 2282 | |||
| 2283 | if (origPtr == -1) panic ( "doReversibleTransformation" ); | ||
| 2284 | } | ||
| 2285 | |||
| 2286 | |||
| 2287 | /*---------------------------------------------*/ | ||
| 2288 | |||
| 2289 | INLINE Int32 indexIntoF ( Int32 indx, Int32 *cftab ) | ||
| 2290 | { | ||
| 2291 | Int32 nb, na, mid; | ||
| 2292 | nb = 0; | ||
| 2293 | na = 256; | ||
| 2294 | do { | ||
| 2295 | mid = (nb + na) >> 1; | ||
| 2296 | if (indx >= cftab[mid]) nb = mid; else na = mid; | ||
| 2297 | } | ||
| 2298 | while (na - nb != 1); | ||
| 2299 | return nb; | ||
| 2300 | } | ||
| 2301 | |||
| 2302 | |||
| 2303 | #define GET_SMALL(cccc) \ | ||
| 2304 | \ | ||
| 2305 | cccc = indexIntoF ( tPos, cftab ); \ | ||
| 2306 | tPos = GET_LL(tPos); | ||
| 2307 | |||
| 2308 | |||
| 2309 | void undoReversibleTransformation_small ( FILE* dst ) | ||
| 2310 | { | ||
| 2311 | Int32 cftab[257], cftabAlso[257]; | ||
| 2312 | Int32 i, j, tmp, tPos; | ||
| 2313 | UChar ch; | ||
| 2314 | |||
| 2315 | /*-- | ||
| 2316 | We assume here that the global array unzftab will | ||
| 2317 | already be holding the frequency counts for | ||
| 2318 | ll8[0 .. last]. | ||
| 2319 | --*/ | ||
| 2320 | |||
| 2321 | /*-- Set up cftab to facilitate generation of indexIntoF --*/ | ||
| 2322 | cftab[0] = 0; | ||
| 2323 | for (i = 1; i <= 256; i++) cftab[i] = unzftab[i-1]; | ||
| 2324 | for (i = 1; i <= 256; i++) cftab[i] += cftab[i-1]; | ||
| 2325 | |||
| 2326 | /*-- Make a copy of it, used in generation of T --*/ | ||
| 2327 | for (i = 0; i <= 256; i++) cftabAlso[i] = cftab[i]; | ||
| 2328 | |||
| 2329 | /*-- compute the T vector --*/ | ||
| 2330 | for (i = 0; i <= last; i++) { | ||
| 2331 | ch = (UChar)ll16[i]; | ||
| 2332 | SET_LL(i, cftabAlso[ch]); | ||
| 2333 | cftabAlso[ch]++; | ||
| 2334 | } | ||
| 2335 | |||
| 2336 | /*-- | ||
| 2337 | Compute T^(-1) by pointer reversal on T. This is rather | ||
| 2338 | subtle, in that, if the original block was two or more | ||
| 2339 | (in general, N) concatenated copies of the same thing, | ||
| 2340 | the T vector will consist of N cycles, each of length | ||
| 2341 | blocksize / N, and decoding will involve traversing one | ||
| 2342 | of these cycles N times. Which particular cycle doesn't | ||
| 2343 | matter -- they are all equivalent. The tricky part is to | ||
| 2344 | make sure that the pointer reversal creates a correct | ||
| 2345 | reversed cycle for us to traverse. So, the code below | ||
| 2346 | simply reverses whatever cycle origPtr happens to fall into, | ||
| 2347 | without regard to the cycle length. That gives one reversed | ||
| 2348 | cycle, which for normal blocks, is the entire block-size long. | ||
| 2349 | For repeated blocks, it will be interspersed with the other | ||
| 2350 | N-1 non-reversed cycles. Providing that the F-subscripting | ||
| 2351 | phase which follows starts at origPtr, all then works ok. | ||
| 2352 | --*/ | ||
| 2353 | i = origPtr; | ||
| 2354 | j = GET_LL(i); | ||
| 2355 | do { | ||
| 2356 | tmp = GET_LL(j); | ||
| 2357 | SET_LL(j, i); | ||
| 2358 | i = j; | ||
| 2359 | j = tmp; | ||
| 2360 | } | ||
| 2361 | while (i != origPtr); | ||
| 2362 | |||
| 2363 | /*-- | ||
| 2364 | We recreate the original by subscripting F through T^(-1). | ||
| 2365 | The run-length-decoder below requires characters incrementally, | ||
| 2366 | so tPos is set to a starting value, and is updated by | ||
| 2367 | the GET_SMALL macro. | ||
| 2368 | --*/ | ||
| 2369 | tPos = origPtr; | ||
| 2370 | |||
| 2371 | /*-------------------------------------------------*/ | ||
| 2372 | /*-- | ||
| 2373 | This is pretty much a verbatim copy of the | ||
| 2374 | run-length decoder present in the distribution | ||
| 2375 | bzip-0.21; it has to be here to avoid creating | ||
| 2376 | block[] as an intermediary structure. As in 0.21, | ||
| 2377 | this code derives from some sent to me by | ||
| 2378 | Christian von Roques. | ||
| 2379 | |||
| 2380 | It allows dst==NULL, so as to support the test (-t) | ||
| 2381 | option without slowing down the fast decompression | ||
| 2382 | code. | ||
| 2383 | --*/ | ||
| 2384 | { | ||
| 2385 | IntNative retVal; | ||
| 2386 | Int32 i2, count, chPrev, ch2; | ||
| 2387 | UInt32 localCrc; | ||
| 2388 | |||
| 2389 | count = 0; | ||
| 2390 | i2 = 0; | ||
| 2391 | ch2 = 256; /*-- not a char and not EOF --*/ | ||
| 2392 | localCrc = getGlobalCRC(); | ||
| 2393 | |||
| 2394 | { | ||
| 2395 | RAND_DECLS; | ||
| 2396 | while ( i2 <= last ) { | ||
| 2397 | chPrev = ch2; | ||
| 2398 | GET_SMALL(ch2); | ||
| 2399 | if (blockRandomised) { | ||
| 2400 | RAND_UPD_MASK; | ||
| 2401 | ch2 ^= (UInt32)RAND_MASK; | ||
| 2402 | } | ||
| 2403 | i2++; | ||
| 2404 | |||
| 2405 | if (dst) | ||
| 2406 | retVal = putc ( ch2, dst ); | ||
| 2407 | |||
| 2408 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2409 | |||
| 2410 | if (ch2 != chPrev) { | ||
| 2411 | count = 1; | ||
| 2412 | } else { | ||
| 2413 | count++; | ||
| 2414 | if (count >= 4) { | ||
| 2415 | Int32 j2; | ||
| 2416 | UChar z; | ||
| 2417 | GET_SMALL(z); | ||
| 2418 | if (blockRandomised) { | ||
| 2419 | RAND_UPD_MASK; | ||
| 2420 | z ^= RAND_MASK; | ||
| 2421 | } | ||
| 2422 | for (j2 = 0; j2 < (Int32)z; j2++) { | ||
| 2423 | if (dst) retVal = putc (ch2, dst); | ||
| 2424 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2425 | } | ||
| 2426 | i2++; | ||
| 2427 | count = 0; | ||
| 2428 | } | ||
| 2429 | } | ||
| 2430 | } | ||
| 2431 | } | ||
| 2432 | |||
| 2433 | setGlobalCRC ( localCrc ); | ||
| 2434 | } | ||
| 2435 | /*-- end of the in-line run-length-decoder. --*/ | ||
| 2436 | } | ||
| 2437 | #undef GET_SMALL | ||
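
The comments in the removed code above describe rebuilding the original block from the last column of the sorted rotation matrix by following a T^(-1) vector starting at `origPtr`. The sketch below is a minimal, self-contained version of that idea. It uses the direct construction of T^(-1) from cumulative byte counts (the approach of the fast variant of this routine) rather than the pointer reversal on packed `ll16`/`ll4` storage, and it omits the run-length decoding and CRC update. The name `inverse_bwt` and its signature are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: invert a Burrows-Wheeler transform.
   last_col[0..n-1] is the last column, orig_ptr is the row of the
   original string in the sorted rotation matrix, and out receives
   n bytes. Returns 0 on success, -1 if memory cannot be allocated. */
int inverse_bwt(const unsigned char *last_col, int n, int orig_ptr,
                unsigned char *out)
{
    int cftab[257];
    int *tt;
    int i, t_pos;

    if (n == 0) return 0;
    assert(n > 0 && orig_ptr >= 0 && orig_ptr < n);

    tt = malloc((size_t)n * sizeof *tt);
    if (tt == NULL) return -1;

    /* cftab[c] = number of bytes in the block that are smaller than c. */
    for (i = 0; i <= 256; i++) cftab[i] = 0;
    for (i = 0; i < n; i++) cftab[last_col[i] + 1]++;
    for (i = 1; i <= 256; i++) cftab[i] += cftab[i - 1];

    /* Build T^(-1): the k-th occurrence of byte c in the last column
       maps from the k-th row that starts with c. */
    for (i = 0; i < n; i++) {
        unsigned char ch = last_col[i];
        tt[cftab[ch]] = i;
        cftab[ch]++;
    }

    /* Walk the cycle, emitting one byte per step. */
    t_pos = tt[orig_ptr];
    for (i = 0; i < n; i++) {
        out[i] = last_col[t_pos];
        t_pos = tt[t_pos];
    }

    free(tt);
    return 0;
}
```

For example, the sorted rotations of `banana` have last column `nnbaaa`, and the unrotated string sits in row 3, so `inverse_bwt("nnbaaa", 6, 3, out)` reproduces `banana`.
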
| 2438 | |||
| 2439 | |||
| 2440 | /*---------------------------------------------*/ | ||
| 2441 | |||
| 2442 | #define GET_FAST(cccc) \ | ||
| 2443 | \ | ||
| 2444 | cccc = ll8[tPos]; \ | ||
| 2445 | tPos = tt[tPos]; | ||
| 2446 | |||
| 2447 | |||
| 2448 | void undoReversibleTransformation_fast ( FILE* dst ) | ||
| 2449 | { | ||
| 2450 | Int32 cftab[257]; | ||
| 2451 | Int32 i, tPos; | ||
| 2452 | UChar ch; | ||
| 2453 | |||
| 2454 | /*-- | ||
| 2455 | We assume here that the global array unzftab will | ||
| 2456 | already be holding the frequency counts for | ||
| 2457 | ll8[0 .. last]. | ||
| 2458 | --*/ | ||
| 2459 | |||
| 2460 | /*-- Set up cftab to facilitate generation of T^(-1) --*/ | ||
| 2461 | cftab[0] = 0; | ||
| 2462 | for (i = 1; i <= 256; i++) cftab[i] = unzftab[i-1]; | ||
| 2463 | for (i = 1; i <= 256; i++) cftab[i] += cftab[i-1]; | ||
| 2464 | |||
| 2465 | /*-- compute the T^(-1) vector --*/ | ||
| 2466 | for (i = 0; i <= last; i++) { | ||
| 2467 | ch = (UChar)ll8[i]; | ||
| 2468 | tt[cftab[ch]] = i; | ||
| 2469 | cftab[ch]++; | ||
| 2470 | } | 450 | } |
| 451 | if (verbosity >= 2) fprintf ( stderr, "\n " ); | ||
| 452 | return True; | ||
| 2471 | 453 | ||
| 2472 | /*-- | 454 | errhandler: |
| 2473 | We recreate the original by subscripting L through T^(-1). | 455 | bzReadClose ( &bzerr_dummy, bzf ); |
| 2474 | The run-length-decoder below requires characters incrementally, | 456 | switch (bzerr) { |
| 2475 | so tPos is set to a starting value, and is updated by | 457 | case BZ_IO_ERROR: |
| 2476 | the GET_FAST macro. | 458 | errhandler_io: |
| 2477 | --*/ | 459 | ioError(); break; |
| 2478 | tPos = tt[origPtr]; | 460 | case BZ_DATA_ERROR: |
| 2479 | 461 | crcError(); | |
| 2480 | /*-------------------------------------------------*/ | 462 | case BZ_MEM_ERROR: |
| 2481 | /*-- | 463 | outOfMemory(); |
| 2482 | This is pretty much a verbatim copy of the | 464 | case BZ_UNEXPECTED_EOF: |
| 2483 | run-length decoder present in the distribution | 465 | compressedStreamEOF(); |
| 2484 | bzip-0.21; it has to be here to avoid creating | 466 | case BZ_DATA_ERROR_MAGIC: |
| 2485 | block[] as an intermediary structure. As in 0.21, | 467 | if (streamNo == 1) { |
| 2486 | this code derives from some sent to me by | 468 | return False; |
| 2487 | Christian von Roques. | 469 | } else { |
| 2488 | --*/ | 470 | fprintf ( stderr, |
| 2489 | { | 471 | "\n%s: %s: trailing garbage after EOF ignored\n", |
| 2490 | IntNative retVal; | 472 | progName, inName ); |
| 2491 | Int32 i2, count, chPrev, ch2; | 473 | return True; |
| 2492 | UInt32 localCrc; | ||
| 2493 | |||
| 2494 | count = 0; | ||
| 2495 | i2 = 0; | ||
| 2496 | ch2 = 256; /*-- not a char and not EOF --*/ | ||
| 2497 | localCrc = getGlobalCRC(); | ||
| 2498 | |||
| 2499 | if (blockRandomised) { | ||
| 2500 | RAND_DECLS; | ||
| 2501 | while ( i2 <= last ) { | ||
| 2502 | chPrev = ch2; | ||
| 2503 | GET_FAST(ch2); | ||
| 2504 | RAND_UPD_MASK; | ||
| 2505 | ch2 ^= (UInt32)RAND_MASK; | ||
| 2506 | i2++; | ||
| 2507 | |||
| 2508 | retVal = putc ( ch2, dst ); | ||
| 2509 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2510 | |||
| 2511 | if (ch2 != chPrev) { | ||
| 2512 | count = 1; | ||
| 2513 | } else { | ||
| 2514 | count++; | ||
| 2515 | if (count >= 4) { | ||
| 2516 | Int32 j2; | ||
| 2517 | UChar z; | ||
| 2518 | GET_FAST(z); | ||
| 2519 | RAND_UPD_MASK; | ||
| 2520 | z ^= RAND_MASK; | ||
| 2521 | for (j2 = 0; j2 < (Int32)z; j2++) { | ||
| 2522 | retVal = putc (ch2, dst); | ||
| 2523 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2524 | } | ||
| 2525 | i2++; | ||
| 2526 | count = 0; | ||
| 2527 | } | ||
| 2528 | } | ||
| 2529 | } | ||
| 2530 | |||
| 2531 | } else { | ||
| 2532 | |||
| 2533 | while ( i2 <= last ) { | ||
| 2534 | chPrev = ch2; | ||
| 2535 | GET_FAST(ch2); | ||
| 2536 | i2++; | ||
| 2537 | |||
| 2538 | retVal = putc ( ch2, dst ); | ||
| 2539 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2540 | |||
| 2541 | if (ch2 != chPrev) { | ||
| 2542 | count = 1; | ||
| 2543 | } else { | ||
| 2544 | count++; | ||
| 2545 | if (count >= 4) { | ||
| 2546 | Int32 j2; | ||
| 2547 | UChar z; | ||
| 2548 | GET_FAST(z); | ||
| 2549 | for (j2 = 0; j2 < (Int32)z; j2++) { | ||
| 2550 | retVal = putc (ch2, dst); | ||
| 2551 | UPDATE_CRC ( localCrc, (UChar)ch2 ); | ||
| 2552 | } | ||
| 2553 | i2++; | ||
| 2554 | count = 0; | ||
| 2555 | } | ||
| 2556 | } | ||
| 2557 | } | 474 | } |
| 2558 | 475 | default: | |
| 2559 | } /*-- if (blockRandomised) --*/ | 476 | panic ( "decompress:unexpected error" ); |
| 2560 | |||
| 2561 | setGlobalCRC ( localCrc ); | ||
| 2562 | } | ||
| 2563 | /*-- end of the in-line run-length-decoder. --*/ | ||
| 2564 | } | ||
| 2565 | #undef GET_FAST | ||
| 2566 | |||
| 2567 | |||
| 2568 | /*---------------------------------------------------*/ | ||
| 2569 | /*--- The block loader and RLEr ---*/ | ||
| 2570 | /*---------------------------------------------------*/ | ||
| 2571 | |||
| 2572 | /*---------------------------------------------*/ | ||
| 2573 | /* Top 16: run length, 1 to 255. | ||
| 2574 | * Lower 16: the char, or MY_EOF for EOF. | ||
| 2575 | */ | ||
| 2576 | |||
| 2577 | #define MY_EOF 257 | ||
| 2578 | |||
| 2579 | INLINE Int32 getRLEpair ( FILE* src ) | ||
| 2580 | { | ||
| 2581 | Int32 runLength; | ||
| 2582 | IntNative ch, chLatest; | ||
| 2583 | |||
| 2584 | ch = getc ( src ); | ||
| 2585 | |||
| 2586 | /*--- Because I have no idea what kind of a value EOF is. ---*/ | ||
| 2587 | if (ch == EOF) { | ||
| 2588 | ERROR_IF_NOT_ZERO ( ferror(src)); | ||
| 2589 | return (1 << 16) | MY_EOF; | ||
| 2590 | } | ||
| 2591 | |||
| 2592 | runLength = 0; | ||
| 2593 | do { | ||
| 2594 | chLatest = getc ( src ); | ||
| 2595 | runLength++; | ||
| 2596 | bytesIn++; | ||
| 2597 | } | ||
| 2598 | while (ch == chLatest && runLength < 255); | ||
| 2599 | |||
| 2600 | if ( chLatest != EOF ) { | ||
| 2601 | if ( ungetc ( chLatest, src ) == EOF ) | ||
| 2602 | panic ( "getRLEpair: ungetc failed" ); | ||
| 2603 | } else { | ||
| 2604 | ERROR_IF_NOT_ZERO ( ferror(src) ); | ||
| 2605 | } | ||
| 2606 | |||
| 2607 | /*--- Conditional is just a speedup hack. ---*/ | ||
| 2608 | if (runLength == 1) { | ||
| 2609 | UPDATE_CRC ( globalCrc, (UChar)ch ); | ||
| 2610 | return (1 << 16) | ch; | ||
| 2611 | } else { | ||
| 2612 | Int32 i; | ||
| 2613 | for (i = 1; i <= runLength; i++) | ||
| 2614 | UPDATE_CRC ( globalCrc, (UChar)ch ); | ||
| 2615 | return (runLength << 16) | ch; | ||
| 2616 | } | 477 | } |
| 2617 | } | ||
| 2618 | 478 | ||
| 2619 | 479 | panic ( "decompress:end" ); | |
| 2620 | /*---------------------------------------------*/ | 480 | return True; /*notreached*/ |
| 2621 | void loadAndRLEsource ( FILE* src ) | ||
| 2622 | { | ||
| 2623 | Int32 ch, allowableBlockSize, i; | ||
| 2624 | |||
| 2625 | last = -1; | ||
| 2626 | ch = 0; | ||
| 2627 | |||
| 2628 | for (i = 0; i < 256; i++) inUse[i] = False; | ||
| 2629 | |||
| 2630 | /*--- 20 is just a paranoia constant ---*/ | ||
| 2631 | allowableBlockSize = 100000 * blockSize100k - 20; | ||
| 2632 | |||
| 2633 | while (last < allowableBlockSize && ch != MY_EOF) { | ||
| 2634 | Int32 rlePair, runLen; | ||
| 2635 | rlePair = getRLEpair ( src ); | ||
| 2636 | ch = rlePair & 0xFFFF; | ||
| 2637 | runLen = (UInt32)rlePair >> 16; | ||
| 2638 | |||
| 2639 | #if DEBUG | ||
| 2640 | assert (runLen >= 1 && runLen <= 255); | ||
| 2641 | #endif | ||
| 2642 | |||
| 2643 | if (ch != MY_EOF) { | ||
| 2644 | inUse[ch] = True; | ||
| 2645 | switch (runLen) { | ||
| 2646 | case 1: | ||
| 2647 | last++; block[last] = (UChar)ch; break; | ||
| 2648 | case 2: | ||
| 2649 | last++; block[last] = (UChar)ch; | ||
| 2650 | last++; block[last] = (UChar)ch; break; | ||
| 2651 | case 3: | ||
| 2652 | last++; block[last] = (UChar)ch; | ||
| 2653 | last++; block[last] = (UChar)ch; | ||
| 2654 | last++; block[last] = (UChar)ch; break; | ||
| 2655 | default: | ||
| 2656 | inUse[runLen-4] = True; | ||
| 2657 | last++; block[last] = (UChar)ch; | ||
| 2658 | last++; block[last] = (UChar)ch; | ||
| 2659 | last++; block[last] = (UChar)ch; | ||
| 2660 | last++; block[last] = (UChar)ch; | ||
| 2661 | last++; block[last] = (UChar)(runLen-4); break; | ||
| 2662 | } | ||
| 2663 | } | ||
| 2664 | } | ||
| 2665 | } | 481 | } |
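Review note: the removed getRLEpair / loadAndRLEsource pair implements the first-stage run-length coding. Runs of 1 to 3 equal bytes are stored verbatim, and a run of 4 to 255 is stored as four copies followed by a count byte holding the run length minus 4. The sketch below restates that rule on an in-memory buffer. It is a minimal illustration, not code from this commit: the name `rle1_encode` and its buffer-based signature are assumptions made for the example, and CRC updating, the `inUse` table and stream I/O are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in bzip2): run-length encode `n` bytes from `in`
   into `out` using the rule of loadAndRLEsource. Returns bytes written.
   A run of exactly 4 grows to 5 bytes, so out_cap should be >= n + n/4. */
size_t rle1_encode(const unsigned char *in, size_t n,
                   unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char ch = in[i];
        size_t run = 1, copies, k;

        /* Measure the run, capped at 255 as in getRLEpair. */
        while (i + run < n && in[i + run] == ch && run < 255) run++;

        /* Runs of 1..3 are copied as-is; longer runs emit 4 copies... */
        copies = run < 4 ? run : 4;
        assert(o + copies + (run >= 4 ? 1 : 0) <= out_cap);
        for (k = 0; k < copies; k++) out[o++] = ch;

        /* ...followed by a count byte holding (run length - 4), 0..251. */
        if (run >= 4) out[o++] = (unsigned char)(run - 4);

        i += run;
    }
    return o;
}
```

With this rule, seven `a` bytes followed by one `b` encode as `a a a a 3 b`. A run of exactly four bytes grows by one byte, because the count byte is always written once four copies have been emitted.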
| 2666 | 482 | ||
| 2667 | 483 | ||
| 2668 | /*---------------------------------------------------*/ | ||
| 2669 | /*--- Processing of complete files and streams ---*/ | ||
| 2670 | /*---------------------------------------------------*/ | ||
| 2671 | |||
| 2672 | /*---------------------------------------------*/ | 484 | /*---------------------------------------------*/ |
| 2673 | void compressStream ( FILE *stream, FILE *zStream ) | 485 | Bool testStream ( FILE *zStream ) |
| 2674 | { | 486 | { |
| 2675 | IntNative retVal; | 487 | BZFILE* bzf = NULL; |
| 2676 | UInt32 blockCRC, combinedCRC; | 488 | Int32 bzerr, bzerr_dummy, ret, nread, streamNo, i; |
| 2677 | Int32 blockNo; | 489 | UChar obuf[5000]; |
| 490 | UChar unused[BZ_MAX_UNUSED]; | ||
| 491 | Int32 nUnused; | ||
| 492 | UChar* unusedTmp; | ||
| 2678 | 493 | ||
| 2679 | blockNo = 0; | 494 | nUnused = 0; |
| 2680 | bytesIn = 0; | 495 | streamNo = 0; |
| 2681 | bytesOut = 0; | ||
| 2682 | nBlocksRandomised = 0; | ||
| 2683 | 496 | ||
| 2684 | SET_BINARY_MODE(stream); | ||
| 2685 | SET_BINARY_MODE(zStream); | 497 | SET_BINARY_MODE(zStream); |
| 2686 | 498 | if (ferror(zStream)) goto errhandler_io; | |
| 2687 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | ||
| 2688 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2689 | |||
| 2690 | bsSetStream ( zStream, True ); | ||
| 2691 | |||
| 2692 | /*--- Write `magic' bytes B and Z, | ||
| 2693 | then h indicating file-format == huffmanised, | ||
| 2694 | followed by a digit indicating blockSize100k. | ||
| 2695 | ---*/ | ||
| 2696 | bsPutUChar ( 'B' ); | ||
| 2697 | bsPutUChar ( 'Z' ); | ||
| 2698 | bsPutUChar ( 'h' ); | ||
| 2699 | bsPutUChar ( '0' + blockSize100k ); | ||
| 2700 | |||
| 2701 | combinedCRC = 0; | ||
| 2702 | |||
| 2703 | if (verbosity >= 2) fprintf ( stderr, "\n" ); | ||
| 2704 | 499 | ||
| 2705 | while (True) { | 500 | while (True) { |
| 2706 | 501 | ||
| 2707 | blockNo++; | 502 | bzf = bzReadOpen ( |
| 2708 | initialiseCRC (); | 503 | &bzerr, zStream, verbosity, |
| 2709 | loadAndRLEsource ( stream ); | 504 | (int)smallMode, unused, nUnused |
| 2710 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | 505 | ); |
| 2711 | if (last == -1) break; | 506 | if (bzf == NULL || bzerr != BZ_OK) goto errhandler; |
| 2712 | 507 | streamNo++; | |
| 2713 | blockCRC = getFinalCRC (); | ||
| 2714 | combinedCRC = (combinedCRC << 1) | (combinedCRC >> 31); | ||
| 2715 | combinedCRC ^= blockCRC; | ||
| 2716 | |||
| 2717 | if (verbosity >= 2) | ||
| 2718 | fprintf ( stderr, " block %d: crc = 0x%8x, combined CRC = 0x%8x, size = %d", | ||
| 2719 | blockNo, blockCRC, combinedCRC, last+1 ); | ||
| 2720 | |||
| 2721 | /*-- sort the block and establish posn of original string --*/ | ||
| 2722 | doReversibleTransformation (); | ||
| 2723 | |||
| 2724 | /*-- | ||
| 2725 | A 6-byte block header, the value chosen arbitrarily | ||
| 2726 | as 0x314159265359 :-). A 32 bit value does not really | ||
| 2727 | give a strong enough guarantee that the value will not | ||
| 2728 | appear by chance in the compressed datastream. Worst-case | ||
| 2729 | probability of this event, for a 900k block, is about | ||
| 2730 | 2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits. | ||
| 2731 | For a compressed file of size 100Gb -- about 100000 blocks -- | ||
| 2732 | only a 48-bit marker will do. NB: normal compression/ | ||
| 2733 | decompression do *not* rely on these statistical properties. | ||
| 2734 | They are only important when trying to recover blocks from | ||
| 2735 | damaged files. | ||
| 2736 | --*/ | ||
| 2737 | bsPutUChar ( 0x31 ); bsPutUChar ( 0x41 ); | ||
| 2738 | bsPutUChar ( 0x59 ); bsPutUChar ( 0x26 ); | ||
| 2739 | bsPutUChar ( 0x53 ); bsPutUChar ( 0x59 ); | ||
| 2740 | |||
| 2741 | /*-- Now the block's CRC, so it is in a known place. --*/ | ||
| 2742 | bsPutUInt32 ( blockCRC ); | ||
| 2743 | |||
| 2744 | /*-- Now a single bit indicating randomisation. --*/ | ||
| 2745 | if (blockRandomised) { | ||
| 2746 | bsW(1,1); nBlocksRandomised++; | ||
| 2747 | } else | ||
| 2748 | bsW(1,0); | ||
| 2749 | |||
| 2750 | /*-- Finally, block's contents proper. --*/ | ||
| 2751 | moveToFrontCodeAndSend (); | ||
| 2752 | |||
| 2753 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2754 | } | ||
| 2755 | |||
| 2756 | if (verbosity >= 2 && nBlocksRandomised > 0) | ||
| 2757 | fprintf ( stderr, " %d block%s needed randomisation\n", | ||
| 2758 | nBlocksRandomised, | ||
| 2759 | nBlocksRandomised == 1 ? "" : "s" ); | ||
| 2760 | |||
| 2761 | /*-- | ||
| 2762 | Now another magic 48-bit number, 0x177245385090, to | ||
| 2763 | indicate the end of the last block. (sqrt(pi), if | ||
| 2764 | you want to know. I did want to use e, but it contains | ||
| 2765 | too much repetition -- 27 18 28 18 28 46 -- for me | ||
| 2766 | to feel statistically comfortable. Call me paranoid.) | ||
| 2767 | --*/ | ||
| 2768 | |||
| 2769 | bsPutUChar ( 0x17 ); bsPutUChar ( 0x72 ); | ||
| 2770 | bsPutUChar ( 0x45 ); bsPutUChar ( 0x38 ); | ||
| 2771 | bsPutUChar ( 0x50 ); bsPutUChar ( 0x90 ); | ||
| 2772 | |||
| 2773 | bsPutUInt32 ( combinedCRC ); | ||
| 2774 | if (verbosity >= 2) | ||
| 2775 | fprintf ( stderr, " final combined CRC = 0x%x\n ", combinedCRC ); | ||
| 2776 | 508 | ||
| 2777 | /*-- Close the files in an utterly paranoid way. --*/ | 509 | while (bzerr == BZ_OK) { |
| 2778 | bsFinishedWithStream (); | 510 | nread = bzRead ( &bzerr, bzf, obuf, 5000 ); |
| 2779 | 511 | if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler; | |
| 2780 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | 512 | } |
| 2781 | retVal = fflush ( zStream ); | 513 | if (bzerr != BZ_STREAM_END) goto errhandler; |
| 2782 | ERROR_IF_EOF ( retVal ); | ||
| 2783 | retVal = fclose ( zStream ); | ||
| 2784 | ERROR_IF_EOF ( retVal ); | ||
| 2785 | |||
| 2786 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | ||
| 2787 | retVal = fclose ( stream ); | ||
| 2788 | ERROR_IF_EOF ( retVal ); | ||
| 2789 | |||
| 2790 | if (bytesIn == 0) bytesIn = 1; | ||
| 2791 | if (bytesOut == 0) bytesOut = 1; | ||
| 2792 | 514 | ||
| 2793 | if (verbosity >= 1) | 515 | bzReadGetUnused ( &bzerr, bzf, (void**)(&unusedTmp), &nUnused ); |
| 2794 | fprintf ( stderr, "%6.3f:1, %6.3f bits/byte, " | 516 | if (bzerr != BZ_OK) panic ( "test:bzReadGetUnused" ); |
| 2795 | "%5.2f%% saved, %d in, %d out.\n", | ||
| 2796 | (float)bytesIn / (float)bytesOut, | ||
| 2797 | (8.0 * (float)bytesOut) / (float)bytesIn, | ||
| 2798 | 100.0 * (1.0 - (float)bytesOut / (float)bytesIn), | ||
| 2799 | bytesIn, | ||
| 2800 | bytesOut | ||
| 2801 | ); | ||
| 2802 | } | ||
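Review note: the removed compressStream folds each block CRC into a whole-stream CRC by rotating the running value left one bit and XORing in the block CRC. The removed uncompressStream repeats the same fold and compares the result with the stored value. The sketch below isolates that fold. The function name `combine_block_crcs` and the array-based interface are assumptions made for illustration, and the per-block CRCs are taken as given rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in bzip2): fold per-block CRCs the way
   compressStream does. Each step rotates the running value left by one
   bit and XORs in the next block CRC, so block order affects the result. */
uint32_t combine_block_crcs(const uint32_t *block_crcs, size_t n)
{
    uint32_t combined = 0;
    size_t i;

    assert(block_crcs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        combined = (combined << 1) | (combined >> 31);  /* rotate left 1 */
        combined ^= block_crcs[i];
    }
    return combined;
}
```

Because of the rotation, swapping two blocks generally changes the combined value, which a plain XOR of the block CRCs would not detect.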
| 2803 | 517 | ||
| 518 | for (i = 0; i < nUnused; i++) unused[i] = unusedTmp[i]; | ||
| 2804 | 519 | ||
| 2805 | /*---------------------------------------------*/ | 520 | bzReadClose ( &bzerr, bzf ); |
| 2806 | Bool uncompressStream ( FILE *zStream, FILE *stream ) | 521 | if (bzerr != BZ_OK) panic ( "test:bzReadGetUnused" ); |
| 2807 | { | 522 | if (nUnused == 0 && myfeof(zStream)) break; |
| 2808 | UChar magic1, magic2, magic3, magic4; | ||
| 2809 | UChar magic5, magic6; | ||
| 2810 | UInt32 storedBlockCRC, storedCombinedCRC; | ||
| 2811 | UInt32 computedBlockCRC, computedCombinedCRC; | ||
| 2812 | Int32 currBlockNo; | ||
| 2813 | IntNative retVal; | ||
| 2814 | 523 | ||
| 2815 | SET_BINARY_MODE(stream); | ||
| 2816 | SET_BINARY_MODE(zStream); | ||
| 2817 | |||
| 2818 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | ||
| 2819 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2820 | |||
| 2821 | bsSetStream ( zStream, False ); | ||
| 2822 | |||
| 2823 | /*-- | ||
| 2824 | A bad magic number is `recoverable from'; | ||
| 2825 | return with False so the caller skips the file. | ||
| 2826 | --*/ | ||
| 2827 | magic1 = bsGetUChar (); | ||
| 2828 | magic2 = bsGetUChar (); | ||
| 2829 | magic3 = bsGetUChar (); | ||
| 2830 | magic4 = bsGetUChar (); | ||
| 2831 | if (magic1 != 'B' || | ||
| 2832 | magic2 != 'Z' || | ||
| 2833 | magic3 != 'h' || | ||
| 2834 | magic4 < '1' || | ||
| 2835 | magic4 > '9') { | ||
| 2836 | bsFinishedWithStream(); | ||
| 2837 | retVal = fclose ( stream ); | ||
| 2838 | ERROR_IF_EOF ( retVal ); | ||
| 2839 | return False; | ||
| 2840 | } | 524 | } |
| 2841 | 525 | ||
| 2842 | setDecompressStructureSizes ( magic4 - '0' ); | 526 | if (ferror(zStream)) goto errhandler_io; |
| 2843 | computedCombinedCRC = 0; | 527 | ret = fclose ( zStream ); |
| 2844 | 528 | if (ret == EOF) goto errhandler_io; | |
| 2845 | if (verbosity >= 2) fprintf ( stderr, "\n " ); | ||
| 2846 | currBlockNo = 0; | ||
| 2847 | |||
| 2848 | while (True) { | ||
| 2849 | magic1 = bsGetUChar (); | ||
| 2850 | magic2 = bsGetUChar (); | ||
| 2851 | magic3 = bsGetUChar (); | ||
| 2852 | magic4 = bsGetUChar (); | ||
| 2853 | magic5 = bsGetUChar (); | ||
| 2854 | magic6 = bsGetUChar (); | ||
| 2855 | if (magic1 == 0x17 && magic2 == 0x72 && | ||
| 2856 | magic3 == 0x45 && magic4 == 0x38 && | ||
| 2857 | magic5 == 0x50 && magic6 == 0x90) break; | ||
| 2858 | |||
| 2859 | if (magic1 != 0x31 || magic2 != 0x41 || | ||
| 2860 | magic3 != 0x59 || magic4 != 0x26 || | ||
| 2861 | magic5 != 0x53 || magic6 != 0x59) badBlockHeader(); | ||
| 2862 | |||
| 2863 | storedBlockCRC = bsGetUInt32 (); | ||
| 2864 | |||
| 2865 | if (bsR(1) == 1) | ||
| 2866 | blockRandomised = True; else | ||
| 2867 | blockRandomised = False; | ||
| 2868 | |||
| 2869 | currBlockNo++; | ||
| 2870 | if (verbosity >= 2) | ||
| 2871 | fprintf ( stderr, "[%d: huff+mtf ", currBlockNo ); | ||
| 2872 | getAndMoveToFrontDecode (); | ||
| 2873 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2874 | |||
| 2875 | initialiseCRC(); | ||
| 2876 | if (verbosity >= 2) fprintf ( stderr, "rt+rld" ); | ||
| 2877 | if (smallMode) | ||
| 2878 | undoReversibleTransformation_small ( stream ); | ||
| 2879 | else | ||
| 2880 | undoReversibleTransformation_fast ( stream ); | ||
| 2881 | |||
| 2882 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | ||
| 2883 | |||
| 2884 | computedBlockCRC = getFinalCRC(); | ||
| 2885 | if (verbosity >= 3) | ||
| 2886 | fprintf ( stderr, " {0x%x, 0x%x}", storedBlockCRC, computedBlockCRC ); | ||
| 2887 | if (verbosity >= 2) fprintf ( stderr, "] " ); | ||
| 2888 | |||
| 2889 | /*-- A bad CRC is considered a fatal error. --*/ | ||
| 2890 | if (storedBlockCRC != computedBlockCRC) | ||
| 2891 | crcError ( storedBlockCRC, computedBlockCRC ); | ||
| 2892 | |||
| 2893 | computedCombinedCRC = (computedCombinedCRC << 1) | (computedCombinedCRC >> 31); | ||
| 2894 | computedCombinedCRC ^= computedBlockCRC; | ||
| 2895 | }; | ||
| 2896 | 529 | ||
| 2897 | if (verbosity >= 2) fprintf ( stderr, "\n " ); | 530 | if (verbosity >= 2) fprintf ( stderr, "\n " ); |
| 2898 | |||
| 2899 | storedCombinedCRC = bsGetUInt32 (); | ||
| 2900 | if (verbosity >= 2) | ||
| 2901 | fprintf ( stderr, | ||
| 2902 | "combined CRCs: stored = 0x%x, computed = 0x%x\n ", | ||
| 2903 | storedCombinedCRC, computedCombinedCRC ); | ||
| 2904 | if (storedCombinedCRC != computedCombinedCRC) | ||
| 2905 | crcError ( storedCombinedCRC, computedCombinedCRC ); | ||
| 2906 | |||
| 2907 | |||
| 2908 | bsFinishedWithStream (); | ||
| 2909 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2910 | retVal = fclose ( zStream ); | ||
| 2911 | ERROR_IF_EOF ( retVal ); | ||
| 2912 | |||
| 2913 | ERROR_IF_NOT_ZERO ( ferror(stream) ); | ||
| 2914 | retVal = fflush ( stream ); | ||
| 2915 | ERROR_IF_NOT_ZERO ( retVal ); | ||
| 2916 | if (stream != stdout) { | ||
| 2917 | retVal = fclose ( stream ); | ||
| 2918 | ERROR_IF_EOF ( retVal ); | ||
| 2919 | } | ||
| 2920 | return True; | 531 | return True; |
| 2921 | } | ||
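Review note: the removed uncompressStream and testStream both validate a four-byte stream header, `B`, `Z`, `h`, then a digit `1` to `9` giving the block size in units of 100k. In the new code a bad header is reported by the library as BZ_DATA_ERROR_MAGIC instead. The check itself can be sketched as below. `bz_header_block_size` is a hypothetical name introduced for this example and operates on a byte buffer, not on the bit stream the original reads from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in bzip2): inspect the first bytes of a stream.
   Returns the block-size multiplier 1..9 taken from the fourth byte, or 0
   if the buffer does not start with a valid "BZh[1-9]" header. */
int bz_header_block_size(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL || len == 0);
    if (len < 4) return 0;
    if (hdr[0] != 'B' || hdr[1] != 'Z' || hdr[2] != 'h') return 0;
    if (hdr[3] < '1' || hdr[3] > '9') return 0;
    return hdr[3] - '0';
}
```

Returning 0 for a bad header mirrors the original's "recoverable" treatment of a wrong magic number: the caller can skip the file without aborting.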
| 2922 | |||
| 2923 | |||
| 2924 | /*---------------------------------------------*/ | ||
| 2925 | Bool testStream ( FILE *zStream ) | ||
| 2926 | { | ||
| 2927 | UChar magic1, magic2, magic3, magic4; | ||
| 2928 | UChar magic5, magic6; | ||
| 2929 | UInt32 storedBlockCRC, storedCombinedCRC; | ||
| 2930 | UInt32 computedBlockCRC, computedCombinedCRC; | ||
| 2931 | Int32 currBlockNo; | ||
| 2932 | IntNative retVal; | ||
| 2933 | |||
| 2934 | SET_BINARY_MODE(zStream); | ||
| 2935 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2936 | |||
| 2937 | bsSetStream ( zStream, False ); | ||
| 2938 | |||
| 2939 | magic1 = bsGetUChar (); | ||
| 2940 | magic2 = bsGetUChar (); | ||
| 2941 | magic3 = bsGetUChar (); | ||
| 2942 | magic4 = bsGetUChar (); | ||
| 2943 | if (magic1 != 'B' || | ||
| 2944 | magic2 != 'Z' || | ||
| 2945 | magic3 != 'h' || | ||
| 2946 | magic4 < '1' || | ||
| 2947 | magic4 > '9') { | ||
| 2948 | bsFinishedWithStream(); | ||
| 2949 | fclose ( zStream ); | ||
| 2950 | fprintf ( stderr, "\n%s: bad magic number (ie, not created by bzip2)\n", | ||
| 2951 | inName ); | ||
| 2952 | return False; | ||
| 2953 | } | ||
| 2954 | 532 | ||
| 2955 | smallMode = True; | 533 | errhandler: |
| 2956 | setDecompressStructureSizes ( magic4 - '0' ); | 534 | bzReadClose ( &bzerr_dummy, bzf ); |
| 2957 | computedCombinedCRC = 0; | 535 | switch (bzerr) { |
| 2958 | 536 | case BZ_IO_ERROR: | |
| 2959 | if (verbosity >= 2) fprintf ( stderr, "\n" ); | 537 | errhandler_io: |
| 2960 | currBlockNo = 0; | 538 | ioError(); break; |
| 2961 | 539 | case BZ_DATA_ERROR: | |
| 2962 | while (True) { | ||
| 2963 | magic1 = bsGetUChar (); | ||
| 2964 | magic2 = bsGetUChar (); | ||
| 2965 | magic3 = bsGetUChar (); | ||
| 2966 | magic4 = bsGetUChar (); | ||
| 2967 | magic5 = bsGetUChar (); | ||
| 2968 | magic6 = bsGetUChar (); | ||
| 2969 | if (magic1 == 0x17 && magic2 == 0x72 && | ||
| 2970 | magic3 == 0x45 && magic4 == 0x38 && | ||
| 2971 | magic5 == 0x50 && magic6 == 0x90) break; | ||
| 2972 | |||
| 2973 | currBlockNo++; | ||
| 2974 | if (magic1 != 0x31 || magic2 != 0x41 || | ||
| 2975 | magic3 != 0x59 || magic4 != 0x26 || | ||
| 2976 | magic5 != 0x53 || magic6 != 0x59) { | ||
| 2977 | bsFinishedWithStream(); | ||
| 2978 | fclose ( zStream ); | ||
| 2979 | fprintf ( stderr, | 540 | fprintf ( stderr, |
| 2980 | "\n%s, block %d: bad header (not == 0x314159265359)\n", | 541 | "\n%s: data integrity (CRC) error in data\n", |
| 2981 | inName, currBlockNo ); | 542 | inName ); |
| 2982 | return False; | 543 | return False; |
| 2983 | } | 544 | case BZ_MEM_ERROR: |
| 2984 | storedBlockCRC = bsGetUInt32 (); | 545 | outOfMemory(); |
| 2985 | 546 | case BZ_UNEXPECTED_EOF: | |
| 2986 | if (bsR(1) == 1) | 547 | fprintf ( stderr, |
| 2987 | blockRandomised = True; else | 548 | "\n%s: file ends unexpectedly\n", |
| 2988 | blockRandomised = False; | 549 | inName ); |
| 2989 | |||
| 2990 | if (verbosity >= 2) | ||
| 2991 | fprintf ( stderr, " block [%d: huff+mtf ", currBlockNo ); | ||
| 2992 | getAndMoveToFrontDecode (); | ||
| 2993 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | ||
| 2994 | |||
| 2995 | initialiseCRC(); | ||
| 2996 | if (verbosity >= 2) fprintf ( stderr, "rt+rld" ); | ||
| 2997 | undoReversibleTransformation_small ( NULL ); | ||
| 2998 | |||
| 2999 | computedBlockCRC = getFinalCRC(); | ||
| 3000 | if (verbosity >= 3) | ||
| 3001 | fprintf ( stderr, " {0x%x, 0x%x}", storedBlockCRC, computedBlockCRC ); | ||
| 3002 | if (verbosity >= 2) fprintf ( stderr, "] " ); | ||
| 3003 | |||
| 3004 | if (storedBlockCRC != computedBlockCRC) { | ||
| 3005 | bsFinishedWithStream(); | ||
| 3006 | fclose ( zStream ); | ||
| 3007 | fprintf ( stderr, "\n%s, block %d: computed CRC does not match stored one\n", | ||
| 3008 | inName, currBlockNo ); | ||
| 3009 | return False; | 550 | return False; |
| 3010 | } | 551 | case BZ_DATA_ERROR_MAGIC: |
| 3011 | 552 | if (streamNo == 1) { | |
| 3012 | if (verbosity >= 2) fprintf ( stderr, "ok\n" ); | 553 | fprintf ( stderr, |
| 3013 | computedCombinedCRC = (computedCombinedCRC << 1) | (computedCombinedCRC >> 31); | 554 | "\n%s: bad magic number (ie, not created by bzip2)\n", |
| 3014 | computedCombinedCRC ^= computedBlockCRC; | 555 | inName ); |
| 3015 | }; | 556 | return False; |
| 3016 | 557 | } else { | |
| 3017 | storedCombinedCRC = bsGetUInt32 (); | 558 | fprintf ( stderr, |
| 3018 | if (verbosity >= 2) | 559 | "\n%s: %s: trailing garbage after EOF ignored\n", |
| 3019 | fprintf ( stderr, | 560 | progName, inName ); |
| 3020 | " combined CRCs: stored = 0x%x, computed = 0x%x\n ", | 561 | return True; |
| 3021 | storedCombinedCRC, computedCombinedCRC ); | 562 | } |
| 3022 | if (storedCombinedCRC != computedCombinedCRC) { | 563 | default: |
| 3023 | bsFinishedWithStream(); | 564 | panic ( "test:unexpected error" ); |
| 3024 | fclose ( zStream ); | ||
| 3025 | fprintf ( stderr, "\n%s: computed CRC does not match stored one\n", | ||
| 3026 | inName ); | ||
| 3027 | return False; | ||
| 3028 | } | 565 | } |
| 3029 | 566 | ||
| 3030 | bsFinishedWithStream (); | 567 | panic ( "test:end" ); |
| 3031 | ERROR_IF_NOT_ZERO ( ferror(zStream) ); | 568 | return True; /*notreached*/ |
| 3032 | retVal = fclose ( zStream ); | ||
| 3033 | ERROR_IF_EOF ( retVal ); | ||
| 3034 | return True; | ||
| 3035 | } | 569 | } |
| 3036 | 570 | ||
| 3037 | 571 | ||
| 3038 | |||
| 3039 | /*---------------------------------------------------*/ | 572 | /*---------------------------------------------------*/ |
| 3040 | /*--- Error [non-] handling grunge ---*/ | 573 | /*--- Error [non-] handling grunge ---*/ |
| 3041 | /*---------------------------------------------------*/ | 574 | /*---------------------------------------------------*/ |
| @@ -3059,8 +592,7 @@ void showFileNames ( void ) | |||
| 3059 | fprintf ( | 592 | fprintf ( |
| 3060 | stderr, | 593 | stderr, |
| 3061 | "\tInput file = %s, output file = %s\n", | 594 | "\tInput file = %s, output file = %s\n", |
| 3062 | inName==NULL ? "(null)" : inName, | 595 | inName, outName |
| 3063 | outName==NULL ? "(null)" : outName | ||
| 3064 | ); | 596 | ); |
| 3065 | } | 597 | } |
| 3066 | 598 | ||
| @@ -3072,8 +604,7 @@ void cleanUpAndFail ( Int32 ec ) | |||
| 3072 | 604 | ||
| 3073 | if ( srcMode == SM_F2F && opMode != OM_TEST ) { | 605 | if ( srcMode == SM_F2F && opMode != OM_TEST ) { |
| 3074 | fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n", | 606 | fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n", |
| 3075 | progName, | 607 | progName, outName ); |
| 3076 | outName==NULL ? "(null)" : outName ); | ||
| 3077 | if (outputHandleJustInCase != NULL) | 608 | if (outputHandleJustInCase != NULL) |
| 3078 | fclose ( outputHandleJustInCase ); | 609 | fclose ( outputHandleJustInCase ); |
| 3079 | retVal = remove ( outName ); | 610 | retVal = remove ( outName ); |
| @@ -3108,11 +639,10 @@ void panic ( Char* s ) | |||
| 3108 | 639 | ||
| 3109 | 640 | ||
| 3110 | /*---------------------------------------------*/ | 641 | /*---------------------------------------------*/ |
| 3111 | void badBGLengths ( void ) | 642 | void crcError () |
| 3112 | { | 643 | { |
| 3113 | fprintf ( stderr, | 644 | fprintf ( stderr, |
| 3114 | "\n%s: error when reading background model code lengths,\n" | 645 | "\n%s: Data integrity error when decompressing.\n", |
| 3115 | "\twhich probably means the compressed file is corrupted.\n", | ||
| 3116 | progName ); | 646 | progName ); |
| 3117 | showFileNames(); | 647 | showFileNames(); |
| 3118 | cadvise(); | 648 | cadvise(); |
| @@ -3121,19 +651,6 @@ void badBGLengths ( void ) | |||
| 3121 | 651 | ||
| 3122 | 652 | ||
| 3123 | /*---------------------------------------------*/ | 653 | /*---------------------------------------------*/ |
| 3124 | void crcError ( UInt32 crcStored, UInt32 crcComputed ) | ||
| 3125 | { | ||
| 3126 | fprintf ( stderr, | ||
| 3127 | "\n%s: Data integrity error when decompressing.\n" | ||
| 3128 | "\tStored CRC = 0x%x, computed CRC = 0x%x\n", | ||
| 3129 | progName, crcStored, crcComputed ); | ||
| 3130 | showFileNames(); | ||
| 3131 | cadvise(); | ||
| 3132 | cleanUpAndFail( 2 ); | ||
| 3133 | } | ||
| 3134 | |||
| 3135 | |||
| 3136 | /*---------------------------------------------*/ | ||
| 3137 | void compressedStreamEOF ( void ) | 654 | void compressedStreamEOF ( void ) |
| 3138 | { | 655 | { |
| 3139 | fprintf ( stderr, | 656 | fprintf ( stderr, |
| @@ -3160,46 +677,6 @@ void ioError ( ) | |||
| 3160 | 677 | ||
| 3161 | 678 | ||
| 3162 | /*---------------------------------------------*/ | 679 | /*---------------------------------------------*/ |
| 3163 | void blockOverrun () | ||
| 3164 | { | ||
| 3165 | fprintf ( stderr, | ||
| 3166 | "\n%s: block overrun during decompression,\n" | ||
| 3167 | "\twhich probably means the compressed file\n" | ||
| 3168 | "\tis corrupted.\n", | ||
| 3169 | progName ); | ||
| 3170 | showFileNames(); | ||
| 3171 | cadvise(); | ||
| 3172 | cleanUpAndFail( 2 ); | ||
| 3173 | } | ||
| 3174 | |||
| 3175 | |||
| 3176 | /*---------------------------------------------*/ | ||
| 3177 | void badBlockHeader () | ||
| 3178 | { | ||
| 3179 | fprintf ( stderr, | ||
| 3180 | "\n%s: bad block header in the compressed file,\n" | ||
| 3181 | "\twhich probably means it is corrupted.\n", | ||
| 3182 | progName ); | ||
| 3183 | showFileNames(); | ||
| 3184 | cadvise(); | ||
| 3185 | cleanUpAndFail( 2 ); | ||
| 3186 | } | ||
| 3187 | |||
| 3188 | |||
| 3189 | /*---------------------------------------------*/ | ||
| 3190 | void bitStreamEOF () | ||
| 3191 | { | ||
| 3192 | fprintf ( stderr, | ||
| 3193 | "\n%s: read past the end of compressed data,\n" | ||
| 3194 | "\twhich probably means it is corrupted.\n", | ||
| 3195 | progName ); | ||
| 3196 | showFileNames(); | ||
| 3197 | cadvise(); | ||
| 3198 | cleanUpAndFail( 2 ); | ||
| 3199 | } | ||
| 3200 | |||
| 3201 | |||
| 3202 | /*---------------------------------------------*/ | ||
| 3203 | void mySignalCatcher ( IntNative n ) | 680 | void mySignalCatcher ( IntNative n ) |
| 3204 | { | 681 | { |
| 3205 | fprintf ( stderr, | 682 | fprintf ( stderr, |
| @@ -3233,27 +710,11 @@ void mySIGSEGVorSIGBUScatcher ( IntNative n ) | |||
| 3233 | 710 | ||
| 3234 | 711 | ||
| 3235 | /*---------------------------------------------*/ | 712 | /*---------------------------------------------*/ |
| 3236 | void uncompressOutOfMemory ( Int32 draw, Int32 blockSize ) | 713 | void outOfMemory ( void ) |
| 3237 | { | ||
| 3238 | fprintf ( stderr, | ||
| 3239 | "\n%s: Can't allocate enough memory for decompression.\n" | ||
| 3240 | "\tRequested %d bytes for a block size of %d.\n" | ||
| 3241 | "\tTry selecting space-economic decompress (with flag -s)\n" | ||
| 3242 | "\tand failing that, find a machine with more memory.\n", | ||
| 3243 | progName, draw, blockSize ); | ||
| 3244 | showFileNames(); | ||
| 3245 | cleanUpAndFail(1); | ||
| 3246 | } | ||
| 3247 | |||
| 3248 | |||
| 3249 | /*---------------------------------------------*/ | ||
| 3250 | void compressOutOfMemory ( Int32 draw, Int32 blockSize ) | ||
| 3251 | { | 714 | { |
| 3252 | fprintf ( stderr, | 715 | fprintf ( stderr, |
| 3253 | "\n%s: Can't allocate enough memory for compression.\n" | 716 | "\n%s: couldn't allocate enough memory\n", |
| 3254 | "\tRequested %d bytes for a block size of %d.\n" | 717 | progName ); |
| 3255 | "\tTry selecting a small block size (with flag -s).\n", | ||
| 3256 | progName, draw, blockSize ); | ||
| 3257 | showFileNames(); | 718 | showFileNames(); |
| 3258 | cleanUpAndFail(1); | 719 | cleanUpAndFail(1); |
| 3259 | } | 720 | } |
| @@ -3274,6 +735,24 @@ void pad ( Char *s ) | |||
| 3274 | 735 | ||
| 3275 | 736 | ||
| 3276 | /*---------------------------------------------*/ | 737 | /*---------------------------------------------*/ |
| 738 | void copyFileName ( Char* to, Char* from ) | ||
| 739 | { | ||
| 740 | if ( strlen(from) > FILE_NAME_LEN-10 ) { | ||
| 741 | fprintf ( | ||
| 742 | stderr, | ||
| 743 | "bzip2: file name\n`%s'\nis suspiciously (> 1024 chars) long.\n" | ||
| 744 | "Try using a reasonable file name instead. Sorry! :)\n", | ||
| 745 | from | ||
| 746 | ); | ||
| 747 | exit(1); | ||
| 748 | } | ||
| 749 | |||
| 750 | strncpy(to,from,FILE_NAME_LEN-10); | ||
| 751 | to[FILE_NAME_LEN-10]='\0'; | ||
| 752 | } | ||
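Review note: copyFileName replaces the unchecked strcpy calls with a length check followed by a bounded copy. The limit is FILE_NAME_LEN-10, which leaves headroom for the later `strcat ( outName, ".bz2" )`. The sketch below shows the same pattern with an error return in place of `exit(1)`. `NAME_CAP` and `copy_file_name` are hypothetical names, and the value 1034 is an assumption inferred from the "> 1024 chars" message, not taken from this hunk.

```c
#include <assert.h>
#include <string.h>

#define NAME_CAP 1034  /* assumed stand-in for FILE_NAME_LEN */

/* Hypothetical helper (not in bzip2): copy `from` into `to`, which must
   hold at least NAME_CAP bytes. Returns 0 on success, or -1 if `from` is
   longer than NAME_CAP-10 characters, in which case `to` is untouched. */
int copy_file_name(char *to, const char *from)
{
    assert(to != NULL && from != NULL);
    if (strlen(from) > NAME_CAP - 10) return -1;

    /* strncpy does not terminate the destination when the source fills
       the whole limit, so the terminator is written explicitly. */
    strncpy(to, from, NAME_CAP - 10);
    to[NAME_CAP - 10] = '\0';
    return 0;
}
```

After a successful copy the buffer still has at least 9 spare bytes, enough to append a four-character suffix such as `.bz2`.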
| 753 | |||
| 754 | |||
| 755 | /*---------------------------------------------*/ | ||
| 3277 | Bool fileExists ( Char* name ) | 756 | Bool fileExists ( Char* name ) |
| 3278 | { | 757 | { |
| 3279 | FILE *tmp = fopen ( name, "rb" ); | 758 | FILE *tmp = fopen ( name, "rb" ); |
| @@ -3287,7 +766,7 @@ Bool fileExists ( Char* name ) | |||
| 3287 | /*-- | 766 | /*-- |
| 3288 | if in doubt, return True | 767 | if in doubt, return True |
| 3289 | --*/ | 768 | --*/ |
| 3290 | Bool notABogStandardFile ( Char* name ) | 769 | Bool notAStandardFile ( Char* name ) |
| 3291 | { | 770 | { |
| 3292 | IntNative i; | 771 | IntNative i; |
| 3293 | struct MY_STAT statBuf; | 772 | struct MY_STAT statBuf; |
| @@ -3300,9 +779,9 @@ Bool notABogStandardFile ( Char* name ) | |||
| 3300 | 779 | ||
| 3301 | 780 | ||
| 3302 | /*---------------------------------------------*/ | 781 | /*---------------------------------------------*/ |
| 3303 | void copyDateAndPermissions ( Char *srcName, Char *dstName ) | 782 | void copyDatePermissionsAndOwner ( Char *srcName, Char *dstName ) |
| 3304 | { | 783 | { |
| 3305 | #if BZ_UNIX | 784 | #if BZ_UNIX |
| 3306 | IntNative retVal; | 785 | IntNative retVal; |
| 3307 | struct MY_STAT statBuf; | 786 | struct MY_STAT statBuf; |
| 3308 | struct utimbuf uTimBuf; | 787 | struct utimbuf uTimBuf; |
| @@ -3314,13 +793,34 @@ void copyDateAndPermissions ( Char *srcName, Char *dstName ) | |||
| 3314 | 793 | ||
| 3315 | retVal = chmod ( dstName, statBuf.st_mode ); | 794 | retVal = chmod ( dstName, statBuf.st_mode ); |
| 3316 | ERROR_IF_NOT_ZERO ( retVal ); | 795 | ERROR_IF_NOT_ZERO ( retVal ); |
| 796 | /* Not sure if this is really portable or not. Causes | ||
| 797 | problems on my x86-Linux Redhat 5.0 box. Decided | ||
| 798 | to omit it from 0.9.0. JRS, 27 June 98. If you | ||
| 799 | understand Unix file semantics and portability issues | ||
| 800 | well enough to fix this properly, drop me a line | ||
| 801 | at jseward@acm.org. | ||
| 802 | retVal = chown ( dstName, statBuf.st_uid, statBuf.st_gid ); | ||
| 803 | ERROR_IF_NOT_ZERO ( retVal ); | ||
| 804 | */ | ||
| 3317 | retVal = utime ( dstName, &uTimBuf ); | 805 | retVal = utime ( dstName, &uTimBuf ); |
| 3318 | ERROR_IF_NOT_ZERO ( retVal ); | 806 | ERROR_IF_NOT_ZERO ( retVal ); |
| 3319 | #endif | 807 | #endif |
| 3320 | } | 808 | } |
| 3321 | 809 | ||
| 3322 | 810 | ||
| 3323 | /*---------------------------------------------*/ | 811 | /*---------------------------------------------*/ |
| 812 | void setInterimPermissions ( Char *dstName ) | ||
| 813 | { | ||
| 814 | #if BZ_UNIX | ||
| 815 | IntNative retVal; | ||
| 816 | retVal = chmod ( dstName, S_IRUSR | S_IWUSR ); | ||
| 817 | ERROR_IF_NOT_ZERO ( retVal ); | ||
| 818 | #endif | ||
| 819 | } | ||
| 820 | |||
| 821 | |||
| 822 | |||
| 823 | /*---------------------------------------------*/ | ||
| 3324 | Bool endsInBz2 ( Char* name ) | 824 | Bool endsInBz2 ( Char* name ) |
| 3325 | { | 825 | { |
| 3326 | Int32 n = strlen ( name ); | 826 | Int32 n = strlen ( name ); |
| @@ -3353,13 +853,13 @@ void compress ( Char *name ) | |||
| 3353 | panic ( "compress: bad modes\n" ); | 853 | panic ( "compress: bad modes\n" ); |
| 3354 | 854 | ||
| 3355 | switch (srcMode) { | 855 | switch (srcMode) { |
| 3356 | case SM_I2O: strcpy ( inName, "(stdin)" ); | 856 | case SM_I2O: copyFileName ( inName, "(stdin)" ); |
| 3357 | strcpy ( outName, "(stdout)" ); break; | 857 | copyFileName ( outName, "(stdout)" ); break; |
| 3358 | case SM_F2F: strcpy ( inName, name ); | 858 | case SM_F2F: copyFileName ( inName, name ); |
| 3359 | strcpy ( outName, name ); | 859 | copyFileName ( outName, name ); |
| 3360 | strcat ( outName, ".bz2" ); break; | 860 | strcat ( outName, ".bz2" ); break; |
| 3361 | case SM_F2O: strcpy ( inName, name ); | 861 | case SM_F2O: copyFileName ( inName, name ); |
| 3362 | strcpy ( outName, "(stdout)" ); break; | 862 | copyFileName ( outName, "(stdout)" ); break; |
| 3363 | } | 863 | } |
| 3364 | 864 | ||
| 3365 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { | 865 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { |
| @@ -3377,12 +877,12 @@ void compress ( Char *name ) | |||
| 3377 | progName, inName ); | 877 | progName, inName ); |
| 3378 | return; | 878 | return; |
| 3379 | } | 879 | } |
| 3380 | if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { | 880 | if ( srcMode != SM_I2O && notAStandardFile ( inName )) { |
| 3381 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", | 881 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", |
| 3382 | progName, inName ); | 882 | progName, inName ); |
| 3383 | return; | 883 | return; |
| 3384 | } | 884 | } |
| 3385 | if ( srcMode == SM_F2F && fileExists ( outName ) ) { | 885 | if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) { |
| 3386 | fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", | 886 | fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", |
| 3387 | progName, outName ); | 887 | progName, outName ); |
| 3388 | return; | 888 | return; |
| @@ -3434,6 +934,7 @@ void compress ( Char *name ) | |||
| 3434 | progName, inName ); | 934 | progName, inName ); |
| 3435 | return; | 935 | return; |
| 3436 | }; | 936 | }; |
| 937 | setInterimPermissions ( outName ); | ||
| 3437 | break; | 938 | break; |
| 3438 | 939 | ||
| 3439 | default: | 940 | default: |
| @@ -3454,7 +955,7 @@ void compress ( Char *name ) | |||
| 3454 | 955 | ||
| 3455 | /*--- If there was an I/O error, we won't get here. ---*/ | 956 | /*--- If there was an I/O error, we won't get here. ---*/ |
| 3456 | if ( srcMode == SM_F2F ) { | 957 | if ( srcMode == SM_F2F ) { |
| 3457 | copyDateAndPermissions ( inName, outName ); | 958 | copyDatePermissionsAndOwner ( inName, outName ); |
| 3458 | if ( !keepInputFiles ) { | 959 | if ( !keepInputFiles ) { |
| 3459 | IntNative retVal = remove ( inName ); | 960 | IntNative retVal = remove ( inName ); |
| 3460 | ERROR_IF_NOT_ZERO ( retVal ); | 961 | ERROR_IF_NOT_ZERO ( retVal ); |
| @@ -3474,15 +975,15 @@ void uncompress ( Char *name ) | |||
| 3474 | panic ( "uncompress: bad modes\n" ); | 975 | panic ( "uncompress: bad modes\n" ); |
| 3475 | 976 | ||
| 3476 | switch (srcMode) { | 977 | switch (srcMode) { |
| 3477 | case SM_I2O: strcpy ( inName, "(stdin)" ); | 978 | case SM_I2O: copyFileName ( inName, "(stdin)" ); |
| 3478 | strcpy ( outName, "(stdout)" ); break; | 979 | copyFileName ( outName, "(stdout)" ); break; |
| 3479 | case SM_F2F: strcpy ( inName, name ); | 980 | case SM_F2F: copyFileName ( inName, name ); |
| 3480 | strcpy ( outName, name ); | 981 | copyFileName ( outName, name ); |
| 3481 | if (endsInBz2 ( outName )) | 982 | if (endsInBz2 ( outName )) |
| 3482 | outName [ strlen ( outName ) - 4 ] = '\0'; | 983 | outName [ strlen ( outName ) - 4 ] = '\0'; |
| 3483 | break; | 984 | break; |
| 3484 | case SM_F2O: strcpy ( inName, name ); | 985 | case SM_F2O: copyFileName ( inName, name ); |
| 3485 | strcpy ( outName, "(stdout)" ); break; | 986 | copyFileName ( outName, "(stdout)" ); break; |
| 3486 | } | 987 | } |
| 3487 | 988 | ||
| 3488 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { | 989 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { |
| @@ -3501,12 +1002,12 @@ void uncompress ( Char *name ) | |||
| 3501 | progName, inName ); | 1002 | progName, inName ); |
| 3502 | return; | 1003 | return; |
| 3503 | } | 1004 | } |
| 3504 | if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { | 1005 | if ( srcMode != SM_I2O && notAStandardFile ( inName )) { |
| 3505 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", | 1006 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", |
| 3506 | progName, inName ); | 1007 | progName, inName ); |
| 3507 | return; | 1008 | return; |
| 3508 | } | 1009 | } |
| 3509 | if ( srcMode == SM_F2F && fileExists ( outName ) ) { | 1010 | if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) { |
| 3510 | fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", | 1011 | fprintf ( stderr, "%s: Output file %s already exists, skipping.\n", |
| 3511 | progName, outName ); | 1012 | progName, outName ); |
| 3512 | return; | 1013 | return; |
| @@ -3550,6 +1051,7 @@ void uncompress ( Char *name ) | |||
| 3550 | progName, inName ); | 1051 | progName, inName ); |
| 3551 | return; | 1052 | return; |
| 3552 | }; | 1053 | }; |
| 1054 | setInterimPermissions ( outName ); | ||
| 3553 | break; | 1055 | break; |
| 3554 | 1056 | ||
| 3555 | default: | 1057 | default: |
| @@ -3571,7 +1073,7 @@ void uncompress ( Char *name ) | |||
| 3571 | /*--- If there was an I/O error, we won't get here. ---*/ | 1073 | /*--- If there was an I/O error, we won't get here. ---*/ |
| 3572 | if ( magicNumberOK ) { | 1074 | if ( magicNumberOK ) { |
| 3573 | if ( srcMode == SM_F2F ) { | 1075 | if ( srcMode == SM_F2F ) { |
| 3574 | copyDateAndPermissions ( inName, outName ); | 1076 | copyDatePermissionsAndOwner ( inName, outName ); |
| 3575 | if ( !keepInputFiles ) { | 1077 | if ( !keepInputFiles ) { |
| 3576 | IntNative retVal = remove ( inName ); | 1078 | IntNative retVal = remove ( inName ); |
| 3577 | ERROR_IF_NOT_ZERO ( retVal ); | 1079 | ERROR_IF_NOT_ZERO ( retVal ); |
| @@ -3607,11 +1109,11 @@ void testf ( Char *name ) | |||
| 3607 | if (name == NULL && srcMode != SM_I2O) | 1109 | if (name == NULL && srcMode != SM_I2O) |
| 3608 | panic ( "testf: bad modes\n" ); | 1110 | panic ( "testf: bad modes\n" ); |
| 3609 | 1111 | ||
| 3610 | strcpy ( outName, "(none)" ); | 1112 | copyFileName ( outName, "(none)" ); |
| 3611 | switch (srcMode) { | 1113 | switch (srcMode) { |
| 3612 | case SM_I2O: strcpy ( inName, "(stdin)" ); break; | 1114 | case SM_I2O: copyFileName ( inName, "(stdin)" ); break; |
| 3613 | case SM_F2F: strcpy ( inName, name ); break; | 1115 | case SM_F2F: copyFileName ( inName, name ); break; |
| 3614 | case SM_F2O: strcpy ( inName, name ); break; | 1116 | case SM_F2O: copyFileName ( inName, name ); break; |
| 3615 | } | 1117 | } |
| 3616 | 1118 | ||
| 3617 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { | 1119 | if ( srcMode != SM_I2O && containsDubiousChars ( inName ) ) { |
| @@ -3630,7 +1132,7 @@ void testf ( Char *name ) | |||
| 3630 | progName, inName ); | 1132 | progName, inName ); |
| 3631 | return; | 1133 | return; |
| 3632 | } | 1134 | } |
| 3633 | if ( srcMode != SM_I2O && notABogStandardFile ( inName )) { | 1135 | if ( srcMode != SM_I2O && notAStandardFile ( inName )) { |
| 3634 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", | 1136 | fprintf ( stderr, "%s: Input file %s is not a normal file, skipping.\n", |
| 3635 | progName, inName ); | 1137 | progName, inName ); |
| 3636 | return; | 1138 | return; |
| @@ -3684,25 +1186,18 @@ void license ( void ) | |||
| 3684 | fprintf ( stderr, | 1186 | fprintf ( stderr, |
| 3685 | 1187 | ||
| 3686 | "bzip2, a block-sorting file compressor. " | 1188 | "bzip2, a block-sorting file compressor. " |
| 3687 | "Version 0.1pl2, 29-Aug-97.\n" | 1189 | "Version 0.9.0c, 18-Oct-98.\n" |
| 3688 | " \n" | 1190 | " \n" |
| 3689 | " Copyright (C) 1996, 1997 by Julian Seward.\n" | 1191 | " Copyright (C) 1996, 1997, 1998 by Julian Seward.\n" |
| 3690 | " \n" | 1192 | " \n" |
| 3691 | " This program is free software; you can redistribute it and/or modify\n" | 1193 | " This program is free software; you can redistribute it and/or modify\n" |
| 3692 | " it under the terms of the GNU General Public License as published by\n" | 1194 | " it under the terms set out in the LICENSE file, which is included\n" |
| 3693 | " the Free Software Foundation; either version 2 of the License, or\n" | 1195 | " in the bzip2-0.9.0c source distribution.\n" |
| 3694 | " (at your option) any later version.\n" | ||
| 3695 | " \n" | 1196 | " \n" |
| 3696 | " This program is distributed in the hope that it will be useful,\n" | 1197 | " This program is distributed in the hope that it will be useful,\n" |
| 3697 | " but WITHOUT ANY WARRANTY; without even the implied warranty of\n" | 1198 | " but WITHOUT ANY WARRANTY; without even the implied warranty of\n" |
| 3698 | " MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n" | 1199 | " MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n" |
| 3699 | " GNU General Public License for more details.\n" | 1200 | " LICENSE file for more details.\n" |
| 3700 | " \n" | ||
| 3701 | " You should have received a copy of the GNU General Public License\n" | ||
| 3702 | " along with this program; if not, write to the Free Software\n" | ||
| 3703 | " Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.\n" | ||
| 3704 | " \n" | ||
| 3705 | " The GNU General Public License is contained in the file LICENSE.\n" | ||
| 3706 | " \n" | 1201 | " \n" |
| 3707 | ); | 1202 | ); |
| 3708 | } | 1203 | } |
| @@ -3714,16 +1209,17 @@ void usage ( Char *fullProgName ) | |||
| 3714 | fprintf ( | 1209 | fprintf ( |
| 3715 | stderr, | 1210 | stderr, |
| 3716 | "bzip2, a block-sorting file compressor. " | 1211 | "bzip2, a block-sorting file compressor. " |
| 3717 | "Version 0.1pl2, 29-Aug-97.\n" | 1212 | "Version 0.9.0c, 18-Oct-98.\n" |
| 3718 | "\n usage: %s [flags and input files in any order]\n" | 1213 | "\n usage: %s [flags and input files in any order]\n" |
| 3719 | "\n" | 1214 | "\n" |
| 3720 | " -h --help print this message\n" | 1215 | " -h --help print this message\n" |
| 3721 | " -d --decompress force decompression\n" | 1216 | " -d --decompress force decompression\n" |
| 3722 | " -f --compress force compression\n" | 1217 | " -z --compress force compression\n" |
| 1218 | " -k --keep keep (don't delete) input files\n" | ||
| 1219 | " -f --force overwrite existing output files\n" | ||
| 3723 | " -t --test test compressed file integrity\n" | 1220 | " -t --test test compressed file integrity\n" |
| 3724 | " -c --stdout output to standard out\n" | 1221 | " -c --stdout output to standard out\n" |
| 3725 | " -v --verbose be verbose (a 2nd -v gives more)\n" | 1222 | " -v --verbose be verbose (a 2nd -v gives more)\n" |
| 3726 | " -k --keep keep (don't delete) input files\n" | ||
| 3727 | " -L --license display software version & license\n" | 1223 | " -L --license display software version & license\n" |
| 3728 | " -V --version display software version & license\n" | 1224 | " -V --version display software version & license\n" |
| 3729 | " -s --small use less memory (at most 2500k)\n" | 1225 | " -s --small use less memory (at most 2500k)\n" |
| @@ -3731,15 +1227,16 @@ void usage ( Char *fullProgName ) | |||
| 3731 | " --repetitive-fast compress repetitive blocks faster\n" | 1227 | " --repetitive-fast compress repetitive blocks faster\n" |
| 3732 | " --repetitive-best compress repetitive blocks better\n" | 1228 | " --repetitive-best compress repetitive blocks better\n" |
| 3733 | "\n" | 1229 | "\n" |
| 3734 | " If invoked as `bzip2', the default action is to compress.\n" | 1230 | " If invoked as `bzip2', default action is to compress.\n" |
| 3735 | " as `bunzip2', the default action is to decompress.\n" | 1231 | " as `bunzip2', default action is to decompress.\n" |
| 1232 | " as `bz2cat', default action is to decompress to stdout.\n" | ||
| 3736 | "\n" | 1233 | "\n" |
| 3737 | " If no file names are given, bzip2 compresses or decompresses\n" | 1234 | " If no file names are given, bzip2 compresses or decompresses\n" |
| 3738 | " from standard input to standard output. You can combine\n" | 1235 | " from standard input to standard output. You can combine\n" |
| 3739 | " flags, so `-v -4' means the same as -v4 or -4v, &c.\n" | 1236 | " short flags, so `-v -4' means the same as -v4 or -4v, &c.\n" |
| 3740 | #if BZ_UNIX | 1237 | #if BZ_UNIX |
| 3741 | "\n" | 1238 | "\n" |
| 3742 | #endif | 1239 | #endif |
| 3743 | , | 1240 | , |
| 3744 | 1241 | ||
| 3745 | fullProgName | 1242 | fullProgName |
| @@ -3776,14 +1273,7 @@ void *myMalloc ( Int32 n ) | |||
| 3776 | void* p; | 1273 | void* p; |
| 3777 | 1274 | ||
| 3778 | p = malloc ( (size_t)n ); | 1275 | p = malloc ( (size_t)n ); |
| 3779 | if (p == NULL) { | 1276 | if (p == NULL) outOfMemory (); |
| 3780 | fprintf ( | ||
| 3781 | stderr, | ||
| 3782 | "%s: `malloc' failed on request for %d bytes.\n", | ||
| 3783 | progName, n | ||
| 3784 | ); | ||
| 3785 | exit ( 1 ); | ||
| 3786 | } | ||
| 3787 | return p; | 1277 | return p; |
| 3788 | } | 1278 | } |
| 3789 | 1279 | ||
| @@ -3817,7 +1307,6 @@ Cell *snocString ( Cell *root, Char *name ) | |||
| 3817 | } | 1307 | } |
| 3818 | 1308 | ||
| 3819 | 1309 | ||
| 3820 | |||
| 3821 | /*---------------------------------------------*/ | 1310 | /*---------------------------------------------*/ |
| 3822 | #define ISFLAG(s) (strcmp(aa->name, (s))==0) | 1311 | #define ISFLAG(s) (strcmp(aa->name, (s))==0) |
| 3823 | 1312 | ||
| @@ -3829,11 +1318,6 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3829 | Cell *argList; | 1318 | Cell *argList; |
| 3830 | Cell *aa; | 1319 | Cell *aa; |
| 3831 | 1320 | ||
| 3832 | |||
| 3833 | #if DEBUG | ||
| 3834 | fprintf ( stderr, "bzip2: *** compiled with debugging ON ***\n" ); | ||
| 3835 | #endif | ||
| 3836 | |||
| 3837 | /*-- Be really really really paranoid :-) --*/ | 1321 | /*-- Be really really really paranoid :-) --*/ |
| 3838 | if (sizeof(Int32) != 4 || sizeof(UInt32) != 4 || | 1322 | if (sizeof(Int32) != 4 || sizeof(UInt32) != 4 || |
| 3839 | sizeof(Int16) != 2 || sizeof(UInt16) != 2 || | 1323 | sizeof(Int16) != 2 || sizeof(UInt16) != 2 || |
| @@ -3844,7 +1328,7 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3844 | "\tof 4, 2 and 1 bytes to run properly, and they don't.\n" | 1328 | "\tof 4, 2 and 1 bytes to run properly, and they don't.\n" |
| 3845 | "\tProbably you can fix this by defining them correctly,\n" | 1329 | "\tProbably you can fix this by defining them correctly,\n" |
| 3846 | "\tand recompiling. Bye!\n" ); | 1330 | "\tand recompiling. Bye!\n" ); |
| 3847 | exit(1); | 1331 | exit(3); |
| 3848 | } | 1332 | } |
| 3849 | 1333 | ||
| 3850 | 1334 | ||
| @@ -3852,35 +1336,28 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3852 | signal (SIGINT, mySignalCatcher); | 1336 | signal (SIGINT, mySignalCatcher); |
| 3853 | signal (SIGTERM, mySignalCatcher); | 1337 | signal (SIGTERM, mySignalCatcher); |
| 3854 | signal (SIGSEGV, mySIGSEGVorSIGBUScatcher); | 1338 | signal (SIGSEGV, mySIGSEGVorSIGBUScatcher); |
| 3855 | #if BZ_UNIX | 1339 | #if BZ_UNIX |
| 3856 | signal (SIGHUP, mySignalCatcher); | 1340 | signal (SIGHUP, mySignalCatcher); |
| 3857 | signal (SIGBUS, mySIGSEGVorSIGBUScatcher); | 1341 | signal (SIGBUS, mySIGSEGVorSIGBUScatcher); |
| 3858 | #endif | 1342 | #endif |
| 3859 | 1343 | ||
| 3860 | 1344 | ||
| 3861 | /*-- Initialise --*/ | 1345 | /*-- Initialise --*/ |
| 3862 | outputHandleJustInCase = NULL; | 1346 | outputHandleJustInCase = NULL; |
| 3863 | ftab = NULL; | ||
| 3864 | ll4 = NULL; | ||
| 3865 | ll16 = NULL; | ||
| 3866 | ll8 = NULL; | ||
| 3867 | tt = NULL; | ||
| 3868 | block = NULL; | ||
| 3869 | zptr = NULL; | ||
| 3870 | smallMode = False; | 1347 | smallMode = False; |
| 3871 | keepInputFiles = False; | 1348 | keepInputFiles = False; |
| 1349 | forceOverwrite = False; | ||
| 3872 | verbosity = 0; | 1350 | verbosity = 0; |
| 3873 | blockSize100k = 9; | 1351 | blockSize100k = 9; |
| 3874 | testFailsExist = False; | 1352 | testFailsExist = False; |
| 3875 | bsStream = NULL; | ||
| 3876 | numFileNames = 0; | 1353 | numFileNames = 0; |
| 3877 | numFilesProcessed = 0; | 1354 | numFilesProcessed = 0; |
| 3878 | workFactor = 30; | 1355 | workFactor = 30; |
| 3879 | 1356 | ||
| 3880 | strcpy ( inName, "(none)" ); | 1357 | copyFileName ( inName, "(none)" ); |
| 3881 | strcpy ( outName, "(none)" ); | 1358 | copyFileName ( outName, "(none)" ); |
| 3882 | 1359 | ||
| 3883 | strcpy ( progNameReally, argv[0] ); | 1360 | copyFileName ( progNameReally, argv[0] ); |
| 3884 | progName = &progNameReally[0]; | 1361 | progName = &progNameReally[0]; |
| 3885 | for (tmp = &progNameReally[0]; *tmp != '\0'; tmp++) | 1362 | for (tmp = &progNameReally[0]; *tmp != '\0'; tmp++) |
| 3886 | if (*tmp == PATH_SEP) progName = tmp + 1; | 1363 | if (*tmp == PATH_SEP) progName = tmp + 1; |
| @@ -3903,20 +1380,26 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3903 | } | 1380 | } |
| 3904 | 1381 | ||
| 3905 | 1382 | ||
| 3906 | /*-- Determine what to do (compress/uncompress/test). --*/ | 1383 | /*-- Determine source modes; flag handling may change this too. --*/ |
| 1384 | if (numFileNames == 0) | ||
| 1385 | srcMode = SM_I2O; else srcMode = SM_F2F; | ||
| 1386 | |||
| 1387 | |||
| 1388 | /*-- Determine what to do (compress/uncompress/test/cat). --*/ | ||
| 3907 | /*-- Note that subsequent flag handling may change this. --*/ | 1389 | /*-- Note that subsequent flag handling may change this. --*/ |
| 3908 | opMode = OM_Z; | 1390 | opMode = OM_Z; |
| 3909 | 1391 | ||
| 3910 | if ( (strcmp ( "bunzip2", progName ) == 0) || | 1392 | if ( (strstr ( progName, "unzip" ) != 0) || |
| 3911 | (strcmp ( "BUNZIP2", progName ) == 0) || | 1393 | (strstr ( progName, "UNZIP" ) != 0) ) |
| 3912 | (strcmp ( "bunzip2.exe", progName ) == 0) || | ||
| 3913 | (strcmp ( "BUNZIP2.EXE", progName ) == 0) ) | ||
| 3914 | opMode = OM_UNZ; | 1394 | opMode = OM_UNZ; |
| 3915 | 1395 | ||
| 3916 | 1396 | if ( (strstr ( progName, "z2cat" ) != 0) || | |
| 3917 | /*-- Determine source modes; flag handling may change this too. --*/ | 1397 | (strstr ( progName, "Z2CAT" ) != 0) || |
| 3918 | if (numFileNames == 0) | 1398 | (strstr ( progName, "zcat" ) != 0) || |
| 3919 | srcMode = SM_I2O; else srcMode = SM_F2F; | 1399 | (strstr ( progName, "ZCAT" ) != 0) ) { |
| 1400 | opMode = OM_UNZ; | ||
| 1401 | srcMode = (numFileNames == 0) ? SM_I2O : SM_F2O; | ||
| 1402 | } | ||
| 3920 | 1403 | ||
| 3921 | 1404 | ||
| 3922 | /*-- Look at the flags. --*/ | 1405 | /*-- Look at the flags. --*/ |
| @@ -3926,7 +1409,8 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3926 | switch (aa->name[j]) { | 1409 | switch (aa->name[j]) { |
| 3927 | case 'c': srcMode = SM_F2O; break; | 1410 | case 'c': srcMode = SM_F2O; break; |
| 3928 | case 'd': opMode = OM_UNZ; break; | 1411 | case 'd': opMode = OM_UNZ; break; |
| 3929 | case 'f': opMode = OM_Z; break; | 1412 | case 'z': opMode = OM_Z; break; |
| 1413 | case 'f': forceOverwrite = True; break; | ||
| 3930 | case 't': opMode = OM_TEST; break; | 1414 | case 't': opMode = OM_TEST; break; |
| 3931 | case 'k': keepInputFiles = True; break; | 1415 | case 'k': keepInputFiles = True; break; |
| 3932 | case 's': smallMode = True; break; | 1416 | case 's': smallMode = True; break; |
| @@ -3957,6 +1441,7 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3957 | if (ISFLAG("--stdout")) srcMode = SM_F2O; else | 1441 | if (ISFLAG("--stdout")) srcMode = SM_F2O; else |
| 3958 | if (ISFLAG("--decompress")) opMode = OM_UNZ; else | 1442 | if (ISFLAG("--decompress")) opMode = OM_UNZ; else |
| 3959 | if (ISFLAG("--compress")) opMode = OM_Z; else | 1443 | if (ISFLAG("--compress")) opMode = OM_Z; else |
| 1444 | if (ISFLAG("--force")) forceOverwrite = True; else | ||
| 3960 | if (ISFLAG("--test")) opMode = OM_TEST; else | 1445 | if (ISFLAG("--test")) opMode = OM_TEST; else |
| 3961 | if (ISFLAG("--keep")) keepInputFiles = True; else | 1446 | if (ISFLAG("--keep")) keepInputFiles = True; else |
| 3962 | if (ISFLAG("--small")) smallMode = True; else | 1447 | if (ISFLAG("--small")) smallMode = True; else |
| @@ -3974,14 +1459,9 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3974 | } | 1459 | } |
| 3975 | } | 1460 | } |
| 3976 | 1461 | ||
| 1462 | if (verbosity > 4) verbosity = 4; | ||
| 3977 | if (opMode == OM_Z && smallMode) blockSize100k = 2; | 1463 | if (opMode == OM_Z && smallMode) blockSize100k = 2; |
| 3978 | 1464 | ||
| 3979 | if (opMode == OM_Z && srcMode == SM_F2O && numFileNames > 1) { | ||
| 3980 | fprintf ( stderr, "%s: I won't compress multiple files to stdout.\n", | ||
| 3981 | progName ); | ||
| 3982 | exit ( 1 ); | ||
| 3983 | } | ||
| 3984 | |||
| 3985 | if (srcMode == SM_F2O && numFileNames == 0) { | 1465 | if (srcMode == SM_F2O && numFileNames == 0) { |
| 3986 | fprintf ( stderr, "%s: -c expects at least one filename.\n", | 1466 | fprintf ( stderr, "%s: -c expects at least one filename.\n", |
| 3987 | progName ); | 1467 | progName ); |
| @@ -3997,7 +1477,6 @@ IntNative main ( IntNative argc, Char *argv[] ) | |||
| 3997 | if (opMode != OM_Z) blockSize100k = 0; | 1477 | if (opMode != OM_Z) blockSize100k = 0; |
| 3998 | 1478 | ||
| 3999 | if (opMode == OM_Z) { | 1479 | if (opMode == OM_Z) { |
| 4000 | allocateCompressStructures(); | ||
| 4001 | if (srcMode == SM_I2O) | 1480 | if (srcMode == SM_I2O) |
| 4002 | compress ( NULL ); | 1481 | compress ( NULL ); |
| 4003 | else | 1482 | else |
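The `main` hunks above replace the exact `strcmp` checks on the program name with substring checks, and add a "cat" personality that decompresses to stdout. A minimal sketch of that dispatch follows, assuming hypothetical enum definitions and a hypothetical helper `modesFromName`; the diff shows only the constant names `OM_Z`, `OM_UNZ`, `SM_I2O`, `SM_F2O` and `SM_F2F`, not how they are declared.

```c
#include <string.h>

/* Hypothetical declarations for this sketch; the diff only shows the names. */
typedef enum { OM_Z, OM_UNZ, OM_TEST } OpMode;
typedef enum { SM_I2O, SM_F2O, SM_F2F } SrcMode;

typedef struct {
    OpMode  op;   /* compress, decompress or test */
    SrcMode src;  /* stdin->stdout, file->stdout or file->file */
} Modes;

/* Choose default modes from the program's base name and the number of
   file names on the command line. Flags may override these later. */
static Modes modesFromName(const char *progName, int numFileNames)
{
    Modes m;

    /* Source mode first: no file names means stdin to stdout. */
    m.src = (numFileNames == 0) ? SM_I2O : SM_F2F;

    /* Default action is to compress. */
    m.op = OM_Z;

    /* Any name containing "unzip" (e.g. bunzip2, bunzip2.exe) decompresses. */
    if (strstr(progName, "unzip") != NULL ||
        strstr(progName, "UNZIP") != NULL)
        m.op = OM_UNZ;

    /* Any name containing "z2cat" or "zcat" decompresses to stdout. */
    if (strstr(progName, "z2cat") != NULL ||
        strstr(progName, "Z2CAT") != NULL ||
        strstr(progName, "zcat")  != NULL ||
        strstr(progName, "ZCAT")  != NULL) {
        m.op  = OM_UNZ;
        m.src = (numFileNames == 0) ? SM_I2O : SM_F2O;
    }

    return m;
}
```

Substring matching means that names with a suffix or different prefix still select the intended mode, at the cost of also matching unrelated names that happen to contain one of these fragments.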
| @@ -1,22 +1,22 @@ | |||
| 1 | 1 | ||
| 2 | |||
| 3 | |||
| 4 | bzip2(1) bzip2(1) | 2 | bzip2(1) bzip2(1) |
| 5 | 3 | ||
| 6 | 4 | ||
| 7 | NAME | 5 | NAME |
| 8 | bzip2, bunzip2 - a block-sorting file compressor, v0.1 | 6 | bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 |
| 7 | bzcat - decompresses files to stdout | ||
| 9 | bzip2recover - recovers data from damaged bzip2 files | 8 | bzip2recover - recovers data from damaged bzip2 files |
| 10 | 9 | ||
| 11 | 10 | ||
| 12 | SYNOPSIS | 11 | SYNOPSIS |
| 13 | bzip2 [ -cdfkstvVL123456789 ] [ filenames ... ] | 12 | bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ] |
| 14 | bunzip2 [ -kvsVL ] [ filenames ... ] | 13 | bunzip2 [ -fkvsVL ] [ filenames ... ] |
| 14 | bzcat [ -s ] [ filenames ... ] | ||
| 15 | bzip2recover filename | 15 | bzip2recover filename |
| 16 | 16 | ||
| 17 | 17 | ||
| 18 | DESCRIPTION | 18 | DESCRIPTION |
| 19 | Bzip2 compresses files using the Burrows-Wheeler block- | 19 | bzip2 compresses files using the Burrows-Wheeler block- |
| 20 | sorting text compression algorithm, and Huffman coding. | 20 | sorting text compression algorithm, and Huffman coding. |
| 21 | Compression is generally considerably better than that | 21 | Compression is generally considerably better than that |
| 22 | achieved by more conventional LZ77/LZ78-based compressors, | 22 | achieved by more conventional LZ77/LZ78-based compressors, |
| @@ -26,7 +26,7 @@ DESCRIPTION | |||
| 26 | The command-line options are deliberately very similar to | 26 | The command-line options are deliberately very similar to |
| 27 | those of GNU Gzip, but they are not identical. | 27 | those of GNU Gzip, but they are not identical. |
| 28 | 28 | ||
| 29 | Bzip2 expects a list of file names to accompany the com- | 29 | bzip2 expects a list of file names to accompany the com- |
| 30 | mand-line flags. Each file is replaced by a compressed | 30 | mand-line flags. Each file is replaced by a compressed |
| 31 | version of itself, with the name "original_name.bz2". | 31 | version of itself, with the name "original_name.bz2". |
| 32 | Each compressed file has the same modification date and | 32 | Each compressed file has the same modification date and |
| @@ -38,8 +38,8 @@ DESCRIPTION | |||
| 38 | cepts, or have serious file name length restrictions, such | 38 | cepts, or have serious file name length restrictions, such |
| 39 | as MS-DOS. | 39 | as MS-DOS. |
| 40 | 40 | ||
| 41 | Bzip2 and bunzip2 will not overwrite existing files; if | 41 | bzip2 and bunzip2 will by default not overwrite existing |
| 42 | you want this to happen, you should delete them first. | 42 | files; if you want this to happen, specify the -f flag. |
| 43 | 43 | ||
| 44 | If no file names are specified, bzip2 compresses from | 44 | If no file names are specified, bzip2 compresses from |
| 45 | standard input to standard output. In this case, bzip2 | 45 | standard input to standard output. In this case, bzip2 |
| @@ -47,28 +47,29 @@ DESCRIPTION | |||
| 47 | this would be entirely incomprehensible and therefore | 47 | this would be entirely incomprehensible and therefore |
| 48 | pointless. | 48 | pointless. |
| 49 | 49 | ||
| 50 | Bunzip2 (or bzip2 -d ) decompresses and restores all spec- | 50 | bunzip2 (or bzip2 -d ) decompresses and restores all spec- |
| 51 | ified files whose names end in ".bz2". Files without this | 51 | ified files whose names end in ".bz2". Files without this |
| 52 | suffix are ignored. Again, supplying no filenames causes | 52 | suffix are ignored. Again, supplying no filenames causes |
| 53 | decompression from standard input to standard output. | 53 | decompression from standard input to standard output. |
| 54 | 54 | ||
| 55 | You can also compress or decompress files to the standard | 55 | bunzip2 will correctly decompress a file which is the con- |
| 56 | output by giving the -c flag. You can decompress multiple | 56 | catenation of two or more compressed files. The result is |
| 57 | files like this, but you may only compress a single file | 57 | the concatenation of the corresponding uncompressed files. |
| 58 | this way, since it would otherwise be difficult to sepa- | 58 | Integrity testing (-t) of concatenated compressed files is |
| 59 | rate out the compressed representations of the original | 59 | also supported. |
| 60 | files. | ||
| 61 | |||
| 62 | |||
| 63 | |||
| 64 | 1 | ||
| 65 | |||
| 66 | |||
| 67 | |||
| 68 | |||
| 69 | |||
| 70 | bzip2(1) bzip2(1) | ||
| 71 | 60 | ||
| 61 | You can also compress or decompress files to the standard | ||
| 62 | output by giving the -c flag. Multiple files may be com- | ||
| 63 | pressed and decompressed like this. The resulting outputs | ||
| 64 | are fed sequentially to stdout. Compression of multiple | ||
| 65 | files in this manner generates a stream containing multi- | ||
| 66 | ple compressed file representations. Such a stream can be | ||
| 67 | decompressed correctly only by bzip2 version 0.9.0 or | ||
| 68 | later. Earlier versions of bzip2 will stop after decom- | ||
| 69 | pressing the first file in the stream. | ||
| 70 | |||
| 71 | bzcat (or bzip2 -dc ) decompresses all specified files to | ||
| 72 | the standard output. | ||
| 72 | 73 | ||
| 73 | Compression is always performed, even if the compressed | 74 | Compression is always performed, even if the compressed |
| 74 | file is slightly larger than the original. Files of less | 75 | file is slightly larger than the original. Files of less |
| @@ -108,13 +109,14 @@ MEMORY MANAGEMENT | |||
| 108 | file, and bunzip2 then allocates itself just enough memory | 109 | file, and bunzip2 then allocates itself just enough memory |
| 109 | to decompress the file. Since block sizes are stored in | 110 | to decompress the file. Since block sizes are stored in |
| 110 | compressed files, it follows that the flags -1 to -9 are | 111 | compressed files, it follows that the flags -1 to -9 are |
| 111 | irrelevant to and so ignored during decompression. Com- | 112 | irrelevant to and so ignored during decompression. |
| 112 | pression and decompression requirements, in bytes, can be | 113 | |
| 113 | estimated as: | 114 | Compression and decompression requirements, in bytes, can |
| 115 | be estimated as: | ||
| 114 | 116 | ||
| 115 | Compression: 400k + ( 7 x block size ) | 117 | Compression: 400k + ( 7 x block size ) |
| 116 | 118 | ||
| 117 | Decompression: 100k + ( 5 x block size ), or | 119 | Decompression: 100k + ( 4 x block size ), or |
| 118 | 100k + ( 2.5 x block size ) | 120 | 100k + ( 2.5 x block size ) |
| 119 | 121 | ||
| 120 | Larger block sizes give rapidly diminishing marginal | 122 | Larger block sizes give rapidly diminishing marginal |
| @@ -125,19 +127,8 @@ MEMORY MANAGEMENT | |||
| 125 | requirement is set at compression-time by the choice of | 127 | requirement is set at compression-time by the choice of |
| 126 | block size. | 128 | block size. |
| 127 | 129 | ||
| 128 | |||
| 129 | |||
| 130 | 2 | ||
| 131 | |||
| 132 | |||
| 133 | |||
| 134 | |||
| 135 | |||
| 136 | bzip2(1) bzip2(1) | ||
| 137 | |||
| 138 | |||
| 139 | For files compressed with the default 900k block size, | 130 | For files compressed with the default 900k block size, |
| 140 | bunzip2 will require about 4600 kbytes to decompress. To | 131 | bunzip2 will require about 3700 kbytes to decompress. To |
| 141 | support decompression of any file on a 4 megabyte machine, | 132 | support decompression of any file on a 4 megabyte machine, |
| 142 | bunzip2 has an option to decompress using approximately | 133 | bunzip2 has an option to decompress using approximately |
| 143 | half this amount of memory, about 2300 kbytes. Decompres- | 134 | half this amount of memory, about 2300 kbytes. Decompres- |
| @@ -157,8 +148,8 @@ bzip2(1) bzip2(1) | |||
| 157 | file 20,000 bytes long with the flag -9 will cause the | 148 | file 20,000 bytes long with the flag -9 will cause the |
| 158 | compressor to allocate around 6700k of memory, but only | 149 | compressor to allocate around 6700k of memory, but only |
| 159 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the | 150 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the |
| 160 | decompressor will allocate 4600k but only touch 100k + | 151 | decompressor will allocate 3700k but only touch 100k + |
| 161 | 20000 * 5 = 200 kbytes. | 152 | 20000 * 4 = 180 kbytes. |
| 162 | 153 | ||
| 163 | Here is a table which summarises the maximum memory usage | 154 | Here is a table which summarises the maximum memory usage |
| 164 | for different block sizes. Also recorded is the total | 155 | for different block sizes. Also recorded is the total |
| @@ -172,15 +163,15 @@ bzip2(1) bzip2(1) | |||
| 172 | Compress Decompress Decompress Corpus | 163 | Compress Decompress Decompress Corpus |
| 173 | Flag usage usage -s usage Size | 164 | Flag usage usage -s usage Size |
| 174 | 165 | ||
| 175 | -1 1100k 600k 350k 914704 | 166 | -1 1100k 500k 350k 914704 |
| 176 | -2 1800k 1100k 600k 877703 | 167 | -2 1800k 900k 600k 877703 |
| 177 | -3 2500k 1600k 850k 860338 | 168 | -3 2500k 1300k 850k 860338 |
| 178 | -4 3200k 2100k 1100k 846899 | 169 | -4 3200k 1700k 1100k 846899 |
| 179 | -5 3900k 2600k 1350k 845160 | 170 | -5 3900k 2100k 1350k 845160 |
| 180 | -6 4600k 3100k 1600k 838626 | 171 | -6 4600k 2500k 1600k 838626 |
| 181 | -7 5400k 3600k 1850k 834096 | 172 | -7 5400k 2900k 1850k 834096 |
| 182 | -8 6000k 4100k 2100k 828642 | 173 | -8 6000k 3300k 2100k 828642 |
| 183 | -9 6700k 4600k 2350k 828642 | 174 | -9 6700k 3700k 2350k 828642 |
| 184 | 175 | ||
| 185 | 176 | ||
| 186 | OPTIONS | 177 | OPTIONS |
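The memory estimates in the MEMORY MANAGEMENT hunks above can be checked with a little arithmetic. The sketch below encodes the three formulas from the new text (400k + 7 x block size, 100k + 4 x block size, 100k + 2.5 x block size). The function names are hypothetical, and it assumes that "k" means 1000 bytes and that the block size is the `-1` to `-9` flag value times 100k. The table's figures are rounded, so they need not agree with the formulas in every row.

```c
/* Estimated peak memory use in bytes, following the man-page formulas.
   blockSize100k is the -1 .. -9 flag value; "k" is taken as 1000 bytes. */

static long compressBytes(int blockSize100k)
{
    return 400000L + 7L * blockSize100k * 100000L;
}

static long decompressBytes(int blockSize100k)
{
    return 100000L + 4L * blockSize100k * 100000L;
}

/* Low-memory (-s) decompression: 2.5 bytes per block byte. */
static long decompressSmallBytes(int blockSize100k)
{
    return 100000L + (5L * blockSize100k * 100000L) / 2;
}
```

For the default `-9` setting these give 6700k, 3700k and 2350k respectively, which matches the last row of the table.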
| @@ -189,47 +180,37 @@ OPTIONS | |||
| 189 | decompress multiple files to stdout, but will only | 180 | decompress multiple files to stdout, but will only |
| 190 | compress a single file to stdout. | 181 | compress a single file to stdout. |
| 191 | 182 | ||
| 192 | |||
| 193 | |||
| 194 | |||
| 195 | |||
| 196 | 3 | ||
| 197 | |||
| 198 | |||
| 199 | |||
| 200 | |||
| 201 | |||
| 202 | bzip2(1) bzip2(1) | ||
| 203 | |||
| 204 | |||
| 205 | -d --decompress | 183 | -d --decompress |
| 206 | Force decompression. Bzip2 and bunzip2 are really | 184 | Force decompression. bzip2, bunzip2 and bzcat are |
| 207 | the same program, and the decision about whether to | 185 | really the same program, and the decision about |
| 208 | compress or decompress is done on the basis of | 186 | what actions to take is done on the basis of which |
| 209 | which name is used. This flag overrides that mech- | 187 | name is used. This flag overrides that mechanism, |
| 210 | anism, and forces bzip2 to decompress. | 188 | and forces bzip2 to decompress. |
| 211 | 189 | ||
| 212 | -f --compress | 190 | -z --compress |
| 213 | The complement to -d: forces compression, regard- | 191 | The complement to -d: forces compression, regard- |
| 214 | less of the invokation name. | 192 | less of the invokation name. |
| 215 | 193 | ||
| 216 | -t --test | 194 | -t --test |
| 217 | Check integrity of the specified file(s), but don't | 195 | Check integrity of the specified file(s), but don't |
| 218 | decompress them. This really performs a trial | 196 | decompress them. This really performs a trial |
| 219 | decompression and throws away the result, using the | 197 | decompression and throws away the result. |
| 220 | low-memory decompression algorithm (see -s). | 198 | |
| 199 | -f --force | ||
| 200 | Force overwrite of output files. Normally, bzip2 | ||
| 201 | will not overwrite existing output files. | ||
| 221 | 202 | ||
| 222 | -k --keep | 203 | -k --keep |
| 223 | Keep (don't delete) input files during compression | 204 | Keep (don't delete) input files during compression |
| 224 | or decompression. | 205 | or decompression. |
| 225 | 206 | ||
| 226 | -s --small | 207 | -s --small |
| 227 | Reduce memory usage, both for compression and | 208 | Reduce memory usage, for compression, decompression |
| 228 | decompression. Files are decompressed using a mod- | 209 | and testing. Files are decompressed and tested |
| 229 | ified algorithm which only requires 2.5 bytes per | 210 | using a modified algorithm which only requires 2.5 |
| 230 | block byte. This means any file can be decom- | 211 | bytes per block byte. This means any file can be |
| 231 | pressed in 2300k of memory, albeit somewhat more | 212 | decompressed in 2300k of memory, albeit at about |
| 232 | slowly than usual. | 213 | half the normal speed. |
| 233 | 214 | ||
| 234 | During compression, -s selects a block size of | 215 | During compression, -s selects a block size of |
| 235 | 200k, which limits memory use to around the same | 216 | 200k, which limits memory use to around the same |
| @@ -238,36 +219,21 @@ bzip2(1) bzip2(1) | |||
| 238 | megabytes or less), use -s for everything. See | 219 | megabytes or less), use -s for everything. See |
| 239 | MEMORY MANAGEMENT above. | 220 | MEMORY MANAGEMENT above. |
| 240 | 221 | ||
| 241 | |||
| 242 | -v --verbose | 222 | -v --verbose |
| 243 | Verbose mode -- show the compression ratio for each | 223 | Verbose mode -- show the compression ratio for each |
| 244 | file processed. Further -v's increase the ver- | 224 | file processed. Further -v's increase the ver- |
| 245 | bosity level, spewing out lots of information which | 225 | bosity level, spewing out lots of information which |
| 246 | is primarily of interest for diagnostic purposes. | 226 | is primarily of interest for diagnostic purposes. |
| 247 | 227 | ||
| 248 | -L --license | 228 | -L --license -V --version |
| 249 | Display the software version, license terms and | 229 | Display the software version, license terms and |
| 250 | conditions. | 230 | conditions. |
| 251 | 231 | ||
| 252 | -V --version | ||
| 253 | Same as -L. | ||
| 254 | |||
| 255 | -1 to -9 | 232 | -1 to -9 |
| 256 | Set the block size to 100 k, 200 k .. 900 k when | 233 | Set the block size to 100 k, 200 k .. 900 k when |
| 257 | compressing. Has no effect when decompressing. | 234 | compressing. Has no effect when decompressing. |
| 258 | See MEMORY MANAGEMENT above. | 235 | See MEMORY MANAGEMENT above. |
| 259 | 236 | ||
| 260 | |||
| 261 | |||
| 262 | 4 | ||
| 263 | |||
| 264 | |||
| 265 | |||
| 266 | |||
| 267 | |||
| 268 | bzip2(1) bzip2(1) | ||
| 269 | |||
| 270 | |||
| 271 | --repetitive-fast | 237 | --repetitive-fast |
| 272 | bzip2 injects some small pseudo-random variations | 238 | bzip2 injects some small pseudo-random variations |
| 273 | into very repetitive blocks to limit worst-case | 239 | into very repetitive blocks to limit worst-case |
| @@ -278,7 +244,6 @@ bzip2(1) bzip2(1) | |||
| 278 | would take before resorting to randomisation. This | 244 | would take before resorting to randomisation. This |
| 279 | flag makes it give up much sooner. | 245 | flag makes it give up much sooner. |
| 280 | 246 | ||
| 281 | |||
| 282 | --repetitive-best | 247 | --repetitive-best |
| 283 | Opposite of --repetitive-fast; try a lot harder | 248 | Opposite of --repetitive-fast; try a lot harder |
| 284 | before resorting to randomisation. | 249 | before resorting to randomisation. |
| @@ -306,10 +271,10 @@ RECOVERING DATA FROM DAMAGED FILES | |||
| 306 | bzip2recover takes a single argument, the name of the dam- | 271 | bzip2recover takes a single argument, the name of the dam- |
| 307 | aged file, and writes a number of files "rec0001file.bz2", | 272 | aged file, and writes a number of files "rec0001file.bz2", |
| 308 | "rec0002file.bz2", etc, containing the extracted blocks. | 273 | "rec0002file.bz2", etc, containing the extracted blocks. |
| 309 | The output filenames are designed so that the use of wild- | 274 | The output filenames are designed so that the use of |
| 310 | cards in subsequent processing -- for example, "bzip2 -dc | 275 | wildcards in subsequent processing -- for example, "bzip2 |
| 311 | rec*file.bz2 > recovered_data" -- lists the files in the | 276 | -dc rec*file.bz2 > recovered_data" -- lists the files in |
| 312 | "right" order. | 277 | the "right" order. |
| 313 | 278 | ||
| 314 | bzip2recover should be of most use dealing with large .bz2 | 279 | bzip2recover should be of most use dealing with large .bz2 |
| 315 | files, as these will contain many blocks. It is clearly | 280 | files, as these will contain many blocks. It is clearly |
| @@ -322,18 +287,6 @@ RECOVERING DATA FROM DAMAGED FILES | |||
| 322 | 287 | ||
| 323 | PERFORMANCE NOTES | 288 | PERFORMANCE NOTES |
| 324 | The sorting phase of compression gathers together similar | 289 | The sorting phase of compression gathers together similar |
| 325 | |||
| 326 | |||
| 327 | |||
| 328 | 5 | ||
| 329 | |||
| 330 | |||
| 331 | |||
| 332 | |||
| 333 | |||
| 334 | bzip2(1) bzip2(1) | ||
| 335 | |||
| 336 | |||
| 337 | strings in the file. Because of this, files containing | 290 | strings in the file. Because of this, files containing |
| 338 | very long runs of repeated symbols, like "aabaabaabaab | 291 | very long runs of repeated symbols, like "aabaabaabaab |
| 339 | ..." (repeated several hundred times) may compress | 292 | ..." (repeated several hundred times) may compress |
| @@ -348,10 +301,6 @@ bzip2(1) bzip2(1) | |||
| 348 | severe slowness in compression, try making the block size | 301 | severe slowness in compression, try making the block size |
| 349 | as small as possible, with flag -1. | 302 | as small as possible, with flag -1. |
| 350 | 303 | ||
| 351 | Incompressible or virtually-incompressible data may decom- | ||
| 352 | press rather more slowly than one would hope. This is due | ||
| 353 | to a naive implementation of the move-to-front coder. | ||
| 354 | |||
| 355 | bzip2 usually allocates several megabytes of memory to | 304 | bzip2 usually allocates several megabytes of memory to |
| 356 | operate in, and then charges all over it in a fairly ran- | 305 | operate in, and then charges all over it in a fairly ran- |
| 357 | dom fashion. This means that performance, both for com- | 306 | dom fashion. This means that performance, both for com- |
| @@ -362,12 +311,6 @@ bzip2(1) bzip2(1) | |||
| 362 | large performance improvements. I imagine bzip2 will per- | 311 | large performance improvements. I imagine bzip2 will per- |
| 363 | form best on machines with very large caches. | 312 | form best on machines with very large caches. |
| 364 | 313 | ||
| 365 | Test mode (-t) uses the low-memory decompression algorithm | ||
| 366 | (-s). This means test mode does not run as fast as it | ||
| 367 | could; it could run as fast as the normal decompression | ||
| 368 | machinery. This could easily be fixed at the cost of some | ||
| 369 | code bloat. | ||
| 370 | |||
| 371 | 314 | ||
| 372 | CAVEATS | 315 | CAVEATS |
| 373 | I/O error messages are not as helpful as they could be. | 316 | I/O error messages are not as helpful as they could be. |
| @@ -375,91 +318,38 @@ CAVEATS | |||
| 375 | but the details of what the problem is sometimes seem | 318 | but the details of what the problem is sometimes seem |
| 376 | rather misleading. | 319 | rather misleading. |
| 377 | 320 | ||
| 378 | This manual page pertains to version 0.1 of bzip2. It may | 321 | This manual page pertains to version 0.9.0 of bzip2. Com- |
| 379 | well happen that some future version will use a different | 322 | pressed data created by this version is entirely forwards |
| 380 | compressed file format. If you try to decompress, using | 323 | and backwards compatible with the previous public release, |
| 381 | 0.1, a .bz2 file created with some future version which | 324 | version 0.1pl2, but with the following exception: 0.9.0 |
| 382 | uses a different compressed file format, 0.1 will complain | 325 | can correctly decompress multiple concatenated compressed |
| 383 | that your file "is not a bzip2 file". If that happens, | 326 | files. 0.1pl2 cannot do this; it will stop after decom- |
| 384 | you should obtain a more recent version of bzip2 and use | 327 | pressing just the first file in the stream. |
| 385 | that to decompress the file. | ||
| 386 | 328 | ||
| 387 | Wildcard expansion for Windows 95 and NT is flaky. | 329 | Wildcard expansion for Windows 95 and NT is flaky. |
| 388 | 330 | ||
| 389 | bzip2recover uses 32-bit integers to represent bit posi- | 331 | bzip2recover uses 32-bit integers to represent bit posi- |
| 390 | tions in compressed files, so it cannot handle compressed | 332 | tions in compressed files, so it cannot handle compressed |
| 391 | | 333 | files more than 512 megabytes long. This could easily be |
| 392 | |||
| 393 | |||
| 394 | 6 | ||
| 395 | |||
| 396 | |||
| 397 | |||
| 398 | |||
| 399 | |||
| 400 | bzip2(1) bzip2(1) | ||
| 401 | |||
| 402 | |||
| 403 | files more than 512 megabytes long. This could easily be | ||
| 404 | fixed. | 334 | fixed. |
| 405 | 335 | ||
| 406 | bzip2recover sometimes reports a very small, incomplete | ||
| 407 | final block. This is spurious and can be safely ignored. | ||
| 408 | |||
| 409 | |||
| 410 | RELATIONSHIP TO bzip-0.21 | ||
| 411 | This program is a descendant of the bzip program, version | ||
| 412 | 0.21, which I released in August 1996. The primary dif- | ||
| 413 | ference of bzip2 is its avoidance of the possibly patented | ||
| 414 | algorithms which were used in 0.21. bzip2 also brings | ||
| 415 | various useful refinements (-s, -t), uses less memory, | ||
| 416 | decompresses significantly faster, and has support for | ||
| 417 | recovering data from damaged files. | ||
| 418 | |||
| 419 | Because bzip2 uses Huffman coding to construct the com- | ||
| 420 | pressed bitstream, rather than the arithmetic coding used | ||
| 421 | in 0.21, the compressed representations generated by the | ||
| 422 | two programs are incompatible, and they will not interop- | ||
| 423 | erate. The change in suffix from .bz to .bz2 reflects | ||
| 424 | this. It would have been helpful to at least allow bzip2 | ||
| 425 | to decompress files created by 0.21, but this would defeat | ||
| 426 | the primary aim of having a patent-free compressor. | ||
| 427 | |||
| 428 | For a more precise statement about patent issues in bzip2, | ||
| 429 | please see the README file in the distribution. | ||
| 430 | |||
| 431 | Huffman coding necessarily involves some coding ineffi- | ||
| 432 | ciency compared to arithmetic coding. This means that | ||
| 433 | bzip2 compresses about 1% worse than 0.21, an unfortunate | ||
| 434 | but unavoidable fact-of-life. On the other hand, decom- | ||
| 435 | pression is approximately 50% faster for the same reason, | ||
| 436 | and the change in file format gave an opportunity to add | ||
| 437 | data-recovery features. So it is not all bad. | ||
| 438 | |||
| 439 | 336 | ||
| 440 | AUTHOR | 337 | AUTHOR |
| 441 | Julian Seward, jseward@acm.org. | 338 | Julian Seward, jseward@acm.org. |
| 442 | | 339 | http://www.muraroa.demon.co.uk |
| 443 | The ideas embodied in bzip and bzip2 are due to (at least) | 340 | |
| 444 | the following people: Michael Burrows and David Wheeler | 341 | The ideas embodied in bzip2 are due to (at least) the fol- |
| 445 | (for the block sorting transformation), David Wheeler | 342 | lowing people: Michael Burrows and David Wheeler (for the |
| 446 | (again, for the Huffman coder), Peter Fenwick (for the | 343 | block sorting transformation), David Wheeler (again, for |
| 447 | structured coding model in 0.21, and many refinements), | 344 | the Huffman coder), Peter Fenwick (for the structured cod- |
| 448 | and Alistair Moffat, Radford Neal and Ian Witten (for the | 345 | ing model in the original bzip, and many refinements), and |
| 449 | arithmetic coder in 0.21). I am much indebted for their | 346 | Alistair Moffat, Radford Neal and Ian Witten (for the |
| 450 | help, support and advice. See the file ALGORITHMS in the | 347 | arithmetic coder in the original bzip). I am much |
| 451 | source distribution for pointers to sources of documenta- | 348 | indebted for their help, support and advice. See the man- |
| 452 | tion. Christian von Roques encouraged me to look for | 349 | ual in the source distribution for pointers to sources of |
| 453 | faster sorting algorithms, so as to speed up compression. | 350 | documentation. Christian von Roques encouraged me to look |
| 454 | Bela Lubkin encouraged me to improve the worst-case com- | 351 | for faster sorting algorithms, so as to speed up compres- |
| 455 | pression performance. Many people sent patches, helped | 352 | sion. Bela Lubkin encouraged me to improve the worst-case |
| 353 | compression performance. Many people sent patches, helped | ||
| 456 | with portability problems, lent machines, gave advice and | 354 | with portability problems, lent machines, gave advice and |
| 457 | were generally helpful. | 355 | were generally helpful. |
| 458 | |||
| 459 | |||
| 460 | |||
| 461 | |||
| 462 | |||
| 463 | 7 | ||
| 464 | |||
| 465 | |||
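
The CAVEATS text in the hunk above says that bzip2recover stores bit positions in 32-bit integers and so cannot handle compressed files longer than 512 megabytes. The arithmetic behind that figure can be sketched as below. `max_bytes_for_bit_index` is a hypothetical helper written only for illustration; it does not appear in the bzip2 sources.

```c
#include <stdint.h>

/*-- Hypothetical helper, not from the bzip2 sources: the number of
     whole bytes addressable when bit positions are held in an
     unsigned integer of the given width.  width_bits is assumed to
     lie between 3 and 64. --*/
uint64_t max_bytes_for_bit_index ( unsigned width_bits )
{
   /*-- 2^width_bits distinct bit positions at 8 bits per byte
        gives 2^(width_bits - 3) bytes. --*/
   return (uint64_t)1 << (width_bits - 3);
}
```

For a 32-bit index this gives 2^29 bytes, which is 512 * 1024 * 1024 and matches the limit quoted in the manual page.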
diff --git a/bzip2recover.c b/bzip2recover.c index 0eef0e6..0e2822b 100644 --- a/bzip2recover.c +++ b/bzip2recover.c | |||
| @@ -7,43 +7,63 @@ | |||
| 7 | /*-- | 7 | /*-- |
| 8 | This program is bzip2recover, a program to attempt data | 8 | This program is bzip2recover, a program to attempt data |
| 9 | salvage from damaged files created by the accompanying | 9 | salvage from damaged files created by the accompanying |
| 10 | bzip2-0.1 program. | 10 | bzip2-0.9.0c program. |
| 11 | 11 | ||
| 12 | Copyright (C) 1996, 1997 by Julian Seward. | 12 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. |
| 13 | Guildford, Surrey, UK | 13 | |
| 14 | email: jseward@acm.org | 14 | Redistribution and use in source and binary forms, with or without |
| 15 | | 15 | modification, are permitted provided that the following conditions |
| 16 | This program is free software; you can redistribute it and/or modify | 16 | are met: |
| 17 | it under the terms of the GNU General Public License as published by | 17 | |
| 18 | the Free Software Foundation; either version 2 of the License, or | 18 | 1. Redistributions of source code must retain the above copyright |
| 19 | (at your option) any later version. | 19 | notice, this list of conditions and the following disclaimer. |
| 20 | 20 | ||
| 21 | This program is distributed in the hope that it will be useful, | 21 | 2. The origin of this software must not be misrepresented; you must |
| 22 | but WITHOUT ANY WARRANTY; without even the implied warranty of | 22 | not claim that you wrote the original software. If you use this |
| 23 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 23 | software in a product, an acknowledgment in the product |
| 24 | GNU General Public License for more details. | 24 | documentation would be appreciated but is not required. |
| 25 | 25 | ||
| 26 | You should have received a copy of the GNU General Public License | 26 | 3. Altered source versions must be plainly marked as such, and must |
| 27 | along with this program; if not, write to the Free Software | 27 | not be misrepresented as being the original software. |
| 28 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | 28 | |
| 29 | | 29 | 4. The name of the author may not be used to endorse or promote |
| 30 | The GNU General Public License is contained in the file LICENSE. | 30 | products derived from this software without specific prior written |
| 31 | permission. | ||
| 32 | |||
| 33 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 34 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 35 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 36 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 37 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 38 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 39 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 40 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 41 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 42 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 43 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 44 | |||
| 45 | Julian Seward, Guildford, Surrey, UK. | ||
| 46 | jseward@acm.org | ||
| 47 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 31 | --*/ | 48 | --*/ |
| 32 | 49 | ||
| 50 | /*-- | ||
| 51 | This program is a complete hack and should be rewritten | ||
| 52 | properly. It isn't very complicated. | ||
| 53 | --*/ | ||
| 33 | 54 | ||
| 34 | #include <stdio.h> | 55 | #include <stdio.h> |
| 35 | #include <errno.h> | 56 | #include <errno.h> |
| 36 | #include <malloc.h> | ||
| 37 | #include <stdlib.h> | 57 | #include <stdlib.h> |
| 38 | #include <strings.h> /*-- or try string.h --*/ | 58 | #include <string.h> |
| 39 | 59 | ||
| 40 | #define UInt32 unsigned int | 60 | typedef unsigned int UInt32; |
| 41 | #define Int32 int | 61 | typedef int Int32; |
| 42 | #define UChar unsigned char | 62 | typedef unsigned char UChar; |
| 43 | #define Char char | 63 | typedef char Char; |
| 44 | #define Bool unsigned char | 64 | typedef unsigned char Bool; |
| 45 | #define True 1 | 65 | #define True ((Bool)1) |
| 46 | #define False 0 | 66 | #define False ((Bool)0) |
| 47 | 67 | ||
| 48 | 68 | ||
| 49 | Char inFileName[2000]; | 69 | Char inFileName[2000]; |
| @@ -191,8 +211,9 @@ void bsClose ( BitStream* bs ) | |||
| 191 | if (retVal == EOF) writeError(); | 211 | if (retVal == EOF) writeError(); |
| 192 | } | 212 | } |
| 193 | retVal = fclose ( bs->handle ); | 213 | retVal = fclose ( bs->handle ); |
| 194 | if (retVal == EOF) | 214 | if (retVal == EOF) { |
| 195 | if (bs->mode == 'w') writeError(); else readError(); | 215 | if (bs->mode == 'w') writeError(); else readError(); |
| 216 | } | ||
| 196 | free ( bs ); | 217 | free ( bs ); |
| 197 | } | 218 | } |
| 198 | 219 | ||
| @@ -248,13 +269,19 @@ Int32 main ( Int32 argc, Char** argv ) | |||
| 248 | UInt32 bitsRead; | 269 | UInt32 bitsRead; |
| 249 | UInt32 bStart[20000]; | 270 | UInt32 bStart[20000]; |
| 250 | UInt32 bEnd[20000]; | 271 | UInt32 bEnd[20000]; |
| 272 | |||
| 273 | UInt32 rbStart[20000]; | ||
| 274 | UInt32 rbEnd[20000]; | ||
| 275 | Int32 rbCtr; | ||
| 276 | |||
| 277 | |||
| 251 | UInt32 buffHi, buffLo, blockCRC; | 278 | UInt32 buffHi, buffLo, blockCRC; |
| 252 | Char* p; | 279 | Char* p; |
| 253 | 280 | ||
| 254 | strcpy ( progName, argv[0] ); | 281 | strcpy ( progName, argv[0] ); |
| 255 | inFileName[0] = outFileName[0] = 0; | 282 | inFileName[0] = outFileName[0] = 0; |
| 256 | 283 | ||
| 257 | fprintf ( stderr, "bzip2recover: extracts blocks from damaged .bz2 files.\n" ); | 284 | fprintf ( stderr, "bzip2recover v0.9.0c: extracts blocks from damaged .bz2 files.\n" ); |
| 258 | 285 | ||
| 259 | if (argc != 2) { | 286 | if (argc != 2) { |
| 260 | fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n", | 287 | fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n", |
| @@ -278,6 +305,8 @@ Int32 main ( Int32 argc, Char** argv ) | |||
| 278 | currBlock = 0; | 305 | currBlock = 0; |
| 279 | bStart[currBlock] = 0; | 306 | bStart[currBlock] = 0; |
| 280 | 307 | ||
| 308 | rbCtr = 0; | ||
| 309 | |||
| 281 | while (True) { | 310 | while (True) { |
| 282 | b = bsGetBit ( bsIn ); | 311 | b = bsGetBit ( bsIn ); |
| 283 | bitsRead++; | 312 | bitsRead++; |
| @@ -303,19 +332,25 @@ Int32 main ( Int32 argc, Char** argv ) | |||
| 303 | if (bitsRead > 49) | 332 | if (bitsRead > 49) |
| 304 | bEnd[currBlock] = bitsRead-49; else | 333 | bEnd[currBlock] = bitsRead-49; else |
| 305 | bEnd[currBlock] = 0; | 334 | bEnd[currBlock] = 0; |
| 306 | if (currBlock > 0) | 335 | if (currBlock > 0 && |
| 336 | (bEnd[currBlock] - bStart[currBlock]) >= 130) { | ||
| 307 | fprintf ( stderr, " block %d runs from %d to %d\n", | 337 | fprintf ( stderr, " block %d runs from %d to %d\n", |
| 308 | currBlock, bStart[currBlock], bEnd[currBlock] ); | 338 | rbCtr+1, bStart[currBlock], bEnd[currBlock] ); |
| 339 | rbStart[rbCtr] = bStart[currBlock]; | ||
| 340 | rbEnd[rbCtr] = bEnd[currBlock]; | ||
| 341 | rbCtr++; | ||
| 342 | } | ||
| 309 | currBlock++; | 343 | currBlock++; |
| 344 | |||
| 310 | bStart[currBlock] = bitsRead; | 345 | bStart[currBlock] = bitsRead; |
| 311 | } | 346 | } |
| 312 | } | 347 | } |
| 313 | 348 | ||
| 314 | bsClose ( bsIn ); | 349 | bsClose ( bsIn ); |
| 315 | 350 | ||
| 316 | /*-- identified blocks run from 1 to currBlock inclusive. --*/ | 351 | /*-- identified blocks run from 1 to rbCtr inclusive. --*/ |
| 317 | 352 | ||
| 318 | if (currBlock < 1) { | 353 | if (rbCtr < 1) { |
| 319 | fprintf ( stderr, | 354 | fprintf ( stderr, |
| 320 | "%s: sorry, I couldn't find any block boundaries.\n", | 355 | "%s: sorry, I couldn't find any block boundaries.\n", |
| 321 | progName ); | 356 | progName ); |
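
The hunk above records a block in `rbStart`/`rbEnd` only when it spans at least 130 bits (and `currBlock > 0`). This appears to be what removes the spurious tiny final block that the old manual page warned about. A minimal sketch of the length test in isolation follows. `block_is_plausible` and `MIN_BLOCK_BITS` are names invented here, and the explicit `end < start` guard is an addition that the diff itself does not contain.

```c
typedef unsigned int UInt32;

/*-- Threshold taken from the diff; the macro name is hypothetical. --*/
#define MIN_BLOCK_BITS 130

/*-- Returns 1 if the bit range from start to end is long enough to
     be kept, 0 otherwise.  The end < start check is an assumption
     added for this sketch: it avoids unsigned wrap-around, which
     the subtraction alone would not catch. --*/
int block_is_plausible ( UInt32 start, UInt32 end )
{
   if (end < start) return 0;
   return (end - start) >= MIN_BLOCK_BITS;
}
```

Keeping the threshold in one named constant makes it easier to see that the same test decides both what is reported and what is later written out.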
| @@ -336,23 +371,23 @@ Int32 main ( Int32 argc, Char** argv ) | |||
| 336 | 371 | ||
| 337 | bitsRead = 0; | 372 | bitsRead = 0; |
| 338 | outFile = NULL; | 373 | outFile = NULL; |
| 339 | wrBlock = 1; | 374 | wrBlock = 0; |
| 340 | while (True) { | 375 | while (True) { |
| 341 | b = bsGetBit(bsIn); | 376 | b = bsGetBit(bsIn); |
| 342 | if (b == 2) break; | 377 | if (b == 2) break; |
| 343 | buffHi = (buffHi << 1) | (buffLo >> 31); | 378 | buffHi = (buffHi << 1) | (buffLo >> 31); |
| 344 | buffLo = (buffLo << 1) | (b & 1); | 379 | buffLo = (buffLo << 1) | (b & 1); |
| 345 | if (bitsRead == 47+bStart[wrBlock]) | 380 | if (bitsRead == 47+rbStart[wrBlock]) |
| 346 | blockCRC = (buffHi << 16) | (buffLo >> 16); | 381 | blockCRC = (buffHi << 16) | (buffLo >> 16); |
| 347 | 382 | ||
| 348 | if (outFile != NULL && bitsRead >= bStart[wrBlock] | 383 | if (outFile != NULL && bitsRead >= rbStart[wrBlock] |
| 349 | && bitsRead <= bEnd[wrBlock]) { | 384 | && bitsRead <= rbEnd[wrBlock]) { |
| 350 | bsPutBit ( bsWr, b ); | 385 | bsPutBit ( bsWr, b ); |
| 351 | } | 386 | } |
| 352 | 387 | ||
| 353 | bitsRead++; | 388 | bitsRead++; |
| 354 | 389 | ||
| 355 | if (bitsRead == bEnd[wrBlock]+1) { | 390 | if (bitsRead == rbEnd[wrBlock]+1) { |
| 356 | if (outFile != NULL) { | 391 | if (outFile != NULL) { |
| 357 | bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 ); | 392 | bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 ); |
| 358 | bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 ); | 393 | bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 ); |
| @@ -360,18 +395,18 @@ Int32 main ( Int32 argc, Char** argv ) | |||
| 360 | bsPutUInt32 ( bsWr, blockCRC ); | 395 | bsPutUInt32 ( bsWr, blockCRC ); |
| 361 | bsClose ( bsWr ); | 396 | bsClose ( bsWr ); |
| 362 | } | 397 | } |
| 363 | if (wrBlock >= currBlock) break; | 398 | if (wrBlock >= rbCtr) break; |
| 364 | wrBlock++; | 399 | wrBlock++; |
| 365 | } else | 400 | } else |
| 366 | if (bitsRead == bStart[wrBlock]) { | 401 | if (bitsRead == rbStart[wrBlock]) { |
| 367 | outFileName[0] = 0; | 402 | outFileName[0] = 0; |
| 368 | sprintf ( outFileName, "rec%4d", wrBlock ); | 403 | sprintf ( outFileName, "rec%4d", wrBlock+1 ); |
| 369 | for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0'; | 404 | for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0'; |
| 370 | strcat ( outFileName, inFileName ); | 405 | strcat ( outFileName, inFileName ); |
| 371 | if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" ); | 406 | if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" ); |
| 372 | 407 | ||
| 373 | fprintf ( stderr, " writing block %d to `%s' ...\n", | 408 | fprintf ( stderr, " writing block %d to `%s' ...\n", |
| 374 | wrBlock, outFileName ); | 409 | wrBlock+1, outFileName ); |
| 375 | 410 | ||
| 376 | outFile = fopen ( outFileName, "wb" ); | 411 | outFile = fopen ( outFileName, "wb" ); |
| 377 | if (outFile == NULL) { | 412 | if (outFile == NULL) { |
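
The naming scheme in the hunk above (`rec`, a block number padded to four digits with zeros, then the input file name, with `.bz2` appended if it is missing) can be sketched with `snprintf`. This is an illustrative variant, not the code in the diff: `make_rec_name` and `ends_in_bz2` are hypothetical names, `%04d` stands in for the source's `%4d` followed by the space-to-zero loop, and the bounded buffer handling is an addition.

```c
#include <stdio.h>
#include <string.h>

/*-- Hypothetical stand-in for the source's endsInBz2(). --*/
static int ends_in_bz2 ( const char* name )
{
   size_t n = strlen ( name );
   return n >= 4 && strcmp ( name + n - 4, ".bz2" ) == 0;
}

/*-- Writes "recNNNN<inName>", plus ".bz2" if inName lacks it, into
     out (capacity outSize).  blockNo is the 1-based number shown to
     the user.  Returns 0 on success, -1 if the name does not fit. --*/
int make_rec_name ( char* out, size_t outSize,
                    int blockNo, const char* inName )
{
   int n = snprintf ( out, outSize, "rec%04d%s%s", blockNo, inName,
                      ends_in_bz2 ( inName ) ? "" : ".bz2" );
   return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

Because the block number is zero-padded to a fixed width, a shell wildcard such as `rec*file.bz2` lists the pieces in block order, which is the property the manual page relies on.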
| @@ -0,0 +1,1512 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Library top-level functions. ---*/ | ||
| 4 | /*--- bzlib.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | /*-- | ||
| 62 | CHANGES | ||
| 63 | ~~~~~~~ | ||
| 64 | 0.9.0 -- original version. | ||
| 65 | |||
| 66 | 0.9.0a/b -- no changes in this file. | ||
| 67 | |||
| 68 | 0.9.0c | ||
| 69 | * made zero-length BZ_FLUSH work correctly in bzCompress(). | ||
| 70 | * fixed bzWrite/bzRead to ignore zero-length requests. | ||
| 71 | * fixed bzread to correctly handle read requests after EOF. | ||
| 72 | * wrong parameter order in call to bzDecompressInit in | ||
| 73 | bzBuffToBuffDecompress. Fixed. | ||
| 74 | --*/ | ||
| 75 | |||
| 76 | #include "bzlib_private.h" | ||
| 77 | |||
| 78 | |||
| 79 | /*---------------------------------------------------*/ | ||
| 80 | /*--- Compression stuff ---*/ | ||
| 81 | /*---------------------------------------------------*/ | ||
| 82 | |||
| 83 | |||
| 84 | /*---------------------------------------------------*/ | ||
| 85 | #ifndef BZ_NO_STDIO | ||
| 86 | void bz__AssertH__fail ( int errcode ) | ||
| 87 | { | ||
| 88 | fprintf(stderr, | ||
| 89 | "\n\nbzip2/libbzip2, v0.9.0c: internal error number %d.\n" | ||
| 90 | "This is a bug in bzip2/libbzip2, v0.9.0c. Please report\n" | ||
| 91 | "it to me at: jseward@acm.org. If this happened when\n" | ||
| 92 | "you were using some program which uses libbzip2 as a\n" | ||
| 93 | "component, you should also report this bug to the author(s)\n" | ||
| 94 | "of that program. Please make an effort to report this bug;\n" | ||
| 95 | "timely and accurate bug reports eventually lead to higher\n" | ||
| 96 | "quality software. Thx. Julian Seward, 18 October 1998.\n\n", | ||
| 97 | errcode | ||
| 98 | ); | ||
| 99 | exit(3); | ||
| 100 | } | ||
| 101 | #endif | ||
| 102 | |||
| 103 | |||
| 104 | /*---------------------------------------------------*/ | ||
| 105 | static | ||
| 106 | void* default_bzalloc ( void* opaque, Int32 items, Int32 size ) | ||
| 107 | { | ||
| 108 | void* v = malloc ( items * size ); | ||
| 109 | return v; | ||
| 110 | } | ||
| 111 | |||
| 112 | static | ||
| 113 | void default_bzfree ( void* opaque, void* addr ) | ||
| 114 | { | ||
| 115 | if (addr != NULL) free ( addr ); | ||
| 116 | } | ||
| 117 | |||
| 118 | |||
| 119 | /*---------------------------------------------------*/ | ||
| 120 | static | ||
| 121 | void prepare_new_block ( EState* s ) | ||
| 122 | { | ||
| 123 | Int32 i; | ||
| 124 | s->nblock = 0; | ||
| 125 | s->numZ = 0; | ||
| 126 | s->state_out_pos = 0; | ||
| 127 | BZ_INITIALISE_CRC ( s->blockCRC ); | ||
| 128 | for (i = 0; i < 256; i++) s->inUse[i] = False; | ||
| 129 | s->blockNo++; | ||
| 130 | } | ||
| 131 | |||
| 132 | |||
| 133 | /*---------------------------------------------------*/ | ||
| 134 | static | ||
| 135 | void init_RL ( EState* s ) | ||
| 136 | { | ||
| 137 | s->state_in_ch = 256; | ||
| 138 | s->state_in_len = 0; | ||
| 139 | } | ||
| 140 | |||
| 141 | |||
| 142 | static | ||
| 143 | Bool isempty_RL ( EState* s ) | ||
| 144 | { | ||
| 145 | if (s->state_in_ch < 256 && s->state_in_len > 0) | ||
| 146 | return False; else | ||
| 147 | return True; | ||
| 148 | } | ||
| 149 | |||
| 150 | |||
| 151 | /*---------------------------------------------------*/ | ||
| 152 | int BZ_API(bzCompressInit) | ||
| 153 | ( bz_stream* strm, | ||
| 154 | int blockSize100k, | ||
| 155 | int verbosity, | ||
| 156 | int workFactor ) | ||
| 157 | { | ||
| 158 | Int32 n; | ||
| 159 | EState* s; | ||
| 160 | |||
| 161 | if (strm == NULL || | ||
| 162 | blockSize100k < 1 || blockSize100k > 9 || | ||
| 163 | workFactor < 0 || workFactor > 250) | ||
| 164 | return BZ_PARAM_ERROR; | ||
| 165 | |||
| 166 | if (workFactor == 0) workFactor = 30; | ||
| 167 | if (strm->bzalloc == NULL) strm->bzalloc = default_bzalloc; | ||
| 168 | if (strm->bzfree == NULL) strm->bzfree = default_bzfree; | ||
| 169 | |||
| 170 | s = BZALLOC( sizeof(EState) ); | ||
| 171 | if (s == NULL) return BZ_MEM_ERROR; | ||
| 172 | s->strm = strm; | ||
| 173 | |||
| 174 | s->block = NULL; | ||
| 175 | s->quadrant = NULL; | ||
| 176 | s->zptr = NULL; | ||
| 177 | s->ftab = NULL; | ||
| 178 | |||
| 179 | n = 100000 * blockSize100k; | ||
| 180 | s->block = BZALLOC( (n + BZ_NUM_OVERSHOOT_BYTES) * sizeof(UChar) ); | ||
| 181 | s->quadrant = BZALLOC( (n + BZ_NUM_OVERSHOOT_BYTES) * sizeof(Int16) ); | ||
| 182 | s->zptr = BZALLOC( n * sizeof(Int32) ); | ||
| 183 | s->ftab = BZALLOC( 65537 * sizeof(Int32) ); | ||
| 184 | |||
| 185 | if (s->block == NULL || s->quadrant == NULL || | ||
| 186 | s->zptr == NULL || s->ftab == NULL) { | ||
| 187 | if (s->block != NULL) BZFREE(s->block); | ||
| 188 | if (s->quadrant != NULL) BZFREE(s->quadrant); | ||
| 189 | if (s->zptr != NULL) BZFREE(s->zptr); | ||
| 190 | if (s->ftab != NULL) BZFREE(s->ftab); | ||
| 191 | if (s != NULL) BZFREE(s); | ||
| 192 | return BZ_MEM_ERROR; | ||
| 193 | } | ||
| 194 | |||
| 195 | s->szptr = (UInt16*)(s->zptr); | ||
| 196 | |||
| 197 | s->blockNo = 0; | ||
| 198 | s->state = BZ_S_INPUT; | ||
| 199 | s->mode = BZ_M_RUNNING; | ||
| 200 | s->combinedCRC = 0; | ||
| 201 | s->blockSize100k = blockSize100k; | ||
| 202 | s->nblockMAX = 100000 * blockSize100k - 19; | ||
| 203 | s->verbosity = verbosity; | ||
| 204 | s->workFactor = workFactor; | ||
| 205 | s->nBlocksRandomised = 0; | ||
| 206 | strm->state = s; | ||
| 207 | strm->total_in = 0; | ||
| 208 | strm->total_out = 0; | ||
| 209 | init_RL ( s ); | ||
| 210 | prepare_new_block ( s ); | ||
| 211 | return BZ_OK; | ||
| 212 | } | ||
| 213 | |||
| 214 | |||
| 215 | /*---------------------------------------------------*/ | ||
| 216 | static | ||
| 217 | void add_pair_to_block ( EState* s ) | ||
| 218 | { | ||
| 219 | Int32 i; | ||
| 220 | UChar ch = (UChar)(s->state_in_ch); | ||
| 221 | for (i = 0; i < s->state_in_len; i++) { | ||
| 222 | BZ_UPDATE_CRC( s->blockCRC, ch ); | ||
| 223 | } | ||
| 224 | s->inUse[s->state_in_ch] = True; | ||
| 225 | switch (s->state_in_len) { | ||
| 226 | case 1: | ||
| 227 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 228 | break; | ||
| 229 | case 2: | ||
| 230 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 231 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 232 | break; | ||
| 233 | case 3: | ||
| 234 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 235 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 236 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 237 | break; | ||
| 238 | default: | ||
| 239 | s->inUse[s->state_in_len-4] = True; | ||
| 240 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 241 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 242 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 243 | s->block[s->nblock] = (UChar)ch; s->nblock++; | ||
| 244 | s->block[s->nblock] = (UChar)(s->state_in_len-4); | ||
| 245 | s->nblock++; | ||
| 246 | break; | ||
| 247 | } | ||
| 248 | } | ||
| 249 | |||
| 250 | |||
| 251 | /*---------------------------------------------------*/ | ||
| 252 | static | ||
| 253 | void flush_RL ( EState* s ) | ||
| 254 | { | ||
| 255 | if (s->state_in_ch < 256) add_pair_to_block ( s ); | ||
| 256 | init_RL ( s ); | ||
| 257 | } | ||
| 258 | |||
| 259 | |||
| 260 | /*---------------------------------------------------*/ | ||
| 261 | #define ADD_CHAR_TO_BLOCK(zs,zchh0) \ | ||
| 262 | { \ | ||
| 263 | UInt32 zchh = (UInt32)(zchh0); \ | ||
| 264 | /*-- fast track the common case --*/ \ | ||
| 265 | if (zchh != zs->state_in_ch && \ | ||
| 266 | zs->state_in_len == 1) { \ | ||
| 267 | UChar ch = (UChar)(zs->state_in_ch); \ | ||
| 268 | BZ_UPDATE_CRC( zs->blockCRC, ch ); \ | ||
| 269 | zs->inUse[zs->state_in_ch] = True; \ | ||
| 270 | zs->block[zs->nblock] = (UChar)ch; \ | ||
| 271 | zs->nblock++; \ | ||
| 272 | zs->state_in_ch = zchh; \ | ||
| 273 | } \ | ||
| 274 | else \ | ||
| 275 | /*-- general, uncommon cases --*/ \ | ||
| 276 | if (zchh != zs->state_in_ch || \ | ||
| 277 | zs->state_in_len == 255) { \ | ||
| 278 | if (zs->state_in_ch < 256) \ | ||
| 279 | add_pair_to_block ( zs ); \ | ||
| 280 | zs->state_in_ch = zchh; \ | ||
| 281 | zs->state_in_len = 1; \ | ||
| 282 | } else { \ | ||
| 283 | zs->state_in_len++; \ | ||
| 284 | } \ | ||
| 285 | } | ||
| 286 | |||
| 287 | |||
| 288 | /*---------------------------------------------------*/ | ||
| 289 | static | ||
| 290 | Bool copy_input_until_stop ( EState* s ) | ||
| 291 | { | ||
| 292 | Bool progress_in = False; | ||
| 293 | |||
| 294 | if (s->mode == BZ_M_RUNNING) { | ||
| 295 | |||
| 296 | /*-- fast track the common case --*/ | ||
| 297 | while (True) { | ||
| 298 | /*-- block full? --*/ | ||
| 299 | if (s->nblock >= s->nblockMAX) break; | ||
| 300 | /*-- no input? --*/ | ||
| 301 | if (s->strm->avail_in == 0) break; | ||
| 302 | progress_in = True; | ||
| 303 | ADD_CHAR_TO_BLOCK ( s, (UInt32)(*((UChar*)(s->strm->next_in))) ); | ||
| 304 | s->strm->next_in++; | ||
| 305 | s->strm->avail_in--; | ||
| 306 | s->strm->total_in++; | ||
| 307 | } | ||
| 308 | |||
| 309 | } else { | ||
| 310 | |||
| 311 | /*-- general, uncommon case --*/ | ||
| 312 | while (True) { | ||
| 313 | /*-- block full? --*/ | ||
| 314 | if (s->nblock >= s->nblockMAX) break; | ||
| 315 | /*-- no input? --*/ | ||
| 316 | if (s->strm->avail_in == 0) break; | ||
| 317 | /*-- flush/finish end? --*/ | ||
| 318 | if (s->avail_in_expect == 0) break; | ||
| 319 | progress_in = True; | ||
| 320 | ADD_CHAR_TO_BLOCK ( s, (UInt32)(*((UChar*)(s->strm->next_in))) ); | ||
| 321 | s->strm->next_in++; | ||
| 322 | s->strm->avail_in--; | ||
| 323 | s->strm->total_in++; | ||
| 324 | s->avail_in_expect--; | ||
| 325 | } | ||
| 326 | } | ||
| 327 | return progress_in; | ||
| 328 | } | ||
| 329 | |||
| 330 | |||
| 331 | /*---------------------------------------------------*/ | ||
| 332 | static | ||
| 333 | Bool copy_output_until_stop ( EState* s ) | ||
| 334 | { | ||
| 335 | Bool progress_out = False; | ||
| 336 | |||
| 337 | while (True) { | ||
| 338 | |||
| 339 | /*-- no output space? --*/ | ||
| 340 | if (s->strm->avail_out == 0) break; | ||
| 341 | |||
| 342 | /*-- block done? --*/ | ||
| 343 | if (s->state_out_pos >= s->numZ) break; | ||
| 344 | |||
| 345 | progress_out = True; | ||
| 346 | *(s->strm->next_out) = ((UChar*)(s->quadrant))[s->state_out_pos]; | ||
| 347 | s->state_out_pos++; | ||
| 348 | s->strm->avail_out--; | ||
| 349 | s->strm->next_out++; | ||
| 350 | s->strm->total_out++; | ||
| 351 | |||
| 352 | } | ||
| 353 | |||
| 354 | return progress_out; | ||
| 355 | } | ||
| 356 | |||
| 357 | |||
| 358 | /*---------------------------------------------------*/ | ||
| 359 | static | ||
| 360 | Bool handle_compress ( bz_stream* strm ) | ||
| 361 | { | ||
| 362 | Bool progress_in = False; | ||
| 363 | Bool progress_out = False; | ||
| 364 | EState* s = strm->state; | ||
| 365 | |||
| 366 | while (True) { | ||
| 367 | |||
| 368 | if (s->state == BZ_S_OUTPUT) { | ||
| 369 | progress_out |= copy_output_until_stop ( s ); | ||
| 370 | if (s->state_out_pos < s->numZ) break; | ||
| 371 | if (s->mode == BZ_M_FINISHING && | ||
| 372 | s->avail_in_expect == 0 && | ||
| 373 | isempty_RL(s)) break; | ||
| 374 | prepare_new_block ( s ); | ||
| 375 | s->state = BZ_S_INPUT; | ||
| 376 | if (s->mode == BZ_M_FLUSHING && | ||
| 377 | s->avail_in_expect == 0 && | ||
| 378 | isempty_RL(s)) break; | ||
| 379 | } | ||
| 380 | |||
| 381 | if (s->state == BZ_S_INPUT) { | ||
| 382 | progress_in |= copy_input_until_stop ( s ); | ||
| 383 | if (s->mode != BZ_M_RUNNING && s->avail_in_expect == 0) { | ||
| 384 | flush_RL ( s ); | ||
| 385 | compressBlock ( s, s->mode == BZ_M_FINISHING ); | ||
| 386 | s->state = BZ_S_OUTPUT; | ||
| 387 | } | ||
| 388 | else | ||
| 389 | if (s->nblock >= s->nblockMAX) { | ||
| 390 | compressBlock ( s, False ); | ||
| 391 | s->state = BZ_S_OUTPUT; | ||
| 392 | } | ||
| 393 | else | ||
| 394 | if (s->strm->avail_in == 0) { | ||
| 395 | break; | ||
| 396 | } | ||
| 397 | } | ||
| 398 | |||
| 399 | } | ||
| 400 | |||
| 401 | return progress_in || progress_out; | ||
| 402 | } | ||
| 403 | |||
| 404 | |||
| 405 | /*---------------------------------------------------*/ | ||
| 406 | int BZ_API(bzCompress) ( bz_stream *strm, int action ) | ||
| 407 | { | ||
| 408 | Bool progress; | ||
| 409 | EState* s; | ||
| 410 | if (strm == NULL) return BZ_PARAM_ERROR; | ||
| 411 | s = strm->state; | ||
| 412 | if (s == NULL) return BZ_PARAM_ERROR; | ||
| 413 | if (s->strm != strm) return BZ_PARAM_ERROR; | ||
| 414 | |||
| 415 | preswitch: | ||
| 416 | switch (s->mode) { | ||
| 417 | |||
| 418 | case BZ_M_IDLE: | ||
| 419 | return BZ_SEQUENCE_ERROR; | ||
| 420 | |||
| 421 | case BZ_M_RUNNING: | ||
| 422 | if (action == BZ_RUN) { | ||
| 423 | progress = handle_compress ( strm ); | ||
| 424 | return progress ? BZ_RUN_OK : BZ_PARAM_ERROR; | ||
| 425 | } | ||
| 426 | else | ||
| 427 | if (action == BZ_FLUSH) { | ||
| 428 | s->avail_in_expect = strm->avail_in; | ||
| 429 | s->mode = BZ_M_FLUSHING; | ||
| 430 | goto preswitch; | ||
| 431 | } | ||
| 432 | else | ||
| 433 | if (action == BZ_FINISH) { | ||
| 434 | s->avail_in_expect = strm->avail_in; | ||
| 435 | s->mode = BZ_M_FINISHING; | ||
| 436 | goto preswitch; | ||
| 437 | } | ||
| 438 | else | ||
| 439 | return BZ_PARAM_ERROR; | ||
| 440 | |||
| 441 | case BZ_M_FLUSHING: | ||
| 442 | if (action != BZ_FLUSH) return BZ_SEQUENCE_ERROR; | ||
| 443 | if (s->avail_in_expect != s->strm->avail_in) return BZ_SEQUENCE_ERROR; | ||
| 444 | progress = handle_compress ( strm ); | ||
| 445 | if (s->avail_in_expect > 0 || !isempty_RL(s) || | ||
| 446 | s->state_out_pos < s->numZ) return BZ_FLUSH_OK; | ||
| 447 | s->mode = BZ_M_RUNNING; | ||
| 448 | return BZ_RUN_OK; | ||
| 449 | |||
| 450 | case BZ_M_FINISHING: | ||
| 451 | if (action != BZ_FINISH) return BZ_SEQUENCE_ERROR; | ||
| 452 | if (s->avail_in_expect != s->strm->avail_in) return BZ_SEQUENCE_ERROR; | ||
| 453 | progress = handle_compress ( strm ); | ||
| 454 | if (!progress) return BZ_SEQUENCE_ERROR; | ||
| 455 | if (s->avail_in_expect > 0 || !isempty_RL(s) || | ||
| 456 | s->state_out_pos < s->numZ) return BZ_FINISH_OK; | ||
| 457 | s->mode = BZ_M_IDLE; | ||
| 458 | return BZ_STREAM_END; | ||
| 459 | } | ||
| 460 | return BZ_OK; /*--not reached--*/ | ||
| 461 | } | ||
| 462 | |||
| 463 | |||
| 464 | /*---------------------------------------------------*/ | ||
| 465 | int BZ_API(bzCompressEnd) ( bz_stream *strm ) | ||
| 466 | { | ||
| 467 | EState* s; | ||
| 468 | if (strm == NULL) return BZ_PARAM_ERROR; | ||
| 469 | s = strm->state; | ||
| 470 | if (s == NULL) return BZ_PARAM_ERROR; | ||
| 471 | if (s->strm != strm) return BZ_PARAM_ERROR; | ||
| 472 | |||
| 473 | if (s->block != NULL) BZFREE(s->block); | ||
| 474 | if (s->quadrant != NULL) BZFREE(s->quadrant); | ||
| 475 | if (s->zptr != NULL) BZFREE(s->zptr); | ||
| 476 | if (s->ftab != NULL) BZFREE(s->ftab); | ||
| 477 | BZFREE(strm->state); | ||
| 478 | |||
| 479 | strm->state = NULL; | ||
| 480 | |||
| 481 | return BZ_OK; | ||
| 482 | } | ||
| 483 | |||
| 484 | |||
| 485 | /*---------------------------------------------------*/ | ||
| 486 | /*--- Decompression stuff ---*/ | ||
| 487 | /*---------------------------------------------------*/ | ||
| 488 | |||
| 489 | /*---------------------------------------------------*/ | ||
| 490 | int BZ_API(bzDecompressInit) | ||
| 491 | ( bz_stream* strm, | ||
| 492 | int verbosity, | ||
| 493 | int small ) | ||
| 494 | { | ||
| 495 | DState* s; | ||
| 496 | |||
| 497 | if (strm == NULL) return BZ_PARAM_ERROR; | ||
| 498 | if (small != 0 && small != 1) return BZ_PARAM_ERROR; | ||
| 499 | if (verbosity < 0 || verbosity > 4) return BZ_PARAM_ERROR; | ||
| 500 | |||
| 501 | if (strm->bzalloc == NULL) strm->bzalloc = default_bzalloc; | ||
| 502 | if (strm->bzfree == NULL) strm->bzfree = default_bzfree; | ||
| 503 | |||
| 504 | s = BZALLOC( sizeof(DState) ); | ||
| 505 | if (s == NULL) return BZ_MEM_ERROR; | ||
| 506 | s->strm = strm; | ||
| 507 | strm->state = s; | ||
| 508 | s->state = BZ_X_MAGIC_1; | ||
| 509 | s->bsLive = 0; | ||
| 510 | s->bsBuff = 0; | ||
| 511 | s->calculatedCombinedCRC = 0; | ||
| 512 | strm->total_in = 0; | ||
| 513 | strm->total_out = 0; | ||
| 514 | s->smallDecompress = (Bool)small; | ||
| 515 | s->ll4 = NULL; | ||
| 516 | s->ll16 = NULL; | ||
| 517 | s->tt = NULL; | ||
| 518 | s->currBlockNo = 0; | ||
| 519 | s->verbosity = verbosity; | ||
| 520 | |||
| 521 | return BZ_OK; | ||
| 522 | } | ||
| 523 | |||
| 524 | |||
| 525 | /*---------------------------------------------------*/ | ||
| 526 | static | ||
| 527 | void unRLE_obuf_to_output_FAST ( DState* s ) | ||
| 528 | { | ||
| 529 | UChar k1; | ||
| 530 | |||
| 531 | if (s->blockRandomised) { | ||
| 532 | |||
| 533 | while (True) { | ||
| 534 | /* try to finish existing run */ | ||
| 535 | while (True) { | ||
| 536 | if (s->strm->avail_out == 0) return; | ||
| 537 | if (s->state_out_len == 0) break; | ||
| 538 | *( (UChar*)(s->strm->next_out) ) = s->state_out_ch; | ||
| 539 | BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch ); | ||
| 540 | s->state_out_len--; | ||
| 541 | s->strm->next_out++; | ||
| 542 | s->strm->avail_out--; | ||
| 543 | s->strm->total_out++; | ||
| 544 | } | ||
| 545 | |||
| 546 | /* can a new run be started? */ | ||
| 547 | if (s->nblock_used == s->save_nblock+1) return; | ||
| 548 | |||
| 549 | |||
| 550 | s->state_out_len = 1; | ||
| 551 | s->state_out_ch = s->k0; | ||
| 552 | BZ_GET_FAST(k1); BZ_RAND_UPD_MASK; | ||
| 553 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 554 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 555 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 556 | |||
| 557 | s->state_out_len = 2; | ||
| 558 | BZ_GET_FAST(k1); BZ_RAND_UPD_MASK; | ||
| 559 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 560 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 561 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 562 | |||
| 563 | s->state_out_len = 3; | ||
| 564 | BZ_GET_FAST(k1); BZ_RAND_UPD_MASK; | ||
| 565 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 566 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 567 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 568 | |||
| 569 | BZ_GET_FAST(k1); BZ_RAND_UPD_MASK; | ||
| 570 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 571 | s->state_out_len = ((Int32)k1) + 4; | ||
| 572 | BZ_GET_FAST(s->k0); BZ_RAND_UPD_MASK; | ||
| 573 | s->k0 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 574 | } | ||
| 575 | |||
| 576 | } else { | ||
| 577 | |||
| 578 | /* restore */ | ||
| 579 | UInt32 c_calculatedBlockCRC = s->calculatedBlockCRC; | ||
| 580 | UChar c_state_out_ch = s->state_out_ch; | ||
| 581 | Int32 c_state_out_len = s->state_out_len; | ||
| 582 | Int32 c_nblock_used = s->nblock_used; | ||
| 583 | Int32 c_k0 = s->k0; | ||
| 584 | UInt32* c_tt = s->tt; | ||
| 585 | UInt32 c_tPos = s->tPos; | ||
| 586 | char* cs_next_out = s->strm->next_out; | ||
| 587 | unsigned int cs_avail_out = s->strm->avail_out; | ||
| 588 | /* end restore */ | ||
| 589 | |||
| 590 | UInt32 avail_out_INIT = cs_avail_out; | ||
| 591 | Int32 s_save_nblockPP = s->save_nblock+1; | ||
| 592 | |||
| 593 | while (True) { | ||
| 594 | |||
| 595 | /* try to finish existing run */ | ||
| 596 | if (c_state_out_len > 0) { | ||
| 597 | while (True) { | ||
| 598 | if (cs_avail_out == 0) goto return_notr; | ||
| 599 | if (c_state_out_len == 1) break; | ||
| 600 | *( (UChar*)(cs_next_out) ) = c_state_out_ch; | ||
| 601 | BZ_UPDATE_CRC ( c_calculatedBlockCRC, c_state_out_ch ); | ||
| 602 | c_state_out_len--; | ||
| 603 | cs_next_out++; | ||
| 604 | cs_avail_out--; | ||
| 605 | } | ||
| 606 | s_state_out_len_eq_one: | ||
| 607 | { | ||
| 608 | if (cs_avail_out == 0) { | ||
| 609 | c_state_out_len = 1; goto return_notr; | ||
| 610 | }; | ||
| 611 | *( (UChar*)(cs_next_out) ) = c_state_out_ch; | ||
| 612 | BZ_UPDATE_CRC ( c_calculatedBlockCRC, c_state_out_ch ); | ||
| 613 | cs_next_out++; | ||
| 614 | cs_avail_out--; | ||
| 615 | } | ||
| 616 | } | ||
| 617 | /* can a new run be started? */ | ||
| 618 | if (c_nblock_used == s_save_nblockPP) { | ||
| 619 | c_state_out_len = 0; goto return_notr; | ||
| 620 | }; | ||
| 621 | c_state_out_ch = c_k0; | ||
| 622 | BZ_GET_FAST_C(k1); c_nblock_used++; | ||
| 623 | if (k1 != c_k0) { | ||
| 624 | c_k0 = k1; goto s_state_out_len_eq_one; | ||
| 625 | }; | ||
| 626 | if (c_nblock_used == s_save_nblockPP) | ||
| 627 | goto s_state_out_len_eq_one; | ||
| 628 | |||
| 629 | c_state_out_len = 2; | ||
| 630 | BZ_GET_FAST_C(k1); c_nblock_used++; | ||
| 631 | if (c_nblock_used == s_save_nblockPP) continue; | ||
| 632 | if (k1 != c_k0) { c_k0 = k1; continue; }; | ||
| 633 | |||
| 634 | c_state_out_len = 3; | ||
| 635 | BZ_GET_FAST_C(k1); c_nblock_used++; | ||
| 636 | if (c_nblock_used == s_save_nblockPP) continue; | ||
| 637 | if (k1 != c_k0) { c_k0 = k1; continue; }; | ||
| 638 | |||
| 639 | BZ_GET_FAST_C(k1); c_nblock_used++; | ||
| 640 | c_state_out_len = ((Int32)k1) + 4; | ||
| 641 | BZ_GET_FAST_C(c_k0); c_nblock_used++; | ||
| 642 | } | ||
| 643 | |||
| 644 | return_notr: | ||
| 645 | s->strm->total_out += (avail_out_INIT - cs_avail_out); | ||
| 646 | |||
| 647 | /* save */ | ||
| 648 | s->calculatedBlockCRC = c_calculatedBlockCRC; | ||
| 649 | s->state_out_ch = c_state_out_ch; | ||
| 650 | s->state_out_len = c_state_out_len; | ||
| 651 | s->nblock_used = c_nblock_used; | ||
| 652 | s->k0 = c_k0; | ||
| 653 | s->tt = c_tt; | ||
| 654 | s->tPos = c_tPos; | ||
| 655 | s->strm->next_out = cs_next_out; | ||
| 656 | s->strm->avail_out = cs_avail_out; | ||
| 657 | /* end save */ | ||
| 658 | } | ||
| 659 | } | ||
| 660 | |||
| 661 | |||
| 662 | |||
| 663 | /*---------------------------------------------------*/ | ||
| 664 | __inline__ Int32 indexIntoF ( Int32 indx, Int32 *cftab ) | ||
| 665 | { | ||
| 666 | Int32 nb, na, mid; | ||
| 667 | nb = 0; | ||
| 668 | na = 256; | ||
| 669 | do { | ||
| 670 | mid = (nb + na) >> 1; | ||
| 671 | if (indx >= cftab[mid]) nb = mid; else na = mid; | ||
| 672 | } | ||
| 673 | while (na - nb != 1); | ||
| 674 | return nb; | ||
| 675 | } | ||
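Annotation: `indexIntoF` is a binary search over a cumulative frequency table; it returns the byte value `b` for which `cftab[b] <= indx < cftab[b+1]`. The standalone sketch below repeats the search with plain `int` and adds a small table builder so it can be exercised. `index_into_f` and `build_cftab` are hypothetical names, and the 257-entry layout with `cftab[0] == 0` is an assumption about how the caller fills the table, since that code is not part of this hunk.

```c
#include <assert.h>

/* Same search as indexIntoF. Assumes cftab has 257 entries, cftab[0] == 0,
   and cftab[b] counts the symbols whose byte value is less than b. */
static int index_into_f(int indx, const int *cftab)
{
    int nb = 0, na = 256, mid;
    assert(indx >= 0 && indx < cftab[256]);
    do {
        mid = (nb + na) >> 1;
        /* Invariant: cftab[nb] <= indx < cftab[na] */
        if (indx >= cftab[mid]) nb = mid; else na = mid;
    } while (na - nb != 1);
    return nb;
}

/* Hypothetical helper: turn per-byte counts into the cumulative table. */
static void build_cftab(int *cftab, const int *counts)
{
    int b;
    cftab[0] = 0;
    for (b = 0; b < 256; b++) cftab[b + 1] = cftab[b] + counts[b];
}
```

For a block holding three `'a'` bytes and two `'b'` bytes, positions 0 to 2 map to `'a'` and positions 3 to 4 map to `'b'`. The loop always takes eight iterations because the interval starts at width 256 and halves each time.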
| 676 | |||
| 677 | |||
| 678 | /*---------------------------------------------------*/ | ||
| 679 | static | ||
| 680 | void unRLE_obuf_to_output_SMALL ( DState* s ) | ||
| 681 | { | ||
| 682 | UChar k1; | ||
| 683 | |||
| 684 | if (s->blockRandomised) { | ||
| 685 | |||
| 686 | while (True) { | ||
| 687 | /* try to finish existing run */ | ||
| 688 | while (True) { | ||
| 689 | if (s->strm->avail_out == 0) return; | ||
| 690 | if (s->state_out_len == 0) break; | ||
| 691 | *( (UChar*)(s->strm->next_out) ) = s->state_out_ch; | ||
| 692 | BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch ); | ||
| 693 | s->state_out_len--; | ||
| 694 | s->strm->next_out++; | ||
| 695 | s->strm->avail_out--; | ||
| 696 | s->strm->total_out++; | ||
| 697 | } | ||
| 698 | |||
| 699 | /* can a new run be started? */ | ||
| 700 | if (s->nblock_used == s->save_nblock+1) return; | ||
| 701 | |||
| 702 | |||
| 703 | s->state_out_len = 1; | ||
| 704 | s->state_out_ch = s->k0; | ||
| 705 | BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK; | ||
| 706 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 707 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 708 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 709 | |||
| 710 | s->state_out_len = 2; | ||
| 711 | BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK; | ||
| 712 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 713 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 714 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 715 | |||
| 716 | s->state_out_len = 3; | ||
| 717 | BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK; | ||
| 718 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 719 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 720 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 721 | |||
| 722 | BZ_GET_SMALL(k1); BZ_RAND_UPD_MASK; | ||
| 723 | k1 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 724 | s->state_out_len = ((Int32)k1) + 4; | ||
| 725 | BZ_GET_SMALL(s->k0); BZ_RAND_UPD_MASK; | ||
| 726 | s->k0 ^= BZ_RAND_MASK; s->nblock_used++; | ||
| 727 | } | ||
| 728 | |||
| 729 | } else { | ||
| 730 | |||
| 731 | while (True) { | ||
| 732 | /* try to finish existing run */ | ||
| 733 | while (True) { | ||
| 734 | if (s->strm->avail_out == 0) return; | ||
| 735 | if (s->state_out_len == 0) break; | ||
| 736 | *( (UChar*)(s->strm->next_out) ) = s->state_out_ch; | ||
| 737 | BZ_UPDATE_CRC ( s->calculatedBlockCRC, s->state_out_ch ); | ||
| 738 | s->state_out_len--; | ||
| 739 | s->strm->next_out++; | ||
| 740 | s->strm->avail_out--; | ||
| 741 | s->strm->total_out++; | ||
| 742 | } | ||
| 743 | |||
| 744 | /* can a new run be started? */ | ||
| 745 | if (s->nblock_used == s->save_nblock+1) return; | ||
| 746 | |||
| 747 | s->state_out_len = 1; | ||
| 748 | s->state_out_ch = s->k0; | ||
| 749 | BZ_GET_SMALL(k1); s->nblock_used++; | ||
| 750 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 751 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 752 | |||
| 753 | s->state_out_len = 2; | ||
| 754 | BZ_GET_SMALL(k1); s->nblock_used++; | ||
| 755 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 756 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 757 | |||
| 758 | s->state_out_len = 3; | ||
| 759 | BZ_GET_SMALL(k1); s->nblock_used++; | ||
| 760 | if (s->nblock_used == s->save_nblock+1) continue; | ||
| 761 | if (k1 != s->k0) { s->k0 = k1; continue; }; | ||
| 762 | |||
| 763 | BZ_GET_SMALL(k1); s->nblock_used++; | ||
| 764 | s->state_out_len = ((Int32)k1) + 4; | ||
| 765 | BZ_GET_SMALL(s->k0); s->nblock_used++; | ||
| 766 | } | ||
| 767 | |||
| 768 | } | ||
| 769 | } | ||
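Annotation: both `unRLE_obuf_to_output_*` routines undo the initial run-length step: after four equal bytes, the next byte is read as an extra repeat count (`state_out_len = k1 + 4`). The sketch below shows that rule on a flat buffer. It is a simplified illustration under stated assumptions: `rle1_decode` is a hypothetical name, and it skips the randomisation mask, the CRC update, and the resumable output handling that make the original routines long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder: expand src[0..len) into dst (capacity cap).
   Up to four equal bytes are copied as they come; once four in a row have
   been seen, the following byte is a count of additional repeats.
   Assumption: a stream that ends right after four equal bytes is treated
   as having a count of zero. Returns the number of bytes produced. */
static size_t rle1_decode(unsigned char *dst, size_t cap,
                          const unsigned char *src, size_t len)
{
    size_t i = 0, n = 0, k;
    while (i < len) {
        unsigned char ch = src[i++];
        size_t run = 1;
        while (run < 4 && i < len && src[i] == ch) { run++; i++; }
        if (run == 4 && i < len) run += src[i++];   /* count byte */
        assert(n + run <= cap);
        for (k = 0; k < run; k++) dst[n++] = ch;
    }
    return n;
}
```

Decoding `a a a a 3 b b` with this sketch yields seven `'a'` bytes and two `'b'` bytes. Since the count byte can be as large as 255, a single encoded run can expand to 259 bytes.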
| 770 | |||
| 771 | |||
| 772 | /*---------------------------------------------------*/ | ||
| 773 | int BZ_API(bzDecompress) ( bz_stream *strm ) | ||
| 774 | { | ||
| 775 | DState* s; | ||
| 776 | if (strm == NULL) return BZ_PARAM_ERROR; | ||
| 777 | s = strm->state; | ||
| 778 | if (s == NULL) return BZ_PARAM_ERROR; | ||
| 779 | if (s->strm != strm) return BZ_PARAM_ERROR; | ||
| 780 | |||
| 781 | while (True) { | ||
| 782 | if (s->state == BZ_X_IDLE) return BZ_SEQUENCE_ERROR; | ||
| 783 | if (s->state == BZ_X_OUTPUT) { | ||
| 784 | if (s->smallDecompress) | ||
| 785 | unRLE_obuf_to_output_SMALL ( s ); else | ||
| 786 | unRLE_obuf_to_output_FAST ( s ); | ||
| 787 | if (s->nblock_used == s->save_nblock+1 && s->state_out_len == 0) { | ||
| 788 | BZ_FINALISE_CRC ( s->calculatedBlockCRC ); | ||
| 789 | if (s->verbosity >= 3) | ||
| 790 | VPrintf2 ( " {0x%x, 0x%x}", s->storedBlockCRC, | ||
| 791 | s->calculatedBlockCRC ); | ||
| 792 | if (s->verbosity >= 2) VPrintf0 ( "]" ); | ||
| 793 | if (s->calculatedBlockCRC != s->storedBlockCRC) | ||
| 794 | return BZ_DATA_ERROR; | ||
| 795 | s->calculatedCombinedCRC | ||
| 796 | = (s->calculatedCombinedCRC << 1) | | ||
| 797 | (s->calculatedCombinedCRC >> 31); | ||
| 798 | s->calculatedCombinedCRC ^= s->calculatedBlockCRC; | ||
| 799 | s->state = BZ_X_BLKHDR_1; | ||
| 800 | } else { | ||
| 801 | return BZ_OK; | ||
| 802 | } | ||
| 803 | } | ||
| 804 | if (s->state >= BZ_X_MAGIC_1) { | ||
| 805 | Int32 r = decompress ( s ); | ||
| 806 | if (r == BZ_STREAM_END) { | ||
| 807 | if (s->verbosity >= 3) | ||
| 808 | VPrintf2 ( "\n combined CRCs: stored = 0x%x, computed = 0x%x", | ||
| 809 | s->storedCombinedCRC, s->calculatedCombinedCRC ); | ||
| 810 | if (s->calculatedCombinedCRC != s->storedCombinedCRC) | ||
| 811 | return BZ_DATA_ERROR; | ||
| 812 | return r; | ||
| 813 | } | ||
| 814 | if (s->state != BZ_X_OUTPUT) return r; | ||
| 815 | } | ||
| 816 | } | ||
| 817 | |||
| 818 | AssertH ( 0, 6001 ); | ||
| 819 | /*notreached*/ | ||
| 820 | } | ||
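Annotation: when a block finishes, `bzDecompress` folds its CRC into the stream-level value by rotating the running value left by one bit and XORing in the block CRC. The standalone sketch below isolates that step. `combine_crc` and `fold_block_crcs` are hypothetical names, and `uint32_t` stands in for the library's `UInt32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One folding step: rotate the running value left by 1, then XOR the
   CRC of the block that has just been verified. */
static uint32_t combine_crc(uint32_t combined, uint32_t block_crc)
{
    combined = (combined << 1) | (combined >> 31);
    return combined ^ block_crc;
}

/* Hypothetical helper: fold a sequence of block CRCs, starting from 0
   as bzDecompressInit does for calculatedCombinedCRC. */
static uint32_t fold_block_crcs(const uint32_t *blocks, size_t n)
{
    uint32_t combined = 0;
    size_t i;
    assert(blocks != NULL || n == 0);
    for (i = 0; i < n; i++) combined = combine_crc(combined, blocks[i]);
    return combined;
}
```

The rotation makes the result depend on block order: folding `{1, 2}` gives 0, while folding `{2, 1}` gives 5, so swapped blocks are caught even when each block's own CRC is valid.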
| 821 | |||
| 822 | |||
| 823 | /*---------------------------------------------------*/ | ||
| 824 | int BZ_API(bzDecompressEnd) ( bz_stream *strm ) | ||
| 825 | { | ||
| 826 | DState* s; | ||
| 827 | if (strm == NULL) return BZ_PARAM_ERROR; | ||
| 828 | s = strm->state; | ||
| 829 | if (s == NULL) return BZ_PARAM_ERROR; | ||
| 830 | if (s->strm != strm) return BZ_PARAM_ERROR; | ||
| 831 | |||
| 832 | if (s->tt != NULL) BZFREE(s->tt); | ||
| 833 | if (s->ll16 != NULL) BZFREE(s->ll16); | ||
| 834 | if (s->ll4 != NULL) BZFREE(s->ll4); | ||
| 835 | |||
| 836 | BZFREE(strm->state); | ||
| 837 | strm->state = NULL; | ||
| 838 | |||
| 839 | return BZ_OK; | ||
| 840 | } | ||
| 841 | |||
| 842 | |||
| 843 | #ifndef BZ_NO_STDIO | ||
| 844 | /*---------------------------------------------------*/ | ||
| 845 | /*--- File I/O stuff ---*/ | ||
| 846 | /*---------------------------------------------------*/ | ||
| 847 | |||
| 848 | #define BZ_SETERR(eee) \ | ||
| 849 | { \ | ||
| 850 | if (bzerror != NULL) *bzerror = eee; \ | ||
| 851 | if (bzf != NULL) bzf->lastErr = eee; \ | ||
| 852 | } | ||
| 853 | |||
| 854 | typedef | ||
| 855 | struct { | ||
| 856 | FILE* handle; | ||
| 857 | Char buf[BZ_MAX_UNUSED]; | ||
| 858 | Int32 bufN; | ||
| 859 | Bool writing; | ||
| 860 | bz_stream strm; | ||
| 861 | Int32 lastErr; | ||
| 862 | Bool initialisedOk; | ||
| 863 | } | ||
| 864 | bzFile; | ||
| 865 | |||
| 866 | |||
| 867 | /*---------------------------------------------*/ | ||
| 868 | static Bool myfeof ( FILE* f ) | ||
| 869 | { | ||
| 870 | Int32 c = fgetc ( f ); | ||
| 871 | if (c == EOF) return True; | ||
| 872 | ungetc ( c, f ); | ||
| 873 | return False; | ||
| 874 | } | ||
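Annotation: `myfeof` peeks one byte ahead instead of calling `feof`, because `feof` only reports end of file after a read has already run into it. The sketch below shows the same peek-and-push-back idiom in isolation. `at_eof` is a hypothetical name, and like the original it reports a read error the same way as end of file.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if no further byte can be read from f, otherwise 0.
   A byte that was read is pushed back with ungetc, so the next read
   sees it again. */
static int at_eof(FILE *f)
{
    int c;
    assert(f != NULL);
    c = fgetc(f);
    if (c == EOF) return 1;
    ungetc(c, f);
    return 0;
}
```

On a stream holding a single byte, `at_eof` returns 0 before that byte is read and 1 afterwards. The C standard guarantees one byte of pushback, which is all this idiom needs.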
| 875 | |||
| 876 | |||
| 877 | /*---------------------------------------------------*/ | ||
| 878 | BZFILE* BZ_API(bzWriteOpen) | ||
| 879 | ( int* bzerror, | ||
| 880 | FILE* f, | ||
| 881 | int blockSize100k, | ||
| 882 | int verbosity, | ||
| 883 | int workFactor ) | ||
| 884 | { | ||
| 885 | Int32 ret; | ||
| 886 | bzFile* bzf = NULL; | ||
| 887 | |||
| 888 | BZ_SETERR(BZ_OK); | ||
| 889 | |||
| 890 | if (f == NULL || | ||
| 891 | (blockSize100k < 1 || blockSize100k > 9) || | ||
| 892 | (workFactor < 0 || workFactor > 250) || | ||
| 893 | (verbosity < 0 || verbosity > 4)) | ||
| 894 | { BZ_SETERR(BZ_PARAM_ERROR); return NULL; }; | ||
| 895 | |||
| 896 | if (ferror(f)) | ||
| 897 | { BZ_SETERR(BZ_IO_ERROR); return NULL; }; | ||
| 898 | |||
| 899 | bzf = malloc ( sizeof(bzFile) ); | ||
| 900 | if (bzf == NULL) | ||
| 901 | { BZ_SETERR(BZ_MEM_ERROR); return NULL; }; | ||
| 902 | |||
| 903 | BZ_SETERR(BZ_OK); | ||
| 904 | bzf->initialisedOk = False; | ||
| 905 | bzf->bufN = 0; | ||
| 906 | bzf->handle = f; | ||
| 907 | bzf->writing = True; | ||
| 908 | bzf->strm.bzalloc = NULL; | ||
| 909 | bzf->strm.bzfree = NULL; | ||
| 910 | bzf->strm.opaque = NULL; | ||
| 911 | |||
| 912 | if (workFactor == 0) workFactor = 30; | ||
| 913 | ret = bzCompressInit ( &(bzf->strm), blockSize100k, | ||
| 914 | verbosity, workFactor ); | ||
| 915 | if (ret != BZ_OK) | ||
| 916 | { BZ_SETERR(ret); free(bzf); return NULL; }; | ||
| 917 | |||
| 918 | bzf->strm.avail_in = 0; | ||
| 919 | bzf->initialisedOk = True; | ||
| 920 | return bzf; | ||
| 921 | } | ||
| 922 | |||
| 923 | |||
| 924 | |||
| 925 | /*---------------------------------------------------*/ | ||
| 926 | void BZ_API(bzWrite) | ||
| 927 | ( int* bzerror, | ||
| 928 | BZFILE* b, | ||
| 929 | void* buf, | ||
| 930 | int len ) | ||
| 931 | { | ||
| 932 | Int32 n, n2, ret; | ||
| 933 | bzFile* bzf = (bzFile*)b; | ||
| 934 | |||
| 935 | BZ_SETERR(BZ_OK); | ||
| 936 | if (bzf == NULL || buf == NULL || len < 0) | ||
| 937 | { BZ_SETERR(BZ_PARAM_ERROR); return; }; | ||
| 938 | if (!(bzf->writing)) | ||
| 939 | { BZ_SETERR(BZ_SEQUENCE_ERROR); return; }; | ||
| 940 | if (ferror(bzf->handle)) | ||
| 941 | { BZ_SETERR(BZ_IO_ERROR); return; }; | ||
| 942 | |||
| 943 | if (len == 0) | ||
| 944 | { BZ_SETERR(BZ_OK); return; }; | ||
| 945 | |||
| 946 | bzf->strm.avail_in = len; | ||
| 947 | bzf->strm.next_in = buf; | ||
| 948 | |||
| 949 | while (True) { | ||
| 950 | bzf->strm.avail_out = BZ_MAX_UNUSED; | ||
| 951 | bzf->strm.next_out = bzf->buf; | ||
| 952 | ret = bzCompress ( &(bzf->strm), BZ_RUN ); | ||
| 953 | if (ret != BZ_RUN_OK) | ||
| 954 | { BZ_SETERR(ret); return; }; | ||
| 955 | |||
| 956 | if (bzf->strm.avail_out < BZ_MAX_UNUSED) { | ||
| 957 | n = BZ_MAX_UNUSED - bzf->strm.avail_out; | ||
| 958 | n2 = fwrite ( (void*)(bzf->buf), sizeof(UChar), | ||
| 959 | n, bzf->handle ); | ||
| 960 | if (n != n2 || ferror(bzf->handle)) | ||
| 961 | { BZ_SETERR(BZ_IO_ERROR); return; }; | ||
| 962 | } | ||
| 963 | |||
| 964 | if (bzf->strm.avail_in == 0) | ||
| 965 | { BZ_SETERR(BZ_OK); return; }; | ||
| 966 | } | ||
| 967 | } | ||
| 968 | |||
| 969 | |||
| 970 | /*---------------------------------------------------*/ | ||
| 971 | void BZ_API(bzWriteClose) | ||
| 972 | ( int* bzerror, | ||
| 973 | BZFILE* b, | ||
| 974 | int abandon, | ||
| 975 | unsigned int* nbytes_in, | ||
| 976 | unsigned int* nbytes_out ) | ||
| 977 | { | ||
| 978 | Int32 n, n2, ret; | ||
| 979 | bzFile* bzf = (bzFile*)b; | ||
| 980 | |||
| 981 | if (bzf == NULL) | ||
| 982 | { BZ_SETERR(BZ_OK); return; }; | ||
| 983 | if (!(bzf->writing)) | ||
| 984 | { BZ_SETERR(BZ_SEQUENCE_ERROR); return; }; | ||
| 985 | if (ferror(bzf->handle)) | ||
| 986 | { BZ_SETERR(BZ_IO_ERROR); return; }; | ||
| 987 | |||
| 988 | if (nbytes_in != NULL) *nbytes_in = 0; | ||
| 989 | if (nbytes_out != NULL) *nbytes_out = 0; | ||
| 990 | |||
| 991 | if ((!abandon) && bzf->lastErr == BZ_OK) { | ||
| 992 | while (True) { | ||
| 993 | bzf->strm.avail_out = BZ_MAX_UNUSED; | ||
| 994 | bzf->strm.next_out = bzf->buf; | ||
| 995 | ret = bzCompress ( &(bzf->strm), BZ_FINISH ); | ||
| 996 | if (ret != BZ_FINISH_OK && ret != BZ_STREAM_END) | ||
| 997 | { BZ_SETERR(ret); return; }; | ||
| 998 | |||
| 999 | if (bzf->strm.avail_out < BZ_MAX_UNUSED) { | ||
| 1000 | n = BZ_MAX_UNUSED - bzf->strm.avail_out; | ||
| 1001 | n2 = fwrite ( (void*)(bzf->buf), sizeof(UChar), | ||
| 1002 | n, bzf->handle ); | ||
| 1003 | if (n != n2 || ferror(bzf->handle)) | ||
| 1004 | { BZ_SETERR(BZ_IO_ERROR); return; }; | ||
| 1005 | } | ||
| 1006 | |||
| 1007 | if (ret == BZ_STREAM_END) break; | ||
| 1008 | } | ||
| 1009 | } | ||
| 1010 | |||
| 1011 | if ( !abandon && !ferror ( bzf->handle ) ) { | ||
| 1012 | fflush ( bzf->handle ); | ||
| 1013 | if (ferror(bzf->handle)) | ||
| 1014 | { BZ_SETERR(BZ_IO_ERROR); return; }; | ||
| 1015 | } | ||
| 1016 | |||
| 1017 | if (nbytes_in != NULL) *nbytes_in = bzf->strm.total_in; | ||
| 1018 | if (nbytes_out != NULL) *nbytes_out = bzf->strm.total_out; | ||
| 1019 | |||
| 1020 | BZ_SETERR(BZ_OK); | ||
| 1021 | bzCompressEnd ( &(bzf->strm) ); | ||
| 1022 | free ( bzf ); | ||
| 1023 | } | ||
| 1024 | |||
| 1025 | |||
| 1026 | /*---------------------------------------------------*/ | ||
| 1027 | BZFILE* BZ_API(bzReadOpen) | ||
| 1028 | ( int* bzerror, | ||
| 1029 | FILE* f, | ||
| 1030 | int verbosity, | ||
| 1031 | int small, | ||
| 1032 | void* unused, | ||
| 1033 | int nUnused ) | ||
| 1034 | { | ||
| 1035 | bzFile* bzf = NULL; | ||
| 1036 | int ret; | ||
| 1037 | |||
| 1038 | BZ_SETERR(BZ_OK); | ||
| 1039 | |||
| 1040 | if (f == NULL || | ||
| 1041 | (small != 0 && small != 1) || | ||
| 1042 | (verbosity < 0 || verbosity > 4) || | ||
| 1043 | (unused == NULL && nUnused != 0) || | ||
| 1044 | (unused != NULL && (nUnused < 0 || nUnused > BZ_MAX_UNUSED))) | ||
| 1045 | { BZ_SETERR(BZ_PARAM_ERROR); return NULL; }; | ||
| 1046 | |||
| 1047 | if (ferror(f)) | ||
| 1048 | { BZ_SETERR(BZ_IO_ERROR); return NULL; }; | ||
| 1049 | |||
| 1050 | bzf = malloc ( sizeof(bzFile) ); | ||
| 1051 | if (bzf == NULL) | ||
| 1052 | { BZ_SETERR(BZ_MEM_ERROR); return NULL; }; | ||
| 1053 | |||
| 1054 | BZ_SETERR(BZ_OK); | ||
| 1055 | |||
| 1056 | bzf->initialisedOk = False; | ||
| 1057 | bzf->handle = f; | ||
| 1058 | bzf->bufN = 0; | ||
| 1059 | bzf->writing = False; | ||
| 1060 | bzf->strm.bzalloc = NULL; | ||
| 1061 | bzf->strm.bzfree = NULL; | ||
| 1062 | bzf->strm.opaque = NULL; | ||
| 1063 | |||
| 1064 | while (nUnused > 0) { | ||
| 1065 | bzf->buf[bzf->bufN] = *((UChar*)(unused)); bzf->bufN++; | ||
| 1066 | unused = ((void*)( 1 + ((UChar*)(unused)) )); | ||
| 1067 | nUnused--; | ||
| 1068 | } | ||
| 1069 | |||
| 1070 | ret = bzDecompressInit ( &(bzf->strm), verbosity, small ); | ||
| 1071 | if (ret != BZ_OK) | ||
| 1072 | { BZ_SETERR(ret); free(bzf); return NULL; }; | ||
| 1073 | |||
| 1074 | bzf->strm.avail_in = bzf->bufN; | ||
| 1075 | bzf->strm.next_in = bzf->buf; | ||
| 1076 | |||
| 1077 | bzf->initialisedOk = True; | ||
| 1078 | return bzf; | ||
| 1079 | } | ||
| 1080 | |||
| 1081 | |||
| 1082 | /*---------------------------------------------------*/ | ||
| 1083 | void BZ_API(bzReadClose) ( int *bzerror, BZFILE *b ) | ||
| 1084 | { | ||
| 1085 | bzFile* bzf = (bzFile*)b; | ||
| 1086 | |||
| 1087 | BZ_SETERR(BZ_OK); | ||
| 1088 | if (bzf == NULL) | ||
| 1089 | { BZ_SETERR(BZ_OK); return; }; | ||
| 1090 | |||
| 1091 | if (bzf->writing) | ||
| 1092 | { BZ_SETERR(BZ_SEQUENCE_ERROR); return; }; | ||
| 1093 | |||
| 1094 | if (bzf->initialisedOk) | ||
| 1095 | (void)bzDecompressEnd ( &(bzf->strm) ); | ||
| 1096 | free ( bzf ); | ||
| 1097 | } | ||
| 1098 | |||
| 1099 | |||
| 1100 | /*---------------------------------------------------*/ | ||
| 1101 | int BZ_API(bzRead) | ||
| 1102 | ( int* bzerror, | ||
| 1103 | BZFILE* b, | ||
| 1104 | void* buf, | ||
| 1105 | int len ) | ||
| 1106 | { | ||
| 1107 | Int32 n, ret; | ||
| 1108 | bzFile* bzf = (bzFile*)b; | ||
| 1109 | |||
| 1110 | BZ_SETERR(BZ_OK); | ||
| 1111 | |||
| 1112 | if (bzf == NULL || buf == NULL || len < 0) | ||
| 1113 | { BZ_SETERR(BZ_PARAM_ERROR); return 0; }; | ||
| 1114 | |||
| 1115 | if (bzf->writing) | ||
| 1116 | { BZ_SETERR(BZ_SEQUENCE_ERROR); return 0; }; | ||
| 1117 | |||
| 1118 | if (len == 0) | ||
| 1119 | { BZ_SETERR(BZ_OK); return 0; }; | ||
| 1120 | |||
| 1121 | bzf->strm.avail_out = len; | ||
| 1122 | bzf->strm.next_out = buf; | ||
| 1123 | |||
| 1124 | while (True) { | ||
| 1125 | |||
| 1126 | if (ferror(bzf->handle)) | ||
| 1127 | { BZ_SETERR(BZ_IO_ERROR); return 0; }; | ||
| 1128 | |||
| 1129 | if (bzf->strm.avail_in == 0 && !myfeof(bzf->handle)) { | ||
| 1130 | n = fread ( bzf->buf, sizeof(UChar), | ||
| 1131 | BZ_MAX_UNUSED, bzf->handle ); | ||
| 1132 | if (ferror(bzf->handle)) | ||
| 1133 | { BZ_SETERR(BZ_IO_ERROR); return 0; }; | ||
| 1134 | bzf->bufN = n; | ||
| 1135 | bzf->strm.avail_in = bzf->bufN; | ||
| 1136 | bzf->strm.next_in = bzf->buf; | ||
| 1137 | } | ||
| 1138 | |||
| 1139 | ret = bzDecompress ( &(bzf->strm) ); | ||
| 1140 | |||
| 1141 | if (ret != BZ_OK && ret != BZ_STREAM_END) | ||
| 1142 | { BZ_SETERR(ret); return 0; }; | ||
| 1143 | |||
| 1144 | if (ret == BZ_OK && myfeof(bzf->handle) && | ||
| 1145 | bzf->strm.avail_in == 0 && bzf->strm.avail_out > 0) | ||
| 1146 | { BZ_SETERR(BZ_UNEXPECTED_EOF); return 0; }; | ||
| 1147 | |||
| 1148 | if (ret == BZ_STREAM_END) | ||
| 1149 | { BZ_SETERR(BZ_STREAM_END); | ||
| 1150 | return len - bzf->strm.avail_out; }; | ||
| 1151 | if (bzf->strm.avail_out == 0) | ||
| 1152 | { BZ_SETERR(BZ_OK); return len; }; | ||
| 1153 | |||
| 1154 | } | ||
| 1155 | |||
| 1156 | return 0; /*not reached*/ | ||
| 1157 | } | ||
| 1158 | |||
| 1159 | |||
| 1160 | /*---------------------------------------------------*/ | ||
| 1161 | void BZ_API(bzReadGetUnused) | ||
| 1162 | ( int* bzerror, | ||
| 1163 | BZFILE* b, | ||
| 1164 | void** unused, | ||
| 1165 | int* nUnused ) | ||
| 1166 | { | ||
| 1167 | bzFile* bzf = (bzFile*)b; | ||
| 1168 | if (bzf == NULL) | ||
| 1169 | { BZ_SETERR(BZ_PARAM_ERROR); return; }; | ||
| 1170 | if (bzf->lastErr != BZ_STREAM_END) | ||
| 1171 | { BZ_SETERR(BZ_SEQUENCE_ERROR); return; }; | ||
| 1172 | if (unused == NULL || nUnused == NULL) | ||
| 1173 | { BZ_SETERR(BZ_PARAM_ERROR); return; }; | ||
| 1174 | |||
| 1175 | BZ_SETERR(BZ_OK); | ||
| 1176 | *nUnused = bzf->strm.avail_in; | ||
| 1177 | *unused = bzf->strm.next_in; | ||
| 1178 | } | ||
| 1179 | #endif | ||
| 1180 | |||
| 1181 | |||
| 1182 | /*---------------------------------------------------*/ | ||
| 1183 | /*--- Misc convenience stuff ---*/ | ||
| 1184 | /*---------------------------------------------------*/ | ||
| 1185 | |||
| 1186 | /*---------------------------------------------------*/ | ||
| 1187 | int BZ_API(bzBuffToBuffCompress) | ||
| 1188 | ( char* dest, | ||
| 1189 | unsigned int* destLen, | ||
| 1190 | char* source, | ||
| 1191 | unsigned int sourceLen, | ||
| 1192 | int blockSize100k, | ||
| 1193 | int verbosity, | ||
| 1194 | int workFactor ) | ||
| 1195 | { | ||
| 1196 | bz_stream strm; | ||
| 1197 | int ret; | ||
| 1198 | |||
| 1199 | if (dest == NULL || destLen == NULL || | ||
| 1200 | source == NULL || | ||
| 1201 | blockSize100k < 1 || blockSize100k > 9 || | ||
| 1202 | verbosity < 0 || verbosity > 4 || | ||
| 1203 | workFactor < 0 || workFactor > 250) | ||
| 1204 | return BZ_PARAM_ERROR; | ||
| 1205 | |||
| 1206 | if (workFactor == 0) workFactor = 30; | ||
| 1207 | strm.bzalloc = NULL; | ||
| 1208 | strm.bzfree = NULL; | ||
| 1209 | strm.opaque = NULL; | ||
| 1210 | ret = bzCompressInit ( &strm, blockSize100k, | ||
| 1211 | verbosity, workFactor ); | ||
| 1212 | if (ret != BZ_OK) return ret; | ||
| 1213 | |||
| 1214 | strm.next_in = source; | ||
| 1215 | strm.next_out = dest; | ||
| 1216 | strm.avail_in = sourceLen; | ||
| 1217 | strm.avail_out = *destLen; | ||
| 1218 | |||
| 1219 | ret = bzCompress ( &strm, BZ_FINISH ); | ||
| 1220 | if (ret == BZ_FINISH_OK) goto output_overflow; | ||
| 1221 | if (ret != BZ_STREAM_END) goto errhandler; | ||
| 1222 | |||
| 1223 | /* normal termination */ | ||
| 1224 | *destLen -= strm.avail_out; | ||
| 1225 | bzCompressEnd ( &strm ); | ||
| 1226 | return BZ_OK; | ||
| 1227 | |||
| 1228 | output_overflow: | ||
| 1229 | bzCompressEnd ( &strm ); | ||
| 1230 | return BZ_OUTBUFF_FULL; | ||
| 1231 | |||
| 1232 | errhandler: | ||
| 1233 | bzCompressEnd ( &strm ); | ||
| 1234 | return ret; | ||
| 1235 | } | ||
| 1236 | |||
| 1237 | |||
| 1238 | /*---------------------------------------------------*/ | ||
| 1239 | int BZ_API(bzBuffToBuffDecompress) | ||
| 1240 | ( char* dest, | ||
| 1241 | unsigned int* destLen, | ||
| 1242 | char* source, | ||
| 1243 | unsigned int sourceLen, | ||
| 1244 | int small, | ||
| 1245 | int verbosity ) | ||
| 1246 | { | ||
| 1247 | bz_stream strm; | ||
| 1248 | int ret; | ||
| 1249 | |||
| 1250 | if (dest == NULL || destLen == NULL || | ||
| 1251 | source == NULL || | ||
| 1252 | (small != 0 && small != 1) || | ||
| 1253 | verbosity < 0 || verbosity > 4) | ||
| 1254 | return BZ_PARAM_ERROR; | ||
| 1255 | |||
| 1256 | strm.bzalloc = NULL; | ||
| 1257 | strm.bzfree = NULL; | ||
| 1258 | strm.opaque = NULL; | ||
| 1259 | ret = bzDecompressInit ( &strm, verbosity, small ); | ||
| 1260 | if (ret != BZ_OK) return ret; | ||
| 1261 | |||
| 1262 | strm.next_in = source; | ||
| 1263 | strm.next_out = dest; | ||
| 1264 | strm.avail_in = sourceLen; | ||
| 1265 | strm.avail_out = *destLen; | ||
| 1266 | |||
| 1267 | ret = bzDecompress ( &strm ); | ||
| 1268 | if (ret == BZ_OK) goto output_overflow_or_eof; | ||
| 1269 | if (ret != BZ_STREAM_END) goto errhandler; | ||
| 1270 | |||
| 1271 | /* normal termination */ | ||
| 1272 | *destLen -= strm.avail_out; | ||
| 1273 | bzDecompressEnd ( &strm ); | ||
| 1274 | return BZ_OK; | ||
| 1275 | |||
| 1276 | output_overflow_or_eof: | ||
| 1277 | if (strm.avail_out > 0) { | ||
| 1278 | bzDecompressEnd ( &strm ); | ||
| 1279 | return BZ_UNEXPECTED_EOF; | ||
| 1280 | } else { | ||
| 1281 | bzDecompressEnd ( &strm ); | ||
| 1282 | return BZ_OUTBUFF_FULL; | ||
| 1283 | }; | ||
| 1284 | |||
| 1285 | errhandler: | ||
| 1286 | bzDecompressEnd ( &strm ); | ||
| 1287 | return ret; /* propagate the real error, as bzBuffToBuffCompress does */ | ||
| 1288 | } | ||
| 1289 | |||
| 1290 | |||
| 1291 | /*---------------------------------------------------*/ | ||
| 1292 | /*-- | ||
| 1293 | Code contributed by Yoshioka Tsuneo | ||
| 1294 | (QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp), | ||
| 1295 | to support better zlib compatibility. | ||
| 1296 | This code is not _officially_ part of libbzip2 (yet); | ||
| 1297 | I haven't tested it, documented it, or considered the | ||
| 1298 | threading-safeness of it. | ||
| 1299 | If this code breaks, please contact both Yoshioka and me. | ||
| 1300 | --*/ | ||
| 1301 | /*---------------------------------------------------*/ | ||
| 1302 | |||
| 1303 | /*---------------------------------------------------*/ | ||
| 1304 | /*-- | ||
| 1305 | return version like "0.9.0c". | ||
| 1306 | --*/ | ||
| 1307 | const char * BZ_API(bzlibVersion)(void) | ||
| 1308 | { | ||
| 1309 | return BZ_VERSION; | ||
| 1310 | } | ||
| 1311 | |||
| 1312 | |||
| 1313 | #ifndef BZ_NO_STDIO | ||
| 1314 | /*---------------------------------------------------*/ | ||
| 1315 | |||
| 1316 | #if defined(_WIN32) || defined(OS2) || defined(MSDOS) | ||
| 1317 | # include <fcntl.h> | ||
| 1318 | # include <io.h> | ||
| 1319 | # define SET_BINARY_MODE(file) setmode(fileno(file),O_BINARY) | ||
| 1320 | #else | ||
| 1321 | # define SET_BINARY_MODE(file) | ||
| 1322 | #endif | ||
| 1323 | static | ||
| 1324 | BZFILE * bzopen_or_bzdopen | ||
| 1325 | ( const char *path, /* no use when bzdopen */ | ||
| 1326 | int fd, /* no use when bzopen */ | ||
| 1327 | const char *mode, | ||
| 1328 | int open_mode) /* bzopen: 0, bzdopen:1 */ | ||
| 1329 | { | ||
| 1330 | int bzerr; | ||
| 1331 | char unused[BZ_MAX_UNUSED]; | ||
| 1332 | int blockSize100k = 9; | ||
| 1333 | int writing = 0; | ||
| 1334 | char mode2[10] = ""; | ||
| 1335 | FILE *fp = NULL; | ||
| 1336 | BZFILE *bzfp = NULL; | ||
| 1337 | int verbosity = 0; | ||
| 1338 | int workFactor = 30; | ||
| 1339 | int smallMode = 0; | ||
| 1340 | int nUnused = 0; | ||
| 1341 | |||
| 1342 | if(mode==NULL){return NULL;} | ||
| 1343 | while(*mode){ | ||
| 1344 | switch(*mode){ | ||
| 1345 | case 'r': | ||
| 1346 | writing = 0;break; | ||
| 1347 | case 'w': | ||
| 1348 | writing = 1;break; | ||
| 1349 | case 's': | ||
| 1350 | smallMode = 1;break; | ||
| 1351 | default: | ||
| 1352 | if(isdigit(*mode)){ | ||
| 1353 | blockSize100k = *mode-'0'; | ||
| 1354 | while(isdigit(mode[1])){ /* leave mode on the last digit */ | ||
| 1355 | mode++; | ||
| 1356 | blockSize100k = blockSize100k*10 + *mode-'0'; | ||
| 1357 | } | ||
| 1358 | }else{ | ||
| 1359 | /* ignore */ | ||
| 1360 | } | ||
| 1361 | } | ||
| 1362 | mode++; | ||
| 1363 | } | ||
| 1364 | strcat(mode2, writing ? "w" : "r" ); | ||
| 1365 | strcat(mode2,"b"); /* binary mode */ | ||
| 1366 | |||
| 1367 | if(open_mode==0){ | ||
| 1368 | if(path==NULL || strcmp(path,"")==0){ | ||
| 1369 | fp = (writing ? stdout : stdin); | ||
| 1370 | SET_BINARY_MODE(fp); | ||
| 1371 | }else{ | ||
| 1372 | fp = fopen(path,mode2); | ||
| 1373 | } | ||
| 1374 | }else{ | ||
| 1375 | #ifdef BZ_STRICT_ANSI | ||
| 1376 | fp = NULL; | ||
| 1377 | #else | ||
| 1378 | fp = fdopen(fd,mode2); | ||
| 1379 | #endif | ||
| 1380 | } | ||
| 1381 | if(fp==NULL){return NULL;} | ||
| 1382 | |||
| 1383 | if(writing){ | ||
| 1384 | bzfp = bzWriteOpen(&bzerr,fp,blockSize100k,verbosity,workFactor); | ||
| 1385 | }else{ | ||
| 1386 | bzfp = bzReadOpen(&bzerr,fp,verbosity,smallMode,unused,nUnused); | ||
| 1387 | } | ||
| 1388 | if(bzfp==NULL){ | ||
| 1389 | if(fp!=stdin && fp!=stdout) fclose(fp); | ||
| 1390 | return NULL; | ||
| 1391 | } | ||
| 1392 | return bzfp; | ||
| 1393 | } | ||
| 1394 | |||
| 1395 | |||
| 1396 | /*---------------------------------------------------*/ | ||
| 1397 | /*-- | ||
| 1398 | open file for read or write. | ||
| 1399 | e.g. bzopen("file","w9") | ||
| 1400 | if path is "" or NULL, stdin or stdout is used. | ||
| 1401 | --*/ | ||
| 1402 | BZFILE * BZ_API(bzopen) | ||
| 1403 | ( const char *path, | ||
| 1404 | const char *mode ) | ||
| 1405 | { | ||
| 1406 | return bzopen_or_bzdopen(path,-1,mode,/*bzopen*/0); | ||
| 1407 | } | ||
| 1408 | |||
| 1409 | |||
| 1410 | /*---------------------------------------------------*/ | ||
| 1411 | BZFILE * BZ_API(bzdopen) | ||
| 1412 | ( int fd, | ||
| 1413 | const char *mode ) | ||
| 1414 | { | ||
| 1415 | return bzopen_or_bzdopen(NULL,fd,mode,/*bzdopen*/1); | ||
| 1416 | } | ||
| 1417 | |||
| 1418 | |||
| 1419 | /*---------------------------------------------------*/ | ||
| 1420 | int BZ_API(bzread) (BZFILE* b, void* buf, int len ) | ||
| 1421 | { | ||
| 1422 | int bzerr, nread; | ||
| 1423 | if (((bzFile*)b)->lastErr == BZ_STREAM_END) return 0; | ||
| 1424 | nread = bzRead(&bzerr,b,buf,len); | ||
| 1425 | if (bzerr == BZ_OK || bzerr == BZ_STREAM_END) { | ||
| 1426 | return nread; | ||
| 1427 | } else { | ||
| 1428 | return -1; | ||
| 1429 | } | ||
| 1430 | } | ||
| 1431 | |||
| 1432 | |||
| 1433 | /*---------------------------------------------------*/ | ||
| 1434 | int BZ_API(bzwrite) (BZFILE* b, void* buf, int len ) | ||
| 1435 | { | ||
| 1436 | int bzerr; | ||
| 1437 | |||
| 1438 | bzWrite(&bzerr,b,buf,len); | ||
| 1439 | if(bzerr == BZ_OK){ | ||
| 1440 | return len; | ||
| 1441 | }else{ | ||
| 1442 | return -1; | ||
| 1443 | } | ||
| 1444 | } | ||
| 1445 | |||
| 1446 | |||
| 1447 | /*---------------------------------------------------*/ | ||
| 1448 | int BZ_API(bzflush) (BZFILE *b) | ||
| 1449 | { | ||
| 1450 | /* do nothing now... */ | ||
| 1451 | return 0; | ||
| 1452 | } | ||
| 1453 | |||
| 1454 | |||
| 1455 | /*---------------------------------------------------*/ | ||
| 1456 | void BZ_API(bzclose) (BZFILE* b) | ||
| 1457 | { | ||
| 1458 | int bzerr; | ||
| 1459 | FILE *fp; | ||
| 1460 | if(b==NULL){return;} /* check before dereferencing b */ | ||
| 1461 | fp = ((bzFile *)b)->handle; | ||
| 1462 | if(((bzFile*)b)->writing){ | ||
| 1463 | bzWriteClose(&bzerr,b,0,NULL,NULL); | ||
| 1464 | if(bzerr != BZ_OK){ | ||
| 1465 | bzWriteClose(NULL,b,1,NULL,NULL); | ||
| 1466 | } | ||
| 1467 | }else{ | ||
| 1468 | bzReadClose(&bzerr,b); | ||
| 1469 | } | ||
| 1470 | if(fp!=stdin && fp!=stdout){ | ||
| 1471 | fclose(fp); | ||
| 1472 | } | ||
| 1473 | } | ||
| 1474 | |||
| 1475 | |||
| 1476 | /*---------------------------------------------------*/ | ||
| 1477 | /*-- | ||
| 1478 | return last error code | ||
| 1479 | --*/ | ||
| 1480 | static char *bzerrorstrings[] = { | ||
| 1481 | "OK" | ||
| 1482 | ,"SEQUENCE_ERROR" | ||
| 1483 | ,"PARAM_ERROR" | ||
| 1484 | ,"MEM_ERROR" | ||
| 1485 | ,"DATA_ERROR" | ||
| 1486 | ,"DATA_ERROR_MAGIC" | ||
| 1487 | ,"IO_ERROR" | ||
| 1488 | ,"UNEXPECTED_EOF" | ||
| 1489 | ,"OUTBUFF_FULL" | ||
| 1490 | ,"???" /* for future */ | ||
| 1491 | ,"???" /* for future */ | ||
| 1492 | ,"???" /* for future */ | ||
| 1493 | ,"???" /* for future */ | ||
| 1494 | ,"???" /* for future */ | ||
| 1495 | ,"???" /* for future */ | ||
| 1496 | }; | ||
| 1497 | |||
| 1498 | |||
| 1499 | const char * BZ_API(bzerror) (BZFILE *b, int *errnum) | ||
| 1500 | { | ||
| 1501 | int err = ((bzFile *)b)->lastErr; | ||
| 1502 | |||
| 1503 | if(err>0) err = 0; | ||
| 1504 | *errnum = err; | ||
| 1505 | return bzerrorstrings[err*-1]; | ||
| 1506 | } | ||
| 1507 | #endif | ||
| 1508 | |||
| 1509 | |||
| 1510 | /*-------------------------------------------------------------*/ | ||
| 1511 | /*--- end bzlib.c ---*/ | ||
| 1512 | /*-------------------------------------------------------------*/ | ||
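
The zlib-style `bzopen`/`bzdopen` wrappers above take a mode string in which `r` or `w` selects the direction, `s` requests small-memory decompression, and a run of digits sets `blockSize100k` (default 9). The following is a minimal standalone sketch of that parsing, not code from the commit: `BzMode` and `parse_bz_mode` are hypothetical names introduced for illustration, and the parsed block size is not range-checked here, since that is left to `bzWriteOpen`.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type; not part of libbzip2. */
typedef struct {
   int writing;        /* 1 for 'w', 0 for 'r' (the default)   */
   int smallMode;      /* 1 if 's' appeared in the mode string */
   int blockSize100k;  /* 9 unless digits were given           */
} BzMode;

/* Parse a bzopen-style mode string. Returns 0 on success and
   -1 if either argument is NULL. Unknown characters are ignored. */
static int parse_bz_mode(const char *mode, BzMode *out)
{
   if (mode == NULL || out == NULL) return -1;
   out->writing = 0;
   out->smallMode = 0;
   out->blockSize100k = 9;

   while (*mode) {
      switch (*mode) {
         case 'r': out->writing = 0;   break;
         case 'w': out->writing = 1;   break;
         case 's': out->smallMode = 1; break;
         default:
            if (isdigit((unsigned char)*mode)) {
               out->blockSize100k = *mode - '0';
               /* Consume further digits by peeking ahead, so that
                  mode stays on the last digit and the increment
                  below never steps past the terminating NUL. */
               while (isdigit((unsigned char)mode[1])) {
                  mode++;
                  out->blockSize100k =
                     out->blockSize100k * 10 + (*mode - '0');
               }
            }
            break;
      }
      mode++;
   }
   return 0;
}
```

With this sketch, `"w9"` yields writing with a block size of 9, and `"rs"` yields reading in small mode with the default block size. Peeking at `mode[1]` rather than advancing past the digits is the design point: the loop's final `mode++` then always lands on a character that has not yet been examined.
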
| @@ -0,0 +1,299 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Public header file for the library. ---*/ | ||
| 4 | /*--- bzlib.h ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #ifndef _BZLIB_H | ||
| 63 | #define _BZLIB_H | ||
| 64 | |||
| 65 | #define BZ_RUN 0 | ||
| 66 | #define BZ_FLUSH 1 | ||
| 67 | #define BZ_FINISH 2 | ||
| 68 | |||
| 69 | #define BZ_OK 0 | ||
| 70 | #define BZ_RUN_OK 1 | ||
| 71 | #define BZ_FLUSH_OK 2 | ||
| 72 | #define BZ_FINISH_OK 3 | ||
| 73 | #define BZ_STREAM_END 4 | ||
| 74 | #define BZ_SEQUENCE_ERROR (-1) | ||
| 75 | #define BZ_PARAM_ERROR (-2) | ||
| 76 | #define BZ_MEM_ERROR (-3) | ||
| 77 | #define BZ_DATA_ERROR (-4) | ||
| 78 | #define BZ_DATA_ERROR_MAGIC (-5) | ||
| 79 | #define BZ_IO_ERROR (-6) | ||
| 80 | #define BZ_UNEXPECTED_EOF (-7) | ||
| 81 | #define BZ_OUTBUFF_FULL (-8) | ||
| 82 | |||
| 83 | typedef | ||
| 84 | struct { | ||
| 85 | char *next_in; | ||
| 86 | unsigned int avail_in; | ||
| 87 | unsigned int total_in; | ||
| 88 | |||
| 89 | char *next_out; | ||
| 90 | unsigned int avail_out; | ||
| 91 | unsigned int total_out; | ||
| 92 | |||
| 93 | void *state; | ||
| 94 | |||
| 95 | void *(*bzalloc)(void *,int,int); | ||
| 96 | void (*bzfree)(void *,void *); | ||
| 97 | void *opaque; | ||
| 98 | } | ||
| 99 | bz_stream; | ||
| 100 | |||
| 101 | |||
| 102 | #ifndef BZ_IMPORT | ||
| 103 | #define BZ_EXPORT | ||
| 104 | #endif | ||
| 105 | |||
| 106 | #ifdef _WIN32 | ||
| 107 | # include <stdio.h> | ||
| 108 | # include <windows.h> | ||
| 109 | # ifdef small | ||
| 110 | /* windows.h defines small as char */ | ||
| 111 | # undef small | ||
| 112 | # endif | ||
| 113 | # ifdef BZ_EXPORT | ||
| 114 | # define BZ_API(func) WINAPI func | ||
| 115 | # define BZ_EXTERN extern | ||
| 116 | # else | ||
| 117 | /* import windows dll dynamically */ | ||
| 118 | # define BZ_API(func) (WINAPI * func) | ||
| 119 | # define BZ_EXTERN | ||
| 120 | # endif | ||
| 121 | #else | ||
| 122 | # define BZ_API(func) func | ||
| 123 | # define BZ_EXTERN extern | ||
| 124 | #endif | ||
| 125 | |||
| 126 | |||
| 127 | /*-- Core (low-level) library functions --*/ | ||
| 128 | |||
| 129 | BZ_EXTERN int BZ_API(bzCompressInit) ( | ||
| 130 | bz_stream* strm, | ||
| 131 | int blockSize100k, | ||
| 132 | int verbosity, | ||
| 133 | int workFactor | ||
| 134 | ); | ||
| 135 | |||
| 136 | BZ_EXTERN int BZ_API(bzCompress) ( | ||
| 137 | bz_stream* strm, | ||
| 138 | int action | ||
| 139 | ); | ||
| 140 | |||
| 141 | BZ_EXTERN int BZ_API(bzCompressEnd) ( | ||
| 142 | bz_stream* strm | ||
| 143 | ); | ||
| 144 | |||
| 145 | BZ_EXTERN int BZ_API(bzDecompressInit) ( | ||
| 146 | bz_stream *strm, | ||
| 147 | int verbosity, | ||
| 148 | int small | ||
| 149 | ); | ||
| 150 | |||
| 151 | BZ_EXTERN int BZ_API(bzDecompress) ( | ||
| 152 | bz_stream* strm | ||
| 153 | ); | ||
| 154 | |||
| 155 | BZ_EXTERN int BZ_API(bzDecompressEnd) ( | ||
| 156 | bz_stream *strm | ||
| 157 | ); | ||
| 158 | |||
| 159 | |||
| 160 | |||
| 161 | /*-- High(er) level library functions --*/ | ||
| 162 | |||
| 163 | #ifndef BZ_NO_STDIO | ||
| 164 | #define BZ_MAX_UNUSED 5000 | ||
| 165 | |||
| 166 | typedef void BZFILE; | ||
| 167 | |||
| 168 | BZ_EXTERN BZFILE* BZ_API(bzReadOpen) ( | ||
| 169 | int* bzerror, | ||
| 170 | FILE* f, | ||
| 171 | int verbosity, | ||
| 172 | int small, | ||
| 173 | void* unused, | ||
| 174 | int nUnused | ||
| 175 | ); | ||
| 176 | |||
| 177 | BZ_EXTERN void BZ_API(bzReadClose) ( | ||
| 178 | int* bzerror, | ||
| 179 | BZFILE* b | ||
| 180 | ); | ||
| 181 | |||
| 182 | BZ_EXTERN void BZ_API(bzReadGetUnused) ( | ||
| 183 | int* bzerror, | ||
| 184 | BZFILE* b, | ||
| 185 | void** unused, | ||
| 186 | int* nUnused | ||
| 187 | ); | ||
| 188 | |||
| 189 | BZ_EXTERN int BZ_API(bzRead) ( | ||
| 190 | int* bzerror, | ||
| 191 | BZFILE* b, | ||
| 192 | void* buf, | ||
| 193 | int len | ||
| 194 | ); | ||
| 195 | |||
| 196 | BZ_EXTERN BZFILE* BZ_API(bzWriteOpen) ( | ||
| 197 | int* bzerror, | ||
| 198 | FILE* f, | ||
| 199 | int blockSize100k, | ||
| 200 | int verbosity, | ||
| 201 | int workFactor | ||
| 202 | ); | ||
| 203 | |||
| 204 | BZ_EXTERN void BZ_API(bzWrite) ( | ||
| 205 | int* bzerror, | ||
| 206 | BZFILE* b, | ||
| 207 | void* buf, | ||
| 208 | int len | ||
| 209 | ); | ||
| 210 | |||
| 211 | BZ_EXTERN void BZ_API(bzWriteClose) ( | ||
| 212 | int* bzerror, | ||
| 213 | BZFILE* b, | ||
| 214 | int abandon, | ||
| 215 | unsigned int* nbytes_in, | ||
| 216 | unsigned int* nbytes_out | ||
| 217 | ); | ||
| 218 | #endif | ||
| 219 | |||
| 220 | |||
| 221 | /*-- Utility functions --*/ | ||
| 222 | |||
| 223 | BZ_EXTERN int BZ_API(bzBuffToBuffCompress) ( | ||
| 224 | char* dest, | ||
| 225 | unsigned int* destLen, | ||
| 226 | char* source, | ||
| 227 | unsigned int sourceLen, | ||
| 228 | int blockSize100k, | ||
| 229 | int verbosity, | ||
| 230 | int workFactor | ||
| 231 | ); | ||
| 232 | |||
| 233 | BZ_EXTERN int BZ_API(bzBuffToBuffDecompress) ( | ||
| 234 | char* dest, | ||
| 235 | unsigned int* destLen, | ||
| 236 | char* source, | ||
| 237 | unsigned int sourceLen, | ||
| 238 | int small, | ||
| 239 | int verbosity | ||
| 240 | ); | ||
| 241 | |||
| 242 | |||
| 243 | /*-- | ||
| 244 | Code contributed by Yoshioka Tsuneo | ||
| 245 | (QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp), | ||
| 246 | to support better zlib compatibility. | ||
| 247 | This code is not _officially_ part of libbzip2 (yet); | ||
| 248 | I haven't tested it, documented it, or considered the | ||
| 249 | threading-safeness of it. | ||
| 250 | If this code breaks, please contact both Yoshioka and me. | ||
| 251 | --*/ | ||
| 252 | |||
| 253 | BZ_EXTERN const char * BZ_API(bzlibVersion) ( | ||
| 254 | void | ||
| 255 | ); | ||
| 256 | |||
| 257 | #ifndef BZ_NO_STDIO | ||
| 258 | BZ_EXTERN BZFILE * BZ_API(bzopen) ( | ||
| 259 | const char *path, | ||
| 260 | const char *mode | ||
| 261 | ); | ||
| 262 | |||
| 263 | BZ_EXTERN BZFILE * BZ_API(bzdopen) ( | ||
| 264 | int fd, | ||
| 265 | const char *mode | ||
| 266 | ); | ||
| 267 | |||
| 268 | BZ_EXTERN int BZ_API(bzread) ( | ||
| 269 | BZFILE* b, | ||
| 270 | void* buf, | ||
| 271 | int len | ||
| 272 | ); | ||
| 273 | |||
| 274 | BZ_EXTERN int BZ_API(bzwrite) ( | ||
| 275 | BZFILE* b, | ||
| 276 | void* buf, | ||
| 277 | int len | ||
| 278 | ); | ||
| 279 | |||
| 280 | BZ_EXTERN int BZ_API(bzflush) ( | ||
| 281 | BZFILE* b | ||
| 282 | ); | ||
| 283 | |||
| 284 | BZ_EXTERN void BZ_API(bzclose) ( | ||
| 285 | BZFILE* b | ||
| 286 | ); | ||
| 287 | |||
| 288 | BZ_EXTERN const char * BZ_API(bzerror) ( | ||
| 289 | BZFILE *b, | ||
| 290 | int *errnum | ||
| 291 | ); | ||
| 292 | #endif | ||
| 293 | |||
| 294 | |||
| 295 | #endif | ||
| 296 | |||
| 297 | /*-------------------------------------------------------------*/ | ||
| 298 | /*--- end bzlib.h ---*/ | ||
| 299 | /*-------------------------------------------------------------*/ | ||
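
The return codes declared in `bzlib.h` above follow one convention: values from `BZ_OK` to `BZ_STREAM_END` are non-negative status codes, and every error is negative. The sketch below shows one way a caller might classify and name a code under that convention. The helper names `bz_is_error` and `bz_code_name` are assumptions made for illustration and are not part of the library; the constants are copied from the header, and the name table mirrors the strings used by `bzerror` in `bzlib.c`.

```c
#include <stddef.h>

/* Values copied from bzlib.h. */
#define BZ_OK                0
#define BZ_RUN_OK            1
#define BZ_FLUSH_OK          2
#define BZ_FINISH_OK         3
#define BZ_STREAM_END        4
#define BZ_SEQUENCE_ERROR    (-1)
#define BZ_PARAM_ERROR       (-2)
#define BZ_MEM_ERROR         (-3)
#define BZ_DATA_ERROR        (-4)
#define BZ_DATA_ERROR_MAGIC  (-5)
#define BZ_IO_ERROR          (-6)
#define BZ_UNEXPECTED_EOF    (-7)
#define BZ_OUTBUFF_FULL      (-8)

/* Indexed by the negated error code; entry 0 covers all
   non-negative (non-error) codes. */
static const char *const bz_code_names[] = {
   "OK", "SEQUENCE_ERROR", "PARAM_ERROR", "MEM_ERROR",
   "DATA_ERROR", "DATA_ERROR_MAGIC", "IO_ERROR",
   "UNEXPECTED_EOF", "OUTBUFF_FULL"
};

/* Hypothetical helper: true for any negative (error) code. */
static int bz_is_error(int code)
{
   return code < 0;
}

/* Hypothetical helper: bounds-checked name lookup. Codes outside
   the known range map to "???" instead of indexing off the table. */
static const char *bz_code_name(int code)
{
   size_t n = sizeof bz_code_names / sizeof bz_code_names[0];
   size_t idx;
   if (code >= 0) return bz_code_names[0];
   if (code < -(int)(n - 1)) return "???";
   idx = (size_t)(-code);
   return bz_code_names[idx];
}
```

A caller would typically test `bz_is_error(ret)` after a library call and log `bz_code_name(ret)` on failure. The explicit range check replaces the padding entries that `bzerror` relies on.
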
diff --git a/bzlib_private.h b/bzlib_private.h new file mode 100644 index 0000000..4044aef --- /dev/null +++ b/bzlib_private.h | |||
| @@ -0,0 +1,523 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Private header file for the library. ---*/ | ||
| 4 | /*--- bzlib_private.h ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #ifndef _BZLIB_PRIVATE_H | ||
| 63 | #define _BZLIB_PRIVATE_H | ||
| 64 | |||
| 65 | #include <stdlib.h> | ||
| 66 | |||
| 67 | #ifndef BZ_NO_STDIO | ||
| 68 | #include <stdio.h> | ||
| 69 | #include <ctype.h> | ||
| 70 | #include <string.h> | ||
| 71 | #endif | ||
| 72 | |||
| 73 | #include "bzlib.h" | ||
| 74 | |||
| 75 | |||
| 76 | |||
| 77 | /*-- General stuff. --*/ | ||
| 78 | |||
| 79 | #define BZ_VERSION "0.9.0c" | ||
| 80 | |||
| 81 | typedef char Char; | ||
| 82 | typedef unsigned char Bool; | ||
| 83 | typedef unsigned char UChar; | ||
| 84 | typedef int Int32; | ||
| 85 | typedef unsigned int UInt32; | ||
| 86 | typedef short Int16; | ||
| 87 | typedef unsigned short UInt16; | ||
| 88 | |||
| 89 | #define True ((Bool)1) | ||
| 90 | #define False ((Bool)0) | ||
| 91 | |||
| 92 | #ifndef __GNUC__ | ||
| 93 | #define __inline__ /* */ | ||
| 94 | #endif | ||
| 95 | |||
| 96 | #ifndef BZ_NO_STDIO | ||
| 97 | extern void bz__AssertH__fail ( int errcode ); | ||
| 98 | #define AssertH(cond,errcode) \ | ||
| 99 | { if (!(cond)) bz__AssertH__fail ( errcode ); } | ||
| 100 | #if BZ_DEBUG | ||
| 101 | #define AssertD(cond,msg) \ | ||
| 102 | { if (!(cond)) { \ | ||
| 103 | fprintf ( stderr, \ | ||
| 104 | "\n\nlibbzip2(debug build): internal error\n\t%s\n", msg );\ | ||
| 105 | exit(1); \ | ||
| 106 | }} | ||
| 107 | #else | ||
| 108 | #define AssertD(cond,msg) /* */ | ||
| 109 | #endif | ||
| 110 | #define VPrintf0(zf) \ | ||
| 111 | fprintf(stderr,zf) | ||
| 112 | #define VPrintf1(zf,za1) \ | ||
| 113 | fprintf(stderr,zf,za1) | ||
| 114 | #define VPrintf2(zf,za1,za2) \ | ||
| 115 | fprintf(stderr,zf,za1,za2) | ||
| 116 | #define VPrintf3(zf,za1,za2,za3) \ | ||
| 117 | fprintf(stderr,zf,za1,za2,za3) | ||
| 118 | #define VPrintf4(zf,za1,za2,za3,za4) \ | ||
| 119 | fprintf(stderr,zf,za1,za2,za3,za4) | ||
| 120 | #define VPrintf5(zf,za1,za2,za3,za4,za5) \ | ||
| 121 | fprintf(stderr,zf,za1,za2,za3,za4,za5) | ||
| 122 | #else | ||
| 123 | extern void bz_internal_error ( int errcode ); | ||
| 124 | #define AssertH(cond,errcode) \ | ||
| 125 | { if (!(cond)) bz_internal_error ( errcode ); } | ||
| 126 | #define AssertD(cond,msg) /* */ | ||
| 127 | #define VPrintf0(zf) /* */ | ||
| 128 | #define VPrintf1(zf,za1) /* */ | ||
| 129 | #define VPrintf2(zf,za1,za2) /* */ | ||
| 130 | #define VPrintf3(zf,za1,za2,za3) /* */ | ||
| 131 | #define VPrintf4(zf,za1,za2,za3,za4) /* */ | ||
| 132 | #define VPrintf5(zf,za1,za2,za3,za4,za5) /* */ | ||
| 133 | #endif | ||
| 134 | |||
| 135 | |||
| 136 | #define BZALLOC(nnn) (strm->bzalloc)(strm->opaque,(nnn),1) | ||
| 137 | #define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp)) | ||
| 138 | |||
| 139 | |||
| 140 | /*-- Constants for the back end. --*/ | ||
| 141 | |||
| 142 | #define BZ_MAX_ALPHA_SIZE 258 | ||
| 143 | #define BZ_MAX_CODE_LEN 23 | ||
| 144 | |||
| 145 | #define BZ_RUNA 0 | ||
| 146 | #define BZ_RUNB 1 | ||
| 147 | |||
| 148 | #define BZ_N_GROUPS 6 | ||
| 149 | #define BZ_G_SIZE 50 | ||
| 150 | #define BZ_N_ITERS 4 | ||
| 151 | |||
| 152 | #define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE)) | ||
| 153 | |||
| 154 | |||
| 155 | |||
| 156 | /*-- Stuff for randomising repetitive blocks. --*/ | ||
| 157 | |||
| 158 | extern Int32 rNums[512]; | ||
| 159 | |||
| 160 | #define BZ_RAND_DECLS \ | ||
| 161 | Int32 rNToGo; \ | ||
| 162 | Int32 rTPos \ | ||
| 163 | |||
| 164 | #define BZ_RAND_INIT_MASK \ | ||
| 165 | s->rNToGo = 0; \ | ||
| 166 | s->rTPos = 0 \ | ||
| 167 | |||
| 168 | #define BZ_RAND_MASK ((s->rNToGo == 1) ? 1 : 0) | ||
| 169 | |||
| 170 | #define BZ_RAND_UPD_MASK \ | ||
| 171 | if (s->rNToGo == 0) { \ | ||
| 172 | s->rNToGo = rNums[s->rTPos]; \ | ||
| 173 | s->rTPos++; \ | ||
| 174 | if (s->rTPos == 512) s->rTPos = 0; \ | ||
| 175 | } \ | ||
| 176 | s->rNToGo--; | ||
| 177 | |||
| 178 | |||
| 179 | |||
| 180 | /*-- Stuff for doing CRCs. --*/ | ||
| 181 | |||
| 182 | extern UInt32 crc32Table[256]; | ||
| 183 | |||
| 184 | #define BZ_INITIALISE_CRC(crcVar) \ | ||
| 185 | { \ | ||
| 186 | crcVar = 0xffffffffL; \ | ||
| 187 | } | ||
| 188 | |||
| 189 | #define BZ_FINALISE_CRC(crcVar) \ | ||
| 190 | { \ | ||
| 191 | crcVar = ~(crcVar); \ | ||
| 192 | } | ||
| 193 | |||
| 194 | #define BZ_UPDATE_CRC(crcVar,cha) \ | ||
| 195 | { \ | ||
| 196 | crcVar = (crcVar << 8) ^ \ | ||
| 197 | crc32Table[(crcVar >> 24) ^ \ | ||
| 198 | ((UChar)cha)]; \ | ||
| 199 | } | ||
| 200 | |||
| 201 | |||
| 202 | |||
| 203 | /*-- States and modes for compression. --*/ | ||
| 204 | |||
| 205 | #define BZ_M_IDLE 1 | ||
| 206 | #define BZ_M_RUNNING 2 | ||
| 207 | #define BZ_M_FLUSHING 3 | ||
| 208 | #define BZ_M_FINISHING 4 | ||
| 209 | |||
| 210 | #define BZ_S_OUTPUT 1 | ||
| 211 | #define BZ_S_INPUT 2 | ||
| 212 | |||
| 213 | #define BZ_NUM_OVERSHOOT_BYTES 20 | ||
| 214 | |||
| 215 | |||
| 216 | |||
| 217 | /*-- Structure holding all the compression-side stuff. --*/ | ||
| 218 | |||
| 219 | typedef | ||
| 220 | struct { | ||
| 221 | /* pointer back to the struct bz_stream */ | ||
| 222 | bz_stream* strm; | ||
| 223 | |||
| 224 | /* mode this stream is in, and whether inputting */ | ||
| 225 | /* or outputting data */ | ||
| 226 | Int32 mode; | ||
| 227 | Int32 state; | ||
| 228 | |||
| 229 | /* remembers avail_in when flush/finish requested */ | ||
| 230 | UInt32 avail_in_expect; | ||
| 231 | |||
| 232 | /* for doing the block sorting */ | ||
| 233 | UChar* block; | ||
| 234 | UInt16* quadrant; | ||
| 235 | UInt32* zptr; | ||
| 236 | UInt16* szptr; | ||
| 237 | Int32* ftab; | ||
| 238 | Int32 workDone; | ||
| 239 | Int32 workLimit; | ||
| 240 | Int32 workFactor; | ||
| 241 | Bool firstAttempt; | ||
| 242 | Bool blockRandomised; | ||
| 243 | Int32 origPtr; | ||
| 244 | |||
| 245 | /* run-length-encoding of the input */ | ||
| 246 | UInt32 state_in_ch; | ||
| 247 | Int32 state_in_len; | ||
| 248 | BZ_RAND_DECLS; | ||
| 249 | |||
| 250 | /* input and output limits and current posns */ | ||
| 251 | Int32 nblock; | ||
| 252 | Int32 nblockMAX; | ||
| 253 | Int32 numZ; | ||
| 254 | Int32 state_out_pos; | ||
| 255 | |||
| 256 | /* map of bytes used in block */ | ||
| 257 | Int32 nInUse; | ||
| 258 | Bool inUse[256]; | ||
| 259 | UChar unseqToSeq[256]; | ||
| 260 | |||
| 261 | /* the buffer for bit stream creation */ | ||
| 262 | UInt32 bsBuff; | ||
| 263 | Int32 bsLive; | ||
| 264 | |||
| 265 | /* block and combined CRCs */ | ||
| 266 | UInt32 blockCRC; | ||
| 267 | UInt32 combinedCRC; | ||
| 268 | |||
| 269 | /* misc administratium */ | ||
| 270 | Int32 verbosity; | ||
| 271 | Int32 blockNo; | ||
| 272 | Int32 nBlocksRandomised; | ||
| 273 | Int32 blockSize100k; | ||
| 274 | |||
| 275 | /* stuff for coding the MTF values */ | ||
| 276 | Int32 nMTF; | ||
| 277 | Int32 mtfFreq [BZ_MAX_ALPHA_SIZE]; | ||
| 278 | UChar selector [BZ_MAX_SELECTORS]; | ||
| 279 | UChar selectorMtf[BZ_MAX_SELECTORS]; | ||
| 280 | |||
| 281 | UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 282 | Int32 code [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 283 | Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 284 | |||
| 285 | } | ||
| 286 | EState; | ||
| 287 | |||
| 288 | |||
| 289 | |||
| 290 | /*-- externs for compression. --*/ | ||
| 291 | |||
| 292 | extern void | ||
| 293 | blockSort ( EState* ); | ||
| 294 | |||
| 295 | extern void | ||
| 296 | compressBlock ( EState*, Bool ); | ||
| 297 | |||
| 298 | extern void | ||
| 299 | bsInitWrite ( EState* ); | ||
| 300 | |||
| 301 | extern void | ||
| 302 | hbAssignCodes ( Int32*, UChar*, Int32, Int32, Int32 ); | ||
| 303 | |||
| 304 | extern void | ||
| 305 | hbMakeCodeLengths ( UChar*, Int32*, Int32, Int32 ); | ||
| 306 | |||
| 307 | |||
| 308 | |||
| 309 | /*-- states for decompression. --*/ | ||
| 310 | |||
| 311 | #define BZ_X_IDLE 1 | ||
| 312 | #define BZ_X_OUTPUT 2 | ||
| 313 | |||
| 314 | #define BZ_X_MAGIC_1 10 | ||
| 315 | #define BZ_X_MAGIC_2 11 | ||
| 316 | #define BZ_X_MAGIC_3 12 | ||
| 317 | #define BZ_X_MAGIC_4 13 | ||
| 318 | #define BZ_X_BLKHDR_1 14 | ||
| 319 | #define BZ_X_BLKHDR_2 15 | ||
| 320 | #define BZ_X_BLKHDR_3 16 | ||
| 321 | #define BZ_X_BLKHDR_4 17 | ||
| 322 | #define BZ_X_BLKHDR_5 18 | ||
| 323 | #define BZ_X_BLKHDR_6 19 | ||
| 324 | #define BZ_X_BCRC_1 20 | ||
| 325 | #define BZ_X_BCRC_2 21 | ||
| 326 | #define BZ_X_BCRC_3 22 | ||
| 327 | #define BZ_X_BCRC_4 23 | ||
| 328 | #define BZ_X_RANDBIT 24 | ||
| 329 | #define BZ_X_ORIGPTR_1 25 | ||
| 330 | #define BZ_X_ORIGPTR_2 26 | ||
| 331 | #define BZ_X_ORIGPTR_3 27 | ||
| 332 | #define BZ_X_MAPPING_1 28 | ||
| 333 | #define BZ_X_MAPPING_2 29 | ||
| 334 | #define BZ_X_SELECTOR_1 30 | ||
| 335 | #define BZ_X_SELECTOR_2 31 | ||
| 336 | #define BZ_X_SELECTOR_3 32 | ||
| 337 | #define BZ_X_CODING_1 33 | ||
| 338 | #define BZ_X_CODING_2 34 | ||
| 339 | #define BZ_X_CODING_3 35 | ||
| 340 | #define BZ_X_MTF_1 36 | ||
| 341 | #define BZ_X_MTF_2 37 | ||
| 342 | #define BZ_X_MTF_3 38 | ||
| 343 | #define BZ_X_MTF_4 39 | ||
| 344 | #define BZ_X_MTF_5 40 | ||
| 345 | #define BZ_X_MTF_6 41 | ||
| 346 | #define BZ_X_ENDHDR_2 42 | ||
| 347 | #define BZ_X_ENDHDR_3 43 | ||
| 348 | #define BZ_X_ENDHDR_4 44 | ||
| 349 | #define BZ_X_ENDHDR_5 45 | ||
| 350 | #define BZ_X_ENDHDR_6 46 | ||
| 351 | #define BZ_X_CCRC_1 47 | ||
| 352 | #define BZ_X_CCRC_2 48 | ||
| 353 | #define BZ_X_CCRC_3 49 | ||
| 354 | #define BZ_X_CCRC_4 50 | ||
| 355 | |||
| 356 | |||
| 357 | |||
| 358 | /*-- Constants for the fast MTF decoder. --*/ | ||
| 359 | |||
| 360 | #define MTFA_SIZE 4096 | ||
| 361 | #define MTFL_SIZE 16 | ||
| 362 | |||
| 363 | |||
| 364 | |||
| 365 | /*-- Structure holding all the decompression-side stuff. --*/ | ||
| 366 | |||
| 367 | typedef | ||
| 368 | struct { | ||
| 369 | /* pointer back to the struct bz_stream */ | ||
| 370 | bz_stream* strm; | ||
| 371 | |||
| 372 | /* state indicator for this stream */ | ||
| 373 | Int32 state; | ||
| 374 | |||
| 375 | /* for doing the final run-length decoding */ | ||
| 376 | UChar state_out_ch; | ||
| 377 | Int32 state_out_len; | ||
| 378 | Bool blockRandomised; | ||
| 379 | BZ_RAND_DECLS; | ||
| 380 | |||
| 381 | /* the buffer for bit stream reading */ | ||
| 382 | UInt32 bsBuff; | ||
| 383 | Int32 bsLive; | ||
| 384 | |||
| 385 | /* misc administratium */ | ||
| 386 | Int32 blockSize100k; | ||
| 387 | Bool smallDecompress; | ||
| 388 | Int32 currBlockNo; | ||
| 389 | Int32 verbosity; | ||
| 390 | |||
| 391 | /* for undoing the Burrows-Wheeler transform */ | ||
| 392 | Int32 origPtr; | ||
| 393 | UInt32 tPos; | ||
| 394 | Int32 k0; | ||
| 395 | Int32 unzftab[256]; | ||
| 396 | Int32 nblock_used; | ||
| 397 | Int32 cftab[257]; | ||
| 398 | Int32 cftabCopy[257]; | ||
| 399 | |||
| 400 | /* for undoing the Burrows-Wheeler transform (FAST) */ | ||
| 401 | UInt32 *tt; | ||
| 402 | |||
| 403 | /* for undoing the Burrows-Wheeler transform (SMALL) */ | ||
| 404 | UInt16 *ll16; | ||
| 405 | UChar *ll4; | ||
| 406 | |||
| 407 | /* stored and calculated CRCs */ | ||
| 408 | UInt32 storedBlockCRC; | ||
| 409 | UInt32 storedCombinedCRC; | ||
| 410 | UInt32 calculatedBlockCRC; | ||
| 411 | UInt32 calculatedCombinedCRC; | ||
| 412 | |||
| 413 | /* map of bytes used in block */ | ||
| 414 | Int32 nInUse; | ||
| 415 | Bool inUse[256]; | ||
| 416 | Bool inUse16[16]; | ||
| 417 | UChar seqToUnseq[256]; | ||
| 418 | |||
| 419 | /* for decoding the MTF values */ | ||
| 420 | UChar mtfa [MTFA_SIZE]; | ||
| 421 | Int32 mtfbase[256 / MTFL_SIZE]; | ||
| 422 | UChar selector [BZ_MAX_SELECTORS]; | ||
| 423 | UChar selectorMtf[BZ_MAX_SELECTORS]; | ||
| 424 | UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 425 | |||
| 426 | Int32 limit [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 427 | Int32 base [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 428 | Int32 perm [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 429 | Int32 minLens[BZ_N_GROUPS]; | ||
| 430 | |||
| 431 | /* save area for scalars in the main decompress code */ | ||
| 432 | Int32 save_i; | ||
| 433 | Int32 save_j; | ||
| 434 | Int32 save_t; | ||
| 435 | Int32 save_alphaSize; | ||
| 436 | Int32 save_nGroups; | ||
| 437 | Int32 save_nSelectors; | ||
| 438 | Int32 save_EOB; | ||
| 439 | Int32 save_groupNo; | ||
| 440 | Int32 save_groupPos; | ||
| 441 | Int32 save_nextSym; | ||
| 442 | Int32 save_nblockMAX; | ||
| 443 | Int32 save_nblock; | ||
| 444 | Int32 save_es; | ||
| 445 | Int32 save_N; | ||
| 446 | Int32 save_curr; | ||
| 447 | Int32 save_zt; | ||
| 448 | Int32 save_zn; | ||
| 449 | Int32 save_zvec; | ||
| 450 | Int32 save_zj; | ||
| 451 | Int32 save_gSel; | ||
| 452 | Int32 save_gMinlen; | ||
| 453 | Int32* save_gLimit; | ||
| 454 | Int32* save_gBase; | ||
| 455 | Int32* save_gPerm; | ||
| 456 | |||
| 457 | } | ||
| 458 | DState; | ||
| 459 | |||
| 460 | |||
| 461 | |||
| 462 | /*-- Macros for decompression. --*/ | ||
| 463 | |||
| 464 | #define BZ_GET_FAST(cccc) \ | ||
| 465 | s->tPos = s->tt[s->tPos]; \ | ||
| 466 | cccc = (UChar)(s->tPos & 0xff); \ | ||
| 467 | s->tPos >>= 8; | ||
| 468 | |||
| 469 | #define BZ_GET_FAST_C(cccc) \ | ||
| 470 | c_tPos = c_tt[c_tPos]; \ | ||
| 471 | cccc = (UChar)(c_tPos & 0xff); \ | ||
| 472 | c_tPos >>= 8; | ||
| 473 | |||
| 474 | #define SET_LL4(i,n) \ | ||
| 475 | { if (((i) & 0x1) == 0) \ | ||
| 476 | s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0xf0) | (n); else \ | ||
| 477 | s->ll4[(i) >> 1] = (s->ll4[(i) >> 1] & 0x0f) | ((n) << 4); \ | ||
| 478 | } | ||
| 479 | |||
| 480 | #define GET_LL4(i) \ | ||
| 481 | (((UInt32)(s->ll4[(i) >> 1])) >> (((i) << 2) & 0x4) & 0xF) | ||
| 482 | |||
| 483 | #define SET_LL(i,n) \ | ||
| 484 | { s->ll16[i] = (UInt16)(n & 0x0000ffff); \ | ||
| 485 | SET_LL4(i, n >> 16); \ | ||
| 486 | } | ||
| 487 | |||
| 488 | #define GET_LL(i) \ | ||
| 489 | (((UInt32)s->ll16[i]) | (GET_LL4(i) << 16)) | ||
| 490 | |||
| 491 | #define BZ_GET_SMALL(cccc) \ | ||
| 492 | cccc = indexIntoF ( s->tPos, s->cftab ); \ | ||
| 493 | s->tPos = GET_LL(s->tPos); | ||
| 494 | |||
| 495 | |||
| 496 | /*-- externs for decompression. --*/ | ||
| 497 | |||
| 498 | extern Int32 | ||
| 499 | indexIntoF ( Int32, Int32* ); | ||
| 500 | |||
| 501 | extern Int32 | ||
| 502 | decompress ( DState* ); | ||
| 503 | |||
| 504 | extern void | ||
| 505 | hbCreateDecodeTables ( Int32*, Int32*, Int32*, UChar*, | ||
| 506 | Int32, Int32, Int32 ); | ||
| 507 | |||
| 508 | |||
| 509 | #endif | ||
| 510 | |||
| 511 | |||
| 512 | /*-- BZ_NO_STDIO seems to make NULL disappear on some platforms. --*/ | ||
| 513 | |||
| 514 | #ifdef BZ_NO_STDIO | ||
| 515 | #ifndef NULL | ||
| 516 | #define NULL 0 | ||
| 517 | #endif | ||
| 518 | #endif | ||
| 519 | |||
| 520 | |||
| 521 | /*-------------------------------------------------------------*/ | ||
| 522 | /*--- end bzlib_private.h ---*/ | ||
| 523 | /*-------------------------------------------------------------*/ | ||
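
The `SET_LL` / `GET_LL` macros above store each 20-bit value as a 16-bit low part in `ll16` plus a 4-bit high part packed two per byte in `ll4`. The following is a minimal standalone sketch of that packing, not code from this commit. The `demo_` names, the fixed array size, and the use of `<stdint.h>` types in place of `UInt16` / `UChar` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ll16 / ll4 arrays held in DState. */
enum { DEMO_N = 8 };
uint16_t demo_ll16[DEMO_N];
uint8_t  demo_ll4[(DEMO_N + 1) / 2];

/* Store a 4-bit value for entry i: even i uses the low nibble of
   byte i/2, odd i uses the high nibble (same layout as SET_LL4). */
void demo_set_ll4(uint32_t i, uint32_t n)
{
    assert(i < DEMO_N && n < 16);
    if ((i & 1) == 0)
        demo_ll4[i >> 1] = (uint8_t)((demo_ll4[i >> 1] & 0xf0) | n);
    else
        demo_ll4[i >> 1] = (uint8_t)((demo_ll4[i >> 1] & 0x0f) | (n << 4));
}

/* Fetch the 4-bit value for entry i; (i << 2) & 4 is 0 for even i
   and 4 for odd i, which selects the nibble (same as GET_LL4). */
uint32_t demo_get_ll4(uint32_t i)
{
    assert(i < DEMO_N);
    return ((uint32_t)demo_ll4[i >> 1] >> ((i << 2) & 0x4)) & 0xF;
}

/* Split a 20-bit value into its 16-bit and 4-bit parts. */
void demo_set_ll(uint32_t i, uint32_t n)
{
    assert(n < (1u << 20));
    demo_ll16[i] = (uint16_t)(n & 0xffff);
    demo_set_ll4(i, n >> 16);
}

/* Reassemble the 20-bit value. */
uint32_t demo_get_ll(uint32_t i)
{
    assert(i < DEMO_N);
    return (uint32_t)demo_ll16[i] | (demo_get_ll4(i) << 16);
}
```

With this layout, storing `0xABCDE` at index 0 and `0x12345` at index 1 leaves the shared nibble byte holding `0x1A`, and both values read back unchanged. Twenty bits are enough because a block holds at most 900000 positions, which is below 2^20. This costs 2.5 bytes per position instead of the 4 bytes used by the `tt` array of the fast path.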
diff --git a/compress.c b/compress.c new file mode 100644 index 0000000..23abd43 --- /dev/null +++ b/compress.c | |||
| @@ -0,0 +1,588 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Compression machinery (not incl block sorting) ---*/ | ||
| 4 | /*--- compress.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0 of 28 June 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | /*-- | ||
| 62 | CHANGES | ||
| 63 | ~~~~~~~ | ||
| 64 | 0.9.0 -- original version. | ||
| 65 | |||
| 66 | 0.9.0a/b -- no changes in this file. | ||
| 67 | |||
| 68 | 0.9.0c | ||
| 69 | * changed setting of nGroups in sendMTFValues() so as to | ||
| 70 | do a bit better on small files | ||
| 71 | --*/ | ||
| 72 | |||
| 73 | #include "bzlib_private.h" | ||
| 74 | |||
| 75 | |||
| 76 | /*---------------------------------------------------*/ | ||
| 77 | /*--- Bit stream I/O ---*/ | ||
| 78 | /*---------------------------------------------------*/ | ||
| 79 | |||
| 80 | /*---------------------------------------------------*/ | ||
| 81 | void bsInitWrite ( EState* s ) | ||
| 82 | { | ||
| 83 | s->bsLive = 0; | ||
| 84 | s->bsBuff = 0; | ||
| 85 | } | ||
| 86 | |||
| 87 | |||
| 88 | /*---------------------------------------------------*/ | ||
| 89 | static | ||
| 90 | void bsFinishWrite ( EState* s ) | ||
| 91 | { | ||
| 92 | while (s->bsLive > 0) { | ||
| 93 | ((UChar*)(s->quadrant))[s->numZ] = (UChar)(s->bsBuff >> 24); | ||
| 94 | s->numZ++; | ||
| 95 | s->bsBuff <<= 8; | ||
| 96 | s->bsLive -= 8; | ||
| 97 | } | ||
| 98 | } | ||
| 99 | |||
| 100 | |||
| 101 | /*---------------------------------------------------*/ | ||
| 102 | #define bsNEEDW(nz) \ | ||
| 103 | { \ | ||
| 104 | while (s->bsLive >= 8) { \ | ||
| 105 | ((UChar*)(s->quadrant))[s->numZ] \ | ||
| 106 | = (UChar)(s->bsBuff >> 24); \ | ||
| 107 | s->numZ++; \ | ||
| 108 | s->bsBuff <<= 8; \ | ||
| 109 | s->bsLive -= 8; \ | ||
| 110 | } \ | ||
| 111 | } | ||
| 112 | |||
| 113 | |||
| 114 | /*---------------------------------------------------*/ | ||
| 115 | static | ||
| 116 | void bsW ( EState* s, Int32 n, UInt32 v ) | ||
| 117 | { | ||
| 118 | bsNEEDW ( n ); | ||
| 119 | s->bsBuff |= (v << (32 - s->bsLive - n)); | ||
| 120 | s->bsLive += n; | ||
| 121 | } | ||
| 122 | |||
| 123 | |||
| 124 | /*---------------------------------------------------*/ | ||
| 125 | static | ||
| 126 | void bsPutUInt32 ( EState* s, UInt32 u ) | ||
| 127 | { | ||
| 128 | bsW ( s, 8, (u >> 24) & 0xffL ); | ||
| 129 | bsW ( s, 8, (u >> 16) & 0xffL ); | ||
| 130 | bsW ( s, 8, (u >> 8) & 0xffL ); | ||
| 131 | bsW ( s, 8, u & 0xffL ); | ||
| 132 | } | ||
| 133 | |||
| 134 | |||
| 135 | /*---------------------------------------------------*/ | ||
| 136 | static | ||
| 137 | void bsPutUChar ( EState* s, UChar c ) | ||
| 138 | { | ||
| 139 | bsW( s, 8, (UInt32)c ); | ||
| 140 | } | ||
| 141 | |||
| 142 | |||
| 143 | /*---------------------------------------------------*/ | ||
| 144 | /*--- The back end proper ---*/ | ||
| 145 | /*---------------------------------------------------*/ | ||
| 146 | |||
| 147 | /*---------------------------------------------------*/ | ||
| 148 | static | ||
| 149 | void makeMaps_e ( EState* s ) | ||
| 150 | { | ||
| 151 | Int32 i; | ||
| 152 | s->nInUse = 0; | ||
| 153 | for (i = 0; i < 256; i++) | ||
| 154 | if (s->inUse[i]) { | ||
| 155 | s->unseqToSeq[i] = s->nInUse; | ||
| 156 | s->nInUse++; | ||
| 157 | } | ||
| 158 | } | ||
| 159 | |||
| 160 | |||
| 161 | /*---------------------------------------------------*/ | ||
| 162 | static | ||
| 163 | void generateMTFValues ( EState* s ) | ||
| 164 | { | ||
| 165 | UChar yy[256]; | ||
| 166 | Int32 i, j; | ||
| 167 | UChar tmp; | ||
| 168 | UChar tmp2; | ||
| 169 | Int32 zPend; | ||
| 170 | Int32 wr; | ||
| 171 | Int32 EOB; | ||
| 172 | |||
| 173 | makeMaps_e ( s ); | ||
| 174 | EOB = s->nInUse+1; | ||
| 175 | |||
| 176 | for (i = 0; i <= EOB; i++) s->mtfFreq[i] = 0; | ||
| 177 | |||
| 178 | wr = 0; | ||
| 179 | zPend = 0; | ||
| 180 | for (i = 0; i < s->nInUse; i++) yy[i] = (UChar) i; | ||
| 181 | |||
| 182 | for (i = 0; i < s->nblock; i++) { | ||
| 183 | UChar ll_i; | ||
| 184 | |||
| 185 | AssertD ( wr <= i, "generateMTFValues(1)" ); | ||
| 186 | j = s->zptr[i]-1; if (j < 0) j += s->nblock; | ||
| 187 | ll_i = s->unseqToSeq[s->block[j]]; | ||
| 188 | AssertD ( ll_i < s->nInUse, "generateMTFValues(2a)" ); | ||
| 189 | |||
| 190 | j = 0; | ||
| 191 | tmp = yy[j]; | ||
| 192 | while ( ll_i != tmp ) { | ||
| 193 | j++; | ||
| 194 | tmp2 = tmp; | ||
| 195 | tmp = yy[j]; | ||
| 196 | yy[j] = tmp2; | ||
| 197 | }; | ||
| 198 | yy[0] = tmp; | ||
| 199 | |||
| 200 | if (j == 0) { | ||
| 201 | zPend++; | ||
| 202 | } else { | ||
| 203 | if (zPend > 0) { | ||
| 204 | zPend--; | ||
| 205 | while (True) { | ||
| 206 | switch (zPend % 2) { | ||
| 207 | case 0: s->szptr[wr] = BZ_RUNA; wr++; s->mtfFreq[BZ_RUNA]++; break; | ||
| 208 | case 1: s->szptr[wr] = BZ_RUNB; wr++; s->mtfFreq[BZ_RUNB]++; break; | ||
| 209 | }; | ||
| 210 | if (zPend < 2) break; | ||
| 211 | zPend = (zPend - 2) / 2; | ||
| 212 | }; | ||
| 213 | zPend = 0; | ||
| 214 | } | ||
| 215 | s->szptr[wr] = j+1; wr++; s->mtfFreq[j+1]++; | ||
| 216 | } | ||
| 217 | } | ||
| 218 | |||
| 219 | if (zPend > 0) { | ||
| 220 | zPend--; | ||
| 221 | while (True) { | ||
| 222 | switch (zPend % 2) { | ||
| 223 | case 0: s->szptr[wr] = BZ_RUNA; wr++; s->mtfFreq[BZ_RUNA]++; break; | ||
| 224 | case 1: s->szptr[wr] = BZ_RUNB; wr++; s->mtfFreq[BZ_RUNB]++; break; | ||
| 225 | }; | ||
| 226 | if (zPend < 2) break; | ||
| 227 | zPend = (zPend - 2) / 2; | ||
| 228 | }; | ||
| 229 | } | ||
| 230 | |||
| 231 | s->szptr[wr] = EOB; wr++; s->mtfFreq[EOB]++; | ||
| 232 | |||
| 233 | s->nMTF = wr; | ||
| 234 | } | ||
| 235 | |||
| 236 | |||
| 237 | /*---------------------------------------------------*/ | ||
| 238 | #define BZ_LESSER_ICOST 0 | ||
| 239 | #define BZ_GREATER_ICOST 15 | ||
| 240 | |||
| 241 | static | ||
| 242 | void sendMTFValues ( EState* s ) | ||
| 243 | { | ||
| 244 | Int32 v, t, i, j, gs, ge, totc, bt, bc, iter; | ||
| 245 | Int32 nSelectors, alphaSize, minLen, maxLen, selCtr; | ||
| 246 | Int32 nGroups, nBytes; | ||
| 247 | |||
| 248 | /*-- | ||
| 249 | UChar len [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 250 | is a global since the decoder also needs it. | ||
| 251 | |||
| 252 | Int32 code[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 253 | Int32 rfreq[BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE]; | ||
| 254 | are also globals only used in this proc. | ||
| 255 | Made global to keep stack frame size small. | ||
| 256 | --*/ | ||
| 257 | |||
| 258 | |||
| 259 | UInt16 cost[BZ_N_GROUPS]; | ||
| 260 | Int32 fave[BZ_N_GROUPS]; | ||
| 261 | |||
| 262 | if (s->verbosity >= 3) | ||
| 263 | VPrintf3( " %d in block, %d after MTF & 1-2 coding, " | ||
| 264 | "%d+2 syms in use\n", | ||
| 265 | s->nblock, s->nMTF, s->nInUse ); | ||
| 266 | |||
| 267 | alphaSize = s->nInUse+2; | ||
| 268 | for (t = 0; t < BZ_N_GROUPS; t++) | ||
| 269 | for (v = 0; v < alphaSize; v++) | ||
| 270 | s->len[t][v] = BZ_GREATER_ICOST; | ||
| 271 | |||
| 272 | /*--- Decide how many coding tables to use ---*/ | ||
| 273 | AssertH ( s->nMTF > 0, 3001 ); | ||
| 274 | if (s->nMTF < 200) nGroups = 2; else | ||
| 275 | if (s->nMTF < 600) nGroups = 3; else | ||
| 276 | if (s->nMTF < 1200) nGroups = 4; else | ||
| 277 | if (s->nMTF < 2400) nGroups = 5; else | ||
| 278 | nGroups = 6; | ||
| 279 | |||
| 280 | /*--- Generate an initial set of coding tables ---*/ | ||
| 281 | { | ||
| 282 | Int32 nPart, remF, tFreq, aFreq; | ||
| 283 | |||
| 284 | nPart = nGroups; | ||
| 285 | remF = s->nMTF; | ||
| 286 | gs = 0; | ||
| 287 | while (nPart > 0) { | ||
| 288 | tFreq = remF / nPart; | ||
| 289 | ge = gs-1; | ||
| 290 | aFreq = 0; | ||
| 291 | while (aFreq < tFreq && ge < alphaSize-1) { | ||
| 292 | ge++; | ||
| 293 | aFreq += s->mtfFreq[ge]; | ||
| 294 | } | ||
| 295 | |||
| 296 | if (ge > gs | ||
| 297 | && nPart != nGroups && nPart != 1 | ||
| 298 | && ((nGroups-nPart) % 2 == 1)) { | ||
| 299 | aFreq -= s->mtfFreq[ge]; | ||
| 300 | ge--; | ||
| 301 | } | ||
| 302 | |||
| 303 | if (s->verbosity >= 3) | ||
| 304 | VPrintf5( " initial group %d, [%d .. %d], " | ||
| 305 | "has %d syms (%4.1f%%)\n", | ||
| 306 | nPart, gs, ge, aFreq, | ||
| 307 | (100.0 * (float)aFreq) / (float)(s->nMTF) ); | ||
| 308 | |||
| 309 | for (v = 0; v < alphaSize; v++) | ||
| 310 | if (v >= gs && v <= ge) | ||
| 311 | s->len[nPart-1][v] = BZ_LESSER_ICOST; else | ||
| 312 | s->len[nPart-1][v] = BZ_GREATER_ICOST; | ||
| 313 | |||
| 314 | nPart--; | ||
| 315 | gs = ge+1; | ||
| 316 | remF -= aFreq; | ||
| 317 | } | ||
| 318 | } | ||
| 319 | |||
| 320 | /*--- | ||
| 321 | Iterate up to BZ_N_ITERS times to improve the tables. | ||
| 322 | ---*/ | ||
| 323 | for (iter = 0; iter < BZ_N_ITERS; iter++) { | ||
| 324 | |||
| 325 | for (t = 0; t < nGroups; t++) fave[t] = 0; | ||
| 326 | |||
| 327 | for (t = 0; t < nGroups; t++) | ||
| 328 | for (v = 0; v < alphaSize; v++) | ||
| 329 | s->rfreq[t][v] = 0; | ||
| 330 | |||
| 331 | nSelectors = 0; | ||
| 332 | totc = 0; | ||
| 333 | gs = 0; | ||
| 334 | while (True) { | ||
| 335 | |||
| 336 | /*--- Set group start & end marks. --*/ | ||
| 337 | if (gs >= s->nMTF) break; | ||
| 338 | ge = gs + BZ_G_SIZE - 1; | ||
| 339 | if (ge >= s->nMTF) ge = s->nMTF-1; | ||
| 340 | |||
| 341 | /*-- | ||
| 342 | Calculate the cost of this group as coded | ||
| 343 | by each of the coding tables. | ||
| 344 | --*/ | ||
| 345 | for (t = 0; t < nGroups; t++) cost[t] = 0; | ||
| 346 | |||
| 347 | if (nGroups == 6) { | ||
| 348 | register UInt16 cost0, cost1, cost2, cost3, cost4, cost5; | ||
| 349 | cost0 = cost1 = cost2 = cost3 = cost4 = cost5 = 0; | ||
| 350 | for (i = gs; i <= ge; i++) { | ||
| 351 | UInt16 icv = s->szptr[i]; | ||
| 352 | cost0 += s->len[0][icv]; | ||
| 353 | cost1 += s->len[1][icv]; | ||
| 354 | cost2 += s->len[2][icv]; | ||
| 355 | cost3 += s->len[3][icv]; | ||
| 356 | cost4 += s->len[4][icv]; | ||
| 357 | cost5 += s->len[5][icv]; | ||
| 358 | } | ||
| 359 | cost[0] = cost0; cost[1] = cost1; cost[2] = cost2; | ||
| 360 | cost[3] = cost3; cost[4] = cost4; cost[5] = cost5; | ||
| 361 | } else { | ||
| 362 | for (i = gs; i <= ge; i++) { | ||
| 363 | UInt16 icv = s->szptr[i]; | ||
| 364 | for (t = 0; t < nGroups; t++) cost[t] += s->len[t][icv]; | ||
| 365 | } | ||
| 366 | } | ||
| 367 | |||
| 368 | /*-- | ||
| 369 | Find the coding table which is best for this group, | ||
| 370 | and record its identity in the selector table. | ||
| 371 | --*/ | ||
| 372 | bc = 999999999; bt = -1; | ||
| 373 | for (t = 0; t < nGroups; t++) | ||
| 374 | if (cost[t] < bc) { bc = cost[t]; bt = t; }; | ||
| 375 | totc += bc; | ||
| 376 | fave[bt]++; | ||
| 377 | s->selector[nSelectors] = bt; | ||
| 378 | nSelectors++; | ||
| 379 | |||
| 380 | /*-- | ||
| 381 | Increment the symbol frequencies for the selected table. | ||
| 382 | --*/ | ||
| 383 | for (i = gs; i <= ge; i++) | ||
| 384 | s->rfreq[bt][ s->szptr[i] ]++; | ||
| 385 | |||
| 386 | gs = ge+1; | ||
| 387 | } | ||
| 388 | if (s->verbosity >= 3) { | ||
| 389 | VPrintf2 ( " pass %d: size is %d, grp uses are ", | ||
| 390 | iter+1, totc/8 ); | ||
| 391 | for (t = 0; t < nGroups; t++) | ||
| 392 | VPrintf1 ( "%d ", fave[t] ); | ||
| 393 | VPrintf0 ( "\n" ); | ||
| 394 | } | ||
| 395 | |||
| 396 | /*-- | ||
| 397 | Recompute the tables based on the accumulated frequencies. | ||
| 398 | --*/ | ||
| 399 | for (t = 0; t < nGroups; t++) | ||
| 400 | hbMakeCodeLengths ( &(s->len[t][0]), &(s->rfreq[t][0]), | ||
| 401 | alphaSize, 20 ); | ||
| 402 | } | ||
| 403 | |||
| 404 | |||
| 405 | AssertH( nGroups < 8, 3002 ); | ||
| 406 | AssertH( nSelectors < 32768 && | ||
| 407 | nSelectors <= (2 + (900000 / BZ_G_SIZE)), | ||
| 408 | 3003 ); | ||
| 409 | |||
| 410 | |||
| 411 | /*--- Compute MTF values for the selectors. ---*/ | ||
| 412 | { | ||
| 413 | UChar pos[BZ_N_GROUPS], ll_i, tmp2, tmp; | ||
| 414 | for (i = 0; i < nGroups; i++) pos[i] = i; | ||
| 415 | for (i = 0; i < nSelectors; i++) { | ||
| 416 | ll_i = s->selector[i]; | ||
| 417 | j = 0; | ||
| 418 | tmp = pos[j]; | ||
| 419 | while ( ll_i != tmp ) { | ||
| 420 | j++; | ||
| 421 | tmp2 = tmp; | ||
| 422 | tmp = pos[j]; | ||
| 423 | pos[j] = tmp2; | ||
| 424 | }; | ||
| 425 | pos[0] = tmp; | ||
| 426 | s->selectorMtf[i] = j; | ||
| 427 | } | ||
| 428 | }; | ||
| 429 | |||
| 430 | /*--- Assign actual codes for the tables. --*/ | ||
| 431 | for (t = 0; t < nGroups; t++) { | ||
| 432 | minLen = 32; | ||
| 433 | maxLen = 0; | ||
| 434 | for (i = 0; i < alphaSize; i++) { | ||
| 435 | if (s->len[t][i] > maxLen) maxLen = s->len[t][i]; | ||
| 436 | if (s->len[t][i] < minLen) minLen = s->len[t][i]; | ||
| 437 | } | ||
| 438 | AssertH ( !(maxLen > 20), 3004 ); | ||
| 439 | AssertH ( !(minLen < 1), 3005 ); | ||
| 440 | hbAssignCodes ( &(s->code[t][0]), &(s->len[t][0]), | ||
| 441 | minLen, maxLen, alphaSize ); | ||
| 442 | } | ||
| 443 | |||
| 444 | /*--- Transmit the mapping table. ---*/ | ||
| 445 | { | ||
| 446 | Bool inUse16[16]; | ||
| 447 | for (i = 0; i < 16; i++) { | ||
| 448 | inUse16[i] = False; | ||
| 449 | for (j = 0; j < 16; j++) | ||
| 450 | if (s->inUse[i * 16 + j]) inUse16[i] = True; | ||
| 451 | } | ||
| 452 | |||
| 453 | nBytes = s->numZ; | ||
| 454 | for (i = 0; i < 16; i++) | ||
| 455 | if (inUse16[i]) bsW(s,1,1); else bsW(s,1,0); | ||
| 456 | |||
| 457 | for (i = 0; i < 16; i++) | ||
| 458 | if (inUse16[i]) | ||
| 459 | for (j = 0; j < 16; j++) { | ||
| 460 | if (s->inUse[i * 16 + j]) bsW(s,1,1); else bsW(s,1,0); | ||
| 461 | } | ||
| 462 | |||
| 463 | if (s->verbosity >= 3) | ||
| 464 | VPrintf1( " bytes: mapping %d, ", s->numZ-nBytes ); | ||
| 465 | } | ||
| 466 | |||
| 467 | /*--- Now the selectors. ---*/ | ||
| 468 | nBytes = s->numZ; | ||
| 469 | bsW ( s, 3, nGroups ); | ||
| 470 | bsW ( s, 15, nSelectors ); | ||
| 471 | for (i = 0; i < nSelectors; i++) { | ||
| 472 | for (j = 0; j < s->selectorMtf[i]; j++) bsW(s,1,1); | ||
| 473 | bsW(s,1,0); | ||
| 474 | } | ||
| 475 | if (s->verbosity >= 3) | ||
| 476 | VPrintf1( "selectors %d, ", s->numZ-nBytes ); | ||
| 477 | |||
| 478 | /*--- Now the coding tables. ---*/ | ||
| 479 | nBytes = s->numZ; | ||
| 480 | |||
| 481 | for (t = 0; t < nGroups; t++) { | ||
| 482 | Int32 curr = s->len[t][0]; | ||
| 483 | bsW ( s, 5, curr ); | ||
| 484 | for (i = 0; i < alphaSize; i++) { | ||
| 485 | while (curr < s->len[t][i]) { bsW(s,2,2); curr++; /* 10 */ }; | ||
| 486 | while (curr > s->len[t][i]) { bsW(s,2,3); curr--; /* 11 */ }; | ||
| 487 | bsW ( s, 1, 0 ); | ||
| 488 | } | ||
| 489 | } | ||
| 490 | |||
| 491 | if (s->verbosity >= 3) | ||
| 492 | VPrintf1 ( "code lengths %d, ", s->numZ-nBytes ); | ||
| 493 | |||
| 494 | /*--- And finally, the block data proper ---*/ | ||
| 495 | nBytes = s->numZ; | ||
| 496 | selCtr = 0; | ||
| 497 | gs = 0; | ||
| 498 | while (True) { | ||
| 499 | if (gs >= s->nMTF) break; | ||
| 500 | ge = gs + BZ_G_SIZE - 1; | ||
| 501 | if (ge >= s->nMTF) ge = s->nMTF-1; | ||
| 502 | for (i = gs; i <= ge; i++) { | ||
| 503 | AssertH ( s->selector[selCtr] < nGroups, 3006 ); | ||
| 504 | bsW ( s, | ||
| 505 | s->len [s->selector[selCtr]] [s->szptr[i]], | ||
| 506 | s->code [s->selector[selCtr]] [s->szptr[i]] ); | ||
| 507 | } | ||
| 508 | |||
| 509 | gs = ge+1; | ||
| 510 | selCtr++; | ||
| 511 | } | ||
| 512 | AssertH( selCtr == nSelectors, 3007 ); | ||
| 513 | |||
| 514 | if (s->verbosity >= 3) | ||
| 515 | VPrintf1( "codes %d\n", s->numZ-nBytes ); | ||
| 516 | } | ||
| 517 | |||
| 518 | |||
| 519 | /*---------------------------------------------------*/ | ||
| 520 | void compressBlock ( EState* s, Bool is_last_block ) | ||
| 521 | { | ||
| 522 | if (s->nblock > 0) { | ||
| 523 | |||
| 524 | BZ_FINALISE_CRC ( s->blockCRC ); | ||
| 525 | s->combinedCRC = (s->combinedCRC << 1) | (s->combinedCRC >> 31); | ||
| 526 | s->combinedCRC ^= s->blockCRC; | ||
| 527 | if (s->blockNo > 1) s->numZ = 0; | ||
| 528 | |||
| 529 | if (s->verbosity >= 2) | ||
| 530 | VPrintf4( " block %d: crc = 0x%8x, " | ||
| 531 | "combined CRC = 0x%8x, size = %d\n", | ||
| 532 | s->blockNo, s->blockCRC, s->combinedCRC, s->nblock ); | ||
| 533 | |||
| 534 | blockSort ( s ); | ||
| 535 | } | ||
| 536 | |||
| 537 | /*-- If this is the first block, create the stream header. --*/ | ||
| 538 | if (s->blockNo == 1) { | ||
| 539 | bsInitWrite ( s ); | ||
| 540 | bsPutUChar ( s, 'B' ); | ||
| 541 | bsPutUChar ( s, 'Z' ); | ||
| 542 | bsPutUChar ( s, 'h' ); | ||
| 543 | bsPutUChar ( s, '0' + s->blockSize100k ); | ||
| 544 | } | ||
| 545 | |||
| 546 | if (s->nblock > 0) { | ||
| 547 | |||
| 548 | bsPutUChar ( s, 0x31 ); bsPutUChar ( s, 0x41 ); | ||
| 549 | bsPutUChar ( s, 0x59 ); bsPutUChar ( s, 0x26 ); | ||
| 550 | bsPutUChar ( s, 0x53 ); bsPutUChar ( s, 0x59 ); | ||
| 551 | |||
| 552 | /*-- Now the block's CRC, so it is in a known place. --*/ | ||
| 553 | bsPutUInt32 ( s, s->blockCRC ); | ||
| 554 | |||
| 555 | /*-- Now a single bit indicating randomisation. --*/ | ||
| 556 | if (s->blockRandomised) { | ||
| 557 | bsW(s,1,1); s->nBlocksRandomised++; | ||
| 558 | } else | ||
| 559 | bsW(s,1,0); | ||
| 560 | |||
| 561 | bsW ( s, 24, s->origPtr ); | ||
| 562 | generateMTFValues ( s ); | ||
| 563 | sendMTFValues ( s ); | ||
| 564 | } | ||
| 565 | |||
| 566 | |||
| 567 | /*-- If this is the last block, add the stream trailer. --*/ | ||
| 568 | if (is_last_block) { | ||
| 569 | |||
| 570 | if (s->verbosity >= 2 && s->nBlocksRandomised > 0) | ||
| 571 | VPrintf2 ( " %d block%s needed randomisation\n", | ||
| 572 | s->nBlocksRandomised, | ||
| 573 | s->nBlocksRandomised == 1 ? "" : "s" ); | ||
| 574 | |||
| 575 | bsPutUChar ( s, 0x17 ); bsPutUChar ( s, 0x72 ); | ||
| 576 | bsPutUChar ( s, 0x45 ); bsPutUChar ( s, 0x38 ); | ||
| 577 | bsPutUChar ( s, 0x50 ); bsPutUChar ( s, 0x90 ); | ||
| 578 | bsPutUInt32 ( s, s->combinedCRC ); | ||
| 579 | if (s->verbosity >= 2) | ||
| 580 | VPrintf1( " final combined CRC = 0x%x\n ", s->combinedCRC ); | ||
| 581 | bsFinishWrite ( s ); | ||
| 582 | } | ||
| 583 | } | ||
| 584 | |||
| 585 | |||
| 586 | /*-------------------------------------------------------------*/ | ||
| 587 | /*--- end compress.c ---*/ | ||
| 588 | /*-------------------------------------------------------------*/ | ||
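
In `generateMTFValues()` above, a run of `zPend` consecutive MTF zeros is written as a short string of `BZ_RUNA` / `BZ_RUNB` symbols, which amounts to writing the run length in bijective base 2. The sketch below isolates that loop and pairs it with a decoder. The decoder is a hypothetical inverse written for this example, not the code in `decompress.c`. The `demo_` names are assumptions, as are the symbol values 0 and 1, since the definitions of `BZ_RUNA` and `BZ_RUNB` are not shown in this part of the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed symbol values, chosen for this example only. */
enum { DEMO_RUNA = 0, DEMO_RUNB = 1 };

/* Encode a run of `run` zeros (run >= 1) using the same loop shape
   as generateMTFValues(). Returns the number of symbols written. */
size_t demo_encode_run(unsigned run, unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned z;
    assert(run >= 1);
    z = run - 1;
    for (;;) {
        assert(n < cap);
        out[n++] = (z % 2 == 0) ? DEMO_RUNA : DEMO_RUNB;
        if (z < 2) break;
        z = (z - 2) / 2;
    }
    return n;
}

/* Hypothetical inverse: RUNA counts as 1 and RUNB as 2, each
   weighted by a power of two that doubles with every symbol. */
unsigned demo_decode_run(const unsigned char *sym, size_t n)
{
    unsigned run = 0, weight = 1;
    size_t i;
    for (i = 0; i < n; i++) {
        run += (sym[i] == DEMO_RUNA ? 1u : 2u) * weight;
        weight *= 2;
    }
    return run;
}
```

For example, a run of 5 zeros encodes as RUNA, RUNB, which decodes as 1*1 + 2*2 = 5. Because neither digit stands for zero, every run length of at least 1 has exactly one encoding, and a run of length n needs only about log2(n) symbols.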
diff --git a/crctable.c b/crctable.c new file mode 100644 index 0000000..2f3eacb --- /dev/null +++ b/crctable.c | |||
| @@ -0,0 +1,144 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Table for doing CRCs ---*/ | ||
| 4 | /*--- crctable.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #include "bzlib_private.h" | ||
| 63 | |||
| 64 | /*-- | ||
| 65 | I think this is an implementation of the AUTODIN-II, | ||
| 66 | Ethernet & FDDI 32-bit CRC standard. Vaguely derived | ||
| 67 | from code by Rob Warnock, in Section 51 of the | ||
| 68 | comp.compression FAQ. | ||
| 69 | --*/ | ||
| 70 | |||
| 71 | UInt32 crc32Table[256] = { | ||
| 72 | |||
| 73 | /*-- Ugly, innit? --*/ | ||
| 74 | |||
| 75 | 0x00000000L, 0x04c11db7L, 0x09823b6eL, 0x0d4326d9L, | ||
| 76 | 0x130476dcL, 0x17c56b6bL, 0x1a864db2L, 0x1e475005L, | ||
| 77 | 0x2608edb8L, 0x22c9f00fL, 0x2f8ad6d6L, 0x2b4bcb61L, | ||
| 78 | 0x350c9b64L, 0x31cd86d3L, 0x3c8ea00aL, 0x384fbdbdL, | ||
| 79 | 0x4c11db70L, 0x48d0c6c7L, 0x4593e01eL, 0x4152fda9L, | ||
| 80 | 0x5f15adacL, 0x5bd4b01bL, 0x569796c2L, 0x52568b75L, | ||
| 81 | 0x6a1936c8L, 0x6ed82b7fL, 0x639b0da6L, 0x675a1011L, | ||
| 82 | 0x791d4014L, 0x7ddc5da3L, 0x709f7b7aL, 0x745e66cdL, | ||
| 83 | 0x9823b6e0L, 0x9ce2ab57L, 0x91a18d8eL, 0x95609039L, | ||
| 84 | 0x8b27c03cL, 0x8fe6dd8bL, 0x82a5fb52L, 0x8664e6e5L, | ||
| 85 | 0xbe2b5b58L, 0xbaea46efL, 0xb7a96036L, 0xb3687d81L, | ||
| 86 | 0xad2f2d84L, 0xa9ee3033L, 0xa4ad16eaL, 0xa06c0b5dL, | ||
| 87 | 0xd4326d90L, 0xd0f37027L, 0xddb056feL, 0xd9714b49L, | ||
| 88 | 0xc7361b4cL, 0xc3f706fbL, 0xceb42022L, 0xca753d95L, | ||
| 89 | 0xf23a8028L, 0xf6fb9d9fL, 0xfbb8bb46L, 0xff79a6f1L, | ||
| 90 | 0xe13ef6f4L, 0xe5ffeb43L, 0xe8bccd9aL, 0xec7dd02dL, | ||
| 91 | 0x34867077L, 0x30476dc0L, 0x3d044b19L, 0x39c556aeL, | ||
| 92 | 0x278206abL, 0x23431b1cL, 0x2e003dc5L, 0x2ac12072L, | ||
| 93 | 0x128e9dcfL, 0x164f8078L, 0x1b0ca6a1L, 0x1fcdbb16L, | ||
| 94 | 0x018aeb13L, 0x054bf6a4L, 0x0808d07dL, 0x0cc9cdcaL, | ||
| 95 | 0x7897ab07L, 0x7c56b6b0L, 0x71159069L, 0x75d48ddeL, | ||
| 96 | 0x6b93dddbL, 0x6f52c06cL, 0x6211e6b5L, 0x66d0fb02L, | ||
| 97 | 0x5e9f46bfL, 0x5a5e5b08L, 0x571d7dd1L, 0x53dc6066L, | ||
| 98 | 0x4d9b3063L, 0x495a2dd4L, 0x44190b0dL, 0x40d816baL, | ||
| 99 | 0xaca5c697L, 0xa864db20L, 0xa527fdf9L, 0xa1e6e04eL, | ||
| 100 | 0xbfa1b04bL, 0xbb60adfcL, 0xb6238b25L, 0xb2e29692L, | ||
| 101 | 0x8aad2b2fL, 0x8e6c3698L, 0x832f1041L, 0x87ee0df6L, | ||
| 102 | 0x99a95df3L, 0x9d684044L, 0x902b669dL, 0x94ea7b2aL, | ||
| 103 | 0xe0b41de7L, 0xe4750050L, 0xe9362689L, 0xedf73b3eL, | ||
| 104 | 0xf3b06b3bL, 0xf771768cL, 0xfa325055L, 0xfef34de2L, | ||
| 105 | 0xc6bcf05fL, 0xc27dede8L, 0xcf3ecb31L, 0xcbffd686L, | ||
| 106 | 0xd5b88683L, 0xd1799b34L, 0xdc3abdedL, 0xd8fba05aL, | ||
| 107 | 0x690ce0eeL, 0x6dcdfd59L, 0x608edb80L, 0x644fc637L, | ||
| 108 | 0x7a089632L, 0x7ec98b85L, 0x738aad5cL, 0x774bb0ebL, | ||
| 109 | 0x4f040d56L, 0x4bc510e1L, 0x46863638L, 0x42472b8fL, | ||
| 110 | 0x5c007b8aL, 0x58c1663dL, 0x558240e4L, 0x51435d53L, | ||
| 111 | 0x251d3b9eL, 0x21dc2629L, 0x2c9f00f0L, 0x285e1d47L, | ||
| 112 | 0x36194d42L, 0x32d850f5L, 0x3f9b762cL, 0x3b5a6b9bL, | ||
| 113 | 0x0315d626L, 0x07d4cb91L, 0x0a97ed48L, 0x0e56f0ffL, | ||
| 114 | 0x1011a0faL, 0x14d0bd4dL, 0x19939b94L, 0x1d528623L, | ||
| 115 | 0xf12f560eL, 0xf5ee4bb9L, 0xf8ad6d60L, 0xfc6c70d7L, | ||
| 116 | 0xe22b20d2L, 0xe6ea3d65L, 0xeba91bbcL, 0xef68060bL, | ||
| 117 | 0xd727bbb6L, 0xd3e6a601L, 0xdea580d8L, 0xda649d6fL, | ||
| 118 | 0xc423cd6aL, 0xc0e2d0ddL, 0xcda1f604L, 0xc960ebb3L, | ||
| 119 | 0xbd3e8d7eL, 0xb9ff90c9L, 0xb4bcb610L, 0xb07daba7L, | ||
| 120 | 0xae3afba2L, 0xaafbe615L, 0xa7b8c0ccL, 0xa379dd7bL, | ||
| 121 | 0x9b3660c6L, 0x9ff77d71L, 0x92b45ba8L, 0x9675461fL, | ||
| 122 | 0x8832161aL, 0x8cf30badL, 0x81b02d74L, 0x857130c3L, | ||
| 123 | 0x5d8a9099L, 0x594b8d2eL, 0x5408abf7L, 0x50c9b640L, | ||
| 124 | 0x4e8ee645L, 0x4a4ffbf2L, 0x470cdd2bL, 0x43cdc09cL, | ||
| 125 | 0x7b827d21L, 0x7f436096L, 0x7200464fL, 0x76c15bf8L, | ||
| 126 | 0x68860bfdL, 0x6c47164aL, 0x61043093L, 0x65c52d24L, | ||
| 127 | 0x119b4be9L, 0x155a565eL, 0x18197087L, 0x1cd86d30L, | ||
| 128 | 0x029f3d35L, 0x065e2082L, 0x0b1d065bL, 0x0fdc1becL, | ||
| 129 | 0x3793a651L, 0x3352bbe6L, 0x3e119d3fL, 0x3ad08088L, | ||
| 130 | 0x2497d08dL, 0x2056cd3aL, 0x2d15ebe3L, 0x29d4f654L, | ||
| 131 | 0xc5a92679L, 0xc1683bceL, 0xcc2b1d17L, 0xc8ea00a0L, | ||
| 132 | 0xd6ad50a5L, 0xd26c4d12L, 0xdf2f6bcbL, 0xdbee767cL, | ||
| 133 | 0xe3a1cbc1L, 0xe760d676L, 0xea23f0afL, 0xeee2ed18L, | ||
| 134 | 0xf0a5bd1dL, 0xf464a0aaL, 0xf9278673L, 0xfde69bc4L, | ||
| 135 | 0x89b8fd09L, 0x8d79e0beL, 0x803ac667L, 0x84fbdbd0L, | ||
| 136 | 0x9abc8bd5L, 0x9e7d9662L, 0x933eb0bbL, 0x97ffad0cL, | ||
| 137 | 0xafb010b1L, 0xab710d06L, 0xa6322bdfL, 0xa2f33668L, | ||
| 138 | 0xbcb4666dL, 0xb8757bdaL, 0xb5365d03L, 0xb1f740b4L | ||
| 139 | }; | ||
| 140 | |||
| 141 | |||
| 142 | /*-------------------------------------------------------------*/ | ||
| 143 | /*--- end crctable.c ---*/ | ||
| 144 | /*-------------------------------------------------------------*/ | ||
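The 256 entries of `crc32Table` can be checked rather than taken on trust. As a minimal sketch, assuming the table uses the MSB-first (non-reflected) form of the polynomial 0x04c11db7 (an inference from entry 1 of the table, not something the source comment states), each entry can be regenerated with eight shift-and-XOR steps. The function name `crc32_table_entry` is hypothetical and is not part of bzip2.

```c
#include <stdint.h>

/* Hypothetical helper, not part of bzip2: recompute one table entry.
   Assumes the MSB-first form of the CRC-32 polynomial 0x04c11db7. */
uint32_t crc32_table_entry(unsigned int index)
{
    /* The input byte occupies the top 8 bits of the register. */
    uint32_t c = (uint32_t)(index & 0xffu) << 24;
    int bit;

    for (bit = 0; bit < 8; bit++) {
        if (c & 0x80000000u)
            c = (c << 1) ^ 0x04c11db7u;  /* top bit shifted out: XOR in the polynomial */
        else
            c = c << 1;
    }
    return c;
}
```

Comparing `crc32_table_entry(i)` against `crc32Table[i]` for all 256 values of `i` is a cheap way to catch transcription errors in the table.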
diff --git a/decompress.c b/decompress.c new file mode 100644 index 0000000..ac2b0a5 --- /dev/null +++ b/decompress.c | |||
| @@ -0,0 +1,636 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Decompression machinery ---*/ | ||
| 4 | /*--- decompress.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #include "bzlib_private.h" | ||
| 63 | |||
| 64 | |||
| 65 | /*---------------------------------------------------*/ | ||
| 66 | static | ||
| 67 | void makeMaps_d ( DState* s ) | ||
| 68 | { | ||
| 69 | Int32 i; | ||
| 70 | s->nInUse = 0; | ||
| 71 | for (i = 0; i < 256; i++) | ||
| 72 | if (s->inUse[i]) { | ||
| 73 | s->seqToUnseq[s->nInUse] = i; | ||
| 74 | s->nInUse++; | ||
| 75 | } | ||
| 76 | } | ||
| 77 | |||
| 78 | |||
| 79 | /*---------------------------------------------------*/ | ||
| 80 | #define RETURN(rrr) \ | ||
| 81 | { retVal = rrr; goto save_state_and_return; }; | ||
| 82 | |||
| 83 | #define GET_BITS(lll,vvv,nnn) \ | ||
| 84 | case lll: s->state = lll; \ | ||
| 85 | while (True) { \ | ||
| 86 | if (s->bsLive >= nnn) { \ | ||
| 87 | UInt32 v; \ | ||
| 88 | v = (s->bsBuff >> \ | ||
| 89 | (s->bsLive-nnn)) & ((1 << nnn)-1); \ | ||
| 90 | s->bsLive -= nnn; \ | ||
| 91 | vvv = v; \ | ||
| 92 | break; \ | ||
| 93 | } \ | ||
| 94 | if (s->strm->avail_in == 0) RETURN(BZ_OK); \ | ||
| 95 | s->bsBuff \ | ||
| 96 | = (s->bsBuff << 8) | \ | ||
| 97 | ((UInt32) \ | ||
| 98 | (*((UChar*)(s->strm->next_in)))); \ | ||
| 99 | s->bsLive += 8; \ | ||
| 100 | s->strm->next_in++; \ | ||
| 101 | s->strm->avail_in--; \ | ||
| 102 | s->strm->total_in++; \ | ||
| 103 | } | ||
| 104 | |||
| 105 | #define GET_UCHAR(lll,uuu) \ | ||
| 106 | GET_BITS(lll,uuu,8) | ||
| 107 | |||
| 108 | #define GET_BIT(lll,uuu) \ | ||
| 109 | GET_BITS(lll,uuu,1) | ||
| 110 | |||
| 111 | /*---------------------------------------------------*/ | ||
| 112 | #define GET_MTF_VAL(label1,label2,lval) \ | ||
| 113 | { \ | ||
| 114 | if (groupPos == 0) { \ | ||
| 115 | groupNo++; \ | ||
| 116 | groupPos = BZ_G_SIZE; \ | ||
| 117 | gSel = s->selector[groupNo]; \ | ||
| 118 | gMinlen = s->minLens[gSel]; \ | ||
| 119 | gLimit = &(s->limit[gSel][0]); \ | ||
| 120 | gPerm = &(s->perm[gSel][0]); \ | ||
| 121 | gBase = &(s->base[gSel][0]); \ | ||
| 122 | } \ | ||
| 123 | groupPos--; \ | ||
| 124 | zn = gMinlen; \ | ||
| 125 | GET_BITS(label1, zvec, zn); \ | ||
| 126 | while (zvec > gLimit[zn]) { \ | ||
| 127 | zn++; \ | ||
| 128 | GET_BIT(label2, zj); \ | ||
| 129 | zvec = (zvec << 1) | zj; \ | ||
| 130 | }; \ | ||
| 131 | lval = gPerm[zvec - gBase[zn]]; \ | ||
| 132 | } | ||
| 133 | |||
| 134 | |||
| 135 | /*---------------------------------------------------*/ | ||
| 136 | Int32 decompress ( DState* s ) | ||
| 137 | { | ||
| 138 | UChar uc; | ||
| 139 | Int32 retVal; | ||
| 140 | Int32 minLen, maxLen; | ||
| 141 | bz_stream* strm = s->strm; | ||
| 142 | |||
| 143 | /* stuff that needs to be saved/restored */ | ||
| 144 | Int32 i ; | ||
| 145 | Int32 j; | ||
| 146 | Int32 t; | ||
| 147 | Int32 alphaSize; | ||
| 148 | Int32 nGroups; | ||
| 149 | Int32 nSelectors; | ||
| 150 | Int32 EOB; | ||
| 151 | Int32 groupNo; | ||
| 152 | Int32 groupPos; | ||
| 153 | Int32 nextSym; | ||
| 154 | Int32 nblockMAX; | ||
| 155 | Int32 nblock; | ||
| 156 | Int32 es; | ||
| 157 | Int32 N; | ||
| 158 | Int32 curr; | ||
| 159 | Int32 zt; | ||
| 160 | Int32 zn; | ||
| 161 | Int32 zvec; | ||
| 162 | Int32 zj; | ||
| 163 | Int32 gSel; | ||
| 164 | Int32 gMinlen; | ||
| 165 | Int32* gLimit; | ||
| 166 | Int32* gBase; | ||
| 167 | Int32* gPerm; | ||
| 168 | |||
| 169 | if (s->state == BZ_X_MAGIC_1) { | ||
| 170 | /*initialise the save area*/ | ||
| 171 | s->save_i = 0; | ||
| 172 | s->save_j = 0; | ||
| 173 | s->save_t = 0; | ||
| 174 | s->save_alphaSize = 0; | ||
| 175 | s->save_nGroups = 0; | ||
| 176 | s->save_nSelectors = 0; | ||
| 177 | s->save_EOB = 0; | ||
| 178 | s->save_groupNo = 0; | ||
| 179 | s->save_groupPos = 0; | ||
| 180 | s->save_nextSym = 0; | ||
| 181 | s->save_nblockMAX = 0; | ||
| 182 | s->save_nblock = 0; | ||
| 183 | s->save_es = 0; | ||
| 184 | s->save_N = 0; | ||
| 185 | s->save_curr = 0; | ||
| 186 | s->save_zt = 0; | ||
| 187 | s->save_zn = 0; | ||
| 188 | s->save_zvec = 0; | ||
| 189 | s->save_zj = 0; | ||
| 190 | s->save_gSel = 0; | ||
| 191 | s->save_gMinlen = 0; | ||
| 192 | s->save_gLimit = NULL; | ||
| 193 | s->save_gBase = NULL; | ||
| 194 | s->save_gPerm = NULL; | ||
| 195 | } | ||
| 196 | |||
| 197 | /*restore from the save area*/ | ||
| 198 | i = s->save_i; | ||
| 199 | j = s->save_j; | ||
| 200 | t = s->save_t; | ||
| 201 | alphaSize = s->save_alphaSize; | ||
| 202 | nGroups = s->save_nGroups; | ||
| 203 | nSelectors = s->save_nSelectors; | ||
| 204 | EOB = s->save_EOB; | ||
| 205 | groupNo = s->save_groupNo; | ||
| 206 | groupPos = s->save_groupPos; | ||
| 207 | nextSym = s->save_nextSym; | ||
| 208 | nblockMAX = s->save_nblockMAX; | ||
| 209 | nblock = s->save_nblock; | ||
| 210 | es = s->save_es; | ||
| 211 | N = s->save_N; | ||
| 212 | curr = s->save_curr; | ||
| 213 | zt = s->save_zt; | ||
| 214 | zn = s->save_zn; | ||
| 215 | zvec = s->save_zvec; | ||
| 216 | zj = s->save_zj; | ||
| 217 | gSel = s->save_gSel; | ||
| 218 | gMinlen = s->save_gMinlen; | ||
| 219 | gLimit = s->save_gLimit; | ||
| 220 | gBase = s->save_gBase; | ||
| 221 | gPerm = s->save_gPerm; | ||
| 222 | |||
| 223 | retVal = BZ_OK; | ||
| 224 | |||
| 225 | switch (s->state) { | ||
| 226 | |||
| 227 | GET_UCHAR(BZ_X_MAGIC_1, uc); | ||
| 228 | if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC); | ||
| 229 | |||
| 230 | GET_UCHAR(BZ_X_MAGIC_2, uc); | ||
| 231 | if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC); | ||
| 232 | |||
| 233 | GET_UCHAR(BZ_X_MAGIC_3, uc); | ||
| 234 | if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC); | ||
| 235 | |||
| 236 | GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8); | ||
| 237 | if (s->blockSize100k < '1' || | ||
| 238 | s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC); | ||
| 239 | s->blockSize100k -= '0'; | ||
| 240 | |||
| 241 | if (s->smallDecompress) { | ||
| 242 | s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) ); | ||
| 243 | s->ll4 = BZALLOC( | ||
| 244 | ((1 + s->blockSize100k * 100000) >> 1) * sizeof(UChar) | ||
| 245 | ); | ||
| 246 | if (s->ll16 == NULL || s->ll4 == NULL) RETURN(BZ_MEM_ERROR); | ||
| 247 | } else { | ||
| 248 | s->tt = BZALLOC( s->blockSize100k * 100000 * sizeof(Int32) ); | ||
| 249 | if (s->tt == NULL) RETURN(BZ_MEM_ERROR); | ||
| 250 | } | ||
| 251 | |||
| 252 | GET_UCHAR(BZ_X_BLKHDR_1, uc); | ||
| 253 | |||
| 254 | if (uc == 0x17) goto endhdr_2; | ||
| 255 | if (uc != 0x31) RETURN(BZ_DATA_ERROR); | ||
| 256 | GET_UCHAR(BZ_X_BLKHDR_2, uc); | ||
| 257 | if (uc != 0x41) RETURN(BZ_DATA_ERROR); | ||
| 258 | GET_UCHAR(BZ_X_BLKHDR_3, uc); | ||
| 259 | if (uc != 0x59) RETURN(BZ_DATA_ERROR); | ||
| 260 | GET_UCHAR(BZ_X_BLKHDR_4, uc); | ||
| 261 | if (uc != 0x26) RETURN(BZ_DATA_ERROR); | ||
| 262 | GET_UCHAR(BZ_X_BLKHDR_5, uc); | ||
| 263 | if (uc != 0x53) RETURN(BZ_DATA_ERROR); | ||
| 264 | GET_UCHAR(BZ_X_BLKHDR_6, uc); | ||
| 265 | if (uc != 0x59) RETURN(BZ_DATA_ERROR); | ||
| 266 | |||
| 267 | s->currBlockNo++; | ||
| 268 | if (s->verbosity >= 2) | ||
| 269 | VPrintf1 ( "\n [%d: huff+mtf ", s->currBlockNo ); | ||
| 270 | |||
| 271 | s->storedBlockCRC = 0; | ||
| 272 | GET_UCHAR(BZ_X_BCRC_1, uc); | ||
| 273 | s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc); | ||
| 274 | GET_UCHAR(BZ_X_BCRC_2, uc); | ||
| 275 | s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc); | ||
| 276 | GET_UCHAR(BZ_X_BCRC_3, uc); | ||
| 277 | s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc); | ||
| 278 | GET_UCHAR(BZ_X_BCRC_4, uc); | ||
| 279 | s->storedBlockCRC = (s->storedBlockCRC << 8) | ((UInt32)uc); | ||
| 280 | |||
| 281 | GET_BITS(BZ_X_RANDBIT, s->blockRandomised, 1); | ||
| 282 | |||
| 283 | s->origPtr = 0; | ||
| 284 | GET_UCHAR(BZ_X_ORIGPTR_1, uc); | ||
| 285 | s->origPtr = (s->origPtr << 8) | ((Int32)uc); | ||
| 286 | GET_UCHAR(BZ_X_ORIGPTR_2, uc); | ||
| 287 | s->origPtr = (s->origPtr << 8) | ((Int32)uc); | ||
| 288 | GET_UCHAR(BZ_X_ORIGPTR_3, uc); | ||
| 289 | s->origPtr = (s->origPtr << 8) | ((Int32)uc); | ||
| 290 | |||
| 291 | /*--- Receive the mapping table ---*/ | ||
| 292 | for (i = 0; i < 16; i++) { | ||
| 293 | GET_BIT(BZ_X_MAPPING_1, uc); | ||
| 294 | if (uc == 1) | ||
| 295 | s->inUse16[i] = True; else | ||
| 296 | s->inUse16[i] = False; | ||
| 297 | } | ||
| 298 | |||
| 299 | for (i = 0; i < 256; i++) s->inUse[i] = False; | ||
| 300 | |||
| 301 | for (i = 0; i < 16; i++) | ||
| 302 | if (s->inUse16[i]) | ||
| 303 | for (j = 0; j < 16; j++) { | ||
| 304 | GET_BIT(BZ_X_MAPPING_2, uc); | ||
| 305 | if (uc == 1) s->inUse[i * 16 + j] = True; | ||
| 306 | } | ||
| 307 | makeMaps_d ( s ); | ||
| 308 | alphaSize = s->nInUse+2; | ||
| 309 | |||
| 310 | /*--- Now the selectors ---*/ | ||
| 311 | GET_BITS(BZ_X_SELECTOR_1, nGroups, 3); | ||
| 312 | GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15); | ||
| 313 | for (i = 0; i < nSelectors; i++) { | ||
| 314 | j = 0; | ||
| 315 | while (True) { | ||
| 316 | GET_BIT(BZ_X_SELECTOR_3, uc); | ||
| 317 | if (uc == 0) break; | ||
| 318 | j++; | ||
| 319 | if (j > 5) RETURN(BZ_DATA_ERROR); | ||
| 320 | } | ||
| 321 | s->selectorMtf[i] = j; | ||
| 322 | } | ||
| 323 | |||
| 324 | /*--- Undo the MTF values for the selectors. ---*/ | ||
| 325 | { | ||
| 326 | UChar pos[BZ_N_GROUPS], tmp, v; | ||
| 327 | for (v = 0; v < nGroups; v++) pos[v] = v; | ||
| 328 | |||
| 329 | for (i = 0; i < nSelectors; i++) { | ||
| 330 | v = s->selectorMtf[i]; | ||
| 331 | tmp = pos[v]; | ||
| 332 | while (v > 0) { pos[v] = pos[v-1]; v--; } | ||
| 333 | pos[0] = tmp; | ||
| 334 | s->selector[i] = tmp; | ||
| 335 | } | ||
| 336 | } | ||
| 337 | |||
| 338 | /*--- Now the coding tables ---*/ | ||
| 339 | for (t = 0; t < nGroups; t++) { | ||
| 340 | GET_BITS(BZ_X_CODING_1, curr, 5); | ||
| 341 | for (i = 0; i < alphaSize; i++) { | ||
| 342 | while (True) { | ||
| 343 | if (curr < 1 || curr > 20) RETURN(BZ_DATA_ERROR); | ||
| 344 | GET_BIT(BZ_X_CODING_2, uc); | ||
| 345 | if (uc == 0) break; | ||
| 346 | GET_BIT(BZ_X_CODING_3, uc); | ||
| 347 | if (uc == 0) curr++; else curr--; | ||
| 348 | } | ||
| 349 | s->len[t][i] = curr; | ||
| 350 | } | ||
| 351 | } | ||
| 352 | |||
| 353 | /*--- Create the Huffman decoding tables ---*/ | ||
| 354 | for (t = 0; t < nGroups; t++) { | ||
| 355 | minLen = 32; | ||
| 356 | maxLen = 0; | ||
| 357 | for (i = 0; i < alphaSize; i++) { | ||
| 358 | if (s->len[t][i] > maxLen) maxLen = s->len[t][i]; | ||
| 359 | if (s->len[t][i] < minLen) minLen = s->len[t][i]; | ||
| 360 | } | ||
| 361 | hbCreateDecodeTables ( | ||
| 362 | &(s->limit[t][0]), | ||
| 363 | &(s->base[t][0]), | ||
| 364 | &(s->perm[t][0]), | ||
| 365 | &(s->len[t][0]), | ||
| 366 | minLen, maxLen, alphaSize | ||
| 367 | ); | ||
| 368 | s->minLens[t] = minLen; | ||
| 369 | } | ||
| 370 | |||
| 371 | /*--- Now the MTF values ---*/ | ||
| 372 | |||
| 373 | EOB = s->nInUse+1; | ||
| 374 | nblockMAX = 100000 * s->blockSize100k; | ||
| 375 | groupNo = -1; | ||
| 376 | groupPos = 0; | ||
| 377 | |||
| 378 | for (i = 0; i <= 255; i++) s->unzftab[i] = 0; | ||
| 379 | |||
| 380 | /*-- MTF init --*/ | ||
| 381 | { | ||
| 382 | Int32 ii, jj, kk; | ||
| 383 | kk = MTFA_SIZE-1; | ||
| 384 | for (ii = 256 / MTFL_SIZE - 1; ii >= 0; ii--) { | ||
| 385 | for (jj = MTFL_SIZE-1; jj >= 0; jj--) { | ||
| 386 | s->mtfa[kk] = (UChar)(ii * MTFL_SIZE + jj); | ||
| 387 | kk--; | ||
| 388 | } | ||
| 389 | s->mtfbase[ii] = kk + 1; | ||
| 390 | } | ||
| 391 | } | ||
| 392 | /*-- end MTF init --*/ | ||
| 393 | |||
| 394 | nblock = 0; | ||
| 395 | |||
| 396 | GET_MTF_VAL(BZ_X_MTF_1, BZ_X_MTF_2, nextSym); | ||
| 397 | |||
| 398 | while (True) { | ||
| 399 | |||
| 400 | if (nextSym == EOB) break; | ||
| 401 | |||
| 402 | if (nextSym == BZ_RUNA || nextSym == BZ_RUNB) { | ||
| 403 | |||
| 404 | es = -1; | ||
| 405 | N = 1; | ||
| 406 | do { | ||
| 407 | if (nextSym == BZ_RUNA) es = es + (0+1) * N; else | ||
| 408 | if (nextSym == BZ_RUNB) es = es + (1+1) * N; | ||
| 409 | N = N * 2; | ||
| 410 | GET_MTF_VAL(BZ_X_MTF_3, BZ_X_MTF_4, nextSym); | ||
| 411 | } | ||
| 412 | while (nextSym == BZ_RUNA || nextSym == BZ_RUNB); | ||
| 413 | |||
| 414 | es++; | ||
| 415 | uc = s->seqToUnseq[ s->mtfa[s->mtfbase[0]] ]; | ||
| 416 | s->unzftab[uc] += es; | ||
| 417 | |||
| 418 | if (es > nblockMAX - nblock) RETURN(BZ_DATA_ERROR); | ||
| 419 | if (s->smallDecompress) | ||
| 420 | while (es > 0) { | ||
| 421 | s->ll16[nblock] = (UInt16)uc; | ||
| 422 | nblock++; | ||
| 423 | es--; | ||
| 424 | } | ||
| 425 | else | ||
| 426 | while (es > 0) { | ||
| 427 | s->tt[nblock] = (UInt32)uc; | ||
| 428 | nblock++; | ||
| 429 | es--; | ||
| 430 | }; | ||
| 431 | |||
| 432 | continue; | ||
| 433 | |||
| 434 | } else { | ||
| 435 | |||
| 436 | if (nblock >= nblockMAX) RETURN(BZ_DATA_ERROR); | ||
| 437 | |||
| 438 | /*-- uc = MTF ( nextSym-1 ) --*/ | ||
| 439 | { | ||
| 440 | Int32 ii, jj, kk, pp, lno, off; | ||
| 441 | UInt32 nn; | ||
| 442 | nn = (UInt32)(nextSym - 1); | ||
| 443 | |||
| 444 | if (nn < MTFL_SIZE) { | ||
| 445 | /* avoid general-case expense */ | ||
| 446 | pp = s->mtfbase[0]; | ||
| 447 | uc = s->mtfa[pp+nn]; | ||
| 448 | while (nn > 3) { | ||
| 449 | Int32 z = pp+nn; | ||
| 450 | s->mtfa[(z) ] = s->mtfa[(z)-1]; | ||
| 451 | s->mtfa[(z)-1] = s->mtfa[(z)-2]; | ||
| 452 | s->mtfa[(z)-2] = s->mtfa[(z)-3]; | ||
| 453 | s->mtfa[(z)-3] = s->mtfa[(z)-4]; | ||
| 454 | nn -= 4; | ||
| 455 | } | ||
| 456 | while (nn > 0) { | ||
| 457 | s->mtfa[(pp+nn)] = s->mtfa[(pp+nn)-1]; nn--; | ||
| 458 | }; | ||
| 459 | s->mtfa[pp] = uc; | ||
| 460 | } else { | ||
| 461 | /* general case */ | ||
| 462 | lno = nn / MTFL_SIZE; | ||
| 463 | off = nn % MTFL_SIZE; | ||
| 464 | pp = s->mtfbase[lno] + off; | ||
| 465 | uc = s->mtfa[pp]; | ||
| 466 | while (pp > s->mtfbase[lno]) { | ||
| 467 | s->mtfa[pp] = s->mtfa[pp-1]; pp--; | ||
| 468 | }; | ||
| 469 | s->mtfbase[lno]++; | ||
| 470 | while (lno > 0) { | ||
| 471 | s->mtfbase[lno]--; | ||
| 472 | s->mtfa[s->mtfbase[lno]] | ||
| 473 | = s->mtfa[s->mtfbase[lno-1] + MTFL_SIZE - 1]; | ||
| 474 | lno--; | ||
| 475 | } | ||
| 476 | s->mtfbase[0]--; | ||
| 477 | s->mtfa[s->mtfbase[0]] = uc; | ||
| 478 | if (s->mtfbase[0] == 0) { | ||
| 479 | kk = MTFA_SIZE-1; | ||
| 480 | for (ii = 256 / MTFL_SIZE-1; ii >= 0; ii--) { | ||
| 481 | for (jj = MTFL_SIZE-1; jj >= 0; jj--) { | ||
| 482 | s->mtfa[kk] = s->mtfa[s->mtfbase[ii] + jj]; | ||
| 483 | kk--; | ||
| 484 | } | ||
| 485 | s->mtfbase[ii] = kk + 1; | ||
| 486 | } | ||
| 487 | } | ||
| 488 | } | ||
| 489 | } | ||
| 490 | /*-- end uc = MTF ( nextSym-1 ) --*/ | ||
| 491 | |||
| 492 | s->unzftab[s->seqToUnseq[uc]]++; | ||
| 493 | if (s->smallDecompress) | ||
| 494 | s->ll16[nblock] = (UInt16)(s->seqToUnseq[uc]); else | ||
| 495 | s->tt[nblock] = (UInt32)(s->seqToUnseq[uc]); | ||
| 496 | nblock++; | ||
| 497 | |||
| 498 | GET_MTF_VAL(BZ_X_MTF_5, BZ_X_MTF_6, nextSym); | ||
| 499 | continue; | ||
| 500 | } | ||
| 501 | } | ||
| 502 | |||
| 503 | s->state_out_len = 0; | ||
| 504 | s->state_out_ch = 0; | ||
| 505 | BZ_INITIALISE_CRC ( s->calculatedBlockCRC ); | ||
| 506 | s->state = BZ_X_OUTPUT; | ||
| 507 | if (s->verbosity >= 2) VPrintf0 ( "rt+rld" ); | ||
| 508 | |||
| 509 | /*-- Set up cftab to facilitate generation of T^(-1) --*/ | ||
| 510 | s->cftab[0] = 0; | ||
| 511 | for (i = 1; i <= 256; i++) s->cftab[i] = s->unzftab[i-1]; | ||
| 512 | for (i = 1; i <= 256; i++) s->cftab[i] += s->cftab[i-1]; | ||
| 513 | |||
| 514 | if (s->smallDecompress) { | ||
| 515 | |||
| 516 | /*-- Make a copy of cftab, used in generation of T --*/ | ||
| 517 | for (i = 0; i <= 256; i++) s->cftabCopy[i] = s->cftab[i]; | ||
| 518 | |||
| 519 | /*-- compute the T vector --*/ | ||
| 520 | for (i = 0; i < nblock; i++) { | ||
| 521 | uc = (UChar)(s->ll16[i]); | ||
| 522 | SET_LL(i, s->cftabCopy[uc]); | ||
| 523 | s->cftabCopy[uc]++; | ||
| 524 | } | ||
| 525 | |||
| 526 | /*-- Compute T^(-1) by pointer reversal on T --*/ | ||
| 527 | i = s->origPtr; | ||
| 528 | j = GET_LL(i); | ||
| 529 | do { | ||
| 530 | Int32 tmp = GET_LL(j); | ||
| 531 | SET_LL(j, i); | ||
| 532 | i = j; | ||
| 533 | j = tmp; | ||
| 534 | } | ||
| 535 | while (i != s->origPtr); | ||
| 536 | |||
| 537 | s->tPos = s->origPtr; | ||
| 538 | s->nblock_used = 0; | ||
| 539 | if (s->blockRandomised) { | ||
| 540 | BZ_RAND_INIT_MASK; | ||
| 541 | BZ_GET_SMALL(s->k0); s->nblock_used++; | ||
| 542 | BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK; | ||
| 543 | } else { | ||
| 544 | BZ_GET_SMALL(s->k0); s->nblock_used++; | ||
| 545 | } | ||
| 546 | |||
| 547 | } else { | ||
| 548 | |||
| 549 | /*-- compute the T^(-1) vector --*/ | ||
| 550 | for (i = 0; i < nblock; i++) { | ||
| 551 | uc = (UChar)(s->tt[i] & 0xff); | ||
| 552 | s->tt[s->cftab[uc]] |= (i << 8); | ||
| 553 | s->cftab[uc]++; | ||
| 554 | } | ||
| 555 | |||
| 556 | s->tPos = s->tt[s->origPtr] >> 8; | ||
| 557 | s->nblock_used = 0; | ||
| 558 | if (s->blockRandomised) { | ||
| 559 | BZ_RAND_INIT_MASK; | ||
| 560 | BZ_GET_FAST(s->k0); s->nblock_used++; | ||
| 561 | BZ_RAND_UPD_MASK; s->k0 ^= BZ_RAND_MASK; | ||
| 562 | } else { | ||
| 563 | BZ_GET_FAST(s->k0); s->nblock_used++; | ||
| 564 | } | ||
| 565 | |||
| 566 | } | ||
| 567 | |||
| 568 | RETURN(BZ_OK); | ||
| 569 | |||
| 570 | |||
| 571 | |||
| 572 | endhdr_2: | ||
| 573 | |||
| 574 | GET_UCHAR(BZ_X_ENDHDR_2, uc); | ||
| 575 | if (uc != 0x72) RETURN(BZ_DATA_ERROR); | ||
| 576 | GET_UCHAR(BZ_X_ENDHDR_3, uc); | ||
| 577 | if (uc != 0x45) RETURN(BZ_DATA_ERROR); | ||
| 578 | GET_UCHAR(BZ_X_ENDHDR_4, uc); | ||
| 579 | if (uc != 0x38) RETURN(BZ_DATA_ERROR); | ||
| 580 | GET_UCHAR(BZ_X_ENDHDR_5, uc); | ||
| 581 | if (uc != 0x50) RETURN(BZ_DATA_ERROR); | ||
| 582 | GET_UCHAR(BZ_X_ENDHDR_6, uc); | ||
| 583 | if (uc != 0x90) RETURN(BZ_DATA_ERROR); | ||
| 584 | |||
| 585 | s->storedCombinedCRC = 0; | ||
| 586 | GET_UCHAR(BZ_X_CCRC_1, uc); | ||
| 587 | s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc); | ||
| 588 | GET_UCHAR(BZ_X_CCRC_2, uc); | ||
| 589 | s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc); | ||
| 590 | GET_UCHAR(BZ_X_CCRC_3, uc); | ||
| 591 | s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc); | ||
| 592 | GET_UCHAR(BZ_X_CCRC_4, uc); | ||
| 593 | s->storedCombinedCRC = (s->storedCombinedCRC << 8) | ((UInt32)uc); | ||
| 594 | |||
| 595 | s->state = BZ_X_IDLE; | ||
| 596 | RETURN(BZ_STREAM_END); | ||
| 597 | |||
| 598 | default: AssertH ( False, 4001 ); | ||
| 599 | } | ||
| 600 | |||
| 601 | AssertH ( False, 4002 ); | ||
| 602 | |||
| 603 | save_state_and_return: | ||
| 604 | |||
| 605 | s->save_i = i; | ||
| 606 | s->save_j = j; | ||
| 607 | s->save_t = t; | ||
| 608 | s->save_alphaSize = alphaSize; | ||
| 609 | s->save_nGroups = nGroups; | ||
| 610 | s->save_nSelectors = nSelectors; | ||
| 611 | s->save_EOB = EOB; | ||
| 612 | s->save_groupNo = groupNo; | ||
| 613 | s->save_groupPos = groupPos; | ||
| 614 | s->save_nextSym = nextSym; | ||
| 615 | s->save_nblockMAX = nblockMAX; | ||
| 616 | s->save_nblock = nblock; | ||
| 617 | s->save_es = es; | ||
| 618 | s->save_N = N; | ||
| 619 | s->save_curr = curr; | ||
| 620 | s->save_zt = zt; | ||
| 621 | s->save_zn = zn; | ||
| 622 | s->save_zvec = zvec; | ||
| 623 | s->save_zj = zj; | ||
| 624 | s->save_gSel = gSel; | ||
| 625 | s->save_gMinlen = gMinlen; | ||
| 626 | s->save_gLimit = gLimit; | ||
| 627 | s->save_gBase = gBase; | ||
| 628 | s->save_gPerm = gPerm; | ||
| 629 | |||
| 630 | return retVal; | ||
| 631 | } | ||
| 632 | |||
| 633 | |||
| 634 | /*-------------------------------------------------------------*/ | ||
| 635 | /*--- end decompress.c ---*/ | ||
| 636 | /*-------------------------------------------------------------*/ | ||
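The fast (non-`smallDecompress`) path in `decompress()` packs each block byte and its T^(-1) link into one 32-bit word of `tt`. The following is a minimal standalone sketch of that idea, not the library's code: `inverse_bwt` and its parameters are hypothetical names, and the final output loop is an assumption about how the packed words are meant to be walked, since the `BZ_GET_FAST` macro is defined outside the part of the diff shown here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not bzip2's API: invert a block-sorting transform.
   last     - last column of the sorted rotations (n bytes, n < 2^24)
   orig_ptr - row of the original string among the sorted rotations
   out      - receives the n original bytes
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int inverse_bwt(const unsigned char *last, int n, int orig_ptr,
                unsigned char *out)
{
    uint32_t cftab[257];
    uint32_t *tt;
    uint32_t pos;
    int i;

    if (n <= 0 || n >= (1 << 24) || orig_ptr < 0 || orig_ptr >= n) return -1;
    tt = (uint32_t *)malloc((size_t)n * sizeof *tt);
    if (tt == NULL) return -1;

    /* cftab[c] = number of bytes in the block that are smaller than c */
    for (i = 0; i <= 256; i++) cftab[i] = 0;
    for (i = 0; i < n; i++) cftab[last[i] + 1]++;
    for (i = 1; i <= 256; i++) cftab[i] += cftab[i - 1];

    /* Low 8 bits of each word hold the byte, high 24 bits the T^(-1) link. */
    for (i = 0; i < n; i++) tt[i] = last[i];
    for (i = 0; i < n; i++) {
        unsigned char uc = (unsigned char)(tt[i] & 0xff);
        tt[cftab[uc]] |= (uint32_t)i << 8;
        cftab[uc]++;
    }

    /* Follow the links from the original row, emitting one byte per step. */
    pos = tt[orig_ptr] >> 8;
    for (i = 0; i < n; i++) {
        uint32_t word = tt[pos];
        out[i] = (unsigned char)(word & 0xff);
        pos = word >> 8;
    }

    free(tt);
    return 0;
}
```

For example, the sorted rotations of "banana" have last column "nnbaaa" and the original string in row 3, so `inverse_bwt` called with those inputs writes "banana" to `out`. Packing the link into the upper 24 bits is what limits `n` to below 2^24 in this sketch.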
diff --git a/dlltest.c b/dlltest.c new file mode 100644 index 0000000..ee81bcd --- /dev/null +++ b/dlltest.c | |||
| @@ -0,0 +1,163 @@ | |||
| 1 | /* | ||
| 2 | minibz2 | ||
| 3 | libbz2.dll test program. | ||
| 4 | by Yoshioka Tsuneo(QWF00133@nifty.ne.jp/tsuneo-y@is.aist-nara.ac.jp) | ||
| 5 | This file is Public Domain. | ||
| 6 | welcome any email to me. | ||
| 7 | |||
| 8 | usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename] | ||
| 9 | */ | ||
| 10 | |||
| 11 | #define BZ_IMPORT | ||
| 12 | #include "bzlib.h" | ||
| 13 | #include <stdio.h> | ||
| 14 | #include <stdlib.h> | ||
| 15 | #ifdef _WIN32 | ||
| 16 | #include <io.h> | ||
| 17 | #endif | ||
| 18 | |||
| 19 | |||
| 20 | #ifdef _WIN32 | ||
| 21 | |||
| 22 | #include <windows.h> | ||
| 23 | static int BZ2DLLLoaded = 0; | ||
| 24 | static HINSTANCE BZ2DLLhLib; | ||
| 25 | int BZ2DLLLoadLibrary(void) | ||
| 26 | { | ||
| 27 | HINSTANCE hLib; | ||
| 28 | |||
| 29 | if(BZ2DLLLoaded==1){return 0;} | ||
| 30 | hLib=LoadLibrary("libbz2.dll"); | ||
| 31 | if(hLib == NULL){ | ||
| 32 | puts("Can't load libbz2.dll"); | ||
| 33 | return -1; | ||
| 34 | } | ||
| 35 | BZ2DLLLoaded=1; | ||
| 36 | BZ2DLLhLib=hLib; | ||
| 37 | bzlibVersion=GetProcAddress(hLib,"bzlibVersion"); | ||
| 38 | bzopen=GetProcAddress(hLib,"bzopen"); | ||
| 39 | bzdopen=GetProcAddress(hLib,"bzdopen"); | ||
| 40 | bzread=GetProcAddress(hLib,"bzread"); | ||
| 41 | bzwrite=GetProcAddress(hLib,"bzwrite"); | ||
| 42 | bzflush=GetProcAddress(hLib,"bzflush"); | ||
| 43 | bzclose=GetProcAddress(hLib,"bzclose"); | ||
| 44 | bzerror=GetProcAddress(hLib,"bzerror"); | ||
| 45 | return 0; | ||
| 46 | |||
| 47 | } | ||
| 48 | int BZ2DLLFreeLibrary(void) | ||
| 49 | { | ||
| 50 | if(BZ2DLLLoaded==0){return 0;} | ||
| 51 | FreeLibrary(BZ2DLLhLib); | ||
| 52 | BZ2DLLLoaded=0; return 0; | ||
| 53 | } | ||
| 54 | #endif /* _WIN32 */ | ||
| 55 | |||
| 56 | void usage(void) | ||
| 57 | { | ||
| 58 | puts("usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]"); | ||
| 59 | } | ||
| 60 | |||
| 61 | void main(int argc,char *argv[]) | ||
| 62 | { | ||
| 63 | int decompress = 0; | ||
| 64 | int level = 9; | ||
| 65 | char *fn_r,*fn_w; | ||
| 66 | |||
| 67 | #ifdef _WIN32 | ||
| 68 | if(BZ2DLLLoadLibrary()<0){ | ||
| 69 | puts("can't load dll"); | ||
| 70 | exit(1); | ||
| 71 | } | ||
| 72 | #endif | ||
| 73 | while(++argv,--argc){ | ||
| 74 | if(**argv =='-' || **argv=='/'){ | ||
| 75 | char *p; | ||
| 76 | |||
| 77 | for(p=*argv+1;*p;p++){ | ||
| 78 | if(*p=='d'){ | ||
| 79 | decompress = 1; | ||
| 80 | }else if('1'<=*p && *p<='9'){ | ||
| 81 | level = *p - '0'; | ||
| 82 | }else{ | ||
| 83 | usage(); | ||
| 84 | exit(1); | ||
| 85 | } | ||
| 86 | } | ||
| 87 | }else{ | ||
| 88 | break; | ||
| 89 | } | ||
| 90 | } | ||
| 91 | if(argc>=1){ | ||
| 92 | fn_r = *argv; | ||
| 93 | argc--;argv++; | ||
| 94 | }else{ | ||
| 95 | fn_r = NULL; | ||
| 96 | } | ||
| 97 | if(argc>=1){ | ||
| 98 | fn_w = *argv; | ||
| 99 | argc--;argv++; | ||
| 100 | }else{ | ||
| 101 | fn_w = NULL; | ||
| 102 | } | ||
| 103 | { | ||
| 104 | int len; | ||
| 105 | char buff[0x1000]; | ||
| 106 | char mode[10]; | ||
| 107 | |||
| 108 | if(decompress){ | ||
| 109 | BZFILE *BZ2fp_r; | ||
| 110 | FILE *fp_w; | ||
| 111 | |||
| 112 | if(fn_w){ | ||
| 113 | if((fp_w = fopen(fn_w,"wb"))==NULL){ | ||
| 114 | printf("can't open [%s]\n",fn_w); | ||
| 115 | perror("reason:"); | ||
| 116 | exit(1); | ||
| 117 | } | ||
| 118 | }else{ | ||
| 119 | fp_w = stdout; | ||
| 120 | } | ||
| 121 | if((fn_r == NULL && (BZ2fp_r = bzdopen(fileno(stdin),"rb"))==NULL) | ||
| 122 | || (fn_r != NULL && (BZ2fp_r = bzopen(fn_r,"rb"))==NULL)){ | ||
| 123 | printf("can't bz2openstream\n"); | ||
| 124 | exit(1); | ||
| 125 | } | ||
| 126 | while((len=bzread(BZ2fp_r,buff,0x1000))>0){ | ||
| 127 | fwrite(buff,1,len,fp_w); | ||
| 128 | } | ||
| 129 | bzclose(BZ2fp_r); | ||
| 130 | if(fp_w != stdout) fclose(fp_w); | ||
| 131 | }else{ | ||
| 132 | BZFILE *BZ2fp_w; | ||
| 133 | FILE *fp_r; | ||
| 134 | |||
| 135 | if(fn_r){ | ||
| 136 | if((fp_r = fopen(fn_r,"rb"))==NULL){ | ||
| 137 | printf("can't open [%s]\n",fn_r); | ||
| 138 | perror("reason:"); | ||
| 139 | exit(1); | ||
| 140 | } | ||
| 141 | }else{ | ||
| 142 | fp_r = stdin; | ||
| 143 | } | ||
| 144 | mode[0]='w'; | ||
| 145 | mode[1] = '0' + level; | ||
| 146 | mode[2] = '\0'; | ||
| 147 | |||
| 148 | if((fn_w == NULL && (BZ2fp_w = bzdopen(fileno(stdout),mode))==NULL) | ||
| 149 | || (fn_w !=NULL && (BZ2fp_w = bzopen(fn_w,mode))==NULL)){ | ||
| 150 | printf("can't bz2openstream\n"); | ||
| 151 | exit(1); | ||
| 152 | } | ||
| 153 | while((len=fread(buff,1,0x1000,fp_r))>0){ | ||
| 154 | bzwrite(BZ2fp_w,buff,len); | ||
| 155 | } | ||
| 156 | bzclose(BZ2fp_w); | ||
| 157 | if(fp_r!=stdin)fclose(fp_r); | ||
| 158 | } | ||
| 159 | } | ||
| 160 | #ifdef _WIN32 | ||
| 161 | BZ2DLLFreeLibrary(); | ||
| 162 | #endif | ||
| 163 | } | ||
diff --git a/dlltest.dsp b/dlltest.dsp new file mode 100644 index 0000000..4b1615e --- /dev/null +++ b/dlltest.dsp | |||
| @@ -0,0 +1,93 @@ | |||
| 1 | # Microsoft Developer Studio Project File - Name="dlltest" - Package Owner=<4> | ||
| 2 | # Microsoft Developer Studio Generated Build File, Format Version 5.00 | ||
| 3 | # ** DO NOT EDIT ** | ||
| 4 | |||
| 5 | # TARGTYPE "Win32 (x86) Console Application" 0x0103 | ||
| 6 | |||
| 7 | CFG=dlltest - Win32 Debug | ||
| 8 | !MESSAGE This is not a valid makefile. To build this project, use NMAKE. | ||
| 9 | !MESSAGE Use the [Export Makefile] command and run | ||
| 10 | !MESSAGE | ||
| 11 | !MESSAGE NMAKE /f "dlltest.mak". | ||
| 12 | !MESSAGE | ||
| 13 | !MESSAGE You can specify a configuration when running NMAKE | ||
| 14 | !MESSAGE by defining the macro CFG on the command line. For example: | ||
| 15 | !MESSAGE | ||
| 16 | !MESSAGE NMAKE /f "dlltest.mak" CFG="dlltest - Win32 Debug" | ||
| 17 | !MESSAGE | ||
| 18 | !MESSAGE Possible choices for configuration are: | ||
| 19 | !MESSAGE | ||
| 20 | !MESSAGE "dlltest - Win32 Release" (based on "Win32 (x86) Console Application") | ||
| 21 | !MESSAGE "dlltest - Win32 Debug" (based on "Win32 (x86) Console Application") | ||
| 22 | !MESSAGE | ||
| 23 | |||
| 24 | # Begin Project | ||
| 25 | # PROP Scc_ProjName "" | ||
| 26 | # PROP Scc_LocalPath "" | ||
| 27 | CPP=cl.exe | ||
| 28 | RSC=rc.exe | ||
| 29 | |||
| 30 | !IF "$(CFG)" == "dlltest - Win32 Release" | ||
| 31 | |||
| 32 | # PROP BASE Use_MFC 0 | ||
| 33 | # PROP BASE Use_Debug_Libraries 0 | ||
| 34 | # PROP BASE Output_Dir "Release" | ||
| 35 | # PROP BASE Intermediate_Dir "Release" | ||
| 36 | # PROP BASE Target_Dir "" | ||
| 37 | # PROP Use_MFC 0 | ||
| 38 | # PROP Use_Debug_Libraries 0 | ||
| 39 | # PROP Output_Dir "Release" | ||
| 40 | # PROP Intermediate_Dir "Release" | ||
| 41 | # PROP Ignore_Export_Lib 0 | ||
| 42 | # PROP Target_Dir "" | ||
| 43 | # ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c | ||
| 44 | # ADD CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c | ||
| 45 | # ADD BASE RSC /l 0x411 /d "NDEBUG" | ||
| 46 | # ADD RSC /l 0x411 /d "NDEBUG" | ||
| 47 | BSC32=bscmake.exe | ||
| 48 | # ADD BASE BSC32 /nologo | ||
| 49 | # ADD BSC32 /nologo | ||
| 50 | LINK32=link.exe | ||
| 51 | # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 | ||
| 52 | # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 /out:"minibz2.exe" | ||
| 53 | |||
| 54 | !ELSEIF "$(CFG)" == "dlltest - Win32 Debug" | ||
| 55 | |||
| 56 | # PROP BASE Use_MFC 0 | ||
| 57 | # PROP BASE Use_Debug_Libraries 1 | ||
| 58 | # PROP BASE Output_Dir "dlltest_" | ||
| 59 | # PROP BASE Intermediate_Dir "dlltest_" | ||
| 60 | # PROP BASE Target_Dir "" | ||
| 61 | # PROP Use_MFC 0 | ||
| 62 | # PROP Use_Debug_Libraries 1 | ||
| 63 | # PROP Output_Dir "dlltest_" | ||
| 64 | # PROP Intermediate_Dir "dlltest_" | ||
| 65 | # PROP Ignore_Export_Lib 0 | ||
| 66 | # PROP Target_Dir "" | ||
| 67 | # ADD BASE CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c | ||
| 68 | # ADD CPP /nologo /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c | ||
| 69 | # ADD BASE RSC /l 0x411 /d "_DEBUG" | ||
| 70 | # ADD RSC /l 0x411 /d "_DEBUG" | ||
| 71 | BSC32=bscmake.exe | ||
| 72 | # ADD BASE BSC32 /nologo | ||
| 73 | # ADD BSC32 /nologo | ||
| 74 | LINK32=link.exe | ||
| 75 | # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept | ||
| 76 | # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /out:"minibz2.exe" /pdbtype:sept | ||
| 77 | |||
| 78 | !ENDIF | ||
| 79 | |||
| 80 | # Begin Target | ||
| 81 | |||
| 82 | # Name "dlltest - Win32 Release" | ||
| 83 | # Name "dlltest - Win32 Debug" | ||
| 84 | # Begin Source File | ||
| 85 | |||
| 86 | SOURCE=.\bzlib.h | ||
| 87 | # End Source File | ||
| 88 | # Begin Source File | ||
| 89 | |||
| 90 | SOURCE=.\dlltest.c | ||
| 91 | # End Source File | ||
| 92 | # End Target | ||
| 93 | # End Project | ||
diff --git a/howbig.c b/howbig.c new file mode 100644 index 0000000..9f2ad7c --- /dev/null +++ b/howbig.c | |||
| @@ -0,0 +1,37 @@ | |||
| 1 | |||
| 2 | #include <stdio.h> | ||
| 3 | #include <assert.h> | ||
| 4 | #include "bzlib.h" | ||
| 5 | |||
| 6 | unsigned char ibuff[1000000]; | ||
| 7 | unsigned char obuff[1000000]; | ||
| 8 | |||
| 9 | void doone ( int n ) | ||
| 10 | { | ||
| 11 | int i, j, k, q, nobuff; | ||
| 12 | q = 0; | ||
| 13 | |||
| 14 | for (k = 0; k < 1; k++) { | ||
| 15 | for (i = 0; i < n; i++) | ||
| 16 | ibuff[i] = ((unsigned long)(random())) & 0xff; | ||
| 17 | nobuff = 1000000; | ||
| 18 | j = bzBuffToBuffCompress ( obuff, &nobuff, ibuff, n, 9,0,0 ); | ||
| 19 | assert (j == BZ_OK); | ||
| 20 | if (nobuff > q) q = nobuff; | ||
| 21 | } | ||
| 22 | printf ( "%d %d(%d)\n", n, q, (int)((float)n * 1.01 - (float)q) ); | ||
| 23 | } | ||
| 24 | |||
| 25 | int main ( int argc, char** argv ) | ||
| 26 | { | ||
| 27 | int i; | ||
| 28 | i = 0; | ||
| 29 | while (1) { | ||
| 30 | if (i >= 900000) break; | ||
| 31 | doone(i); | ||
| 32 | if ( (int)(1.10 * i) > i ) | ||
| 33 | i = (int)(1.10 * i); else i++; | ||
| 34 | } | ||
| 35 | |||
| 36 | return 0; | ||
| 37 | } \ No newline at end of file | ||
diff --git a/huffman.c b/huffman.c new file mode 100644 index 0000000..8254990 --- /dev/null +++ b/huffman.c | |||
| @@ -0,0 +1,228 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Huffman coding low-level stuff ---*/ | ||
| 4 | /*--- huffman.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #include "bzlib_private.h" | ||
| 63 | |||
| 64 | /*---------------------------------------------------*/ | ||
| 65 | #define WEIGHTOF(zz0) ((zz0) & 0xffffff00) | ||
| 66 | #define DEPTHOF(zz1) ((zz1) & 0x000000ff) | ||
| 67 | #define MYMAX(zz2,zz3) ((zz2) > (zz3) ? (zz2) : (zz3)) | ||
| 68 | |||
| 69 | #define ADDWEIGHTS(zw1,zw2) \ | ||
| 70 | (WEIGHTOF(zw1)+WEIGHTOF(zw2)) | \ | ||
| 71 | (1 + MYMAX(DEPTHOF(zw1),DEPTHOF(zw2))) | ||
| 72 | |||
| 73 | #define UPHEAP(z) \ | ||
| 74 | { \ | ||
| 75 | Int32 zz, tmp; \ | ||
| 76 | zz = z; tmp = heap[zz]; \ | ||
| 77 | while (weight[tmp] < weight[heap[zz >> 1]]) { \ | ||
| 78 | heap[zz] = heap[zz >> 1]; \ | ||
| 79 | zz >>= 1; \ | ||
| 80 | } \ | ||
| 81 | heap[zz] = tmp; \ | ||
| 82 | } | ||
| 83 | |||
| 84 | #define DOWNHEAP(z) \ | ||
| 85 | { \ | ||
| 86 | Int32 zz, yy, tmp; \ | ||
| 87 | zz = z; tmp = heap[zz]; \ | ||
| 88 | while (True) { \ | ||
| 89 | yy = zz << 1; \ | ||
| 90 | if (yy > nHeap) break; \ | ||
| 91 | if (yy < nHeap && \ | ||
| 92 | weight[heap[yy+1]] < weight[heap[yy]]) \ | ||
| 93 | yy++; \ | ||
| 94 | if (weight[tmp] < weight[heap[yy]]) break; \ | ||
| 95 | heap[zz] = heap[yy]; \ | ||
| 96 | zz = yy; \ | ||
| 97 | } \ | ||
| 98 | heap[zz] = tmp; \ | ||
| 99 | } | ||
| 100 | |||
| 101 | |||
| 102 | /*---------------------------------------------------*/ | ||
| 103 | void hbMakeCodeLengths ( UChar *len, | ||
| 104 | Int32 *freq, | ||
| 105 | Int32 alphaSize, | ||
| 106 | Int32 maxLen ) | ||
| 107 | { | ||
| 108 | /*-- | ||
| 109 | Nodes and heap entries run from 1. Entry 0 | ||
| 110 | for both the heap and nodes is a sentinel. | ||
| 111 | --*/ | ||
| 112 | Int32 nNodes, nHeap, n1, n2, i, j, k; | ||
| 113 | Bool tooLong; | ||
| 114 | |||
| 115 | Int32 heap [ BZ_MAX_ALPHA_SIZE + 2 ]; | ||
| 116 | Int32 weight [ BZ_MAX_ALPHA_SIZE * 2 ]; | ||
| 117 | Int32 parent [ BZ_MAX_ALPHA_SIZE * 2 ]; | ||
| 118 | |||
| 119 | for (i = 0; i < alphaSize; i++) | ||
| 120 | weight[i+1] = (freq[i] == 0 ? 1 : freq[i]) << 8; | ||
| 121 | |||
| 122 | while (True) { | ||
| 123 | |||
| 124 | nNodes = alphaSize; | ||
| 125 | nHeap = 0; | ||
| 126 | |||
| 127 | heap[0] = 0; | ||
| 128 | weight[0] = 0; | ||
| 129 | parent[0] = -2; | ||
| 130 | |||
| 131 | for (i = 1; i <= alphaSize; i++) { | ||
| 132 | parent[i] = -1; | ||
| 133 | nHeap++; | ||
| 134 | heap[nHeap] = i; | ||
| 135 | UPHEAP(nHeap); | ||
| 136 | } | ||
| 137 | |||
| 138 | AssertH( nHeap < (BZ_MAX_ALPHA_SIZE+2), 2001 ); | ||
| 139 | |||
| 140 | while (nHeap > 1) { | ||
| 141 | n1 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1); | ||
| 142 | n2 = heap[1]; heap[1] = heap[nHeap]; nHeap--; DOWNHEAP(1); | ||
| 143 | nNodes++; | ||
| 144 | parent[n1] = parent[n2] = nNodes; | ||
| 145 | weight[nNodes] = ADDWEIGHTS(weight[n1], weight[n2]); | ||
| 146 | parent[nNodes] = -1; | ||
| 147 | nHeap++; | ||
| 148 | heap[nHeap] = nNodes; | ||
| 149 | UPHEAP(nHeap); | ||
| 150 | } | ||
| 151 | |||
| 152 | AssertH( nNodes < (BZ_MAX_ALPHA_SIZE * 2), 2002 ); | ||
| 153 | |||
| 154 | tooLong = False; | ||
| 155 | for (i = 1; i <= alphaSize; i++) { | ||
| 156 | j = 0; | ||
| 157 | k = i; | ||
| 158 | while (parent[k] >= 0) { k = parent[k]; j++; } | ||
| 159 | len[i-1] = j; | ||
| 160 | if (j > maxLen) tooLong = True; | ||
| 161 | } | ||
| 162 | |||
| 163 | if (! tooLong) break; | ||
| 164 | |||
| 165 | for (i = 1; i <= alphaSize; i++) { | ||
| 166 | j = weight[i] >> 8; | ||
| 167 | j = 1 + (j / 2); | ||
| 168 | weight[i] = j << 8; | ||
| 169 | } | ||
| 170 | } | ||
| 171 | } | ||
| 172 | |||
| 173 | |||
| 174 | /*---------------------------------------------------*/ | ||
| 175 | void hbAssignCodes ( Int32 *code, | ||
| 176 | UChar *length, | ||
| 177 | Int32 minLen, | ||
| 178 | Int32 maxLen, | ||
| 179 | Int32 alphaSize ) | ||
| 180 | { | ||
| 181 | Int32 n, vec, i; | ||
| 182 | |||
| 183 | vec = 0; | ||
| 184 | for (n = minLen; n <= maxLen; n++) { | ||
| 185 | for (i = 0; i < alphaSize; i++) | ||
| 186 | if (length[i] == n) { code[i] = vec; vec++; }; | ||
| 187 | vec <<= 1; | ||
| 188 | } | ||
| 189 | } | ||
| 190 | |||
| 191 | |||
| 192 | /*---------------------------------------------------*/ | ||
| 193 | void hbCreateDecodeTables ( Int32 *limit, | ||
| 194 | Int32 *base, | ||
| 195 | Int32 *perm, | ||
| 196 | UChar *length, | ||
| 197 | Int32 minLen, | ||
| 198 | Int32 maxLen, | ||
| 199 | Int32 alphaSize ) | ||
| 200 | { | ||
| 201 | Int32 pp, i, j, vec; | ||
| 202 | |||
| 203 | pp = 0; | ||
| 204 | for (i = minLen; i <= maxLen; i++) | ||
| 205 | for (j = 0; j < alphaSize; j++) | ||
| 206 | if (length[j] == i) { perm[pp] = j; pp++; }; | ||
| 207 | |||
| 208 | for (i = 0; i < BZ_MAX_CODE_LEN; i++) base[i] = 0; | ||
| 209 | for (i = 0; i < alphaSize; i++) base[length[i]+1]++; | ||
| 210 | |||
| 211 | for (i = 1; i < BZ_MAX_CODE_LEN; i++) base[i] += base[i-1]; | ||
| 212 | |||
| 213 | for (i = 0; i < BZ_MAX_CODE_LEN; i++) limit[i] = 0; | ||
| 214 | vec = 0; | ||
| 215 | |||
| 216 | for (i = minLen; i <= maxLen; i++) { | ||
| 217 | vec += (base[i+1] - base[i]); | ||
| 218 | limit[i] = vec-1; | ||
| 219 | vec <<= 1; | ||
| 220 | } | ||
| 221 | for (i = minLen + 1; i <= maxLen; i++) | ||
| 222 | base[i] = ((limit[i-1] + 1) << 1) - base[i]; | ||
| 223 | } | ||
| 224 | |||
| 225 | |||
| 226 | /*-------------------------------------------------------------*/ | ||
| 227 | /*--- end huffman.c ---*/ | ||
| 228 | /*-------------------------------------------------------------*/ | ||
diff --git a/libbz2.def b/libbz2.def new file mode 100644 index 0000000..ba0f54e --- /dev/null +++ b/libbz2.def | |||
| @@ -0,0 +1,25 @@ | |||
| 1 | LIBRARY LIBBZ2 | ||
| 2 | DESCRIPTION "libbzip2: library for data compression" | ||
| 3 | EXPORTS | ||
| 4 | bzCompressInit | ||
| 5 | bzCompress | ||
| 6 | bzCompressEnd | ||
| 7 | bzDecompressInit | ||
| 8 | bzDecompress | ||
| 9 | bzDecompressEnd | ||
| 10 | bzReadOpen | ||
| 11 | bzReadClose | ||
| 12 | bzReadGetUnused | ||
| 13 | bzRead | ||
| 14 | bzWriteOpen | ||
| 15 | bzWrite | ||
| 16 | bzWriteClose | ||
| 17 | bzBuffToBuffCompress | ||
| 18 | bzBuffToBuffDecompress | ||
| 19 | bzlibVersion | ||
| 20 | bzopen | ||
| 21 | bzdopen | ||
| 22 | bzread | ||
| 23 | bzwrite | ||
| 24 | bzflush | ||
| 25 | bzclose | ||
diff --git a/libbz2.dsp b/libbz2.dsp new file mode 100644 index 0000000..a21a20f --- /dev/null +++ b/libbz2.dsp | |||
| @@ -0,0 +1,130 @@ | |||
| 1 | # Microsoft Developer Studio Project File - Name="libbz2" - Package Owner=<4> | ||
| 2 | # Microsoft Developer Studio Generated Build File, Format Version 5.00 | ||
| 3 | # ** DO NOT EDIT ** | ||
| 4 | |||
| 5 | # TARGTYPE "Win32 (x86) Dynamic-Link Library" 0x0102 | ||
| 6 | |||
| 7 | CFG=libbz2 - Win32 Debug | ||
| 8 | !MESSAGE This is not a valid makefile. To build this project, use NMAKE. | ||
| 9 | !MESSAGE Use the [Export Makefile] command and run | ||
| 10 | !MESSAGE | ||
| 11 | !MESSAGE NMAKE /f "libbz2.mak". | ||
| 12 | !MESSAGE | ||
| 13 | !MESSAGE You can specify a configuration when running NMAKE | ||
| 14 | !MESSAGE by defining the macro on the command line. For example: | ||
| 15 | !MESSAGE | ||
| 16 | !MESSAGE NMAKE /f "libbz2.mak" CFG="libbz2 - Win32 Debug" | ||
| 17 | !MESSAGE | ||
| 18 | !MESSAGE Possible build modes: | ||
| 19 | !MESSAGE | ||
| 20 | !MESSAGE "libbz2 - Win32 Release" (for "Win32 (x86) Dynamic-Link Library") | ||
| 21 | !MESSAGE "libbz2 - Win32 Debug" (for "Win32 (x86) Dynamic-Link Library") | ||
| 22 | !MESSAGE | ||
| 23 | |||
| 24 | # Begin Project | ||
| 25 | # PROP Scc_ProjName "" | ||
| 26 | # PROP Scc_LocalPath "" | ||
| 27 | CPP=cl.exe | ||
| 28 | MTL=midl.exe | ||
| 29 | RSC=rc.exe | ||
| 30 | |||
| 31 | !IF "$(CFG)" == "libbz2 - Win32 Release" | ||
| 32 | |||
| 33 | # PROP BASE Use_MFC 0 | ||
| 34 | # PROP BASE Use_Debug_Libraries 0 | ||
| 35 | # PROP BASE Output_Dir "Release" | ||
| 36 | # PROP BASE Intermediate_Dir "Release" | ||
| 37 | # PROP BASE Target_Dir "" | ||
| 38 | # PROP Use_MFC 0 | ||
| 39 | # PROP Use_Debug_Libraries 0 | ||
| 40 | # PROP Output_Dir "Release" | ||
| 41 | # PROP Intermediate_Dir "Release" | ||
| 42 | # PROP Ignore_Export_Lib 0 | ||
| 43 | # PROP Target_Dir "" | ||
| 44 | # ADD BASE CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c | ||
| 45 | # ADD CPP /nologo /MT /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /YX /FD /c | ||
| 46 | # ADD BASE MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32 | ||
| 47 | # ADD MTL /nologo /D "NDEBUG" /mktyplib203 /o NUL /win32 | ||
| 48 | # ADD BASE RSC /l 0x411 /d "NDEBUG" | ||
| 49 | # ADD RSC /l 0x411 /d "NDEBUG" | ||
| 50 | BSC32=bscmake.exe | ||
| 51 | # ADD BASE BSC32 /nologo | ||
| 52 | # ADD BSC32 /nologo | ||
| 53 | LINK32=link.exe | ||
| 54 | # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 | ||
| 55 | # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /machine:I386 /out:"libbz2.dll" | ||
| 56 | |||
| 57 | !ELSEIF "$(CFG)" == "libbz2 - Win32 Debug" | ||
| 58 | |||
| 59 | # PROP BASE Use_MFC 0 | ||
| 60 | # PROP BASE Use_Debug_Libraries 1 | ||
| 61 | # PROP BASE Output_Dir "Debug" | ||
| 62 | # PROP BASE Intermediate_Dir "Debug" | ||
| 63 | # PROP BASE Target_Dir "" | ||
| 64 | # PROP Use_MFC 0 | ||
| 65 | # PROP Use_Debug_Libraries 1 | ||
| 66 | # PROP Output_Dir "Debug" | ||
| 67 | # PROP Intermediate_Dir "Debug" | ||
| 68 | # PROP Ignore_Export_Lib 0 | ||
| 69 | # PROP Target_Dir "" | ||
| 70 | # ADD BASE CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c | ||
| 71 | # ADD CPP /nologo /MTd /W3 /Gm /GX /Zi /Od /D "WIN32" /D "_DEBUG" /D "_WINDOWS" /YX /FD /c | ||
| 72 | # ADD BASE MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32 | ||
| 73 | # ADD MTL /nologo /D "_DEBUG" /mktyplib203 /o NUL /win32 | ||
| 74 | # ADD BASE RSC /l 0x411 /d "_DEBUG" | ||
| 75 | # ADD RSC /l 0x411 /d "_DEBUG" | ||
| 76 | BSC32=bscmake.exe | ||
| 77 | # ADD BASE BSC32 /nologo | ||
| 78 | # ADD BSC32 /nologo | ||
| 79 | LINK32=link.exe | ||
| 80 | # ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /pdbtype:sept | ||
| 81 | # ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:windows /dll /debug /machine:I386 /out:"libbz2.dll" /pdbtype:sept | ||
| 82 | |||
| 83 | !ENDIF | ||
| 84 | |||
| 85 | # Begin Target | ||
| 86 | |||
| 87 | # Name "libbz2 - Win32 Release" | ||
| 88 | # Name "libbz2 - Win32 Debug" | ||
| 89 | # Begin Source File | ||
| 90 | |||
| 91 | SOURCE=.\blocksort.c | ||
| 92 | # End Source File | ||
| 93 | # Begin Source File | ||
| 94 | |||
| 95 | SOURCE=.\bzlib.c | ||
| 96 | # End Source File | ||
| 97 | # Begin Source File | ||
| 98 | |||
| 99 | SOURCE=.\bzlib.h | ||
| 100 | # End Source File | ||
| 101 | # Begin Source File | ||
| 102 | |||
| 103 | SOURCE=.\bzlib_private.h | ||
| 104 | # End Source File | ||
| 105 | # Begin Source File | ||
| 106 | |||
| 107 | SOURCE=.\compress.c | ||
| 108 | # End Source File | ||
| 109 | # Begin Source File | ||
| 110 | |||
| 111 | SOURCE=.\crctable.c | ||
| 112 | # End Source File | ||
| 113 | # Begin Source File | ||
| 114 | |||
| 115 | SOURCE=.\decompress.c | ||
| 116 | # End Source File | ||
| 117 | # Begin Source File | ||
| 118 | |||
| 119 | SOURCE=.\huffman.c | ||
| 120 | # End Source File | ||
| 121 | # Begin Source File | ||
| 122 | |||
| 123 | SOURCE=.\libbz2.def | ||
| 124 | # End Source File | ||
| 125 | # Begin Source File | ||
| 126 | |||
| 127 | SOURCE=.\randtable.c | ||
| 128 | # End Source File | ||
| 129 | # End Target | ||
| 130 | # End Project | ||
diff --git a/manual.texi b/manual.texi new file mode 100644 index 0000000..99ce661 --- /dev/null +++ b/manual.texi | |||
| @@ -0,0 +1,2100 @@ | |||
| 1 | \input texinfo @c -*- Texinfo -*- | ||
| 2 | @setfilename bzip2.info | ||
| 3 | |||
| 4 | @ignore | ||
| 5 | This file documents bzip2 version 0.9.0c, and associated library | ||
| 6 | libbzip2, written by Julian Seward (jseward@acm.org). | ||
| 7 | |||
| 8 | Copyright (C) 1996-1998 Julian R Seward | ||
| 9 | |||
| 10 | Permission is granted to make and distribute verbatim copies of | ||
| 11 | this manual provided the copyright notice and this permission notice | ||
| 12 | are preserved on all copies. | ||
| 13 | |||
| 14 | Permission is granted to copy and distribute translations of this manual | ||
| 15 | into another language, under the above conditions for verbatim copies. | ||
| 16 | @end ignore | ||
| 17 | |||
| 18 | @ifinfo | ||
| 19 | @format | ||
| 20 | START-INFO-DIR-ENTRY | ||
| 21 | * Bzip2: (bzip2). A program and library for data compression. | ||
| 22 | END-INFO-DIR-ENTRY | ||
| 23 | @end format | ||
| 24 | |||
| 25 | @end ifinfo | ||
| 26 | |||
| 27 | @iftex | ||
| 28 | @c @finalout | ||
| 29 | @settitle bzip2 and libbzip2 | ||
| 30 | @titlepage | ||
| 31 | @title bzip2 and libbzip2 | ||
| 32 | @subtitle a program and library for data compression | ||
| 33 | @subtitle copyright (C) 1996-1998 Julian Seward | ||
| 34 | @subtitle version 0.9.0c of 18 October 1998 | ||
| 35 | @author Julian Seward | ||
| 36 | |||
| 37 | @end titlepage | ||
| 38 | @end iftex | ||
| 39 | |||
| 40 | |||
| 41 | @parindent 0mm | ||
| 42 | @parskip 2mm | ||
| 43 | |||
| 44 | |||
| 45 | This program, @code{bzip2}, | ||
| 46 | and associated library @code{libbzip2}, are | ||
| 47 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 48 | |||
| 49 | Redistribution and use in source and binary forms, with or without | ||
| 50 | modification, are permitted provided that the following conditions | ||
| 51 | are met: | ||
| 52 | @itemize @bullet | ||
| 53 | @item | ||
| 54 | Redistributions of source code must retain the above copyright | ||
| 55 | notice, this list of conditions and the following disclaimer. | ||
| 56 | @item | ||
| 57 | The origin of this software must not be misrepresented; you must | ||
| 58 | not claim that you wrote the original software. If you use this | ||
| 59 | software in a product, an acknowledgment in the product | ||
| 60 | documentation would be appreciated but is not required. | ||
| 61 | @item | ||
| 62 | Altered source versions must be plainly marked as such, and must | ||
| 63 | not be misrepresented as being the original software. | ||
| 64 | @item | ||
| 65 | The name of the author may not be used to endorse or promote | ||
| 66 | products derived from this software without specific prior written | ||
| 67 | permission. | ||
| 68 | @end itemize | ||
| 69 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 70 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 71 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 72 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 73 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 74 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 75 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 76 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 77 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 78 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 79 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 80 | |||
| 81 | Julian Seward, Guildford, Surrey, UK. | ||
| 82 | |||
| 83 | @code{jseward@@acm.org} | ||
| 84 | |||
| 85 | @code{http://www.muraroa.demon.co.uk} | ||
| 86 | |||
| 87 | @code{bzip2}/@code{libbzip2} version 0.9.0c of 18 October 1998. | ||
| 88 | |||
| 89 | PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented | ||
| 90 | algorithms. However, I do not have the resources available to carry out | ||
| 91 | a full patent search. Therefore I cannot give any guarantee of the | ||
| 92 | above statement. | ||
| 93 | |||
| 94 | |||
| 95 | |||
| 96 | |||
| 97 | |||
| 98 | |||
| 99 | |||
| 100 | @node Overview, Implementation, Top, Top | ||
| 101 | @chapter Introduction | ||
| 102 | |||
| 103 | @code{bzip2} compresses files using the Burrows-Wheeler | ||
| 104 | block-sorting text compression algorithm, and Huffman coding. | ||
| 105 | Compression is generally considerably better than that | ||
| 106 | achieved by more conventional LZ77/LZ78-based compressors, | ||
| 107 | and approaches the performance of the PPM family of statistical compressors. | ||
| 108 | |||
| 109 | @code{bzip2} is built on top of @code{libbzip2}, a flexible library | ||
| 110 | for handling compressed data in the @code{bzip2} format. This manual | ||
| 111 | describes both how to use the program and | ||
| 112 | how to work with the library interface. Most of the | ||
| 113 | manual is devoted to this library, not the program, | ||
| 114 | which is good news if your interest is only in the program. | ||
| 115 | |||
| 116 | Chapter 2 describes how to use @code{bzip2}; this is the only part | ||
| 117 | you need to read if you just want to know how to operate the program. | ||
| 118 | Chapter 3 describes the programming interfaces in detail, and | ||
| 119 | Chapter 4 records some miscellaneous notes which I thought | ||
| 120 | ought to be recorded somewhere. | ||
| 121 | |||
| 122 | |||
| 123 | @chapter How to use @code{bzip2} | ||
| 124 | |||
| 125 | This chapter contains a copy of the @code{bzip2} man page, | ||
| 126 | and nothing else. | ||
| 127 | @example | ||
| 128 | NAME | ||
| 129 | bzip2, bunzip2 - a block-sorting file compressor, v0.9.0 | ||
| 130 | bzcat - decompresses files to stdout | ||
| 131 | bzip2recover - recovers data from damaged bzip2 files | ||
| 132 | |||
| 133 | |||
| 134 | SYNOPSIS | ||
| 135 | bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ] | ||
| 136 | bunzip2 [ -fkvsVL ] [ filenames ... ] | ||
| 137 | bzcat [ -s ] [ filenames ... ] | ||
| 138 | bzip2recover filename | ||
| 139 | |||
| 140 | |||
| 141 | DESCRIPTION | ||
| 142 | bzip2 compresses files using the Burrows-Wheeler block- | ||
| 143 | sorting text compression algorithm, and Huffman coding. | ||
| 144 | Compression is generally considerably better than that | ||
| 145 | achieved by more conventional LZ77/LZ78-based compressors, | ||
| 146 | and approaches the performance of the PPM family of sta- | ||
| 147 | tistical compressors. | ||
| 148 | |||
| 149 | The command-line options are deliberately very similar to | ||
| 150 | those of GNU Gzip, but they are not identical. | ||
| 151 | |||
| 152 | bzip2 expects a list of file names to accompany the com- | ||
| 153 | mand-line flags. Each file is replaced by a compressed | ||
| 154 | version of itself, with the name "original_name.bz2". | ||
| 155 | Each compressed file has the same modification date and | ||
| 156 | permissions as the corresponding original, so that these | ||
| 157 | properties can be correctly restored at decompression | ||
| 158 | time. File name handling is naive in the sense that there | ||
| 159 | is no mechanism for preserving original file names, per- | ||
| 160 | missions and dates in filesystems which lack these con- | ||
| 161 | cepts, or have serious file name length restrictions, such | ||
| 162 | as MS-DOS. | ||
| 163 | |||
| 164 | bzip2 and bunzip2 will by default not overwrite existing | ||
| 165 | files; if you want this to happen, specify the -f flag. | ||
| 166 | |||
| 167 | If no file names are specified, bzip2 compresses from | ||
| 168 | standard input to standard output. In this case, bzip2 | ||
| 169 | will decline to write compressed output to a terminal, as | ||
| 170 | this would be entirely incomprehensible and therefore | ||
| 171 | pointless. | ||
| 172 | |||
| 173 | bunzip2 (or bzip2 -d ) decompresses and restores all spec- | ||
| 174 | ified files whose names end in ".bz2". Files without this | ||
| 175 | suffix are ignored. Again, supplying no filenames causes | ||
| 176 | decompression from standard input to standard output. | ||
| 177 | |||
| 178 | bunzip2 will correctly decompress a file which is the con- | ||
| 179 | catenation of two or more compressed files. The result is | ||
| 180 | the concatenation of the corresponding uncompressed files. | ||
| 181 | Integrity testing (-t) of concatenated compressed files is | ||
| 182 | also supported. | ||
| 183 | |||
| 184 | You can also compress or decompress files to the standard | ||
| 185 | output by giving the -c flag. Multiple files may be com- | ||
| 186 | pressed and decompressed like this. The resulting outputs | ||
| 187 | are fed sequentially to stdout. Compression of multiple | ||
| 188 | files in this manner generates a stream containing multi- | ||
| 189 | ple compressed file representations. Such a stream can be | ||
| 190 | decompressed correctly only by bzip2 version 0.9.0 or | ||
| 191 | later. Earlier versions of bzip2 will stop after decom- | ||
| 192 | pressing the first file in the stream. | ||
| 193 | |||
| 194 | bzcat (or bzip2 -dc ) decompresses all specified files to | ||
| 195 | the standard output. | ||
| 196 | |||
| 197 | Compression is always performed, even if the compressed | ||
| 198 | file is slightly larger than the original. Files of less | ||
| 199 | than about one hundred bytes tend to get larger, since the | ||
| 200 | compression mechanism has a constant overhead in the | ||
| 201 | region of 50 bytes. Random data (including the output of | ||
| 202 | most file compressors) is coded at about 8.05 bits per | ||
| 203 | byte, giving an expansion of around 0.5%. | ||
| 204 | |||
| 205 | As a self-check for your protection, bzip2 uses 32-bit | ||
| 206 | CRCs to make sure that the decompressed version of a file | ||
| 207 | is identical to the original. This guards against corrup- | ||
| 208 | tion of the compressed data, and against undetected bugs | ||
| 209 | in bzip2 (hopefully very unlikely). The chance of data | ||
| 210 | corruption going undetected is microscopic, about one | ||
| 211 | chance in four billion for each file processed. Be aware, | ||
| 212 | though, that the check occurs upon decompression, so it | ||
| 213 | can only tell you that something is wrong. It can't | ||
| 214 | help you recover the original uncompressed data. You can | ||
| 215 | use bzip2recover to try to recover data from damaged | ||
| 216 | files. | ||
| 217 | |||
| 218 | Return values: 0 for a normal exit, 1 for environmental | ||
| 219 | problems (file not found, invalid flags, I/O errors, &c), | ||
| 220 | 2 to indicate a corrupt compressed file, 3 for an internal | ||
| 221 | consistency error (eg, bug) which caused bzip2 to panic. | ||
| 222 | |||
| 223 | |||
| 224 | MEMORY MANAGEMENT | ||
| 225 | Bzip2 compresses large files in blocks. The block size | ||
| 226 | affects both the compression ratio achieved, and the | ||
| 227 | amount of memory needed both for compression and decom- | ||
| 228 | pression. The flags -1 through -9 specify the block size | ||
| 229 | to be 100,000 bytes through 900,000 bytes (the default) | ||
| 230 | respectively. At decompression-time, the block size used | ||
| 231 | for compression is read from the header of the compressed | ||
| 232 | file, and bunzip2 then allocates itself just enough memory | ||
| 233 | to decompress the file. Since block sizes are stored in | ||
| 234 | compressed files, it follows that the flags -1 to -9 are | ||
| 235 | irrelevant to and so ignored during decompression. | ||
| 236 | |||
| 237 | Compression and decompression requirements, in bytes, can | ||
| 238 | be estimated as: | ||
| 239 | |||
| 240 | Compression: 400k + ( 7 x block size ) | ||
| 241 | |||
| 242 | Decompression: 100k + ( 4 x block size ), or | ||
| 243 | 100k + ( 2.5 x block size ) | ||
| 244 | |||
| 245 | Larger block sizes give rapidly diminishing marginal | ||
| 246 | returns; most of the compression comes from the first two | ||
| 247 | or three hundred k of block size, a fact worth bearing in | ||
| 248 | mind when using bzip2 on small machines. It is also | ||
| 249 | important to appreciate that the decompression memory | ||
| 250 | requirement is set at compression-time by the choice of | ||
| 251 | block size. | ||
| 252 | |||
| 253 | For files compressed with the default 900k block size, | ||
| 254 | bunzip2 will require about 3700 kbytes to decompress. To | ||
| 255 | support decompression of any file on a 4 megabyte machine, | ||
| 256 | bunzip2 has an option to decompress using approximately | ||
| 257 | half this amount of memory, about 2300 kbytes. Decompres- | ||
| 258 | sion speed is also halved, so you should use this option | ||
| 259 | only where necessary. The relevant flag is -s. | ||
| 260 | |||
| 261 | In general, try and use the largest block size memory con- | ||
| 262 | straints allow, since that maximises the compression | ||
| 263 | achieved. Compression and decompression speed are virtu- | ||
| 264 | ally unaffected by block size. | ||
| 265 | |||
| 266 | Another significant point applies to files which fit in a | ||
| 267 | single block -- that means most files you'd encounter | ||
| 268 | using a large block size. The amount of real memory | ||
| 269 | touched is proportional to the size of the file, since the | ||
| 270 | file is smaller than a block. For example, compressing a | ||
| 271 | file 20,000 bytes long with the flag -9 will cause the | ||
| 272 | compressor to allocate around 6700k of memory, but only | ||
| 273 | touch 400k + 20000 * 7 = 540 kbytes of it. Similarly, the | ||
| 274 | decompressor will allocate 3700k but only touch 100k + | ||
| 275 | 20000 * 4 = 180 kbytes. | ||
| 276 | |||
| 277 | Here is a table which summarises the maximum memory usage | ||
| 278 | for different block sizes. Also recorded is the total | ||
| 279 | compressed size for 14 files of the Calgary Text Compres- | ||
| 280 | sion Corpus totalling 3,141,622 bytes. This column gives | ||
| 281 | some feel for how compression varies with block size. | ||
| 282 | These figures tend to understate the advantage of larger | ||
| 283 | block sizes for larger files, since the Corpus is domi- | ||
| 284 | nated by smaller files. | ||
| 285 | |||
| 286 | Compress Decompress Decompress Corpus | ||
| 287 | Flag usage usage -s usage Size | ||
| 288 | |||
| 289 | -1 1100k 500k 350k 914704 | ||
| 290 | -2 1800k 900k 600k 877703 | ||
| 291 | -3 2500k 1300k 850k 860338 | ||
| 292 | -4 3200k 1700k 1100k 846899 | ||
| 293 | -5 3900k 2100k 1350k 845160 | ||
| 294 | -6 4600k 2500k 1600k 838626 | ||
| 295 | -7 5300k 2900k 1850k 834096 | ||
| 296 | -8 6000k 3300k 2100k 828642 | ||
| 297 | -9 6700k 3700k 2350k 828642 | ||
| 298 | |||
| 299 | |||
| 300 | OPTIONS | ||
| 301 | -c --stdout | ||
| 302 | Compress or decompress to standard output. -c will | ||
| 303 | decompress multiple files to stdout, but will only | ||
| 304 | compress a single file to stdout. | ||
| 305 | |||
| 306 | -d --decompress | ||
| 307 | Force decompression. bzip2, bunzip2 and bzcat are | ||
| 308 | really the same program, and the decision about | ||
| 309 | what actions to take is done on the basis of which | ||
| 310 | name is used. This flag overrides that mechanism, | ||
| 311 | and forces bzip2 to decompress. | ||
| 312 | |||
| 313 | -z --compress | ||
| 314 | The complement to -d: forces compression, regard- | ||
| 315 | less of the invocation name. | ||
| 316 | |||
| 317 | -t --test | ||
| 318 | Check integrity of the specified file(s), but don't | ||
| 319 | decompress them. This really performs a trial | ||
| 320 | decompression and throws away the result. | ||
| 321 | |||
| 322 | -f --force | ||
| 323 | Force overwrite of output files. Normally, bzip2 | ||
| 324 | will not overwrite existing output files. | ||
| 325 | |||
| 326 | -k --keep | ||
| 327 | Keep (don't delete) input files during compression | ||
| 328 | or decompression. | ||
| 329 | |||
| 330 | -s --small | ||
| 331 | Reduce memory usage, for compression, decompression | ||
| 332 | and testing. Files are decompressed and tested | ||
| 333 | using a modified algorithm which only requires 2.5 | ||
| 334 | bytes per block byte. This means any file can be | ||
| 335 | decompressed in 2300k of memory, albeit at about | ||
| 336 | half the normal speed. | ||
| 337 | |||
| 338 | During compression, -s selects a block size of | ||
| 339 | 200k, which limits memory use to around the same | ||
| 340 | figure, at the expense of your compression ratio. | ||
| 341 | In short, if your machine is low on memory (8 | ||
| 342 | megabytes or less), use -s for everything. See | ||
| 343 | MEMORY MANAGEMENT above. | ||
| 344 | |||
| 345 | -v --verbose | ||
| 346 | Verbose mode -- show the compression ratio for each | ||
| 347 | file processed. Further -v's increase the ver- | ||
| 348 | bosity level, spewing out lots of information which | ||
| 349 | is primarily of interest for diagnostic purposes. | ||
| 350 | |||
| 351 | -L --license -V --version | ||
| 352 | Display the software version, license terms and | ||
| 353 | conditions. | ||
| 354 | |||
| 355 | -1 to -9 | ||
| 356 | Set the block size to 100 k, 200 k .. 900 k when | ||
| 357 | compressing. Has no effect when decompressing. | ||
| 358 | See MEMORY MANAGEMENT above. | ||
| 359 | |||
| 360 | --repetitive-fast | ||
| 361 | bzip2 injects some small pseudo-random variations | ||
| 362 | into very repetitive blocks to limit worst-case | ||
| 363 | performance during compression. If sorting runs | ||
| 364 | into difficulties, the block is randomised, and | ||
| 365 | sorting is restarted. Very roughly, bzip2 persists | ||
| 366 | for three times as long as a well-behaved input | ||
| 367 | would take before resorting to randomisation. This | ||
| 368 | flag makes it give up much sooner. | ||
| 369 | |||
| 370 | --repetitive-best | ||
| 371 | Opposite of --repetitive-fast; try a lot harder | ||
| 372 | before resorting to randomisation. | ||
| 373 | |||
| 374 | |||
| 375 | RECOVERING DATA FROM DAMAGED FILES | ||
| 376 | bzip2 compresses files in blocks, usually 900kbytes long. | ||
| 377 | Each block is handled independently. If a media or trans- | ||
| 378 | mission error causes a multi-block .bz2 file to become | ||
| 379 | damaged, it may be possible to recover data from the | ||
| 380 | undamaged blocks in the file. | ||
| 381 | |||
| 382 | The compressed representation of each block is delimited | ||
| 383 | by a 48-bit pattern, which makes it possible to find the | ||
| 384 | block boundaries with reasonable certainty. Each block | ||
| 385 | also carries its own 32-bit CRC, so damaged blocks can be | ||
| 386 | distinguished from undamaged ones. | ||
| 387 | |||
| 388 | bzip2recover is a simple program whose purpose is to | ||
| 389 | search for blocks in .bz2 files, and write each block out | ||
| 390 | into its own .bz2 file. You can then use bzip2 -t to test | ||
| 391 | the integrity of the resulting files, and decompress those | ||
| 392 | which are undamaged. | ||
| 393 | |||
| 394 | bzip2recover takes a single argument, the name of the dam- | ||
| 395 | aged file, and writes a number of files "rec0001file.bz2", | ||
| 396 | "rec0002file.bz2", etc, containing the extracted blocks. | ||
| 397 | The output filenames are designed so that the use of | ||
| 398 | wildcards in subsequent processing -- for example, "bzip2 | ||
| 399 | -dc rec*file.bz2 > recovered_data" -- lists the files in | ||
| 400 | the "right" order. | ||
| 401 | |||
| 402 | bzip2recover should be of most use dealing with large .bz2 | ||
| 403 | files, as these will contain many blocks. It is clearly | ||
| 404 | futile to use it on damaged single-block files, since a | ||
| 405 | damaged block cannot be recovered. If you wish to min- | ||
| 406 | imise any potential data loss through media or transmis- | ||
| 407 | sion errors, you might consider compressing with a smaller | ||
| 408 | block size. | ||
| 409 | |||
| 410 | |||
| 411 | PERFORMANCE NOTES | ||
| 412 | The sorting phase of compression gathers together similar | ||
| 413 | strings in the file. Because of this, files containing | ||
| 414 | very long runs of repeated symbols, like "aabaabaabaab | ||
| 415 | ..." (repeated several hundred times) may compress | ||
| 416 | extraordinarily slowly. You can use the -vvvvv option to | ||
| 417 | monitor progress in great detail, if you want. Decompres- | ||
| 418 | sion speed is unaffected. | ||
| 419 | |||
| 420 | Such pathological cases seem rare in practice, appearing | ||
| 421 | mostly in artificially-constructed test files, and in low- | ||
| 422 | level disk images. It may be inadvisable to use bzip2 to | ||
| 423 | compress the latter. If you do get a file which causes | ||
| 424 | severe slowness in compression, try making the block size | ||
| 425 | as small as possible, with flag -1. | ||
| 426 | |||
| 427 | bzip2 usually allocates several megabytes of memory to | ||
| 428 | operate in, and then charges all over it in a fairly ran- | ||
| 429 | dom fashion. This means that performance, both for com- | ||
| 430 | pressing and decompressing, is largely determined by the | ||
| 431 | speed at which your machine can service cache misses. | ||
| 432 | Because of this, small changes to the code to reduce the | ||
| 433 | miss rate have been observed to give disproportionately | ||
| 434 | large performance improvements. I imagine bzip2 will per- | ||
| 435 | form best on machines with very large caches. | ||
| 436 | |||
| 437 | |||
| 438 | CAVEATS | ||
| 439 | I/O error messages are not as helpful as they could be. | ||
| 440 | Bzip2 tries hard to detect I/O errors and exit cleanly, | ||
| 441 | but the details of what the problem is sometimes seem | ||
| 442 | rather misleading. | ||
| 443 | |||
| 444 | This manual page pertains to version 0.9.0 of bzip2. Com- | ||
| 445 | pressed data created by this version is entirely forwards | ||
| 446 | and backwards compatible with the previous public release, | ||
| 447 | version 0.1pl2, but with the following exception: 0.9.0 | ||
| 448 | can correctly decompress multiple concatenated compressed | ||
| 449 | files. 0.1pl2 cannot do this; it will stop after decom- | ||
| 450 | pressing just the first file in the stream. | ||
| 451 | |||
| 452 | Wildcard expansion for Windows 95 and NT is flaky. | ||
| 453 | |||
| 454 | bzip2recover uses 32-bit integers to represent bit posi- | ||
| 455 | tions in compressed files, so it cannot handle compressed | ||
| 456 | files more than 512 megabytes long. This could easily be | ||
| 457 | fixed. | ||
| 458 | |||
| 459 | |||
| 460 | AUTHOR | ||
| 461 | Julian Seward, jseward@@acm.org. | ||
| 462 | |||
| 463 | The ideas embodied in bzip2 are due to (at least) the fol- | ||
| 464 | lowing people: Michael Burrows and David Wheeler (for the | ||
| 465 | block sorting transformation), David Wheeler (again, for | ||
| 466 | the Huffman coder), Peter Fenwick (for the structured cod- | ||
| 467 | ing model in the original bzip, and many refinements), and | ||
| 468 | Alistair Moffat, Radford Neal and Ian Witten (for the | ||
| 469 | arithmetic coder in the original bzip). I am much | ||
| 470 | indebted for their help, support and advice. See the man- | ||
| 471 | ual in the source distribution for pointers to sources of | ||
| 472 | documentation. Christian von Roques encouraged me to look | ||
| 473 | for faster sorting algorithms, so as to speed up compres- | ||
| 474 | sion. Bela Lubkin encouraged me to improve the worst-case | ||
| 475 | compression performance. Many people sent patches, helped | ||
| 476 | with portability problems, lent machines, gave advice and | ||
| 477 | were generally helpful. | ||
| 478 | @end example | ||
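
The memory estimates in the MEMORY MANAGEMENT part of the manual page above follow directly from the block size, so they are easy to reproduce. The following is a minimal sketch, not part of bzip2 itself: the function names are made up for illustration, and "k" is taken to mean 1000 bytes, as the 100,000 byte block unit suggests.

```c
/* Sketch only: these helpers are hypothetical and not part of bzip2.
   "flag" is the digit of the -1 .. -9 option; "k" is read as 1000 bytes. */
enum { BLOCK_UNIT = 100000 };

/* Compression estimate: 400k + ( 7 x block size ). Returns 0 for a bad flag. */
unsigned long est_compress_bytes(int flag)
{
    if (flag < 1 || flag > 9) return 0;
    return 400000UL + 7UL * (unsigned long)flag * BLOCK_UNIT;
}

/* Decompression estimate: 100k + ( 4 x block size ), or
   100k + ( 2.5 x block size ) when small is non-zero (the -s flag). */
unsigned long est_decompress_bytes(int flag, int small)
{
    unsigned long block;
    if (flag < 1 || flag > 9) return 0;
    block = (unsigned long)flag * BLOCK_UNIT;
    /* 2.5 x block is computed as 5/2 to stay in integer arithmetic */
    return 100000UL + (small ? (block * 5UL) / 2UL : block * 4UL);
}

/* Memory actually touched when compressing a file shorter than one block. */
unsigned long est_touched_compress(unsigned long file_bytes)
{
    return 400000UL + 7UL * file_bytes;
}
```

For example, @code{est_compress_bytes(9)} gives 6700000 and @code{est_touched_compress(20000)} gives 540000, matching the figures quoted in the text.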
| 479 | |||
| 480 | |||
| 481 | |||
| 482 | |||
| 483 | |||
| 484 | @chapter Programming with @code{libbzip2} | ||
| 485 | |||
| 486 | This chapter describes the programming interface to @code{libbzip2}. | ||
| 487 | |||
| 488 | For general background information, particularly about memory | ||
| 489 | use and performance aspects, you'd be well advised to read Chapter 2 | ||
| 490 | as well. | ||
| 491 | |||
| 492 | @section Top-level structure | ||
| 493 | |||
| 494 | @code{libbzip2} is a flexible library for compressing and decompressing | ||
| 495 | data in the @code{bzip2} data format. Although packaged as a single | ||
| 496 | entity, it helps to regard the library as three separate parts: the | ||
| 497 | low-level interface, the high-level interface, and some utility | ||
| 498 | functions. | ||
| 499 | |||
| 500 | The structure of @code{libbzip2}'s interfaces is similar to | ||
| 501 | that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib} | ||
| 502 | library. | ||
| 503 | |||
| 504 | @subsection Low-level summary | ||
| 505 | |||
| 506 | This interface provides services for compressing and decompressing | ||
| 507 | data in memory. There's no provision for dealing with files, streams | ||
| 508 | or any other I/O mechanisms, just straight memory-to-memory work. | ||
| 509 | In fact, this part of the library can be compiled without inclusion | ||
| 510 | of @code{stdio.h}, which may be helpful for embedded applications. | ||
| 511 | |||
| 512 | The low-level part of the library has no global variables and | ||
| 513 | is therefore thread-safe. | ||
| 514 | |||
| 515 | Six routines make up the low-level interface: | ||
| 516 | @code{bzCompressInit}, @code{bzCompress}, and @* @code{bzCompressEnd} | ||
| 517 | for compression, | ||
| 518 | and a corresponding trio @code{bzDecompressInit}, @* @code{bzDecompress} | ||
| 519 | and @code{bzDecompressEnd} for decompression. | ||
| 520 | The @code{*Init} functions allocate | ||
| 521 | memory for compression/decompression and do other | ||
| 522 | initialisations, whilst the @code{*End} functions close down operations | ||
| 523 | and release memory. | ||
| 524 | |||
| 525 | The real work is done by @code{bzCompress} and @code{bzDecompress}. | ||
| 526 | These compress/decompress data from a user-supplied input buffer | ||
| 527 | to a user-supplied output buffer. These buffers can be any size; | ||
| 528 | arbitrary quantities of data are handled by making repeated calls | ||
| 529 | to these functions. This is a flexible mechanism allowing a | ||
| 530 | consumer-pull style of activity, or producer-push, or a mixture of | ||
| 531 | both. | ||
| 532 | |||
| 533 | |||
| 534 | |||
| 535 | @subsection High-level summary | ||
| 536 | |||
| 537 | This interface provides some handy wrappers around the low-level | ||
| 538 | interface to facilitate reading and writing @code{bzip2} format | ||
| 539 | files (@code{.bz2} files). The routines provide hooks to facilitate | ||
| 540 | reading files in which the @code{bzip2} data stream is embedded | ||
| 541 | within some larger-scale file structure, or where there are | ||
| 542 | multiple @code{bzip2} data streams concatenated end-to-end. | ||
| 543 | |||
| 544 | For reading files, @code{bzReadOpen}, @code{bzRead}, @code{bzReadClose} | ||
| 545 | and @code{bzReadGetUnused} are supplied. For writing files, | ||
| 546 | @code{bzWriteOpen}, @code{bzWrite} and @code{bzWriteClose} are | ||
| 547 | available. | ||
| 548 | |||
| 549 | As with the low-level library, no global variables are used | ||
| 550 | so the library is per se thread-safe. However, if I/O errors | ||
| 551 | occur whilst reading or writing the underlying compressed files, | ||
| 552 | you may have to consult @code{errno} to determine the cause of | ||
| 553 | the error. In that case, you'd need a C library which correctly | ||
| 554 | supports @code{errno} in a multithreaded environment. | ||
| 555 | |||
| 556 | To make the library a little simpler and more portable, | ||
| 557 | @code{bzReadOpen} and @code{bzWriteOpen} require you to pass them file | ||
| 558 | handles (@code{FILE*}s) which have previously been opened for reading or | ||
| 559 | writing respectively. That avoids portability problems associated with | ||
| 560 | file operations and file attributes, whilst not being much of an | ||
| 561 | imposition on the programmer. | ||
| 562 | |||
| 563 | |||
| 564 | |||
| 565 | @subsection Utility functions summary | ||
| 566 | For very simple needs, @code{bzBuffToBuffCompress} and | ||
| 567 | @code{bzBuffToBuffDecompress} are provided. These compress | ||
| 568 | data in memory from one buffer to another buffer in a single | ||
| 569 | function call. You should assess whether these functions | ||
| 570 | fulfill your memory-to-memory compression/decompression | ||
| 571 | requirements before investing effort in understanding the more | ||
| 572 | general but more complex low-level interface. | ||
| 573 | |||
| 574 | Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} / | ||
| 575 | @code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to | ||
| 576 | give better @code{zlib} compatibility. These functions are | ||
| 577 | @code{bzopen}, @code{bzread}, @code{bzwrite}, @code{bzflush}, | ||
| 578 | @code{bzclose}, | ||
| 579 | @code{bzerror} and @code{bzlibVersion}. You may find these functions | ||
| 580 | more convenient for simple file reading and writing, than those in the | ||
| 581 | high-level interface. These functions are not (yet) officially part of | ||
| 582 | the library, and are not further documented here. If they break, you | ||
| 583 | get to keep all the pieces. I hope to document them properly when time | ||
| 584 | permits. | ||
| 585 | |||
| 586 | Yoshioka also contributed modifications to allow the library to be | ||
| 587 | built as a Windows DLL. | ||
| 588 | |||
| 589 | |||
| 590 | @section Error handling | ||
| 591 | |||
| 592 | The library is designed to recover cleanly in all situations, including | ||
| 593 | the worst-case situation of decompressing random data. I'm not | ||
| 594 | 100% sure that it can always do this, so you might want to add | ||
| 595 | a signal handler to catch segmentation violations during decompression | ||
| 596 | if you are feeling especially paranoid. I would be interested in | ||
| 597 | hearing more about the robustness of the library to corrupted | ||
| 598 | compressed data. | ||
| 599 | |||
| 600 | The file @code{bzlib.h} contains all definitions needed to use | ||
| 601 | the library. In particular, you should definitely not include | ||
| 602 | @code{bzlib_private.h}. | ||
| 603 | |||
| 604 | In @code{bzlib.h}, the various return values are defined. The following | ||
| 605 | list is not intended as an exhaustive description of the circumstances | ||
| 606 | in which a given value may be returned -- those descriptions are given | ||
| 607 | later. Rather, it is intended to convey the rough meaning of each | ||
| 608 | return value. The first five values are normal and not intended to | ||
| 609 | denote an error situation. | ||
| 610 | @table @code | ||
| 611 | @item BZ_OK | ||
| 612 | The requested action was completed successfully. | ||
| 613 | @item BZ_RUN_OK | ||
| 614 | @itemx BZ_FLUSH_OK | ||
| 615 | @itemx BZ_FINISH_OK | ||
| 616 | In @code{bzCompress}, the requested flush/finish/nothing-special action | ||
| 617 | was completed successfully. | ||
| 618 | @item BZ_STREAM_END | ||
| 619 | Compression of data was completed, or the logical stream end was | ||
| 620 | detected during decompression. | ||
| 621 | @end table | ||
| 622 | |||
| 623 | The following return values indicate an error of some kind. | ||
| 624 | @table @code | ||
| 625 | @item BZ_SEQUENCE_ERROR | ||
| 626 | When using the library, it is important to call the functions in the | ||
| 627 | correct sequence and with data structures (buffers etc) in the correct | ||
| 628 | states. @code{libbzip2} checks as much as it can to ensure this is | ||
| 629 | happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which | ||
| 630 | complies precisely with the function semantics, as detailed below, | ||
| 631 | should never receive this value; such an event denotes buggy code | ||
| 632 | which you should investigate. | ||
| 633 | @item BZ_PARAM_ERROR | ||
| 634 | Returned when a parameter to a function call is out of range | ||
| 635 | or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR}, | ||
| 636 | this denotes a bug in the client code. The distinction between | ||
| 637 | @code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth | ||
| 638 | making. | ||
| 639 | @item BZ_MEM_ERROR | ||
| 640 | Returned when a request to allocate memory failed. Note that the | ||
| 641 | quantity of memory needed to decompress a stream cannot be determined | ||
| 642 | until the stream's header has been read. So @code{bzDecompress} and | ||
| 643 | @code{bzRead} may return @code{BZ_MEM_ERROR} even though some of | ||
| 644 | the compressed data has been read. The same is not true for | ||
| 645 | compression; once @code{bzCompressInit} or @code{bzWriteOpen} have | ||
| 646 | successfully completed, @code{BZ_MEM_ERROR} cannot occur. | ||
| 647 | @item BZ_DATA_ERROR | ||
| 648 | Returned when a data integrity error is detected during decompression. | ||
| 649 | Most importantly, this means when stored and computed CRCs for the | ||
| 650 | data do not match. This value is also returned upon detection of any | ||
| 651 | other anomaly in the compressed data. | ||
| 652 | @item BZ_DATA_ERROR_MAGIC | ||
| 653 | As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to | ||
| 654 | know when the compressed stream does not start with the correct | ||
| 655 | magic bytes (@code{'B' 'Z' 'h'}). | ||
| 656 | @item BZ_IO_ERROR | ||
| 657 | Returned by @code{bzRead} and @code{bzWrite} when there is an error | ||
| 658 | reading or writing in the compressed file, and by @code{bzReadOpen} | ||
| 659 | and @code{bzWriteOpen} for attempts to use a file for which the | ||
| 660 | error indicator (viz, @code{ferror(f)}) is set. | ||
| 661 | On receipt of @code{BZ_IO_ERROR}, the caller should consult | ||
| 662 | @code{errno} and/or @code{perror} to acquire operating-system | ||
| 663 | specific information about the problem. | ||
| 664 | @item BZ_UNEXPECTED_EOF | ||
| 665 | Returned by @code{bzRead} when the compressed file finishes | ||
| 666 | before the logical end of stream is detected. | ||
| 667 | @item BZ_OUTBUFF_FULL | ||
| 668 | Returned by @code{bzBuffToBuffCompress} and | ||
| 669 | @code{bzBuffToBuffDecompress} to indicate that the output data | ||
| 670 | will not fit into the output buffer provided. | ||
| 671 | @end table | ||
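
A caller usually wants to sort a return value into one of a few broad classes before deciding what to do. The following is a minimal sketch of such a classification, based only on the rough meanings listed above. The @code{sketch_*} function names are hypothetical, and the numeric values in the fallback definitions are given as I recall them from @code{bzlib.h}; treat them as assumptions and include the header itself in real code.

```c
/* Fallback definitions so the sketch compiles on its own. The numbers are
   an assumption of this sketch; if bzlib.h has been included first, its
   definitions are used and these are skipped. */
#ifndef BZ_OK
#define BZ_OK                 0
#define BZ_RUN_OK             1
#define BZ_FLUSH_OK           2
#define BZ_FINISH_OK          3
#define BZ_STREAM_END         4
#define BZ_SEQUENCE_ERROR   (-1)
#define BZ_PARAM_ERROR      (-2)
#define BZ_MEM_ERROR        (-3)
#define BZ_DATA_ERROR       (-4)
#define BZ_DATA_ERROR_MAGIC (-5)
#define BZ_IO_ERROR         (-6)
#define BZ_UNEXPECTED_EOF   (-7)
#define BZ_OUTBUFF_FULL     (-8)
#endif

/* 1 for the five values that do not denote an error, 0 otherwise. */
int sketch_is_normal(int rc)
{
    switch (rc) {
    case BZ_OK:
    case BZ_RUN_OK:
    case BZ_FLUSH_OK:
    case BZ_FINISH_OK:
    case BZ_STREAM_END:
        return 1;
    default:
        return 0;
    }
}

/* 1 if the value points at a bug in the calling code. */
int sketch_is_caller_bug(int rc)
{
    return rc == BZ_SEQUENCE_ERROR || rc == BZ_PARAM_ERROR;
}

/* 1 if the caller should go on to consult errno for details. */
int sketch_should_check_errno(int rc)
{
    return rc == BZ_IO_ERROR;
}
```

The classification deliberately switches on the symbolic names rather than testing the sign of the value, so it does not depend on the numbers being what this sketch assumes.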
| 672 | |||
| 673 | |||
| 674 | |||
| 675 | @section Low-level interface | ||
| 676 | |||
| 677 | @subsection @code{bzCompressInit} | ||
| 678 | @example | ||
| 679 | typedef | ||
| 680 | struct @{ | ||
| 681 | char *next_in; | ||
| 682 | unsigned int avail_in; | ||
| 683 | unsigned int total_in; | ||
| 684 | |||
| 685 | char *next_out; | ||
| 686 | unsigned int avail_out; | ||
| 687 | unsigned int total_out; | ||
| 688 | |||
| 689 | void *state; | ||
| 690 | |||
| 691 | void *(*bzalloc)(void *,int,int); | ||
| 692 | void (*bzfree)(void *,void *); | ||
| 693 | void *opaque; | ||
| 694 | @} | ||
| 695 | bz_stream; | ||
| 696 | |||
| 697 | int bzCompressInit ( bz_stream *strm, | ||
| 698 | int blockSize100k, | ||
| 699 | int verbosity, | ||
| 700 | int workFactor ); | ||
| 701 | |||
| 702 | @end example | ||
| 703 | |||
| 704 | Prepares for compression. The @code{bz_stream} structure | ||
| 705 | holds all data pertaining to the compression activity. | ||
| 706 | A @code{bz_stream} structure should be allocated and initialised | ||
| 707 | prior to the call. | ||
| 708 | The fields of @code{bz_stream} | ||
| 709 | comprise the entirety of the user-visible data. @code{state} | ||
| 710 | is a pointer to the private data structures required for compression. | ||
| 711 | |||
| 712 | Custom memory allocators are supported, via fields @code{bzalloc}, | ||
| 713 | @code{bzfree}, | ||
| 714 | and @code{opaque}. The value | ||
| 715 | @code{opaque} is passed as the first argument to | ||
| 716 | all calls to @code{bzalloc} and @code{bzfree}, but is | ||
| 717 | otherwise ignored by the library. | ||
| 718 | The call @code{bzalloc ( opaque, n, m )} is expected to return a | ||
| 719 | pointer @code{p} to | ||
| 720 | @code{n * m} bytes of memory, and @code{bzfree ( opaque, p )} | ||
| 721 | should free | ||
| 722 | that memory. | ||
| 723 | |||
| 724 | If you don't want to use a custom memory allocator, set @code{bzalloc}, | ||
| 725 | @code{bzfree} and | ||
| 726 | @code{opaque} to @code{NULL}, | ||
| 727 | and the library will then use the standard @code{malloc}/@code{free} | ||
| 728 | routines. | ||
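
As an illustration of the allocator contract just described, here is a minimal sketch of a counting allocator pair. The names @code{alloc_stats}, @code{counting_alloc} and @code{counting_free} are hypothetical. The function types match the @code{bzalloc} and @code{bzfree} fields shown above; rejecting non-positive sizes and tolerating a @code{NULL} pointer in the free routine are choices of this sketch, not documented requirements.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, handed to the library via `opaque`. */
typedef struct {
    long          live_blocks;      /* allocations not yet freed */
    unsigned long bytes_requested;  /* running total of n * m    */
} alloc_stats;

/* Same type as the bzalloc field: void *(*)(void *, int, int).
   Returns a pointer to n * m bytes, or NULL on failure. */
void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *st = (alloc_stats *)opaque;
    void *p;

    if (n <= 0 || m <= 0) return NULL;   /* sketch's own sanity check */
    p = malloc((size_t)n * (size_t)m);
    if (p != NULL && st != NULL) {
        st->live_blocks++;
        st->bytes_requested += (unsigned long)n * (unsigned long)m;
    }
    return p;
}

/* Same type as the bzfree field: void (*)(void *, void *). */
void counting_free(void *opaque, void *p)
{
    alloc_stats *st = (alloc_stats *)opaque;

    if (p == NULL) return;
    if (st != NULL) st->live_blocks--;
    free(p);
}
```

To use such a pair, one would set @code{bzalloc} to @code{counting_alloc}, @code{bzfree} to @code{counting_free} and @code{opaque} to the address of an @code{alloc_stats} record before calling @code{bzCompressInit}. A non-zero @code{live_blocks} after @code{bzCompressEnd} would then suggest a leak.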
| 729 | |||
| 730 | Before calling @code{bzCompressInit}, fields @code{bzalloc}, | ||
| 731 | @code{bzfree} and @code{opaque} should | ||
| 732 | be filled appropriately, as just described. Upon return, the internal | ||
| 733 | state will have been allocated and initialised, and @code{total_in} and | ||
| 734 | @code{total_out} will have been set to zero. | ||
| 735 | These last two fields are used by the library | ||
| 736 | to inform the caller of the total amount of data passed into and out of | ||
| 737 | the library, respectively. You should not try to change them. | ||
| 738 | |||
| 739 | Parameter @code{blockSize100k} specifies the block size to be used for | ||
| 740 | compression. It should be a value between 1 and 9 inclusive, and the | ||
| 741 | actual block size used is 100000 x this figure. 9 gives the best | ||
| 742 | compression but takes most memory. | ||
| 743 | |||
| 744 | Parameter @code{verbosity} should be set to a number between 0 and 4 | ||
| 745 | inclusive. 0 is silent, and greater numbers give increasingly verbose | ||
| 746 | monitoring/debugging output. If the library has been compiled with | ||
| 747 | @code{-DBZ_NO_STDIO}, no such output will appear for any verbosity | ||
| 748 | setting. | ||
| 749 | |||
| 750 | Parameter @code{workFactor} controls how the compression phase behaves | ||
| 751 | when presented with worst case, highly repetitive, input data. | ||
| 752 | If compression runs into difficulties caused by repetitive data, | ||
| 753 | some pseudo-random variations are inserted into the block, and | ||
| 754 | compression is restarted. Lower values of @code{workFactor} | ||
| 755 | reduce the tolerance of compression to repetitive data. | ||
| 756 | You should set this parameter carefully; too low, and | ||
| 757 | compression ratio suffers, too high, and your average-to-worst | ||
| 758 | case compression times can become very large. | ||
| 759 | The default value of 30 | ||
| 760 | gives reasonable behaviour over a wide range of circumstances. | ||
| 761 | |||
| 762 | Allowable values range from 0 to 250 inclusive. 0 is a special | ||
| 763 | case, equivalent to using the default value of 30. | ||
| 764 | |||
| 765 | Note that the randomisation process is entirely transparent. | ||
| 766 | If the library decides to randomise and restart compression on a | ||
| 767 | block, it does so without comment. Randomised blocks are | ||
| 768 | automatically de-randomised during decompression, so data | ||
| 769 | integrity is never compromised. | ||
| 770 | |||
| 771 | Possible return values: | ||
| 772 | @display | ||
| 773 | @code{BZ_PARAM_ERROR} | ||
| 774 | if @code{strm} is @code{NULL} | ||
| 775 | or @code{blockSize100k} < 1 or @code{blockSize100k} > 9 | ||
| 776 | or @code{verbosity} < 0 or @code{verbosity} > 4 | ||
| 777 | or @code{workFactor} < 0 or @code{workFactor} > 250 | ||
| 778 | @code{BZ_MEM_ERROR} | ||
| 779 | if not enough memory is available | ||
| 780 | @code{BZ_OK} | ||
| 781 | otherwise | ||
| 782 | @end display | ||
| 783 | Allowable next actions: | ||
| 784 | @display | ||
| 785 | @code{bzCompress} | ||
| 786 | if @code{BZ_OK} is returned | ||
| 787 | no specific action needed in case of error | ||
| 788 | @end display | ||
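
The parameter ranges listed above can be checked on the caller's side before @code{bzCompressInit} is ever reached, for instance when the values come from user input. The following is a minimal sketch of such a check; the function and enumerator names are hypothetical, and the code mirrors the documented ranges rather than the library's actual implementation.

```c
#include <stddef.h>

/* Hypothetical result codes local to this sketch. */
enum { CHECK_OK = 0, CHECK_PARAM_ERROR = 1 };

/* Mirrors the documented BZ_PARAM_ERROR conditions of bzCompressInit. */
int check_init_params(const void *strm, int blockSize100k,
                      int verbosity, int workFactor)
{
    if (strm == NULL)                           return CHECK_PARAM_ERROR;
    if (blockSize100k < 1 || blockSize100k > 9) return CHECK_PARAM_ERROR;
    if (verbosity < 0 || verbosity > 4)         return CHECK_PARAM_ERROR;
    if (workFactor < 0 || workFactor > 250)     return CHECK_PARAM_ERROR;
    return CHECK_OK;
}

/* A workFactor of 0 is documented as equivalent to the default of 30. */
int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? 30 : workFactor;
}
```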
| 789 | |||
| 790 | @subsection @code{bzCompress} | ||
| 791 | @example | ||
| 792 | int bzCompress ( bz_stream *strm, int action ); | ||
| 793 | @end example | ||
| 794 | Provides more input and/or output buffer space for the library. The | ||
| 795 | caller maintains input and output buffers, and calls @code{bzCompress} to | ||
| 796 | transfer data between them. | ||
| 797 | |||
| 798 | Before each call to @code{bzCompress}, @code{next_in} should point at | ||
| 799 | the data to be compressed, and @code{avail_in} should indicate how many | ||
| 800 | bytes the library may read. @code{bzCompress} updates @code{next_in}, | ||
| 801 | @code{avail_in} and @code{total_in} to reflect the number of bytes it | ||
| 802 | has read. | ||
| 803 | |||
| 804 | Similarly, @code{next_out} should point to a buffer in which the | ||
| 805 | compressed data is to be placed, with @code{avail_out} indicating how | ||
| 806 | much output space is available. @code{bzCompress} updates | ||
| 807 | @code{next_out}, @code{avail_out} and @code{total_out} to reflect the | ||
| 808 | number of bytes output. | ||
| 809 | |||
| 810 | You may provide and remove as little or as much data as you like on each | ||
| 811 | call of @code{bzCompress}. In the limit, it is acceptable to supply and | ||
| 812 | remove data one byte at a time, although this would be terribly | ||
| 813 | inefficient. You should always ensure that at least one byte of output | ||
| 814 | space is available at each call. | ||
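
The buffer bookkeeping described above can be illustrated without the library at all. The following sketch uses a stand-in, @code{toy_step}, which merely copies bytes instead of compressing them, but updates the six buffer fields in the way the text describes. Everything named @code{toy_*} is hypothetical; only the field names are taken from @code{bz_stream}.

```c
#include <string.h>

/* Just the six buffer fields of bz_stream, reproduced for the sketch. */
typedef struct {
    char *next_in;   unsigned int avail_in;   unsigned int total_in;
    char *next_out;  unsigned int avail_out;  unsigned int total_out;
} toy_stream;

/* Stand-in for bzCompress: copies, does NOT compress. It moves as many
   bytes as both buffers allow and updates all six fields accordingly. */
unsigned int toy_step(toy_stream *s)
{
    unsigned int n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;

    memcpy(s->next_out, s->next_in, n);
    s->next_in  += n;  s->avail_in  -= n;  s->total_in  += n;
    s->next_out += n;  s->avail_out -= n;  s->total_out += n;
    return n;
}

/* Caller-side loop: offer all the input at once, but drain the output
   through a deliberately tiny buffer, refreshing it before every call.
   Returns the number of bytes stored in dst. */
unsigned int toy_pump(char *src, unsigned int len,
                      char *dst, unsigned int dst_cap)
{
    char buf[4];
    toy_stream s;
    unsigned int written = 0;

    s.next_in = src;  s.avail_in = len;  s.total_in = 0;
    s.total_out = 0;

    while (s.avail_in > 0) {
        unsigned int produced;

        s.next_out  = buf;               /* fresh output space each call */
        s.avail_out = (unsigned int)sizeof buf;
        toy_step(&s);
        produced = (unsigned int)sizeof buf - s.avail_out;
        if (written + produced > dst_cap) break;   /* sink is full */
        memcpy(dst + written, buf, produced);
        written += produced;
    }
    return written;
}
```

The point of the sketch is the shape of the loop: the caller owns both buffers, resets @code{next_out} and @code{avail_out} before each call, and works out how much was produced from the change in @code{avail_out}.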
| 815 | |||
| 816 | A second purpose of @code{bzCompress} is to request a change of mode of the | ||
| 817 | compressed stream. | ||
| 818 | |||
| 819 | Conceptually, a compressed stream can be in one of four states: IDLE, | ||
| 820 | RUNNING, FLUSHING and FINISHING. Before initialisation | ||
| 821 | (@code{bzCompressInit}) and after termination (@code{bzCompressEnd}), a | ||
| 822 | stream is regarded as IDLE. | ||
| 823 | |||
| 824 | Upon initialisation (@code{bzCompressInit}), the stream is placed in the | ||
| 825 | RUNNING state. Subsequent calls to @code{bzCompress} should pass | ||
| 826 | @code{BZ_RUN} as the requested action; other actions are illegal and | ||
| 827 | will result in @code{BZ_SEQUENCE_ERROR}. | ||
| 828 | |||
| 829 | At some point, the calling program will have provided all the input data | ||
| 830 | it wants to. It will then want to finish up -- in effect, asking the | ||
| 831 | library to process any data it might have buffered internally. In this | ||
| 832 | state, @code{bzCompress} will no longer attempt to read data from | ||
| 833 | @code{next_in}, but it will want to write data to @code{next_out}. | ||
| 834 | Because the output buffer supplied by the user can be arbitrarily small, | ||
| 835 | the finishing-up operation cannot necessarily be done with a single call | ||
| 836 | of @code{bzCompress}. | ||
| 837 | |||
| 838 | Instead, the calling program passes @code{BZ_FINISH} as an action to | ||
| 839 | @code{bzCompress}. This changes the stream's state to FINISHING. Any | ||
| 840 | remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and | ||
| 841 | transferred to the output buffer. To do this, @code{bzCompress} must be | ||
| 842 | called repeatedly until all the output has been consumed. At that | ||
| 843 | point, @code{bzCompress} returns @code{BZ_STREAM_END}, and the stream's | ||
| 844 | state is set back to IDLE. @code{bzCompressEnd} should then be | ||
| 845 | called. | ||
| 846 | |||
| 847 | Just to make sure the calling program does not cheat, the library makes | ||
| 848 | a note of @code{avail_in} at the time of the first call to | ||
| 849 | @code{bzCompress} which has @code{BZ_FINISH} as an action (ie, at the | ||
| 850 | time the program has announced its intention to not supply any more | ||
| 851 | input). By comparing this value with that of @code{avail_in} over | ||
| 852 | subsequent calls to @code{bzCompress}, the library can detect any | ||
| 853 | attempts to slip in more data to compress. Any calls for which this is | ||
| 854 | detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a | ||
| 855 | programming mistake which should be corrected. | ||
| 856 | |||
| 857 | Instead of asking to finish, the calling program may ask | ||
| 858 | @code{bzCompress} to take all the remaining input, compress it and | ||
| 859 | terminate the current (Burrows-Wheeler) compression block. This could | ||
| 860 | be useful for error control purposes. The mechanism is analogous to | ||
| 861 | that for finishing: call @code{bzCompress} with an action of | ||
| 862 | @code{BZ_FLUSH}, remove output data, and persist with the | ||
| 863 | @code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned. As | ||
| 864 | with finishing, @code{bzCompress} detects any attempt to provide more | ||
| 865 | input data once the flush has begun. | ||
| 866 | |||
| 867 | Once the flush is complete, the stream returns to the normal RUNNING | ||
| 868 | state. | ||
| 869 | |||
| 870 | This all sounds pretty complex, but isn't really. Here's a table | ||
| 871 | which shows which actions are allowable in each state, what action | ||
| 872 | will be taken, what the next state is, and what the non-error return | ||
| 873 | values are. Note that you can't explicitly ask what state the | ||
| 874 | stream is in, but nor do you need to -- it can be inferred from the | ||
| 875 | values returned by @code{bzCompress}. | ||
| 876 | @display | ||
| 877 | IDLE/@code{any} | ||
| 878 | Illegal. IDLE state only exists after @code{bzCompressEnd} or | ||
| 879 | before @code{bzCompressInit}. | ||
| 880 | Return value = @code{BZ_SEQUENCE_ERROR} | ||
| 881 | |||
| 882 | RUNNING/@code{BZ_RUN} | ||
| 883 | Compress from @code{next_in} to @code{next_out} as much as possible. | ||
| 884 | Next state = RUNNING | ||
| 885 | Return value = @code{BZ_RUN_OK} | ||
| 886 | |||
| 887 | RUNNING/@code{BZ_FLUSH} | ||
| 888 | Remember current value of @code{avail_in}. Compress from @code{next_in} | ||
| 889 | to @code{next_out} as much as possible, but do not accept any more input. | ||
| 890 | Next state = FLUSHING | ||
| 891 | Return value = @code{BZ_FLUSH_OK} | ||
| 892 | |||
| 893 | RUNNING/@code{BZ_FINISH} | ||
| 894 | Remember current value of @code{avail_in}. Compress from @code{next_in} | ||
| 895 | to @code{next_out} as much as possible, but do not accept any more input. | ||
| 896 | Next state = FINISHING | ||
| 897 | Return value = @code{BZ_FINISH_OK} | ||
| 898 | |||
| 899 | FLUSHING/@code{BZ_FLUSH} | ||
| 900 | Compress from @code{next_in} to @code{next_out} as much as possible, | ||
| 901 | but do not accept any more input. | ||
| 902 | If all the existing input has been used up and all compressed | ||
| 903 | output has been removed | ||
| 904 | Next state = RUNNING; Return value = @code{BZ_RUN_OK} | ||
| 905 | else | ||
| 906 | Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK} | ||
| 907 | |||
| 908 | FLUSHING/other | ||
| 909 | Illegal. | ||
| 910 | Return value = @code{BZ_SEQUENCE_ERROR} | ||
| 911 | |||
| 912 | FINISHING/@code{BZ_FINISH} | ||
| 913 | Compress from @code{next_in} to @code{next_out} as much as possible, | ||
| 914 | but do not accept any more input. | ||
| 915 | If all the existing input has been used up and all compressed | ||
| 916 | output has been removed | ||
| 917 | Next state = IDLE; Return value = @code{BZ_STREAM_END} | ||
| 918 | else | ||
| 919 | Next state = FINISHING; Return value = @code{BZ_FINISH_OK} | ||
| 920 | |||
| 921 | FINISHING/other | ||
| 922 | Illegal. | ||
| 923 | Return value = @code{BZ_SEQUENCE_ERROR} | ||
| 924 | @end display | ||
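
The table above can also be read as a small transition function. The following is only a sketch of the table, not the library's code: all names (`model_state`, `model_action`, `model_result`, `model_step`) are hypothetical, and the `drained` flag stands in for "all the existing input has been used up and all compressed output has been removed".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; the real library keeps its state private. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } model_state;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } model_action;
typedef enum { RET_SEQUENCE_ERROR, RET_RUN_OK, RET_FLUSH_OK,
               RET_FINISH_OK, RET_STREAM_END } model_result;

/* Apply one action to *state and return the modelled result.
   `drained` is nonzero when all pending input has been used up
   and all compressed output has been removed. */
model_result model_step(model_state *state, model_action action, int drained)
{
    assert(state != NULL);
    switch (*state) {
    case ST_RUNNING:
        if (action == ACT_RUN)
            return RET_RUN_OK;                 /* stay in RUNNING */
        if (action == ACT_FLUSH) {
            *state = ST_FLUSHING;
            return RET_FLUSH_OK;
        }
        *state = ST_FINISHING;                 /* ACT_FINISH */
        return RET_FINISH_OK;
    case ST_FLUSHING:
        if (action != ACT_FLUSH)
            return RET_SEQUENCE_ERROR;         /* FLUSHING/other is illegal */
        if (drained) {
            *state = ST_RUNNING;
            return RET_RUN_OK;
        }
        return RET_FLUSH_OK;
    case ST_FINISHING:
        if (action != ACT_FINISH)
            return RET_SEQUENCE_ERROR;         /* FINISHING/other is illegal */
        if (drained) {
            *state = ST_IDLE;
            return RET_STREAM_END;
        }
        return RET_FINISH_OK;
    case ST_IDLE:
    default:
        return RET_SEQUENCE_ERROR;             /* IDLE/any is illegal */
    }
}
```

Starting from `ST_RUNNING`, repeated `ACT_FINISH` steps stay in `ST_FINISHING` until `drained` becomes nonzero, at which point the model reports `RET_STREAM_END` and moves to `ST_IDLE`.
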
| 925 | |||
| 926 | That still looks complicated? Well, fair enough. The usual sequence | ||
| 927 | of calls for compressing a load of data is: | ||
| 928 | @itemize @bullet | ||
| 929 | @item Get started with @code{bzCompressInit}. | ||
| 930 | @item Shovel data in and shlurp out its compressed form using zero or more | ||
| 931 | calls of @code{bzCompress} with action = @code{BZ_RUN}. | ||
| 932 | @item Finish up. | ||
| 933 | Repeatedly call @code{bzCompress} with action = @code{BZ_FINISH}, | ||
| 934 | copying out the compressed output, until @code{BZ_STREAM_END} is returned. | ||
| 935 | @item Close up and go home. Call @code{bzCompressEnd}. | ||
| 936 | @end itemize | ||
| 937 | If the data you want to compress fits into your input buffer all | ||
| 938 | at once, you can skip the calls of @code{bzCompress ( ..., BZ_RUN )} and | ||
| 939 | just do the @code{bzCompress ( ..., BZ_FINISH )} calls. | ||
| 940 | |||
| 941 | All required memory is allocated by @code{bzCompressInit}. The | ||
| 942 | compression library can accept any data at all (obviously). So you | ||
| 943 | shouldn't get any error return values from the @code{bzCompress} calls. | ||
| 944 | If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in | ||
| 945 | your programming. | ||
| 946 | |||
| 947 | Trivial other possible return values: | ||
| 948 | @display | ||
| 949 | @code{BZ_PARAM_ERROR} | ||
| 950 | if @code{strm} is @code{NULL}, or @code{strm->s} is @code{NULL} | ||
| 951 | @end display | ||
| 952 | |||
| 953 | @subsection @code{bzCompressEnd} | ||
| 954 | @example | ||
| 955 | int bzCompressEnd ( bz_stream *strm ); | ||
| 956 | @end example | ||
| 957 | Releases all memory associated with a compression stream. | ||
| 958 | |||
| 959 | Possible return values: | ||
| 960 | @display | ||
| 961 | @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL} | ||
| 962 | @code{BZ_OK} otherwise | ||
| 963 | @end display | ||
| 964 | |||
| 965 | |||
| 966 | @subsection @code{bzDecompressInit} | ||
| 967 | @example | ||
| 968 | int bzDecompressInit ( bz_stream *strm, int verbosity, int small ); | ||
| 969 | @end example | ||
| 970 | Prepares for decompression. As with @code{bzCompressInit}, a | ||
| 971 | @code{bz_stream} record should be allocated and initialised before the | ||
| 972 | call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be | ||
| 973 | set if a custom memory allocator is required, or made @code{NULL} for | ||
| 974 | the normal @code{malloc}/@code{free} routines. Upon return, the internal | ||
| 975 | state will have been initialised, and @code{total_in} and | ||
| 976 | @code{total_out} will be zero. | ||
| 977 | |||
| 978 | For the meaning of parameter @code{verbosity}, see @code{bzCompressInit}. | ||
| 979 | |||
| 980 | If @code{small} is nonzero, the library will use an alternative | ||
| 981 | decompression algorithm which uses less memory but at the cost of | ||
| 982 | decompressing more slowly (roughly speaking, half the speed, but the | ||
| 983 | maximum memory requirement drops to around 2300k). See Chapter 2 for | ||
| 984 | more information on memory management. | ||
| 985 | |||
| 986 | Note that the amount of memory needed to decompress | ||
| 987 | a stream cannot be determined until the stream's header has been read, | ||
| 988 | so even if @code{bzDecompressInit} succeeds, a subsequent | ||
| 989 | @code{bzDecompress} could fail with @code{BZ_MEM_ERROR}. | ||
| 990 | |||
| 991 | Possible return values: | ||
| 992 | @display | ||
| 993 | @code{BZ_PARAM_ERROR} | ||
| 994 | if @code{(small != 0 && small != 1)} | ||
| 995 | or @code{(verbosity < 0 || verbosity > 4)} | ||
| 996 | @code{BZ_MEM_ERROR} | ||
| 997 | if insufficient memory is available | ||
| 998 | @end display | ||
| 999 | |||
| 1000 | Allowable next actions: | ||
| 1001 | @display | ||
| 1002 | @code{bzDecompress} | ||
| 1003 | if @code{BZ_OK} was returned | ||
| 1004 | no specific action required in case of error | ||
| 1005 | @end display | ||
| 1006 | |||
| 1007 | |||
| 1008 | |||
| 1009 | @subsection @code{bzDecompress} | ||
| 1010 | @example | ||
| 1011 | int bzDecompress ( bz_stream *strm ); | ||
| 1012 | @end example | ||
| 1013 | Provides more input and/or output buffer space for the library. The | ||
| 1014 | caller maintains input and output buffers, and uses @code{bzDecompress} | ||
| 1015 | to transfer data between them. | ||
| 1016 | |||
| 1017 | Before each call to @code{bzDecompress}, @code{next_in} | ||
| 1018 | should point at the compressed data, | ||
| 1019 | and @code{avail_in} should indicate how many bytes the library | ||
| 1020 | may read. @code{bzDecompress} updates @code{next_in}, @code{avail_in} | ||
| 1021 | and @code{total_in} | ||
| 1022 | to reflect the number of bytes it has read. | ||
| 1023 | |||
| 1024 | Similarly, @code{next_out} should point to a buffer in which the uncompressed | ||
| 1025 | output is to be placed, with @code{avail_out} indicating how much output space | ||
| 1026 | is available. @code{bzDecompress} updates @code{next_out}, | ||
| 1027 | @code{avail_out} and @code{total_out} to reflect | ||
| 1028 | the number of bytes output. | ||
| 1029 | |||
| 1030 | You may provide and remove as little or as much data as you like on | ||
| 1031 | each call of @code{bzDecompress}. | ||
| 1032 | In the limit, it is acceptable to | ||
| 1033 | supply and remove data one byte at a time, although this would be | ||
| 1034 | terribly inefficient. You should always ensure that at least one | ||
| 1035 | byte of output space is available at each call. | ||
| 1036 | |||
| 1037 | Use of @code{bzDecompress} is simpler than @code{bzCompress}. | ||
| 1038 | |||
| 1039 | You should provide input and remove output as described above, and | ||
| 1040 | repeatedly call @code{bzDecompress} until @code{BZ_STREAM_END} is | ||
| 1041 | returned. Appearance of @code{BZ_STREAM_END} denotes that | ||
| 1042 | @code{bzDecompress} has detected the logical end of the compressed | ||
| 1043 | stream. @code{bzDecompress} will not produce @code{BZ_STREAM_END} until | ||
| 1044 | all output data has been placed into the output buffer, so once | ||
| 1045 | @code{BZ_STREAM_END} appears, you are guaranteed to have available all | ||
| 1046 | the decompressed output, and @code{bzDecompressEnd} can safely be | ||
| 1047 | called. | ||
| 1048 | |||
| 1049 | In case of an error return value, you should call @code{bzDecompressEnd} | ||
| 1050 | to clean up and release memory. | ||
| 1051 | |||
| 1052 | Possible return values: | ||
| 1053 | @display | ||
| 1054 | @code{BZ_PARAM_ERROR} | ||
| 1055 | if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL} | ||
| 1056 | or @code{strm->avail_out < 1} | ||
| 1057 | @code{BZ_DATA_ERROR} | ||
| 1058 | if a data integrity error is detected in the compressed stream | ||
| 1059 | @code{BZ_DATA_ERROR_MAGIC} | ||
| 1060 | if the compressed stream doesn't begin with the right magic bytes | ||
| 1061 | @code{BZ_MEM_ERROR} | ||
| 1062 | if there wasn't enough memory available | ||
| 1063 | @code{BZ_STREAM_END} | ||
| 1064 | if the logical end of the data stream was detected and all | ||
| 1065 | output has been consumed, eg @code{s->avail_out > 0} | ||
| 1066 | @code{BZ_OK} | ||
| 1067 | otherwise | ||
| 1068 | @end display | ||
| 1069 | Allowable next actions: | ||
| 1070 | @display | ||
| 1071 | @code{bzDecompress} | ||
| 1072 | if @code{BZ_OK} was returned | ||
| 1073 | @code{bzDecompressEnd} | ||
| 1074 | otherwise | ||
| 1075 | @end display | ||
| 1076 | |||
| 1077 | |||
| 1078 | @subsection @code{bzDecompressEnd} | ||
| 1079 | @example | ||
| 1080 | int bzDecompressEnd ( bz_stream *strm ); | ||
| 1081 | @end example | ||
| 1082 | Releases all memory associated with a decompression stream. | ||
| 1083 | |||
| 1084 | Possible return values: | ||
| 1085 | @display | ||
| 1086 | @code{BZ_PARAM_ERROR} | ||
| 1087 | if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL} | ||
| 1088 | @code{BZ_OK} | ||
| 1089 | otherwise | ||
| 1090 | @end display | ||
| 1091 | |||
| 1092 | Allowable next actions: | ||
| 1093 | @display | ||
| 1094 | None. | ||
| 1095 | @end display | ||
| 1096 | |||
| 1097 | |||
| 1098 | @section High-level interface | ||
| 1099 | |||
| 1100 | This interface provides functions for reading and writing | ||
| 1101 | @code{bzip2} format files. First, some general points. | ||
| 1102 | |||
| 1103 | @itemize @bullet | ||
| 1104 | @item All of the functions take an @code{int*} first argument, | ||
| 1105 | @code{bzerror}. | ||
| 1106 | After each call, @code{bzerror} should be consulted first to determine | ||
| 1107 | the outcome of the call. If @code{bzerror} is @code{BZ_OK}, | ||
| 1108 | the call completed | ||
| 1109 | successfully, and only then should the return value of the function | ||
| 1110 | (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR}, | ||
| 1111 | there was an error | ||
| 1112 | reading/writing the underlying compressed file, and you should | ||
| 1113 | then consult @code{errno}/@code{perror} to determine the | ||
| 1114 | cause of the difficulty. | ||
| 1115 | @code{bzerror} may also be set to various other values; precise details are | ||
| 1116 | given on a per-function basis below. | ||
| 1117 | @item If @code{bzerror} indicates an error | ||
| 1118 | (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}), | ||
| 1119 | you should immediately call @code{bzReadClose} (or @code{bzWriteClose}, | ||
| 1120 | depending on whether you are attempting to read or to write) | ||
| 1121 | to free up all resources associated | ||
| 1122 | with the stream. Once an error has been indicated, behaviour of all calls | ||
| 1123 | except @code{bzReadClose} (@code{bzWriteClose}) is undefined. | ||
| 1124 | The implication is that (1) @code{bzerror} should | ||
| 1125 | be checked after each call, and (2) if @code{bzerror} indicates an error, | ||
| 1126 | @code{bzReadClose} (@code{bzWriteClose}) should then be called to clean up. | ||
| 1127 | @item The @code{FILE*} arguments passed to | ||
| 1128 | @code{bzReadOpen}/@code{bzWriteOpen} | ||
| 1129 | should be set to binary mode. | ||
| 1130 | Most Unix systems will do this by default, but other platforms, | ||
| 1131 | including Windows and Mac, will not. If you omit this, you may | ||
| 1132 | encounter problems when moving code to new platforms. | ||
| 1133 | @item Memory allocation requests are handled by | ||
| 1134 | @code{malloc}/@code{free}. | ||
| 1135 | At present | ||
| 1136 | there is no facility for user-defined memory allocators in the file I/O | ||
| 1137 | functions (could easily be added, though). | ||
| 1138 | @end itemize | ||
| 1139 | |||
| 1140 | |||
| 1141 | |||
| 1142 | @subsection @code{bzReadOpen} | ||
| 1143 | @example | ||
| 1144 | typedef void BZFILE; | ||
| 1145 | |||
| 1146 | BZFILE *bzReadOpen ( int *bzerror, FILE *f, | ||
| 1147 | int small, int verbosity, | ||
| 1148 | void *unused, int nUnused ); | ||
| 1149 | @end example | ||
| 1150 | Prepare to read compressed data from file handle @code{f}. @code{f} | ||
| 1151 | should refer to a file which has been opened for reading, and for which | ||
| 1152 | the error indicator (@code{ferror(f)}) is not set. If @code{small} is 1, | ||
| 1153 | the library will try to decompress using less memory, at the expense of | ||
| 1154 | speed. | ||
| 1155 | |||
| 1156 | For reasons explained below, @code{bzRead} will decompress the | ||
| 1157 | @code{nUnused} bytes starting at @code{unused}, before starting to read | ||
| 1158 | from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be | ||
| 1159 | supplied like this. If this facility is not required, you should pass | ||
| 1160 | @code{NULL} and @code{0} for @code{unused} and @code{nUnused} | ||
| 1161 | respectively. | ||
| 1162 | |||
| 1163 | For the meaning of parameters @code{small} and @code{verbosity}, | ||
| 1164 | see @code{bzDecompressInit}. | ||
| 1165 | |||
| 1166 | The amount of memory needed to decompress a file cannot be determined | ||
| 1167 | until the file's header has been read. So it is possible that | ||
| 1168 | @code{bzReadOpen} returns @code{BZ_OK} but a subsequent call of | ||
| 1169 | @code{bzRead} will return @code{BZ_MEM_ERROR}. | ||
| 1170 | |||
| 1171 | Possible assignments to @code{bzerror}: | ||
| 1172 | @display | ||
| 1173 | @code{BZ_PARAM_ERROR} | ||
| 1174 | if @code{f} is @code{NULL} | ||
| 1175 | or @code{small} is neither @code{0} nor @code{1} | ||
| 1176 | or @code{(unused == NULL && nUnused != 0)} | ||
| 1177 | or @code{(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))} | ||
| 1178 | @code{BZ_IO_ERROR} | ||
| 1179 | if @code{ferror(f)} is nonzero | ||
| 1180 | @code{BZ_MEM_ERROR} | ||
| 1181 | if insufficient memory is available | ||
| 1182 | @code{BZ_OK} | ||
| 1183 | otherwise. | ||
| 1184 | @end display | ||
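
The condition `0 <= nUnused <= BZ_MAX_UNUSED` in the list above is mathematical shorthand; written literally in C it would not test a range. A minimal sketch of the intended check on the `unused`/`nUnused` pair follows. The function name `unused_args_ok` and the constant `EXAMPLE_MAX_UNUSED` are hypothetical, and the value 5000 is only an assumed stand-in for `BZ_MAX_UNUSED`.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>

/* Assumed stand-in for BZ_MAX_UNUSED; use the value from bzlib.h in real code. */
#define EXAMPLE_MAX_UNUSED 5000

/* Returns 1 if the (unused, nUnused) pair passes the parameter check
   described above, 0 if it would be reported as BZ_PARAM_ERROR. */
int unused_args_ok(const void *unused, int nUnused)
{
    if (unused == NULL)
        return nUnused == 0;                   /* NULL is only valid with 0 bytes */
    return nUnused >= 0 && nUnused <= EXAMPLE_MAX_UNUSED;
}
```

For example, `assert(unused_args_ok(NULL, 0));` holds, while a `NULL` pointer combined with a nonzero count is rejected.
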
| 1185 | |||
| 1186 | Possible return values: | ||
| 1187 | @display | ||
| 1188 | Pointer to an abstract @code{BZFILE} | ||
| 1189 | if @code{bzerror} is @code{BZ_OK} | ||
| 1190 | @code{NULL} | ||
| 1191 | otherwise | ||
| 1192 | @end display | ||
| 1193 | |||
| 1194 | Allowable next actions: | ||
| 1195 | @display | ||
| 1196 | @code{bzRead} | ||
| 1197 | if @code{bzerror} is @code{BZ_OK} | ||
| 1198 | @code{bzReadClose} | ||
| 1199 | otherwise | ||
| 1200 | @end display | ||
| 1201 | |||
| 1202 | |||
| 1203 | @subsection @code{bzRead} | ||
| 1204 | @example | ||
| 1205 | int bzRead ( int *bzerror, BZFILE *b, void *buf, int len ); | ||
| 1206 | @end example | ||
| 1207 | Reads up to @code{len} (uncompressed) bytes from the compressed file | ||
| 1208 | @code{b} into | ||
| 1209 | the buffer @code{buf}. If the read was successful, | ||
| 1210 | @code{bzerror} is set to @code{BZ_OK} | ||
| 1211 | and the number of bytes read is returned. If the logical end-of-stream | ||
| 1212 | was detected, @code{bzerror} will be set to @code{BZ_STREAM_END}, | ||
| 1213 | and the number | ||
| 1214 | of bytes read is returned. All other @code{bzerror} values denote an error. | ||
| 1215 | |||
| 1216 | @code{bzRead} will supply @code{len} bytes, | ||
| 1217 | unless the logical stream end is detected | ||
| 1218 | or an error occurs. Because of this, it is possible to detect the | ||
| 1219 | stream end by observing when the number of bytes returned is | ||
| 1220 | less than the number | ||
| 1221 | requested. Nevertheless, this is regarded as inadvisable; you should | ||
| 1222 | instead check @code{bzerror} after every call and watch out for | ||
| 1223 | @code{BZ_STREAM_END}. | ||
| 1224 | |||
| 1225 | Internally, @code{bzRead} copies data from the compressed file in chunks | ||
| 1226 | of size @code{BZ_MAX_UNUSED} bytes | ||
| 1227 | before decompressing it. If the file contains more bytes than strictly | ||
| 1228 | needed to reach the logical end-of-stream, @code{bzRead} will almost certainly | ||
| 1229 | read some of the trailing data before signalling @code{BZ_STREAM_END}. | ||
| 1230 | To collect the read but unused data once @code{BZ_STREAM_END} has | ||
| 1231 | appeared, call @code{bzReadGetUnused} immediately before @code{bzReadClose}. | ||
| 1232 | |||
| 1233 | Possible assignments to @code{bzerror}: | ||
| 1234 | @display | ||
| 1235 | @code{BZ_PARAM_ERROR} | ||
| 1236 | if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0} | ||
| 1237 | @code{BZ_SEQUENCE_ERROR} | ||
| 1238 | if @code{b} was opened with @code{bzWriteOpen} | ||
| 1239 | @code{BZ_IO_ERROR} | ||
| 1240 | if there is an error reading from the compressed file | ||
| 1241 | @code{BZ_UNEXPECTED_EOF} | ||
| 1242 | if the compressed file ended before the logical end-of-stream was detected | ||
| 1243 | @code{BZ_DATA_ERROR} | ||
| 1244 | if a data integrity error was detected in the compressed stream | ||
| 1245 | @code{BZ_DATA_ERROR_MAGIC} | ||
| 1246 | if the stream does not begin with the requisite header bytes (ie, is not | ||
| 1247 | a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}. | ||
| 1248 | @code{BZ_MEM_ERROR} | ||
| 1249 | if insufficient memory was available | ||
| 1250 | @code{BZ_STREAM_END} | ||
| 1251 | if the logical end of stream was detected. | ||
| 1252 | @code{BZ_OK} | ||
| 1253 | otherwise. | ||
| 1254 | @end display | ||
| 1255 | |||
| 1256 | Possible return values: | ||
| 1257 | @display | ||
| 1258 | number of bytes read | ||
| 1259 | if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END} | ||
| 1260 | undefined | ||
| 1261 | otherwise | ||
| 1262 | @end display | ||
| 1263 | |||
| 1264 | Allowable next actions: | ||
| 1265 | @display | ||
| 1266 | collect data from @code{buf}, then @code{bzRead} or @code{bzReadClose} | ||
| 1267 | if @code{bzerror} is @code{BZ_OK} | ||
| 1268 | collect data from @code{buf}, then @code{bzReadClose} or @code{bzReadGetUnused} | ||
| 1269 | if @code{bzerror} is @code{BZ_STREAM_END} | ||
| 1270 | @code{bzReadClose} | ||
| 1271 | otherwise | ||
| 1272 | @end display | ||
| 1273 | |||
| 1274 | |||
| 1275 | |||
| 1276 | @subsection @code{bzReadGetUnused} | ||
| 1277 | @example | ||
| 1278 | void bzReadGetUnused ( int* bzerror, BZFILE *b, | ||
| 1279 | void** unused, int* nUnused ); | ||
| 1280 | @end example | ||
| 1281 | Returns data which was read from the compressed file but was not needed | ||
| 1282 | to get to the logical end-of-stream. @code{*unused} is set to the address | ||
| 1283 | of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will | ||
| 1284 | be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive. | ||
| 1285 | |||
| 1286 | This function may only be called once @code{bzRead} has signalled | ||
| 1287 | @code{BZ_STREAM_END} but before @code{bzReadClose}. | ||
| 1288 | |||
| 1289 | Possible assignments to @code{bzerror}: | ||
| 1290 | @display | ||
| 1291 | @code{BZ_PARAM_ERROR} | ||
| 1292 | if @code{b} is @code{NULL} | ||
| 1293 | or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL} | ||
| 1294 | @code{BZ_SEQUENCE_ERROR} | ||
| 1295 | if @code{BZ_STREAM_END} has not been signalled | ||
| 1296 | or if @code{b} was opened with @code{bzWriteOpen} | ||
| 1297 | @code{BZ_OK} | ||
| 1298 | otherwise | ||
| 1299 | @end display | ||
| 1300 | |||
| 1301 | Allowable next actions: | ||
| 1302 | @display | ||
| 1303 | @code{bzReadClose} | ||
| 1304 | @end display | ||
| 1305 | |||
| 1306 | |||
| 1307 | @subsection @code{bzReadClose} | ||
| 1308 | @example | ||
| 1309 | void bzReadClose ( int *bzerror, BZFILE *b ); | ||
| 1310 | @end example | ||
| 1311 | Releases all memory pertaining to the compressed file @code{b}. | ||
| 1312 | @code{bzReadClose} does not call @code{fclose} on the underlying file | ||
| 1313 | handle, so you should do that yourself if appropriate. | ||
| 1314 | @code{bzReadClose} should be called to clean up after all error | ||
| 1315 | situations. | ||
| 1316 | |||
| 1317 | Possible assignments to @code{bzerror}: | ||
| 1318 | @display | ||
| 1319 | @code{BZ_SEQUENCE_ERROR} | ||
| 1320 | if @code{b} was opened with @code{bzWriteOpen} | ||
| 1321 | @code{BZ_OK} | ||
| 1322 | otherwise | ||
| 1323 | @end display | ||
| 1324 | |||
| 1325 | Allowable next actions: | ||
| 1326 | @display | ||
| 1327 | none | ||
| 1328 | @end display | ||
| 1329 | |||
| 1330 | |||
| 1331 | |||
| 1332 | @subsection @code{bzWriteOpen} | ||
| 1333 | @example | ||
| 1334 | BZFILE *bzWriteOpen ( int *bzerror, FILE *f, | ||
| 1335 | int blockSize100k, int verbosity, | ||
| 1336 | int workFactor ); | ||
| 1337 | @end example | ||
| 1338 | Prepare to write compressed data to file handle @code{f}. | ||
| 1339 | @code{f} should refer to | ||
| 1340 | a file which has been opened for writing, and for which the error | ||
| 1341 | indicator (@code{ferror(f)}) is not set. | ||
| 1342 | |||
| 1343 | For the meaning of parameters @code{blockSize100k}, | ||
| 1344 | @code{verbosity} and @code{workFactor}, see | ||
| 1345 | @* @code{bzCompressInit}. | ||
| 1346 | |||
| 1347 | All required memory is allocated at this stage, so if the call | ||
| 1348 | completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a | ||
| 1349 | subsequent call to @code{bzWrite}. | ||
| 1350 | |||
| 1351 | Possible assignments to @code{bzerror}: | ||
| 1352 | @display | ||
| 1353 | @code{BZ_PARAM_ERROR} | ||
| 1354 | if @code{f} is @code{NULL} | ||
| 1355 | or @code{blockSize100k < 1} or @code{blockSize100k > 9} | ||
| 1356 | @code{BZ_IO_ERROR} | ||
| 1357 | if @code{ferror(f)} is nonzero | ||
| 1358 | @code{BZ_MEM_ERROR} | ||
| 1359 | if insufficient memory is available | ||
| 1360 | @code{BZ_OK} | ||
| 1361 | otherwise | ||
| 1362 | @end display | ||
| 1363 | |||
| 1364 | Possible return values: | ||
| 1365 | @display | ||
| 1366 | Pointer to an abstract @code{BZFILE} | ||
| 1367 | if @code{bzerror} is @code{BZ_OK} | ||
| 1368 | @code{NULL} | ||
| 1369 | otherwise | ||
| 1370 | @end display | ||
| 1371 | |||
| 1372 | Allowable next actions: | ||
| 1373 | @display | ||
| 1374 | @code{bzWrite} | ||
| 1375 | if @code{bzerror} is @code{BZ_OK} | ||
| 1376 | (you could go directly to @code{bzWriteClose}, but this would be pretty pointless) | ||
| 1377 | @code{bzWriteClose} | ||
| 1378 | otherwise | ||
| 1379 | @end display | ||
| 1380 | |||
| 1381 | |||
| 1382 | |||
| 1383 | @subsection @code{bzWrite} | ||
| 1384 | @example | ||
| 1385 | void bzWrite ( int *bzerror, BZFILE *b, void *buf, int len ); | ||
| 1386 | @end example | ||
| 1387 | Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be | ||
| 1388 | compressed and written to the file. | ||
| 1389 | |||
| 1390 | Possible assignments to @code{bzerror}: | ||
| 1391 | @display | ||
| 1392 | @code{BZ_PARAM_ERROR} | ||
| 1393 | if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0} | ||
| 1394 | @code{BZ_SEQUENCE_ERROR} | ||
| 1395 | if @code{b} was opened with @code{bzReadOpen} | ||
| 1396 | @code{BZ_IO_ERROR} | ||
| 1397 | if there is an error writing the compressed file. | ||
| 1398 | @code{BZ_OK} | ||
| 1399 | otherwise | ||
| 1400 | @end display | ||
| 1401 | |||
| 1402 | |||
| 1403 | |||
| 1404 | |||
| 1405 | @subsection @code{bzWriteClose} | ||
| 1406 | @example | ||
| 1407 | void bzWriteClose ( int *bzerror, BZFILE* b, | ||
| 1408 | int abandon, | ||
| 1409 | unsigned int* nbytes_in, | ||
| 1410 | unsigned int* nbytes_out ); | ||
| 1411 | @end example | ||
| 1412 | |||
| 1413 | Compresses and flushes to the compressed file all data so far supplied | ||
| 1414 | by @code{bzWrite}. The logical end-of-stream markers are also written, so | ||
| 1415 | subsequent calls to @code{bzWrite} are illegal. All memory associated | ||
| 1416 | with the compressed file @code{b} is released. | ||
| 1417 | @code{fflush} is called on the | ||
| 1418 | compressed file, but it is not @code{fclose}'d. | ||
| 1419 | |||
| 1420 | If @code{bzWriteClose} is called to clean up after an error, the only | ||
| 1421 | action is to release the memory. The library records the error codes | ||
| 1422 | issued by previous calls, so this situation will be detected | ||
| 1423 | automatically. There is no attempt to complete the compression | ||
| 1424 | operation, nor to @code{fflush} the compressed file. You can force this | ||
| 1425 | behaviour to happen even in the case of no error, by passing a nonzero | ||
| 1426 | value to @code{abandon}. | ||
| 1427 | |||
| 1428 | If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to be the | ||
| 1429 | total volume of uncompressed data handled. Similarly, @code{*nbytes_out} | ||
| 1430 | will be set to the total volume of compressed data written, if @code{nbytes_out} is non-null. | ||
| 1431 | |||
| 1432 | Possible assignments to @code{bzerror}: | ||
| 1433 | @display | ||
| 1434 | @code{BZ_SEQUENCE_ERROR} | ||
| 1435 | if @code{b} was opened with @code{bzReadOpen} | ||
| 1436 | @code{BZ_IO_ERROR} | ||
| 1437 | if there is an error writing the compressed file | ||
| 1438 | @code{BZ_OK} | ||
| 1439 | otherwise | ||
| 1440 | @end display | ||
| 1441 | |||
| 1442 | @subsection Handling embedded compressed data streams | ||
| 1443 | |||
| 1444 | The high-level library facilitates use of | ||
| 1445 | @code{bzip2} data streams which form some part of a surrounding, larger | ||
| 1446 | data stream. | ||
| 1447 | @itemize @bullet | ||
| 1448 | @item For writing, the library takes an open file handle, writes | ||
| 1449 | compressed data to it, @code{fflush}es it but does not @code{fclose} it. | ||
| 1450 | The calling application can write its own data before and after the | ||
| 1451 | compressed data stream, using that same file handle. | ||
| 1452 | @item Reading is more complex, and the facilities are not as general | ||
| 1453 | as they could be since generality is hard to reconcile with efficiency. | ||
| 1454 | @code{bzRead} reads from the compressed file in blocks of size | ||
| 1455 | @code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot | ||
| 1456 | the logical end of compressed stream. | ||
| 1457 | To recover this data once decompression has | ||
| 1458 | ended, call @code{bzReadGetUnused} after the last call of @code{bzRead} | ||
| 1459 | (the one returning @code{BZ_STREAM_END}) but before calling | ||
| 1460 | @code{bzReadClose}. | ||
| 1461 | @end itemize | ||
| 1462 | |||
| 1463 | This mechanism makes it easy to decompress multiple @code{bzip2} | ||
| 1464 | streams placed end-to-end. At the end of one stream, when @code{bzRead} | ||
| 1465 | returns @code{BZ_STREAM_END}, call @code{bzReadGetUnused} to collect the | ||
| 1466 | unused data (copy it into your own buffer somewhere). | ||
| 1467 | That data forms the start of the next compressed stream. | ||
| 1468 | To start uncompressing that next stream, call @code{bzReadOpen} again, | ||
| 1469 | feeding in the unused data via the @code{unused}/@code{nUnused} | ||
| 1470 | parameters. | ||
| 1471 | Keep doing this until a @code{BZ_STREAM_END} return coincides with the | ||
| 1472 | physical end of file (@code{feof(f)}). In this situation | ||
| 1473 | @code{bzReadGetUnused} | ||
| 1474 | will of course return no data. | ||
| 1475 | |||
| 1476 | This should give some feel for how the high-level interface can be used. | ||
| 1477 | If you require extra flexibility, you'll have to bite the bullet and get | ||
| 1478 | to grips with the low-level interface. | ||
| 1479 | |||
| 1480 | @subsection Standard file-reading/writing code | ||
| 1481 | Here's how you'd write data to a compressed file: | ||
| 1482 | @example @code | ||
| 1483 | FILE* f; | ||
| 1484 | BZFILE* b; | ||
| 1485 | int nBuf; | ||
| 1486 | char buf[ /* whatever size you like */ ]; | ||
| 1487 | int bzerror; | ||
| 1488 | int nWritten; | ||
| 1489 | |||
| 1490 | f = fopen ( "myfile.bz2", "wb" ); | ||
| 1491 | if (!f) @{ | ||
| 1492 | /* handle error */ | ||
| 1493 | @} | ||
| 1494 | b = bzWriteOpen ( &bzerror, f, 9, 0, 30 ); | ||
| 1495 | if (bzerror != BZ_OK) @{ | ||
| 1496 | bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | ||
| 1497 | /* handle error */ | ||
| 1498 | @} | ||
| 1499 | |||
| 1500 | while ( /* condition */ ) @{ | ||
| 1501 | /* get data to write into buf, and set nBuf appropriately */ | ||
| 1502 | bzWrite ( &bzerror, b, buf, nBuf ); | ||
| 1503 | if (bzerror == BZ_IO_ERROR) @{ | ||
| 1504 | bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | ||
| 1505 | /* handle error */ | ||
| 1506 | @} | ||
| 1507 | @} | ||
| 1508 | |||
| 1509 | bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | ||
| 1510 | if (bzerror == BZ_IO_ERROR) @{ | ||
| 1511 | /* handle error */ | ||
| 1512 | @} | ||
| 1513 | @end example | ||
| 1514 | And to read from a compressed file: | ||
| 1515 | @example | ||
| 1516 | FILE* f; | ||
| 1517 | BZFILE* b; | ||
| 1518 | int nBuf; | ||
| 1519 | char buf[ /* whatever size you like */ ]; | ||
| 1520 | int bzerror; | ||
| 1521 | int nWritten; | ||
| 1522 | |||
| 1523 | f = fopen ( "myfile.bz2", "rb" ); | ||
| 1524 | if (!f) @{ | ||
| 1525 | /* handle error */ | ||
| 1526 | @} | ||
| 1527 | b = bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 ); | ||
| 1528 | if (bzerror != BZ_OK) @{ | ||
| 1529 | bzReadClose ( &bzerror, b ); | ||
| 1530 | /* handle error */ | ||
| 1531 | @} | ||
| 1532 | |||
| 1533 | bzerror = BZ_OK; | ||
| 1534 | while (bzerror == BZ_OK && /* arbitrary other conditions */) @{ | ||
| 1535 | nBuf = bzRead ( &bzerror, b, buf, /* size of buf */ ); | ||
| 1536 | if (bzerror == BZ_OK) @{ | ||
| 1537 | /* do something with buf[0 .. nBuf-1] */ | ||
| 1538 | @} | ||
| 1539 | @} | ||
| 1540 | if (bzerror != BZ_STREAM_END) @{ | ||
| 1541 | bzReadClose ( &bzerror, b ); | ||
| 1542 | /* handle error */ | ||
| 1543 | @} else @{ | ||
| 1544 | bzReadClose ( &bzerror, b ); | ||
| 1545 | @} | ||
| 1546 | @end example | ||
| 1547 | |||
| 1548 | |||
| 1549 | |||
| 1550 | @section Utility functions | ||
| 1551 | @subsection @code{bzBuffToBuffCompress} | ||
| 1552 | @example | ||
| 1553 | int bzBuffToBuffCompress( char* dest, | ||
| 1554 | unsigned int* destLen, | ||
| 1555 | char* source, | ||
| 1556 | unsigned int sourceLen, | ||
| 1557 | int blockSize100k, | ||
| 1558 | int verbosity, | ||
| 1559 | int workFactor ); | ||
| 1560 | @end example | ||
| 1561 | Attempts to compress the data in @code{source[0 .. sourceLen-1]} | ||
| 1562 | into the destination buffer, @code{dest[0 .. *destLen-1]}. | ||
| 1563 | If the destination buffer is big enough, @code{*destLen} is | ||
| 1564 | set to the size of the compressed data, and @code{BZ_OK} is | ||
| 1565 | returned. If the compressed data won't fit, @code{*destLen} | ||
| 1566 | is unchanged, and @code{BZ_OUTBUFF_FULL} is returned. | ||
| 1567 | |||
| 1568 | Compression in this manner is a one-shot event, done with a single call | ||
| 1569 | to this function. The resulting compressed data is a complete | ||
| 1570 | @code{bzip2} format data stream. There is no mechanism for making | ||
| 1571 | additional calls to provide extra input data. If you want that kind of | ||
| 1572 | mechanism, use the low-level interface. | ||
| 1573 | |||
| 1574 | For the meaning of parameters @code{blockSize100k}, @code{verbosity} | ||
| 1575 | and @code{workFactor}, @* see @code{bzCompressInit}. | ||
| 1576 | |||
| 1577 | To guarantee that the compressed data will fit in its buffer, allocate | ||
| 1578 | an output buffer of size 1% larger than the uncompressed data, plus | ||
| 1579 | six hundred extra bytes. | ||
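The sizing rule above can be sketched as a small helper. This is only an illustration: the name `bz_dest_capacity` is hypothetical and not part of the library, and rounding the 1% upwards and returning 0 on overflow are assumptions made here to stay on the safe side.

```c
#include <limits.h>

/* Hypothetical helper (not part of libbzip2): returns a destination
   buffer size that follows the rule "1% larger than the uncompressed
   data, plus six hundred extra bytes".  The 1% is rounded up.
   Returns 0 if the result would not fit in an unsigned int. */
unsigned int bz_dest_capacity ( unsigned int sourceLen )
{
   /* ceil(sourceLen / 100) without risking overflow */
   unsigned int onePercent = sourceLen / 100 + (sourceLen % 100 != 0);
   unsigned int extra      = onePercent + 600;

   if (sourceLen > UINT_MAX - extra) {
      return 0;   /* total would overflow */
   }
   return sourceLen + extra;
}
```

A caller would allocate `bz_dest_capacity(sourceLen)` bytes for `dest` and store the same value in `*destLen` before calling `bzBuffToBuffCompress`, treating a return of 0 as "input too large for a one-shot call".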
| 1580 | |||
| 1581 | @code{bzBuffToBuffCompress} will not write data at or | ||
| 1582 | beyond @code{dest[*destLen]}, even in case of buffer overflow. | ||
| 1583 | |||
| 1584 | Possible return values: | ||
| 1585 | @display | ||
| 1586 | @code{BZ_PARAM_ERROR} | ||
| 1587 | if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL} | ||
| 1588 | or @code{blockSize100k < 1} or @code{blockSize100k > 9} | ||
| 1589 | or @code{verbosity < 0} or @code{verbosity > 4} | ||
| 1590 | or @code{workFactor < 0} or @code{workFactor > 250} | ||
| 1591 | @code{BZ_MEM_ERROR} | ||
| 1592 | if insufficient memory is available | ||
| 1593 | @code{BZ_OUTBUFF_FULL} | ||
| 1594 | if the size of the compressed data exceeds @code{*destLen} | ||
| 1595 | @code{BZ_OK} | ||
| 1596 | otherwise | ||
| 1597 | @end display | ||
| 1598 | |||
| 1599 | |||
| 1600 | |||
| 1601 | @subsection @code{bzBuffToBuffDecompress} | ||
| 1602 | @example | ||
| 1603 | int bzBuffToBuffDecompress ( char* dest, | ||
| 1604 | unsigned int* destLen, | ||
| 1605 | char* source, | ||
| 1606 | unsigned int sourceLen, | ||
| 1607 | int small, | ||
| 1608 | int verbosity ); | ||
| 1609 | @end example | ||
| 1610 | Attempts to decompress the data in @code{source[0 .. sourceLen-1]} | ||
| 1611 | into the destination buffer, @code{dest[0 .. *destLen-1]}. | ||
| 1612 | If the destination buffer is big enough, @code{*destLen} is | ||
| 1613 | set to the size of the uncompressed data, and @code{BZ_OK} is | ||
| 1614 | returned. If the uncompressed data won't fit, @code{*destLen} | ||
| 1615 | is unchanged, and @code{BZ_OUTBUFF_FULL} is returned. | ||
| 1616 | |||
| 1617 | @code{source} is assumed to hold a complete @code{bzip2} format | ||
| 1618 | data stream. @code{bzBuffToBuffDecompress} tries to decompress | ||
| 1619 | the entirety of the stream into the output buffer. | ||
| 1620 | |||
| 1621 | For the meaning of parameters @code{small} and @code{verbosity}, | ||
| 1622 | see @code{bzDecompressInit}. | ||
| 1623 | |||
| 1624 | Because the compression ratio of the compressed data cannot be known in | ||
| 1625 | advance, there is no easy way to guarantee that the output buffer will | ||
| 1626 | be big enough. You may of course make arrangements in your code to | ||
| 1627 | record the size of the uncompressed data, but such a mechanism is beyond | ||
| 1628 | the scope of this library. | ||
| 1629 | |||
| 1630 | @code{bzBuffToBuffDecompress} will not write data at or | ||
| 1631 | beyond @code{dest[*destLen]}, even in case of buffer overflow. | ||
| 1632 | |||
| 1633 | Possible return values: | ||
| 1634 | @display | ||
| 1635 | @code{BZ_PARAM_ERROR} | ||
| 1636 | if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL} | ||
| 1637 | or @code{small != 0 && small != 1} | ||
| 1638 | or @code{verbosity < 0} or @code{verbosity > 4} | ||
| 1639 | @code{BZ_MEM_ERROR} | ||
| 1640 | if insufficient memory is available | ||
| 1641 | @code{BZ_OUTBUFF_FULL} | ||
| 1642 | if the size of the uncompressed data exceeds @code{*destLen} | ||
| 1643 | @code{BZ_DATA_ERROR} | ||
| 1644 | if a data integrity error was detected in the compressed data | ||
| 1645 | @code{BZ_DATA_ERROR_MAGIC} | ||
| 1646 | if the compressed data doesn't begin with the right magic bytes | ||
| 1647 | @code{BZ_UNEXPECTED_EOF} | ||
| 1648 | if the compressed data ends unexpectedly | ||
| 1649 | @code{BZ_OK} | ||
| 1650 | otherwise | ||
| 1651 | @end display | ||
| 1652 | |||
| 1653 | |||
| 1654 | |||
| 1655 | @section Using the library in a @code{stdio}-free environment | ||
| 1656 | |||
| 1657 | @subsection Getting rid of @code{stdio} | ||
| 1658 | |||
| 1659 | In a deeply embedded application, you might want to use just | ||
| 1660 | the memory-to-memory functions. You can do this conveniently | ||
| 1661 | by compiling the library with preprocessor symbol @code{BZ_NO_STDIO} | ||
| 1662 | defined. Doing this gives you a library containing only the following | ||
| 1663 | eight functions: | ||
| 1664 | |||
| 1665 | @code{bzCompressInit}, @code{bzCompress}, @code{bzCompressEnd} @* | ||
| 1666 | @code{bzDecompressInit}, @code{bzDecompress}, @code{bzDecompressEnd} @* | ||
| 1667 | @code{bzBuffToBuffCompress}, @code{bzBuffToBuffDecompress} | ||
| 1668 | |||
| 1669 | When compiled like this, all functions will ignore @code{verbosity} | ||
| 1670 | settings. | ||
| 1671 | |||
| 1672 | @subsection Critical error handling | ||
| 1673 | @code{libbzip2} contains a number of internal assertion checks which | ||
| 1674 | should, needless to say, never be activated. Nevertheless, if an | ||
| 1675 | assertion should fail, behaviour depends on whether or not the library | ||
| 1676 | was compiled with @code{BZ_NO_STDIO} set. | ||
| 1677 | |||
| 1678 | For a normal compile, an assertion failure yields the message | ||
| 1679 | @example | ||
| 1680 | bzip2/libbzip2, v0.9.0: internal error number N. | ||
| 1681 | This is a bug in bzip2/libbzip2, v0.9.0. Please report | ||
| 1682 | it to me at: jseward@@acm.org. If this happened when | ||
| 1683 | you were using some program which uses libbzip2 as a | ||
| 1684 | component, you should also report this bug to the author(s) | ||
| 1685 | of that program. Please make an effort to report this bug; | ||
| 1686 | timely and accurate bug reports eventually lead to higher | ||
| 1687 | quality software. Thx. Julian Seward, 27 June 1998. | ||
| 1688 | @end example | ||
| 1689 | where @code{N} is some error code number. @code{exit(3)} | ||
| 1690 | is then called. | ||
| 1691 | |||
| 1692 | For a @code{stdio}-free library, assertion failures result | ||
| 1693 | in a call to a function declared as: | ||
| 1694 | @example | ||
| 1695 | extern void bz_internal_error ( int errcode ); | ||
| 1696 | @end example | ||
| 1697 | The relevant code is passed as a parameter. You should supply | ||
| 1698 | such a function. | ||
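One possible shape for such a function, in an environment without `stdio`, is sketched below. Only the signature of `bz_internal_error` comes from the text above; the `setjmp`/`longjmp` recovery scheme, and the names `bz_recovery_point`, `bz_last_internal_error`, `run_guarded`, `failing_work` and `ok_work`, are assumptions made for illustration. An embedded system might equally well log the code and reset.

```c
#include <setjmp.h>

/* Hypothetical recovery state; none of these names exist in libbzip2. */
static jmp_buf bz_recovery_point;
volatile int   bz_last_internal_error = 0;

/* The function the stdio-free library expects the application to supply.
   Here it records the code and unwinds to the recovery point. */
void bz_internal_error ( int errcode )
{
   bz_last_internal_error = errcode;
   longjmp ( bz_recovery_point, 1 );
}

/* Runs work(); returns 0 if it completed, 1 if an assertion failed.
   After a failure, any bz_stream used by work() must be discarded. */
int run_guarded ( void (*work) ( void ) )
{
   if (setjmp ( bz_recovery_point ) != 0) {
      return 1;   /* arrived here via bz_internal_error */
   }
   work ();
   return 0;
}

/* Stand-ins for code that calls into the library. */
void failing_work ( void ) { bz_internal_error ( 1007 ); }
void ok_work      ( void ) { /* nothing goes wrong */ }
```

With this arrangement, `run_guarded ( failing_work )` returns 1 and leaves 1007 in `bz_last_internal_error`. The sketch uses a single global recovery point, so it is not suitable for multi-threaded use as written.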
| 1699 | |||
| 1700 | In either case, once an assertion failure has occurred, any | ||
| 1701 | @code{bz_stream} records involved can be regarded as invalid. | ||
| 1702 | You should not attempt to resume normal operation with them. | ||
| 1703 | |||
| 1704 | You may, of course, change critical error handling to suit | ||
| 1705 | your needs. As I said above, critical errors indicate bugs | ||
| 1706 | in the library and should not occur. All "normal" error | ||
| 1707 | situations are indicated via error return codes from functions, | ||
| 1708 | and can be recovered from. | ||
| 1709 | |||
| 1710 | |||
| 1711 | @section Making a Windows DLL | ||
| 1712 | Everything related to Windows has been contributed by Yoshioka Tsuneo | ||
| 1713 | @* (@code{QWF00133@@niftyserve.or.jp} / | ||
| 1714 | @code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to | ||
| 1715 | him (but perhaps Cc: me, @code{jseward@@acm.org}). | ||
| 1716 | |||
| 1717 | My vague understanding of what to do is: using Visual C++ 5.0, | ||
| 1718 | open the project file @code{libbz2.dsp}, and build. That's all. | ||
| 1719 | |||
| 1720 | If you can't | ||
| 1721 | open the project file for some reason, make a new one, naming these files: | ||
| 1722 | @code{blocksort.c}, @code{bzlib.c}, @code{compress.c}, | ||
| 1723 | @code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @* | ||
| 1724 | @code{randtable.c} and @code{libbz2.def}. You might also need | ||
| 1725 | to name the header files @code{bzlib.h} and @code{bzlib_private.h}. | ||
| 1726 | |||
| 1727 | If you don't use VC++, you may need to define the preprocessor symbol | ||
| 1728 | @code{_WIN32}. | ||
| 1729 | |||
| 1730 | Finally, @code{dlltest.c} is a sample program using the DLL. It has a | ||
| 1731 | project file, @code{dlltest.dsp}. | ||
| 1732 | |||
| 1733 | I haven't tried any of this stuff myself, but it all looks plausible. | ||
| 1734 | |||
| 1735 | |||
| 1736 | |||
| 1737 | @chapter Miscellanea | ||
| 1738 | |||
| 1739 | These are just some random thoughts of mine. Your mileage may | ||
| 1740 | vary. | ||
| 1741 | |||
| 1742 | @section Limitations of the compressed file format | ||
| 1743 | @code{bzip2-0.9.0} uses exactly the same file format as the previous | ||
| 1744 | version, @code{bzip2-0.1}. This decision was made in the interests of | ||
| 1745 | stability. Creating yet another incompatible compressed file format | ||
| 1746 | would create further confusion and disruption for users. | ||
| 1747 | |||
| 1748 | Nevertheless, this is not a painless decision. Development | ||
| 1749 | work since the release of @code{bzip2-0.1} in August 1997 | ||
| 1750 | has shown complexities in the file format which slow down | ||
| 1751 | decompression and, in retrospect, are unnecessary. These are: | ||
| 1752 | @itemize @bullet | ||
| 1753 | @item The run-length encoder, which is the first of the | ||
| 1754 | compression transformations, is entirely irrelevant. | ||
| 1755 | The original purpose was to protect the sorting algorithm | ||
| 1756 | from the very worst case input: a string of repeated | ||
| 1757 | symbols. But algorithm steps Q6a and Q6b in the original | ||
| 1758 | Burrows-Wheeler technical report (SRC-124) show how | ||
| 1759 | repeats can be handled without difficulty in block | ||
| 1760 | sorting. | ||
| 1761 | @item The randomisation mechanism doesn't really need to be | ||
| 1762 | there. Udi Manber and Gene Myers published a suffix | ||
| 1763 | array construction algorithm a few years back, which | ||
| 1764 | can be employed to sort any block, no matter how | ||
| 1765 | repetitive, in O(N log N) time. Subsequent work by | ||
| 1766 | Kunihiko Sadakane has produced a derivative O(N (log N)^2) | ||
| 1767 | algorithm which usually outperforms the Manber-Myers | ||
| 1768 | algorithm. | ||
| 1769 | |||
| 1770 | I could have changed to Sadakane's algorithm, but I find | ||
| 1771 | it to be slower than @code{bzip2}'s existing algorithm for | ||
| 1772 | most inputs, and the randomisation mechanism protects | ||
| 1773 | adequately against bad cases. I didn't think it was | ||
| 1774 | a good tradeoff to make. Partly this is due to the fact | ||
| 1775 | that I was not flooded with email complaints about | ||
| 1776 | @code{bzip2-0.1}'s performance on repetitive data, so | ||
| 1777 | perhaps it isn't a problem for real inputs. | ||
| 1778 | |||
| 1779 | Probably the best long-term solution | ||
| 1780 | is to use the existing sorting | ||
| 1781 | algorithm initially, and fall back to a O(N (log N)^2) | ||
| 1782 | algorithm if the standard algorithm gets into difficulties. | ||
| 1783 | This can be done without much difficulty; I made | ||
| 1784 | a prototype implementation of it some months ago. | ||
| 1785 | @item The compressed file format was never designed to be | ||
| 1786 | handled by a library, and I have had to jump through | ||
| 1787 | some hoops to produce an efficient implementation of | ||
| 1788 | decompression. It's a bit hairy. Try passing | ||
| 1789 | @code{decompress.c} through the C preprocessor | ||
| 1790 | and you'll see what I mean. Much of this complexity | ||
| 1791 | could have been avoided if the compressed size of | ||
| 1792 | each block of data was recorded in the data stream. | ||
| 1793 | @item An Adler-32 checksum, rather than a CRC32 checksum, | ||
| 1794 | would be faster to compute. | ||
| 1795 | @end itemize | ||
| 1796 | It would be fair to say that the @code{bzip2} format was frozen | ||
| 1797 | before I properly and fully understood the performance | ||
| 1798 | consequences of doing so. | ||
| 1799 | |||
| 1800 | Improvements which I have been able to incorporate into | ||
| 1801 | 0.9.0, despite using the same file format, are: | ||
| 1802 | @itemize @bullet | ||
| 1803 | @item Single array implementation of the inverse BWT. This | ||
| 1804 | significantly speeds up decompression, presumably | ||
| 1805 | because it reduces the number of cache misses. | ||
| 1806 | @item Faster inverse MTF transform for large MTF values. The | ||
| 1807 | new implementation is based on the notion of sliding blocks | ||
| 1808 | of values. | ||
| 1809 | @item @code{bzip2-0.9.0} now reads and writes files with @code{fread} | ||
| 1810 | and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}. | ||
| 1811 | Duh! I'm embarrassed at my own moronicness (moronicity?) on this | ||
| 1812 | one. | ||
| 1813 | |||
| 1814 | @end itemize | ||
| 1815 | Further ahead, it would be nice | ||
| 1816 | to be able to do random access into files. This will | ||
| 1817 | require some careful design of compressed file formats. | ||
| 1818 | |||
| 1819 | |||
| 1820 | |||
| 1821 | @section Portability issues | ||
| 1822 | After some consideration, I have decided not to use | ||
| 1823 | GNU @code{autoconf} to configure 0.9.0. | ||
| 1824 | |||
| 1825 | @code{autoconf}, admirable and wonderful though it is, | ||
| 1826 | mainly assists with portability problems between Unix-like | ||
| 1827 | platforms. But @code{bzip2} doesn't have much in the way | ||
| 1828 | of portability problems on Unix; most of the difficulties appear | ||
| 1829 | when porting to the Mac, or to Microsoft's operating systems. | ||
| 1830 | @code{autoconf} doesn't help in those cases, and brings in a | ||
| 1831 | whole load of new complexity. | ||
| 1832 | |||
| 1833 | Most people should be able to compile the library and program | ||
| 1834 | under Unix straight out-of-the-box, so to speak, especially | ||
| 1835 | if you have a version of GNU C available. | ||
| 1836 | |||
| 1837 | There are a couple of @code{__inline__} directives in the code. GNU C | ||
| 1838 | (@code{gcc}) should be able to handle them. If your compiler doesn't | ||
| 1839 | like them, just @code{#define} @code{__inline__} to be null. One | ||
| 1840 | easy way to do this is to compile with the flag @code{-D__inline__=}, | ||
| 1841 | which should be understood by most Unix compilers. | ||
| 1842 | |||
| 1843 | If you still have difficulties, try compiling with the macro | ||
| 1844 | @code{BZ_STRICT_ANSI} defined. This should enable you to build the | ||
| 1845 | library in a strictly ANSI compliant environment. Building the program | ||
| 1846 | itself like this is dangerous and not supported, since you remove | ||
| 1847 | @code{bzip2}'s checks against compressing directories, symbolic links, | ||
| 1848 | devices, and other not-really-a-file entities. This could cause | ||
| 1849 | filesystem corruption! | ||
| 1850 | |||
| 1851 | One other thing: if you create a @code{bzip2} binary for public | ||
| 1852 | distribution, please try and link it statically (@code{gcc -static}). This | ||
| 1853 | avoids all sorts of library-version issues that others may encounter | ||
| 1854 | later on. | ||
| 1855 | |||
| 1856 | |||
| 1857 | @section Reporting bugs | ||
| 1858 | I tried pretty hard to make sure @code{bzip2} is | ||
| 1859 | bug free, both by design and by testing. Hopefully | ||
| 1860 | you'll never need to read this section for real. | ||
| 1861 | |||
| 1862 | Nevertheless, if @code{bzip2} dies with a segmentation | ||
| 1863 | fault, a bus error or an internal assertion failure, it | ||
| 1864 | will ask you to email me a bug report. Experience with | ||
| 1865 | version 0.1 shows that almost all these problems can | ||
| 1866 | be traced to either compiler bugs or hardware problems. | ||
| 1867 | @itemize @bullet | ||
| 1868 | @item | ||
| 1869 | Recompile the program with no optimisation, and see if it | ||
| 1870 | works. And/or try a different compiler. | ||
| 1871 | I heard all sorts of stories about various flavours | ||
| 1872 | of GNU C (and other compilers) generating bad code for | ||
| 1873 | @code{bzip2}, and I've run across two such examples myself. | ||
| 1874 | |||
| 1875 | 2.7.X versions of GNU C are known to generate bad code from | ||
| 1876 | time to time, at high optimisation levels. | ||
| 1877 | If you get problems, try using the flags | ||
| 1878 | @code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}. | ||
| 1879 | You should specifically @emph{not} use @code{-funroll-loops}. | ||
| 1880 | |||
| 1881 | You may notice that the Makefile runs four tests as part of | ||
| 1882 | the build process. If the program passes all of these, it's | ||
| 1883 | a pretty good (but not 100%) indication that the compiler has | ||
| 1884 | done its job correctly. | ||
| 1885 | @item | ||
| 1886 | If @code{bzip2} crashes randomly, and the crashes are not | ||
| 1887 | repeatable, you may have a flaky memory subsystem. @code{bzip2} | ||
| 1888 | really hammers your memory hierarchy, and if it's a bit marginal, | ||
| 1889 | you may get these problems. Ditto if your disk or I/O subsystem | ||
| 1890 | is slowly failing. Yup, this really does happen. | ||
| 1891 | |||
| 1892 | Try using a different machine of the same type, and see if | ||
| 1893 | you can repeat the problem. | ||
| 1894 | @item This isn't really a bug, but ... If @code{bzip2} tells | ||
| 1895 | you your file is corrupted on decompression, and you | ||
| 1896 | obtained the file via FTP, there is a possibility that you | ||
| 1897 | forgot to tell FTP to do a binary mode transfer. That absolutely | ||
| 1898 | will cause the file to be non-decompressible. You'll have to transfer | ||
| 1899 | it again. | ||
| 1900 | @end itemize | ||
| 1901 | |||
| 1902 | If you've incorporated @code{libbzip2} into your own program | ||
| 1903 | and are getting problems, please, please, please, check that the | ||
| 1904 | parameters you are passing in calls to the library, are | ||
| 1905 | correct, and in accordance with what the documentation says | ||
| 1906 | is allowable. I have tried to make the library robust against | ||
| 1907 | such problems, but I'm sure I haven't succeeded. | ||
| 1908 | |||
| 1909 | Finally, if the above comments don't help, you'll have to send | ||
| 1910 | me a bug report. Now, it's just amazing how many people will | ||
| 1911 | send me a bug report saying something like | ||
| 1912 | @display | ||
| 1913 | bzip2 crashed with segmentation fault on my machine | ||
| 1914 | @end display | ||
| 1915 | and absolutely nothing else. Needless to say, such a report | ||
| 1916 | is @emph{totally, utterly, completely and comprehensively 100% useless; | ||
| 1917 | a waste of your time, my time, and net bandwidth}. | ||
| 1918 | With no details at all, there's no way I can possibly begin | ||
| 1919 | to figure out what the problem is. | ||
| 1920 | |||
| 1921 | The rules of the game are: facts, facts, facts. Don't omit | ||
| 1922 | them because "oh, they won't be relevant". At the bare | ||
| 1923 | minimum: | ||
| 1924 | @display | ||
| 1925 | Machine type. Operating system version. | ||
| 1926 | Exact version of @code{bzip2} (do @code{bzip2 -V}). | ||
| 1927 | Exact version of the compiler used. | ||
| 1928 | Flags passed to the compiler. | ||
| 1929 | @end display | ||
| 1930 | However, the most important single thing that will help me is | ||
| 1931 | the file that you were trying to compress or decompress at the | ||
| 1932 | time the problem happened. Without that, my ability to do anything | ||
| 1933 | more than speculate about the cause, is limited. | ||
| 1934 | |||
| 1935 | Please remember that I connect to the Internet with a modem, so | ||
| 1936 | you should contact me before mailing me huge files. | ||
| 1937 | |||
| 1938 | |||
| 1939 | @section Did you get the right package? | ||
| 1940 | |||
| 1941 | @code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles | ||
| 1942 | and memory. Also, it gives very large latencies. In the worst case, you | ||
| 1943 | can feed many megabytes of uncompressed data into the library before | ||
| 1944 | getting any compressed output, so this probably rules out applications | ||
| 1945 | requiring interactive behaviour. | ||
| 1946 | |||
| 1947 | These aren't faults of my implementation, I hope, but more | ||
| 1948 | an intrinsic property of the Burrows-Wheeler transform (unfortunately). | ||
| 1949 | Maybe this isn't what you want. | ||
| 1950 | |||
| 1951 | If you want a compressor and/or library which is faster, uses less | ||
| 1952 | memory but gets pretty good compression, and has minimal latency, | ||
| 1953 | consider Jean-loup | ||
| 1954 | Gailly's and Mark Adler's work, @code{zlib-1.1.2} and | ||
| 1955 | @code{gzip-1.2.4}. Look for them at | ||
| 1956 | @code{http://www.cdrom.com/pub/infozip/zlib} and | ||
| 1957 | @code{http://www.gzip.org} respectively. | ||
| 1958 | |||
| 1959 | For something faster and lighter still, you might try Markus F X J | ||
| 1960 | Oberhumer's @code{LZO} real-time compression/decompression library, at | ||
| 1961 | @* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}. | ||
| 1962 | |||
| 1963 | If you want to use the @code{bzip2} algorithms to compress small blocks | ||
| 1964 | of data, 64k bytes or smaller, for example on an on-the-fly disk | ||
| 1965 | compressor, you'd be well advised not to use this library. Instead, | ||
| 1966 | I've made a special library tuned for that kind of use. It's part of | ||
| 1967 | @code{e2compr-0.40}, an on-the-fly disk compressor for the Linux | ||
| 1968 | @code{ext2} filesystem. Look at | ||
| 1969 | @code{http://www.netspace.net.au/~reiter/e2compr}. | ||
| 1970 | |||
| 1971 | |||
| 1972 | |||
| 1973 | @section Testing | ||
| 1974 | |||
| 1975 | A record of the tests I've done. | ||
| 1976 | |||
| 1977 | First, some data sets: | ||
| 1978 | @itemize @bullet | ||
| 1979 | @item B: a directory containing 6001 files, one for every length in the | ||
| 1980 | range 0 to 6000 bytes. The files contain random lowercase | ||
| 1981 | letters. 18.7 megabytes. | ||
| 1982 | @item H: my home directory tree. Documents, source code, mail files, | ||
| 1983 | compressed data. H contains B, and also a directory of | ||
| 1984 | files designed as boundary cases for the sorting; mostly very | ||
| 1985 | repetitive, nasty files. 445 megabytes. | ||
| 1986 | @item A: directory tree holding various applications built from source: | ||
| 1987 | @code{egcs-1.0.2}, @code{gcc-2.8.1}, KDE Beta 4, GTK, Octave, etc. | ||
| 1988 | 827 megabytes. | ||
| 1989 | @item P: directory tree holding large amounts of source code (@code{.tar} | ||
| 1990 | files) of the entire GNU distribution, plus a couple of | ||
| 1991 | Linux distributions. 2400 megabytes. | ||
| 1992 | @end itemize | ||
| 1993 | The tests conducted are as follows. Each test means compressing | ||
| 1994 | (a copy of) each file in the data set, decompressing it and | ||
| 1995 | comparing it against the original. | ||
| 1996 | |||
| 1997 | First, a bunch of tests with block sizes, internal buffer | ||
| 1998 | sizes and randomisation lengths set very small, | ||
| 1999 | to detect any problems with the | ||
| 2000 | blocking, buffering and randomisation mechanisms. | ||
| 2001 | This required modifying the source code so as to try to | ||
| 2002 | break it. | ||
| 2003 | @enumerate | ||
| 2004 | @item Data set H, with | ||
| 2005 | buffer size of 1 byte, and block size of 23 bytes. | ||
| 2006 | @item Data set B, buffer sizes 1 byte, block size 1 byte. | ||
| 2007 | @item As (2) but small-mode decompression (first 1700 files). | ||
| 2008 | @item As (2) with block size 2 bytes. | ||
| 2009 | @item As (2) with block size 3 bytes. | ||
| 2010 | @item As (2) with block size 4 bytes. | ||
| 2011 | @item As (2) with block size 5 bytes. | ||
| 2012 | @item As (2) with block size 6 bytes and small-mode decompression. | ||
| 2013 | @item H with normal buffer sizes (5000 bytes), normal block | ||
| 2014 | size (up to 900000 bytes), but with randomisation | ||
| 2015 | mechanism running intensely (randomising approximately every | ||
| 2016 | third byte). | ||
| 2017 | @item As (9) with small-mode decompression. | ||
| 2018 | @end enumerate | ||
| 2019 | Then some tests with unmodified source code. | ||
| 2020 | @enumerate | ||
| 2021 | @item H, all settings normal. | ||
| 2022 | @item As (1), with small-mode decompress. | ||
| 2023 | @item H, compress with flag @code{-1}. | ||
| 2024 | @item H, compress with flag @code{-s}, decompress with flag @code{-s}. | ||
| 2025 | @item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing, | ||
| 2026 | @code{bzip2-0.9.0} decompressing, all settings normal. | ||
| 2027 | @item Backwards compatibility: H, @code{bzip2-0.9.0} compressing, | ||
| 2028 | @code{bzip2-0.1pl2} decompressing, all settings normal. | ||
| 2029 | @item Bigger tests: A, all settings normal. | ||
| 2030 | @item P, all settings normal. | ||
| 2031 | @item Misc test: about 100 megabytes of @code{.tar} files with | ||
| 2032 | @code{bzip2} compiled with Purify. | ||
| 2033 | @item Misc tests to make sure it builds and runs ok on non-Linux/x86 | ||
| 2034 | platforms. | ||
| 2035 | @end enumerate | ||
| 2036 | These tests were conducted on a 205 MHz Cyrix 6x86MX machine, running | ||
| 2037 | Linux 2.0.32. They represent nearly a week of continuous computation. | ||
| 2038 | All tests completed successfully. | ||
| 2039 | |||
| 2040 | |||
| 2041 | @section Further reading | ||
| 2042 | @code{bzip2} is not research work, in the sense that it doesn't present | ||
| 2043 | any new ideas. Rather, it's an engineering exercise based on existing | ||
| 2044 | ideas. | ||
| 2045 | |||
| 2046 | Four documents describe essentially all the ideas behind @code{bzip2}: | ||
| 2047 | @example | ||
| 2048 | Michael Burrows and D. J. Wheeler: | ||
| 2049 | "A block-sorting lossless data compression algorithm" | ||
| 2050 | 10th May 1994. | ||
| 2051 | Digital SRC Research Report 124. | ||
| 2052 | ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz | ||
| 2053 | If you have trouble finding it, try searching at the | ||
| 2054 | New Zealand Digital Library, http://www.nzdl.org. | ||
| 2055 | |||
| 2056 | Daniel S. Hirschberg and Debra A. Lelewer | ||
| 2057 | "Efficient Decoding of Prefix Codes" | ||
| 2058 | Communications of the ACM, April 1990, Vol 33, Number 4. | ||
| 2059 | You might be able to get an electronic copy of this | ||
| 2060 | from the ACM Digital Library. | ||
| 2061 | |||
| 2062 | David J. Wheeler | ||
| 2063 | Program bred3.c and accompanying document bred3.ps. | ||
| 2064 | This contains the idea behind the multi-table Huffman | ||
| 2065 | coding scheme. | ||
| 2066 | ftp://ftp.cl.cam.ac.uk/pub/user/djw3/ | ||
| 2067 | |||
| 2068 | Jon L. Bentley and Robert Sedgewick | ||
| 2069 | "Fast Algorithms for Sorting and Searching Strings" | ||
| 2070 | Available from Sedgewick's web page, | ||
| 2071 | www.cs.princeton.edu/~rs | ||
| 2072 | @end example | ||
| 2073 | The following paper gives valuable additional insights into the | ||
| 2074 | algorithm, but is not immediately the basis of any code | ||
| 2075 | used in bzip2. | ||
| 2076 | @example | ||
| 2077 | Peter Fenwick: | ||
| 2078 | Block Sorting Text Compression | ||
| 2079 | Proceedings of the 19th Australasian Computer Science Conference, | ||
| 2080 | Melbourne, Australia. Jan 31 - Feb 2, 1996. | ||
| 2081 | ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps | ||
| 2082 | @end example | ||
| 2083 | Kunihiko Sadakane's sorting algorithm, mentioned above, | ||
| 2084 | is available from: | ||
| 2085 | @example | ||
| 2086 | http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz | ||
| 2087 | @end example | ||
| 2088 | The Manber-Myers suffix array construction | ||
| 2089 | algorithm is described in a paper | ||
| 2090 | available from: | ||
| 2091 | @example | ||
| 2092 | http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps | ||
| 2093 | @end example | ||
| 2094 | |||
| 2095 | |||
| 2096 | |||
| 2097 | @contents | ||
| 2098 | |||
| 2099 | @bye | ||
| 2100 | |||
diff --git a/randtable.c b/randtable.c new file mode 100644 index 0000000..27b34af --- /dev/null +++ b/randtable.c | |||
| @@ -0,0 +1,124 @@ | |||
| 1 | |||
| 2 | /*-------------------------------------------------------------*/ | ||
| 3 | /*--- Table for randomising repetitive blocks ---*/ | ||
| 4 | /*--- randtable.c ---*/ | ||
| 5 | /*-------------------------------------------------------------*/ | ||
| 6 | |||
| 7 | /*-- | ||
| 8 | This file is a part of bzip2 and/or libbzip2, a program and | ||
| 9 | library for lossless, block-sorting data compression. | ||
| 10 | |||
| 11 | Copyright (C) 1996-1998 Julian R Seward. All rights reserved. | ||
| 12 | |||
| 13 | Redistribution and use in source and binary forms, with or without | ||
| 14 | modification, are permitted provided that the following conditions | ||
| 15 | are met: | ||
| 16 | |||
| 17 | 1. Redistributions of source code must retain the above copyright | ||
| 18 | notice, this list of conditions and the following disclaimer. | ||
| 19 | |||
| 20 | 2. The origin of this software must not be misrepresented; you must | ||
| 21 | not claim that you wrote the original software. If you use this | ||
| 22 | software in a product, an acknowledgment in the product | ||
| 23 | documentation would be appreciated but is not required. | ||
| 24 | |||
| 25 | 3. Altered source versions must be plainly marked as such, and must | ||
| 26 | not be misrepresented as being the original software. | ||
| 27 | |||
| 28 | 4. The name of the author may not be used to endorse or promote | ||
| 29 | products derived from this software without specific prior written | ||
| 30 | permission. | ||
| 31 | |||
| 32 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | ||
| 33 | OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | ||
| 34 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
| 35 | ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | ||
| 36 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
| 37 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | ||
| 38 | GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
| 39 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| 40 | WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | ||
| 41 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | ||
| 42 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| 43 | |||
| 44 | Julian Seward, Guildford, Surrey, UK. | ||
| 45 | jseward@acm.org | ||
| 46 | bzip2/libbzip2 version 0.9.0c of 18 October 1998 | ||
| 47 | |||
| 48 | This program is based on (at least) the work of: | ||
| 49 | Mike Burrows | ||
| 50 | David Wheeler | ||
| 51 | Peter Fenwick | ||
| 52 | Alistair Moffat | ||
| 53 | Radford Neal | ||
| 54 | Ian H. Witten | ||
| 55 | Robert Sedgewick | ||
| 56 | Jon L. Bentley | ||
| 57 | |||
| 58 | For more information on these sources, see the manual. | ||
| 59 | --*/ | ||
| 60 | |||
| 61 | |||
| 62 | #include "bzlib_private.h" | ||
| 63 | |||
| 64 | |||
| 65 | /*---------------------------------------------*/ | ||
| 66 | Int32 rNums[512] = { | ||
| 67 | 619, 720, 127, 481, 931, 816, 813, 233, 566, 247, | ||
| 68 | 985, 724, 205, 454, 863, 491, 741, 242, 949, 214, | ||
| 69 | 733, 859, 335, 708, 621, 574, 73, 654, 730, 472, | ||
| 70 | 419, 436, 278, 496, 867, 210, 399, 680, 480, 51, | ||
| 71 | 878, 465, 811, 169, 869, 675, 611, 697, 867, 561, | ||
| 72 | 862, 687, 507, 283, 482, 129, 807, 591, 733, 623, | ||
| 73 | 150, 238, 59, 379, 684, 877, 625, 169, 643, 105, | ||
| 74 | 170, 607, 520, 932, 727, 476, 693, 425, 174, 647, | ||
| 75 | 73, 122, 335, 530, 442, 853, 695, 249, 445, 515, | ||
| 76 | 909, 545, 703, 919, 874, 474, 882, 500, 594, 612, | ||
| 77 | 641, 801, 220, 162, 819, 984, 589, 513, 495, 799, | ||
| 78 | 161, 604, 958, 533, 221, 400, 386, 867, 600, 782, | ||
| 79 | 382, 596, 414, 171, 516, 375, 682, 485, 911, 276, | ||
| 80 | 98, 553, 163, 354, 666, 933, 424, 341, 533, 870, | ||
| 81 | 227, 730, 475, 186, 263, 647, 537, 686, 600, 224, | ||
| 82 | 469, 68, 770, 919, 190, 373, 294, 822, 808, 206, | ||
| 83 | 184, 943, 795, 384, 383, 461, 404, 758, 839, 887, | ||
| 84 | 715, 67, 618, 276, 204, 918, 873, 777, 604, 560, | ||
| 85 | 951, 160, 578, 722, 79, 804, 96, 409, 713, 940, | ||
| 86 | 652, 934, 970, 447, 318, 353, 859, 672, 112, 785, | ||
| 87 | 645, 863, 803, 350, 139, 93, 354, 99, 820, 908, | ||
| 88 | 609, 772, 154, 274, 580, 184, 79, 626, 630, 742, | ||
| 89 | 653, 282, 762, 623, 680, 81, 927, 626, 789, 125, | ||
| 90 | 411, 521, 938, 300, 821, 78, 343, 175, 128, 250, | ||
| 91 | 170, 774, 972, 275, 999, 639, 495, 78, 352, 126, | ||
| 92 | 857, 956, 358, 619, 580, 124, 737, 594, 701, 612, | ||
| 93 | 669, 112, 134, 694, 363, 992, 809, 743, 168, 974, | ||
| 94 | 944, 375, 748, 52, 600, 747, 642, 182, 862, 81, | ||
| 95 | 344, 805, 988, 739, 511, 655, 814, 334, 249, 515, | ||
| 96 | 897, 955, 664, 981, 649, 113, 974, 459, 893, 228, | ||
| 97 | 433, 837, 553, 268, 926, 240, 102, 654, 459, 51, | ||
| 98 | 686, 754, 806, 760, 493, 403, 415, 394, 687, 700, | ||
| 99 | 946, 670, 656, 610, 738, 392, 760, 799, 887, 653, | ||
| 100 | 978, 321, 576, 617, 626, 502, 894, 679, 243, 440, | ||
| 101 | 680, 879, 194, 572, 640, 724, 926, 56, 204, 700, | ||
| 102 | 707, 151, 457, 449, 797, 195, 791, 558, 945, 679, | ||
| 103 | 297, 59, 87, 824, 713, 663, 412, 693, 342, 606, | ||
| 104 | 134, 108, 571, 364, 631, 212, 174, 643, 304, 329, | ||
| 105 | 343, 97, 430, 751, 497, 314, 983, 374, 822, 928, | ||
| 106 | 140, 206, 73, 263, 980, 736, 876, 478, 430, 305, | ||
| 107 | 170, 514, 364, 692, 829, 82, 855, 953, 676, 246, | ||
| 108 | 369, 970, 294, 750, 807, 827, 150, 790, 288, 923, | ||
| 109 | 804, 378, 215, 828, 592, 281, 565, 555, 710, 82, | ||
| 110 | 896, 831, 547, 261, 524, 462, 293, 465, 502, 56, | ||
| 111 | 661, 821, 976, 991, 658, 869, 905, 758, 745, 193, | ||
| 112 | 768, 550, 608, 933, 378, 286, 215, 979, 792, 961, | ||
| 113 | 61, 688, 793, 644, 986, 403, 106, 366, 905, 644, | ||
| 114 | 372, 567, 466, 434, 645, 210, 389, 550, 919, 135, | ||
| 115 | 780, 773, 635, 389, 707, 100, 626, 958, 165, 504, | ||
| 116 | 920, 176, 193, 713, 857, 265, 203, 50, 668, 108, | ||
| 117 | 645, 990, 626, 197, 510, 357, 358, 850, 858, 364, | ||
| 118 | 936, 638 | ||
| 119 | }; | ||
| 120 | |||
| 121 | |||
| 122 | /*-------------------------------------------------------------*/ | ||
| 123 | /*--- end randtable.c ---*/ | ||
| 124 | /*-------------------------------------------------------------*/ | ||
diff --git a/test.bat b/test.bat deleted file mode 100644 index 30b747d..0000000 --- a/test.bat +++ /dev/null | |||
| @@ -1,9 +0,0 @@ | |||
| 1 | @rem | ||
| 2 | @rem MSDOS test driver for bzip2 | ||
| 3 | @rem | ||
| 4 | type words1 | ||
| 5 | .\bzip2 -1 < sample1.ref > sample1.rbz | ||
| 6 | .\bzip2 -2 < sample2.ref > sample2.rbz | ||
| 7 | .\bzip2 -dvv < sample1.bz2 > sample1.tst | ||
| 8 | .\bzip2 -dvv < sample2.bz2 > sample2.tst | ||
| 9 | type words3sh | ||
| \ No newline at end of file | |||
diff --git a/test.cmd b/test.cmd deleted file mode 100644 index f7bc866..0000000 --- a/test.cmd +++ /dev/null | |||
| @@ -1,9 +0,0 @@ | |||
| 1 | @rem | ||
| 2 | @rem OS/2 test driver for bzip2 | ||
| 3 | @rem | ||
| 4 | type words1 | ||
| 5 | .\bzip2 -1 < sample1.ref > sample1.rbz | ||
| 6 | .\bzip2 -2 < sample2.ref > sample2.rbz | ||
| 7 | .\bzip2 -dvv < sample1.bz2 > sample1.tst | ||
| 8 | .\bzip2 -dvv < sample2.bz2 > sample2.tst | ||
| 9 | type words3sh | ||
| \ No newline at end of file | |||
diff --git a/words0 b/words0 deleted file mode 100644 --- a/words0 +++ /dev/null | |||
| @@ -1,7 +0,0 @@ | |||
| 1 | ***-------------------------------------------------*** | ||
| 2 | ***--------- IMPORTANT: READ WHAT FOLLOWS! ---------*** | ||
| 3 | ***--------- viz: pay attention :-) ---------*** | ||
| 4 | ***-------------------------------------------------*** | ||
| 5 | |||
| 6 | Compiling bzip2 ... | ||
| 7 | |||
diff --git a/words1 b/words1 --- a/words1 +++ b/words1 | |||
| @@ -1,5 +1,4 @@ | |||
| 1 | 1 | ||
| 2 | |||
| 3 | Doing 4 tests (2 compress, 2 uncompress) ... | 2 | Doing 4 tests (2 compress, 2 uncompress) ... |
| 4 | If there's a problem, things might stop at this point. | 3 | If there's a problem, things might stop at this point. |
| 5 | 4 | ||
diff --git a/words2 b/words2 --- a/words2 +++ b/words2 | |||
| @@ -1,5 +1,4 @@ | |||
| 1 | 1 | ||
| 2 | |||
| 3 | Checking test results. If any of the four "cmp"s which follow | 2 | Checking test results. If any of the four "cmp"s which follow |
| 4 | report any differences, something is wrong. If you can't easily | 3 | report any differences, something is wrong. If you can't easily |
| 5 | figure out what, please let me know (jseward@acm.org). | 4 | figure out what, please let me know (jseward@acm.org). |
diff --git a/words3 b/words3 --- a/words3 +++ b/words3 | |||
| @@ -1,23 +1,20 @@ | |||
| 1 | 1 | ||
| 2 | |||
| 3 | If you got this far and the "cmp"s didn't find anything amiss, looks | 2 | If you got this far and the "cmp"s didn't find anything amiss, looks |
| 4 | like you're in business. You should install bzip2 and bunzip2: | 3 | like you're in business. You should install bzip2, bunzip2 and bzcat: |
| 5 | 4 | ||
| 6 | copy bzip2 to a public place, maybe /usr/bin. | 5 | Copy bzip2 and bzip2recover to a public place, maybe /usr/bin. |
| 7 | In that public place, make bunzip2 a symbolic link | 6 | In that public place, make bunzip2 and bzcat be |
| 8 | to the bzip2 you just copied there. | 7 | symbolic links to the bzip2 you just copied there. |
| 9 | Put the manual page, bzip2.1, somewhere appropriate; | 8 | Put the manual page, bzip2.1, somewhere appropriate; |
| 10 | perhaps in /usr/man/man1. | 9 | perhaps in /usr/man/man1. |
| 11 | 10 | ||
| 12 | Complete instructions for use are in the preformatted | 11 | Instructions for use are in the preformatted manual page, in the file |
| 13 | manual page, in the file bzip2.1.preformatted. | 12 | bzip2.txt. For more detailed documentation, read the full manual. |
| 13 | It is available in Postscript form (manual.ps) and HTML form | ||
| 14 | (manual_toc.html). | ||
| 14 | 15 | ||
| 15 | You can also do "bzip2 --help" to see some helpful information. | 16 | You can also do "bzip2 --help" to see some helpful information. |
| 16 | |||
| 17 | "bzip2 -L" displays the software license. | 17 | "bzip2 -L" displays the software license. |
| 18 | 18 | ||
| 19 | Please read the README file carefully. | 19 | Happy compressing. -- JRS, 30 August 1998. |
| 20 | Finally, note that bzip2 comes with ABSOLUTELY NO WARRANTY. | ||
| 21 | |||
| 22 | Happy compressing! | ||
| 23 | 20 | ||
diff --git a/words3sh b/words3sh deleted file mode 100644 index 1139177..0000000 --- a/words3sh +++ /dev/null | |||
| @@ -1,12 +0,0 @@ | |||
| 1 | If you got this far and the "bzip2 -dvv"s give identical | ||
| 2 | stored vs computed CRCs, you're probably in business. | ||
| 3 | Complete instructions for use are in the preformatted manual page, | ||
| 4 | in the file bzip2.txt. | ||
| 5 | |||
| 6 | You can also do "bzip2 --help" to see some helpful information. | ||
| 7 | "bzip2 -L" displays the software license. | ||
| 8 | |||
| 9 | Please read the README file carefully. | ||
| 10 | Finally, note that bzip2 comes with ABSOLUTELY NO WARRANTY. | ||
| 11 | |||
| 12 | Happy compressing! | ||
| \ No newline at end of file | |||
