GREETINGS!

This is the README for bzip2, my block-sorting file compressor,
version 0.1.

bzip2 is distributed under the GNU General Public License version 2;
for details, see the file LICENSE. Pointers to the algorithms used
are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted.

Please read this file carefully.



HOW TO BUILD

-- for UNIX:

Type `make'. (Tough, huh? :-)

This creates the binaries "bzip2" and "bunzip2";
"bunzip2" is a symbolic link to "bzip2".

It also runs four compress-decompress tests to make sure
things are working properly. If all goes well, you should be up and
running. Please read the output from `make' just to be sure
that the tests passed.
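
Each of these tests amounts to checking that decompressing a compressed
file gives back the original, byte for byte. That final comparison step
can be sketched in C as below. The function names are hypothetical, and
this is an illustration of the idea, not the code the Makefile runs.

```c
#include <stdio.h>

/* Return 1 if the two open streams contain exactly the same bytes,
   0 if they differ in content or length. Both streams are read from
   their current position to end-of-file. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;
    do {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb)
            return 0;      /* mismatch, or one stream ended early */
    } while (ca != EOF);
    return 1;              /* both streams reached EOF together */
}

/* Compare two files by name. Returns 1 if identical, 0 if not,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;
    if (a != NULL && b != NULL)
        result = streams_identical(a, b);
    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

After compressing a file and decompressing the result into a second
file, files_identical() on the original and the decompressed copy
should return 1; anything else means the round trip failed.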

To install bzip2 properly:

-- Copy the binary "bzip2" to a publicly visible place,
   possibly /usr/bin, /usr/common/bin or /usr/local/bin.

-- In that directory, make "bunzip2" a symbolic link
   to "bzip2".

-- Copy the manual page, bzip2.1, to the relevant place.
   Probably the right place is /usr/man/man1/.
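
The symbolic link works because a program can see the name it was
invoked under in argv[0]. Below is a minimal sketch of how one binary
might pick its default mode from that name, assuming Unix-style '/'
path separators. The enum, the function name and the "unzip" substring
test are illustrative assumptions, not necessarily what bzip2.c does.

```c
#include <string.h>

/* Hypothetical operating modes for this sketch. */
enum mode { MODE_COMPRESS, MODE_DECOMPRESS };

/* Choose a default mode from the name the program was invoked under.
   Only the final path component is inspected, so "bunzip2" and
   "/usr/local/bin/bunzip2" behave the same. Assumes '/' separators. */
enum mode mode_from_progname(const char *argv0)
{
    const char *base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;
    if (strstr(base, "unzip") != NULL)
        return MODE_DECOMPRESS;
    return MODE_COMPRESS;
}
```

In main(), such a function would be called on argv[0] before any flags
are parsed, so that explicit options can still override the default.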

-- for Windows 95 and NT:

For a start, do you *really* want to recompile bzip2?
The standard distribution includes a pre-compiled version
for Windows 95 and NT, `bzip2.exe'.

This executable was created with Jacob Navia's excellent
Win32 port of "lcc", the ANSI C compiler by Chris Fraser
and David Hanson. You can find it via the web pages of the
Computer Science department of Princeton University,
www.cs.princeton.edu.
I have not tried to compile this version of bzip2 with
a commercial C compiler such as MS Visual C, as I don't
have one available.

Note that lcc is designed primarily to be portable and
fast. Code quality is a secondary aim, so bzip2.exe
runs perhaps 40% slower than it could if compiled with
a good optimising compiler.

I compiled a previous version of bzip (0.21) with Borland
C 5.0, which worked fine, and with MS VC++ 2.0, which
didn't. Here is a comment from the README for bzip-0.21:

   MS VC++ 2.0's optimising compiler has a bug which, at
   maximum optimisation, gives an executable which produces
   garbage compressed files. Proceed with caution.
   I do not know whether or not this happens with later
   versions of VC++.

   Edit the defines starting at line 86 of bzip.c to
   select your platform/compiler combination, and then compile.
   Then check that the resulting executable (assumed to be
   called bzip.exe) works correctly, using the SELFTEST.BAT file.
   Bearing in mind the previous paragraph, the self-test is
   important.

Note that the defines which bzip-0.21 had, to support
compilation with VC 2.0 and BC 5.0, are gone. Windows
is not my preferred operating system, and I am, for the
moment, content with the modestly fast executable created
by lcc-win32.

A manual page is supplied unformatted (bzip2.1),
preformatted (bzip2.1.preformatted), and preformatted
and sanitised for MS-DOS (bzip2.txt).



COMPILATION NOTES

bzip2 should work on any 32- or 64-bit machine. It is known to work
[meaning: it has compiled and passed self-tests] on the
following platform/OS combinations:

   Intel i386/i486 running Linux 2.0.21
   Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5
   Intel i386/i486 running Windows 95 and NT
   DEC Alpha running Digital Unix 4.0

Following the release of bzip-0.21, many people mailed me
from around the world to say they had made it work on all sorts
of weird and wonderful machines. Chances are, if you have
a reasonable ANSI C compiler and a 32-bit machine, you can
get it to work.
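
Much of that portability rests on the basic integer types having the
sizes the code expects. Below is a sketch of how such assumptions can be
pinned down and checked in ANSI C. The typedef names are hypothetical,
and the check assumes a target where int is 32 bits wide, which holds
for the usual 32- and 64-bit Unix and Windows compilers.

```c
#include <limits.h>

/* Hypothetical fixed-width names in pre-C99 style; assumes int is
   32 bits and char is 8 bits on the target. */
typedef int           Int32;
typedef unsigned int  UInt32;
typedef unsigned char UChar;

/* Compile-time check: if unsigned int is not exactly 32 bits wide,
   the array size becomes -1 and compilation fails. */
typedef char int_is_32_bits[(UINT_MAX == 0xFFFFFFFFu) ? 1 : -1];

/* Run-time check of the same assumptions, plus 8-bit chars.
   Returns 1 if all hold, 0 otherwise. */
int types_ok(void)
{
    return CHAR_BIT == 8
        && sizeof(Int32) * CHAR_BIT == 32
        && sizeof(UInt32) * CHAR_BIT == 32;
}
```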

The #defines starting at around line 82 of bzip2.c supply some
degree of platform independence. If you configure bzip2 for some
new far-out platform which is not covered by the existing definitions,
please send me the relevant definitions.
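
As an illustration of the kind of definitions meant, here is a sketch
that selects a path separator per platform. The macro names BZ_UNIX,
BZ_LCCWIN32 and PATH_SEP are assumptions chosen for this example, and
join_path is a hypothetical helper; bzip2.c itself is the authority on
the real definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical platform switches: set exactly one of these to 1. */
#define BZ_UNIX      1
#define BZ_LCCWIN32  0

#if BZ_UNIX
#  define PATH_SEP '/'
#elif BZ_LCCWIN32
#  define PATH_SEP '\\'
#else
#  error "no platform configured: set BZ_UNIX or BZ_LCCWIN32 to 1"
#endif

/* Join dir and file with the platform separator into out, which has
   room for size bytes. Returns 0 on success, -1 if it would not fit. */
int join_path(char *out, size_t size, const char *dir, const char *file)
{
    size_t need = strlen(dir) + 1 + strlen(file) + 1;
    if (need > size)
        return -1;
    sprintf(out, "%s%c%s", dir, PATH_SEP, file);
    return 0;
}
```

Porting to a new platform then means adding one more switch and its
separator, rather than touching every place that builds a file name.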

I recommend GNU C for compilation. The code is standard ANSI C,
except for the Unix-specific file handling, so any ANSI C compiler
should work. Note, however, that the many routines marked INLINE
should be inlined by your compiler, or performance will be very
poor. Asking your compiler to unroll loops gives some
small improvement too; for gcc, the relevant flag is
-funroll-loops.

On 386/486 machines, I'd recommend giving gcc the
-fomit-frame-pointer flag; this frees another register for
allocation, which measurably improves performance.
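
A common way to write such an INLINE marker, so that it helps under gcc
without breaking other ANSI C compilers, is a conditional macro. This is
a sketch under that assumption; the actual definition in bzip2.c may
differ, and swap_ints is a hypothetical helper used only to show the
macro in use.

```c
/* Expand INLINE to gcc's __inline__ keyword under GNU C, and to
   nothing elsewhere, so plain ANSI C compilers still accept the code
   (they simply may not inline it). */
#ifdef __GNUC__
#  define INLINE __inline__
#else
#  define INLINE
#endif

/* A small helper of the kind worth inlining inside a hot loop.
   Declared static so no separate external definition is needed
   when the compiler decides not to inline it. */
static INLINE void swap_ints(int *a, int i, int j)
{
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}
```

With gcc, the flags mentioned above would be combined along the lines
of `gcc -O2 -funroll-loops -fomit-frame-pointer -c bzip2.c`; the -O2
level is an assumption, since this README does not name one.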

I used the above-mentioned lcc compiler to develop bzip2.
I would highly recommend this compiler for day-to-day development;
it is fast, reliable, lightweight, has an excellent profiler,
and is generally excellent. And it's fun to retarget, if you're
into that kind of thing.

If you compile bzip2 on a new platform or with a new compiler,
please be sure to run the four compress-decompress tests, either
using the Makefile, or with the test.bat (MS-DOS) or test.cmd (OS/2)
files. Some compilers have been seen to introduce subtle bugs
when optimising, so this check is important. Ideally you should
then go on to test bzip2 on a file several megabytes or even
tens of megabytes long, just to be 110% sure. ``Professional
programmers are paranoid programmers.'' (anon).



VALIDATION

Correct operation, in the sense that a compressed file can always be
decompressed to reproduce the original, is obviously of paramount
importance. To validate bzip2, I used a modified version of
Mark Nelson's churn program. Churn is an automated test driver
which recursively traverses a directory structure, using bzip2 to
compress and then decompress each file it encounters, and checking
that the decompressed data is the same as the original. As test
material, I used several runs over several filesystems of differing
sizes.
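
The core of a churn-style driver is a recursive directory walk that
hands each regular file to a check routine. Below is a minimal POSIX
sketch (opendir/readdir/lstat are POSIX, not ANSI C). walk_tree and
count_file are hypothetical names; in a real driver the callback would
compress the file, decompress the result and compare it with the
original, returning nonzero on any mismatch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Called once per regular file; returns nonzero to report a failure. */
typedef int (*file_check_fn)(const char *path);

/* Recursively walk dir, calling check() on every regular file.
   Returns the number of files for which check() reported failure.
   Symbolic links are not followed (lstat), so cycles cannot occur. */
long walk_tree(const char *dir, file_check_fn check)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    struct stat st;
    char path[4096];
    long failures = 0;

    if (d == NULL)
        return 0;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (strlen(dir) + 1 + strlen(e->d_name) + 1 > sizeof path)
            continue;                  /* name too long for buffer */
        sprintf(path, "%s/%s", dir, e->d_name);
        if (lstat(path, &st) != 0)
            continue;
        if (S_ISDIR(st.st_mode))
            failures += walk_tree(path, check);
        else if (S_ISREG(st.st_mode) && check(path) != 0)
            failures++;
    }
    closedir(d);
    return failures;
}

/* Trivial stand-in check: "fails" every file, so walk_tree()
   simply returns the number of regular files it visited. */
int count_file(const char *path)
{
    (void)path;
    return 1;
}
```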

One set of tests was done on my base Linux filesystem,
410 megabytes in 23,000 files. There were several runs over
this filesystem, in various configurations designed to break bzip2.
That filesystem also contained some specially constructed test
files designed to exercise boundary cases in the code.
These included files of zero length, various long, highly repetitive
files, and some files which generate blocks with all values the same.
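
Boundary-case inputs like these are easy to generate. Below is a sketch
of hypothetical helpers producing constant-valued data, long repetitive
data and zero-length files; the names and patterns are illustrative
assumptions, not the test files actually used.

```c
#include <stdio.h>
#include <string.h>

/* Every byte the same: yields blocks with all values identical. */
void fill_constant(unsigned char *buf, size_t n, unsigned char value)
{
    memset(buf, value, n);
}

/* A short pattern repeated over and over: long, highly repetitive
   data. plen must be greater than zero. */
void fill_repeating(unsigned char *buf, size_t n,
                    const unsigned char *pattern, size_t plen)
{
    size_t i;
    for (i = 0; i < n; i++)
        buf[i] = pattern[i % plen];
}

/* Write n bytes of buf to path; n == 0 produces a zero-length file.
   Returns 0 on success, -1 on any error. */
int write_test_file(const char *path, const unsigned char *buf, size_t n)
{
    FILE *f = fopen(path, "wb");
    size_t written;
    if (f == NULL)
        return -1;
    written = (n > 0) ? fwrite(buf, 1, n, f) : 0;
    if (fclose(f) != 0 || written != n)
        return -1;
    return 0;
}
```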

The other set of tests was done just with the "normal" configuration,
but on a much larger quantity of data.

Tests are:

   Linux FS, 410M, 23000 files

   As above, with --repetitive-fast

   As above, with -1

   Low-level disk image of a disk containing
   Windows NT 4.0; 420M in a single huge file

   Linux distribution, including Slackware and
   all GNU sources; 1900M in 2300 files

   Approx. 100M of compiler sources and related
   programming tools, running under Purify

   About 500M of data in 120 files of around
   4M each. This is raw data from a
   biomagnetometer (SQUID-based thing).

Overall, the total volume of test data is about
3300 megabytes in 25000 files.
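
The -1 run above exercises a smaller block size. Assuming, as bzip2's
manual page describes, that the flags -1 to -9 select block sizes of
100k to 900k, the mapping is simple arithmetic. block_size_for_flag is
a hypothetical helper, and taking "100k" to mean 100,000 bytes is an
assumption of this sketch.

```c
/* Map a level flag character '1'..'9' to a block size in bytes,
   assuming each step is 100,000 bytes. Returns -1 for any other
   character. The digits '0'..'9' are contiguous in every C
   character set, so level - '0' gives the numeric value. */
long block_size_for_flag(char level)
{
    if (level < '1' || level > '9')
        return -1;
    return (long)(level - '0') * 100000L;
}
```

Smaller blocks need less memory but generally compress somewhat less
well, which is why a separate run at -1 is a useful extra test.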

The distribution does four tests after building bzip2. These tests
include test decompressions of pre-supplied compressed files, so
they test not only that bzip2 works correctly on the machine it was
built on, but also that it can decompress files compressed on a
different machine. This guards against unforeseen interoperability
problems.


Please read and be aware of the following:

WARNING:

This program (attempts to) compress data by performing several
non-trivial transformations on it. Unless you are 100% familiar
with *all* the algorithms contained herein, and with the
consequences of modifying them, you should NOT meddle with the
compression or decompression machinery. Incorrect changes can and
very likely *will* lead to disastrous loss of data.


DISCLAIMER:

I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
USE OF THIS PROGRAM, HOWSOEVER CAUSED.

Every compression of a file implies an assumption that the
compressed file can be decompressed to reproduce the original.
Great efforts in design, coding and testing have been made to
ensure that this program works correctly. However, the complexity
of the algorithms, and, in particular, the presence of various
special cases in the code which occur with very low but non-zero
probability make it impossible to rule out the possibility of bugs
remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS
PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.

That is not to say this program is inherently unreliable. Indeed,
I very much hope the opposite is true. bzip2 has been carefully
constructed and extensively tested.

End of nasty legalities.


I hope you find bzip2 useful. Feel free to contact me at
   jseward@acm.org
if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of 0.15 and 0.21,
and the changes in bzip2 are largely a result of this feedback.
I thank you for your comments.

Julian Seward

Manchester, UK
18 July 1996 (version 0.15)
25 August 1996 (version 0.21)

Guildford, Surrey, UK
7 August 1997 (bzip2, version 0.0)
