Diffstat (limited to 'README')
| -rw-r--r-- | README | 230 |
1 files changed, 57 insertions, 173 deletions
| @@ -1,194 +1,61 @@ | |||
| 1 | 1 | ||
| 2 | GREETINGS! | ||
| 3 | 2 | ||
| 4 | This is the README for bzip2, my block-sorting file compressor, | 3 | This is the README for bzip2, a block-sorting file compressor, version |
| 5 | version 0.1. | 4 | 0.9.0. This version is fully compatible with the previous public |
| 5 | release, bzip2-0.1pl2. | ||
| 6 | 6 | ||
| 7 | bzip2 is distributed under the GNU General Public License version 2; | 7 | bzip2-0.9.0 is distributed under a BSD-style license. For details, |
| 8 | for details, see the file LICENSE. Pointers to the algorithms used | 8 | see the file LICENSE. |
| 9 | are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted. | ||
| 10 | 9 | ||
| 11 | Please read all of this file carefully. | 10 | Complete documentation is available in Postscript form (manual.ps) |
| 11 | or html (manual_toc.html). A plain-text version of the manual page is | ||
| 12 | available as bzip2.txt. | ||
| 12 | 13 | ||
| 13 | 14 | ||
| 15 | HOW TO BUILD -- UNIX | ||
| 14 | 16 | ||
| 15 | HOW TO BUILD | 17 | Type `make'. |
| 16 | 18 | ||
| 17 | -- for UNIX: | 19 | This creates binaries "bzip2" and "bzip2recover". |
| 18 | 20 | ||
| 19 | Type `make'. (tough, huh? :-) | 21 | It also runs four compress-decompress tests to make sure things are |
| 22 | working properly. If all goes well, you should be up & running. | ||
| 23 | Please be sure to read the output from `make' just to be sure that the | ||
| 24 | tests went ok. | ||
| 20 | 25 | ||
| 21 | This creates binaries "bzip2", and "bunzip2", | 26 | To install bzip2 properly: |
| 22 | which is a symbolic link to "bzip2". | ||
| 23 | 27 | ||
| 24 | It also runs four compress-decompress tests to make sure | 28 | * Copy the binaries "bzip2" and "bzip2recover" to a publically visible |
| 25 | things are working properly. If all goes well, you should be up & | 29 | place, possibly /usr/bin or /usr/local/bin. |
| 26 | running. Please be sure to read the output from `make' | ||
| 27 | just to be sure that the tests went ok. | ||
| 28 | 30 | ||
| 29 | To install bzip2 properly: | 31 | * In that directory, make "bunzip2" and "bzcat" be symbolic links |
| 32 | to "bzip2". | ||
| 30 | 33 | ||
| 31 | -- Copy the binary "bzip2" to a publically visible place, | 34 | * Copy the manual page, bzip2.1, to the relevant place. |
| 32 | possibly /usr/bin, /usr/common/bin or /usr/local/bin. | 35 | Probably the right place is /usr/man/man1/. |
| 33 | |||
| 34 | -- In that directory, make "bunzip2" be a symbolic link | ||
| 35 | to "bzip2". | ||
| 36 | |||
| 37 | -- Copy the manual page, bzip2.1, to the relevant place. | ||
| 38 | Probably the right place is /usr/man/man1/. | ||
| 39 | |||
| 40 | -- for Windows 95 and NT: | ||
| 41 | 36 | ||
| 42 | For a start, do you *really* want to recompile bzip2? | 37 | If you want to program with the library, you'll need to copy libbz2.a |
| 43 | The standard distribution includes a pre-compiled version | 38 | and bzlib.h to /usr/lib and /usr/include respectively. |
| 44 | for Windows 95 and NT, `bzip2.exe'. | 39 | |
| 45 | 40 | ||
| 46 | This executable was created with Jacob Navia's excellent | 41 | HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc. |
| 47 | port to Win32 of Chris Fraser & David Hanson's excellent | ||
| 48 | ANSI C compiler, "lcc". You can get to it at the pages | ||
| 49 | of the CS department of Princeton University, | ||
| 50 | www.cs.princeton.edu. | ||
| 51 | I have not tried to compile this version of bzip2 with | ||
| 52 | a commercial C compiler such as MS Visual C, as I don't | ||
| 53 | have one available. | ||
| 54 | |||
| 55 | Note that lcc is designed primarily to be portable and | ||
| 56 | fast. Code quality is a secondary aim, so bzip2.exe | ||
| 57 | runs perhaps 40% slower than it could if compiled with | ||
| 58 | a good optimising compiler. | ||
| 59 | |||
| 60 | I compiled a previous version of bzip (0.21) with Borland | ||
| 61 | C 5.0, which worked fine, and with MS VC++ 2.0, which | ||
| 62 | didn't. Here is an comment from the README for bzip-0.21. | ||
| 63 | |||
| 64 | MS VC++ 2.0's optimising compiler has a bug which, at | ||
| 65 | maximum optimisation, gives an executable which produces | ||
| 66 | garbage compressed files. Proceed with caution. | ||
| 67 | I do not know whether or not this happens with later | ||
| 68 | versions of VC++. | ||
| 69 | |||
| 70 | Edit the defines starting at line 86 of bzip.c to | ||
| 71 | select your platform/compiler combination, and then compile. | ||
| 72 | Then check that the resulting executable (assumed to be | ||
| 73 | called bzip.exe) works correctly, using the SELFTEST.BAT file. | ||
| 74 | Bearing in mind the previous paragraph, the self-test is | ||
| 75 | important. | ||
| 76 | |||
| 77 | Note that the defines which bzip-0.21 had, to support | ||
| 78 | compilation with VC 2.0 and BC 5.0, are gone. Windows | ||
| 79 | is not my preferred operating system, and I am, for the | ||
| 80 | moment, content with the modestly fast executable created | ||
| 81 | by lcc-win32. | ||
| 82 | |||
| 83 | A manual page is supplied, unformatted (bzip2.1), | ||
| 84 | preformatted (bzip2.1.preformatted), and preformatted | ||
| 85 | and sanitised for MS-DOS (bzip2.txt). | ||
| 86 | |||
| 87 | |||
| 88 | |||
| 89 | COMPILATION NOTES | ||
| 90 | |||
| 91 | bzip2 should work on any 32 or 64-bit machine. It is known to work | ||
| 92 | [meaning: it has compiled and passed self-tests] on the | ||
| 93 | following platform-os combinations: | ||
| 94 | |||
| 95 | Intel i386/i486 running Linux 2.0.21 | ||
| 96 | Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5 | ||
| 97 | Intel i386/i486 running Windows 95 and NT | ||
| 98 | DEC Alpha running Digital Unix 4.0 | ||
| 99 | |||
| 100 | Following the release of bzip-0.21, many people mailed me | ||
| 101 | from around the world to say they had made it work on all sorts | ||
| 102 | of weird and wonderful machines. Chances are, if you have | ||
| 103 | a reasonable ANSI C compiler and a 32-bit machine, you can | ||
| 104 | get it to work. | ||
| 105 | |||
| 106 | The #defines starting at around line 82 of bzip2.c supply some | ||
| 107 | degree of platform-independance. If you configure bzip2 for some | ||
| 108 | new far-out platform which is not covered by the existing definitions, | ||
| 109 | please send me the relevant definitions. | ||
| 110 | |||
| 111 | I recommend GNU C for compilation. The code is standard ANSI C, | ||
| 112 | except for the Unix-specific file handling, so any ANSI C compiler | ||
| 113 | should work. Note however that the many routines marked INLINE | ||
| 114 | should be inlined by your compiler, else performance will be very | ||
| 115 | poor. Asking your compiler to unroll loops gives some | ||
| 116 | small improvement too; for gcc, the relevant flag is | ||
| 117 | -funroll-loops. | ||
| 118 | |||
| 119 | On a 386/486 machines, I'd recommend giving gcc the | ||
| 120 | -fomit-frame-pointer flag; this liberates another register for | ||
| 121 | allocation, which measurably improves performance. | ||
| 122 | |||
| 123 | I used the abovementioned lcc compiler to develop bzip2. | ||
| 124 | I would highly recommend this compiler for day-to-day development; | ||
| 125 | it is fast, reliable, lightweight, has an excellent profiler, | ||
| 126 | and is generally excellent. And it's fun to retarget, if you're | ||
| 127 | into that kind of thing. | ||
| 128 | |||
| 129 | If you compile bzip2 on a new platform or with a new compiler, | ||
| 130 | please be sure to run the four compress-decompress tests, either | ||
| 131 | using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2) | ||
| 132 | files. Some compilers have been seen to introduce subtle bugs | ||
| 133 | when optimising, so this check is important. Ideally you should | ||
| 134 | then go on to test bzip2 on a file several megabytes or even | ||
| 135 | tens of megabytes long, just to be 110% sure. ``Professional | ||
| 136 | programmers are paranoid programmers.'' (anon). | ||
| 137 | 42 | ||
| 43 | It's difficult for me to support compilation on all these platforms. | ||
| 44 | My approach is to collect binaries for these platforms, and put them | ||
| 45 | on my web page (http://www.muraroa.demon.co.uk). Look there. | ||
| 138 | 46 | ||
| 139 | 47 | ||
| 140 | VALIDATION | 48 | VALIDATION |
| 141 | 49 | ||
| 142 | Correct operation, in the sense that a compressed file can always be | 50 | Correct operation, in the sense that a compressed file can always be |
| 143 | decompressed to reproduce the original, is obviously of paramount | 51 | decompressed to reproduce the original, is obviously of paramount |
| 144 | importance. To validate bzip2, I used a modified version of | 52 | importance. To validate bzip2, I used a modified version of Mark |
| 145 | Mark Nelson's churn program. Churn is an automated test driver | 53 | Nelson's churn program. Churn is an automated test driver which |
| 146 | which recursively traverses a directory structure, using bzip2 to | 54 | recursively traverses a directory structure, using bzip2 to compress |
| 147 | compress and then decompress each file it encounters, and checking | 55 | and then decompress each file it encounters, and checking that the |
| 148 | that the decompressed data is the same as the original. As test | 56 | decompressed data is the same as the original. There are more details |
| 149 | material, I used several runs over several filesystems of differing | 57 | in Section 4 of the user guide. |
| 150 | sizes. | ||
| 151 | |||
| 152 | One set of tests was done on my base Linux filesystem, | ||
| 153 | 410 megabytes in 23,000 files. There were several runs over | ||
| 154 | this filesystem, in various configurations designed to break bzip2. | ||
| 155 | That filesystem also contained some specially constructed test | ||
| 156 | files designed to exercise boundary cases in the code. | ||
| 157 | This included files of zero length, various long, highly repetitive | ||
| 158 | files, and some files which generate blocks with all values the same. | ||
| 159 | 58 | ||
| 160 | The other set of tests was done just with the "normal" configuration, | ||
| 161 | but on a much larger quantity of data. | ||
| 162 | |||
| 163 | Tests are: | ||
| 164 | |||
| 165 | Linux FS, 410M, 23000 files | ||
| 166 | |||
| 167 | As above, with --repetitive-fast | ||
| 168 | |||
| 169 | As above, with -1 | ||
| 170 | |||
| 171 | Low level disk image of a disk containing | ||
| 172 | Windows NT4.0; 420M in a single huge file | ||
| 173 | |||
| 174 | Linux distribution, incl Slackware, | ||
| 175 | all GNU sources. 1900M in 2300 files. | ||
| 176 | |||
| 177 | Approx ~100M compiler sources and related | ||
| 178 | programming tools, running under Purify. | ||
| 179 | |||
| 180 | About 500M of data in 120 files of around | ||
| 181 | 4 M each. This is raw data from a | ||
| 182 | biomagnetometer (SQUID-based thing). | ||
| 183 | |||
| 184 | Overall, total volume of test data is about | ||
| 185 | 3300 megabytes in 25000 files. | ||
| 186 | |||
| 187 | The distribution does four tests after building bzip. These tests | ||
| 188 | include test decompressions of pre-supplied compressed files, so | ||
| 189 | they not only test that bzip works correctly on the machine it was | ||
| 190 | built on, but can also decompress files compressed on a different | ||
| 191 | machine. This guards against unforseen interoperability problems. | ||
| 192 | 59 | ||
| 193 | 60 | ||
| 194 | Please read and be aware of the following: | 61 | Please read and be aware of the following: |
| @@ -234,14 +101,30 @@ PATENTS: | |||
| 234 | End of legalities. | 101 | End of legalities. |
| 235 | 102 | ||
| 236 | 103 | ||
| 104 | WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ? | ||
| 105 | |||
| 106 | * Approx 10% faster compression, 30% faster decompression | ||
| 107 | * -t (test mode) is a lot quicker | ||
| 108 | * Can decompress concatenated compressed files | ||
| 109 | * Programming interface, so programs can directly read/write .bz2 files | ||
| 110 | * Less restrictive (BSD-style) licensing | ||
| 111 | * Flag handling more compatible with GNU gzip | ||
| 112 | * Much more documentation, i.e., a proper user manual | ||
| 113 | * Hopefully, improved portability (at least of the library) | ||
| 114 | |||
| 115 | |||
| 237 | I hope you find bzip2 useful. Feel free to contact me at | 116 | I hope you find bzip2 useful. Feel free to contact me at |
| 238 | jseward@acm.org | 117 | jseward@acm.org |
| 239 | if you have any suggestions or queries. Many people mailed me with | 118 | if you have any suggestions or queries. Many people mailed me with |
| 240 | comments, suggestions and patches after the releases of 0.15 and 0.21, | 119 | comments, suggestions and patches after the releases of bzip-0.15, |
| 241 | and the changes in bzip2 are largely a result of this feedback. | 120 | bzip-0.21 and bzip2-0.1pl2, and the changes in bzip2 are largely a |
| 242 | I thank you for your comments. | 121 | result of this feedback. I thank you for your comments. |
| 122 | |||
| 123 | At least for the time being, bzip2's "home" is | ||
| 124 | http://www.muraroa.demon.co.uk. | ||
| 243 | 125 | ||
| 244 | Julian Seward | 126 | Julian Seward |
| 127 | jseward@acm.org | ||
| 245 | 128 | ||
| 246 | Manchester, UK | 129 | Manchester, UK |
| 247 | 18 July 1996 (version 0.15) | 130 | 18 July 1996 (version 0.15) |
| @@ -250,4 +133,5 @@ Manchester, UK | |||
| 250 | Guildford, Surrey, UK | 133 | Guildford, Surrey, UK |
| 251 | 7 August 1997 (bzip2, version 0.1) | 134 | 7 August 1997 (bzip2, version 0.1) |
| 252 | 29 August 1997 (bzip2, version 0.1pl2) | 135 | 29 August 1997 (bzip2, version 0.1pl2) |
| 136 | 23 August 1998 (bzip2, version 0.9.0) | ||
| 253 | 137 | ||
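
Note on the VALIDATION and WHAT'S NEW hunks: the churn-style check the README describes (walk a directory tree, compress and decompress every file, compare with the original) and the new "concatenated compressed files" behaviour can be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python's standard bz2 module rather than the bzip2 binary or Mark Nelson's actual churn program, and the names roundtrip_ok, concat_ok and churn are hypothetical helpers, not part of bzip2.

```python
import bz2
import os


def roundtrip_ok(data, level=9):
    """Compress then decompress `data`; True if the bytes survive unchanged."""
    return bz2.decompress(bz2.compress(data, compresslevel=level)) == data


def concat_ok(first, second):
    """Check that two compressed streams joined end to end decompress
    to the concatenation of the two original inputs."""
    joined = bz2.compress(first) + bz2.compress(second)
    return bz2.decompress(joined) == first + second


def churn(root):
    """Walk `root` recursively (hypothetical stand-in for the churn driver)
    and return the paths of files that fail the round-trip check."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                if not roundtrip_ok(handle.read()):
                    failures.append(path)
    return failures
```

An empty list from churn("/some/tree") means every file read under that tree reproduced exactly. The boundary cases listed in the removed text (zero-length files, long repetitive files, blocks with all values the same) are reasonable inputs to feed roundtrip_ok directly.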
