Diffstat (limited to 'README')
-rw-r--r-- | README | 230 |
1 files changed, 57 insertions, 173 deletions
@@ -1,194 +1,61 @@ | |||
1 | 1 | ||
2 | GREETINGS! | ||
3 | 2 | ||
4 | This is the README for bzip2, my block-sorting file compressor, | 3 | This is the README for bzip2, a block-sorting file compressor, version |
5 | version 0.1. | 4 | 0.9.0. This version is fully compatible with the previous public |
5 | release, bzip2-0.1pl2. | ||
6 | 6 | ||
7 | bzip2 is distributed under the GNU General Public License version 2; | 7 | bzip2-0.9.0 is distributed under a BSD-style license. For details, |
8 | for details, see the file LICENSE. Pointers to the algorithms used | 8 | see the file LICENSE. |
9 | are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted. | ||
10 | 9 | ||
11 | Please read all of this file carefully. | 10 | Complete documentation is available in Postscript form (manual.ps) |
11 | or html (manual_toc.html). A plain-text version of the manual page is | ||
12 | available as bzip2.txt. | ||
12 | 13 | ||
13 | 14 | ||
15 | HOW TO BUILD -- UNIX | ||
14 | 16 | ||
15 | HOW TO BUILD | 17 | Type `make'. |
16 | 18 | ||
17 | -- for UNIX: | 19 | This creates binaries "bzip2" and "bzip2recover". |
18 | 20 | ||
19 | Type `make'. (tough, huh? :-) | 21 | It also runs four compress-decompress tests to make sure things are |
22 | working properly. If all goes well, you should be up & running. | ||
23 | Please be sure to read the output from `make' just to be sure that the | ||
24 | tests went ok. | ||
20 | 25 | ||
21 | This creates binaries "bzip2", and "bunzip2", | 26 | To install bzip2 properly: |
22 | which is a symbolic link to "bzip2". | ||
23 | 27 | ||
24 | It also runs four compress-decompress tests to make sure | 28 | * Copy the binaries "bzip2" and "bzip2recover" to a publically visible |
25 | things are working properly. If all goes well, you should be up & | 29 | place, possibly /usr/bin or /usr/local/bin. |
26 | running. Please be sure to read the output from `make' | ||
27 | just to be sure that the tests went ok. | ||
28 | 30 | ||
29 | To install bzip2 properly: | 31 | * In that directory, make "bunzip2" and "bzcat" be symbolic links |
32 | to "bzip2". | ||
30 | 33 | ||
31 | -- Copy the binary "bzip2" to a publically visible place, | 34 | * Copy the manual page, bzip2.1, to the relevant place. |
32 | possibly /usr/bin, /usr/common/bin or /usr/local/bin. | 35 | Probably the right place is /usr/man/man1/. |
33 | |||
34 | -- In that directory, make "bunzip2" be a symbolic link | ||
35 | to "bzip2". | ||
36 | |||
37 | -- Copy the manual page, bzip2.1, to the relevant place. | ||
38 | Probably the right place is /usr/man/man1/. | ||
39 | |||
40 | -- for Windows 95 and NT: | ||
41 | 36 | ||
42 | For a start, do you *really* want to recompile bzip2? | 37 | If you want to program with the library, you'll need to copy libbz2.a |
43 | The standard distribution includes a pre-compiled version | 38 | and bzlib.h to /usr/lib and /usr/include respectively. |
44 | for Windows 95 and NT, `bzip2.exe'. | 39 | |
45 | 40 | ||
46 | This executable was created with Jacob Navia's excellent | 41 | HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc. |
47 | port to Win32 of Chris Fraser & David Hanson's excellent | ||
48 | ANSI C compiler, "lcc". You can get to it at the pages | ||
49 | of the CS department of Princeton University, | ||
50 | www.cs.princeton.edu. | ||
51 | I have not tried to compile this version of bzip2 with | ||
52 | a commercial C compiler such as MS Visual C, as I don't | ||
53 | have one available. | ||
54 | |||
55 | Note that lcc is designed primarily to be portable and | ||
56 | fast. Code quality is a secondary aim, so bzip2.exe | ||
57 | runs perhaps 40% slower than it could if compiled with | ||
58 | a good optimising compiler. | ||
59 | |||
60 | I compiled a previous version of bzip (0.21) with Borland | ||
61 | C 5.0, which worked fine, and with MS VC++ 2.0, which | ||
62 | didn't. Here is an comment from the README for bzip-0.21. | ||
63 | |||
64 | MS VC++ 2.0's optimising compiler has a bug which, at | ||
65 | maximum optimisation, gives an executable which produces | ||
66 | garbage compressed files. Proceed with caution. | ||
67 | I do not know whether or not this happens with later | ||
68 | versions of VC++. | ||
69 | |||
70 | Edit the defines starting at line 86 of bzip.c to | ||
71 | select your platform/compiler combination, and then compile. | ||
72 | Then check that the resulting executable (assumed to be | ||
73 | called bzip.exe) works correctly, using the SELFTEST.BAT file. | ||
74 | Bearing in mind the previous paragraph, the self-test is | ||
75 | important. | ||
76 | |||
77 | Note that the defines which bzip-0.21 had, to support | ||
78 | compilation with VC 2.0 and BC 5.0, are gone. Windows | ||
79 | is not my preferred operating system, and I am, for the | ||
80 | moment, content with the modestly fast executable created | ||
81 | by lcc-win32. | ||
82 | |||
83 | A manual page is supplied, unformatted (bzip2.1), | ||
84 | preformatted (bzip2.1.preformatted), and preformatted | ||
85 | and sanitised for MS-DOS (bzip2.txt). | ||
86 | |||
87 | |||
88 | |||
89 | COMPILATION NOTES | ||
90 | |||
91 | bzip2 should work on any 32 or 64-bit machine. It is known to work | ||
92 | [meaning: it has compiled and passed self-tests] on the | ||
93 | following platform-os combinations: | ||
94 | |||
95 | Intel i386/i486 running Linux 2.0.21 | ||
96 | Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5 | ||
97 | Intel i386/i486 running Windows 95 and NT | ||
98 | DEC Alpha running Digital Unix 4.0 | ||
99 | |||
100 | Following the release of bzip-0.21, many people mailed me | ||
101 | from around the world to say they had made it work on all sorts | ||
102 | of weird and wonderful machines. Chances are, if you have | ||
103 | a reasonable ANSI C compiler and a 32-bit machine, you can | ||
104 | get it to work. | ||
105 | |||
106 | The #defines starting at around line 82 of bzip2.c supply some | ||
107 | degree of platform-independance. If you configure bzip2 for some | ||
108 | new far-out platform which is not covered by the existing definitions, | ||
109 | please send me the relevant definitions. | ||
110 | |||
111 | I recommend GNU C for compilation. The code is standard ANSI C, | ||
112 | except for the Unix-specific file handling, so any ANSI C compiler | ||
113 | should work. Note however that the many routines marked INLINE | ||
114 | should be inlined by your compiler, else performance will be very | ||
115 | poor. Asking your compiler to unroll loops gives some | ||
116 | small improvement too; for gcc, the relevant flag is | ||
117 | -funroll-loops. | ||
118 | |||
119 | On a 386/486 machines, I'd recommend giving gcc the | ||
120 | -fomit-frame-pointer flag; this liberates another register for | ||
121 | allocation, which measurably improves performance. | ||
122 | |||
123 | I used the abovementioned lcc compiler to develop bzip2. | ||
124 | I would highly recommend this compiler for day-to-day development; | ||
125 | it is fast, reliable, lightweight, has an excellent profiler, | ||
126 | and is generally excellent. And it's fun to retarget, if you're | ||
127 | into that kind of thing. | ||
128 | |||
129 | If you compile bzip2 on a new platform or with a new compiler, | ||
130 | please be sure to run the four compress-decompress tests, either | ||
131 | using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2) | ||
132 | files. Some compilers have been seen to introduce subtle bugs | ||
133 | when optimising, so this check is important. Ideally you should | ||
134 | then go on to test bzip2 on a file several megabytes or even | ||
135 | tens of megabytes long, just to be 110% sure. ``Professional | ||
136 | programmers are paranoid programmers.'' (anon). | ||
137 | 42 | ||
43 | It's difficult for me to support compilation on all these platforms. | ||
44 | My approach is to collect binaries for these platforms, and put them | ||
45 | on my web page (http://www.muraroa.demon.co.uk). Look there. | ||
138 | 46 | ||
139 | 47 | ||
140 | VALIDATION | 48 | VALIDATION |
141 | 49 | ||
142 | Correct operation, in the sense that a compressed file can always be | 50 | Correct operation, in the sense that a compressed file can always be |
143 | decompressed to reproduce the original, is obviously of paramount | 51 | decompressed to reproduce the original, is obviously of paramount |
144 | importance. To validate bzip2, I used a modified version of | 52 | importance. To validate bzip2, I used a modified version of Mark |
145 | Mark Nelson's churn program. Churn is an automated test driver | 53 | Nelson's churn program. Churn is an automated test driver which |
146 | which recursively traverses a directory structure, using bzip2 to | 54 | recursively traverses a directory structure, using bzip2 to compress |
147 | compress and then decompress each file it encounters, and checking | 55 | and then decompress each file it encounters, and checking that the |
148 | that the decompressed data is the same as the original. As test | 56 | decompressed data is the same as the original. There are more details |
149 | material, I used several runs over several filesystems of differing | 57 | in Section 4 of the user guide. |
150 | sizes. | ||
151 | |||
152 | One set of tests was done on my base Linux filesystem, | ||
153 | 410 megabytes in 23,000 files. There were several runs over | ||
154 | this filesystem, in various configurations designed to break bzip2. | ||
155 | That filesystem also contained some specially constructed test | ||
156 | files designed to exercise boundary cases in the code. | ||
157 | This included files of zero length, various long, highly repetitive | ||
158 | files, and some files which generate blocks with all values the same. | ||
159 | 58 | ||
160 | The other set of tests was done just with the "normal" configuration, | ||
161 | but on a much larger quantity of data. | ||
162 | |||
163 | Tests are: | ||
164 | |||
165 | Linux FS, 410M, 23000 files | ||
166 | |||
167 | As above, with --repetitive-fast | ||
168 | |||
169 | As above, with -1 | ||
170 | |||
171 | Low level disk image of a disk containing | ||
172 | Windows NT4.0; 420M in a single huge file | ||
173 | |||
174 | Linux distribution, incl Slackware, | ||
175 | all GNU sources. 1900M in 2300 files. | ||
176 | |||
177 | Approx ~100M compiler sources and related | ||
178 | programming tools, running under Purify. | ||
179 | |||
180 | About 500M of data in 120 files of around | ||
181 | 4 M each. This is raw data from a | ||
182 | biomagnetometer (SQUID-based thing). | ||
183 | |||
184 | Overall, total volume of test data is about | ||
185 | 3300 megabytes in 25000 files. | ||
186 | |||
187 | The distribution does four tests after building bzip. These tests | ||
188 | include test decompressions of pre-supplied compressed files, so | ||
189 | they not only test that bzip works correctly on the machine it was | ||
190 | built on, but can also decompress files compressed on a different | ||
191 | machine. This guards against unforseen interoperability problems. | ||
192 | 59 | ||
193 | 60 | ||
194 | Please read and be aware of the following: | 61 | Please read and be aware of the following: |
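
The churn-style check described in the VALIDATION section of this hunk can be approximated with a small shell function. This is only a sketch, not Mark Nelson's churn program or the modified version the README mentions. The name `roundtrip_check` is hypothetical, and the function assumes a `bzip2` binary on the PATH plus the standard `find` and `cmp` utilities. It pipes each regular file under a directory through compression and decompression, then compares the result with the original.

```shell
#!/bin/sh
# roundtrip_check DIR  (hypothetical helper, not part of bzip2)
# Compress and decompress every regular file under DIR through a pipe and
# compare the result with the original. Prints each mismatching file and
# returns non-zero if any file fails to round-trip.
roundtrip_check() {
    find "$1" -type f -exec sh -c '
        status=0
        for f do
            # -c writes to stdout, -d decompresses; cmp -s compares quietly.
            if ! bzip2 -c "$f" | bzip2 -dc | cmp -s - "$f"; then
                echo "MISMATCH: $f"
                status=1
            fi
        done
        exit "$status"
    ' sh {} +
}
```

Called as `roundtrip_check /some/directory`, it leaves the original files untouched. No output and a zero exit status mean every file survived the round trip.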
@@ -234,14 +101,30 @@ PATENTS: | |||
234 | End of legalities. | 101 | End of legalities. |
235 | 102 | ||
236 | 103 | ||
104 | WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2) ? | ||
105 | |||
106 | * Approx 10% faster compression, 30% faster decompression | ||
107 | * -t (test mode) is a lot quicker | ||
108 | * Can decompress concatenated compressed files | ||
109 | * Programming interface, so programs can directly read/write .bz2 files | ||
110 | * Less restrictive (BSD-style) licensing | ||
111 | * Flag handling more compatible with GNU gzip | ||
112 | * Much more documentation, i.e., a proper user manual | ||
113 | * Hopefully, improved portability (at least of the library) | ||
114 | |||
115 | |||
237 | I hope you find bzip2 useful. Feel free to contact me at | 116 | I hope you find bzip2 useful. Feel free to contact me at |
238 | jseward@acm.org | 117 | jseward@acm.org |
239 | if you have any suggestions or queries. Many people mailed me with | 118 | if you have any suggestions or queries. Many people mailed me with |
240 | comments, suggestions and patches after the releases of 0.15 and 0.21, | 119 | comments, suggestions and patches after the releases of bzip-0.15, |
241 | and the changes in bzip2 are largely a result of this feedback. | 120 | bzip-0.21 and bzip2-0.1pl2, and the changes in bzip2 are largely a |
242 | I thank you for your comments. | 121 | result of this feedback. I thank you for your comments. |
122 | |||
123 | At least for the time being, bzip2's "home" is | ||
124 | http://www.muraroa.demon.co.uk. | ||
243 | 125 | ||
244 | Julian Seward | 126 | Julian Seward |
127 | jseward@acm.org | ||
245 | 128 | ||
246 | Manchester, UK | 129 | Manchester, UK |
247 | 18 July 1996 (version 0.15) | 130 | 18 July 1996 (version 0.15) |
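
The "Can decompress concatenated compressed files" item in the WHAT'S NEW list above can be illustrated roughly as follows. This is a minimal sketch that assumes `bzip2` is on the PATH. The helper name `concat_demo` and the file names are made up for the example.

```shell
#!/bin/sh
# concat_demo DIR  (hypothetical helper for illustration)
# Compress two inputs separately, join the .bz2 streams with cat, and
# decompress the joined file in one pass.
concat_demo() {
    dir=$1
    printf 'first part\n'  > "$dir/a.txt"
    printf 'second part\n' > "$dir/b.txt"

    # -c writes the compressed stream to stdout and leaves the input alone.
    bzip2 -c "$dir/a.txt" > "$dir/a.bz2"
    bzip2 -c "$dir/b.txt" > "$dir/b.bz2"

    # Plain byte-wise concatenation of the two compressed streams.
    cat "$dir/a.bz2" "$dir/b.bz2" > "$dir/both.bz2"

    # Decompress the joined file to stdout.
    bzip2 -dc "$dir/both.bz2"
}
```

With a version that supports this feature, running `concat_demo` on a scratch directory should print the contents of both inputs, in order.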
@@ -250,4 +133,5 @@ Manchester, UK | |||
250 | Guildford, Surrey, UK | 133 | Guildford, Surrey, UK |
251 | 7 August 1997 (bzip2, version 0.1) | 134 | 7 August 1997 (bzip2, version 0.1) |
252 | 29 August 1997 (bzip2, version 0.1pl2) | 135 | 29 August 1997 (bzip2, version 0.1pl2) |
136 | 23 August 1998 (bzip2, version 0.9.0) | ||
253 | 137 | ||
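
The "To install bzip2 properly" steps in the new README's HOW TO BUILD -- UNIX section can be sketched as a shell function. The function name `install_bzip2`, its two arguments, and the `bin` and `man/man1` layout under a prefix are assumptions made for illustration. The README itself only names the files and suggests /usr/bin or /usr/local/bin and /usr/man/man1/ as likely destinations.

```shell
#!/bin/sh
# install_bzip2 SRC PREFIX  (hypothetical helper, not shipped with bzip2)
#   SRC    directory where `make' left its output
#   PREFIX install root, e.g. /usr or /usr/local (an assumption)
install_bzip2() {
    src=$1
    prefix=$2

    mkdir -p "$prefix/bin" "$prefix/man/man1" || return 1

    # Copy the two binaries to a publicly visible place.
    cp "$src/bzip2" "$src/bzip2recover" "$prefix/bin/" || return 1

    # In that directory, bunzip2 and bzcat are symbolic links to bzip2.
    ln -sf bzip2 "$prefix/bin/bunzip2" || return 1
    ln -sf bzip2 "$prefix/bin/bzcat" || return 1

    # Copy the manual page to the man1 directory.
    cp "$src/bzip2.1" "$prefix/man/man1/" || return 1
}

# Example (not run here): install_bzip2 . /usr/local
```

The symbolic links are relative, so they keep working if the whole prefix is moved. Copying libbz2.a and bzlib.h for library users is left out of this sketch.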