author | Julian Seward <jseward@acm.org> | 1997-08-07 22:13:13 +0200 |
---|---|---|
committer | Julian Seward <jseward@acm.org> | 1997-08-07 22:13:13 +0200 |
commit | 33d134030248633ffa7d60c0a35a783c46da034b (patch) | |
tree | b760dc34185dccc7054989c1472478574223cc31 /README | |
bzip2-0.1
Diffstat (limited to 'README')
-rw-r--r-- | README | 243 |
1 files changed, 243 insertions, 0 deletions
@@ -0,0 +1,243 @@ | |||
1 | |||
2 | GREETINGS! | ||
3 | |||
4 | This is the README for bzip2, my block-sorting file compressor, | ||
5 | version 0.1. | ||
6 | |||
7 | bzip2 is distributed under the GNU General Public License version 2; | ||
8 | for details, see the file LICENSE. Pointers to the algorithms used | ||
9 | are in ALGORITHMS. Instructions for use are in bzip2.1.preformatted. | ||
10 | |||
11 | Please read this file carefully. | ||
12 | |||
13 | |||
14 | |||
15 | HOW TO BUILD | ||
16 | |||
17 | -- for UNIX: | ||
18 | |||
19 | Type `make'. (tough, huh? :-) | ||
20 | |||
21 | This creates the binary "bzip2", and also "bunzip2", | ||
22 | which is a symbolic link to "bzip2". | ||
23 | |||
24 | It also runs four compress-decompress tests to make sure | ||
25 | things are working properly. If all goes well, you should be up & | ||
26 | running. Please read the output from `make' | ||
27 | to confirm that the tests went ok. | ||
28 | |||
29 | To install bzip2 properly: | ||
30 | |||
31 | -- Copy the binary "bzip2" to a publicly visible place, | ||
32 | possibly /usr/bin, /usr/common/bin or /usr/local/bin. | ||
33 | |||
34 | -- In that directory, make "bunzip2" be a symbolic link | ||
35 | to "bzip2". | ||
36 | |||
37 | -- Copy the manual page, bzip2.1, to the relevant place. | ||
38 | Probably the right place is /usr/man/man1/. | ||
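For scripted installs, the three steps above could be automated roughly as follows. This is only a sketch: the function name `install_bzip2` and its parameters are illustrative and not part of the distribution, and the target directories are whatever suits your system.

```python
import os
import shutil
from pathlib import Path


def install_bzip2(build_dir, bin_dir, man_dir):
    """Copy bzip2 and bzip2.1 into place and link bunzip2 -> bzip2.

    Hypothetical helper; the distribution itself ships no install script.
    """
    build_dir, bin_dir, man_dir = Path(build_dir), Path(bin_dir), Path(man_dir)
    bin_dir.mkdir(parents=True, exist_ok=True)
    man_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: copy the binary to a publicly visible directory.
    shutil.copy2(build_dir / "bzip2", bin_dir / "bzip2")

    # Step 2: in that directory, make bunzip2 a symbolic link to bzip2.
    link = bin_dir / "bunzip2"
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink("bzip2", link)  # relative target, resolved inside bin_dir

    # Step 3: copy the manual page to the man directory.
    shutil.copy2(build_dir / "bzip2.1", man_dir / "bzip2.1")
    return link
```

A call such as `install_bzip2(".", "/usr/local/bin", "/usr/man/man1")` would mirror the manual steps, and would normally need the same privileges as doing them by hand.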
39 | |||
40 | -- for Windows 95 and NT: | ||
41 | |||
42 | For a start, do you *really* want to recompile bzip2? | ||
43 | The standard distribution includes a pre-compiled version | ||
44 | for Windows 95 and NT, `bzip2.exe'. | ||
45 | |||
46 | This executable was created with Jacob Navia's excellent | ||
47 | port to Win32 of Chris Fraser & David Hanson's excellent | ||
48 | ANSI C compiler, "lcc". You can get to it at the pages | ||
49 | of the CS department of Princeton University, | ||
50 | www.cs.princeton.edu. | ||
51 | I have not tried to compile this version of bzip2 with | ||
52 | a commercial C compiler such as MS Visual C, as I don't | ||
53 | have one available. | ||
54 | |||
55 | Note that lcc is designed primarily to be portable and | ||
56 | fast. Code quality is a secondary aim, so bzip2.exe | ||
57 | runs perhaps 40% slower than it could if compiled with | ||
58 | a good optimising compiler. | ||
59 | |||
60 | I compiled a previous version of bzip (0.21) with Borland | ||
61 | C 5.0, which worked fine, and with MS VC++ 2.0, which | ||
62 | didn't. Here is a comment from the README for bzip-0.21. | ||
63 | |||
64 | MS VC++ 2.0's optimising compiler has a bug which, at | ||
65 | maximum optimisation, gives an executable which produces | ||
66 | garbage compressed files. Proceed with caution. | ||
67 | I do not know whether or not this happens with later | ||
68 | versions of VC++. | ||
69 | |||
70 | Edit the defines starting at line 86 of bzip.c to | ||
71 | select your platform/compiler combination, and then compile. | ||
72 | Then check that the resulting executable (assumed to be | ||
73 | called bzip.exe) works correctly, using the SELFTEST.BAT file. | ||
74 | Bearing in mind the previous paragraph, the self-test is | ||
75 | important. | ||
76 | |||
77 | Note that the defines which bzip-0.21 had, to support | ||
78 | compilation with VC 2.0 and BC 5.0, are gone. Windows | ||
79 | is not my preferred operating system, and I am, for the | ||
80 | moment, content with the modestly fast executable created | ||
81 | by lcc-win32. | ||
82 | |||
83 | A manual page is supplied, unformatted (bzip2.1), | ||
84 | preformatted (bzip2.1.preformatted), and preformatted | ||
85 | and sanitised for MS-DOS (bzip2.txt). | ||
86 | |||
87 | |||
88 | |||
89 | COMPILATION NOTES | ||
90 | |||
91 | bzip2 should work on any 32- or 64-bit machine. It is known to work | ||
92 | [meaning: it has compiled and passed self-tests] on the | ||
93 | following platform-os combinations: | ||
94 | |||
95 | Intel i386/i486 running Linux 2.0.21 | ||
96 | Sun Sparcs (various) running SunOS 4.1.4 and Solaris 2.5 | ||
97 | Intel i386/i486 running Windows 95 and NT | ||
98 | DEC Alpha running Digital Unix 4.0 | ||
99 | |||
100 | Following the release of bzip-0.21, many people mailed me | ||
101 | from around the world to say they had made it work on all sorts | ||
102 | of weird and wonderful machines. Chances are, if you have | ||
103 | a reasonable ANSI C compiler and a 32-bit machine, you can | ||
104 | get it to work. | ||
105 | |||
106 | The #defines starting at around line 82 of bzip2.c supply some | ||
107 | degree of platform-independence. If you configure bzip2 for some | ||
108 | new far-out platform which is not covered by the existing definitions, | ||
109 | please send me the relevant definitions. | ||
110 | |||
111 | I recommend GNU C for compilation. The code is standard ANSI C, | ||
112 | except for the Unix-specific file handling, so any ANSI C compiler | ||
113 | should work. Note however that the many routines marked INLINE | ||
114 | should be inlined by your compiler, else performance will be very | ||
115 | poor. Asking your compiler to unroll loops gives some | ||
116 | small improvement too; for gcc, the relevant flag is | ||
117 | -funroll-loops. | ||
118 | |||
119 | On 386/486 machines, I'd recommend giving gcc the | ||
120 | -fomit-frame-pointer flag; this liberates another register for | ||
121 | allocation, which measurably improves performance. | ||
122 | |||
123 | I used the abovementioned lcc compiler to develop bzip2. | ||
124 | I would highly recommend this compiler for day-to-day development; | ||
125 | it is fast, reliable, lightweight, has an excellent profiler, | ||
126 | and is generally excellent. And it's fun to retarget, if you're | ||
127 | into that kind of thing. | ||
128 | |||
129 | If you compile bzip2 on a new platform or with a new compiler, | ||
130 | please be sure to run the four compress-decompress tests, either | ||
131 | using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2) | ||
132 | files. Some compilers have been seen to introduce subtle bugs | ||
133 | when optimising, so this check is important. Ideally you should | ||
134 | then go on to test bzip2 on a file several megabytes or even | ||
135 | tens of megabytes long, just to be 110% sure. ``Professional | ||
136 | programmers are paranoid programmers.'' (anon). | ||
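A large-file check of a freshly built executable might look like the sketch below. It assumes the executable accepts `-c` (write to standard output) and `-d` (decompress) as described in the manual page; the helper names `sha256_of` and `roundtrip_with_binary` are illustrative only.

```python
import hashlib
import os
import subprocess
import tempfile


def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MB chunks so large inputs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def roundtrip_with_binary(bzip2_path, input_path):
    """Compress then decompress input_path with the given executable.

    Returns True if the decompressed output matches the original.
    Assumes the -c and -d flags behave as the manual page describes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "packed.bz2")
        unpacked = os.path.join(tmp, "unpacked")
        with open(input_path, "rb") as src, open(packed, "wb") as dst:
            subprocess.run([bzip2_path, "-c"], stdin=src, stdout=dst, check=True)
        with open(packed, "rb") as src, open(unpacked, "wb") as dst:
            subprocess.run([bzip2_path, "-d", "-c"], stdin=src, stdout=dst, check=True)
        return sha256_of(input_path) == sha256_of(unpacked)
```

Pointing `roundtrip_with_binary("./bzip2", some_big_file)` at a file of tens of megabytes exercises the binary you actually built, which is the point of the check.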
137 | |||
138 | |||
139 | |||
140 | VALIDATION | ||
141 | |||
142 | Correct operation, in the sense that a compressed file can always be | ||
143 | decompressed to reproduce the original, is obviously of paramount | ||
144 | importance. To validate bzip2, I used a modified version of | ||
145 | Mark Nelson's churn program. Churn is an automated test driver | ||
146 | which recursively traverses a directory structure, using bzip2 to | ||
147 | compress and then decompress each file it encounters, and checking | ||
148 | that the decompressed data is the same as the original. As test | ||
149 | material, I used several runs over several filesystems of differing | ||
150 | sizes. | ||
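The idea behind churn can be sketched in a few lines. This is not Mark Nelson's program, nor my modified version of it: it is a minimal illustration that uses Python's standard `bz2` module as a stand-in for the bzip2 executable, and the function name `churn` is borrowed only for clarity.

```python
import bz2
import os


def churn(root, compresslevel=9):
    """Round-trip every regular file under root.

    Returns (number of files checked, list of paths that failed).
    Uses the bz2 module as a stand-in for the bzip2 executable.
    """
    checked = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, sockets and the like
            with open(path, "rb") as f:
                original = f.read()  # whole file in memory; fine for a sketch
            restored = bz2.decompress(bz2.compress(original, compresslevel))
            checked += 1
            if restored != original:
                failures.append(path)
    return checked, failures
```

An empty failure list after a run over a large tree is the result you want; any path in the list is a file that did not survive the round trip.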
151 | |||
152 | One set of tests was done on my base Linux filesystem, | ||
153 | 410 megabytes in 23,000 files. There were several runs over | ||
154 | this filesystem, in various configurations designed to break bzip2. | ||
155 | That filesystem also contained some specially constructed test | ||
156 | files designed to exercise boundary cases in the code. | ||
157 | This included files of zero length, various long, highly repetitive | ||
158 | files, and some files which generate blocks with all values the same. | ||
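Boundary inputs of the kinds listed above are easy to construct. The sketch below builds rough stand-ins (the exact files I used are not reproduced here, so sizes and contents are assumptions) and round-trips each one through Python's `bz2` module at the smallest and largest block sizes.

```python
import bz2


def boundary_inputs():
    """Hypothetical stand-ins for the specially constructed test files."""
    return {
        "zero length": b"",
        "long run of one value": b"\x00" * 1_000_000,
        "highly repetitive": b"abcabcabd" * 100_000,
    }


def check_boundaries(levels=(1, 9)):
    """Return the (name, level) pairs whose round trip did not match."""
    bad = []
    for name, data in boundary_inputs().items():
        for level in levels:
            if bz2.decompress(bz2.compress(data, level)) != data:
                bad.append((name, level))
    return bad
```

Level 1 here plays the role of the `-1` run in the list of tests below, since both select the smallest block size.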
159 | |||
160 | The other set of tests was done just with the "normal" configuration, | ||
161 | but on a much larger quantity of data. | ||
162 | |||
163 | Tests are: | ||
164 | |||
165 | Linux FS, 410M, 23000 files | ||
166 | |||
167 | As above, with --repetitive-fast | ||
168 | |||
169 | As above, with -1 | ||
170 | |||
171 | Low level disk image of a disk containing | ||
172 | Windows NT4.0; 420M in a single huge file | ||
173 | |||
174 | Linux distribution, incl Slackware, | ||
175 | all GNU sources. 1900M in 2300 files. | ||
176 | |||
177 | Approx 100M compiler sources and related | ||
178 | programming tools, running under Purify. | ||
179 | |||
180 | About 500M of data in 120 files of around | ||
181 | 4 M each. This is raw data from a | ||
182 | biomagnetometer (SQUID-based thing). | ||
183 | |||
184 | Overall, total volume of test data is about | ||
185 | 3300 megabytes in 25000 files. | ||
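The totals can be checked against the list above. The sketch assumes the Linux filesystem is counted once even though three runs were made over it, and leaves out the file count for the compiler sources, which is not stated.

```python
# (description, megabytes, file count or None where no count is given)
DATA_SETS = [
    ("Linux FS", 410, 23000),
    ("Windows NT4.0 disk image", 420, 1),
    ("Linux distribution and GNU sources", 1900, 2300),
    ("compiler sources and tools", 100, None),
    ("biomagnetometer data", 500, 120),
]

total_mb = sum(mb for _, mb, _ in DATA_SETS)
known_files = sum(n for _, _, n in DATA_SETS if n is not None)

print(total_mb)     # 3330
print(known_files)  # 25421
```

Both figures agree with the rounded totals of about 3300 megabytes and 25000 files.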
186 | |||
187 | The distribution does four tests after building bzip2. These tests | ||
188 | include test decompressions of pre-supplied compressed files, so | ||
189 | they check not only that bzip2 works correctly on the machine it was | ||
190 | built on, but also that it can decompress files compressed on a | ||
191 | different machine. This guards against unforeseen interoperability problems. | ||
192 | |||
193 | |||
194 | Please read and be aware of the following: | ||
195 | |||
196 | WARNING: | ||
197 | |||
198 | This program (attempts to) compress data by performing several | ||
199 | non-trivial transformations on it. Unless you are 100% familiar | ||
200 | with *all* the algorithms contained herein, and with the | ||
201 | consequences of modifying them, you should NOT meddle with the | ||
202 | compression or decompression machinery. Incorrect changes can and | ||
203 | very likely *will* lead to disastrous loss of data. | ||
204 | |||
205 | |||
206 | DISCLAIMER: | ||
207 | |||
208 | I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE | ||
209 | USE OF THIS PROGRAM, HOWSOEVER CAUSED. | ||
210 | |||
211 | Every compression of a file implies an assumption that the | ||
212 | compressed file can be decompressed to reproduce the original. | ||
213 | Great efforts in design, coding and testing have been made to | ||
214 | ensure that this program works correctly. However, the complexity | ||
215 | of the algorithms, and, in particular, the presence of various | ||
216 | special cases in the code which occur with very low but non-zero | ||
217 | probability make it impossible to rule out the possibility of bugs | ||
218 | remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS | ||
219 | PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER | ||
220 | SMALL, THAT THE DATA WILL NOT BE RECOVERABLE. | ||
221 | |||
222 | That is not to say this program is inherently unreliable. Indeed, | ||
223 | I very much hope the opposite is true. bzip2 has been carefully | ||
224 | constructed and extensively tested. | ||
225 | |||
226 | End of nasty legalities. | ||
227 | |||
228 | |||
229 | I hope you find bzip2 useful. Feel free to contact me at | ||
230 | jseward@acm.org | ||
231 | if you have any suggestions or queries. Many people mailed me with | ||
232 | comments, suggestions and patches after the releases of 0.15 and 0.21, | ||
233 | and the changes in bzip2 are largely a result of this feedback. | ||
234 | I thank you for your comments. | ||
235 | |||
236 | Julian Seward | ||
237 | |||
238 | Manchester, UK | ||
239 | 18 July 1996 (version 0.15) | ||
240 | 25 August 1996 (version 0.21) | ||
241 | |||
242 | Guildford, Surrey, UK | ||
243 | 7 August 1997 (bzip2, version 0.1) \ No newline at end of file | ||