| author | Rob Landley <rob@landley.net> | 2006-05-11 15:00:32 +0000 |
|---|---|---|
| committer | Rob Landley <rob@landley.net> | 2006-05-11 15:00:32 +0000 |
| commit | b73d2bf4bfac8f43cb068ba7a63a057eb9ca88ce (patch) | |
| tree | 55b5e472cdb4d24c07621941769976dd99702c88 /docs/busybox.net/programming.html | |
| parent | 8d2cb8be3b4c6be632911a2705b13e084ab9ef72 (diff) | |
Reorganize FAQ, update a few entries, and consolidate with programming.html.
Diffstat (limited to 'docs/busybox.net/programming.html')
| -rw-r--r-- | docs/busybox.net/programming.html | 584 |
1 files changed, 0 insertions, 584 deletions
diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html deleted file mode 100644 index b73e6ef95..000000000 --- a/docs/busybox.net/programming.html +++ /dev/null | |||
| @@ -1,584 +0,0 @@ | |||
| 1 | <!--#include file="header.html" --> | ||
| 2 | |||
| 3 | <h2>Rob's notes on programming busybox.</h2> | ||
| 4 | |||
| 5 | <ul> | ||
| 6 | <li><a href="#goals">What are the goals of busybox?</a></li> | ||
| 7 | <li><a href="#design">What is the design of busybox?</a></li> | ||
| 8 | <li><a href="#source">How is the source code organized?</a></li> | ||
| 9 | <ul> | ||
| 10 | <li><a href="#source_applets">The applet directories.</a></li> | ||
| 11 | <li><a href="#source_libbb">The busybox shared library (libbb)</a></li> | ||
| 12 | </ul> | ||
| 13 | <li><a href="#adding">Adding an applet to busybox</a></li> | ||
| 14 | <li><a href="#standards">What standards does busybox adhere to?</a></li> | ||
| 15 | <li><a href="#portability">Portability.</a></li> | ||
| 16 | <li><a href="#tips">Tips and tricks.</a></li> | ||
| 17 | <ul> | ||
| 18 | <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li> | ||
| 19 | <li><a href="#tips_vfork">Fork and vfork</a></li> | ||
| 20 | <li><a href="#tips_short_read">Short reads and writes</a></li> | ||
| 21 | <li><a href="#tips_memory">Memory used by relocatable code, PIC, and static linking.</a></li> | ||
| 22 | <li><a href="#tips_kernel_headers">Including Linux kernel headers.</a></li> | ||
| 23 | </ul> | ||
| 24 | <li><a href="#who">Who are the BusyBox developers?</a></li> | ||
| 25 | </ul> | ||
| 26 | |||
| 27 | <h2><b><a name="goals">What are the goals of busybox?</a></b></h2> | ||
| 28 | |||
| 29 | <p>Busybox aims to be the smallest and simplest correct implementation of the | ||
| 30 | standard Linux command line tools. First and foremost, this means the | ||
| 31 | smallest executable size we can manage. We also want to have the simplest | ||
| 32 | and cleanest implementation we can manage, be <a href="#standards">standards | ||
| 33 | compliant</a>, minimize run-time memory usage (heap and stack), run fast, and | ||
| 34 | take over the world.</p> | ||
| 35 | |||
| 36 | <h2><b><a name="design">What is the design of busybox?</a></b></h2> | ||
| 37 | |||
| 38 | <p>Busybox is like a swiss army knife: one thing with many functions. | ||
| 39 | The busybox executable can act like many different programs depending on | ||
| 40 | the name used to invoke it. Normal practice is to create a bunch of symlinks | ||
| 41 | pointing to the busybox binary, each of which triggers a different busybox | ||
| 42 | function. (See <a href="FAQ.html#getting_started">getting started</a> in the | ||
| 43 | FAQ for more information on usage, and <a href="BusyBox.html">the | ||
| 44 | busybox documentation</a> for a list of symlink names and what they do.)</p> | ||
| 45 | |||
| 46 | <p>The "one binary to rule them all" approach is primarily for size reasons: a | ||
| 47 | single multi-purpose executable is smaller than many small files could be. | ||
| 48 | This way busybox only has one set of ELF headers, it can easily share code | ||
| 49 | between different apps even when statically linked, it has better packing | ||
| 50 | efficiency by avoiding gaps between files or compression dictionary resets, | ||
| 51 | and so on.</p> | ||
| 52 | |||
| 53 | <p>Work is underway on new options such as "make standalone" to build separate | ||
| 54 | binaries for each applet, and a "libbb.so" to make the busybox common code | ||
| 55 | available as a shared library. Neither is ready yet at the time of this | ||
| 56 | writing.</p> | ||
| 57 | |||
| 58 | <a name="source"></a> | ||
| 59 | |||
| 60 | <h2><a name="source_applets"><b>The applet directories</b></a></h2> | ||
| 61 | |||
| 62 | <p>The directory "applets" contains the busybox startup code (applets.c and | ||
| 63 | busybox.c), and several subdirectories containing the code for the individual | ||
| 64 | applets.</p> | ||
| 65 | |||
| 66 | <p>Busybox execution starts with the main() function in applets/busybox.c, | ||
| 67 | which sets the global variable bb_applet_name to argv[0] and calls | ||
| 68 | run_applet_by_name() in applets/applets.c. That uses the applets[] array | ||
| 69 | (defined in include/busybox.h and filled out in include/applets.h) to | ||
| 70 | transfer control to the appropriate APPLET_main() function (such as | ||
| 71 | cat_main() or sed_main()). The individual applet takes it from there.</p> | ||
| 72 | |||
| 73 | <p>This is why calling busybox under a different name triggers different | ||
| 74 | functionality: main() looks up argv[0] in applets[] to get a function pointer | ||
| 75 | to APPLET_main().</p> | ||
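<p>The lookup described above can be sketched as follows. This is a minimal illustration, not the busybox source: the struct layout, the stub applets, and the names find_applet() and run_applet() are hypothetical, and stripping the directory part of argv[0] is an assumption about how a name like "/bin/cat" would be matched.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stub applets. They return distinct codes so the
 * dispatch is observable; real applets live in their own files. */
static int cat_main(int argc, char **argv) { (void)argc; (void)argv; return 10; }
static int sed_main(int argc, char **argv) { (void)argc; (void)argv; return 20; }
static int true_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }

struct applet {
    const char *name;
    int (*main)(int argc, char **argv);
};

/* Must stay sorted by name, because lookup is a binary search. */
static const struct applet applets[] = {
    { "cat", cat_main },
    { "sed", sed_main },
    { "true", true_main },
};

static int compare_applet(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct applet *)elem)->name);
}

/* Return the table entry for name, or NULL if there is none. */
const struct applet *find_applet(const char *name)
{
    assert(name != NULL);
    /* Assumption: drop any leading directory so "/bin/cat" matches "cat". */
    const char *slash = strrchr(name, '/');
    if (slash)
        name = slash + 1;
    return bsearch(name, applets, sizeof(applets) / sizeof(applets[0]),
                   sizeof(applets[0]), compare_applet);
}

/* Dispatch on argv[0]; 127 is an arbitrary "not found" code for this sketch. */
int run_applet(int argc, char **argv)
{
    const struct applet *a = find_applet(argv[0]);
    if (!a) {
        fprintf(stderr, "%s: applet not found\n", argv[0]);
        return 127;
    }
    return a->main(argc, argv);
}
```

<p>A main() built on this would simply return run_applet(argc, argv). Because the table is searched with bsearch(), an entry added out of alphabetical order may never be found.</p>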
| 76 | |||
| 77 | <p>Busybox applets may also be invoked through the multiplexor applet | ||
| 78 | "busybox" (see busybox_main() in applets/busybox.c), and through the | ||
| 79 | standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c). | ||
| 80 | See <a href="FAQ.html#getting_started">getting started</a> in the | ||
| 81 | FAQ for more information on these alternate usage mechanisms, which are | ||
| 82 | just different ways to reach the relevant APPLET_main() function.</p> | ||
| 83 | |||
| 84 | <p>The applet subdirectories (archival, console-tools, coreutils, | ||
| 85 | debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils, | ||
| 86 | modutils, networking, procps, shell, sysklogd, and util-linux) correspond | ||
| 87 | to the configuration sub-menus in menuconfig. Each subdirectory contains the | ||
| 88 | code to implement the applets in that sub-menu, as well as a Config.in | ||
| 89 | file defining that configuration sub-menu (with dependencies and help text | ||
| 90 | for each applet), and the makefile segment (Makefile.in) for that | ||
| 91 | subdirectory.</p> | ||
| 92 | |||
| 93 | <p>The run-time --help is stored in usage_messages[], which is initialized at | ||
| 94 | the start of applets/applets.c and gets its help text from usage.h. During the | ||
| 95 | build this help text is also used to generate the BusyBox documentation (in | ||
| 96 | html, txt, and man page formats) in the docs directory. See | ||
| 97 | <a href="#adding">adding an applet to busybox</a> for more | ||
| 98 | information.</p> | ||
| 99 | |||
| 100 | <h2><a name="source_libbb"><b>libbb</b></a></h2> | ||
| 101 | |||
| 102 | <p>Most non-setup code shared between busybox applets lives in the libbb | ||
| 103 | directory. It's a mess that evolved over the years without much auditing | ||
| 104 | or cleanup. For anybody looking for a great project to break into busybox | ||
| 105 | development with, documenting libbb would be both incredibly useful and good | ||
| 106 | experience.</p> | ||
| 107 | |||
| 108 | <p>Common themes in libbb include allocation functions that test | ||
| 109 | for failure and abort the program with an error message so the caller doesn't | ||
| 110 | have to test the return value (xmalloc(), xstrdup(), etc.), wrapped versions | ||
| 111 | of open(), close(), read(), and write() that test for their own failures | ||
| 112 | and/or retry automatically, linked list management functions (llist.c), | ||
| 113 | command line argument parsing (getopt_ulflags.c), and a whole lot more.</p> | ||
| 114 | |||
| 115 | <h2><a name="adding"><b>Adding an applet to busybox</b></a></h2> | ||
| 116 | |||
| 117 | <p>To add a new applet to busybox, first pick a name for the applet and | ||
| 118 | a corresponding CONFIG_NAME. Then do this:</p> | ||
| 119 | |||
| 120 | <ul> | ||
| 121 | <li>Figure out where in the busybox source tree your applet best fits, | ||
| 122 | and put your source code there. Be sure to use APPLET_main() instead | ||
| 123 | of main(), where APPLET is the name of your applet.</li> | ||
| 124 | |||
| 125 | <li>Add your applet to the relevant Config.in file (which file you add | ||
| 126 | it to determines where it shows up in "make menuconfig"). This uses | ||
| 127 | the same general format as the Linux kernel's configuration system.</li> | ||
| 128 | |||
| 129 | <li>Add your applet to the relevant Makefile.in file (in the same | ||
| 130 | directory as the Config.in you chose), using the existing entries as a | ||
| 131 | template and the same CONFIG symbol as you used for Config.in. (Don't | ||
| 132 | forget "needlibm" or "needcrypt" if your applet needs libm or | ||
| 133 | libcrypt.)</li> | ||
| 134 | |||
| 135 | <li>Add your applet to "include/applets.h", using one of the existing | ||
| 136 | entries as a template. (Note: this is in alphabetical order. Applets | ||
| 137 | are found via binary search, and if you add an applet out of order it | ||
| 138 | won't work.)</li> | ||
| 139 | |||
| 140 | <li>Add your applet's runtime help text to "include/usage.h". You need | ||
| 141 | at least appname_trivial_usage (the minimal help text, always included | ||
| 142 | in the busybox binary when this applet is enabled) and appname_full_usage | ||
| 143 | (extra help text included in the busybox binary when | ||
| 144 | CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile. | ||
| 145 | The other two help entry types (appname_example_usage and | ||
| 146 | appname_notes_usage) are optional. They don't take up space in the binary, | ||
| 147 | but instead show up in the generated documentation (BusyBox.html, | ||
| 148 | BusyBox.txt, and the man page BusyBox.1).</li> | ||
| 149 | |||
| 150 | <li>Run menuconfig, switch your applet on, compile, test, and fix the | ||
| 151 | bugs. Be sure to try both "allyesconfig" and "allnoconfig" (and | ||
| 152 | "allbareconfig" if relevant).</li> | ||
| 153 | |||
| 154 | </ul> | ||
| 155 | |||
| 156 | <h2><a name="standards">What standards does busybox adhere to?</a></h2> | ||
| 157 | |||
| 158 | <p>The standard we're paying attention to is the "Shell and Utilities" | ||
| 159 | portion of the <a href="http://www.opengroup.org/onlinepubs/009695399/">Open | ||
| 160 | Group Base Standards</a> (also known as the Single Unix Specification version | ||
| 161 | 3 or SUSv3). Note that paying attention isn't necessarily the same thing as | ||
| 162 | following it.</p> | ||
| 163 | |||
| 164 | <p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor | ||
| 165 | commonly used options like echo's '-e' and '-n', or sed's '-i'. Busybox is | ||
| 166 | driven by what real users actually need, not the fact the standard believes | ||
| 167 | we should implement ed or sccs. For size reasons, we're unlikely to include | ||
| 168 | much internationalization support beyond UTF-8, and on top of all that, our | ||
| 169 | configuration menu lets developers chop out features to produce smaller but | ||
| 170 | very non-standard utilities.</p> | ||
| 171 | |||
| 172 | <p>Also, Busybox is aimed primarily at Linux. Unix standards are interesting | ||
| 173 | because Linux tries to adhere to them, but portability to dozens of platforms | ||
| 174 | is only interesting in terms of offering a restricted feature set that works | ||
| 175 | everywhere, not growing dozens of platform-specific extensions. Busybox | ||
| 176 | should be portable to all hardware platforms Linux supports, and any other | ||
| 177 | similar operating systems that are easy to support and won't require much | ||
| 178 | maintenance.</p> | ||
| 179 | |||
| 180 | <p>In practice, standards compliance tends to be a clean-up step once an | ||
| 181 | applet is otherwise finished. When polishing and testing a busybox applet, | ||
| 182 | we ensure we have at least the option of full standards compliance, or else | ||
| 183 | document where we (intentionally) fall short.</p> | ||
| 184 | |||
| 185 | <h2><a name="portability">Portability.</a></h2> | ||
| 186 | |||
| 187 | <p>Busybox is a Linux project, but that doesn't mean we don't have to worry | ||
| 188 | about portability. First of all, there are different hardware platforms, | ||
| 189 | different C library implementations, different versions of the kernel and | ||
| 190 | build toolchain... The file "include/platform.h" exists to centralize and | ||
| 191 | encapsulate various platform-specific things in one place, so most busybox | ||
| 192 | code doesn't have to care where it's running.</p> | ||
| 193 | |||
| 194 | <p>To start with, Linux runs on dozens of hardware platforms. We try to test | ||
| 195 | each release on x86, x86-64, arm, PowerPC, and mips. (Since qemu can handle | ||
| 196 | all of these, this isn't that hard.) This means we have to care about a number | ||
| 197 | of portability issues like endianness, word size, and alignment, all of which | ||
| 198 | belong in platform.h. That header handles conditional #includes and gives | ||
| 199 | us macros we can use in the rest of our code. At some point in the future | ||
| 200 | we might grow a platform.c, possibly even a platform subdirectory, as long | ||
| 201 | as the applets themselves don't have to care.</p> | ||
| 202 | |||
| 203 | <p>On a related note, we made the "default signedness of char varies" problem | ||
| 204 | go away by feeding the compiler -funsigned-char. This gives us consistent | ||
| 205 | behavior on all platforms, and defaults to 8-bit clean text processing (which | ||
| 206 | gets us halfway to UTF-8 support). NOMMU support is less easily separated | ||
| 207 | (see the tips section later in this document), but we're working on it.</p> | ||
| 208 | |||
| 209 | <p>Another type of portability is build environments: we unapologetically use | ||
| 210 | a number of gcc and glibc extensions (as does the Linux kernel), but these have | ||
| 211 | been picked up by packages like uClibc, TCC, and Intel's C Compiler. As for | ||
| 212 | gcc, we take advantage of newer compiler optimizations to get the smallest | ||
| 213 | possible size, but we also regression test against an older build environment | ||
| 214 | using the Red Hat 9 image at "http://busybox.net/downloads/qemu". This has a | ||
| 215 | 2.4 kernel, gcc 3.2, make 3.79.1, and glibc 2.3, and is the oldest | ||
| 216 | build/deployment environment we still put any effort into maintaining. (If | ||
| 217 | anyone takes an interest in older kernels you're welcome to submit patches, | ||
| 218 | but the effort would probably be better spent | ||
| 219 | <a href="http://www.selenic.com/linux-tiny/">trimming | ||
| 220 | down the 2.6 kernel</a>.) Older gcc versions than that are uninteresting since | ||
| 221 | we now use C99 features, although | ||
| 222 | <a href="http://fabrice.bellard.free.fr/tcc/">tcc</a> might be worth a | ||
| 223 | look.</p> | ||
| 224 | |||
| 225 | <p>We also test busybox against the current release of uClibc. Older versions | ||
| 226 | of uClibc aren't very interesting (they were buggy, and uClibc wasn't really | ||
| 227 | usable as a general-purpose C library before version 0.9.26 anyway).</p> | ||
| 228 | |||
| 229 | <p>Other unix implementations are mostly uninteresting, since Linux binaries | ||
| 230 | have become the new standard for portable Unix programs. Specifically, | ||
| 231 | the ubiquity of Linux was cited as the main reason the Intel Binary | ||
| 232 | Compatibility Standard 2 died, by the standards group organized to name a | ||
| 233 | successor to ibcs2: <a href="http://www.telly.org/86open/">the 86open | ||
| 234 | project</a>. That project disbanded in 1999 with the endorsement of an | ||
| 235 | existing standard: Linux ELF binaries. Since then, the major players at the | ||
| 236 | time (such as <a | ||
| 237 | href="http://www-03.ibm.com/servers/aix/products/aixos/linux/index.html">AIX</a>, <a | ||
| 238 | href="http://www.sun.com/software/solaris/ds/linux_interop.jsp#3">Solaris</a>, and | ||
| 239 | <a href="http://www.onlamp.com/pub/a/bsd/2000/03/17/linuxapps.html">FreeBSD</a>) | ||
| 240 | have all either grown Linux support or folded.</p> | ||
| 241 | |||
| 242 | <p>The major exceptions are newcomer MacOS X, some embedded environments | ||
| 243 | (such as newlib+libgloss) which provide a POSIX environment but not a full | ||
| 244 | Linux environment, and environments like Cygwin that provide only partial Linux | ||
| 245 | emulation. Also, some embedded Linux systems run a Linux kernel but amputate | ||
| 246 | things like the /proc directory to save space.</p> | ||
| 247 | |||
| 248 | <p>Supporting these systems is largely a question of providing a clean subset | ||
| 249 | of BusyBox's functionality -- whichever applets can easily be made to | ||
| 250 | work in that environment. Annotating the configuration system to | ||
| 251 | indicate which applets require which prerequisites (such as procfs) is | ||
| 252 | also welcome. Other efforts to support these systems (swapping #include | ||
| 253 | files to build in different environments, adding adapter code to platform.h, | ||
| 254 | adding more extensive special-case supporting infrastructure such as mount's | ||
| 255 | legacy mtab support) are handled on a case-by-case basis. Support that can be | ||
| 256 | cleanly hidden in platform.h is reasonably attractive, and failing that | ||
| 257 | support that can be cleanly separated into a separate conditionally compiled | ||
| 258 | file is at least worth a look. Special-case code in the body of an applet is | ||
| 259 | something we're trying to avoid.</p> | ||
| 260 | |||
| 261 | <h2><a name="tips">Programming tips and tricks.</a></h2> | ||
| 262 | |||
| 263 | <p>Various things busybox uses that aren't particularly well documented | ||
| 264 | elsewhere.</p> | ||
| 265 | |||
| 266 | <h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2> | ||
| 267 | |||
| 268 | <p>Password fields in /etc/passwd and /etc/shadow are in a special format. | ||
| 269 | If the first character isn't '$', then it's an old DES style password. If | ||
| 270 | the first character is '$' then the password is actually three fields | ||
| 271 | separated by '$' characters:</p> | ||
| 272 | <pre> | ||
| 273 | <b>$type$salt$encrypted_password</b> | ||
| 274 | </pre> | ||
| 275 | |||
| 276 | <p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p> | ||
| 277 | |||
| 278 | <p>The "salt" is a bunch of random characters (generally 8) the encryption | ||
| 279 | algorithm uses to perturb the password in a known and reproducible way (such | ||
| 280 | as by appending the random data to the unencrypted password, or combining | ||
| 281 | them with exclusive or). Salt is randomly generated when setting a password, | ||
| 282 | and then the same salt value is re-used when checking the password. (Salt is | ||
| 283 | thus stored unencrypted.)</p> | ||
| 284 | |||
| 285 | <p>The advantage of using salt is that the same cleartext password encrypted | ||
| 286 | with a different salt value produces a different encrypted value. | ||
| 287 | If each encrypted password uses a different salt value, an attacker is forced | ||
| 288 | to do the cryptographic math all over again for each password they want to | ||
| 289 | check. Without salt, they could simply produce a big dictionary of commonly | ||
| 290 | used passwords ahead of time, and look up each password in a stolen password | ||
| 291 | file to see if it's a known value. (Even if there are billions of possible | ||
| 292 | passwords in the dictionary, checking each one is just a binary search against | ||
| 293 | a file only a few gigabytes long.) With salt they can't even tell if two | ||
| 294 | different users share the same password without guessing what that password | ||
| 295 | is and encrypting it with each user's salt. They also can't precompute the attack dictionary for | ||
| 296 | a specific password until they know what the salt value is.</p> | ||
| 297 | |||
| 298 | <p>The third field is the encrypted password (plus the salt). For MD5 this | ||
| 299 | is 22 bytes.</p> | ||
| 300 | |||
| 301 | <p>The busybox function to handle all this is pw_encrypt(clear, salt) in | ||
| 302 | "libbb/pw_encrypt.c". The first argument is the clear text password to be | ||
| 303 | encrypted, and the second is a string in "$type$salt$password" format, from | ||
| 304 | which the "type" and "salt" fields will be extracted to produce an encrypted | ||
| 305 | value. (Only the first two fields are needed, the third $ is equivalent to | ||
| 306 | the end of the string.) The return value is an encrypted password in | ||
| 307 | /etc/passwd format, with all three $ separated fields. It's stored in | ||
| 308 | a static buffer, 128 bytes long.</p> | ||
| 309 | |||
| 310 | <p>So when checking an existing password, if pw_encrypt(text, | ||
| 311 | old_encrypted_password) returns a string that compares identical to | ||
| 312 | old_encrypted_password, you've got the right password. When setting a new | ||
| 313 | password, generate a random 8 character salt string, put it in the right | ||
| 314 | format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the | ||
| 315 | second argument to pw_encrypt(text,buffer).</p> | ||
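<p>The field handling described above can be sketched without touching pw_encrypt() itself. The two helpers below are hypothetical (they are not part of libbb), and they assume a single-character type, matching the "$%c$%s" format string in the text.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split a "$type$salt$encrypted" field into its type and salt.
 * Returns 0 on success, or -1 if the field is not in that form
 * (for example an old DES-style entry) or the salt does not fit. */
int split_password_field(const char *field, char *type,
                         char *salt, size_t salt_size)
{
    assert(field && type && salt && salt_size > 0);
    if (field[0] != '$' || field[1] == '\0' || field[2] != '$')
        return -1;
    *type = field[1];

    /* The salt runs up to the third '$' or the end of the string. */
    const char *start = field + 3;
    size_t len = strcspn(start, "$");
    if (len == 0 || len >= salt_size)
        return -1;
    memcpy(salt, start, len);
    salt[len] = '\0';
    return 0;
}

/* Build the "$type$salt" string that the text passes as the second
 * argument of pw_encrypt(). Returns 0 on success, -1 on truncation. */
int format_salt_setting(char *buf, size_t size, char type, const char *salt)
{
    int n = snprintf(buf, size, "$%c$%s", type, salt);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

<p>When checking a password the stored field can be handed to pw_encrypt() unchanged, so a splitter like this is mainly useful for inspecting entries. The formatter covers the "setting a new password" case, and uses snprintf() rather than sprintf() so an oversized salt cannot overflow the buffer.</p>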
| 316 | |||
| 317 | <h2><a name="tips_vfork">Fork and vfork</a></h2> | ||
| 318 | |||
| 319 | <p>On systems that haven't got a Memory Management Unit, fork() is unreasonably | ||
| 320 | expensive to implement (and sometimes even impossible), so a less capable | ||
| 321 | function called vfork() is used instead. (Using vfork() on a system with an | ||
| 322 | MMU is like pounding a nail with a wrench. Not the best tool for the job, but | ||
| 323 | it works.)</p> | ||
| 324 | |||
| 325 | <p>Busybox hides the difference between fork() and vfork() in | ||
| 326 | libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec() | ||
| 327 | (which returns a pid and takes the same arguments as execve(), although in | ||
| 328 | this case envp can be NULL) and don't worry about it. This description is | ||
| 329 | here in case you want to know why that does what it does.</p> | ||
| 330 | |||
| 331 | <p>Implementing fork() depends on having a Memory Management Unit. With an | ||
| 332 | MMU you can simply set up a second set of page tables and share the | ||
| 333 | physical memory via copy-on-write. So a fork() followed quickly by exec() | ||
| 334 | only copies a few pages of the parent's memory, just the ones it changes | ||
| 335 | before freeing them.</p> | ||
| 336 | |||
| 337 | <p>With a very primitive MMU (using a base pointer plus length instead of page | ||
| 338 | tables, which can provide virtual addresses and protect processes from each | ||
| 339 | other, but no copy on write) you can still implement fork. But it's | ||
| 340 | unreasonably expensive, because you have to copy all the parent process' | ||
| 341 | memory into the new process (which could easily be several megabytes per fork). | ||
| 342 | And you have to do this even though that memory gets freed again as soon as the | ||
| 343 | exec happens. (This is not just slow and a waste of space but causes memory | ||
| 344 | usage spikes that can easily cause the system to run out of memory.)</p> | ||
| 345 | |||
| 346 | <p>Without even a primitive MMU, you have no virtual addresses. Every process | ||
| 347 | can reach out and touch any other process' memory, because all pointers are to | ||
| 348 | physical addresses with no protection. Even if you copy a process' memory to | ||
| 349 | new physical addresses, all of its pointers point to the old objects in the | ||
| 350 | old process. (Searching through the new copy's memory for pointers and | ||
| 351 | redirecting them to the new locations is not an easy problem.)</p> | ||
| 352 | |||
| 353 | <p>So with a primitive or missing MMU, fork() is just not a good idea.</p> | ||
| 354 | |||
| 355 | <p>In theory, vfork() is just a fork() that writeably shares the heap and stack | ||
| 356 | rather than copying it (so what one process writes the other one sees). In | ||
| 357 | practice, vfork() has to suspend the parent process until the child does exec, | ||
| 358 | at which point the parent wakes up and resumes by returning from the call to | ||
| 359 | vfork(). All modern kernel/libc combinations implement vfork() to put the | ||
| 360 | parent to sleep until the child does its exec. There's just no other way to | ||
| 361 | make it work: the parent has to know the child has done its exec() or exit() | ||
| 362 | before it's safe to return from the function it's in, so it has to block | ||
| 363 | until that happens. In fact without suspending the parent there's no way to | ||
| 364 | even store separate copies of the return value (the pid) from the vfork() call | ||
| 365 | itself: both assignments write into the same memory location.</p> | ||
| 366 | |||
| 367 | <p>One way to understand (and in fact implement) vfork() is this: imagine | ||
| 368 | the parent does a setjmp and then continues on (pretending to be the child) | ||
| 369 | until the exec() comes around, then the _exec_ does the actual fork, and the | ||
| 370 | parent does a longjmp back to the original vfork call and continues on from | ||
| 371 | there. (It thus becomes obvious why the child can't return, or modify | ||
| 372 | local variables it doesn't want the parent to see changed when it resumes.)</p> | ||
| 373 | |||
| 374 | <p>Note a common mistake: the need for vfork doesn't mean you can't have two | ||
| 375 | processes running at the same time. It means you can't have two processes | ||
| 376 | sharing the same memory without stomping all over each other. As soon as | ||
| 377 | the child calls exec(), the parent resumes.</p> | ||
| 378 | |||
| 379 | <p>If the child's attempt to call exec() fails, the child should call _exit() | ||
| 380 | rather than a normal exit(). This avoids any atexit() code that might confuse | ||
| 381 | the parent. (The parent should never call _exit(), only a vforked child that | ||
| 382 | failed to exec.)</p> | ||
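<p>The vfork() rules above (exec promptly, and call _exit() if the exec fails) look roughly like this in practice. This is a sketch only: spawn_and_wait() is a hypothetical helper, not busybox's bb_fork_exec(), and the exit code 127 for a failed exec is a shell convention assumed here.</p>

```c
/* Ask glibc to declare vfork() even under a strict -std= setting. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    assert(argv && argv[0]);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: still running in the parent's memory, so do nothing
         * here except try to exec. */
        execvp(argv[0], argv);
        /* exec failed: leave without running atexit() handlers. */
        _exit(127);
    }

    /* Parent: resumes only after the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>Note that the child neither returns from the function nor modifies any variable the parent relies on.</p>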
| 383 | |||
| 384 | <p>(Now in theory, a nommu system could just copy the _stack_ when it forks | ||
| 385 | (which presumably is much shorter than the heap), and leave the heap shared, | ||
| 386 | even with no MMU at all. | ||
| 387 | In practice, you've just wound up in a multi-threaded situation and you can't | ||
| 388 | do a malloc() or free() on your heap without freeing the other process' memory | ||
| 389 | (and if you don't have the proper locking for being threaded, corrupting the | ||
| 390 | heap if both of you try to do it at the same time and wind up stomping on | ||
| 391 | each other while traversing the free memory lists). The thing about vfork is | ||
| 392 | that it's a big red flag warning "there be dragons here" rather than | ||
| 393 | something subtle and thus even more dangerous.)</p> | ||
| 394 | |||
| 395 | <h2><a name="tips_short_read">Short reads and writes</a></h2> | ||
| 396 | |||
| 397 | <p>Busybox has special functions, bb_full_read() and bb_full_write(), to | ||
| 398 | check that all the data we asked for got read or written. Is this a real | ||
| 399 | world consideration? Try the following:</p> | ||
| 400 | |||
| 401 | <pre>while true; do echo hello; sleep 1; done | tee out.txt</pre> | ||
| 402 | |||
| 403 | <p>If tee is implemented with bb_full_read(), tee doesn't display output | ||
| 404 | in real time but blocks until its entire input buffer (generally a couple | ||
| 405 | kilobytes) is read, then displays it all at once. In that case, we _want_ | ||
| 406 | the short read, for user interface reasons. (Note that read() should never | ||
| 407 | return 0 unless it has hit the end of input, and an attempt to write 0 | ||
| 408 | bytes should be ignored by the OS.)</p> | ||
| 409 | |||
| 410 | <p>As for short writes, play around with two processes piping data to each | ||
| 411 | other on the command line (cat bigfile | gzip > out.gz) and suspend and | ||
| 412 | resume a few times (ctrl-z to suspend, "fg" to resume). The writer can | ||
| 413 | experience short writes, which are especially dangerous because if you don't | ||
| 414 | notice them you'll discard data. They can also happen when a system is under | ||
| 415 | load and a fast process is piping to a slower one. (Such as an xterm waiting | ||
| 416 | on x11 when the scheduler decides X is being a CPU hog with all that | ||
| 417 | text console scrolling...)</p> | ||
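<p>A loop that copes with short writes might look like the sketch below. The name full_write() is hypothetical and this is not the libbb implementation of bb_full_write(); it only illustrates retrying until every byte has been written.</p>

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after short writes and
 * after EINTR. Returns len on success, or -1 on error with errno left
 * as set by write(). */
ssize_t full_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        /* A short write just means some bytes are still pending. */
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

<p>A caller that ignores the return value of a plain write() silently drops whatever did not fit. With a wrapper like this, the only outcomes are "everything was written" or an error.</p>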
| 418 | |||
| 419 | <p>So will data always be read from the far end of a pipe at the | ||
| 420 | same chunk sizes it was written in? Nope. Don't rely on that. For one | ||
| 421 | counterexample, see <a href="http://www.faqs.org/rfcs/rfc896.html">rfc 896 | ||
| 422 | for Nagle's algorithm</a>, which waits a fraction of a second or so before | ||
| 423 | sending out small amounts of data through a TCP/IP connection in case more | ||
| 424 | data comes in that can be merged into the same packet. (In case you were | ||
| 425 | wondering why action games that use TCP/IP set TCP_NODELAY to lower the latency | ||
| 426 | on their sockets, now you know.)</p> | ||
| 427 | |||
| 428 | <h2><a name="tips_memory">Memory used by relocatable code, PIC, and static linking.</a></h2> | ||
| 429 | |||
| 430 | <p>The downside of standard dynamic linking is that it results in self-modifying | ||
| 431 | code. Although each executable's pages are mmap()ed into a process' address | ||
| 432 | space from the executable file and are thus naturally shared between processes | ||
| 433 | out of the page cache, the library loader (ld-linux.so.2 or ld-uClibc.so.0) | ||
| 434 | writes to these pages to supply addresses for relocatable symbols. This | ||
| 435 | dirties the pages, triggering copy-on-write allocation of new memory for each | ||
| 436 | process' dirtied pages.</p> | ||
| 437 | |||
| 438 | <p>One solution to this is Position Independent Code (PIC), a way of linking | ||
| 439 | a file so all the relocations are grouped together. This dirties fewer | ||
| 440 | pages (often just a single page) for each process' relocations. The | ||
| 441 | downside is that this results in larger executables, which take up more space on disk | ||
| 442 | (and a correspondingly larger space in memory). But when many copies of the | ||
| 443 | same program are running, PIC dynamic linking trades a larger disk footprint | ||
| 444 | for a smaller memory footprint, by sharing more pages.</p> | ||
| 445 | |||
| 446 | <p>A third solution is static linking. A statically linked program has no | ||
| 447 | relocations, and thus the entire executable is shared between all running | ||
| 448 | instances. This tends to have a significantly larger disk footprint, but | ||
| 449 | on a system with only one or two executables, shared libraries aren't much | ||
| 450 | of a win anyway.</p> | ||
| 451 | |||
| 452 | <p>You can tell the glibc linker to display debugging information about its | ||
| 453 | relocations with the environment variable "LD_DEBUG". Try | ||
| 454 | "LD_DEBUG=help /bin/true" for a list of commands. Learning to interpret | ||
| 455 | "LD_DEBUG=statistics cat /proc/self/statm" could be interesting.</p> | ||
| 456 | |||
| 457 | <p>For more on this topic, here's Rich Felker:</p> | ||
| 458 | <blockquote> | ||
| 459 | <p>Dynamic linking (without fixed load addresses) fundamentally requires | ||
| 460 | at least one dirty page per dso that uses symbols. Making calls (but | ||
| 461 | never taking the address explicitly) to functions within the same dso | ||
| 462 | does not require a dirty page by itself, but will with ELF unless you | ||
| 463 | use -Bsymbolic or hidden symbols when linking.</p> | ||
| 464 | |||
| 465 | <p>ELF uses significant additional stack space for the kernel to pass all | ||
| 466 | the ELF data structures to the newly created process image. These are | ||
| 467 | located above the argument list and environment. This normally adds 1 | ||
| 468 | dirty page to the process size.</p> | ||
| 469 | |||
| 470 | <p>The ELF dynamic linker has its own data segment, adding one or more | ||
| 471 | dirty pages. I believe it also performs relocations on itself.</p> | ||
| 472 | |||
| 473 | <p>The ELF dynamic linker makes significant dynamic allocations to manage | ||
| 474 | the global symbol table and the loaded dso's. This data is never | ||
| 475 | freed. It will be needed again if libdl is used, so unconditionally | ||
| 476 | freeing it is not possible, but normal programs do not use libdl. Of | ||
| 477 | course with glibc all programs use libdl (due to nsswitch) so the | ||
| 478 | issue was never addressed.</p> | ||
| 479 | |||
| 480 | <p>ELF also has the issue that segments are not page-aligned on disk. | ||
| 481 | This saves up to 4k on disk, but at the expense of using an additional | ||
| 482 | dirty page in most cases, due to a large portion of the first data | ||
| 483 | page being filled with a duplicate copy of the last text page.</p> | ||
| 484 | |||
| 485 | <p>The above is just a partial list of the tiny memory penalties of ELF | ||
| 486 | dynamic linking, which eventually add up to quite a bit. The smallest | ||
| 487 | I've been able to get a process down to is 8 dirty pages, and the | ||
| 488 | above factors seem to mostly account for it (but some were difficult | ||
| 489 | to measure).</p> | ||
| 490 | </blockquote> | ||
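<p>As an illustration of the "hidden symbols" mentioned above, here is a small
sketch using the GCC/Clang visibility attribute. The function names are
invented for the example. The assumption is that this file gets built into a
shared library: the hidden helper is then not exported, so calls to it from
inside the same dso can be bound at link time instead of going through a
relocation that the dynamic linker has to patch.</p>

```c
/* Hidden visibility (GCC/Clang extension): the symbol is not exported
 * from the shared object, so other code in the same dso can call it
 * directly, without a run-time symbol lookup. */
__attribute__((visibility("hidden")))
int helper_add(int a, int b)
{
    return a + b;
}

/* Default visibility: this is the entry point other objects may link
 * against.  Its calls to helper_add() stay inside the dso. */
int public_sum3(int a, int b, int c)
{
    return helper_add(helper_add(a, b), c);
}
```

<p>Marking everything that isn't part of the public interface this way is
roughly the per-symbol equivalent of linking with -Bsymbolic.</p>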
| 491 | |||
| 492 | <h2><a name="tips_kernel_headers"></a>Including kernel headers</h2> | ||
| 493 | |||
| 494 | <p>The "linux" or "asm" directories of /usr/include contain Linux kernel | ||
| 495 | headers, so that the C library can talk directly to the Linux kernel. In | ||
| 496 | a perfect world, applications shouldn't include these headers directly, but | ||
| 497 | we don't live in a perfect world.</p> | ||
| 498 | |||
| 499 | <p>For example, Busybox's losetup code wants linux/loop.h because nothing else | ||
| 500 | defines the structures to call the kernel's loopback device setup ioctls. | ||
| 501 | Attempts to cut and paste the information into a local busybox header file | ||
| 502 | proved incredibly painful, because portions of the loop_info structure vary by | ||
| 503 | architecture, namely the type __kernel_dev_t has different sizes on alpha, | ||
| 504 | arm, x86, and so on. This means we either #include <linux/posix_types.h> or | ||
| 505 | we hardwire #ifdefs to check what platform we're building on and define this | ||
| 506 | type appropriately for every single hardware architecture supported by | ||
| 507 | Linux, which is simply unworkable.</p> | ||
| 508 | |||
| 509 | <p>This is aside from the fact that the relevant type defined in | ||
| 510 | posix_types.h was renamed to __kernel_old_dev_t during the 2.5 series, so | ||
| 511 | to cut and paste the structure into our header we have to #include | ||
| 512 | <linux/version.h> to figure out which name to use. (What we actually do is | ||
| 513 | check if we're building on 2.6, and if so just use the new 64 bit structure | ||
| 514 | instead to avoid the rename entirely.) But we still need the version | ||
| 515 | check, since 2.4 didn't have the 64 bit structure.</p> | ||
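<p>The version check described above might look something like the following
sketch. It is not a copy of the actual losetup code, and the bb_ and BB_ names
are illustrative assumptions. It needs the Linux kernel headers installed, and
it picks the 64 bit structure and ioctls whenever the headers are from 2.6 or
later.</p>

```c
#include <stddef.h>
#include <linux/version.h>   /* LINUX_VERSION_CODE, KERNEL_VERSION() */
#include <linux/loop.h>      /* struct loop_info, struct loop_info64, ioctls */

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0)
/* 2.6 and later: use the 64 bit structure and sidestep the
 * __kernel_dev_t rename entirely. */
typedef struct loop_info64 bb_loop_info;
# define BB_LOOP_SET_STATUS LOOP_SET_STATUS64
# define BB_LOOP_GET_STATUS LOOP_GET_STATUS64
#else
/* 2.4 headers only have the old structure. */
typedef struct loop_info bb_loop_info;
# define BB_LOOP_SET_STATUS LOOP_SET_STATUS
# define BB_LOOP_GET_STATUS LOOP_GET_STATUS
#endif

/* Size of whichever structure the headers selected. */
size_t bb_loop_info_size(void)
{
    return sizeof(bb_loop_info);
}

/* The ioctl request number matching the selected structure. */
unsigned long bb_loop_get_status_request(void)
{
    return BB_LOOP_GET_STATUS;
}
```

<p>The rest of the code then only ever mentions bb_loop_info and the BB_
macros, so the ugliness stays in one place.</p>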
| 516 | |||
| 517 | <p>The BusyBox developers spent <u>two years</u> trying to figure | ||
| 518 | out a clean way to do all this. There isn't one. The losetup in the | ||
| 519 | util-linux package from kernel.org isn't doing it cleanly either; it just | ||
| 520 | hides the ugliness by nesting #include files. Their mount/loop.h | ||
| 521 | #includes "my_dev_t.h", which #includes <linux/posix_types.h> and | ||
| 522 | <linux/version.h> just like we do. There simply is no alternative.</p> | ||
| 523 | |||
| 524 | <p>We should never directly include kernel headers when there's a better | ||
| 525 | way to do it, but block copying information out of the kernel headers is not | ||
| 526 | a better way.</p> | ||
| 527 | |||
| 528 | <h2><a name="who">Who are the BusyBox developers?</a></h2> | ||
| 529 | |||
| 530 | <p>The following login accounts currently exist on busybox.net. (I.e., these | ||
| 531 | people can commit <a href="http://busybox.net/downloads/patches">patches</a> | ||
| 532 | into subversion for the BusyBox, uClibc, and buildroot projects.)</p> | ||
| 533 | |||
| 534 | <pre> | ||
| 535 | aldot :Bernhard Fischer | ||
| 536 | andersen :Erik Andersen <- uClibc and BuildRoot maintainer. | ||
| 537 | bug1 :Glenn McGrath | ||
| 538 | davidm :David McCullough | ||
| 539 | gkajmowi :Garrett Kajmowicz <- uClibc++ maintainer | ||
| 540 | jbglaw :Jan-Benedict Glaw | ||
| 541 | jocke :Joakim Tjernlund | ||
| 542 | landley :Rob Landley <- BusyBox maintainer | ||
| 543 | lethal :Paul Mundt | ||
| 544 | mjn3 :Manuel Novoa III | ||
| 545 | osuadmin :osuadmin | ||
| 546 | pgf :Paul Fox | ||
| 547 | pkj :Peter Kjellerstedt | ||
| 548 | prpplague :David Anders | ||
| 549 | psm :Peter S. Mazinger | ||
| 550 | russ :Russ Dill | ||
| 551 | sandman :Robert Griebl | ||
| 552 | sjhill :Steven J. Hill | ||
| 553 | solar :Ned Ludd | ||
| 554 | timr :Tim Riker | ||
| 555 | tobiasa :Tobias Anderberg | ||
| 556 | vapier :Mike Frysinger | ||
| 557 | </pre> | ||
| 558 | |||
| 559 | <p>The following accounts used to exist on busybox.net, but don't anymore, so | ||
| 560 | I can't ask /etc/passwd for their names. (If anybody would like to make | ||
| 561 | a stab at it...)</p> | ||
| 562 | |||
| 563 | <pre> | ||
| 564 | aaronl | ||
| 565 | beppu | ||
| 566 | dwhedon | ||
| 567 | erik : Also Erik Andersen? | ||
| 568 | gfeldman | ||
| 569 | jimg | ||
| 570 | kraai | ||
| 571 | markw | ||
| 572 | miles | ||
| 573 | proski | ||
| 574 | rjune | ||
| 575 | tausq | ||
| 576 | vodz :Vladimir N. Oleynik | ||
| 577 | </pre> | ||
| 578 | |||
| 579 | |||
| 580 | <br> | ||
| 581 | <br> | ||
| 582 | <br> | ||
| 583 | |||
| 584 | <!--#include file="footer.html" --> | ||
