Add explanations of encrypted passwords, and fork vs vfork.

author: Rob Landley <rob@landley.net> 2006-01-29 06:29:01 +0000
committer: Rob Landley <rob@landley.net> 2006-01-29 06:29:01 +0000
commit: b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9 (patch)
tree: ae7a94253a6b4769793cd246e506ec2f665a10ca /docs/busybox.net/programming.html
parent: 08a1b5095d710c5d905056a9daa14a1acad5590b (diff)
download: busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.tar.gz
busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.tar.bz2
busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.zip
1 files changed, 115 insertions, 0 deletions
diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html
index e44f291b3..f77f3c3a6 100644
--- a/docs/busybox.net/programming.html
+++ b/docs/busybox.net/programming.html
@@ -12,6 +12,11 @@
  </ul>
  <li><a href="#adding">Adding an applet to busybox</a></li>
  <li><a href="#standards">What standards does busybox adhere to?</a></li>
+  <li><a href="#tips">Tips and tricks.</a></li>
+  <ul>
+    <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
+    <li><a href="#tips_vfork">Fork and vfork</a></li>
+  </ul>
 </ul>
 <h2><b><a name="goals" />What are the goals of busybox?</b></h2>
@@ -172,6 +177,116 @@ applet is otherwise finished.  When polishing and testing a busybox applet,
 we ensure we have at least the option of full standards compliance, or else
 document where we (intentionally) fall short.</p>
+<h2><a name="tips" />Programming tips and tricks.</a></h2>
+<p>Various things busybox uses that aren't particularly well documented
+elsewhere.</p>
+<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
+<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
+If the first character isn't '$', then it's an old DES style password.  If
+the first character is '$' then the password is actually three fields
+separated by '$' characters:</p>
+<pre>
+  <b>$type$salt$encrypted_password</b>
+</pre>
+<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
+<p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
+algorithm uses to perturb the password in a known and reproducible way (such
+as by appending the random data to the unencrypted password, or combining
+them with exclusive or).  Salt is randomly generated when setting a password,
+and then the same salt value is re-used when checking the password.  (Salt is
+thus stored unencrypted.)</p>
+<p>The advantage of using salt is that the same cleartext password encrypted
+with a different salt value produces a different encrypted value.
+If each encrypted password uses a different salt value, an attacker is forced
+to do the cryptographic math all over again for each password they want to
+check.  Without salt, they could simply produce a big dictionary of commonly
+used passwords ahead of time, and look up each password in a stolen password
+file to see if it's a known value.  (Even if there are billions of possible
+passwords in the dictionary, checking each one is just a binary search against
+a file only a few gigabytes long.)  With salt they can't even tell if two
+different users share the same password without guessing what that password
+is and decrypting it.  They also can't precompute the attack dictionary for
+a specific password until they know what the salt value is.</p>
+<p>The third field is the encrypted password (plus the salt).  For md5 this
+is 22 bytes.</p>
+<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
+"libbb/pw_encrypt.c".  The first argument is the clear text password to be
+encrypted, and the second is a string in "$type$salt$password" format, from
+which the "type" and "salt" fields will be extracted to produce an encrypted
+value.  (Only the first two fields are needed, the third $ is equivalent to
+the end of the string.)  The return value is an encrypted password in
+/etc/passwd format, with all three $ separated fields.  It's stored in
+a static buffer, 128 bytes long.</p>
+<p>So when checking an existing password, if pw_encrypt(text,
+old_encrypted_password) returns a string that compares identical to
+old_encrypted_password, you've got the right password.  When setting a new
+password, generate a random 8 character salt string, put it in the right
+format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
+second argument to pw_encrypt(text,buffer).</p>
+<h2><a name="tips_vfork">Fork and vfork</a></h2>
+<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
+expensive to implement, so a less capable function called vfork() is used
+instead.</p>
+<p>The reason vfork() exists is that if you haven't got an MMU then you can't
+simply set up a second set of page tables and share the physical memory via
+copy-on-write, which is what fork() normally does.  This means that actually
+forking has to copy all the parent's memory (which could easily be tens of
+megabytes).  And you have to do this even though that memory gets freed again
+as soon as the exec happens, so it's probably all a big waste of time.</p>
+<p>This is not only slow and a waste of space, it also causes totally
+unnecessary memory usage spikes based on how big the _parent_ process is (not
+the child), and these spikes are quite likely to trigger an out of memory
+condition on small systems (which is where nommu is common anyway).  So
+although you _can_ emulate a real fork on a nommu system, you really don't
+want to.</p>
+<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
+rather than copying it (so what one process writes the other one sees).  In
+practice, vfork() has to suspend the parent process until the child does exec,
+at which point the parent wakes up and resumes by returning from the call to
+vfork().  All modern kernel/libc combinations implement vfork() to put the
+parent to sleep until the child does its exec.  There's just no other way to
+make it work: they're sharing the same stack, so if either one returns from its
+function it stomps on the callstack so that when the other process returns,
+hilarity ensues.  In fact without suspending the parent there's no way to even
+store separate copies of the return value (the pid) from the vfork() call
+itself: both assignments write into the same memory location.</p>
+<p>One way to understand (and in fact implement) vfork() is this: imagine
+the parent does a setjmp and then continues on (pretending to be the child)
+until the exec() comes around, then the _exec_ does the actual fork, and the
+parent does a longjmp back to the original vfork call and continues on from
+there.  (It thus becomes obvious why the child can't return, or modify
+local variables it doesn't want the parent to see changed when it resumes.)
+<p>Note a common mistake: the need for vfork doesn't mean you can't have two
+processes running at the same time.  It means you can't have two processes
+sharing the same memory without stomping all over each other.  As soon as
+the child calls exec(), the parent resumes.</p>
+<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
+(which presumably is much shorter than the heap), and leave the heap shared.
+In practice, you've just wound up in a multi-threaded situation and you can't
+do a malloc() or free() on your heap without freeing the other process's memory
+(and if you don't have the proper locking for being threaded, corrupting the
+heap if both of you try to do it at the same time and wind up stomping on
+each other while traversing the free memory lists).  The thing about vfork is
+that it's a big red flag warning "there be dragons here" rather than
+something subtle and thus even more dangerous.)</p>
 <br>
 <br>
 <br>
author	Rob Landley <rob@landley.net>	2006-01-29 06:29:01 +0000
committer	Rob Landley <rob@landley.net>	2006-01-29 06:29:01 +0000
commit	b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9 (patch)
tree	ae7a94253a6b4769793cd246e506ec2f665a10ca /docs/busybox.net/programming.html
parent	08a1b5095d710c5d905056a9daa14a1acad5590b (diff)
download	busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.tar.gz busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.tar.bz2 busybox-w32-b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9.zip

diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html index e44f291b3..f77f3c3a6 100644 --- a/docs/busybox.net/programming.html +++ b/docs/busybox.net/programming.html
@@ -12,6 +12,11 @@
12	</ul>	12	</ul>
13	<li><a href="#adding">Adding an applet to busybox</a></li>	13	<li><a href="#adding">Adding an applet to busybox</a></li>
14	<li><a href="#standards">What standards does busybox adhere to?</a></li>	14	<li><a href="#standards">What standards does busybox adhere to?</a></li>
		15	<li><a href="#tips">Tips and tricks.</a></li>
		16	<ul>
		17	<li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
		18	<li><a href="#tips_vfork">Fork and vfork</a></li>
		19	</ul>
15	</ul>	20	</ul>
16		21
17	<h2><b><a name="goals" />What are the goals of busybox?</b></h2>	22	<h2><b><a name="goals" />What are the goals of busybox?</b></h2>
@@ -172,6 +177,116 @@ applet is otherwise finished. When polishing and testing a busybox applet,
172	we ensure we have at least the option of full standards compliance, or else	177	we ensure we have at least the option of full standards compliance, or else
173	document where we (intentionally) fall short.</p>	178	document where we (intentionally) fall short.</p>
174		179
		180	<h2><a name="tips" />Programming tips and tricks.</a></h2>
		181
		182	<p>Various things busybox uses that aren't particularly well documented
		183	elsewhere.</p>
		184
		185	<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
		186
		187	<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
		188	If the first character isn't '$', then it's an old DES style password. If
		189	the first character is '$' then the password is actually three fields
		190	separated by '$' characters:</p>
		191	<pre>
		192	<b>$type$salt$encrypted_password</b>
		193	</pre>
		194
		195	<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
		196
		197	<p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
		198	algorithm uses to perturb the password in a known and reproducible way (such
		199	as by appending the random data to the unencrypted password, or combining
		200	them with exclusive or). Salt is randomly generated when setting a password,
		201	and then the same salt value is re-used when checking the password. (Salt is
		202	thus stored unencrypted.)</p>
		203
		204	<p>The advantage of using salt is that the same cleartext password encrypted
		205	with a different salt value produces a different encrypted value.
		206	If each encrypted password uses a different salt value, an attacker is forced
		207	to do the cryptographic math all over again for each password they want to
		208	check. Without salt, they could simply produce a big dictionary of commonly
		209	used passwords ahead of time, and look up each password in a stolen password
		210	file to see if it's a known value. (Even if there are billions of possible
		211	passwords in the dictionary, checking each one is just a binary search against
		212	a file only a few gigabytes long.) With salt they can't even tell if two
		213	different users share the same password without guessing what that password
		214	is and decrypting it. They also can't precompute the attack dictionary for
		215	a specific password until they know what the salt value is.</p>
		216
		217	<p>The third field is the encrypted password (plus the salt). For md5 this
		218	is 22 bytes.</p>
		219
		220	<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
		221	"libbb/pw_encrypt.c". The first argument is the clear text password to be
		222	encrypted, and the second is a string in "$type$salt$password" format, from
		223	which the "type" and "salt" fields will be extracted to produce an encrypted
		224	value. (Only the first two fields are needed, the third $ is equivalent to
		225	the end of the string.) The return value is an encrypted password in
		226	/etc/passwd format, with all three $ separated fields. It's stored in
		227	a static buffer, 128 bytes long.</p>
		228
		229	<p>So when checking an existing password, if pw_encrypt(text,
		230	old_encrypted_password) returns a string that compares identical to
		231	old_encrypted_password, you've got the right password. When setting a new
		232	password, generate a random 8 character salt string, put it in the right
		233	format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
		234	second argument to pw_encrypt(text,buffer).</p>
		235
		236	<h2><a name="tips_vfork">Fork and vfork</a></h2>
		237
		238	<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
		239	expensive to implement, so a less capable function called vfork() is used
		240	instead.</p>
		241
		242	<p>The reason vfork() exists is that if you haven't got an MMU then you can't
		243	simply set up a second set of page tables and share the physical memory via
		244	copy-on-write, which is what fork() normally does. This means that actually
		245	forking has to copy all the parent's memory (which could easily be tens of
		246	megabytes). And you have to do this even though that memory gets freed again
		247	as soon as the exec happens, so it's probably all a big waste of time.</p>
		248
		249	<p>This is not only slow and a waste of space, it also causes totally
		250	unnecessary memory usage spikes based on how big the _parent_ process is (not
		251	the child), and these spikes are quite likely to trigger an out of memory
		252	condition on small systems (which is where nommu is common anyway). So
		253	although you _can_ emulate a real fork on a nommu system, you really don't
		254	want to.</p>
		255
		256	<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
		257	rather than copying it (so what one process writes the other one sees). In
		258	practice, vfork() has to suspend the parent process until the child does exec,
		259	at which point the parent wakes up and resumes by returning from the call to
		260	vfork(). All modern kernel/libc combinations implement vfork() to put the
		261	parent to sleep until the child does its exec. There's just no other way to
		262	make it work: they're sharing the same stack, so if either one returns from its
		263	function it stomps on the callstack so that when the other process returns,
		264	hilarity ensues. In fact without suspending the parent there's no way to even
		265	store separate copies of the return value (the pid) from the vfork() call
		266	itself: both assignments write into the same memory location.</p>
		267
		268	<p>One way to understand (and in fact implement) vfork() is this: imagine
		269	the parent does a setjmp and then continues on (pretending to be the child)
		270	until the exec() comes around, then the _exec_ does the actual fork, and the
		271	parent does a longjmp back to the original vfork call and continues on from
		272	there. (It thus becomes obvious why the child can't return, or modify
		273	local variables it doesn't want the parent to see changed when it resumes.)
		274
		275	<p>Note a common mistake: the need for vfork doesn't mean you can't have two
		276	processes running at the same time. It means you can't have two processes
		277	sharing the same memory without stomping all over each other. As soon as
		278	the child calls exec(), the parent resumes.</p>
		279
		280	<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
		281	(which presumably is much shorter than the heap), and leave the heap shared.
		282	In practice, you've just wound up in a multi-threaded situation and you can't
		283	do a malloc() or free() on your heap without freeing the other process's memory
		284	(and if you don't have the proper locking for being threaded, corrupting the
		285	heap if both of you try to do it at the same time and wind up stomping on
		286	each other while traversing the free memory lists). The thing about vfork is
		287	that it's a big red flag warning "there be dragons here" rather than
		288	something subtle and thus even more dangerous.)</p>
		289
175	<br>	290	<br>
176	<br>	291	<br>
177	<br>	292	<br>