add unicode.txt

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
author: Denys Vlasenko <vda.linux@googlemail.com> 2010-02-01 15:58:08 +0100
committer: Denys Vlasenko <vda.linux@googlemail.com> 2010-02-01 15:58:08 +0100
commit: 698dca5805117f470ef19488428c8a5f795b9e0c (patch)
tree: 1ca510d7308f0a019ab8b0ba38be9183d438dfd6
parent: c8e18ca12c66bc95a30a7d41a7aff245c352d2c2 (diff)
1 files changed, 56 insertions, 0 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt
new file mode 100644
index 000000000..019d12f65
--- /dev/null
+++ b/docs/unicode.txt
@@ -0,0 +1,56 @@
+        Unicode support in busybox
+There are several scenarios where we need to handle unicode
+correctly.
+        Shell input
+We want to correctly handle input of unicode characters.
+There are several problems with it. Just handling input
+as sequence of bytes would break any editing. This was fixed
+and now lineedit operates on the array of wchar_t's.
+But we also need to handle the following problematic moments:
+* It is unreasonable to expect that output device supports
+  _any_ unicode chars. Perhaps we need to avoid printing
+  those chars which are not supported by output device.
+  Examples: chars which are not present in the font,
+  chars which are not assigned in unicode,
+  combining chars (especially trying to combine bad pairs:
+  a_chinese_symbol + "combining grave accent" = ??!)
+* We need to account for the fact that unicode chars have
+  different widths: 0 for combining chars, 1 for usual,
+  2 for ideograms (are there 3+ wide chars?).
+* Bidirectional handling. If user wants to echo a phrase
+  in Hebrew, he types: echo "srettel werbeH"
+        Editors
+This case is a bit similar to "shell input", but unlike shell,
+editors may encounter many more unexpected unicode sequences
+(try to load a random binary file...), and they need to preserve
+them, unlike shell which can afford to drop bogus input.
+        more, less
+.
+        ls (multi-column display)
+.
+        top, ps
+.
+        Filename display (in error messages and elsewhere)
+.
+TODO: write an email to Asmus Freytag (asmus@unicode.org),
+author of http://unicode.org/reports/tr11/
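
For illustration only (not part of this commit): the width bullet in docs/unicode.txt (0 columns for combining chars, 1 for usual chars, 2 for ideograms) could be sketched in C roughly as below. The names utf8_decode, cp_width and utf8_display_width are hypothetical, and the hard-coded ranges are a small hand-picked subset, not the full East Asian Width data described in TR11. Counting an invalid byte or a control char as one column is also an assumption: it presumes such input would be shown as a single placeholder such as '?'.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes used,
 * or 0 if the bytes are not a valid sequence. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* Smallest code point allowed for each sequence length
	 * (rejects overlong encodings). */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t len, i;
	uint32_t c;

	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
	else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
	else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
	else return 0; /* stray continuation byte or invalid lead byte */
	if (n < len)
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;
	*cp = c;
	return len;
}

/* Simplified column width of one code point (assumed ranges only):
 * -1 for control chars, 0 for combining marks, 2 for wide chars, else 1. */
static int cp_width(uint32_t cp)
{
	if (cp == 0)
		return 0;
	if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
		return -1;
	if (cp >= 0x0300 && cp <= 0x036F)       /* Combining Diacritical Marks */
		return 0;
	if ((cp >= 0x1100 && cp <= 0x115F)      /* Hangul Jamo leading consonants */
	 || (cp >= 0x4E00 && cp <= 0x9FFF)      /* CJK Unified Ideographs */
	 || (cp >= 0xAC00 && cp <= 0xD7A3)      /* Hangul Syllables */
	 || (cp >= 0xFF01 && cp <= 0xFF60))     /* Fullwidth forms */
		return 2;
	return 1;
}

/* Display width of an n-byte UTF-8 string. Invalid bytes and control
 * chars are counted as one column each (assumed to be shown as '?'). */
static int utf8_display_width(const char *str, size_t n)
{
	const unsigned char *s = (const unsigned char *)str;
	int width = 0;

	while (n > 0) {
		uint32_t cp;
		size_t len = utf8_decode(s, n, &cp);
		int w = 1;

		if (len == 0) {
			len = 1;        /* skip one bogus byte */
		} else {
			w = cp_width(cp);
			if (w < 0)
				w = 1;
		}
		width += w;
		s += len;
		n -= len;
	}
	return width;
}
```

With this sketch, utf8_display_width("e\xCC\x81", 3) counts "e" followed by U+0301 (combining acute accent) as one column, and utf8_display_width("\xE4\xB8\xAD", 3) counts the ideogram U+4E2D as two. A real implementation would need a complete width table and a policy for unassigned code points, which are the open questions the document raises.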