author | Denys Vlasenko <vda.linux@googlemail.com> | 2010-02-01 15:58:08 +0100 |
---|---|---|
committer | Denys Vlasenko <vda.linux@googlemail.com> | 2010-02-01 15:58:08 +0100 |
commit | 698dca5805117f470ef19488428c8a5f795b9e0c (patch) | |
tree | 1ca510d7308f0a019ab8b0ba38be9183d438dfd6 | |
parent | c8e18ca12c66bc95a30a7d41a7aff245c352d2c2 (diff) | |
add unicode.txt
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
-rw-r--r-- | docs/unicode.txt | 56 |
1 files changed, 56 insertions, 0 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt
new file mode 100644
index 000000000..019d12f65
--- /dev/null
+++ b/docs/unicode.txt
@@ -0,0 +1,56 @@
1 | Unicode support in busybox | ||
2 | |||
3 | There are several scenarios where we need to handle unicode | ||
4 | correctly. | ||
5 | |||
6 | Shell input | ||
7 | |||
8 | We want to handle input of unicode characters correctly. | ||
9 | There are several problems with this. Handling input merely | ||
10 | as a sequence of bytes would break any editing. This was fixed, | ||
11 | and lineedit now operates on an array of wchar_t's. | ||
12 | But we also need to handle the following problem cases: | ||
13 | |||
14 | * It is unreasonable to expect that the output device supports | ||
15 | _all_ unicode chars. Perhaps we need to avoid printing | ||
16 | those chars which are not supported by the output device. | ||
17 | Examples: chars which are not present in the font, | ||
18 | chars which are not assigned in unicode, | ||
19 | combining chars (especially trying to combine bad pairs: | ||
20 | a_chinese_symbol + "combining grave accent" = ??!) | ||
21 | |||
22 | * We need to account for the fact that unicode chars have | ||
23 | different widths: 0 for combining chars, 1 for ordinary chars, | ||
24 | 2 for ideograms (are there 3+ wide chars?). | ||
25 | |||
26 | * Bidirectional handling. If a user wants to echo a phrase | ||
27 | in Hebrew, they type: echo "srettel werbeH" | ||
28 | |||
29 | Editors | ||
30 | |||
31 | This case is a bit similar to "shell input", but unlike the shell, | ||
32 | editors may encounter many more unexpected unicode sequences | ||
33 | (try to load a random binary file...), and they need to preserve | ||
34 | them, unlike the shell, which can afford to drop bogus input. | ||
35 | |||
36 | |||
37 | more, less | ||
38 | |||
39 | . | ||
40 | |||
41 | ls (multi-column display) | ||
42 | |||
43 | . | ||
44 | |||
45 | top, ps | ||
46 | |||
47 | . | ||
48 | |||
49 | Filename display (in error messages and elsewhere) | ||
50 | |||
51 | . | ||
52 | |||
53 | |||
54 | |||
55 | TODO: write an email to Asmus Freytag (asmus@unicode.org), | ||
56 | author of http://unicode.org/reports/tr11/ | ||
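
The width question raised under "Shell input" (0 for combining chars,
1 for ordinary chars, 2 for ideograms) can be illustrated with a small
C sketch. This is not busybox code. All names here (utf8_decode,
toy_width, toy_display_width) are hypothetical, and the range table is
deliberately tiny: it covers only the combining diacritical marks block,
CJK unified ideographs, Hangul syllables and fullwidth forms. Treating
control characters as width 0 and replacing malformed bytes with U+FFFD
are assumptions made for this example only; a real implementation would
need the full data from tr11 or the C library's wcwidth().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from the NUL-terminated string s.
 * Stores the code point in *cp and returns the number of bytes consumed.
 * A malformed sequence yields U+FFFD and consumes exactly one byte
 * (an assumption of this sketch, not a busybox policy). */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    uint32_t c = s[0];
    uint32_t min = 0;
    size_t len, i;

    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }    /* stray continuation or bad lead byte */

    for (i = 1; i < len; i++) {
        /* A NUL terminator also fails this check, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return len;
}

/* Toy width table: 0, 1 or 2 columns. Intentionally incomplete. */
static int toy_width(uint32_t cp)
{
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return 0;                       /* control chars: assumed width 0 here */
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;                       /* combining diacritical marks */
    if ((cp >= 0x4E00 && cp <= 0x9FFF)      /* CJK unified ideographs */
     || (cp >= 0xAC00 && cp <= 0xD7A3)      /* Hangul syllables */
     || (cp >= 0xFF01 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Sum of the toy widths of all characters in a UTF-8 string. */
size_t toy_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t cols = 0;

    assert(str != NULL);
    while (*s) {
        uint32_t cp;
        s += utf8_decode(s, &cp);
        cols += (size_t)toy_width(cp);
    }
    return cols;
}
```

With this table, toy_display_width("abc") is 3, a base letter followed
by a combining acute accent (U+0301) occupies one column, and a single
CJK ideograph occupies two. Code that pads columns (ls, top, ps) or
moves the cursor (lineedit) would need to count columns this way rather
than counting bytes or wchar_t's.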