| author | Denys Vlasenko <vda.linux@googlemail.com> | 2010-02-01 15:58:08 +0100 |
|---|---|---|
| committer | Denys Vlasenko <vda.linux@googlemail.com> | 2010-02-01 15:58:08 +0100 |
| commit | 698dca5805117f470ef19488428c8a5f795b9e0c (patch) | |
| tree | 1ca510d7308f0a019ab8b0ba38be9183d438dfd6 /docs | |
| parent | c8e18ca12c66bc95a30a7d41a7aff245c352d2c2 (diff) | |
add unicode.txt
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/unicode.txt | 56 |
1 files changed, 56 insertions, 0 deletions
diff --git a/docs/unicode.txt b/docs/unicode.txt new file mode 100644 index 000000000..019d12f65 --- /dev/null +++ b/docs/unicode.txt | |||
| @@ -0,0 +1,56 @@ | |||
| 1 | Unicode support in busybox | ||
| 2 | |||
| 3 | There are several scenarios where we need to handle unicode | ||
| 4 | correctly. | ||
| 5 | |||
| 6 | Shell input | ||
| 7 | |||
| 8 | We want to correctly handle input of unicode characters. | ||
| 9 | There are several problems with it. Just handling input | ||
| 10 | as a sequence of bytes would break any editing. This was fixed | ||
| 11 | and now lineedit operates on an array of wchar_t's. | ||
| 12 | But we also need to handle the following issues: | ||
| 13 | |||
| 14 | * It is unreasonable to expect that the output device supports | ||
| 15 | _any_ unicode chars. Perhaps we need to avoid printing | ||
| 16 | those chars which are not supported by the output device. | ||
| 17 | Examples: chars which are not present in the font, | ||
| 18 | chars which are not assigned in unicode, | ||
| 19 | combining chars (especially trying to combine bad pairs: | ||
| 20 | a_chinese_symbol + "combining grave accent" = ??!) | ||
| 21 | |||
| 22 | * We need to account for the fact that unicode chars have | ||
| 23 | different widths: 0 for combining chars, 1 for usual, | ||
| 24 | 2 for ideograms (are there 3+ wide chars?). | ||
| 25 | |||
| 26 | * Bidirectional handling. If the user wants to echo a phrase | ||
| 27 | in Hebrew, they type: echo "srettel werbeH" | ||
| 28 | |||
| 29 | Editors | ||
| 30 | |||
| 31 | This case is a bit similar to "shell input", but unlike shell, | ||
| 32 | editors may encounter many more unexpected unicode sequences | ||
| 33 | (try to load a random binary file...), and they need to preserve | ||
| 34 | them, unlike shell which can afford to drop bogus input. | ||
| 35 | |||
| 36 | |||
| 37 | more, less | ||
| 38 | |||
| 39 | . | ||
| 40 | |||
| 41 | ls (multi-column display) | ||
| 42 | |||
| 43 | . | ||
| 44 | |||
| 45 | top, ps | ||
| 46 | |||
| 47 | . | ||
| 48 | |||
| 49 | Filename display (in error messages and elsewhere) | ||
| 50 | |||
| 51 | . | ||
| 52 | |||
| 53 | |||
| 54 | |||
| 55 | TODO: write an email to Asmus Freytag (asmus@unicode.org), | ||
| 56 | author of http://unicode.org/reports/tr11/ | ||
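
The width rule mentioned in unicode.txt (0 columns for combining chars, 1 for usual chars, 2 for ideograms) can be illustrated with a small sketch in C. This is not busybox code: the function names (sketch_utf8_decode, sketch_width, sketch_display_width) are hypothetical, the range table is deliberately tiny and incomplete, and treating malformed bytes and control chars as one-column placeholders is an assumption made for the example. A real implementation would use the full East Asian Width data from TR11 or the C library's wcwidth().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from a NUL-terminated string.
 * Stores the code point in *cp and returns the number of bytes consumed.
 * Malformed input yields U+FFFD and consumes exactly one byte. */
static size_t sketch_utf8_decode(const unsigned char *s, uint32_t *cp)
{
    uint32_t c = s[0];
    uint32_t min;   /* smallest code point allowed for this length */
    size_t need;    /* number of continuation bytes expected */

    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { need = 1; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 2; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 3; c &= 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }

    for (size_t i = 1; i <= need; i++) {
        /* A NUL terminator fails this check, so we never read past the end. */
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return need + 1;
}

/* Column width of one code point: -1 for control chars, 0 for combining
 * chars, 2 for wide chars, 1 otherwise. Only a few illustrative ranges
 * are listed here (assumption: incomplete on purpose). */
static int sketch_width(uint32_t cp)
{
    if (cp == 0) return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return -1;
    if (cp >= 0x0300 && cp <= 0x036F) return 0;      /* Combining Diacritical Marks */
    if ((cp >= 0x4E00 && cp <= 0x9FFF)               /* CJK Unified Ideographs */
     || (cp >= 0xAC00 && cp <= 0xD7A3)               /* Hangul Syllables */
     || (cp >= 0xFF01 && cp <= 0xFF60))              /* Fullwidth Forms */
        return 2;
    return 1;
}

/* Number of terminal columns a UTF-8 string would occupy under the
 * simplified rules above. Control chars count as one column, assuming
 * the caller prints a placeholder for them. */
static int sketch_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    int total = 0;

    assert(str != NULL);
    while (*s) {
        uint32_t cp;
        s += sketch_utf8_decode(s, &cp);
        int w = sketch_width(cp);
        total += (w < 0) ? 1 : w;
    }
    return total;
}
```

For example, sketch_display_width("abc") gives 3, "e" followed by U+0301 (combining acute accent) gives 1, and a string of two CJK ideographs gives 4. Counting columns this way, rather than bytes or wchar_t elements, is what cursor positioning in line editing and multi-column output would need.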
