| author | Avi Halachmi (:avih) <avihpit@yahoo.com> | 2026-03-18 13:38:24 +0200 |
|---|---|---|
| committer | Ron Yorston <rmy@pobox.com> | 2026-03-19 15:04:16 +0000 |
| commit | df652277439a30a973438577b1a370f4a7d2f47c (patch) | |
| tree | 68a2b9a321058d929d4bfec675a94041a6a04fad /scripts | |
| parent | 0b0ab67527a62f567a88fd674fbe0c2b2499c87e (diff) | |
| download | busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.tar.gz busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.tar.bz2 busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.zip | |
win32: UTF8_OUTPUT: refine bad-sequence output
Previously, in writeCon_utf8, when we detected an invalid byte (for
the current state), we printed one '?' which also covered any
following invalid bytes, until a valid byte for state 0 was detected.
For instance, printf '\377\377\377A' printed '?A' (3 bad bytes, 1 '?').
This was by design, to avoid excessive '?' noise.
However, other terminals (e.g. xterm), and specifically the Windows
console (and terminal), print one '?' for every decoding error, and
also reset the decoding state after every error.
I.e. the same input results in 3 errors, and displays '???A'.
Now we do the same, which also happens to simplify the code.
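
The "one '?' per decoding error, then reset" rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions and is not the writeCon_utf8 code: the names subst_bad_utf8 and put_byte are hypothetical, the sketch does not reject overlong or surrogate encodings, and it treats input that ends mid-sequence as one more error, whereas a real console writer would likely carry that state over to the next write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strcmp, for the usage shown below */

/* Append one byte, always leaving room for the terminating NUL.
 * Bytes that do not fit are silently dropped. */
static void put_byte(char *out, size_t outsz, size_t *o, char c)
{
    if (*o + 1 < outsz)
        out[(*o)++] = c;
}

/* Hypothetical helper (not busybox-w32 code): copy in[0..n) to out,
 * passing structurally valid UTF-8 sequences through unchanged and
 * writing a single '?' for every decoding error. The decoder state is
 * reset after each error, so three bad bytes produce three '?'.
 * Returns the number of bytes written, excluding the NUL. */
size_t subst_bad_utf8(const unsigned char *in, size_t n,
                      char *out, size_t outsz)
{
    size_t i = 0, o = 0, start = 0;
    int need = 0;   /* continuation bytes still expected; 0 = idle */

    assert(out != NULL && outsz > 0);
    while (i < n) {
        unsigned char c = in[i];

        if (need > 0) {
            if ((c & 0xC0) == 0x80) {          /* valid continuation */
                i++;
                if (--need == 0) {             /* sequence complete */
                    size_t k;
                    for (k = start; k < i; k++)
                        put_byte(out, outsz, &o, (char)in[k]);
                }
                continue;
            }
            /* Error: the pending sequence is truncated. Emit one '?',
             * reset, and re-examine this same byte in the idle state. */
            put_byte(out, outsz, &o, '?');
            need = 0;
            continue;
        }

        if (c < 0x80) {
            put_byte(out, outsz, &o, (char)c); /* plain ASCII */
        } else if (c >= 0xC2 && c <= 0xDF) {
            start = i; need = 1;               /* 2-byte lead */
        } else if (c >= 0xE0 && c <= 0xEF) {
            start = i; need = 2;               /* 3-byte lead */
        } else if (c >= 0xF0 && c <= 0xF4) {
            start = i; need = 3;               /* 4-byte lead */
        } else {
            /* Stray continuation byte or invalid lead byte:
             * one error, one '?', state stays idle. */
            put_byte(out, outsz, &o, '?');
        }
        i++;
    }
    if (need > 0)   /* assumption: end of input mid-sequence is an error */
        put_byte(out, outsz, &o, '?');
    out[o] = '\0';
    return o;
}
```

With a 16-byte buffer buf, calling subst_bad_utf8((const unsigned char *)"\377\377\377A", 4, buf, sizeof buf) leaves "???A" in buf, matching the example in this message. The old behavior would have needed an extra "bad" state to swallow the second and third byte, which is the state this commit drops.
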
The reference behavior is that of the Windows console/terminal in the
UTF-8 codepage (which writeCon_utf8 tries to emulate in other console
codepages). To compare, run 'chcp 65001' to set the console to UTF-8
(which also bypasses writeCon_utf8), and check how the terminal
displays a given sequence. Our output should be the same, up to the
CONFIG_SUBST_WCHAR value.
The "state" comment is updated since we no longer maintain bad state.
While at it, also refine a few nearby comments.
Diffstat (limited to 'scripts')
0 files changed, 0 insertions, 0 deletions
