win32: UTF8_OUTPUT: refine bad-sequence output - busybox-w32 - A mirror of https://github.com/rmyorston/busybox-w32.git


author	Avi Halachmi (:avih) <avihpit@yahoo.com>	2026-03-18 13:38:24 +0200
committer	Ron Yorston <rmy@pobox.com>	2026-03-19 15:04:16 +0000
commit	df652277439a30a973438577b1a370f4a7d2f47c (patch)
tree	68a2b9a321058d929d4bfec675a94041a6a04fad /scripts
parent	0b0ab67527a62f567a88fd674fbe0c2b2499c87e (diff)
download	busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.tar.gz busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.tar.bz2 busybox-w32-df652277439a30a973438577b1a370f4a7d2f47c.zip

win32: UTF8_OUTPUT: refine bad-sequence output

Previously, when writeCon_utf8 detected an invalid byte (for the current state), it printed one '?' which also covered any following invalid bytes, until a valid byte for state 0 was detected. For instance, printf '\377\377\377A' printed '?A' (3 bad bytes, 1 '?'). This was by design, to avoid excessive '?' noise.

However, other terminals (xterm), and specifically the Windows console (and terminal), print one '?' for every decoding error, and also reset the decoding state after every error. I.e. the same input errors 3 times and displays '???A'.

Now we do the same, which also happens to simplify the code.

The reference behavior is the Windows console/terminal in the UTF-8 codepage (which writeCon_utf8 tries to emulate in other console codepages). To compare, run 'chcp 65001' to set the console to UTF-8, which also bypasses writeCon_utf8, and check how the terminal displays some sequence. Our output should be the same, up to the CONFIG_SUBST_WCHAR value.

The "state" comment is updated since we no longer maintain a bad state. While at it, also refine a few nearby comments.
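
As a rough illustration of the "one '?' per decoding error, reset the state after every error" rule, here is a minimal C sketch. It is not the writeCon_utf8 code. The name sketch_sanitize_utf8 is hypothetical, the function only checks lead/continuation byte structure (no overlong or surrogate checks), it re-examines the offending byte in state 0 after an error in the middle of a sequence, and it treats a sequence cut off at the end of the input as an error. All of these are assumptions made for the example, not behavior taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not busybox-w32 code): copy `in` to `out` as a
 * NUL-terminated string, emitting one '?' per decoding error and
 * resetting the decoder to state 0 after each error.
 * `outsz` must be at least 1; output is truncated if `out` is too small. */
static size_t sketch_sanitize_utf8(const unsigned char *in, size_t n,
                                   char *out, size_t outsz)
{
    size_t o = 0;      /* bytes written to out so far */
    size_t start = 0;  /* index of the lead byte of the pending sequence */
    int need = 0;      /* continuation bytes still expected; 0 == state 0 */
    size_t i = 0;

    /* Keep room for one full sequence (4 bytes), a trailing '?' and NUL. */
    while (i < n && o + 5 < outsz) {
        unsigned char c = in[i];

        if (need == 0) {
            if (c < 0x80) {
                out[o++] = (char)c;             /* plain ASCII */
            } else if (c >= 0xC2 && c <= 0xDF) {
                need = 1; start = i;            /* 2-byte lead */
            } else if (c >= 0xE0 && c <= 0xEF) {
                need = 2; start = i;            /* 3-byte lead */
            } else if (c >= 0xF0 && c <= 0xF4) {
                need = 3; start = i;            /* 4-byte lead */
            } else {
                out[o++] = '?';                 /* invalid in state 0 */
            }
            i++;
        } else if ((c & 0xC0) == 0x80) {
            i++;
            if (--need == 0) {                  /* sequence complete */
                memcpy(out + o, in + start, i - start);
                o += i - start;
            }
        } else {
            /* Error mid-sequence: one '?', reset, and (assumption)
             * re-examine this same byte in state 0 on the next pass. */
            out[o++] = '?';
            need = 0;
        }
    }
    if (need != 0)
        out[o++] = '?';    /* assumption: truncated input counts as an error */
    out[o] = '\0';
    return o;
}
```

Under these assumptions, passing the four bytes of '\377\377\377A' produces '???A': each \377 is invalid in state 0, so each one is reported separately before the 'A' is copied through.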

Diffstat (limited to 'scripts')

0 files changed, 0 insertions, 0 deletions

