win32: UTF8_OUTPUT: recover quicker from bad byte - busybox-w32 - A mirror of https://github.com/rmyorston/busybox-w32.git


author	Avi Halachmi (:avih) <avihpit@yahoo.com>	2024-01-30 18:44:52 +0200
committer	Ron Yorston <rmy@pobox.com>	2024-01-31 08:40:21 +0000
commit	e960b0d69d3f954d50e814a6bc4d6e206bde7f66 (patch)
tree	4be87ed9e57f78e2d4b1914ba7a5eef9e218d128 /libbb
parent	a750640a87ff0bad6e59b534264dddeaf8c6923b (diff)
download	busybox-w32-e960b0d69d3f954d50e814a6bc4d6e206bde7f66.tar.gz busybox-w32-e960b0d69d3f954d50e814a6bc4d6e206bde7f66.tar.bz2 busybox-w32-e960b0d69d3f954d50e814a6bc4d6e206bde7f66.zip

win32: UTF8_OUTPUT: recover quicker from bad byte

When an unexpected value is detected in UTF-8, we should print the placeholder codepoint, and then recover whenever we detect a value which is valid for starting a new UTF-8 codepoint (including ASCII7).

However, previously, we only tested recovery at the bytes following the unexpected one, and so if the first unexpected value was also valid for a new codepoint, then we didn't recover from it.

Now we check for recovery from the first unexpected byte, which, if recoverable, requires both placeholder printout and recovery, so the recovery "unwinding" is modified a bit to allow the placeholder.

Example of a sequence which now recovers quicker than before (where UTF-8 for U+1F600 "😀" is: 0xF0 0x9F 0x98 0x80):

    printf "\xF0\xF0\x9F\x98\x80A"

Previously: ?A
Now: ?😀A
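For illustration only, the recovery rule described above can be sketched as a small standalone filter. This is not the code from this patch: the names `utf8_tail_len` and `utf8_sanitize` are hypothetical, the placeholder is a plain `?`, and the sketch works on a whole buffer rather than keeping state across console writes. It also skips overlong and surrogate checks. The key point is that an unexpected byte in mid-sequence first triggers the placeholder and is then re-examined as a possible start of a new codepoint.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of continuation bytes expected after
 * lead byte c, or -1 if c cannot start a UTF-8 sequence. */
static int utf8_tail_len(unsigned char c)
{
    if (c < 0x80) return 0;               /* ASCII7 */
    if (c >= 0xC2 && c <= 0xDF) return 1; /* 2-byte sequence */
    if (c >= 0xE0 && c <= 0xEF) return 2; /* 3-byte sequence */
    if (c >= 0xF0 && c <= 0xF4) return 3; /* 4-byte sequence */
    return -1;                            /* continuation or invalid lead */
}

/* Hypothetical sketch: copy valid sequences from in[0..n) to out and
 * write '?' for each broken sequence or stray byte. The caller must
 * provide at least n bytes in out. Returns the number of bytes written. */
size_t utf8_sanitize(const unsigned char *in, size_t n, char *out)
{
    size_t o = 0, start = 0;
    int need = 0; /* continuation bytes still expected */

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (need > 0) {
            if ((c & 0xC0) == 0x80) {
                if (--need == 0) {
                    /* sequence complete: copy it through unchanged */
                    memcpy(out + o, in + start, i + 1 - start);
                    o += i + 1 - start;
                }
                continue;
            }
            /* Unexpected byte: placeholder for the broken sequence,
             * then fall through so this same byte is tested as a
             * possible start of a new codepoint. */
            out[o++] = '?';
            need = 0;
        }

        int tail = utf8_tail_len(c);
        if (tail < 0) {
            out[o++] = '?';       /* cannot start a sequence */
        } else if (tail == 0) {
            out[o++] = (char)c;   /* ASCII7 passes through */
        } else {
            start = i;            /* begin a new multi-byte sequence */
            need = tail;
        }
    }
    if (need > 0)
        out[o++] = '?';           /* input ended in mid-sequence */
    return o;
}
```

With the six input bytes from the example, the second 0xF0 produces the placeholder and also starts the sequence for U+1F600, so the sketch writes `?`, the four bytes of "😀", and `A`.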

Diffstat (limited to 'libbb')

0 files changed, 0 insertions, 0 deletions

