From 5862b6920d519974c2453529bdfd6832dd06f807 Mon Sep 17 00:00:00 2001
From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>
Date: Thu, 24 Aug 2023 13:22:24 +0300
Subject: win32: UTF8_OUTPUT: speedup for big outputs

With the native Windows console, writeCon_utf8, which converts a
stream of UTF8 into console output, is about 1.4x slower for big
unicode writes than the native fwrite (e.g. when the console codepage
is UTF8), which is not too bad.

However, newer versions of conhost are quicker. For example,
OpenConsole.exe (which is conhost), which ships with Windows Terminal,
is about 4x faster than the native conhost in processing (unicode?)
input.

And when conhost can process input much more quickly, it turned out
that fwrite throughput was nearly 3x better than writeCon_utf8.

Luckily, this turned out to be mainly due to the internal buffer of
256 wide chars which writeCon_utf8 uses. With a buffer of 4096 wide
chars it becomes only ~10% slower than fwrite, which is much better.

However, making the console window very small, so that it spends
very little time on rendering, makes it apparent that there's still
a difference: writeCon_utf8 is about 30% slower than fwrite. That's
still not bad, and it's also an uncommon use case.

So this commit increases the buffer, and also allocates it dynamically
(once) to avoid abusing the stack with an additional 8K in one call.
---
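Note (not part of the commit message): a minimal, portable sketch of
the chunked-flush pattern described above, for readers without the
full source at hand. It is not the code in win32/winansi.c. The names
write_utf8_chunked, flush_fn, flush_stats and count_flush are
hypothetical, the sink callback stands in for WriteConsoleW(), and
uint16_t stands in for the 2-byte Windows wchar_t (4096 units is the
8K the message mentions). Unlike the real function, the sketch keeps
no decoder state across calls and assumes well-formed UTF8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sink standing in for WriteConsoleW(); returns 0 on failure. */
typedef int (*flush_fn)(const uint16_t *units, int count, void *ctx);

enum { WBUF_UNITS = 4096 };  /* same capacity the commit switches to */

/* Decode well-formed UTF8 into UTF-16 units and flush them in chunks.
 * Simplified: no cross-call state and no handling of malformed input. */
static int write_utf8_chunked(const char *u8, size_t n,
                              flush_fn flush, void *ctx)
{
	static uint16_t *wbuf = NULL;  /* allocated once, reused by later calls */
	int wlen = 0;
	size_t i = 0;

	assert(flush != NULL);
	if (!wbuf) {
		wbuf = malloc(WBUF_UNITS * sizeof *wbuf);
		if (!wbuf)
			return -1;
	}

	while (i < n) {
		unsigned char c = (unsigned char)u8[i++];
		uint32_t cp;
		int more;  /* continuation bytes still expected */

		if (c < 0x80)      { cp = c;        more = 0; }
		else if (c < 0xE0) { cp = c & 0x1F; more = 1; }
		else if (c < 0xF0) { cp = c & 0x0F; more = 2; }
		else               { cp = c & 0x07; more = 3; }

		while (more-- > 0 && i < n)
			cp = (cp << 6) | ((unsigned char)u8[i++] & 0x3F);

		if (cp >= 0x10000) {  /* needs a surrogate pair: two units */
			cp -= 0x10000;
			wbuf[wlen++] = (uint16_t)(0xD800 | (cp >> 10));
			wbuf[wlen++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		} else {
			wbuf[wlen++] = (uint16_t)cp;
		}

		/* flush if we have less than two empty spaces */
		if (wlen > WBUF_UNITS - 2) {
			if (!flush(wbuf, wlen, ctx))
				return -1;
			wlen = 0;
		}
	}

	if (wlen && !flush(wbuf, wlen, ctx))
		return -1;
	return 0;
}

/* Example sink: counts calls and units, remembers the last chunk's start. */
struct flush_stats { int calls; long units; uint16_t first, second; };

static int count_flush(const uint16_t *units, int count, void *ctx)
{
	struct flush_stats *st = ctx;
	st->calls++;
	st->units += count;
	st->first = units[0];
	st->second = count > 1 ? units[1] : 0;
	return 1;
}
```

In this sketch the number of sink calls for a large write scales
inversely with the buffer capacity, which is the effect the commit
relies on. The "less than two empty spaces" check leaves room for a
surrogate pair, so one code point is never split across two flushes.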
 win32/winansi.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/win32/winansi.c b/win32/winansi.c
index c88c096d2..591154378 100644
--- a/win32/winansi.c
+++ b/win32/winansi.c
@@ -1457,10 +1457,16 @@ static int writeCon_utf8(int fd, const char *u8buf, size_t u8siz)
 	static int state = 0;  // -1: bad, 0-3: remaining cp bytes (0: done/new)
 	static uint32_t codepoint = 0;  // accumulated from up to 4 UTF8 bytes
 
+	// not a state, only avoids re-alloc on every call
+	static const int wbufwsiz = 4096;
+	static wchar_t *wbuf = 0;
+
 	HANDLE h = (HANDLE)_get_osfhandle(fd);
-	wchar_t wbuf[256];
 	int wlen = 0;
 
+	if (!wbuf)
+		wbuf = xmalloc(wbufwsiz * sizeof(wchar_t));
+
 	// ASCII7 uses least logic, then UTF8 continuations, UTF8 lead, errors
 	while (u8siz--) {
 		unsigned char c = *u8buf++;
@@ -1512,7 +1518,7 @@ static int writeCon_utf8(int fd, const char *u8buf, size_t u8siz)
 		}
 
 		// flush if we have less than two empty spaces
-		if (wlen > ARRAY_SIZE(wbuf) - 2) {
+		if (wlen > wbufwsiz - 2) {
 			if (!WriteConsoleW(h, wbuf, wlen, 0, 0))
 				return -1;
 			wlen = 0;
-- 
cgit v1.2.3-55-g6feb