| Commit message (Collapse) | Author | Files | Lines |
|
Now that /dev/urandom and /dev/zero can be used by all applets
there's no need to mention that dd has support for them.
Saves 32-56 bytes.
|
|
The 'ESC]J' sequence in 'reset' is unnecessary.
One or other of 'ESC]3J' and the explicit call to reset_screen()
should be sufficient, but the Win 10/11 terminal requires the
former while the Win 10 console requires the latter.
Saves 4 bytes in the 32-bit build.
(GitHub issue #161)
|
|
The 'reset' applet hasn't kept pace with developments elsewhere.
We now have support for 'stty sane', so the code in 'reset' which
calls that can be enabled.
Windows 10 and 11 have updated Terminal and Console programs, with
varying features and behaviours. Add ANSI emulation code to handle
clearing the scrollback buffer and use ANSI sequences in 'reset'
to do that. It was also necessary to retain the explicit call to
reset_screen(), or 'reset' didn't work properly in the Terminal I'd
installed in Windows 10.
Adds 92-128 bytes.
(GitHub issue #161)
|
|
Commit c4f24ec9d (libbb: try to mitigate 'cat /dev/urandom')
resulted in a 'cat' to the terminal taking ~25% longer than
before.
Raising the isatty() call out of the loop reduces the penalty
to ~1%.
Adds 16 bytes in the 32-bit build.
(GitHub issue #585)
|
|
Now that '/dev/urandom' can be used directly, people with a sense
of curiosity and adventure have tried running 'cat /dev/urandom'
or 'cat </dev/urandom'.
The 'cat' applet in BusyBox is so efficient and ANSI escape
handling is so inefficient that this overwhelms the capabilities
of the Windows console or terminal to the extent that it's unable
to process Ctrl-C requests in a timely manner.
To give it a chance to catch up, force 'cat' to take a short nap
from time to time (very short, zero length) but only when it's
writing to a tty.
The call to Sleep() is in a separate function to avoid unnecessary
bloat in 32-bit builds, where its presence upsets the stack and
requires much larger code for stack access.
Adds 48 bytes.
(GitHub issue #585)
|
|
|
|
Commits e23652908 and 686a0803f (ash: fix execution of applets via
Unix-style path) were necessary following a major revision of the
shell upstream.
Another problem was found:
$ sh
$ PATH="/usr/bin;$PATH" exec non-existent
resulted in a segfault. This happened because during a PATH search
tryexec() modified argv[0] when it detected a Unix-style path (in
this case '/usr/bin/non-existent') but failed to restore it if the
execution failed.
There were other issues:
- The logic of the new code failed to match the original: the test
for a Unix-style path should only have happened if an attempt to
execute the full path had already failed.
- The test for a script running an interpreter which is an applet
also modified argv. If the execution failed argv should have
been restored to its original state.
- During a PATH search the 'path' pointer walked through the
elements of the path. This pointer was also used to determine
if an applet was overridden by an executable. This is wrong:
the full PATH variable should have been used.
The code around tryexec()/shellexec() has been rewritten to take
these issues into account.
Adds 32-48 bytes.
(GitHub issue #584)
|
|
It wasn't possible to build ash if FEATURE_SH_STANDALONE was
disabled.
This issue was introduced during the large merge of upstream
changes to ash in commit e23652908.
|
|
Silence a compiler warning that 'arg' is an unused variable if
neither stty nor ttysize is defined.
|
|
I've been far too lax in applying FAST_FUNC annotations. These
can make function calls smaller and faster, but only on 32-bit
systems and for non-static, non-variadic functions with non-void
arguments.
Saves 4680 bytes in the 32-bit build; 64-bit builds are unaffected.
|
|
Using 'depends on FEATURE_SYSLOG' is unreliable because
FEATURE_SYSLOG can't be set directly. A POSIX build with all
applets that need syslog disabled *except* crond may fail.
Revert to using 'select FEATURE_SYSLOG' in crond and exclude
the unused syslog code on Windows.
If crond is unable to run sendmail report this in the error
message.
Update all default Windows configurations.
(GitHub PR #561)
|
|
* Enable crontab
* Enable crond
* Allow sendmail to be used in crond
This also adds another option to the mingw_popen_fd function to capture
stderr as well. And also added the options to support WNOHANG as well
as an unused status in mingw_wait3
* Check current uid when running crond
* Only show tail for mailto jobs
* Use system drive for cron
* crond: add missing feature check
* crond: use spawn_detach for background operation
|
|
Support for /dev/urandom and /dev/zero relies on having a working
dd. If the dd applet is disabled attempts to access them should
fail.
Theoretically a third-party dd utility could also work, but let's
not encourage that sort of heresy.
|
|
Rearrange the code to avoid a superfluous process creation.
Saves 16 bytes in the 64-bit build, adds 24 in the 32-bit.
|
|
Previously the special devices /dev/urandom and /dev/zero could
only be used via the 'dd' applet. They were also used internally
by some applets.
Expose /dev/urandom and /dev/zero so they can be used as file
arguments to applets and a source of redirections in the shell.
They're also now visible to 'stat' and 'ls', but only if given
as an explicit argument.
Adds 122-144 bytes.
(GitHub issues #98, #282)
|
|
Commit aed19625ff (Post-merge fixes) attempted to handle some
upstream code shuffling which broke support for https in wget
with FEATURE_USE_CNG_API enabled and CONFIG_FEATURE_TLS_SCHANNEL
disabled (i.e. using the upstream internal TLS code).
Unfortunately it resulted in the HMAC context being freed twice
which caused failures in certain cases.
Fix this by not freeing the context in hmac_blocks(), thus making
that function match upstream again.
Thanks to @avih for identifying the problem.
(GitHub issue #582)
|
|
The Microsoft Windows Schannel implementation of TLS validates
the server certificate. Enable the --no-check-certificate
option to wget to allow these checks to be skipped. This may
be useful to connect to badly configured websites.
Adds 202 bytes to the x86_64 build with Schannel enabled.
(GitHub issue #581)
|
|
* win32: fnmatch2 minor refinements (no-op)
With beacket, add '\' to the switch/case, and at fnmatch, pre-calc
the two most used flags (CASEFOLD and PATHNAME) once on init.
Negligible perf impact, and happens to save few bytes in x64.
* win32: actype: trivial refinments (no-op)
NULL was used in actype.h but requires stddef.h - which we don't
include, so use (int*)0 which also expresses it more clearly.
Remove "#include <string.h>" at actype.c, which we don't need anymore.
* scripts/patbench.sh: make executable (chmod +x)
|
|
Commit 2275a53f0 (patch: handle files with no final newline) was
incomplete. It only detected a line with no newline in the first
section of a hunk. Add the code necessary to detect the '\ No
newline at end of file' warning in the second section.
Also add some tests.
Adds 48-80 bytes.
(GitHub issue #575)
|
|
This was done to be more concise, but it also happens to speed up
the two slowest test patterns by about ~ 30%, and without measurable
impact on any of the other test patterns (nor the correctness).
Maybe the stars aligned for compiler optimizations. We'll take it.
Here are results of scripts/patbench.sh with recent iterations of
fnmatch, from most recent to the original old fnmatch:
CURRENT results:
[ 1] 107 ms OK (M:-----) ''
[ 2] 105 ms OK (M:12345) '*'
[ 3] 106 ms OK (M:-----) '@'
[ 4] 111 ms OK (M:1----) 'Lorem*'
[ 5] 122 ms OK (M:-2---) '*quis'
[ 6] 133 ms OK (M:--3--) '*nisi*'
[ 7] 229 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 451 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 463 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 136 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 126 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 169 ms OK (M:1-3--) '*[*'
[13] 175 ms OK (M:1----) '*[]*'
[14] 171 ms OK (M:--3--) '*[!]*'
[15] 162 ms OK (M:-23-5) '*[!]].*'
[16] 243 ms OK (M:1----) '*[![:print:]]*'
[17] 135 ms OK (M:-23--) '*z*'
[18] 131 ms OK (M:---4-) '*-*'
[19] 157 ms OK (M:---4-) '*[-]*'
[20] 165 ms OK (M:-234-) '*[-z]*'
[21] 176 ms OK (M:-234-) '*[z-]*'
[22] 182 ms OK (M:-234-) '*[-z-]*'
[23] 130 ms OK (M:1-3--) '*]*'
[24] 153 ms OK (M:1-3--) '*[]]*'
[25] 164 ms OK (M:1-34-) '*[]-]*'
[26] 167 ms OK (M:123--) '*[]-_]*'
[27] 168 ms OK (M:1234-) '*[]-_-]*'
[28] 122 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 126 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 118 ms OK (M:12-45) '*[a-z][a-z]'
[31] 127 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 131 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 136 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 124 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 128 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 129 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 127 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 123 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 124 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 136 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 134 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
[42] 136 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[43] 131 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[44] 124 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]'
[45] 137 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[46] 129 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[47] 160 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[48] 164 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
[49] 152 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[50] 152 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[51] 130 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]*[a-z]'
[52] 165 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[53] 146 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[54] 180 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[55] 189 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
Previous commit (if-else-...) is mainly slower in patterns 8 and 9:
[ 6] 143 ms OK (M:--3--) '*nisi*'
[ 7] 251 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 638 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 651 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 138 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 137 ms OK (M:12345) '*[!hello world, this is a test]*'
Initial fnmatch2 commit (without any optimizations):
[ 1] 101 ms OK (M:-----) ''
[ 2] 123 ms OK (M:12345) '*'
[ 3] 105 ms OK (M:-----) '@'
[ 4] 115 ms OK (M:1----) 'Lorem*'
[ 5] 134 ms OK (M:-2---) '*quis'
[ 6] 131 ms OK (M:--3--) '*nisi*'
[ 7] 235 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 586 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 613 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 153 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 160 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 164 ms OK (M:1-3--) '*[*'
[13] 165 ms OK (M:1----) '*[]*'
[14] 157 ms OK (M:--3--) '*[!]*'
[15] 157 ms OK (M:-23-5) '*[!]].*'
[16] 244 ms OK (M:1----) '*[![:print:]]*'
[17] 132 ms OK (M:-23--) '*z*'
[18] 134 ms OK (M:---4-) '*-*'
[19] 160 ms OK (M:---4-) '*[-]*'
[20] 157 ms OK (M:-234-) '*[-z]*'
[21] 168 ms OK (M:-234-) '*[z-]*'
[22] 166 ms OK (M:-234-) '*[-z-]*'
[23] 130 ms OK (M:1-3--) '*]*'
[24] 150 ms OK (M:1-3--) '*[]]*'
[25] 151 ms OK (M:1-34-) '*[]-]*'
[26] 157 ms OK (M:123--) '*[]-_]*'
[27] 156 ms OK (M:1234-) '*[]-_-]*'
[28] 402 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 416 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 265 ms OK (M:12-45) '*[a-z][a-z]'
[31] 410 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 426 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 347 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 370 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 294 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 301 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 214 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 304 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 302 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 306 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 291 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
[42] 291 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[43] 302 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[44] 208 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]'
[45] 314 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[46] 288 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[47] 315 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[48] 340 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
[49] 309 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[50] 304 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[51] 217 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]*[a-z]'
[52] 317 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[53] 322 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[54] 315 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[55] 348 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
Old fnmatch with actype/isactype in O(1):
[ 1] 103 ms OK (M:-----) ''
[ 2] 116 ms OK (M:12345) '*'
[ 3] 108 ms OK (M:-----) '@'
[ 4] 112 ms OK (M:1----) 'Lorem*'
[ 5] 122 ms OK (M:-2---) '*quis'
[ 6] 123 ms OK (M:--3--) '*nisi*'
[ 7] 270 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 650 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 681 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 133 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 135 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 191 ms E:1-3-- (M:-----) '*[*'
[13] 213 ms E:1---- (M:-----) '*[]*'
[14] 215 ms E:--3-- (M:-----) '*[!]*'
[15] 182 ms OK (M:-23-5) '*[!]].*'
[16] 296 ms OK (M:1----) '*[![:print:]]*'
[17] 121 ms OK (M:-23--) '*z*'
[18] 115 ms OK (M:---4-) '*-*'
[19] 167 ms OK (M:---4-) '*[-]*'
[20] 189 ms OK (M:-234-) '*[-z]*'
[21] 194 ms OK (M:-234-) '*[z-]*'
[22] 203 ms OK (M:-234-) '*[-z-]*'
[23] 122 ms OK (M:1-3--) '*]*'
[24] 166 ms OK (M:1-3--) '*[]]*'
[25] 185 ms OK (M:1-34-) '*[]-]*'
[26] 163 ms OK (M:123--) '*[]-_]*'
[27] 176 ms OK (M:1234-) '*[]-_-]*'
[28] 381 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 510 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 274 ms OK (M:12-45) '*[a-z][a-z]'
[31] 379 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 488 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 340 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 398 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 1402 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1962 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 939 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 1453 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 1965 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1683 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 2117 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
(aborted manually due to exponential slowness)
Old fnmatch (before actype/isactype):
[ 1] 107 ms OK (M:-----) ''
[ 2] 108 ms OK (M:12345) '*'
[ 3] 104 ms OK (M:-----) '@'
[ 4] 112 ms OK (M:1----) 'Lorem*'
[ 5] 120 ms OK (M:-2---) '*quis'
[ 6] 127 ms OK (M:--3--) '*nisi*'
[ 7] 244 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 544 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 588 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 129 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 136 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 192 ms E:1-3-- (M:-----) '*[*'
[13] 201 ms E:1---- (M:-----) '*[]*'
[14] 203 ms E:--3-- (M:-----) '*[!]*'
[15] 182 ms OK (M:-23-5) '*[!]].*'
[16] 454 ms OK (M:1----) '*[![:print:]]*'
[17] 124 ms OK (M:-23--) '*z*'
[18] 123 ms OK (M:---4-) '*-*'
[19] 160 ms OK (M:---4-) '*[-]*'
[20] 175 ms OK (M:-234-) '*[-z]*'
[21] 177 ms OK (M:-234-) '*[z-]*'
[22] 197 ms OK (M:-234-) '*[-z-]*'
[23] 126 ms OK (M:1-3--) '*]*'
[24] 168 ms OK (M:1-3--) '*[]]*'
[25] 173 ms OK (M:1-34-) '*[]-]*'
[26] 168 ms OK (M:123--) '*[]-_]*'
[27] 170 ms OK (M:1234-) '*[]-_-]*'
[28] 365 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 513 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 263 ms OK (M:12-45) '*[a-z][a-z]'
[31] 376 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 815 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 311 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 756 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 1305 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1928 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 919 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 1446 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 3192 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1586 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 4601 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
[42] 22491 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
(aborted manually due to exponential slowness)
|
|
This adds two fairly cheap and effective optimizations at the tail
of the pattern (after the last '*'). Speeds up many cases by ~ x2+.
Details at the comment inside.
Adds 192 bytes in x64.
|
|
win32/fnmatch.c has several issues:
- *[* doesn't match foo[bar - but it should (non-special '[').
- FMN_CASEFOLD is not handled corrrectly. It does tolower on pattern
and string, which doesn't always work, e.g. *[@-[]* correctly
matches '@', A-Z, and '[', but as case-insensitive it only matches
'@' and '[' (tolower on the string breaks all the alpha matches):
set -o nocaseglob; echo *[@-[]*
set +o nocaseglob; echo *[@-[]*
- According to posix, negative bracket range, like [z-a], should be
either invalid or empty, but instead it matches only 'z'.
- There could be more issues hiding - the code is not easy to follow.
- It's exponential time in the number of '*' in pattern.
This commit adds win32/fnmatch2.c and disables win32/fnmatch.c
(currently using "#if 0"), but still keeps it in the tree.
The new implementation is written from scratch, non-recursive and
linear time, improves POSIX compliance, and hopefully more readable
and fixable if needed. Also, it's about half the size at the binary
(saves 908 bytes in x64).
See implementation details and choices as comment at fnmatch2.c .
|
|
It was already disabled (ASH_OPTIMIZE_FOR_SIZE is 0 by default),
and it saved ~ 40 bytes but with big hit on performance, so not
really worth it. This also makes actype.c less noisy.
|
|
Unify actype/actail into a single function, and make actype
a wrapper macro which invokes actail(str, NULL) (to avoid calling
a function actype which then trampolines to actail with NULL).
This concludes the integration of actype/isactype.
Overall:
- Unified/standard-ish char-class handling in fnmatch/regcomp/tr.
- Saved about 1400 bytes at the binary as x64.
- regcomp.c is negligibly faster (actype/isactype are O(1)).
- tr.c now also supports [:graph:] and [:print:], and not slower.
fnmatch.c (tested using scripts/patbench.sh):
Before actype was integrated, alnum was fastest and xdigit slowest
due to their order in the names strings list and in the switch/case:
[35] 1305 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1928 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[38] 1446 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 3192 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1586 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 4601 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
Now all classes are similar relative to an equivalent range,
and just barely slower than "alnum" before actype was added
(still slower than range due to additional temp name buffer):
[35] 1402 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1962 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[38] 1453 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 1965 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1683 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 2117 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
|
|
The premise, and implementation, is simple: instead of doing string
search in a list of strings for the class name - which is fast for the
first strings in the list but slow for the last strings, use perfect
hash to map the string directly to a potential class to match in O(1).
The switch/case in isactype is also changed to a function table in O(1)
(the compiler might have optimized this switch to O(1) jump-table, but
now we don't leave it to chance).
Two implementations are supported: one which allows testing actype_t
value against specific classes like AC_ALNUM etc, and one which doesn't
allow that (just like wctype/iswctype) and is more efficient.
Since nothing uses specific class matches anymore, default is the more
efficient choice - where actype_t is a function pointer to isalnum etc.
And while at it, also change the enum names CCLASS_ALNUM etc to more
appropriate AC_ALNUM etc, to use the AC "namespace".
|
|
tr implements its own char classes setup, using index_in_strings and
"precompiling" the various classes chars. We can replace the lot with
a single loop of actail/isactype, which saves about 500 bytes.
Additionally, for unclear reasons, it didn't support [:graph:] and
[:print:], maybe because isgraph and isprint are disabled in libbb.h
because they're locale-dependent.
The new implementation using ac* does add support for these, and so tr
on windows now supports e.g. 'tr [:print:] X' which it didn't before.
This is the first use of actail, and demonstrates the usecases for it
(win32/fnmatch.c could use it as well, but the current code is too
branched to use it while keeping identical behavior, and regcomp.c
extracts the name from [:NAME:] on its own).
|
|
This saves more than 1K size because the original switch code expanded
a two-loops macro - BUILD_CHARCLASS_LOOP in each of the 12 "case".
Performance is non critical here in regcomp, but nevertheless, a future
commit will optimize actype/isactype, and especially isactype_not0
(which is used here with non-0 type).
Also:
- "#if 0" was replaced with "#ifndef CONFIG_BUSYBOX" for clarity.
- is_blank is no longer used, and disabled in this commit.
|
|
This preserves the original semantics where only ASCII chars are
tested, and unknown class name is considered known-but-unmatched name,
which is different than the equivalent wctype code just above.
The ASCII test is redundant, at least on windows - where all the
class tests (isalnum, isupper, etc) can only return true for ASCII
values, but nevertheless it's kept as the original semantics.
|
|
TL;DR:
- Add actype/isactype, like POSIX wctype/iswctype, but single-byte.
- actype is a renamed match_class, isactype is yet unused.
- This can be used in win32/{fnmatch,regcomp}.c, and elsewhere - tr.c.
Details:
win32/fnmatch.c and win32/regcomp.c used match_class to get char-class
index for char-class name, which is then used in a switch statement to
choose the single-byte isupper, isalpha, etc.
This is analogous to the POSIX wctype/iswctype, except that wctype
returns 0 for unknown name but match_class returned -1.
This commit renames match_class to actype:
- Returns 0 for unknown name, which at the enum is a new CCLASS_NONE.
- Return type is actype_t (still enum), analogous to wctype_t.
- Rename match_class* to actype* at fname/kbuild/includes/usage.
Both fnmatch.c and regcomp.c don't care whether unknown name in
match_class/actype is -1 (match_class) or 0 (actype). All they care
about is the values at each "case".
Also implemented, but not yet used:
- isactype(c, t): test whether char c is of type t.
- isactype_not0(c, t): when t is not-0 - same as isactype but faster.
- actail(s, int*): test for "NAME:]..." without temporary buffer.
Note that this commit focuses on exposing a useful API rather than
performance. Future commits may improve actype/isactype performance.
|
|
Few minor improvements:
- Add few valid but edge-case patterns (start with ']' etc).
- Add the expected result for default lines, and report ok/err.
- Disable measurements with -mn (to be measured externally).
Also, the default lines were modified slightly so that the new
edge cases results are largely distinct, so it can be used as
correctness test with -d (and exit code is 1 on errors).
BENCHMARK RESULTS
Below are results of running this script using busybox-w32 on win10,
and using busybox on Alpine linux 3.23.
Observations of busybox-w32 results, which uses old glibc fnmatch:
[10] 129 ms OK (M:12345) '*[hello world, this is a test]*'
[42] 22491 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
The current busybox-w32 implementation is exponential, and while
tests of simple patterns complete in 100-200ms, patterns with 3 '*'
can take 30s or more to complete, way longer with 4 '*', etc.
[12] 192 ms E:1-3-- (M:-----) '*[*'
[13] 201 ms E:1---- (M:-----) '*[]*'
[14] 203 ms E:--3-- (M:-----) '*[!]*'
A bracket which doesn't start a bracket expression is regular char,
and all these are missing the closing bracket (by POSIX rules), and
should be matched literally, but the current fnmatch fails with that.
[35] 1305 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1928 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[38] 1446 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 3192 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1586 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 4601 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
charclass [:alnum:] is x1.4 slower than an equivalent non-class range,
but [:lower:] is already x2 slower than the equivalent, and [:xdigit:]
is x3 slower than the equivalent. This is because the name is searched
in a list which starts with "alnum" and ends in "xdigit", and the
search duration (and the following switch in the same order) clearly
affects how quickly a pattern can be tested.
There are other issues in busybox-w32 fnmatch, like incorrect
case-insensitivity and more, but which don't manifest in case...esac.
Benchmark results with busybox-w32 on Windows 10 (compiled with gcc):
[ 1] 107 ms OK (M:-----) ''
[ 2] 108 ms OK (M:12345) '*'
[ 3] 104 ms OK (M:-----) '@'
[ 4] 112 ms OK (M:1----) 'Lorem*'
[ 5] 120 ms OK (M:-2---) '*quis'
[ 6] 127 ms OK (M:--3--) '*nisi*'
[ 7] 244 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 544 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 588 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 129 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 136 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 192 ms E:1-3-- (M:-----) '*[*'
[13] 201 ms E:1---- (M:-----) '*[]*'
[14] 203 ms E:--3-- (M:-----) '*[!]*'
[15] 182 ms OK (M:-23-5) '*[!]].*'
[16] 454 ms OK (M:1----) '*[![:print:]]*'
[17] 124 ms OK (M:-23--) '*z*'
[18] 123 ms OK (M:---4-) '*-*'
[19] 160 ms OK (M:---4-) '*[-]*'
[20] 175 ms OK (M:-234-) '*[-z]*'
[21] 177 ms OK (M:-234-) '*[z-]*'
[22] 197 ms OK (M:-234-) '*[-z-]*'
[23] 126 ms OK (M:1-3--) '*]*'
[24] 168 ms OK (M:1-3--) '*[]]*'
[25] 173 ms OK (M:1-34-) '*[]-]*'
[26] 168 ms OK (M:123--) '*[]-_]*'
[27] 170 ms OK (M:1234-) '*[]-_-]*'
[28] 365 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 513 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 263 ms OK (M:12-45) '*[a-z][a-z]'
[31] 376 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 815 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 311 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 756 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 1305 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 1928 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 919 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 1446 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 3192 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 1586 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 4601 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
[42] 22491 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
(aborted manually - exponential implementation becomes very slow)
results with busybox on Alpine linux 3.23 (musl-libc has linear-time
fnmatch, using wide-char wctype/iswctype - slightly slower than char):
[ 1] 135 ms OK (M:-----) ''
[ 2] 141 ms OK (M:12345) '*'
[ 3] 139 ms OK (M:-----) '@'
[ 4] 141 ms OK (M:1----) 'Lorem*'
[ 5] 148 ms OK (M:-2---) '*quis'
[ 6] 216 ms OK (M:--3--) '*nisi*'
[ 7] 410 ms OK (M:--3--) '*[fobar]xyz*'
[ 8] 787 ms OK (M:--3--) '*[foobarfoobarfoobarfoobar]xyz*'
[ 9] 844 ms OK (M:-2---) '*[!foobarfoobarfoobarfoobar]xyz*'
[10] 215 ms OK (M:12345) '*[hello world, this is a test]*'
[11] 237 ms OK (M:12345) '*[!hello world, this is a test]*'
[12] 206 ms OK (M:1-3--) '*[*'
[13] 242 ms OK (M:1----) '*[]*'
[14] 220 ms OK (M:--3--) '*[!]*'
[15] 306 ms OK (M:-23-5) '*[!]].*'
[16] 591 ms OK (M:1----) '*[![:print:]]*'
[17] 216 ms OK (M:-23--) '*z*'
[18] 198 ms OK (M:---4-) '*-*'
[19] 286 ms OK (M:---4-) '*[-]*'
[20] 285 ms OK (M:-234-) '*[-z]*'
[21] 291 ms OK (M:-234-) '*[z-]*'
[22] 299 ms OK (M:-234-) '*[-z-]*'
[23] 228 ms OK (M:1-3--) '*]*'
[24] 286 ms OK (M:1-3--) '*[]]*'
[25] 267 ms OK (M:1-34-) '*[]-]*'
[26] 349 ms OK (M:123--) '*[]-_]*'
[27] 292 ms OK (M:1234-) '*[]-_-]*'
[28] 198 ms OK (M:12-45) '*[0-9A-Za-z][0-9A-Za-z]'
[29] 201 ms OK (M:12-45) '*[[:alnum:]][[:alnum:]]'
[30] 179 ms OK (M:12-45) '*[a-z][a-z]'
[31] 217 ms OK (M:12-45) '*[a-bc-de-z][a-bc-de-z]'
[32] 205 ms OK (M:12-45) '*[[:lower:]][[:lower:]]'
[33] 202 ms OK (M:-----) '*[0-9A-Fa-ef][0-9A-Fa-ef]'
[34] 203 ms OK (M:-----) '*[[:xdigit:]][[:xdigit:]]'
[35] 215 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]'
[36] 213 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]'
[37] 191 ms OK (M:12-45) '*[a-z]*[a-z]'
[38] 219 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]'
[39] 224 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]'
[40] 217 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[41] 199 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]'
[42] 227 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[43] 239 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[44] 204 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]'
[45] 258 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[46] 235 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[47] 247 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[48] 247 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
[49] 286 ms OK (M:12-45) '*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]*[0-9A-Za-z]'
[50] 262 ms OK (M:12-45) '*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]*[[:alnum:]]'
[51] 219 ms OK (M:12-45) '*[a-z]*[a-z]*[a-z]*[a-z]'
[52] 273 ms OK (M:12-45) '*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]*[a-bc-de-z]'
[53] 274 ms OK (M:12-45) '*[[:lower:]]*[[:lower:]]*[[:lower:]]*[[:lower:]]'
[54] 281 ms OK (M:---4-) '*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]*[0-9A-Fa-ef]'
[55] 296 ms OK (M:---4-) '*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]*[[:xdigit:]]'
|