| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This provides a SHA-512 assembly implementation that makes use of the ARM
Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives
a performance gain of up to 2.5x on an Apple M2 (dependent on block size).
If an aarch64 machine does not have SHA512 support, then we'll fall back to
using the existing C implementation.
ok kettenis@ tb@
|
|
|
|
|
|
|
| |
We have code that targets a specific architecture level, hence .arch makes
more sense here than .cpu.
Suggested by kettenis@
|
|
|
|
|
|
|
|
|
|
| |
This provides a SHA-256 assembly implementation that makes use of the ARM
Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives
a performance gain of up to 7.5x on an Apple M2 (dependent on block size).
If an aarch64 machine does not have SHA2 support, then we'll fall back to
using the existing C implementation.
ok kettenis@ tb@
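As a rough illustration of the fallback arrangement (all of the names below
are hypothetical, for illustration only, not the ones used in the tree), the
dispatch amounts to:

#include <stddef.h>

/* Hypothetical names, for illustration only. */
extern int arm_has_sha256;	/* set once from CPU feature detection */

void sha256_block_ce(void *ctx, const void *in, size_t num);
void sha256_block_generic(void *ctx, const void *in, size_t num);

void
sha256_block_data_order(void *ctx, const void *in, size_t num)
{
	if (arm_has_sha256)
		sha256_block_ce(ctx, in, num);
	else
		sha256_block_generic(ctx, in, num);
}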
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, SHA{1,256,512}_ASM defines are used to remove the C
implementation of sha{1,256,512}_block_data_order() when it is provided
by assembly. However, this prevents the C implementation from being used
as a fallback.
Rename the C sha*_block_data_order() to sha*_block_generic() and provide
a sha*_block_data_order() that calls sha*_block_generic(). Replace the
Makefile based SHA*_ASM defines with two HAVE_SHA_* defines that allow
these functions to be compiled in or removed, such that machine specific
versions can be provided. This should effectively be a no-op on any
platform that defined SHA{1,256,512}_ASM.
ok tb@
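In rough terms the arrangement looks like the following sketch (the define
name and the context type are simplified stand-ins, not the exact ones in
the tree):

#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the real SHA256_CTX. */
typedef struct { uint32_t h[8]; } sha256_ctx;

/* Portable C implementation, always available as the fallback. */
static void
sha256_block_generic(sha256_ctx *ctx, const void *in, size_t num)
{
	/* ... existing C compression function ... */
	(void)ctx; (void)in; (void)num;
}

/*
 * A machine specific build defines HAVE_SHA256_BLOCK_DATA_ORDER (name
 * assumed for illustration) and supplies its own implementation;
 * otherwise this trivial wrapper hands off to the generic code.
 */
#ifndef HAVE_SHA256_BLOCK_DATA_ORDER
void
sha256_block_data_order(sha256_ctx *ctx, const void *in, size_t num)
{
	sha256_block_generic(ctx, in, num);
}
#endif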
|
|
|
|
| |
discussed with jsing
|
| |
|
|
|
|
|
|
|
|
| |
This provides a SHA-1 assembly implementation for amd64, which uses
the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This
provides a 2-2.5x performance gain on some Intel CPUs and many AMD CPUs.
ok tb@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As already done for SHA-256 and SHA-512, replace the perlasm generated
SHA-1 assembly implementation with one that is actually readable. Call the
assembly implementation from a C wrapper that can, in the future, dispatch
to alternate implementations. On a modern CPU the performance is around
5% faster than the base implementation generated by sha1-x86_64.pl, however
it is around 15% slower than the excessively complex SSSE2/AVX version that
is also generated by the same script (a SHA-NI version will greatly
outperform this and is much cleaner/simpler).
ok tb@
|
|
|
|
|
|
|
|
|
|
| |
Rather than having blocks of code that are conditional on
BYTE_ORDER != LITTLE_ENDIAN, use le64toh() and htole64() unconditionally.
In the case of a little endian platform, the compiler will optimise this
away, while on a big endian platform we'll either end up with better code
or the same code as we have currently.
ok tb@
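A small sketch of the idea (the header is <endian.h> on some systems and
<sys/endian.h> on others):

#include <sys/endian.h>
#include <stdint.h>

/*
 * No BYTE_ORDER conditionals: convert unconditionally and let the
 * compiler turn these into plain loads and stores on little endian
 * targets.
 */
static inline uint64_t
load_le64(const uint64_t *p)
{
	return le64toh(*p);
}

static inline void
store_le64(uint64_t *p, uint64_t v)
{
	*p = htole64(v);
}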
|
|
|
|
|
|
|
|
| |
This provides a SHA-256 assembly implementation for amd64, which uses
the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This
provides a 3-5x performance gain on some Intel CPUs and many AMD CPUs.
ok tb@
|
|
|
|
|
| |
Now that we have replacement SHA-256 and SHA-512 assembly implementations
for amd64, sha512-x86_64.pl can go the way of the dodo.
|
|
|
|
|
|
|
|
| |
Replace the perlasm generated SHA-512 assembly with a more readable
version and the same C wrapper introduced for SHA-256. As for SHA-256,
on a modern CPU the performance is largely the same.
ok tb@
|
|
|
|
| |
Missing sizes spotted by guenther@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the perlasm generated SHA-256 assembly implementation with one that
is actually readable. Call the assembly implementation from a C wrapper
that can, in the future, dispatch to alternate implementations. Performance
is similar (or even better) on modern CPUs, while somewhat slower on older
CPUs (this is in part due to the wrapper, the impact of which is more
noticeable with small block sizes).
Thanks to gkoehler@ and tb@ for testing.
ok tb@
|
| |
|
|
|
|
| |
requested by jsing on review
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HMAC() and the one-step digests used to support passing a NULL buffer and
would return the digest in a static buffer. This design is firmly from the
nineties, not thread safe and it saves callers a single line. The few ports
that used to rely on this were fixed with patches sent to non-hostile (and
non-dead) upstreams. It's early enough in the release cycle that remaining
uses hidden from the compiler should be caught, at least the ones that
matter.
There won't be that many since BoringSSL removed this feature in 2017.
https://boringssl-review.googlesource.com/14528
Add non-null attributes to the headers and add a few missing bounded
attributes.
ok beck jsing
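Callers that relied on the static buffer now simply pass their own, e.g.
(a minimal sketch using the standard HMAC() interface; the wrapper name is
made up):

#include <openssl/evp.h>
#include <openssl/hmac.h>

/*
 * The output buffer must always be supplied by the caller; passing NULL
 * to have HMAC() return a pointer to an internal static buffer is no
 * longer supported.
 */
static int
hmac_sha256(const void *key, int key_len, const unsigned char *data,
    size_t data_len, unsigned char out[EVP_MAX_MD_SIZE], unsigned int *out_len)
{
	return HMAC(EVP_sha256(), key, key_len, data, data_len,
	    out, out_len) != NULL;
}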
|
|
|
|
|
|
|
|
|
| |
Replace macros with static inline functions and use names that follow
the spec more closely. Unlike SHA256/SHA512, the functions and constants do
not align with the number of words loaded, which means we cannot easily loop
and just end up unrolling everything.
ok joshua@ tb@
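For reference, the SHA-1 functions from FIPS 180-4 written this way look
roughly like:

#include <stdint.h>

/* SHA-1 round functions, named as in FIPS 180-4 (sketch). */
static inline uint32_t
Ch(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) ^ (~x & z);
}

static inline uint32_t
Parity(uint32_t x, uint32_t y, uint32_t z)
{
	return x ^ y ^ z;
}

static inline uint32_t
Maj(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) ^ (x & z) ^ (y & z);
}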
|
| |
|
|
|
|
|
|
|
|
| |
Use be32toh(), htobe32() and crypto_{load,store}_htobe32() as appropriate.
Also use the same while() loop that is used for other hash functions.
ok joshua@ tb@
|
|
|
|
|
|
|
|
|
| |
cet.h is needed for other platforms to emit the relevant .gnu.properties
sections that are necessary for them to enable IBT. It also avoids issues
with older toolchains on macOS that explode on encountering endbr64.
based on a diff by kettenis
ok beck kettenis
|
|
|
|
|
| |
Now that we're no longer dependent on md32_common.h, stop including it.
Remove various defines that only existed for md32_common.h usage.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace macros with static inline functions, as well as writing out the
variable rotations instead of trying to outsmart the compiler. Also pull
the message schedule update up and complete it prior to commencement of
the round. Also use rotate right, rather than transposed rotate left.
Overall this is more readable and more closely follows the specification.
On some platforms (e.g. aarch64) there is no notable change in
performance, while on others there is a significant improvement (more than
25% on arm).
ok miod@ tb@
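A sketch of the style, using rotate right as the specification documents it:

#include <stdint.h>

static inline uint32_t
ror32(uint32_t x, int n)
{
	return (x >> n) | (x << (32 - n));
}

/* SHA-256 Sigma functions from FIPS 180-4, written as documented. */
static inline uint32_t
Sigma0(uint32_t x)
{
	return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22);
}

static inline uint32_t
Sigma1(uint32_t x)
{
	return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25);
}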
|
|
|
|
|
|
|
|
|
| |
This is a hack that is only enabled on a handful of 64 bit platforms, as
a workaround for poor compiler optimisation. If you're running an archaic
compiler on an archaic architecture, then you can deal with slightly lower
performance.
ok tb@
|
|
|
|
| |
ok tb@
|
| |
|
| |
|
|
|
|
| |
No change to generated assembly.
|
|
|
|
| |
No functional change.
|
| |
|
|
|
|
|
|
|
|
| |
Copy the update, transform and final functions from md32_common.h, manually
expanding the macros for SHA1. This will allow for further clean up to
occur.
No change in generated assembly.
|
|
|
|
|
|
|
| |
If input data is 32 bit aligned use be32toh() directly, otherwise use
crypto_load_be32toh(), cleaning up all of the HOST_c2l() usage.
ok beck@
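In this scheme crypto_load_be32toh() is essentially a memcpy() followed by
be32toh(); a sketch (not the actual implementation):

#include <stdint.h>
#include <string.h>
#include <sys/endian.h>	/* <endian.h> on some systems */

/* Unaligned big endian 32 bit load; the aligned path instead applies
 * be32toh() directly to a normal 32 bit load. */
static inline uint32_t
load_be32_sketch(const uint8_t *in)
{
	uint32_t v;

	memcpy(&v, in, sizeof(v));
	return be32toh(v);
}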
|
|
|
|
|
|
|
| |
Avoid reach around and initialisation outside of the macro, cleaning up
the call sites to remove the initialisation.
ok beck@
|
|
|
|
| |
ok beck@
|
|
|
|
| |
ok beck@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use static inline functions instead of macros to implement SHA-512. At
the same time, make two key changes - firstly, rather than trying to
outsmart the compiler and shuffle variables around, write the algorithm
the way it is documented and actually swap the variable contents. Secondly,
instead of interleaving the message schedule update and the round, do the
full message schedule update first, then process the round.
Overall, we get safer and more readable code. Additionally, the compiler
can generate smaller and faster code (with a gain of 5-10% across a range
of architectures).
ok beck@ tb@
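A condensed sketch of the resulting structure (FIPS 180-4 notation; K holds
the round constants and W[0..15] has already been loaded from the input
block):

#include <stdint.h>

static inline uint64_t ror64(uint64_t x, int n) { return (x >> n) | (x << (64 - n)); }
static inline uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x & z); }
static inline uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x & z) ^ (y & z); }
static inline uint64_t Sigma0(uint64_t x) { return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39); }
static inline uint64_t Sigma1(uint64_t x) { return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41); }
static inline uint64_t sigma0(uint64_t x) { return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7); }
static inline uint64_t sigma1(uint64_t x) { return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6); }

static void
sha512_compress_sketch(uint64_t state[8], uint64_t W[80], const uint64_t K[80])
{
	uint64_t a, b, c, d, e, f, g, h, T1, T2;
	int i;

	/* Complete the full message schedule first... */
	for (i = 16; i < 80; i++)
		W[i] = sigma1(W[i - 2]) + W[i - 7] + sigma0(W[i - 15]) + W[i - 16];

	a = state[0]; b = state[1]; c = state[2]; d = state[3];
	e = state[4]; f = state[5]; g = state[6]; h = state[7];

	/* ... then run the rounds, actually swapping the working variables
	 * the way the specification writes them. */
	for (i = 0; i < 80; i++) {
		T1 = h + Sigma1(e) + Ch(e, f, g) + K[i] + W[i];
		T2 = Sigma0(a) + Maj(a, b, c);
		h = g; g = f; f = e; e = d + T1;
		d = c; c = b; b = a; a = T1 + T2;
	}

	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}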
|
| |
|
|
|
|
| |
No change in generated assembly.
|
|
|
|
| |
No intended functional change.
|
| |
|
| |
|
|
|
|
| |
No change to generated assembly.
|
|
|
|
|
|
|
|
|
|
| |
md32_common.h is a typical OpenSSL macro horror show - copy the update,
transform and final functions from md32_common.h, manually expanding the
macros for SHA256. This will allow for further clean up to occur.
No change in generated assembly.
ok beck@ tb@
|
|
|
|
|
|
|
|
|
|
|
| |
This recommits r1.37 of sha512.c, however it uses uint8_t * instead of void *
for the crypto_load_* functions and primarily uses const uint8_t * to track
input, only casting to const SHA_LONG64 * once we know that it is suitably
aligned. This prevents the compiler from implying alignment based on type.
Tested by tb@ and deraadt@ on platforms with gcc and strict alignment.
ok tb@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All assembly implementations are required to perform their own alignment
handling. In the case of the C implementation, on strict alignment
platforms, unaligned data will be copied into an aligned buffer. However,
most platforms then perform byte-by-byte reads (via the PULL64 macros).
Instead, remove SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA and move the alignment
handling into sha512_block_data_order() - if the data is aligned then simply
perform 64 bit loads and then do endian conversion via be64toh(). If the
data is unaligned then use memcpy() and be64toh() (in the form of
crypto_load_be64toh()). Overall this reduces complexity and can improve
performance (on aarch64 we get a ~10% performance gain with aligned input
and a ~1-2% gain on armv7), while the same movq/bswapq is generated
for amd64 and movl/bswapl for i386.
ok tb@
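Roughly, the load ends up as the following sketch (the real code decides
aligned versus unaligned once per call, not per load):

#include <stdint.h>
#include <string.h>
#include <sys/endian.h>	/* <endian.h> on some systems */

static inline uint64_t
load_be64_sketch(const uint8_t *in)
{
	uint64_t v;

	if (((uintptr_t)in % sizeof(v)) == 0)
		return be64toh(*(const uint64_t *)in);	/* aligned: plain load */

	memcpy(&v, in, sizeof(v));	/* unaligned: crypto_load_be64toh() style */
	return be64toh(v);
}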
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid reach around and initialisation outside of the macro, cleaning up
the call sites to remove the initialisation. Use a T2 variable to more
closely follow the documented algorithm and remove the gorgeous compound
statement X = Y += A + B + C.
There is no change to the clang generated assembly on aarch64.
ok tb@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently have three C implementations for SHA-512 - a version that is
optimised for CPUs with minimal registers (specifically i386), a regular
implementation and a semi-unrolled implementation. Testing on a ~15 year
old i386 CPU, the fastest version is actually the semi-unrolled version
(not to mention that we still currently have an i586 assembly
implementation that is used on i386 instead...).
More decent architectures do not seem to care whether the regular or the
semi-unrolled version is used, presumably since they are effectively doing the
same thing in hardware during execution.
Remove all except the semi-unrolled version.
ok tb@
|