openbsd/src/lib/libcrypto/sha, branch libressl-v4.1.2

A mirror of https://github.com/libressl/openbsd.git
https://git.lua4.win/openbsd/atom?h=libressl-v4.1.2
2025-03-12T14:13:41+00:00

Provide an accelerated SHA-512 assembly implementation for aarch64.
2025-03-12T14:13:41+00:00 jsing urn:sha1:dcd1700591c767d997c903eba3f3953d562bf23a

This provides a SHA-512 assembly implementation that makes use of the ARM Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives a performance gain of up to 2.5x on an Apple M2 (dependent on block size). If an aarch64 machine does not have SHA512 support, then we'll fall back to using the existing C implementation.

ok kettenis@ tb@

Use .arch rather than .cpu for sha2 instructions.
2025-03-12T12:53:33+00:00 jsing urn:sha1:76a201e2d50dcc1de518d41ad51e8f894f056407

We have code that targets a specific architecture level, hence .arch makes more sense here than .cpu.

Suggested by kettenis@

Provide an accelerated SHA-256 assembly implementation for aarch64.
2025-03-07T14:21:22+00:00 jsing urn:sha1:504b1d708a6f318c44655b84b6f33ec1734e0375

This provides a SHA-256 assembly implementation that makes use of the ARM Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives a performance gain of up to 7.5x on an Apple M2 (dependent on block size). If an aarch64 machine does not have SHA2 support, then we'll fall back to using the existing C implementation.

ok kettenis@ tb@

Replace Makefile based SHA*_ASM defines with HAVE_SHA_* defines.
2025-02-14T12:01:58+00:00 jsing urn:sha1:a89810379a758c9cd27af2462547dc646dcfaa61

Currently, SHA{1,256,512}_ASM defines are used to remove the C implementation of sha{1,256,512}_block_data_order() when it is provided by assembly. However, this prevents the C implementation from being used as a fallback.

Rename the C sha*_block_data_order() to sha*_block_generic() and provide a sha*_block_data_order() that calls sha*_block_generic(). Replace the Makefile based SHA*_ASM defines with two HAVE_SHA_* defines that allow these functions to be compiled in or removed, such that machine specific versions can be provided. This should effectively be a no-op on any platform that defined SHA{1,256,512}_ASM.

ok tb@

Remove #error if OPENSSL_NO_FOO is defined
2025-01-25T17:59:44+00:00 tb urn:sha1:5d52abc236226c5a47c36b07e2256e77141e373a

discussed with jsing

Use name instead of register.
2025-01-18T02:56:07+00:00 jsing urn:sha1:c79c1646d28571d60ad8157510b2f311aa3d348e

Provide a SHA-1 assembly implementation for amd64 using SHA-NI.
2024-12-06T11:57:18+00:00 jsing urn:sha1:52aaf400e5a619fcab8ae52524e5aaf96e0b0894

This provides a SHA-1 assembly implementation for amd64, which uses the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This provides a 2-2.5x performance gain on some Intel CPUs and many AMD CPUs.

ok tb@

Another now unused perlasm script can bite the dust.
2024-12-04T13:14:45+00:00 jsing urn:sha1:77b230abfab8f172a633dd2e217f23da030bc03a

Provide a replacement assembly implementation for SHA-1 on amd64.
2024-12-04T13:13:33+00:00 jsing urn:sha1:2de3ee8c5940ebad54feb4303f6fc816daca784b

As already done for SHA-256 and SHA-512, replace the perlasm generated SHA-1 assembly implementation with one that is actually readable. Call the assembly implementation from a C wrapper that can, in the future, dispatch to alternate implementations. On a modern CPU the performance is around 5% faster than the base implementation generated by sha1-x86_64.pl, however it is around 15% slower than the excessively complex SSSE3/AVX version that is also generated by the same script (a SHA-NI version will greatly outperform this and is much cleaner/simpler).
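
The C wrapper arrangement described in these commits (a generic C block function kept as a fallback, with a wrapper that can dispatch to a machine specific version) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the libcrypto code: the function names are only modeled on the sha*_block_generic() / sha*_block_data_order() pattern, while `cpu_has_sha_ext`, `sha_block_accel()` and the `last_path` marker are hypothetical, and both block functions are stubs that merely record which path ran.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: would be set once from CPU feature detection. */
static int cpu_has_sha_ext;

/* Illustration only: records which implementation ran last. */
enum sha_path { PATH_NONE, PATH_GENERIC, PATH_ACCEL };
static enum sha_path last_path = PATH_NONE;

/* Stand-in for the portable C implementation (always available). */
static void
sha_block_generic(uint32_t *state, const uint8_t *in, size_t num)
{
	(void)state; (void)in; (void)num;
	last_path = PATH_GENERIC;
}

/* Stand-in for an assembly implementation (hypothetical name). */
static void
sha_block_accel(uint32_t *state, const uint8_t *in, size_t num)
{
	(void)state; (void)in; (void)num;
	last_path = PATH_ACCEL;
}

/* Wrapper: use the accelerated path if supported, else fall back to C. */
static void
sha_block_data_order(uint32_t *state, const uint8_t *in, size_t num)
{
	if (cpu_has_sha_ext) {
		sha_block_accel(state, in, num);
		return;
	}
	sha_block_generic(state, in, num);
}
```

Keeping the generic function compiled in is what makes the run-time fallback possible; removing it at build time, as the old SHA*_ASM defines did, leaves nothing to fall back to.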

ok tb@

Simplify endian handling in SHA-3.
2024-11-23T15:38:12+00:00 jsing urn:sha1:d858094ad2067b28a4d1db54ddc77ec7db656253

Rather than having blocks of code that are conditional on BYTE_ORDER != LITTLE_ENDIAN, use le64toh() and htole64() unconditionally. In the case of a little endian platform, the compiler will optimise this away, while on a big endian platform we'll either end up with better code or the same code as we have currently.

ok tb@
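
The unconditional le64toh()/htole64() approach from the SHA-3 commit can be illustrated with a small sketch. The helper names `load_le64()` and `store_le64()` are hypothetical and are not taken from the SHA-3 source; the sketch assumes a platform that provides <endian.h> with these functions (OpenBSD and glibc do).

```c
/* glibc may need this under strict -std=c11; harmless elsewhere. */
#define _DEFAULT_SOURCE
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a little endian 64-bit value from bytes. */
static uint64_t
load_le64(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	/* No-op on little endian hosts, byte swap on big endian hosts. */
	return le64toh(v);
}

/* Hypothetical helper: write a 64-bit value as little endian bytes. */
static void
store_le64(uint8_t *p, uint64_t v)
{
	v = htole64(v);
	memcpy(p, &v, sizeof(v));
}
```

There is no `#if BYTE_ORDER` block here: the same source is correct on both byte orders, and the conversion costs nothing where the host is already little endian.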