openbsd/src/lib/libcrypto/arch, branch libressl-v4.1.2

A mirror of https://github.com/libressl/openbsd.git
https://git.lua4.win/openbsd/atom?h=libressl-v4.1.2
2025-03-12T14:13:41+00:00

Provide an accelerated SHA-512 assembly implementation for aarch64.
2025-03-12T14:13:41+00:00 jsing
urn:sha1:dcd1700591c767d997c903eba3f3953d562bf23a

This provides a SHA-512 assembly implementation that makes use of the ARM Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives a performance gain of up to 2.5x on an Apple M2 (dependent on block size).

If an aarch64 machine does not have SHA512 support, then we'll fall back to using the existing C implementation.

ok kettenis@ tb@

Support OPENSSL_NO_FILENAMES
2025-03-09T15:12:18+00:00 tb
urn:sha1:b8acfd2983c50474382bf8ed132a5b7e7bdedb34

Some people are concerned that leaking a user name is a privacy issue. Allow disabling the __FILE__ and __LINE__ argument in the error stack to avoid this. This can be improved a bit in tree.

From Viktor Szakats in https://github.com/libressl/portable/issues/761

ok bcook jsing

Provide an accelerated SHA-256 assembly implementation for aarch64.
2025-03-07T14:21:22+00:00 jsing
urn:sha1:504b1d708a6f318c44655b84b6f33ec1734e0375

This provides a SHA-256 assembly implementation that makes use of the ARM Cryptographic Extension (CE), which is found on many arm64 CPUs. This gives a performance gain of up to 7.5x on an Apple M2 (dependent on block size).

If an aarch64 machine does not have SHA2 support, then we'll fall back to using the existing C implementation.

ok kettenis@ tb@

Replace Makefile based SHA*_ASM defines with HAVE_SHA_* defines.
2025-02-14T12:01:58+00:00 jsing
urn:sha1:a89810379a758c9cd27af2462547dc646dcfaa61

Currently, SHA{1,256,512}_ASM defines are used to remove the C implementation of sha{1,256,512}_block_data_order() when it is provided by assembly. However, this prevents the C implementation from being used as a fallback.

Rename the C sha*_block_data_order() to sha*_block_generic() and provide a sha*_block_data_order() that calls sha*_block_generic(). Replace the Makefile based SHA*_ASM defines with two HAVE_SHA_* defines that allow these functions to be compiled in or removed, such that machine specific versions can be provided. This should effectively be a no-op on any platform that defined SHA{1,256,512}_ASM.

ok tb@

Mop up RC4_INDEX.
2025-01-27T14:02:32+00:00 jsing
urn:sha1:d97873f8db01cd052f45675db2ed3d9584c93c44

The RC4_INDEX define switches between base pointer indexing and per-byte pointer increment. This supposedly made a huge difference to performance on x86 at some point, however compilers have improved somewhat since then. There is no change (or effectively no change) in generated assembly on the majority of LLVM platforms and even when there is some change (e.g. aarch64), there is no noticeable performance difference.

Simplify the (still messy) macros/code and mop up RC4_INDEX.

ok tb@

Provide a readable assembly implementation for MD5 on amd64.
2025-01-24T13:35:04+00:00 jsing
urn:sha1:e645f65a85d604ca35a8889b91950b72ea837f74

This appears to be about 5% faster than the current perlasm version on a modern Intel CPU.

While here, rename md5_block_asm_data_order to md5_block_data_order, for consistency with other hashes.

ok tb@

Provide a SHA-1 assembly implementation for amd64 using SHA-NI.
2024-12-06T11:57:18+00:00 jsing
urn:sha1:52aaf400e5a619fcab8ae52524e5aaf96e0b0894

This provides a SHA-1 assembly implementation for amd64, which uses the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This provides a 2-2.5x performance gain on some Intel CPUs and many AMD CPUs.

ok tb@

Provide a replacement assembly implementation for SHA-1 on amd64.
2024-12-04T13:13:33+00:00 jsing
urn:sha1:2de3ee8c5940ebad54feb4303f6fc816daca784b

As already done for SHA-256 and SHA-512, replace the perlasm generated SHA-1 assembly implementation with one that is actually readable. Call the assembly implementation from a C wrapper that can, in the future, dispatch to alternate implementations.

On a modern CPU the performance is around 5% faster than the base implementation generated by sha1-x86_64.pl, however it is around 15% slower than the excessively complex SSSE3/AVX version that is also generated by the same script (a SHA-NI version will greatly outperform this and is much cleaner/simpler).

ok tb@

Provide a SHA-256 assembly implementation for amd64 using SHA-NI.
2024-11-16T15:31:36+00:00 jsing
urn:sha1:a06b7340f2af374d4581bec7db2775ae686ce1ab

This provides a SHA-256 assembly implementation for amd64, which uses the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This provides a 3-5x performance gain on some Intel CPUs and many AMD CPUs.

ok tb@

Provide a replacement assembly implementation for SHA-512 on amd64.
2024-11-16T14:56:39+00:00 jsing
urn:sha1:e7a6ae25b19891efd234d6f7c2769d1c20f3969f

Replace the perlasm generated SHA-512 assembly with a more readable version and the same C wrapper introduced for SHA-256. As for SHA-256, on a modern CPU the performance is largely the same.

ok tb@
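
Several of the entries above describe the same pattern: a C wrapper for sha*_block_data_order() that uses a machine specific implementation when the CPU supports it and otherwise falls back to the generic C code. The following is a minimal sketch of that pattern only. Every demo_* name is hypothetical, the mixing function is a toy rather than a real hash, and the runtime flag stands in for whatever CPU capability detection the library actually performs; none of this is taken from the LibreSSL sources.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_SIZE 64

/* Hypothetical stand-in for a runtime CPU capability check. */
static int demo_cpu_has_accel = 0;

/* Records which path ran last: 0 = none, 1 = generic, 2 = accelerated. */
static int demo_last_path = 0;

/* Portable C fallback. A toy mixing function, NOT a real hash. */
static void
demo_block_generic(uint32_t *state, const uint8_t *in, size_t num)
{
	size_t i;

	demo_last_path = 1;
	for (i = 0; i < num * DEMO_BLOCK_SIZE; i++)
		*state = (*state * 31u) + in[i];
}

/*
 * Stand-in for a machine specific routine. It computes the same result
 * as the generic version, four bytes per loop iteration.
 */
static void
demo_block_accel(uint32_t *state, const uint8_t *in, size_t num)
{
	uint32_t s = *state;
	size_t i;

	demo_last_path = 2;
	for (i = 0; i < num * DEMO_BLOCK_SIZE; i += 4) {
		s = (s * 31u) + in[i];
		s = (s * 31u) + in[i + 1];
		s = (s * 31u) + in[i + 2];
		s = (s * 31u) + in[i + 3];
	}
	*state = s;
}

/* Wrapper: dispatch to the accelerated path, else fall back to C. */
void
demo_block_data_order(uint32_t *state, const uint8_t *in, size_t num)
{
	if (demo_cpu_has_accel) {
		demo_block_accel(state, in, num);
		return;
	}
	demo_block_generic(state, in, num);
}
```

Callers only ever see demo_block_data_order(), so an accelerated path can be added or removed without touching them, and both paths must produce identical state for the same input.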