openbsd/src/lib/libcrypto/arch/amd64, branch OPENBSD_7

openbsd/src/lib/libcrypto/arch/amd64, branch OPENBSD_7_7
A mirror of https://github.com/libressl/openbsd.git
https://git.lua4.win/openbsd/atom?h=OPENBSD_7_7
2025-03-09T15:12:18+00:00

Support OPENSSL_NO_FILENAMES
2025-03-09T15:12:18+00:00 tb
urn:sha1:b8acfd2983c50474382bf8ed132a5b7e7bdedb34

Some people are concerned that leaking a user name is a privacy issue. Allow disabling the __FILE__ and __LINE__ arguments in the error stack to avoid this. This can be improved a bit in tree.

From Viktor Szakats in https://github.com/libressl/portable/issues/761

ok bcook jsing

Replace Makefile based SHA*_ASM defines with HAVE_SHA_* defines.
2025-02-14T12:01:58+00:00 jsing
urn:sha1:a89810379a758c9cd27af2462547dc646dcfaa61

Currently, SHA{1,256,512}_ASM defines are used to remove the C implementation of sha{1,256,512}_block_data_order() when it is provided by assembly. However, this prevents the C implementation from being used as a fallback.

Rename the C sha*_block_data_order() to sha*_block_generic() and provide a sha*_block_data_order() that calls sha*_block_generic(). Replace the Makefile based SHA*_ASM defines with two HAVE_SHA_* defines that allow these functions to be compiled in or removed, such that machine specific versions can be provided. This should effectively be a no-op on any platform that defined SHA{1,256,512}_ASM.

ok tb@

Mop up RC4_INDEX.
2025-01-27T14:02:32+00:00 jsing
urn:sha1:d97873f8db01cd052f45675db2ed3d9584c93c44

The RC4_INDEX define switches between base pointer indexing and per-byte pointer increment. This supposedly made a huge difference to performance on x86 at some point, however compilers have improved somewhat since then. There is no change (or effectively no change) in generated assembly on the majority of LLVM platforms and even when there is some change (e.g. aarch64), there is no noticeable performance difference.
Simplify the (still messy) macros/code and mop up RC4_INDEX.

ok tb@

Provide a readable assembly implementation for MD5 on amd64.
2025-01-24T13:35:04+00:00 jsing
urn:sha1:e645f65a85d604ca35a8889b91950b72ea837f74

This appears to be about 5% faster than the current perlasm version on a modern Intel CPU. While here rename md5_block_asm_data_order to md5_block_data_order, for consistency with other hashes.

ok tb@

Provide a SHA-1 assembly implementation for amd64 using SHA-NI.
2024-12-06T11:57:18+00:00 jsing
urn:sha1:52aaf400e5a619fcab8ae52524e5aaf96e0b0894

This provides a SHA-1 assembly implementation for amd64, which uses the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This provides a 2-2.5x performance gain on some Intel CPUs and many AMD CPUs.

ok tb@

Provide a replacement assembly implementation for SHA-1 on amd64.
2024-12-04T13:13:33+00:00 jsing
urn:sha1:2de3ee8c5940ebad54feb4303f6fc816daca784b

As already done for SHA-256 and SHA-512, replace the perlasm generated SHA-1 assembly implementation with one that is actually readable. Call the assembly implementation from a C wrapper that can, in the future, dispatch to alternate implementations.

On a modern CPU the performance is around 5% faster than the base implementation generated by sha1-x86_64.pl, however it is around 15% slower than the excessively complex SSSE3/AVX version that is also generated by the same script (a SHA-NI version will greatly outperform this and is much cleaner/simpler).

ok tb@

Provide a SHA-256 assembly implementation for amd64 using SHA-NI.
2024-11-16T15:31:36+00:00 jsing
urn:sha1:a06b7340f2af374d4581bec7db2775ae686ce1ab

This provides a SHA-256 assembly implementation for amd64, which uses the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This provides a 3-5x performance gain on some Intel CPUs and many AMD CPUs.
ok tb@

Provide a replacement assembly implementation for SHA-512 on amd64.
2024-11-16T14:56:39+00:00 jsing
urn:sha1:e7a6ae25b19891efd234d6f7c2769d1c20f3969f

Replace the perlasm generated SHA-512 assembly with a more readable version and the same C wrapper introduced for SHA-256. As for SHA-256, on a modern CPU the performance is largely the same.

ok tb@

Add CPU capability detection for the Intel SHA extensions (aka SHA-NI).
2024-11-16T13:05:35+00:00 jsing
urn:sha1:d5d56fc45a9988f5488f0a2804155ccc5067def3

This also provides a crypto_cpu_caps_amd64 variable that can be checked for CRYPTO_CPU_CAPS_AMD64_SHA.

ok tb@

Check the correct variable in cpuid().
2024-11-12T13:14:57+00:00 jsing
urn:sha1:e5a8d62d4380136d1e13d26d8b9b2462456b96a7
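
Several entries in this log describe the same pattern: a C sha*_block_data_order() wrapper that checks a CPU capability bit and dispatches to a SHA-NI routine, with sha*_block_generic() kept as a fallback. The following is only a rough sketch of that pattern, not the code from the tree. The names crypto_cpu_caps_amd64, CRYPTO_CPU_CAPS_AMD64_SHA, sha256_block_data_order and sha256_block_generic come from the log messages, but the bit value, the variable type, the function signatures, the sha256_block_shani name and the last_path tracker are all assumptions made for illustration, and the two block functions are empty stand-ins rather than real SHA-256 code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value and type; the log only names these identifiers. */
#define CRYPTO_CPU_CAPS_AMD64_SHA	(1ULL << 0)
static uint64_t crypto_cpu_caps_amd64;

/* Hypothetical tracker so the chosen path is observable in this sketch. */
enum sha256_path { PATH_NONE, PATH_GENERIC, PATH_SHANI };
static enum sha256_path last_path = PATH_NONE;

/* Stand-in for the portable C block function (no hashing is done here). */
static void
sha256_block_generic(uint32_t state[8], const void *in, size_t num)
{
	(void)state;
	(void)in;
	(void)num;
	last_path = PATH_GENERIC;
}

/* Stand-in for a SHA-NI assembly routine; the name is hypothetical. */
static void
sha256_block_shani(uint32_t state[8], const void *in, size_t num)
{
	(void)state;
	(void)in;
	(void)num;
	last_path = PATH_SHANI;
}

/* Wrapper: use the SHA-NI path when the capability bit is set, else fall back. */
static void
sha256_block_data_order(uint32_t state[8], const void *in, size_t num)
{
	if ((crypto_cpu_caps_amd64 & CRYPTO_CPU_CAPS_AMD64_SHA) != 0) {
		sha256_block_shani(state, in, num);
		return;
	}
	sha256_block_generic(state, in, num);
}
```

Callers would only ever invoke sha256_block_data_order(). With the capability bit clear the generic stand-in runs, and with it set the SHA-NI stand-in runs, which is the fallback behaviour the Makefile based SHA*_ASM defines could not provide.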