| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This provides a SHA-1 assembly implementation for amd64, which uses
the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This
provides a 2-2.5x performance gain on some Intel CPUs and many AMD CPUs.
ok tb@
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As already done for SHA-256 and SHA-512, replace the perlasm generated
SHA-1 assembly implementation with one that is actually readable. Call the
assembly implementation from a C wrapper that can, in the future, dispatch
to alternate implementations. On a modern CPU the performance is around
5% faster than the base implementation generated by sha1-x86_64.pl, however
it is around 15% slower than the excessively complex SSSE3/AVX version that
is also generated by the same script (a SHA-NI version will greatly
outperform this and is much cleaner/simpler).
ok tb@
|
|
|
|
|
|
|
|
| |
This provides a SHA-256 assembly implementation for amd64, which uses
the Intel SHA Extensions (aka SHA New Instructions or SHA-NI). This
provides a 3-5x performance gain on some Intel CPUs and many AMD CPUs.
ok tb@
|
|
|
|
|
|
|
|
| |
Replace the perlasm generated SHA-512 assembly with a more readable
version and the same C wrapper introduced for SHA-256. As for SHA-256,
on a modern CPU the performance is largely the same.
ok tb@
|
|
|
|
|
|
|
| |
This also provides a crypto_cpu_caps_amd64 variable that can be checked
for CRYPTO_CPU_CAPS_AMD64_SHA.
ok tb@
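
A minimal sketch of how such a capability check might gate dispatch inside a C wrapper. Only crypto_cpu_caps_amd64 and CRYPTO_CPU_CAPS_AMD64_SHA are named in the message above; the bit value, the stub functions and their return type are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value; the real header may define it differently. */
#define CRYPTO_CPU_CAPS_AMD64_SHA	(1ULL << 0)

/* Stand-in for the variable named in the commit message. */
static uint64_t crypto_cpu_caps_amd64;

/* Hypothetical tags so the sketch can report which path was taken. */
enum sha_impl { SHA_IMPL_GENERIC, SHA_IMPL_SHANI };

/* Hypothetical stub standing in for the generic assembly routine. */
static enum sha_impl
sha256_block_generic(uint32_t *state, const void *in, size_t num)
{
	(void)state; (void)in; (void)num;
	return SHA_IMPL_GENERIC;
}

/* Hypothetical stub standing in for the SHA-NI assembly routine. */
static enum sha_impl
sha256_block_shani(uint32_t *state, const void *in, size_t num)
{
	(void)state; (void)in; (void)num;
	return SHA_IMPL_SHANI;
}

/* C wrapper: pick the SHA-NI path only when the capability bit is set. */
static enum sha_impl
sha256_block_data_order(uint32_t *state, const void *in, size_t num)
{
	if ((crypto_cpu_caps_amd64 & CRYPTO_CPU_CAPS_AMD64_SHA) != 0)
		return sha256_block_shani(state, in, num);
	return sha256_block_generic(state, in, num);
}
```

Keeping the check in the C wrapper means callers never need to know which implementation ran.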
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the perlasm generated SHA-256 assembly implementation with one that
is actually readable. Call the assembly implementation from a C wrapper
that can, in the future, dispatch to alternate implementations. Performance
is similar (or even better) on modern CPUs, while somewhat slower on older
CPUs (this is in part due to the wrapper, the impact of which is more
noticeable with small block sizes).
Thanks to gkoehler@ and tb@ for testing.
ok tb@
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the aarch64 CPU detection code with a version that parses ISAR0,
avoiding signal handling and SIGILL. This gets ISAR0 via sysctl(), but this
can be adapted to other mechanisms for other platforms (or alternatively
the same can be achieved via HWCAP).
This now follows the same naming/design as used by amd64 and i386, hence
define HAVE_CRYPTO_CPU_CAPS_INIT for aarch64.
ok kettenis@ tb@
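
A rough sketch of what parsing ISAR0 might look like once the register value has been obtained (via sysctl() or otherwise). The field positions follow the Arm architecture's ID_AA64ISAR0_EL1 layout; the CAP_* names and the function names are hypothetical and not taken from the library.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch. */
#define CAP_AES		(1u << 0)
#define CAP_PMULL	(1u << 1)
#define CAP_SHA1	(1u << 2)
#define CAP_SHA256	(1u << 3)
#define CAP_SHA512	(1u << 4)
#define CAP_SHA3	(1u << 5)

/* Each ID_AA64ISAR0_EL1 feature field is four bits wide. */
static unsigned int
isar0_field(uint64_t isar0, unsigned int shift)
{
	return (unsigned int)((isar0 >> shift) & 0xf);
}

/* Translate a raw ISAR0 value into capability bits. */
static uint32_t
parse_isar0(uint64_t isar0)
{
	uint32_t caps = 0;
	unsigned int aes = isar0_field(isar0, 4);	/* AES, bits [7:4] */
	unsigned int sha1 = isar0_field(isar0, 8);	/* SHA1, bits [11:8] */
	unsigned int sha2 = isar0_field(isar0, 12);	/* SHA2, bits [15:12] */
	unsigned int sha3 = isar0_field(isar0, 32);	/* SHA3, bits [35:32] */

	if (aes >= 1)
		caps |= CAP_AES;
	if (aes >= 2)
		caps |= CAP_PMULL;	/* value 2 adds PMULL/PMULL2 */
	if (sha1 >= 1)
		caps |= CAP_SHA1;
	if (sha2 >= 1)
		caps |= CAP_SHA256;
	if (sha2 >= 2)
		caps |= CAP_SHA512;	/* value 2 adds SHA-512 */
	if (sha3 >= 1)
		caps |= CAP_SHA3;

	return caps;
}
```

Because the parser is a pure function of the register value, it can be exercised on any platform and no instruction ever needs to be probed at runtime.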
|
|
|
|
|
|
|
|
| |
It is gross that an internal detail leaked into a public header, but,
hey, it's openssl. No hack is too terrible to appear in this library.
opensslconf.h needs major pruning but the day that happens is not today.
ok jsing
|
|
|
|
|
|
|
|
|
|
| |
ppc64-mont.pl (which produces bn_mul_mont_fpu64()) is unused on both
powerpc and powerpc64, so remove it. ppccap.c doesn't actually contain
anything to do with CPU capabilities - it just provides a bn_mul_mont()
that calls bn_mul_mont_int() (which ppc-mont.pl generates). Change
ppc-mont.pl to generate bn_mul_mont() directly and remove ppccap.c.
ok tb@
|
|
|
|
|
|
| |
Move the IA32 specific code to arch/{amd64,i386}/crypto_cpu_caps.c, rather
than polluting cryptlib.c with machine dependent code. A stub version of
crypto_cpu_caps_ia32() still remains for now.
|
|
|
|
|
|
|
| |
This has been unused for a long time - it can be found in the attic if
someone wants to clean it up and enable it in the future.
ok tb@
|
|
|
|
|
|
|
| |
This is the same CPU capabilities code that is now used for amd64. Like
amd64 we now only populate OPENSSL_ia32cap_P with bits used by perlasm.
Discussed with tb@
|
|
|
|
|
|
|
|
|
|
|
| |
This is a CPU capability detection implementation in C, with minimal
inline assembly (for cpuid and xgetbv). This replaces the assembly
mess generated by x86_64cpuid.pl. Rather than populating OPENSSL_ia32cap_P
directly with CPUID output, just set the bits that the remaining
perlasm checks (namely AESNI, AVX, FXSR, INTEL, HT, MMX, PCLMUL, SSE, SSE2
and SSSE3).
ok joshua@ tb@
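
A minimal sketch of this style of detection, assuming a GCC- or clang-compatible compiler on amd64. The CAP_* names and both function names are hypothetical, and only a subset of the listed features is covered. The CPUID leaf 1 bit positions and the XCR0 check follow the vendor manuals.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch; not the library's values. */
#define CAP_AESNI	(1u << 0)
#define CAP_AVX		(1u << 1)
#define CAP_PCLMUL	(1u << 2)
#define CAP_SSSE3	(1u << 3)
#define CAP_SSE2	(1u << 4)

/* CPUID leaf 1 feature bits. */
#define CPUID_1_ECX_PCLMUL	(1u << 1)
#define CPUID_1_ECX_SSSE3	(1u << 9)
#define CPUID_1_ECX_AESNI	(1u << 25)
#define CPUID_1_ECX_OSXSAVE	(1u << 27)
#define CPUID_1_ECX_AVX		(1u << 28)
#define CPUID_1_EDX_SSE2	(1u << 26)

/* XCR0 bits 1 and 2: the OS saves and restores SSE and AVX state. */
#define XCR0_SSE_AVX	0x6u

/* Map raw register values to only the bits that are of interest. */
static uint32_t
caps_from_cpuid(uint32_t ecx, uint32_t edx, uint32_t xcr0)
{
	uint32_t caps = 0;

	if (ecx & CPUID_1_ECX_AESNI)
		caps |= CAP_AESNI;
	if (ecx & CPUID_1_ECX_PCLMUL)
		caps |= CAP_PCLMUL;
	if (ecx & CPUID_1_ECX_SSSE3)
		caps |= CAP_SSSE3;
	if (edx & CPUID_1_EDX_SSE2)
		caps |= CAP_SSE2;
	/* AVX needs CPU support, OSXSAVE, and OS-enabled register state. */
	if ((ecx & CPUID_1_ECX_AVX) && (ecx & CPUID_1_ECX_OSXSAVE) &&
	    (xcr0 & XCR0_SSE_AVX) == XCR0_SSE_AVX)
		caps |= CAP_AVX;

	return caps;
}

/* Query the CPU; returns 0 on anything that is not amd64. */
static uint32_t
detect_caps(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	uint32_t eax, ebx, ecx, edx, xcr0 = 0, xcr0_hi = 0;

	__asm__ volatile ("cpuid"
	    : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
	    : "a"(1), "c"(0));
	(void)eax;
	(void)ebx;

	/* xgetbv is only valid once the OS has set OSXSAVE. */
	if (ecx & CPUID_1_ECX_OSXSAVE)
		__asm__ volatile ("xgetbv"
		    : "=a"(xcr0), "=d"(xcr0_hi) : "c"(0));
	(void)xcr0_hi;

	return caps_from_cpuid(ecx, edx, xcr0);
#else
	return 0;
#endif
}
```

Separating the pure mapping from the two inline assembly statements keeps the machine-dependent part to a few lines.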
|
|
|
|
|
|
|
|
|
| |
This allows us in particular to get rid of the MD Symbols.list which
were needed on amd64 and i386 for llvm 16 a while back. OPENSSL_ia32cap_P
was never properly exported since the symbols were marked .hidden in the
asm.
ok beck jsing
|
| |
|
|
|
|
|
|
|
|
| |
Provide a per architecture crypto_arch.h - this will be used in a similar
manner to bn_arch.h and will allow for architecture specific #defines and
static inline functions. Move the HAVE_AES_* and HAVE_RC4_* defines here.
ok tb@
|
|
|
|
|
| |
ssh tools. The dynamic objects are entirely ret-clean, static binaries
will contain a blend of cleaning and non-cleaning callers.
|
|
|
|
|
|
|
| |
Always provide AES_{encrypt,decrypt}() via C functions, which then either
use a C implementation or call the assembly implementation.
ok tb@
|
|
|
|
| |
These files are now built on all platforms.
|
|
|
|
|
|
|
| |
This is a legacy algorithm and the assembly is only marginally faster than
the C code.
Discussed with beck@ and tb@
|
|
|
|
| |
This is now built on all platforms.
|
|
|
|
|
|
|
|
| |
Always include aes_core.c and provide AES_set_{encrypt,decrypt}_key() via C
functions, which then either use a C implementation or call the assembly
implementation.
ok tb@
|
|
|
|
| |
This is now built on all platforms.
|
|
|
|
|
|
|
| |
This is a legacy algorithm and the assembly is only marginally faster than
the C code.
Discussed with beck@ and tb@
|
| |
|
|
|
|
|
|
|
|
| |
Rename the assembly generated functions from AES_cbc_encrypt() to
aes_cbc_encrypt_internal(). Always include aes_cbc.c and change it
to use defines that are similar to those used in BN.
ok tb@
|
| |
|
|
|
|
| |
This is now built on all platforms.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than having public API switch between C and assembly, always
use C functions as entry points, which then call an assembly
implementation (if available). This makes it significantly easier
to deal with symbol aliasing/namespaces and it also means we
benefit from vulnerability prevention provided by the C compiler.
Rename the assembly generated functions from RC4() to rc4_internal()
and RC4_set_key() to rc4_set_key_internal(). Always include rc4.c
and change it to use defines that are similar to those used in BN.
ok beck@ joshua@ tb@
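
The entry point pattern described above could be sketched roughly as follows. The HAVE_RC4_INTERNAL define, the key structure and the *_sketch names are assumptions for illustration; the C fallback is a plain textbook RC4 rather than the library's code.

```c
#include <stddef.h>

/* Hypothetical key structure for this sketch. */
struct rc4_sketch_key {
	unsigned char x, y;
	unsigned char data[256];
};

/*
 * Assumed define: a platform with an assembly routine would set this and
 * supply rc4_internal() from its assembly file.
 */
#ifdef HAVE_RC4_INTERNAL
void rc4_internal(struct rc4_sketch_key *key, size_t len,
    const unsigned char *in, unsigned char *out);
#else
/* C fallback used when no assembly implementation is available. */
static void
rc4_internal(struct rc4_sketch_key *key, size_t len,
    const unsigned char *in, unsigned char *out)
{
	unsigned char x = key->x, y = key->y, tmp;
	size_t i;

	for (i = 0; i < len; i++) {
		x = (unsigned char)(x + 1);
		y = (unsigned char)(y + key->data[x]);
		tmp = key->data[x];
		key->data[x] = key->data[y];
		key->data[y] = tmp;
		out[i] = in[i] ^
		    key->data[(unsigned char)(key->data[x] + key->data[y])];
	}
	key->x = x;
	key->y = y;
}
#endif

/* Key schedule, kept in C here for brevity. */
static void
rc4_set_key_sketch(struct rc4_sketch_key *key, size_t len,
    const unsigned char *data)
{
	unsigned char j = 0, tmp;
	unsigned int i;

	for (i = 0; i < 256; i++)
		key->data[i] = (unsigned char)i;
	for (i = 0; i < 256; i++) {
		j = (unsigned char)(j + key->data[i] + data[i % len]);
		tmp = key->data[i];
		key->data[i] = key->data[j];
		key->data[j] = tmp;
	}
	key->x = key->y = 0;
}

/* The public entry point is always a C function. */
void
rc4_sketch(struct rc4_sketch_key *key, size_t len, const unsigned char *in,
    unsigned char *out)
{
	rc4_internal(key, len, in, out);
}
```

With this shape the public symbol always comes from C, so aliasing and namespace handling only need to deal with one definition per function.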
|
|
|
|
|
| |
Now that all platforms use a C des implementation, move it to the primary
Makefile.
|
|
|
|
|
|
| |
This one was hiding behind an m4 script.
Build tested by tb@
|
|
|
|
|
|
|
|
| |
This is the only architecture that has an assembly implementation for these
algorithms. There is little to gain from accelerating legacy algorithms on
a legacy architecture.
Discussed with beck@ and tb@
|
|
|
|
| |
This is already disabled since it is "about 35% slower than C code".
|
|
|
|
| |
Discussed with tb@
|
|
|
|
|
| |
The stitched modes have been removed, so having assembly for them is of
little use.
|
|
|
|
|
| |
Now that all architectures are using bf_enc.c, it does not make sense to
have it in every Makefile.inc file.
|
|
|
|
|
|
|
|
| |
This is the only architecture that has an assembly implementation. There is
little to gain from accelerating a legacy algorithm on a legacy
architecture.
ok beck@ tb@
|
|
|
|
| |
ok tb@
|
| |
|
|
|
|
|
|
|
|
| |
OPENSSL_cpuid_setup() is invoked via OPENSSL_init_crypto(), which is
triggered by various entry points to the library. As such, we do not need
to invoke it as a constructor.
ok tb@
|
|
|
|
|
|
|
| |
This is currently no different from the existing behaviour and just pulls
in the C code that would have previously been built. However, it means that
OPENSSL_NO_ASM is no longer being defined by the main libcrypto Makefile,
which in turn will allow us to implement assembly optimisations.
|
|
|
|
|
|
|
| |
GF2m support will be removed shortly. In the interim drop some of this
unused code already and let it fall back to the C implementation.
ok jsing
|
|
|
|
|
|
| |
-mmark-bti-property to indicate those now have BTI support.
ok jsing@, deraadt@
|
|
|
|
|
|
|
|
| |
Now that bn_sub() handles word arrays with potentially different lengths,
we no longer need bn_sub_part_words() - call bn_sub() instead. This allows
us to entirely remove the unnecessarily complex bn_sub_part_words() code.
ok tb@
|
|
|
|
|
|
|
|
|
| |
The BN_num_bits_word() function is a hot path, being called more than
80 million times during a libcrypto regress run. The word_clz()
implementation uses five instructions to do the same as the generic code
that uses more than 60 instructions.
Discussed with tb@
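
To illustrate the idea, a portable sketch that derives the bit length of a word from a count of leading zeros. It uses the GCC/clang builtin __builtin_clzll rather than the architecture-specific instructions the commit refers to, and the *_sketch names are hypothetical.

```c
#include <stdint.h>

/* Count leading zero bits in a 64-bit word; defined as 64 for zero. */
static inline int
word_clz_sketch(uint64_t w)
{
	/* The builtin is undefined for 0, so handle that case explicitly. */
	if (w == 0)
		return 64;
	return __builtin_clzll((unsigned long long)w);
}

/* Number of significant bits in w (0 for w == 0). */
static inline int
num_bits_word_sketch(uint64_t w)
{
	return 64 - word_clz_sketch(w);
}
```

On most targets the builtin compiles down to a single count-leading-zeros style instruction, which is where the saving over a generic bit-scanning loop comes from.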
|
|
|
|
|
|
| |
This rather misnamed file (bn_asm.c) previously contained the C code that
was needed to build libcrypto bignum on platforms that did not have
assembly implementations of the functions it contained.
|
|
|
|
|
|
|
|
| |
The sparc platform got retired a while back; however, some parts remained
hiding in libcrypto. Mop these up (along with the bn_arch.h that I
introduced).
Spotted by and ok tb@
|
|
|
|
|
|
|
| |
This switches the core bignum assembly implementations from x86_64-gcc.c to
s2n-bignum for amd64.
ok miod@ tb@
|