| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
bn_sqr_words() does not actually compute the square of the words, it only
computes the square of each individual word - rename it to reflect reality.
Discussed with tb@
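
The distinction drawn above can be illustrated with a minimal sketch. The function name `sqr_each_word` is hypothetical, and 64-bit words plus the GCC/Clang `unsigned __int128` extension are assumptions; this is not the libcrypto implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration: square each word of a[] independently and
 * store the double-word result in r[2*i] (low) and r[2*i + 1] (high).
 * No cross products a[i] * a[j] are computed, so r[] is not the square
 * of the multi-word value held in a[].
 */
void
sqr_each_word(uint64_t *r, const uint64_t *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned __int128 sq = (unsigned __int128)a[i] * a[i];

		r[2 * i] = (uint64_t)sq;
		r[2 * i + 1] = (uint64_t)(sq >> 64);
	}
}
```

A full multi-word square would also need the doubled cross products added in, which is why a name implying "square of the words" is misleading for this operation.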
|
| | |
|
| |
|
|
|
| |
This uses s2n-bignum's bignum_mul() and provides significant performance
gains for a range of multiplication sizes.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most bn_.*_words() functions operate on two word arrays, however
bn_mul_words() and bn_mul_add_words() operate on one word array and
multiply by a single word. Rename these to bn_mulw_words() and
bn_mulw_add_words() to reflect this, following the naming scheme that we use
for primitives.
This frees up bn_mul_words() to actually be used for multiplying two word
arrays. Rename bn_mul_normal() to bn_mul_words(), which will then become
one of the possible assembly integration points.
ok tb@
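
The two shapes of operation described above can be sketched as follows. The names `mulw_words_sketch` and `mul_words_sketch` are hypothetical, and 64-bit words plus the GCC/Clang `unsigned __int128` extension are assumptions; the real functions may differ in signature and word size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: multiply the word array a[0..n-1] by the single
 * word w, store the low n words in r[] and return the carry out.
 */
uint64_t
mulw_words_sketch(uint64_t *r, const uint64_t *a, size_t n, uint64_t w)
{
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned __int128 t = (unsigned __int128)a[i] * w + carry;

		r[i] = (uint64_t)t;
		carry = (uint64_t)(t >> 64);
	}
	return carry;
}

/*
 * Hypothetical sketch: schoolbook multiplication of two word arrays.
 * r[] must have room for a_len + b_len words and must not alias a[] or b[].
 */
void
mul_words_sketch(uint64_t *r, const uint64_t *a, size_t a_len,
    const uint64_t *b, size_t b_len)
{
	size_t i, j;

	for (i = 0; i < a_len + b_len; i++)
		r[i] = 0;

	for (j = 0; j < b_len; j++) {
		uint64_t carry = 0;

		for (i = 0; i < a_len; i++) {
			/* a[i] * b[j] + r[i + j] + carry always fits in 128 bits. */
			unsigned __int128 t = (unsigned __int128)a[i] * b[j] +
			    r[i + j] + carry;

			r[i + j] = (uint64_t)t;
			carry = (uint64_t)(t >> 64);
		}
		r[j + a_len] = carry;
	}
}
```

The first function takes one array and one word, the second takes two arrays, which is the difference the rename is meant to make visible.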
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Rework some of the squaring code so that it calls bn_sqr_words() and use
this as the integration point for assembly. Convert bn_sqr_normal() to
bn_sqr_words(), which is then used on architectures that do not provide
their own version.
This means that we resume using the assembly version of bn_sqr_words() on
i386, mips64 and powerpc, which can provide considerable performance gains.
ok tb@
|
| |
|
|
|
|
|
|
| |
If ADX instructions are available, use the non-_alt version of s2n-bignum's
bignum_{mul,sqr}_{4_8,6_12,8_16}(), which are faster than the _alt
non-ADX versions.
ok tb@
|
| |
|
|
|
|
|
| |
These use s2n-bignum's bignum_mul_6_12_alt() and bignum_sqr_6_12_alt()
functions.
ok tb@
|
| |
|
|
|
|
| |
These use s2n-bignum's bignum_modadd() and bignum_modsub() routines.
ok tb@
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
These provide modular addition and subtraction.
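
As a rough illustration of modular addition on word arrays, here is a minimal sketch. The name `modadd_words_sketch` is hypothetical, the inputs are assumed to be already reduced (a < m and b < m), and 64-bit words with the GCC/Clang `unsigned __int128` extension are assumed. Unlike the imported assembly, this sketch branches on the data and makes no attempt to be constant time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: r = (a + b) mod m for n-word inputs with
 * a < m and b < m. r[] may alias a[] or b[].
 */
void
modadd_words_sketch(uint64_t *r, const uint64_t *a, const uint64_t *b,
    const uint64_t *m, size_t n)
{
	uint64_t carry = 0, borrow = 0;
	size_t i;

	/* r = a + b, keeping the carry out of the top word. */
	for (i = 0; i < n; i++) {
		unsigned __int128 s = (unsigned __int128)a[i] + b[i] + carry;

		r[i] = (uint64_t)s;
		carry = (uint64_t)(s >> 64);
	}

	/* Trial subtraction: borrow ends up set if r < m. */
	for (i = 0; i < n; i++) {
		unsigned __int128 d = (unsigned __int128)r[i] - m[i] - borrow;

		borrow = (uint64_t)(d >> 64) & 1;
	}

	/* Reduce once if the sum overflowed n words or is >= m. */
	if (carry || !borrow) {
		borrow = 0;
		for (i = 0; i < n; i++) {
			unsigned __int128 d = (unsigned __int128)r[i] - m[i] -
			    borrow;

			r[i] = (uint64_t)d;
			borrow = (uint64_t)(d >> 64) & 1;
		}
	}
}
```

Modular subtraction follows the same pattern in reverse: subtract, then add m back if the subtraction borrowed.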
|
| |
|
|
|
|
|
| |
These provide fast multiplication and squaring of inputs with 4 words
or 8 words, producing an 8 or 16 word result. These versions require the
CPU to support ADX instructions, while the _alt versions that have
previously been imported do not.
|
| |
|
|
|
|
| |
These provide fast multiplication and squaring of inputs with 6 words,
producing a 12 word result. The non-_alt versions require the CPU to
support ADX instructions, while the _alt versions do not.
|
| |
|
|
|
| |
Now that s2n-bignum has marked various inputs as const, we can do the same.
In most cases we were casting away const, which we no longer need to do.
|
| | |
|
| |
|
|
| |
This amounts to whitespace changes and label renaming.
|
| |
|
|
|
|
|
| |
This makes it consistent with bn_sqr_comba{4,8}() and simplifies an
upcoming change.
ok tb@
|
| |
|
|
| |
bn_subw() will be used more widely in an upcoming change.
|
| |
|
|
|
|
|
|
|
| |
cet.h is needed for other platforms to emit the relevant .note.gnu.property
sections that are necessary for them to enable IBT. It also avoids issues
with older toolchains on macOS that explode on encountering endbr64.
based on a diff by kettenis
ok beck kettenis
|
| |
|
|
|
|
| |
This does not currently cause an issue, however if these are called
differently from their current usage, an input can be overwritten and
incorrect results generated.
|
| | |
|
| | |
|
| |
|
|
|
| |
This provides a 1.5-2x performance gain for BN multiplication, with a
similar improvement being seen for RSA operations.
|
| |
|
|
|
|
|
|
|
| |
Rework bn_sqr()/bn_sqr_normal() so that it is less convoluted and more
readable. Instead of recomputing values that the caller has already
computed, pass it as an argument. Avoid branching and remove duplication
of variables. Consistently use a_len and r_len naming for lengths.
ok tb@
|
| | |
|
| | |
|
| |
|
|
|
| |
This provides significant performance gains for bn_sqr_comba4() and
bn_sqr_comba8().
|
| |
|
|
| |
This provides a performance gain across most BN operations.
|
| |
|
|
|
| |
This results in bn_mul_comba4() and bn_mul_comba8() requiring ~30% fewer
instructions than they did previously.
|
| | |
|
| |
|
|
| |
ok jsing, and kind of tb for an earlier version
|
| |
|
|
| |
No functional change.
|
| |
|
|
|
|
| |
The macOS aarch64 assembly dialect treats ; as a comment instead of a newline
ok tb@, jsing@
|
| |
|
|
|
|
|
|
| |
Rather than working on BIGNUMs, change bn_add()/bn_sub() to operate on word
arrays that potentially differ in length. This matches the behaviour of
s2n-bignum's bignum_add() and bignum_sub().
ok tb@
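
A minimal sketch of addition over word arrays of differing length is shown below. The name `add_words_var_sketch` and its signature are hypothetical and deliberately simpler than either bn_add() or s2n-bignum's bignum_add(); 64-bit words and the GCC/Clang `unsigned __int128` extension are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: r = a + b where a[] has a_len words and b[] has
 * b_len words. r[] must have room for max(a_len, b_len) words. Words
 * beyond the end of the shorter input are treated as zero. Returns the
 * carry out of the top word.
 */
uint64_t
add_words_var_sketch(uint64_t *r, const uint64_t *a, size_t a_len,
    const uint64_t *b, size_t b_len)
{
	size_t n = a_len > b_len ? a_len : b_len;
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t ai = i < a_len ? a[i] : 0;
		uint64_t bi = i < b_len ? b[i] : 0;
		unsigned __int128 s = (unsigned __int128)ai + bi + carry;

		r[i] = (uint64_t)s;
		carry = (uint64_t)(s >> 64);
	}
	return carry;
}
```

Because the length handling lives inside the routine, the caller does not need a separate loop to propagate the carry through the remaining words of the longer input.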
|
| | |
|
| |
|
|
|
|
|
|
|
| |
The BN_num_bits_word() function is a hot path, being called more than
80 million times during a libcrypto regress run. The word_clz()
implementation uses five instructions to do the same as the generic code
that uses more than 60 instructions.
Discussed with tb@
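
The idea can be sketched as follows: the number of significant bits in a word is the word size minus the count of leading zeros. Both function names are hypothetical, a 64-bit word is assumed, and `__builtin_clzll()` is a GCC/Clang builtin standing in for whatever word_clz() actually uses. The loop version is only a stand-in for "generic code", not the code that was replaced.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: bit length of w using a count-leading-zeros
 * builtin. __builtin_clzll() is undefined for 0, so handle it first.
 */
int
num_bits_word_clz(uint64_t w)
{
	if (w == 0)
		return 0;
	return 64 - __builtin_clzll(w);
}

/* Portable reference version: shift until nothing is left. */
int
num_bits_word_generic(uint64_t w)
{
	int bits = 0;

	while (w != 0) {
		bits++;
		w >>= 1;
	}
	return bits;
}
```

On CPUs with a count-leading-zeros instruction the first version typically compiles to a handful of instructions, which matters for a function called this often.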
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
| |
This keeps the naming consistent with the other bignum primitives that have
been recently introduced. Also, use 1/0 instead of h/l (e.g. a1 instead of
ah), as this keeps consistency with other primitives and allows for naming
that works with double word, triple word and quadruple word inputs/outputs.
Discussed with tb@
|
| |
|
|
|
|
|
| |
s2n-bignum's bignum_sqr() is not the same as bn_sqr_words() (which only
computes a partial result, unlike the former). This went unnoticed since
bn_sqr() is called directly on amd64, hence bn_sqr_words() is currently
unused.
|
| |
|
|
|
|
|
|
| |
When bn_umul_hilo() is implemented using an instruction pair, mark the
first output with a constraint that prevents the output from overlapping
with the inputs ("&"). Otherwise the first instruction can overwrite the
inputs, which then results in the second instruction producing an incorrect
value.
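
The constraint issue can be sketched with GCC-style extended inline assembly. The function name `umul_hilo_sketch` is hypothetical, and aarch64's umulh/mul pair is used purely as an assumed example of a two-instruction implementation; other architectures fall back to `unsigned __int128` so that the sketch still builds.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: compute the double-word product a * b as
 * (*h):(*l).
 */
void
umul_hilo_sketch(uint64_t a, uint64_t b, uint64_t *h, uint64_t *l)
{
#if defined(__aarch64__)
	uint64_t hi, lo;

	/*
	 * umulh writes %0 before mul has read %2 and %3. The "&" in "=&r"
	 * marks %0 as early clobber, so the compiler will not allocate it
	 * to the same register as either input. Without it, mul could
	 * read an input that umulh has already overwritten.
	 */
	__asm__ ("umulh %0, %2, %3\n\t"
	    "mul %1, %2, %3"
	    : "=&r"(hi), "=r"(lo)
	    : "r"(a), "r"(b));
	*h = hi;
	*l = lo;
#else
	/* Portable fallback using a compiler-provided 128-bit type. */
	unsigned __int128 p = (unsigned __int128)a * b;

	*h = (uint64_t)(p >> 64);
	*l = (uint64_t)p;
#endif
}
```

The second output does not need the modifier, since it is written by the last instruction, after all inputs have been consumed.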
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike bn_add_words()/bn_sub_words(), the s2n-bignum bignum_add() and
bignum_sub() functions correctly handle inputs with differing word
lengths. This means that they can be called directly, without needing to
fix up any remaining words manually.
Split BN_uadd() in two - the default bn_add() implementation calls
bn_add_words(), before handling the carry for any remaining words.
Likewise split BN_usub() in two - the default bn_sub() implementation
calls bn_sub_words(), before handling the borrow for any remaining words.
On amd64, provide an implementation of bn_add() that calls s2n-bignum's
bignum_add() directly, similarly with an implementation of bn_sub() that
calls s2n-bignum's bignum_sub() directly.
ok tb@
|
| |
|
|
|
|
| |
These should work, but are currently untested and disabled.
ok tb@
|
| |
|
|
| |
ok tb@
|
| |
|
|
|
|
|
|
| |
The sparc platform got retired a while back, however some parts remained
hiding in libcrypto. Mop these up (along with the bn_arch.h that I
introduced).
Spotted by and ok tb@
|
| |
|
|
|
|
|
| |
This switches the core bignum assembly implementations from x86_64-gcc.c to
s2n-bignum for amd64.
ok miod@ tb@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a function that divides a double word (h:l) by d, returning the
quotient q and the remainder r, such that q * d + r is equal to the
numerator. Call this from the three places that currently implement this
themselves.
This is implemented with some slight indirection, which allows for per
architecture implementations, replacing the define/macro tangle, which
messes with variables that are not passed to it.
Also remove a duplicate of bn_div_words() for the BN_ULLONG && BN_DIV2W
case - this is already handled.
ok tb@
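
The contract described above can be sketched in a few lines. The name `div_rem_words_sketch` is hypothetical, 64-bit words and the GCC/Clang `unsigned __int128` extension are assumptions, and the precondition h < d (so that the quotient fits in one word) is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: divide the double word h:l by d, producing the
 * quotient *q and remainder *r such that q * d + r == h:l.
 * Requires d != 0 and h < d so that the quotient fits in a single word.
 */
void
div_rem_words_sketch(uint64_t h, uint64_t l, uint64_t d, uint64_t *q,
    uint64_t *r)
{
	unsigned __int128 n = ((unsigned __int128)h << 64) | l;

	assert(d != 0 && h < d);

	*q = (uint64_t)(n / d);
	*r = (uint64_t)(n % d);
}
```

Returning both results through explicit pointer arguments is what distinguishes this shape from a macro that reads and writes variables in the caller's scope.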
|