| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
The BN_DIV2W define provides a code path for double word division via the C
compiler, which is only enabled on hppa. Simplify the code and mop this up.
ok tb@
|
|
|
|
| |
This is now only on amd64.
|
|
|
|
|
|
|
| |
bn_sqr_words() does not actually compute the square of the words, it only
computes the square of each individual word - rename it to reflect reality.
Discussed with tb@
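To make the distinction concrete, here is a minimal sketch of the per-word behaviour, assuming 32-bit words and a hypothetical name (sqr_each_word() is not a LibreSSL function). Each input word is squared on its own and stored as a low/high word pair, so the cross terms of a real bignum square are never computed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration, not LibreSSL code: square each 32-bit word of
 * a[] independently. The 64-bit square of a[i] is stored as a (low, high)
 * pair in r[2 * i] and r[2 * i + 1], so r must hold 2 * n words.
 */
static void
sqr_each_word(uint32_t *r, const uint32_t *a, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t sq = (uint64_t)a[i] * a[i];

		r[2 * i] = (uint32_t)sq;
		r[2 * i + 1] = (uint32_t)(sq >> 32);
	}
}
```

For a = {1, 1} (the value 2^32 + 1) this yields {1, 0, 1, 0}, whereas the bignum square of that value is {1, 2, 1, 0}; the missing 2 is the cross term.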
|
|
|
|
|
|
|
|
| |
The old assembly bn_sqr_words() does not actually square words in the
bignum sense. These will have to be renamed (once I come up with a name
for whatever it actually does) before we can roll forward again.
Found the hard way by Janne Johansson.
|
| |
|
|
|
|
|
|
| |
Use bn_mul_words() and bn_montgomery_reduce_words(), rather than using
bn_montgomery_multiply_words(). This provides better performance on
architectures that have assembly optimised bn_mul_words(), such as amd64.
|
| |
|
|
|
|
|
|
|
|
| |
Use bn_sqr_words() and bn_montgomery_reduce_words(), rather than using
bn_montgomery_multiply_words(). This provides better performance on
architectures that have assembly optimised bn_sqr_words(), such as amd64.
ok tb@
|
|
|
|
|
| |
This uses s2n-bignum's bignum_mul() and provides significant performance
gains for a range of multiplication sizes.
|
| |
|
|
|
|
| |
This was missed in the previous commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most bn_.*_words() functions operate on two word arrays, however
bn_mul_words() and bn_mul_add_words() operate on one word array and
multiply by a single word. Rename these to bn_mulw_words() and
bn_mulw_add_words() to reflect this, following the naming scheme that we use
for primitives.
This frees up bn_mul_words() to actually be used for multiplying two word
arrays. Rename bn_mul_normal() to bn_mul_words(), which will then become
one of the possible assembly integration points.
ok tb@
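The difference between the two shapes can be sketched as follows, assuming 32-bit words. All names and signatures here are hypothetical stand-ins chosen to mirror the naming scheme described above, not the LibreSSL prototypes: the mulw variants take one word array and a single word, while mul_words() takes two word arrays and is built from the single-word primitive.

```c
#include <stddef.h>
#include <stdint.h>

/* r[0..n) = a[0..n) * w, returning the carry word (hypothetical sketch). */
static uint32_t
mulw_words(uint32_t *r, const uint32_t *a, size_t n, uint32_t w)
{
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t t = (uint64_t)a[i] * w + carry;

		r[i] = (uint32_t)t;
		carry = t >> 32;
	}
	return (uint32_t)carry;
}

/* r[0..n) += a[0..n) * w, returning the carry word (hypothetical sketch). */
static uint32_t
mulw_add_words(uint32_t *r, const uint32_t *a, size_t n, uint32_t w)
{
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
		uint64_t t = (uint64_t)a[i] * w + r[i] + carry;

		r[i] = (uint32_t)t;
		carry = t >> 32;
	}
	return (uint32_t)carry;
}

/*
 * Schoolbook product of two word arrays; r must hold a_len + b_len words
 * and must not overlap a or b (hypothetical sketch).
 */
static void
mul_words(uint32_t *r, const uint32_t *a, size_t a_len, const uint32_t *b,
    size_t b_len)
{
	size_t i, j;

	for (i = 0; i < a_len + b_len; i++)
		r[i] = 0;
	/* Accumulate one row per word of b; the row's carry lands above it. */
	for (j = 0; j < b_len; j++)
		r[j + a_len] = mulw_add_words(&r[j], a, a_len, b[j]);
}
```

An assembly implementation could replace mul_words() as a whole, which is the kind of integration point the rename makes room for.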
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework some of the squaring code so that it calls bn_sqr_words() and use
this as the integration point for assembly. Convert bn_sqr_normal() to
bn_sqr_words(), which is then used on architectures that do not provide
their own version.
This means that we resume using the assembly version of bn_sqr_words() on
i386, mips64 and powerpc, which can provide considerable performance gains.
ok tb@
|
|
|
|
|
|
|
|
| |
If ADX instructions are available, use the non-_alt version of s2n-bignum's
bignum_{mul,sqr}_{4_8,6_12,8_16}(), which are faster than the _alt
non-ADX versions.
ok tb@
|
|
|
|
|
|
|
| |
These use s2n-bignum's bignum_mul_6_12_alt() and bignum_sqr_6_12_alt()
functions.
ok tb@
|
|
|
|
|
|
| |
These use s2n-bignum's bignum_modadd() and bignum_modsub() routines.
ok tb@
|
|
|
|
|
|
|
| |
In these cases make use of bn_mul_comba6() or bn_sqr_comba6(), which are
faster than the normal path.
ok tb@
|
| |
|
| |
|
| |
|
|
|
|
| |
These provide modular addition and subtraction.
|
|
|
|
|
|
|
| |
These provide fast multiplication and squaring of inputs with 4 words
or 8 words, producing an 8 or 16 word result. These versions require the
CPU to support ADX instructions, while the _alt versions that have
previously been imported do not.
|
|
|
|
|
|
| |
These provide fast multiplication and squaring of inputs with 6 words,
producing a 12 word result. The non-_alt versions require the CPU to
support ADX instructions, while the _alt versions do not.
|
| |
|
|
|
|
|
| |
Now that s2n-bignum has marked various inputs as const, we can do the same.
In most cases we were casting away const, which we no longer need to do.
|
|
|
|
|
| |
This effectively brings in new function prototypes, a chunk of const
additions and some new defines.
|
| |
|
|
|
|
| |
This amounts to whitespace changes and label renaming.
|
|
|
|
|
|
|
|
| |
Use bn_{mul,sqr}_comba{4,6,8}() and bn_montgomery_reduce_words() for
specific input sizes. This is significantly faster than using
bn_montgomery_multiply_words().
ok tb@
|
|
|
|
|
|
| |
This allows for fast squaring of a 6 word array.
ok tb@
|
|
|
|
|
|
| |
This allows for fast multiplication of two 6 word arrays.
ok tb@
|
|
|
|
|
|
|
| |
This makes it consistent with bn_sqr_comba{4,8}() and simplifies an
upcoming change.
ok tb@
|
|
|
|
|
|
|
|
| |
ri is an int, so the check relied on signed overflow (UB). It's not really
reachable, but shrug.
reported by smatch via jsg
ok beck jsing kenjiro
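As a generic illustration of the problem class (the exact expression that was changed is not shown here, so this is an assumption about its shape): a test such as "ri * 2 < ri" on an int only fires after the multiplication has already overflowed, which is undefined behaviour. A well-defined check compares against the limit before doing the arithmetic. The helper name below is hypothetical.

```c
#include <limits.h>

/*
 * Hypothetical helper: returns 1 if ri is non-negative and 2 * ri is
 * representable as an int, 0 otherwise. No overflowing arithmetic is
 * performed, so the check itself is well defined.
 */
static int
double_fits_int(int ri)
{
	return ri >= 0 && ri <= INT_MAX / 2;
}
```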
|
|
|
|
|
| |
Reported by smatch via jsg.
ok beck jsing kenjiro
|
|
|
|
|
| |
For now this still calls bn_montgomery_multiply_words(), however it can
be optimised further in the future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The OPENSSL_IA32_SSE2 flag controls whether a number of the perlasm
scripts generate additional implementations that use SSE2 functionality.
In all cases except ghash, the code checks OPENSSL_ia32cap_P for SSE2
support, before trying to run SSE2 code. For ghash it generates a CLMUL
based implementation in addition to two different MMX versions (one MMX
version hides behind OPENSSL_IA32_SSE2, the other does not), however this
does not appear to actually use SSE2. We also disable AES-NI on i386 if
OPENSSL_IA32_SSE2.
On OpenBSD, we've always defined OPENSSL_IA32_SSE2 so this is effectively
a no-op. The only change is that we now check MMX rather than SSE2 for the
ghash MMX implementation.
ok bcook@ beck@
|
|
|
|
| |
via/ok jsg
|
|
|
|
|
|
|
|
|
|
| |
Provide EC_FIELD_ELEMENT and EC_FIELD_MODULUS, which allow for operations
on fixed width fields in constant time. These can in turn be used to
implement Elliptic Curve cryptography for prime fields, without needing
to use BN. This will improve the code, reduce timing leaks and enable
further optimisation.
ok beck@ tb@
|
|
|
|
|
|
|
| |
These implement constant time modular addition, subtraction and
multiplication in the Montgomery domain.
ok tb@
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move bn_add_words() and bn_sub_words() from bn_add.c to bn_add_sub.c.
These have effectively been replaced in the previous rewrites. Remove
the asserts - if bad lengths are passed the results will be incorrect
and things will fail (these should use size_t instead of int, but that
is a problem for another day).
Provide bn_sub_words_borrow(), which computes a subtraction but only
returns the resulting borrow. Provide bn_add_words_masked() and
bn_sub_words_masked(), which perform a masked addition or subtraction.
These can also be used to implement constant time addition and subtraction,
especially for reduction.
ok beck@ tb@
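A rough sketch of how a masked addition can support branch-free reduction, assuming 32-bit words. The function names and argument order are hypothetical and are not the LibreSSL prototypes. The idea is to subtract the modulus unconditionally, turn the resulting borrow into an all-zeros or all-ones mask, and add the modulus back under that mask.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a - b over n words, returning the final borrow (0 or 1). */
static uint32_t
sub_words(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
	uint32_t borrow = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t t = (uint64_t)a[i] - b[i] - borrow;

		r[i] = (uint32_t)t;
		borrow = (uint32_t)(t >> 63);	/* set if the word underflowed */
	}
	return borrow;
}

/* r = a + (b & mask) over n words; mask must be 0 or all ones. */
static uint32_t
add_words_masked(uint32_t *r, const uint32_t *a, const uint32_t *b,
    uint32_t mask, size_t n)
{
	uint32_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t t = (uint64_t)a[i] + (b[i] & mask) + carry;

		r[i] = (uint32_t)t;
		carry = (uint32_t)(t >> 32);
	}
	return carry;
}

/* For a < 2m: r = a mod m, with no branch on the comparison result. */
static void
reduce_once(uint32_t *r, const uint32_t *a, const uint32_t *m, size_t n)
{
	uint32_t borrow = sub_words(r, a, m, n);
	uint32_t mask = 0 - borrow;	/* all ones if a < m, else zero */

	(void)add_words_masked(r, r, m, mask, n);
}
```

The C source contains no data-dependent branch, but whether the generated machine code is constant time still depends on the compiler and target.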
|
|
|
|
|
|
|
|
|
| |
In the diff_len < 0 case, it incorrectly uses 0 - b[0], which mishandles
the borrow - fix this by using bn_subw_subw(). Do the same in the
diff_len > 0 case for consistency. Note that this is never currently
reached since BN_usub() requires a >= b.
ok beck@ tb@
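The borrow issue can be illustrated with a simplified model, assuming 32-bit words. The helper below is modelled on the description of bn_subw_subw() but is a hypothetical sketch restricted to an incoming borrow of 0 or 1, and sub_tail() is likewise invented for illustration. When the subtrahend has extra high words, each result word must be 0 - b[i] - borrow, with the borrow carried on to the next word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: *out_r = a - b - c and *out_borrow = 1 on underflow,
 * where c is an incoming borrow of 0 or 1 (hypothetical sketch).
 */
static void
subw_subw(uint32_t a, uint32_t b, uint32_t c, uint32_t *out_borrow,
    uint32_t *out_r)
{
	uint64_t t = (uint64_t)a - b - c;

	*out_r = (uint32_t)t;
	*out_borrow = (uint32_t)(t >> 63);
}

/*
 * Handle the words of b that extend past the end of a: compute
 * r[i] = 0 - b[i] - borrow, propagating the borrow through each word.
 * Returns the final borrow.
 */
static uint32_t
sub_tail(uint32_t *r, const uint32_t *b, size_t n, uint32_t borrow)
{
	size_t i;

	for (i = 0; i < n; i++)
		subw_subw(0, b[i], borrow, &borrow, &r[i]);
	return borrow;
}
```

With an incoming borrow of 1 and b = {0, 0}, the correct tail is {0xffffffff, 0xffffffff} with a borrow out of 1; computing 0 - b[i] alone would give {0, 0} and silently drop the borrow.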
|
|
|
|
| |
ok jsing
|
|
|
|
| |
ok jsing
|
|
|
|
|
|
|
|
| |
This simplifies the handling of the BN_MONT_CTX passed in and unifies the
exit paths. Also zap some particularly insightful comments by our favorite
captain.
ok jsing
|
|
|
|
| |
ok jsing
|
|
|
|
| |
ok jsing
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does what the public BN_MONT_CTX_new() should have done in the first
place rather than doing the toolkit thing of returning an invalid object
that you need to figure out how to populate and with what because the docs
are abysmal.
It takes the required arguments and calls BN_MONT_CTX_set(), which all
callers do immediately after _new() (except for DSA which managed to
squeeze 170 lines of garbage between the two calls).
ok jsing
|
|
|
|
|
| |
(leaving out a dotasm comment that would become harder to read than it
already is)
|
| |
|