| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
This provides significant performance gains for bn_sqr_comba4() and
bn_sqr_comba8().
|
|
|
|
|
|
|
|
|
|
| |
Factor out and optimise the inner loop for Montgomery multiplication,
making use of bn_qwmulw_addqw_addw() to perform Montgomery multiplication
by one word in larger steps. This provides a significant performance gain,
especially on platforms where bn_qwmulw_addqw_addw() is (or can be)
optimised.
ok tb@
|
|
|
|
|
|
|
|
|
|
|
| |
All the functions changed in this commit would silently misbehave if the
return value aliases the modulus, most of the time they would succeed and
return an incorrect result of 0 in that situation. This adjusts all the
functions in BN_mod.c, others and documentation will follow later.
Prompted by a bug report about BN_mod_inverse() by Guido Vranken.
ok jsing
|
|
|
|
|
|
|
|
|
|
| |
One problem with OpenSSL error codes is that they tend to be too specific
(another problem is that they are extremely ugly). So add an EINVAL-style
error code. This will be used in an upcoming commit to disallow aliasing
of the 'return value' with the modulus in BN_mod_* functions and should
be applicable elsewhere, outside of this one narrow use case.
ok jsing
|
|
|
|
| |
This provides a performance gain across most BN operations.
|
|
|
|
|
|
|
|
| |
This includes bn_qwaddqw(), bn_qwsubqw(), bn_qwmulw_addw() and
bn_qwmulw_addqw_addw(). These can typically be optimised on architectures
that have a reasonable number of general purpose registers.
ok tb@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This traded local copies of CTASSERT() to the one in crypto_internal.h.
This change was backed out due to SHA-512 breakage on STRICT_ALIGNMENT
architectures still using Fred Flintstone's gcc without asm sha512.
Original commit message:
Use crypto_internal.h's CTASSERT()
Now that this macro is available in a header, let's use that version
rather than copies in several .c files.
discussed with jsing
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The somewhat strange calculation m = a^{-1} (mod m) can return 0. This
breaks because of BN_nnmod() having delicate semantics of which variable
can be reused. BN_nnmod(a, a, m, ctx) works and the library relies on that.
Here, the code ends up doing BN_nnmod(m, a, m, ctx) and this doesn't work.
If the result of the initial BN_mod() is negative, then BN_nnmod() will
return 0.
Problem reported by Guido Vranken in
https://github.com/openssl/openssl/issues/21110
This code is well covered by regress, but it does not currently have
explicit test coverage. Such will be added soon.
ok beck jsing
|
|
|
|
|
| |
This results in bn_mul_comba4() and bn_mul_comba8() requiring ~30% less
instructions than they did previously.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This gives us more readable and safer code. There are two intentional
changes to behaviour - firstly, all three functions zero any BN that was
passed in, prior to doing any further processing. This means that a passed
BN is always in a known state, regardless of what happens later. Secondly,
BN_asc2bn() now fails on NULL input, rather than crashing. This brings its
behaviour inline with BN_dec2bn() and BN_hex2bn().
ok tb@
|
| |
|
|
|
|
|
|
|
| |
Now that this macro is available in a header, let's use that version
rather than copies in several .c files.
discussed with jsing
|
|
|
|
|
|
|
| |
This is more accurate and improves readability a bit. Apart from a comment
tweak this is sed + knfmt (which resulted in four wrapped lines).
Discussed with beck and jsing
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The behavior of the BPSW primality test for numbers > 2^64 is not very
well understood. While there is no known composite that passes the test,
there are heuristics that indicate that there are likely infinitely many.
Therefore it seems appropriate to harden the test. Having a settable
number of MR rounds before doing a version of BPSW is also the approach
taken by Go's primality check in math/big.
This adds a new implementation of the old MR test that runs before running
the strong Lucas test. I like to imagine that it's slightly cleaner code.
We're effectively at about twice the cost of what we had a year ago. In
addition, it adds some non-determinism in case there actually are false
positives for the BPSW test.
The implementation is straightforward. It could easily be tweaked to use
the additional gcds in the "enhanced" MR test of FIPS 186-5, but as long
as we are only going to throw away the additional info, that's not worth
much.
This is a first step towards incorporating some of the considerations in
"A performant misuse-resistant API for Primality Testing" by Massimo and
Paterson. Further work will happen in tree. In particular, there are plans
to crank the number of Miller-Rabin tests considerably so as to have a
guaranteed baseline. The manual will be updated shortly.
positive feedback beck
ok jsing
|
|
|
|
|
|
|
|
|
|
| |
Anything taken to the power of 0 is 1, and then reduced mod 1 or mod -1 it
will be 0. If "anything" includes 0 or not is a matter of convention, but
it should not depend on the sign of the modulus...
Reported by Guido Vranken
ok jsing (who had the same diff)
|
|
|
|
| |
ok tb@
|
|
|
|
| |
ok tb@
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This code is full of problematic C and is also otherwise of questionable
quality. It is far from constant time and jsing informs me it also isn't
faster. Good riddance.
|
| |
|
| |
|
|
|
|
| |
ok jsing
|
|
|
|
| |
ok jsing
|
|
|
|
| |
ok jsing
|
|
|
|
| |
ok jsing, and kind of tb an earlier version
|
|
|
|
|
|
|
|
|
| |
Pull a number of invariants into variables, which avoids repeated loading
from memory on architectures where sufficient registers are available.
Also keep track of the per-iteration carry in a variable, rather than
unnecessarily reading from and writing to memory.
This gives a reasonable performance gain on some architectures (e.g. armv7)
|
|
|
|
| |
ok tb@
|
| |
|
|
|
|
| |
No functional change.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes a bunch of incomplete and scary code, which potentially leaks
secrets and is not constant time. A performance gain is achieved on arm64
for sizes that we care about, while a minimal decrease in performance is
noted for larger sizes on some other platforms.
While we will potentially reimplement Karatsuba (or Toom-Cook) at a later
date, it will be easier and safer to do it from a clean slate.
ok tb@
|
|
|
|
| |
No functional change.
|
| |
|
|
|
|
|
|
|
| |
This supports a mostly forgotten, seemingly unused and long retired
standard. No need for this in our public API Dyson sphere.
ok jsing
|
|
|
|
|
|
|
| |
This is unused outside of the library and could do with some reworking.
That's easier without having to care about outside consumers.
ok jsing
|
|
|
|
|
|
|
|
| |
With the corresponding structs now being opaque, the only thing they are
good for outside the library are memory leaks. They will be removed
completely or become internal only.
ok jsing
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The faster nist code is rife with problematic C. While this is generally
considered to be a pleonasm nowadays, here it specifically refers to
aliasing issues and other flavors of undefined behavior. With compilers
and standardization committees becoming seemingly more determined about
making C even more unusable than it already is, this code has resulted
in miscompilations and generally is a target rich environment for fuzzers
to feast on. We're better off without it. Go look while it's still there.
It's some of the very worst we have to offer.
ok jsing
|
| |
|
|
|
|
|
|
|
|
| |
This file primarily contains the various BN_bn2*() and BN_*2bn() functions
(along with BN_print() and BN_options()). More function shuffling will
follow.
Discussed with tb@
|
|
|
|
|
|
|
|
| |
This is simpler than the current code, while still being well optimised by
compilers, across a range of architectures. In many cases we even get a
performance gain for the BN sizes that we primarily care about.
Joint work with tb@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a reimplementation from scratch of the Tonelli-Shanks algorithm
based on Henri Cohen "A Course in Computational Algebraic Number Theory",
Springer GTM 138, section 1.5.1. It is API compatible with the previous
implementation, so no documentation change is required.
Contrary to the old implementation, this does not have any infinite loops
and has various additional sanity checks to prevent misbehavior in case
the input modulus is not a prime. It contains extensive comments and the
individual parts of the algorithm are split into digestible chunks instead
of having one huge function.
One difference of note is that it BN_mod_sqrt() now always returns the
smaller of the two possible answers. In other words, while its core is
non-deterministic, its answer is not.
ok jsing
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This function is spread out over way too many lines and has too much
repetition. Once this is made a little more compact, it becomes clearer
that this is a somewhat obfuscated version of binary gcd (it is not
constant time therefore cryptographically unsound. It is not used
internally). This will likely go away later.
ok jsing
|