path: root/src/lib/libcrypto/bn/bn_internal.h
Commit message | Author | Age | Files | Lines
* Provide bn_mod_sqr_words() and call it from ec_field_element_sqr(). | jsing | 2025-08-02 | 1 | -1/+3
  For now this still calls bn_montgomery_multiply_words(), however it can be optimised further in the future.
* Implement EC field element operations. | jsing | 2025-05-25 | 1 | -1/+3
  Provide EC_FIELD_ELEMENT and EC_FIELD_MODULUS, which allow for operations on fixed width fields in constant time. These can in turn be used to implement Elliptic Curve cryptography for prime fields, without needing to use BN. This will improve the code, reduce timing leaks and enable further optimisation.
  ok beck@ tb@
* Provide bn_mod_{add,sub,mul}_words(). | jsing | 2025-05-25 | 1 | -1/+11
  These implement constant time modular addition, subtraction and multiplication in the Montgomery domain.
  ok tb@
* Fix previous. | jsing | 2025-05-25 | 1 | -3/+3
* Provide additional variants of bn_add_words()/bn_sub_words(). | jsing | 2025-05-25 | 1 | -1/+11
  Move bn_add_words() and bn_sub_words() from bn_add.c to bn_add_sub.c. These have effectively been replaced in the previous rewrites. Remove the asserts - if bad lengths are passed the results will be incorrect and things will fail (these should use size_t instead of int, but that is a problem for another day).

  Provide bn_sub_words_borrow(), which computes a subtraction but only returns the resulting borrow. Provide bn_add_words_masked() and bn_sub_words_masked(), which perform a masked addition or subtraction. These can also be used to implement constant time addition and subtraction, especially for reduction.
  ok beck@ tb@
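To illustrate how a borrow plus a masked addition can give a constant time reduction, here is a minimal sketch of modular subtraction. It is not the LibreSSL code: the `sketch_` names, the `uint64_t` word type and the `size_t` lengths are all assumptions made for this example, and inputs are assumed to be already reduced (a, b < m).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a - b over n words, returns the final borrow (0 or 1). */
static uint64_t
sketch_sub_words(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
	uint64_t borrow = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t ai = a[i], bi = b[i];
		uint64_t d = ai - bi;
		/* Borrow out of ai - bi, computed without branching. */
		uint64_t b1 = ((~ai & bi) | ((~ai | bi) & d)) >> 63;
		uint64_t e = d - borrow;
		/* Borrow out of d - borrow (only when d == 0 and borrow == 1). */
		uint64_t b2 = (~d & e) >> 63;

		r[i] = e;
		borrow = b1 | b2;
	}
	return borrow;
}

/* Hypothetical helper: r = a + (b & mask) over n words, returns the final carry. */
static uint64_t
sketch_add_words_masked(uint64_t *r, const uint64_t *a, const uint64_t *b,
    uint64_t mask, size_t n)
{
	uint64_t carry = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t ai = a[i], bi = b[i] & mask;
		uint64_t s = ai + bi;
		uint64_t c1 = ((ai & bi) | ((ai | bi) & ~s)) >> 63;
		uint64_t t = s + carry;
		uint64_t c2 = (s & ~t) >> 63;

		r[i] = t;
		carry = c1 | c2;
	}
	return carry;
}

/* r = (a - b) mod m, assuming a < m and b < m. */
static void
sketch_mod_sub_words(uint64_t *r, const uint64_t *a, const uint64_t *b,
    const uint64_t *m, size_t n)
{
	/* The borrow becomes an all-ones or all-zeros mask selecting "+ m". */
	uint64_t mask = 0 - sketch_sub_words(r, a, b, n);

	(void)sketch_add_words_masked(r, r, m, mask, n);
}
```

The same instructions run whether or not the subtraction underflowed; only the mask value differs, which is the property the commit message refers to.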
* Provide additional BN primitives for BN_ULLONG architectures. | jsing | 2023-06-25 | 1 | -21/+79
  On BN_ULLONG architectures, the C compiler can usually do a decent job of optimising primitives, however it struggles to see through primitive calls due to type narrowing. As such, providing explicit versions of compound primitives can result in the production of more optimal code. For example, on arm the bn_mulw_addw_addw() primitive can be replaced with a single umaal instruction, which provides significant performance gains.

  Rather than intermingling #ifdef/#else throughout the header, the BN_ULLONG defines are pulled up above the normal functions. This also allows complex compound primitives to be reused. The conditionals have also been changed from BN_LLONG to BN_ULLONG, since that is what really matters.
  ok tb@
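As a rough illustration of what a compound primitive looks like when a double-width type is available, the sketch below assumes a 32-bit word with `uint64_t` as the double word. The function name and signature are hypothetical and are not copied from bn_internal.h.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: compute r1:r0 = a * b + c + d in one expression.
 * The result always fits in a double word, since
 * (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1.
 */
static inline void
sketch_mulw_addw_addw(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
    uint32_t *out_r1, uint32_t *out_r0)
{
	uint64_t r = (uint64_t)a * b + c + d;

	*out_r1 = (uint32_t)(r >> 32);
	*out_r0 = (uint32_t)r;
}
```

Written as one expression rather than as a chain of calls to narrower primitives, this gives the compiler a better chance of mapping the whole operation onto a single multiply-accumulate instruction where the architecture has one.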
* Provide and use bn_clzw() in place of bn_word_clz(). | jsing | 2023-06-21 | 1 | -1/+9
  On some architectures, we can provide an optimised (often single instruction) count-leading-zero implementation. In order to do this effectively, provide bn_clzw() as a static inline that can be replaced by an architecture specific version. The default implementation defers to the bn_word_clz() function (which may also be architecture specific).
  ok tb@
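The layering described above might look roughly like the following. This is a sketch under stated assumptions: the `sketch_` names, the 64-bit word and the `SKETCH_HAVE_CLZW` override macro are invented for illustration, and the portable fallback is one possible branch-free formulation rather than the actual bn_word_clz().

```c
#include <stdint.h>

/* Hypothetical portable fallback: branch-free count of leading zeros. */
static int
sketch_word_clz(uint64_t w)
{
	uint64_t bits = 0;
	unsigned int shift;

	/* Binary search for the top set bit, selecting with masks. */
	for (shift = 32; shift > 0; shift >>= 1) {
		uint64_t hi = w >> shift;
		/* mask is all ones if any bit at or above 'shift' is set. */
		uint64_t mask = 0 - ((hi | (0 - hi)) >> 63);

		bits += shift & mask;
		w = (hi & mask) | (w & ~mask);
	}
	/* w is now 0 or 1; bits + w is the bit length of the input. */
	return (int)(64 - (bits + w));
}

/* An architecture header could define SKETCH_HAVE_CLZW and supply its own. */
#ifndef SKETCH_HAVE_CLZW
static inline int
sketch_clzw(uint64_t w)
{
	return sketch_word_clz(w);
}
#endif
```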
* Make BN_num_bits() independent of bn->top. | jsing | 2023-06-21 | 1 | -1/+5
  Provide bn_bitsize(), which performs a constant time scan of a BN in order to determine the bit size of the BN value. Use this for BN_num_bits() such that it is no longer dependent on the bn->top value.
  ok tb@
* Provide and use various quad word primitives. | jsing | 2023-06-12 | 1 | -1/+109
  This includes bn_qwaddqw(), bn_qwsubqw(), bn_qwmulw_addw() and bn_qwmulw_addqw_addw(). These can typically be optimised on architectures that have a reasonable number of general purpose registers.
  ok tb@
* Slightly rework bn_mulw_addtw(). | jsing | 2023-03-07 | 1 | -5/+3
  Call bn_mulw_addw() rather than doing bn_mulw() followed by bn_addw(). This simplifies the code slightly, plus on some platforms bn_mulw_addw() can be optimised (and bn_mulw_addtw() will then benefit from such an optimisation).
  ok tb@
* Fix comment for bn_mul2_mulw_addtw() | jsing | 2023-03-07 | 1 | -5/+5
* Reimplement bn_sqr_comba{4,8}(). | jsing | 2023-02-17 | 1 | -1/+29
  Use bignum primitives rather than the current mess of macros. The sqr_add_c macro gets replaced with bn_mulw_addtw(), while the sqr_add_c2 macro gets replaced with bn_mul2_mulw_addtw(). The variables in the comba functions have also been reordered, so that the patterns are easier to understand - the compiler can take care of optimising the inputs and outputs to avoid register moves.
  ok tb@
* Use bn_addw() in bn_mulw(), rather than duplicating add with carry code. | jsing | 2023-02-16 | 1 | -12/+7
* Rename bn_umul_hilo() to bn_mulw(). | jsing | 2023-02-16 | 1 | -55/+59
  This keeps the naming consistent with the other bignum primitives that have been recently introduced. Also, use 1/0 instead of h/l (e.g. a1 instead of ah), as this keeps consistency with other primitives and allows for naming that works with double word, triple word and quadruple word inputs/outputs.
  Discussed with tb@
* Add missing masks to accumulator version of bn_umul_hilo() | jsing | 2023-02-16 | 1 | -1/+5
* Reimplement bn_add_words() and bn_sub_words() using bignum primitives. | jsing | 2023-02-16 | 1 | -1/+58
  This removes the effectively duplicate BN_LLONG version of bn_add_words() and simplifies the code considerably.
  ok tb@
* zap tab | tb | 2023-02-15 | 1 | -2/+2
* Provide big number primitives for word addition/multiplication. | jsing | 2023-02-14 | 1 | -1/+114
  These use a consistent naming scheme and are implemented using bitwise/constant time style operations, which should generally be safe on all platforms (until a compiler decides to optimise and use branches). More optimised versions can be provided for a given architecture.
  ok tb@
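For readers unfamiliar with the bitwise style mentioned above, here is a minimal sketch of word addition and subtraction that derive the carry and borrow from the operands with logic operations rather than comparisons. The names, the `uint64_t` word type and the output-pointer convention are assumptions for this example, not the header's actual definitions.

```c
#include <stdint.h>

/* Hypothetical sketch: r1:r0 = a + b, where r1 is the carry (0 or 1). */
static inline void
sketch_addw(uint64_t a, uint64_t b, uint64_t *out_r1, uint64_t *out_r0)
{
	uint64_t r0 = a + b;

	/* Carry out of the top bit, derived from the top bits of a, b and r0. */
	*out_r1 = ((a & b) | ((a | b) & ~r0)) >> 63;
	*out_r0 = r0;
}

/* Hypothetical sketch: r0 = a - b, with the borrow (0 or 1) in out_borrow. */
static inline void
sketch_subw(uint64_t a, uint64_t b, uint64_t *out_borrow, uint64_t *out_r0)
{
	uint64_t r0 = a - b;

	*out_borrow = ((~a & b) | ((~a | b) & r0)) >> 63;
	*out_r0 = r0;
}
```

Nothing in the C source branches, though as the commit message notes, a compiler remains free to emit branches when it optimises.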
* Provide bn_ct_{eq,ne}_zero{,_mask}() inline functions. | jsing | 2023-02-14 | 1 | -1/+33
  These will be used to test a BN_ULONG in cases where constant time style behaviour is required.
  ok tb@
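A branch-free zero test of this kind can be sketched as follows. The function names mirror the commit subject but carry a `sketch_` prefix to mark them as hypothetical, and a 64-bit word is assumed in place of BN_ULONG.

```c
#include <stdint.h>

/* Returns 1 if w is non-zero, 0 otherwise, without branching. */
static inline uint64_t
sketch_ct_ne_zero(uint64_t w)
{
	/* (w | -w) has its top bit set exactly when w != 0. */
	return (w | (0 - w)) >> 63;
}

/* Returns all ones if w is non-zero, all zeros otherwise. */
static inline uint64_t
sketch_ct_ne_zero_mask(uint64_t w)
{
	return 0 - sketch_ct_ne_zero(w);
}

/* Returns 1 if w is zero, 0 otherwise. */
static inline uint64_t
sketch_ct_eq_zero(uint64_t w)
{
	return 1 ^ sketch_ct_ne_zero(w);
}

/* Returns all ones if w is zero, all zeros otherwise. */
static inline uint64_t
sketch_ct_eq_zero_mask(uint64_t w)
{
	return 0 - sketch_ct_eq_zero(w);
}
```

The mask variants are convenient for selection: `(x & mask) | (y & ~mask)` picks x or y without a conditional.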
* Provide bn_umul_hilo(). | jsing | 2023-01-31 | 1 | -0/+159
  The bignum code needs to be able to multiply two words, producing a double word result. Some architectures do not have native support for this, hence a pure C version is required. bn_umul_hilo() provides this functionality.

  There are currently two implementations, both of which are branch free. The first uses bitwise operations for the carry, while the second uses accumulators. The accumulator version uses fewer instructions, however requires more variables/registers and seems to be slower, at least on amd64/i386. The accumulator version may be faster on architectures that have more registers available. Further testing can be performed and one of the two implementations can be removed at a later date.
  ok tb@
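The idea of a pure C word multiply can be sketched as below. This is an illustrative version in the spirit of the accumulator approach, assuming a 64-bit word split into 32-bit halves; the name and signature are hypothetical and it should not be read as either of the two implementations in the commit.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: multiply two 64-bit words and return the 128-bit
 * product as a high and a low word, using only half-word products and
 * no branches.
 */
static void
sketch_umul_hilo(uint64_t a, uint64_t b, uint64_t *out_h, uint64_t *out_l)
{
	uint64_t a0 = a & 0xffffffffULL, a1 = a >> 32;
	uint64_t b0 = b & 0xffffffffULL, b1 = b >> 32;

	/* Four 32x32 -> 64 bit partial products. */
	uint64_t p00 = a0 * b0;
	uint64_t p01 = a0 * b1;
	uint64_t p10 = a1 * b0;
	uint64_t p11 = a1 * b1;

	/*
	 * Accumulate everything that lands in bits 32..63 of the result.
	 * Three values below 2^32 are summed, so this cannot overflow.
	 */
	uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffULL) +
	    (p10 & 0xffffffffULL);

	*out_l = (mid << 32) | (p00 & 0xffffffffULL);
	*out_h = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```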