Diffstat (limited to 'src/lib/libcrypto/rc4/asm/rc4-x86_64.pl')
-rwxr-xr-x | src/lib/libcrypto/rc4/asm/rc4-x86_64.pl | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/src/lib/libcrypto/rc4/asm/rc4-x86_64.pl b/src/lib/libcrypto/rc4/asm/rc4-x86_64.pl
index 2135b38ef8..18a967e546 100755
--- a/src/lib/libcrypto/rc4/asm/rc4-x86_64.pl
+++ b/src/lib/libcrypto/rc4/asm/rc4-x86_64.pl
@@ -50,7 +50,7 @@ | |||
50 | # As was shown by Zou Nanhai loop unrolling can improve Intel EM64T | 50 | # As was shown by Zou Nanhai loop unrolling can improve Intel EM64T |
51 | # performance by >30% [unlike P4 32-bit case that is]. But this is | 51 | # performance by >30% [unlike P4 32-bit case that is]. But this is |
52 | # provided that loads are reordered even more aggressively! Both code | 52 | # provided that loads are reordered even more aggressively! Both code |
53 | # pathes, AMD64 and EM64T, reorder loads in essentially same manner | 53 | # paths, AMD64 and EM64T, reorder loads in essentially same manner |
54 | # as my IA-64 implementation. On Opteron this resulted in modest 5% | 54 | # as my IA-64 implementation. On Opteron this resulted in modest 5% |
55 | # improvement [I had to test it], while final Intel P4 performance | 55 | # improvement [I had to test it], while final Intel P4 performance |
56 | # achieves respectful 432MBps on 2.8GHz processor now. For reference. | 56 | # achieves respectful 432MBps on 2.8GHz processor now. For reference. |
@@ -81,7 +81,7 @@ | |||
81 | # The only code path that was not modified is P4-specific one. Non-P4 | 81 | # The only code path that was not modified is P4-specific one. Non-P4 |
82 | # Intel code path optimization is heavily based on submission by Maxim | 82 | # Intel code path optimization is heavily based on submission by Maxim |
83 | # Perminov, Maxim Locktyukhin and Jim Guilford of Intel. I've used | 83 | # Perminov, Maxim Locktyukhin and Jim Guilford of Intel. I've used |
84 | # some of the ideas even in attempt to optmize the original RC4_INT | 84 | # some of the ideas even in attempt to optimize the original RC4_INT |
85 | # code path... Current performance in cycles per processed byte (less | 85 | # code path... Current performance in cycles per processed byte (less |
86 | # is better) and improvement coefficients relative to previous | 86 | # is better) and improvement coefficients relative to previous |
87 | # version of this module are: | 87 | # version of this module are: |