diff options
author | Andy Polyakov <appro@openssl.org> | 2015-03-03 22:05:25 +0100 |
---|---|---|
committer | Andy Polyakov <appro@openssl.org> | 2015-04-02 09:47:56 +0200 |
commit | 94376cccb4ed5b376220bffe0739140ea9dad8c8 (patch) | |
tree | 162487337d9149d1c34e585f5bfb5583f0561f64 /crypto/sha | |
parent | 7b644df899d0c818488686affc0bfe2dfdd0d0c2 (diff) |
aes/asm/aesv8-armx.pl: optimize for Cortex-A5x.
ARM has optimized Cortex-A5x pipeline to favour pairs of complementary
AES instructions. While modified code improves performance of post-r0p0
Cortex-A53 performance by >40% (for CBC decrypt and CTR), it hurts
original r0p0. We favour later revisions, because one can't prevent
future from coming. Improvement on post-r0p0 Cortex-A57 exceeds 50%,
while new code is not slower on r0p0, or Apple A7 for that matter.
[Update even SHA results for latest Cortex-A53.]
Reviewed-by: Richard Levitte <levitte@openssl.org>
Diffstat (limited to 'crypto/sha')
-rw-r--r-- | crypto/sha/asm/sha1-armv8.pl | 2 | ||||
-rw-r--r-- | crypto/sha/asm/sha512-armv8.pl | 4 |
2 files changed, 3 insertions, 3 deletions
diff --git a/crypto/sha/asm/sha1-armv8.pl b/crypto/sha/asm/sha1-armv8.pl index 6be8624342..bdf48b8ec4 100644 --- a/crypto/sha/asm/sha1-armv8.pl +++ b/crypto/sha/asm/sha1-armv8.pl @@ -14,7 +14,7 @@ # # hardware-assisted software(*) # Apple A7 2.31 4.13 (+14%) -# Cortex-A53 2.19 8.73 (+108%) +# Cortex-A53 2.24 8.03 (+97%) # Cortex-A57 2.35 7.88 (+74%) # # (*) Software results are presented mostly for reference purposes. diff --git a/crypto/sha/asm/sha512-armv8.pl b/crypto/sha/asm/sha512-armv8.pl index 45eb719fe5..717473b86e 100644 --- a/crypto/sha/asm/sha512-armv8.pl +++ b/crypto/sha/asm/sha512-armv8.pl @@ -14,7 +14,7 @@ # # SHA256-hw SHA256(*) SHA512 # Apple A7 1.97 10.5 (+33%) 6.73 (-1%(**)) -# Cortex-A53 2.38 15.6 (+110%) 10.1 (+190%(***)) +# Cortex-A53 2.38 15.5 (+115%) 10.0 (+150%(***)) # Cortex-A57 2.31 11.6 (+86%) 7.51 (+260%(***)) # # (*) Software SHA256 results are of lesser relevance, presented @@ -25,7 +25,7 @@ # (***) Super-impressive coefficients over gcc-generated code are # indication of some compiler "pathology", most notably code # generated with -mgeneral-regs-only is significanty faster -# and lags behind assembly only by 50-90%. +# and the gap is only 40-90%. $flavour=shift; $output=shift; |