author Andy Polyakov <appro@openssl.org> 2017-07-20 09:48:35 +0200
committer Andy Polyakov <appro@openssl.org> 2017-07-21 14:07:32 +0200
commit 64d92d74985ebb3d0be58a9718f9e080a14a8e7f
tree 036456c8d139587371300824d273d1c500411d1a /crypto/poly1305/asm
parent bbb4ceb86eb6ea0300f744443c36fb6e980fff9d
x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because it's rather a "salvage operation" for now. The idea is to identify processor capability flags that drive Knights Landing to suboptimal code paths and mask them. Two flags were identified, XSAVE and ADCX/ADOX. The former affects the choice of the AES-NI code path specific to Silvermont (Knights Landing is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are effectively mishandled at decode time. In both cases we are looking at ~2x improvement.

AVX-512 results cover even Skylake-X :-)

Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!

Reviewed-by: Rich Salz <rsalz@openssl.org>
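The flag masking described in the message can be sketched roughly as follows. This is an illustration only, not the code from this commit: the `cpu_caps` struct, the `mask_for_knights_landing` function and the `is_knl` argument are hypothetical names, and how Knights Landing is actually detected is left to the caller. The bit positions are the architectural CPUID ones (leaf 1 ECX bit 26 for XSAVE, leaf 7 EBX bit 19 for ADX, which covers ADCX/ADOX).

```c
#include <stdint.h>

/* Architectural CPUID feature bits:
 * leaf 1, ECX bit 26            = XSAVE
 * leaf 7 (subleaf 0), EBX bit 19 = ADX (ADCX/ADOX) */
#define CPUID1_ECX_XSAVE (UINT32_C(1) << 26)
#define CPUID7_EBX_ADX   (UINT32_C(1) << 19)

/* Hypothetical container for the two capability words of interest. */
struct cpu_caps {
    uint32_t leaf1_ecx;  /* CPUID.1:ECX */
    uint32_t leaf7_ebx;  /* CPUID.(EAX=7,ECX=0):EBX */
};

/* Return a copy of caps with XSAVE and ADX cleared when the caller
 * reports a Knights Landing part (is_knl != 0). Code that later picks
 * an implementation from these words then avoids the paths keyed on
 * those two flags. Detection of the processor is an assumption left
 * outside this sketch. */
static struct cpu_caps mask_for_knights_landing(struct cpu_caps caps,
                                                int is_knl)
{
    if (is_knl) {
        caps.leaf1_ecx &= ~CPUID1_ECX_XSAVE;
        caps.leaf7_ebx &= ~CPUID7_EBX_ADX;
    }
    return caps;
}
```

Masking the capability words once, before any dispatch decision is made, keeps the individual assembly modules unaware of the special case; they simply never see the two flags set.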
Diffstat (limited to 'crypto/poly1305/asm')
-rwxr-xr-x crypto/poly1305/asm/poly1305-x86_64.pl | 7
1 files changed, 5 insertions, 2 deletions
diff --git a/crypto/poly1305/asm/poly1305-x86_64.pl b/crypto/poly1305/asm/poly1305-x86_64.pl
index 1dce5d61e3..1faa6ebf46 100755
--- a/crypto/poly1305/asm/poly1305-x86_64.pl
+++ b/crypto/poly1305/asm/poly1305-x86_64.pl
@@ -27,14 +27,15 @@
# Numbers are cycles per processed byte with poly1305_blocks alone,
# measured with rdtsc at fixed clock frequency.
#
-#               IALU/gcc-4.8(*)  AVX(**)   AVX2
+#               IALU/gcc-4.8(*)  AVX(**)   AVX2    AVX-512
# P4             4.46/+120%       -
# Core 2         2.41/+90%        -
# Westmere       1.88/+120%       -
# Sandy Bridge   1.39/+140%       1.10
# Haswell        1.14/+175%       1.11      0.65
-# Skylake       1.13/+120%       0.96      0.51
+# Skylake[-X]   1.13/+120%       0.96      0.51    [0.35]
# Silvermont     2.83/+95%        -
+# Knights L     3.60/-           1.65      1.10    (***)
# Goldmont       1.70/+180%       -
# VIA Nano       1.82/+150%       -
# Sledgehammer   1.38/+160%       -
@@ -49,6 +50,8 @@
# Core processors, 50-30%, less newer processor is, but slower on
# contemporary ones, for example almost 2x slower on Atom, and as
# former are naturally disappearing, SSE2 is deemed unnecessary;
+# (***) Current AVX-512 code requires BW and VL extensions and can not
+# execute on Knights Landing;
$flavour = shift;
$output = shift;