author: Sebastian Pop <spop@amazon.com> 2022-03-28 20:58:15 +0000
committer: Tomas Mraz <tomas@openssl.org> 2022-10-18 14:22:12 +0200
commit: 69c7154545606c3c54650b70360175e9a0fdda33
tree: 42fac0cb57ff3b1d39518560dc854d3d2ec58752 /crypto
parent: 679ea6a1d4f031ee8281aea08356ea48cf5d7bb1
disable 5x interleave on buffers shorter than 512 bytes: 3% speedup on Graviton2
Commit d6e4287c9726691e800bff221be71edd894a3c6a introduced 5x interleaving as
an optimization for ThunderX2, and that leads to some performance degradation
when encoding short buffers.

We found this performance degradation by measuring the performance of nginx
on Ubuntu 20.04, which comes with OpenSSL 1.1.1f, and on Ubuntu 22.04 with
OpenSSL 3.0.1.

This patch limits the 5x interleave to buffers of 512 bytes or larger.

On Graviton2 we see the following performance with this patch:

    $ openssl speed -evp aes-128-gcm -bytes 128

AES-128-GCM       64 bytes     79 bytes     80 bytes    128 bytes    256 bytes    511 bytes    512 bytes   1024 bytes
master         1062564.71k   775113.11k  1069959.33k  1411716.28k  1653114.86k  1585981.16k  1973683.03k  2203214.08k
master+patch   1062729.28k   771915.11k  1103883.42k  1458665.43k  1708701.20k  1647060.84k  1975571.80k  2204038.42k
diff                    0%           0%           3%           3%           3%           4%           0%           0%
revert d6e428  1055290.03k   773448.92k  1117411.97k  1441478.57k  1695698.52k  1634598.04k  1981851.65k  2196680.36k

CLA: trivial

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17984)

(cherry picked from commit 9c140a33663f319ad4000a6a985c3e14297c7389)
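The "diff" row appears to be the relative change of master+patch over master, rounded to a whole percent. A minimal sketch of that arithmetic follows; the helper name `percent_diff` is hypothetical and not part of OpenSSL, and the rounding rule is an assumption inferred from the numbers above.

```c
/* Hypothetical helper, not part of OpenSSL: relative change of `patched`
 * versus `baseline` in percent, rounded to the nearest whole number.
 * The rounding rule is an assumption inferred from the "diff" row. */
static long percent_diff(double baseline, double patched)
{
    double pct = (patched - baseline) / baseline * 100.0;

    /* Round half away from zero without pulling in <math.h>. */
    return (long)(pct >= 0.0 ? pct + 0.5 : pct - 0.5);
}
```

For example, the 80-byte column gives `percent_diff(1069959.33, 1103883.42)`, about 3.2% before rounding, and the 511-byte column gives about 3.9%, which rounds to 4.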
Diffstat (limited to 'crypto')
-rwxr-xr-x crypto/aes/asm/aesv8-armx.pl | 2
1 file changed, 1 insertion, 1 deletion
diff --git a/crypto/aes/asm/aesv8-armx.pl b/crypto/aes/asm/aesv8-armx.pl
index 06e43855da..6a7bf05d1b 100755
--- a/crypto/aes/asm/aesv8-armx.pl
+++ b/crypto/aes/asm/aesv8-armx.pl
@@ -1825,7 +1825,7 @@ $code.=<<___ if ($flavour !~ /64/);
 	vorr	$dat2,$ivec,$ivec
 ___
 $code.=<<___ if ($flavour =~ /64/);
-	cmp	$len,#2
+	cmp	$len,#32
 	b.lo	.Loop3x_ctr32
 	add	w13,$ctr,#1
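
The changed comparison can be read as a path selection on the block count. The C sketch below models only this one branch, assuming `$len` holds the number of 16-byte AES blocks at this point (which matches the 512-byte figure in the commit message, since 32 * 16 = 512). All names in it are hypothetical; the real routine is AArch64 assembly generated by the Perl script, and it has other branches not shown here.

```c
#include <stddef.h>

#define AES_BLOCK_BYTES        16u  /* AES block size in bytes */
#define INTERLEAVE5_MIN_BLOCKS 32u  /* threshold after the patch: cmp $len,#32 */

/* Hypothetical labels for the two code paths. */
enum ctr_path { CTR_PATH_3X, CTR_PATH_5X };

/* Mirrors `cmp $len,#32` followed by `b.lo .Loop3x_ctr32`: an unsigned
 * "lower than" test sends short inputs to the 3x loop. Assumes `blocks`
 * is a count of 16-byte blocks. */
static enum ctr_path choose_ctr_path(size_t blocks)
{
    return blocks < INTERLEAVE5_MIN_BLOCKS ? CTR_PATH_3X : CTR_PATH_5X;
}

/* Convenience wrapper: whole blocks contained in a buffer of `bytes` bytes. */
static enum ctr_path choose_ctr_path_for_bytes(size_t bytes)
{
    return choose_ctr_path(bytes / AES_BLOCK_BYTES);
}
```

Under these assumptions a 511-byte buffer (31 whole blocks) takes the 3x path and a 512-byte buffer takes the 5x path, which is consistent with the benchmark table showing gains only below 512 bytes. Before the patch the threshold was 2 blocks, so nearly every multi-block buffer took the 5x path.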