path: root/INSTALL.md
author: Daniel Hu <Daniel.Hu@arm.com> 2022-07-19 18:43:28 +0100
committer: Tomas Mraz <tomas@openssl.org> 2022-11-23 18:21:42 +0100
commit: 8bee6acc6fa05993f60f2cff8754453055b8e09e
tree: 04f8a5893721369a65a38eae5acc50f0fd6ed347 /INSTALL.md
parent: 6bf9a6e59cb42f763f2c532915ce9d1acf5d6836
Improve chacha20 performance on aarch64 by interleaving scalar with SVE/SVE2

The patch processes one extra block with scalar instructions, in parallel
with the blocks processed by SVE/SVE2. This is especially helpful when the
vector length is only 128 bits.

The actual performance uplift is hard to predict, as it depends on the
vector length and the input data size. The SVE/SVE2 implementation does not
always outperform Neon, but it should prevail in most cases.

On a CPU with 256-bit SVE/SVE2, interleaved processing can handle 9 blocks
in parallel (8 blocks by SVE and 1 by scalar). On 128-bit SVE/SVE2 it is
5 blocks. An input size that is a multiple of 9 or 5 blocks on the
respective CPU can typically be handled at maximum speed.

Here are test data for 256-bit and 128-bit SVE/SVE2, obtained by running
"openssl speed -evp chacha20 -bytes 576" (and other sizes):

----------------------------------+---------------------------------
        256-bit SVE               |       128-bit SVE2
----------------------------------|---------------------------------
Input   576 bytes    512 bytes    |   320 bytes     256 bytes
----------------------------------|---------------------------------
SVE     1716361.91k  1556699.18k  |   1615789.06k   1302864.40k
----------------------------------|---------------------------------
Neon    1262643.44k  1509044.05k  |   680075.67k    1060532.31k
----------------------------------+---------------------------------

If the input size gets very large, the advantage of SVE/SVE2 over Neon
fades out.

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Change-Id: Ieedfcb767b9c08280d7c8c9a8648919c69728fab

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18901)

(cherry picked from commit 3f42f41ad19c631287386fd8d58f9e02466c5e3f)
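
The block counts in the commit message (9 blocks on 256-bit SVE, 5 on 128-bit) and the benchmark sizes (576 and 320 bytes) can be related with a little arithmetic. The sketch below is illustrative only and is not part of the patch. It assumes that each 32-bit SVE lane carries one ChaCha20 block, which matches the 8 and 4 SVE blocks implied by the message; the function names are hypothetical.

```python
CHACHA20_BLOCK_BYTES = 64  # a ChaCha20 keystream block is 64 bytes


def blocks_per_pass(vector_bits: int) -> int:
    """Blocks handled by one interleaved pass (SVE lanes + 1 scalar block).

    Assumption: one block per 32-bit SVE lane, so 256-bit -> 8, 128-bit -> 4.
    """
    sve_blocks = vector_bits // 32
    return sve_blocks + 1  # the extra block is processed by scalar code


def split_input(nbytes: int, vector_bits: int) -> tuple[int, int]:
    """Return (full interleaved passes, leftover bytes) for a given input."""
    bytes_per_pass = blocks_per_pass(vector_bits) * CHACHA20_BLOCK_BYTES
    return divmod(nbytes, bytes_per_pass)


if __name__ == "__main__":
    for vl, size in [(256, 576), (256, 512), (128, 320), (128, 256)]:
        passes, rest = split_input(size, vl)
        print(f"{vl}-bit, {size} bytes: {passes} full pass(es), {rest} bytes left")
```

Under that assumption, 576 bytes is exactly one full pass on 256-bit SVE and 320 bytes is exactly one full pass on 128-bit SVE2, whereas 512 and 256 bytes fall short of a full pass. This lines up with the sizes chosen for the benchmark table.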
Diffstat (limited to 'INSTALL.md')
0 files changed, 0 insertions, 0 deletions