summaryrefslogtreecommitdiffstats
path: root/crypto/armcap.c
AgeCommit message (Collapse)Author
2023-03-29Ensure there's only one copy of OPENSSL_armcap_P in libcrypto.aTom Cosgrove
Change-Id: Ia94e528a2d55934435de6a2949784c52eb38d82f Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20621)
2023-03-27Avoid duplication of OPENSSL_armcap_P on 32bit ARMTomas Mraz
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20558)
2023-03-03Tidy up aarch64 feature detection code in armcap.cTom Cosgrove
Make the SIGILL-based code easier to read, and don't use it on Apple Silicon. Also fix "error: 'HWCAP(2)_*' macro redefined" warnings on FreeBSD. Fixes #20188 Change-Id: I5618bbe9444cc40cb5705c6ccbdc331c16bab794 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20305)
2023-02-08Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2Xiaokang Qian
Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20184)
2023-01-30Enable AES optimisation on Apple Silicon M2-based systemsTom Cosgrove
Gives a performance enhancement of 16-38%, similar to the M1. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20141)
2022-12-06Fix the code used to detect aarch64 capabilities when we don't have getauxval()Tom Cosgrove
In addition to a missing prototype there was also a missing closing brace '}'. Fixes #19825. Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19833)
2022-11-24Add two new build targets to enable the possibility of using clang-cl asEverton Constantino
an assembler for Windows on Arm builds and also clang-cl as the compiler as well. Make appropriate changes to armcap source and peralsm scripts. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19523)
2022-10-04armcap: skip probing _armv7_tick()Cameron Gutman
Detection of this feature is unreliable so only use it if requested. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Hugo Landau <hlandau@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18852)
2022-05-23Apply the AES-GCM unroll8 optimization patch to Neoverse N2XiaokangQian
The loop unrolling and use of EOR3 can improve N2 performance by up to 32% Signed-off-by: XiaokangQian <xiaokang.qian@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18350)
2022-05-03Update copyright yearMatt Caswell
Reviewed-by: Tomas Mraz <tomas@openssl.org> Release: yes
2022-05-03Acceleration of chacha20 on aarch64 by SVEDaniel Hu
This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17916)
2022-01-25Optimize AES-GCM for uarchs with unroll and new instructionsXiaokangQian
Increase the block numbers to 8 for every iteration. Increase the hash table capacity. Make use of EOR3 instruction to improve the performance. This can improve performance 25-40% on out-of-order microarchitectures with a large number of fast execution units, such as Neoverse V1. We also see 20-30% performance improvements on other architectures such as the M1. Assembly code reviewd by Tom Cosgrove (ARM). Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15916)
2022-01-18SM4 optimization for ARM by HW instructionDaniel Hu
This patch implements the SM4 optimization for ARM processor, using SM4 HW instruction, which is an optional feature of crypto extension for aarch64 V8. Tested on some modern ARM micro-architectures with SM4 support, the performance uplift can be observed around 8X~40X over existing C implementation in openssl. Algorithms that can be parallelized (like CTR, ECB, CBC decryption) are on higher end, with algorithm like CBC encryption on lower end (due to inter-block dependency) Perf data on Yitian-710 2.75GHz hardware, before and after optimization: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k After (7.4x - 36.6x faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17455)
2022-01-14SM3 acceleration with SM3 hardware instruction on aarch64fangming.fang
SM3 hardware instruction is optional feature of crypto extension for aarch64. This implementation accelerates SM3 via SM3 instructions. For the platform not supporting SM3 instruction, the original C implementation still works. Thanks to AliBaba for testing and reporting the following perf numbers for Yitian710: Benchmark on T-Head Yitian-710 2.75GHz: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 49297.82k 121062.63k 223106.05k 283371.52k 307574.10k 309400.92k After (33% - 74% faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 65640.01k 179121.79k 359854.59k 481448.96k 534055.59k 538274.47k Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17454)
2022-01-05fix building failure when using -Wconditional-uninitializedfangming.fang
Use clang -Wconditional-uninitialized to build, the error "initialize the variable 'buffer_size' to silence this warning" will be reported. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17375)
2021-12-16Add Arm Assembly (aarch64) support for RNGOrr Toledano
Include aarch64 asm instructions for random number generation using the RNDR and RNDRRS instructions. Provide detection functions for RNDR and RNDRRS getauxval. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15361)
2021-11-24Fix detection of ARMv7 and ARM64 CPU features on FreeBSDAllan Jude
OpenSSL assumes AT_HWCAP = 16 (as on Linux), but on FreeBSD AT_HWCAP = 25 Switch to using AT_HWCAP, and setting it to 16 if it is not defined. OpenSSL calls elf_auxv_info() with AT_CANARY which returns ENOENT resulting in all ARM acceleration features being disabled. CLA: trivial Reviewed-by: Ben Kaduk <kaduk@mit.edu> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17082)
2021-06-25enable getauxval on android 10yunh
Fixes #9498 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15870) (cherry picked from commit b2dea4d5f22ec146373324c282fb1bcecd5a7d90)
2021-06-15Use getauxval on Android with API level > 18Lars Immisch
We received analytics that devices of the device family Oppo A37x are crashing with SIGILL when trying to load libcrypto.so. These crashes were fixed by using the system-supplied getauxval function. Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/11257)
2021-05-28Initialise OPENSSL_armcap_P to 0 before setting it based on capabilities, ↵Tom Cosgrove
not after Signed-off-by: Tom Cosgrove <tom.cosgrove@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Shane Lontis <shane.lontis@oracle.com> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15486)
2021-05-11armcap: fix Mac M1 SHA512 support.David CARLIER
The SIGILL catch/trap works however disabled purposely for Darwin, thus relying on native api instead. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/14935)
2021-05-09Optimize RSA on armv8fangming.fang
Add Neon path for RSA on armv8, this optimisation targets to A72 and N1 that are ones of important cores of infrastructure. Other platforms are not impacted. A72 old new improved rsa 512 sign 9828.6 9738.7 -1% rsa 512 verify 121497.2 122367.7 1% rsa 1024 sign 1818 1816.9 0% rsa 1024 verify 37175.6 37161.3 0% rsa 2048 sign 267.3 267.4 0% rsa 2048 verify 10127.6 10119.6 0% rsa 3072 sign 86.8 87 0% rsa 3072 verify 4604.2 4956.2 8% rsa 4096 sign 38.3 38.5 1% rsa 4096 verify 2619.8 2972.1 13% rsa 7680 sign 5 7 40% rsa 7680 verify 756 929.4 23% rsa 15360 sign 0.8 1 25% rsa 15360 verify 190.4 246 29% N1 old new improved rsa 512 sign 12599.2 12596.7 0% rsa 512 verify 148636.1 148656.2 0% rsa 1024 sign 2150.6 2148.9 0% rsa 1024 verify 42353.5 42265.2 0% rsa 2048 sign 305.5 305.3 0% rsa 2048 verify 11209.7 11205.2 0% rsa 3072 sign 97.8 98.2 0% rsa 3072 verify 5061.3 5990.7 18% rsa 4096 sign 42.8 43 0% rsa 4096 verify 2867.6 3509.8 22% rsa 7680 sign 5.5 8.4 53% rsa 7680 verify 823.5 1058.3 29% rsa 15360 sign 0.9 1.1 22% rsa 15360 verify 207 273.9 32% CustomizedGitHooks: yes Change-Id: I01c732cc429d793c4eb5ffd27ccd30ff9cebf8af Jira: SECLIB-540 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/14761)
2021-01-28Update copyright yearRichard Levitte
Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/13999)
2021-01-14OPENSSL_cpuid_setup FreeBSD arm update.David Carlier
when possible using the getauxval equivalent which has similar ids as Linux, instead of bad instructions catch approach. Reviewed-by: Ben Kaduk <kaduk@mit.edu> Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/pull/13650)
2020-12-09Read MIDR_EL1 system register on aarch64Fangming.Fang
MIDR_EL1 system register exposes microarchitecture information so that people can make micro-arch related optimization such as exposing as much instruction level parallelism as possible. MIDR_EL1 register can be read only if HWCAP_CPUID feature is supported. Change-Id: Iabb8a36c5d31b184dba6399f378598058d394d4e Reviewed-by: Paul Dale <paul.dale@oracle.com> Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> (Merged from https://github.com/openssl/openssl/pull/11744)
2019-01-16crypto/armcap.c, crypto/ppccap.c: stricter use of getauxval()Richard Levitte
Having a weak getauxval() and only depending on GNU C without looking at the library we build against meant that it got picked up where not really expected. So we change this to check for the glibc version, and since we know it exists from that version, there's no real need to make it weak. Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> (Merged from https://github.com/openssl/openssl/pull/8028)
2018-12-06Following the license change, modify the boilerplates in crypto/Richard Levitte
[skip ci] Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/pull/7827)
2018-03-06crypto/armcap.c: mask SHA512 hardware detection on iOS.Andy Polyakov
When running iOS application from command line it's impossible to get past the failing capability detection. This is because it's executed under debugger and iOS debugger is impossible to deal with. [If Apple implements SHA512 in silicon, it would have to be detected with sysctlbyname.] Reviewed-by: Rich Salz <rsalz@openssl.org>
2018-02-13Update copyright yearMatt Caswell
Reviewed-by: Richard Levitte <levitte@openssl.org>
2018-02-12crypto/armcap.c: detect hardware-assisted SHA512 support.Andy Polyakov
Reviewed-by: Rich Salz <rsalz@openssl.org>
2017-11-25Create a prototype for OPENSSL_rdtscKurt Roeckx
Switch to make it return an uint32_t instead of the various different types it returns now. Fixes: #3125 Reviewed-by: Andy Polyakov <appro@openssl.org> GH: #4757
2017-08-05Fix typo in files in crypto folderXiaoyin Liu
Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Andy Polyakov <appro@openssl.org> GH: #4093
2017-06-16Modify type of variable in OPENSSL_cpuid_setup functionkomainu8
CLA: trivial Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/3651)
2017-02-15crypto/armcap.c: short-circuit processor capability probe in iOS builds.Andy Polyakov
Capability probing by catching SIGILL appears to be problematic on iOS. But since Apple universe is "monocultural", it's actually possible to simply set pre-defined processor capability mask. Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/2617)
2016-05-17Copyright consolidation 07/10Rich Salz
Reviewed-by: Richard Levitte <levitte@openssl.org>
2015-04-20Add assembly support for 32-bit iOS.Andy Polyakov
Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org>
2015-01-23Add assembly support to ios64-cross.Andy Polyakov
Fix typos in ios64-cross config line. Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-22Run util/openssl-format-source -v -c .Matt Caswell
Reviewed-by: Tim Hudson <tjh@openssl.org>
2015-01-04Remove inconsistency in ARM support.Andy Polyakov
This facilitates "universal" builds, ones that target multiple architectures, e.g. ARMv5 through ARMv7. See commentary in Configure for details. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Caswell <matt@openssl.org>
2014-06-01Add linux-aarch64 taget.Andy Polyakov
armcap.c is shared between 32- and 64-bit builds and features link-time detection of getauxval. Submitted by: Ard Biesheuvel.
2014-05-04crypto/armcap.c: detect ARMv8 capabilities [in 32-bit build].Andy Polyakov
2013-09-15crypto/armcap.c: fix typo in rdtsc subroutine.Andy Polyakov
PR: 3125 Submitted by: Kyle McMartin
2011-10-24typoDr. Stephen Henson
2011-10-20armcap.c: auto-setup processor capability vector.Andy Polyakov
2011-07-17ARM assembler pack: add platform run-time detection.Andy Polyakov