author     Alexei Starovoitov <ast@kernel.org>   2019-05-24 18:58:59 -0700
committer  Alexei Starovoitov <ast@kernel.org>   2019-05-24 18:59:00 -0700
commit     198ae936efdba33cf5e5cfb28986c9376179fd23 (patch)
tree       3643594fcaf5b1e760a2d899c1b3627251d8b780
parent     a08acd118d5ca7f6e745ef81cfc6cbadacb56462 (diff)
parent     0b4de1ff19bf878eb38f4f668ee15c9b9eed4240 (diff)
Merge branch 'optimize-zext'
Jiong Wang says:

====================
v9:
 - Split patch 5 in v8, making the bpf uapi header file sync a separate
   patch. (Alexei)

v8:
 - For stack slot reads, mark them as REG_LIVE_READ64. (Alexei)
 - Change DEF_NOT_SUBREG from -1 to 0. (Alexei)
 - Rebased on top of latest bpf-next.

v7:
 - Drop the first patch in v6, the one adding a 32-bit return value and
   argument type. (Alexei)
 - Rename bpf_jit_hardware_zext to bpf_jit_needs_zext. (Alexei)
 - Use mov32 with imm == 1 to indicate it is a zext. (Alexei)
 - JIT back-ends peephole the next insn to optimize out unnecessary zexts
   inserted by the verifier. (Alexei)
 - Testing:
   + Patch set tested (bpf selftest) on x64 host with llvm 9.0, no
     regression observed on either JIT or interpreter mode.
   + Patch set tested (bpf selftest) on x32 host, by Yanqing Wang,
     thanks! No regression observed on either JIT or interpreter mode.
   + Patch set tested (bpf selftest) on RV64 host with llvm 9.0, by
     Björn Töpel, thanks! No regression observed before and after this
     set with JIT_ALWAYS_ON. test_progs_32 also enabled, as LLVM 9.0 is
     used by Björn.
   + Cross compiled the other affected targets: arm, PowerPC, SPARC, S390.

v6:
 - Fixed s390 kbuild test robot error. (kbuild)
 - Made comment style in the back-end patches more consistent.

v5:
 - Adjusted several test_verifier helpers so they work on hosts with and
   without hardware zext. (Naveen)
 - Made sure the zext flag is not set when the verifier is bypassed, for
   example libtest_bpf.ko. (Naveen)
 - Conservatively mark the bpf main return value as 64-bit. (Alexei)
 - Made sure the read flag is either READ64 or READ32, never a mix of
   both. (Alexei)
 - Merged patches 1 and 2 of v4. (Alexei)
 - Fixed kbuild test robot warning on NFP. (kbuild)
 - Proposed a new BPF_ZEXT insn to get optimal code-gen for various JIT
   back-ends.
 - Conservatively set zext flags for patched insns.
 - Fixed return value zext for helper function calls.
 - Also adjusted the test_verifier scalability unit test to avoid
   triggering too many insn patches, which would hang the machine.
 - Re-tested on x86 host with llvm 9.0, no regression on test_verifier,
   test_progs, test_progs_32.
 - Re-tested offload target (nfp), no regression on the local testsuite.

v4:
 - Added the two missing fixes which address two of Jakub's reviews on v3.
 - Rebased on top of bpf-next.

v3:
 - Removed redundant check in "propagate_liveness_reg". (Jakub)
 - Added extra check in "mark_reg_read" to prune more of the search. (Jakub)
 - Re-implemented the "prog_flags" passing mechanism, removed use of a
   global switch inside libbpf.
 - Enabled high 32-bit randomization beyond "test_verifier" and
   "test_progs"; it should now be enabled for all possible tests. Re-ran
   all tests, no regression noticed.
 - Removed RFC tag.

v2:
 - Rebased on top of bpf-next master.
 - Added comments explaining what the sub-register def index is.
   (Edward, Alexei)
 - Removed patch 1, which turned the bit mask from an enum into a macro.
   (Alexei)
 - Removed sysctl/bpf_jit_32bit_opt. (Alexei)
 - Merged the sub-register def insn index into reg state. (Alexei)
 - Changed the test methodology (Alexei):
   + Instead of simple unit tests on x86_64, for which this optimization
     is not enabled because the hardware already provides the zext
     semantics, poison the high 32 bits of registers whose defs are
     identified as safe. This lets the correctness of this patch set be
     checked whenever the daily bpf selftests run, which is a very
     stressful test on a host machine like x86_64.
   + hi32 poisoning is gated by a new BPF_F_TEST_RND_HI32 prog flag.
   + BPF_F_TEST_RND_HI32 is enabled for all tests of "test_progs" and
     "test_verifier"; the latter needs a minor tweak on two unit tests,
     please see the patch for the change.
   + Introduced a new global variable "libbpf_test_mode" into libbpf.
     Once it is set to true, BPF_F_TEST_RND_HI32 is set for all later
     PROG_LOAD syscalls; the goal is to ease enabling hi32 poisoning on
     the existing testsuite. We could also introduce new APIs, for
     example "bpf_prog_test_load", then use
     -Dbpf_prog_load=bpf_prog_test_load to migrate the tests under
     test_progs, but there are several load APIs, and such a new API
     needs changes to structures like "struct bpf_prog_load_attr".
   + Removed the old unit tests. They were based on insn scanning and
     required quite a few changes to the generic test_verifier code.
     Given that hi32 randomization offers good test coverage, the unit
     tests didn't add much extra test value.
 - Enhanced the register width check ("is_reg64") used when recording a
   sub-register write; it now returns a more accurate width.
 - Re-ran all tests under "test_progs" and "test_verifier" on x86_64,
   no regression. Fixed a couple of bugs exposed:
   1. ctx field size transformation was not taken into account.
   2. insn patching could cause loss of the original aux data, which is
      important for ctx field conversion.
   3. the return value of propagate_liveness was wrong and caused a
      regression on processed insn numbers.
   4. helper call args weren't handled properly, so path pruning could
      lose 64-bit read info from the pruned path.
 - Re-ran the Cilium bpf progs for processed-insn-number benchmarking,
   no regression.

v1:
 - Fixed the missing handling of callee-saved registers for bpf-to-bpf
   calls; sub-register defs therefore moved into frame state.
   (Jakub Kicinski)
 - Removed redundant "cross_reg". (Jakub Kicinski)
 - Various coding style & grammar fixes. (Jakub Kicinski, Quentin Monnet)

The eBPF ISA specification requires the high 32 bits to be cleared
whenever a low 32-bit sub-register is written. This applies to the
destination register of ALU32 instructions etc. JIT back-ends must
guarantee this semantic when doing code-gen. The x86_64 and AArch64 ISAs
have the same semantics, so those JIT back-ends don't need to do extra
work. However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit
arches (PowerPC, SPARC etc.) need to do explicit zero extension to meet
this requirement, otherwise code like the following will fail:

  u64_value = (u64) u32_value
  ... other uses of u64_value

This is because the compiler can exploit the semantic described above and
omit the zero extensions when extending u32_value to u64_value; these JIT
back-ends are then expected to guarantee it by inserting extra zero
extensions, which can be a significant increase in code size. Some
benchmarks show there can be ~40% sub-register writes out of total insns,
meaning at least ~40% extra code-gen.

One observation is that these extra zero extensions are not always
necessary. Take the code snippet above, for example: it is possible that
u32_value is never cast into a u64, in which case the high 32 bits of
u32_value can be ignored and the extra zero extension eliminated.

This patch set implements this idea. Insns defining sub-registers are
marked when the high 32 bits of the defined sub-register matter. For
unmarked insns, it is safe to eliminate the high 32-bit clearance.

Algo
====
We could use insn-scan-based static analysis to tell whether a
sub-register def doesn't need zero extension. However, with such static
analysis we must make conservative assumptions at branching points,
where multiple uses could be introduced. So, any sub-register def that
is active at a branching point would need to be marked as needing zero
extension. This could introduce quite a few false alarms, for example
~25% on Cilium bpf_lxc.

It is far better to use dynamic data-flow tracing, which the verifier
fortunately already has and which can easily be extended to serve the
purpose of this patch set:

 - Split read flags into READ32 and READ64.
 - Record the index of the insn that does a sub-register write. Keep the
   index inside the reg state and update it during verifier insn walking.
 - A full register read of a sub-register marks its definition insn as
   needing zero extension on the dst register. A new sub-register write
   overrides the old one.
 - When propagating read64 during path pruning, also mark any insn
   defining a sub-register that is read in the pruned path as
   full-register.

Benchmark
=========
 - I estimate the JITed image could be 10% ~ 30% smaller on the affected
   arches (nfp, arm, x32, riscv, ppc, sparc, s390), depending on the prog.

 - For Cilium bpf_lxc, there are ~11500 insns in the compiled binary
   (latest LLVM snapshot, with -mcpu=v3 -mattr=+alu32 enabled), 4460 of
   which have sub-register writes (~40%). Calculated by:

     cat dump | grep -P "\tw" | wc -l        (ALU32)
     cat dump | grep -P "r.*=.*u32" | wc -l  (READ_W)
     cat dump | grep -P "r.*=.*u16" | wc -l  (READ_H)
     cat dump | grep -P "r.*=.*u8" | wc -l   (READ_B)

   With this patch set enabled, more than 25% of those 4460 can be
   identified as not needing zero extension on the destination, and the
   percentage can go further up to more than 50% with some follow-up
   optimizations based on the infrastructure offered by this set. This
   leads to significant savings on the JITed image.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
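
The mov32/imm == 1 encoding and the back-end peephole described above can
be illustrated with a small stand-alone sketch. This is a simplified
mock-up, not kernel code: the struct is a stand-in for struct bpf_insn,
the opcode values mirror the uapi definitions, and the check mirrors the
insn_is_zext() helper this series adds to include/linux/filter.h.

  /*
   * Stand-alone sketch of the zext peephole used by the JIT back-ends
   * in the diffs below. The struct and opcode values are simplified
   * stand-ins for the kernel definitions.
   */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define BPF_ALU  0x04   /* 32-bit ALU class */
  #define BPF_MOV  0xb0
  #define BPF_X    0x08   /* src operand is a register */

  struct bpf_insn_sketch {
          uint8_t code;
          uint8_t dst_reg:4;
          uint8_t src_reg:4;
          int16_t off;
          int32_t imm;
  };

  /* The verifier encodes its inserted zero extension as a 32-bit mov
   * with imm == 1, a combination regular programs never produce. */
  static bool insn_is_zext(const struct bpf_insn_sketch *insn)
  {
          return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1;
  }

  int main(void)
  {
          struct bpf_insn_sketch prog[] = {
                  /* A 32-bit load (BPF_LDX | BPF_MEM | BPF_W, 0x61); on
                   * s390/sparc/ppc the JITed load already zero extends. */
                  { .code = 0x61, .dst_reg = 0, .src_reg = 1 },
                  /* The verifier-inserted zext that follows is redundant. */
                  { .code = BPF_ALU | BPF_MOV | BPF_X,
                    .dst_reg = 0, .src_reg = 0, .imm = 1 },
          };

          /* A back-end that just JITed prog[0] peeks at prog[1]: if it is
           * the zext marker, it skips it (cf. "insn_count = 2" on s390,
           * "return 1" on sparc/riscv, "addrs[++i] = ctx->idx * 4" on ppc). */
          if (insn_is_zext(&prog[1]))
                  printf("load already zero extends, skipping verifier zext\n");
          return 0;
  }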
-rw-r--r--  arch/arm/net/bpf_jit_32.c                                   42
-rw-r--r--  arch/powerpc/net/bpf_jit_comp64.c                           36
-rw-r--r--  arch/riscv/net/bpf_jit_comp.c                               43
-rw-r--r--  arch/s390/net/bpf_jit_comp.c                                41
-rw-r--r--  arch/sparc/net/bpf_jit_comp_64.c                            29
-rw-r--r--  arch/x86/net/bpf_jit_comp32.c                               83
-rw-r--r--  drivers/net/ethernet/netronome/nfp/bpf/jit.c               115
-rw-r--r--  drivers/net/ethernet/netronome/nfp/bpf/main.h                2
-rw-r--r--  drivers/net/ethernet/netronome/nfp/bpf/verifier.c           12
-rw-r--r--  include/linux/bpf.h                                          1
-rw-r--r--  include/linux/bpf_verifier.h                                14
-rw-r--r--  include/linux/filter.h                                      15
-rw-r--r--  include/uapi/linux/bpf.h                                    18
-rw-r--r--  kernel/bpf/core.c                                            9
-rw-r--r--  kernel/bpf/syscall.c                                         4
-rw-r--r--  kernel/bpf/verifier.c                                      297
-rw-r--r--  tools/include/uapi/linux/bpf.h                              18
-rw-r--r--  tools/lib/bpf/bpf.c                                          1
-rw-r--r--  tools/lib/bpf/bpf.h                                          1
-rw-r--r--  tools/lib/bpf/libbpf.c                                       3
-rw-r--r--  tools/lib/bpf/libbpf.h                                       1
-rw-r--r--  tools/testing/selftests/bpf/Makefile                        10
-rw-r--r--  tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c    1
-rw-r--r--  tools/testing/selftests/bpf/test_sock_addr.c                 1
-rw-r--r--  tools/testing/selftests/bpf/test_sock_fields.c               1
-rw-r--r--  tools/testing/selftests/bpf/test_socket_cookie.c             1
-rw-r--r--  tools/testing/selftests/bpf/test_stub.c                     40
-rw-r--r--  tools/testing/selftests/bpf/test_verifier.c                 31
28 files changed, 723 insertions, 147 deletions
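
Before the per-arch diffs, the verifier-side marking from the cover
letter can also be sketched as a hypothetical toy model (not the
verifier code): each register remembers the index of the insn that last
wrote it as a sub-register (DEF_NOT_SUBREG, i.e. 0, meaning the last
write was 64-bit), a full 64-bit read marks that defining insn, and only
marked defs would get the special zext mov32 patched in after them when
the back-end reports bpf_jit_needs_zext().

  /*
   * Toy model of the data-flow marking described in the cover letter.
   * Not the verifier code, just the idea.
   */
  #include <stdbool.h>
  #include <stdio.h>

  #define MAX_REGS        11
  #define MAX_INSNS       16
  #define DEF_NOT_SUBREG  0     /* as in the series: 0 = not a sub-reg def */

  struct toy_insn {
          int dst;              /* register written, -1 if none */
          bool subreg_def;      /* writes only the low 32 bits (ALU32, ldxw, ...) */
          int src64;            /* register read as full 64 bits, -1 if none */
  };

  int main(void)
  {
          /* subreg_def_insn[r]: 1-based index of the insn that last wrote r
           * as a sub-register, or DEF_NOT_SUBREG. */
          int subreg_def_insn[MAX_REGS] = { 0 };
          bool needs_zext[MAX_INSNS + 1] = { false };

          struct toy_insn prog[] = {
                  { .dst = 1, .subreg_def = true,  .src64 = -1 }, /* w1 += 1        */
                  { .dst = 2, .subreg_def = true,  .src64 = -1 }, /* w2 = *(u32 *)  */
                  { .dst = 3, .subreg_def = false, .src64 = 2  }, /* r3 = r2 (64-bit read) */
          };
          int n = sizeof(prog) / sizeof(prog[0]);

          for (int i = 0; i < n; i++) {
                  int idx = i + 1;

                  /* A full 64-bit read marks the defining insn of that sub-reg. */
                  if (prog[i].src64 >= 0 && subreg_def_insn[prog[i].src64])
                          needs_zext[subreg_def_insn[prog[i].src64]] = true;

                  /* A new write overrides the old def index. */
                  if (prog[i].dst >= 0)
                          subreg_def_insn[prog[i].dst] =
                                  prog[i].subreg_def ? idx : DEF_NOT_SUBREG;
          }

          /* When the JIT reports bpf_jit_needs_zext(), only marked defs get
           * the zext (the special mov32 imm == 1) patched in after them. */
          for (int i = 1; i <= n; i++)
                  printf("insn %d: %s\n", i,
                         needs_zext[i] ? "patch zext after it" : "no zext needed");
          return 0;
  }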
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index c8bfbbfdfcc3..97a6b4b2a115 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -736,7 +736,8 @@ static inline void emit_a32_alu_r64(const bool is64, const s8 dst[],
/* ALU operation */
emit_alu_r(rd[1], rs, true, false, op, ctx);
- emit_a32_mov_i(rd[0], 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(rd[0], 0, ctx);
}
arm_bpf_put_reg64(dst, rd, ctx);
@@ -758,8 +759,9 @@ static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
struct jit_ctx *ctx) {
if (!is64) {
emit_a32_mov_r(dst_lo, src_lo, ctx);
- /* Zero out high 4 bytes */
- emit_a32_mov_i(dst_hi, 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ /* Zero out high 4 bytes */
+ emit_a32_mov_i(dst_hi, 0, ctx);
} else if (__LINUX_ARM_ARCH__ < 6 &&
ctx->cpu_architecture < CPU_ARCH_ARMv5TE) {
/* complete 8 byte move */
@@ -1060,17 +1062,20 @@ static inline void emit_ldx_r(const s8 dst[], const s8 src,
case BPF_B:
/* Load a Byte */
emit(ARM_LDRB_I(rd[1], rm, off), ctx);
- emit_a32_mov_i(rd[0], 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(rd[0], 0, ctx);
break;
case BPF_H:
/* Load a HalfWord */
emit(ARM_LDRH_I(rd[1], rm, off), ctx);
- emit_a32_mov_i(rd[0], 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(rd[0], 0, ctx);
break;
case BPF_W:
/* Load a Word */
emit(ARM_LDR_I(rd[1], rm, off), ctx);
- emit_a32_mov_i(rd[0], 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(rd[0], 0, ctx);
break;
case BPF_DW:
/* Load a Double Word */
@@ -1359,6 +1364,11 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_ALU64 | BPF_MOV | BPF_X:
switch (BPF_SRC(code)) {
case BPF_X:
+ if (imm == 1) {
+ /* Special mov32 for zext */
+ emit_a32_mov_i(dst_hi, 0, ctx);
+ break;
+ }
emit_a32_mov_r64(is64, dst, src, ctx);
break;
case BPF_K:
@@ -1438,7 +1448,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
}
emit_udivmod(rd_lo, rd_lo, rt, ctx, BPF_OP(code));
arm_bpf_put_reg32(dst_lo, rd_lo, ctx);
- emit_a32_mov_i(dst_hi, 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(dst_hi, 0, ctx);
break;
case BPF_ALU64 | BPF_DIV | BPF_K:
case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1453,7 +1464,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
if (imm)
emit_a32_alu_i(dst_lo, imm, ctx, BPF_OP(code));
- emit_a32_mov_i(dst_hi, 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(dst_hi, 0, ctx);
break;
/* dst = dst << imm */
case BPF_ALU64 | BPF_LSH | BPF_K:
@@ -1488,7 +1500,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
/* dst = ~dst */
case BPF_ALU | BPF_NEG:
emit_a32_alu_i(dst_lo, 0, ctx, BPF_OP(code));
- emit_a32_mov_i(dst_hi, 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_a32_mov_i(dst_hi, 0, ctx);
break;
/* dst = ~dst (64 bit) */
case BPF_ALU64 | BPF_NEG:
@@ -1544,11 +1557,13 @@ emit_bswap_uxt:
#else /* ARMv6+ */
emit(ARM_UXTH(rd[1], rd[1]), ctx);
#endif
- emit(ARM_EOR_R(rd[0], rd[0], rd[0]), ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit(ARM_EOR_R(rd[0], rd[0], rd[0]), ctx);
break;
case 32:
/* zero-extend 32 bits into 64 bits */
- emit(ARM_EOR_R(rd[0], rd[0], rd[0]), ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit(ARM_EOR_R(rd[0], rd[0], rd[0]), ctx);
break;
case 64:
/* nop */
@@ -1838,6 +1853,11 @@ void bpf_jit_compile(struct bpf_prog *prog)
/* Nothing to do here. We support Internal BPF. */
}
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{
struct bpf_prog *tmp, *orig_prog = prog;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 21a1dcd4b156..0ebd946f178b 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -504,6 +504,9 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
case BPF_ALU | BPF_LSH | BPF_X: /* (u32) dst <<= (u32) src */
/* slw clears top 32 bits */
PPC_SLW(dst_reg, dst_reg, src_reg);
+ /* skip zero extension move, but set address map. */
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
case BPF_ALU64 | BPF_LSH | BPF_X: /* dst <<= src; */
PPC_SLD(dst_reg, dst_reg, src_reg);
@@ -511,6 +514,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<== (u32) imm */
/* with imm 0, we still need to clear top 32 bits */
PPC_SLWI(dst_reg, dst_reg, imm);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<== imm */
if (imm != 0)
@@ -518,12 +523,16 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
break;
case BPF_ALU | BPF_RSH | BPF_X: /* (u32) dst >>= (u32) src */
PPC_SRW(dst_reg, dst_reg, src_reg);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
case BPF_ALU64 | BPF_RSH | BPF_X: /* dst >>= src */
PPC_SRD(dst_reg, dst_reg, src_reg);
break;
case BPF_ALU | BPF_RSH | BPF_K: /* (u32) dst >>= (u32) imm */
PPC_SRWI(dst_reg, dst_reg, imm);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
case BPF_ALU64 | BPF_RSH | BPF_K: /* dst >>= imm */
if (imm != 0)
@@ -548,6 +557,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
*/
case BPF_ALU | BPF_MOV | BPF_X: /* (u32) dst = src */
case BPF_ALU64 | BPF_MOV | BPF_X: /* dst = src */
+ if (imm == 1) {
+ /* special mov32 for zext */
+ PPC_RLWINM(dst_reg, dst_reg, 0, 0, 31);
+ break;
+ }
PPC_MR(dst_reg, src_reg);
goto bpf_alu32_trunc;
case BPF_ALU | BPF_MOV | BPF_K: /* (u32) dst = imm */
@@ -555,11 +569,13 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
PPC_LI32(dst_reg, imm);
if (imm < 0)
goto bpf_alu32_trunc;
+ else if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
bpf_alu32_trunc:
/* Truncate to 32-bits */
- if (BPF_CLASS(code) == BPF_ALU)
+ if (BPF_CLASS(code) == BPF_ALU && !fp->aux->verifier_zext)
PPC_RLWINM(dst_reg, dst_reg, 0, 0, 31);
break;
@@ -618,10 +634,13 @@ emit_clear:
case 16:
/* zero-extend 16 bits into 64 bits */
PPC_RLDICL(dst_reg, dst_reg, 0, 48);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
case 32:
- /* zero-extend 32 bits into 64 bits */
- PPC_RLDICL(dst_reg, dst_reg, 0, 32);
+ if (!fp->aux->verifier_zext)
+ /* zero-extend 32 bits into 64 bits */
+ PPC_RLDICL(dst_reg, dst_reg, 0, 32);
break;
case 64:
/* nop */
@@ -698,14 +717,20 @@ emit_clear:
/* dst = *(u8 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_B:
PPC_LBZ(dst_reg, src_reg, off);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
/* dst = *(u16 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_H:
PPC_LHZ(dst_reg, src_reg, off);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
/* dst = *(u32 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_W:
PPC_LWZ(dst_reg, src_reg, off);
+ if (insn_is_zext(&insn[i + 1]))
+ addrs[++i] = ctx->idx * 4;
break;
/* dst = *(u64 *)(ul) (src + off) */
case BPF_LDX | BPF_MEM | BPF_DW:
@@ -1046,6 +1071,11 @@ struct powerpc64_jit_data {
struct codegen_context ctx;
};
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
{
u32 proglen;
diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
index 80b12aa5e10d..c4c836e3d318 100644
--- a/arch/riscv/net/bpf_jit_comp.c
+++ b/arch/riscv/net/bpf_jit_comp.c
@@ -731,6 +731,7 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
{
bool is64 = BPF_CLASS(insn->code) == BPF_ALU64 ||
BPF_CLASS(insn->code) == BPF_JMP;
+ struct bpf_prog_aux *aux = ctx->prog->aux;
int rvoff, i = insn - ctx->prog->insnsi;
u8 rd = -1, rs = -1, code = insn->code;
s16 off = insn->off;
@@ -742,8 +743,13 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
case BPF_ALU64 | BPF_MOV | BPF_X:
+ if (imm == 1) {
+ /* Special mov32 for zext */
+ emit_zext_32(rd, ctx);
+ break;
+ }
emit(is64 ? rv_addi(rd, rs, 0) : rv_addiw(rd, rs, 0), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
@@ -771,19 +777,19 @@ static int emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
case BPF_ALU | BPF_MUL | BPF_X:
case BPF_ALU64 | BPF_MUL | BPF_X:
emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_DIV | BPF_X:
case BPF_ALU64 | BPF_DIV | BPF_X:
emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MOD | BPF_X:
case BPF_ALU64 | BPF_MOD | BPF_X:
emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_LSH | BPF_X:
@@ -867,7 +873,7 @@ out_be:
case BPF_ALU | BPF_MOV | BPF_K:
case BPF_ALU64 | BPF_MOV | BPF_K:
emit_imm(rd, imm, ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
@@ -882,7 +888,7 @@ out_be:
emit(is64 ? rv_add(rd, rd, RV_REG_T1) :
rv_addw(rd, rd, RV_REG_T1), ctx);
}
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_SUB | BPF_K:
@@ -895,7 +901,7 @@ out_be:
emit(is64 ? rv_sub(rd, rd, RV_REG_T1) :
rv_subw(rd, rd, RV_REG_T1), ctx);
}
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_AND | BPF_K:
@@ -906,7 +912,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(rv_and(rd, rd, RV_REG_T1), ctx);
}
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_OR | BPF_K:
@@ -917,7 +923,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(rv_or(rd, rd, RV_REG_T1), ctx);
}
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_XOR | BPF_K:
@@ -928,7 +934,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(rv_xor(rd, rd, RV_REG_T1), ctx);
}
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MUL | BPF_K:
@@ -936,7 +942,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_mul(rd, rd, RV_REG_T1) :
rv_mulw(rd, rd, RV_REG_T1), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_DIV | BPF_K:
@@ -944,7 +950,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_divu(rd, rd, RV_REG_T1) :
rv_divuw(rd, rd, RV_REG_T1), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_MOD | BPF_K:
@@ -952,7 +958,7 @@ out_be:
emit_imm(RV_REG_T1, imm, ctx);
emit(is64 ? rv_remu(rd, rd, RV_REG_T1) :
rv_remuw(rd, rd, RV_REG_T1), ctx);
- if (!is64)
+ if (!is64 && !aux->verifier_zext)
emit_zext_32(rd, ctx);
break;
case BPF_ALU | BPF_LSH | BPF_K:
@@ -1239,6 +1245,8 @@ out_be:
emit_imm(RV_REG_T1, off, ctx);
emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
emit(rv_lbu(rd, 0, RV_REG_T1), ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_LDX | BPF_MEM | BPF_H:
if (is_12b_int(off)) {
@@ -1249,6 +1257,8 @@ out_be:
emit_imm(RV_REG_T1, off, ctx);
emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
emit(rv_lhu(rd, 0, RV_REG_T1), ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_LDX | BPF_MEM | BPF_W:
if (is_12b_int(off)) {
@@ -1259,6 +1269,8 @@ out_be:
emit_imm(RV_REG_T1, off, ctx);
emit(rv_add(RV_REG_T1, RV_REG_T1, rs), ctx);
emit(rv_lwu(rd, 0, RV_REG_T1), ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_LDX | BPF_MEM | BPF_DW:
if (is_12b_int(off)) {
@@ -1503,6 +1515,11 @@ static void bpf_flush_icache(void *start, void *end)
flush_icache_range((unsigned long)start, (unsigned long)end);
}
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
{
bool tmp_blinded = false, extra_pass = false;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 5e7c63033159..e636728ab452 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -299,9 +299,11 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
#define EMIT_ZERO(b1) \
({ \
- /* llgfr %dst,%dst (zero extend to 64 bit) */ \
- EMIT4(0xb9160000, b1, b1); \
- REG_SET_SEEN(b1); \
+ if (!fp->aux->verifier_zext) { \
+ /* llgfr %dst,%dst (zero extend to 64 bit) */ \
+ EMIT4(0xb9160000, b1, b1); \
+ REG_SET_SEEN(b1); \
+ } \
})
/*
@@ -520,6 +522,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
case BPF_ALU | BPF_MOV | BPF_X: /* dst = (u32) src */
/* llgfr %dst,%src */
EMIT4(0xb9160000, dst_reg, src_reg);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case BPF_ALU64 | BPF_MOV | BPF_X: /* dst = src */
/* lgr %dst,%src */
@@ -528,6 +532,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
case BPF_ALU | BPF_MOV | BPF_K: /* dst = (u32) imm */
/* llilf %dst,imm */
EMIT6_IMM(0xc00f0000, dst_reg, imm);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case BPF_ALU64 | BPF_MOV | BPF_K: /* dst = imm */
/* lgfi %dst,imm */
@@ -639,6 +645,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
EMIT4(0xb9970000, REG_W0, src_reg);
/* llgfr %dst,%rc */
EMIT4(0xb9160000, dst_reg, rc_reg);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
}
case BPF_ALU64 | BPF_DIV | BPF_X: /* dst = dst / src */
@@ -676,6 +684,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
EMIT_CONST_U32(imm));
/* llgfr %dst,%rc */
EMIT4(0xb9160000, dst_reg, rc_reg);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
}
case BPF_ALU64 | BPF_DIV | BPF_K: /* dst = dst / imm */
@@ -864,10 +874,13 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
case 16: /* dst = (u16) cpu_to_be16(dst) */
/* llghr %dst,%dst */
EMIT4(0xb9850000, dst_reg, dst_reg);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case 32: /* dst = (u32) cpu_to_be32(dst) */
- /* llgfr %dst,%dst */
- EMIT4(0xb9160000, dst_reg, dst_reg);
+ if (!fp->aux->verifier_zext)
+ /* llgfr %dst,%dst */
+ EMIT4(0xb9160000, dst_reg, dst_reg);
break;
case 64: /* dst = (u64) cpu_to_be64(dst) */
break;
@@ -882,12 +895,15 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
EMIT4_DISP(0x88000000, dst_reg, REG_0, 16);
/* llghr %dst,%dst */
EMIT4(0xb9850000, dst_reg, dst_reg);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case 32: /* dst = (u32) cpu_to_le32(dst) */
/* lrvr %dst,%dst */
EMIT4(0xb91f0000, dst_reg, dst_reg);
- /* llgfr %dst,%dst */
- EMIT4(0xb9160000, dst_reg, dst_reg);
+ if (!fp->aux->verifier_zext)
+ /* llgfr %dst,%dst */
+ EMIT4(0xb9160000, dst_reg, dst_reg);
break;
case 64: /* dst = (u64) cpu_to_le64(dst) */
/* lrvgr %dst,%dst */
@@ -968,16 +984,22 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
/* llgc %dst,0(off,%src) */
EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, REG_0, off);
jit->seen |= SEEN_MEM;
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
/* llgh %dst,0(off,%src) */
EMIT6_DISP_LH(0xe3000000, 0x0091, dst_reg, src_reg, REG_0, off);
jit->seen |= SEEN_MEM;
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
/* llgf %dst,off(%src) */
jit->seen |= SEEN_MEM;
EMIT6_DISP_LH(0xe3000000, 0x0016, dst_reg, src_reg, REG_0, off);
+ if (insn_is_zext(&insn[1]))
+ insn_count = 2;
break;
case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
/* lg %dst,0(off,%src) */
@@ -1282,6 +1304,11 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp)
return 0;
}
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
/*
* Compile eBPF program "fp"
*/
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 65428e79b2f3..3364e2a00989 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -908,6 +908,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
emit_alu3_K(SRL, src, 0, dst, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_ALU64 | BPF_MOV | BPF_X:
emit_reg_move(src, dst, ctx);
@@ -942,6 +944,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_ALU | BPF_DIV | BPF_X:
emit_write_y(G0, ctx);
emit_alu(DIV, src, dst, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_ALU64 | BPF_DIV | BPF_X:
emit_alu(UDIVX, src, dst, ctx);
@@ -975,6 +979,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break;
case BPF_ALU | BPF_RSH | BPF_X:
emit_alu(SRL, src, dst, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_ALU64 | BPF_RSH | BPF_X:
emit_alu(SRLX, src, dst, ctx);
@@ -997,9 +1003,12 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case 16:
emit_alu_K(SLL, dst, 16, ctx);
emit_alu_K(SRL, dst, 16, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case 32:
- emit_alu_K(SRL, dst, 0, ctx);
+ if (!ctx->prog->aux->verifier_zext)
+ emit_alu_K(SRL, dst, 0, ctx);
break;
case 64:
/* nop */
@@ -1021,6 +1030,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_alu3_K(AND, dst, 0xff, dst, ctx);
emit_alu3_K(SLL, tmp, 8, tmp, ctx);
emit_alu(OR, tmp, dst, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case 32:
@@ -1037,6 +1048,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_alu3_K(AND, dst, 0xff, dst, ctx); /* dst = dst & 0xff */
emit_alu3_K(SLL, dst, 24, dst, ctx); /* dst = dst << 24 */
emit_alu(OR, tmp, dst, ctx); /* dst = dst | tmp */
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case 64:
@@ -1050,6 +1063,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
/* dst = imm */
case BPF_ALU | BPF_MOV | BPF_K:
emit_loadimm32(imm, dst, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_ALU64 | BPF_MOV | BPF_K:
emit_loadimm_sext(imm, dst, ctx);
@@ -1132,6 +1147,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break;
case BPF_ALU | BPF_RSH | BPF_K:
emit_alu_K(SRL, dst, imm, ctx);
+ if (insn_is_zext(&insn[1]))
+ return 1;
break;
case BPF_ALU64 | BPF_RSH | BPF_K:
emit_alu_K(SRLX, dst, imm, ctx);
@@ -1144,7 +1161,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break;
do_alu32_trunc:
- if (BPF_CLASS(code) == BPF_ALU)
+ if (BPF_CLASS(code) == BPF_ALU &&
+ !ctx->prog->aux->verifier_zext)
emit_alu_K(SRL, dst, 0, ctx);
break;
@@ -1265,6 +1283,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
rs2 = RS2(tmp);
}
emit(opcode | RS1(src) | rs2 | RD(dst), ctx);
+ if (opcode != LD64 && insn_is_zext(&insn[1]))
+ return 1;
break;
}
/* ST: *(size *)(dst + off) = imm */
@@ -1432,6 +1452,11 @@ static void jit_fill_hole(void *area, unsigned int size)
*ptr++ = 0x91d02005; /* ta 5 */
}
+bool bpf_jit_needs_zext(void)
+{
+ return true;
+}
+
struct sparc64_jit_data {
struct bpf_binary_header *header;
u8 *image;
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index b29e82f190c7..133433d181ba 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -253,13 +253,14 @@ static inline void emit_ia32_mov_r(const u8 dst, const u8 src, bool dstk,
/* dst = src */
static inline void emit_ia32_mov_r64(const bool is64, const u8 dst[],
const u8 src[], bool dstk,
- bool sstk, u8 **pprog)
+ bool sstk, u8 **pprog,
+ const struct bpf_prog_aux *aux)
{
emit_ia32_mov_r(dst_lo, src_lo, dstk, sstk, pprog);
if (is64)
/* complete 8 byte move */
emit_ia32_mov_r(dst_hi, src_hi, dstk, sstk, pprog);
- else
+ else if (!aux->verifier_zext)
/* zero out high 4 bytes */
emit_ia32_mov_i(dst_hi, 0, dstk, pprog);
}