summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2019-10-17selftest/bpf: Remove test_libbpf.sh and test_libbpf_openAndrii Nakryiko
test_progs is much more sophisticated superset of tests compared to test_libbpf.sh and test_libbpf_open. Remove test_libbpf.sh and test_libbpf_open. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-8-andriin@fb.com
2019-10-17selftests/bpf: Move test_queue_stack_map.h into progs/ where it belongsAndrii Nakryiko
test_queue_stack_map.h is used only from BPF programs. Thus it should be part of progs/ subdir. An added benefit of moving it there is that new TEST_RUNNER_DEFINE_RULES macro-rule will properly capture dependency on this header for all BPF objects and trigger re-build, if it changes. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-7-andriin@fb.com
2019-10-17selftests/bpf: Replace test_progs and test_maps w/ general ruleAndrii Nakryiko
Define test runner generation meta-rule that codifies dependencies between test runner, its tests, and its dependent BPF programs. Use that for defining test_progs and test_maps test-runners. Also additionally define 2 flavors of test_progs: - alu32, which builds BPF programs with 32-bit registers codegen; - bpf_gcc, which build BPF programs using GCC, if it supports BPF target. Overall, this is accomplished through $(eval)'ing a set of generic rules, which defines Makefile targets dynamically at runtime. See comments explaining the need for 2 $(evals), though. For each test runner we have (test_maps and test_progs, currently), and, optionally, their flavors, the logic of build process is modeled as follows (using test_progs as an example): - all BPF objects are in progs/: - BPF object's .o file is built into output directory from corresponding progs/.c file; - all BPF objects in progs/*.c depend on all progs/*.h headers; - all BPF objects depend on bpf_*.h helpers from libbpf (but not libbpf archive). There is an extra rule to trigger bpf_helper_defs.h (re-)build, if it's not present/outdated); - build recipe for BPF object can be re-defined per test runner/flavor; - test files are built from prog_tests/*.c: - all such test file objects are built on individual file basis; - currently, every single test file depends on all BPF object files; this might be improved in follow up patches to do 1-to-1 dependency, but allowing to customize this per each individual test; - each test runner definition can specify a list of extra .c and .h files to be built along test files and test runner binary; all such headers are becoming automatic dependency of each test .c file; - due to test files sometimes embedding (using .incbin assembly directive) contents of some BPF objects at compilation time, which are expected to be in CWD of compiler, compilation for test file object does cd into test runner's output directory; to support this mode all the include paths are turned into absolute paths using $(abspath) make function; - prog_tests/test.h is automatically (re-)generated with an entry for each .c file in prog_tests/; - final test runner binary is linked together from test object files and extra object files, linking together libbpf's archive as well; - it's possible to specify extra "resource" files/targets, which will be copied into test runner output directory, if it differes from Makefile-wide $(OUTPUT). This is used to ensure btf_dump test cases and urandom_read binary is put into a test runner's CWD for tests to find them in runtime. For flavored test runners, their output directory is a subdirectory of common Makefile-wide $(OUTPUT) directory with flavor name used as subdirectory name. BPF objects targets might be reused between different test runners, so extra checks are employed to not double-define them. Similarly, we have redefinition guards for output directories and test headers. test_verifier follows slightly different patterns and is simple enough to not justify generalizing TEST_RUNNER_DEFINE/TEST_RUNNER_DEFINE_RULES further to accomodate these differences. Instead, rules for test_verifier are minimized and simplified, while preserving correctness of dependencies. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-6-andriin@fb.com
2019-10-17selftests/bpf: Add simple per-test targets to MakefileAndrii Nakryiko
Currently it's impossible to do `make test_progs` and have only test_progs be built, because all the binary targets are defined in terms of $(OUTPUT)/<binary>, and $(OUTPUT) is absolute path to current directory (or whatever gets overridden to by user). This patch adds simple re-directing targets for all test targets making it possible to do simple and nice `make test_progs` (and any other target). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-5-andriin@fb.com
2019-10-17selftests/bpf: Switch test_maps to test_progs' test.h formatAndrii Nakryiko
Make test_maps use tests.h header format consistent with the one used by test_progs, to facilitate unification. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-4-andriin@fb.com
2019-10-17selftests/bpf: Make CO-RE reloc test impartial to test_progs flavorAndrii Nakryiko
test_core_reloc_kernel test captures its own process name and validates it as part of the test. Given extra "flavors" of test_progs, this break for anything by default test_progs binary. Fix the test to cut out flavor part of the process name. Fixes: ee2eb063d330 ("selftests/bpf: Add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-3-andriin@fb.com
2019-10-17selftests/bpf: Teach test_progs to cd into subdirAndrii Nakryiko
We are building a bunch of "flavors" of test_progs, e.g., w/ alu32 flag for Clang when building BPF object. test_progs setup is relying on having all the BPF object files and extra resources to be available in current working directory, though. But we actually build all these files into a separate sub-directory. Next set of patches establishes convention of naming "flavored" test_progs (and test runner binaries in general) as test_progs-flavor (e.g., test_progs-alu32), for each such extra flavor. This patch teaches test_progs binary to automatically detect its own extra flavor based on its argv[0], and if present, to change current directory to a flavor-specific subdirectory. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191016060051.2024182-2-andriin@fb.com
2019-10-17selftests/bpf: Restore the netns after flow dissector reattach testJakub Sitnicki
flow_dissector_reattach test changes the netns we run in but does not restore it to the one we started in when finished. This interferes with tests that run after it. Fix it by restoring the netns when done. Fixes: f97eea1756f3 ("selftests/bpf: Check that flow dissector can be re-attached") Reported-by: Alexei Starovoitov <ast@kernel.org> Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191017083752.30999-1-jakub@cloudflare.com
2019-10-17Merge branch 'bpf-btf-trace'Daniel Borkmann
Alexei Starovoitov says: ==================== v2->v3: - while trying to adopt btf-based tracing in production service realized that disabling bpf_probe_read() was premature. The real tracing program needs to see much more than this type safe tracking can provide. With these patches the verifier will be able to see that skb->data is a pointer to 'u8 *', but it cannot possibly know how many bytes of it is readable. Hence bpf_probe_read() is necessary to do basic packet reading from tracing program. Some helper can be introduced to solve this particular problem, but there are other similar structures. Another issue is bitfield reading. The support for bitfields is coming to llvm. libbpf will be supporting it eventually as well, but there will be corner cases where bpf_probe_read() is necessary. The long term goal is still the same: get rid of probe_read eventually. - fixed build issue with clang reported by Nathan Chancellor. - addressed a ton of comments from Andrii. bitfields and arrays are explicitly unsupported in btf-based tracking. This will be improved in the future. Right now the verifier is more strict than necessary. In some cases it can fall back to 'scalar' instead of rejecting the program, but rejection today allows to make better decisions in the future. - adjusted testcase to demo bitfield and skb->data reading. v1->v2: - addressed feedback from Andrii and Eric. Thanks a lot for review! - added missing check at raw_tp attach time. - Andrii noticed that expected_attach_type cannot be reused. Had to introduce new field to bpf_attr. - cleaned up logging nicely by introducing bpf_log() helper. - rebased. Revolutionize bpf tracing and bpf C programming. C language allows any pointer to be typecasted to any other pointer or convert integer to a pointer. Though bpf verifier is operating at assembly level it has strict type checking for fixed number of types. Known types are defined in 'enum bpf_reg_type'. For example: PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys' PTR_TO_SOCKET is a pointer to 'struct bpf_sock', and so on. When it comes to bpf tracing there are no types to track. bpf+kprobe receives 'struct pt_regs' as input. bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values. It was up to bpf program to interpret these integers. Typical tracing program looks like: int bpf_prog(struct pt_regs *ctx) { struct net_device *dev; struct sk_buff *skb; int ifindex; skb = (struct sk_buff *) ctx->di; bpf_probe_read(&dev, sizeof(dev), &skb->dev); bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex); } Addressing mistakes will not be caught by C compiler or by the verifier. The program above could have typecasted ctx->si to skb and page faulted on every bpf_probe_read(). bpf_probe_read() allows reading any address and suppresses page faults. Typical program has hundreds of bpf_probe_read() calls to walk kernel data structures. Not only tracing program would be slow, but there was always a risk that bpf_probe_read() would read mmio region of memory and cause unpredictable hw behavior. With introduction of Compile Once Run Everywhere technology in libbpf and in LLVM and BPF Type Format (BTF) the verifier is finally ready for the next step in program verification. Now it can use in-kernel BTF to type check bpf assembly code. Equivalent program will look like: struct trace_kfree_skb { struct sk_buff *skb; void *location; }; SEC("raw_tracepoint/kfree_skb") int trace_kfree_skb(struct trace_kfree_skb* ctx) { struct sk_buff *skb = ctx->skb; struct net_device *dev; int ifindex; __builtin_preserve_access_index(({ dev = skb->dev; ifindex = dev->ifindex; })); } These patches teach bpf verifier to recognize kfree_skb's first argument as 'struct sk_buff *' because this is what kernel C code is doing. The bpf program cannot 'cheat' and say that the first argument to kfree_skb raw_tracepoint is some other type. The verifier will catch such type mismatch between bpf program assumption of kernel code and the actual type in the kernel. Furthermore skb->dev access is type tracked as well. The verifier can see which field of skb is being read in bpf assembly. It will match offset to type. If bpf program has code: struct net_device *dev = (void *)skb->len; C compiler will not complain and generate bpf assembly code, but the verifier will recognize that integer 'len' field is being accessed at offsetof(struct sk_buff, len) and will reject further dereference of 'dev' variable because it contains integer value instead of a pointer. Such sophisticated type tracking allows calling networking bpf helpers from tracing programs. This patchset allows calling bpf_skb_event_output() that dumps skb data into perf ring buffer. It greatly improves observability. Now users can not only see packet lenth of the skb about to be freed in kfree_skb() kernel function, but can dump it to user space via perf ring buffer using bpf helper that was previously available only to TC and socket filters. See patch 10 for full example. The end result is safer and faster bpf tracing. Safer - because type safe direct load can be used most of the time instead of bpf_probe_read(). Faster - because direct loads are used to walk kernel data structures instead of bpf_probe_read() calls. Note that such loads can page fault and are supported by hidden bpf_probe_read() in interpreter and via exception table if program is JITed. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-10-17selftests/bpf: Add kfree_skb raw_tp testAlexei Starovoitov
Load basic cls_bpf program. Load raw_tracepoint program and attach to kfree_skb raw tracepoint. Trigger cls_bpf via prog_test_run. At the end of test_run kernel will call kfree_skb which will trigger trace_kfree_skb tracepoint. Which will call our raw_tracepoint program. Which will take that skb and will dump it into perf ring buffer. Check that user space received correct packet. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-12-ast@kernel.org
2019-10-17bpf: Check types of arguments passed into helpersAlexei Starovoitov
Introduce new helper that reuses existing skb perf_event output implementation, but can be called from raw_tracepoint programs that receive 'struct sk_buff *' as tracepoint argument or can walk other kernel data structures to skb pointer. In order to do that teach verifier to resolve true C types of bpf helpers into in-kernel BTF ids. The type of kernel pointer passed by raw tracepoint into bpf program will be tracked by the verifier all the way until it's passed into helper function. For example: kfree_skb() kernel function calls trace_kfree_skb(skb, loc); bpf programs receives that skb pointer and may eventually pass it into bpf_skb_output() bpf helper which in-kernel is implemented via bpf_skb_event_output() kernel function. Its first argument in the kernel is 'struct sk_buff *'. The verifier makes sure that types match all the way. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
2019-10-17bpf: Add support for BTF pointers to x86 JITAlexei Starovoitov
Pointer to BTF object is a pointer to kernel object or NULL. Such pointers can only be used by BPF_LDX instructions. The verifier changed their opcode from LDX|MEM|size to LDX|PROBE_MEM|size to make JITing easier. The number of entries in extable is the number of BPF_LDX insns that access kernel memory via "pointer to BTF type". Only these load instructions can fault. Since x86 extable is relative it has to be allocated in the same memory region as JITed code. Allocate it prior to last pass of JITing and let the last pass populate it. Pointer to extable in bpf_prog_aux is necessary to make page fault handling fast. Page fault handling is done in two steps: 1. bpf_prog_kallsyms_find() finds BPF program that page faulted. It's done by walking rb tree. 2. then extable for given bpf program is binary searched. This process is similar to how page faulting is done for kernel modules. The exception handler skips over faulting x86 instruction and initializes destination register with zero. This mimics exact behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed). JITs for other architectures can add support in similar way. Until then they will reject unknown opcode and fallback to interpreter. Since extable should be aligned and placed near JITed code make bpf_jit_binary_alloc() return 4 byte aligned image offset, so that extable aligning formula in bpf_int_jit_compile() doesn't need to rely on internal implementation of bpf_jit_binary_alloc(). On x86 gcc defaults to 16-byte alignment for regular kernel functions due to better performance. JITed code may be aligned to 16 in the future, but it will use 4 in the meantime. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
2019-10-17bpf: Add support for BTF pointers to interpreterAlexei Starovoitov
Pointer to BTF object is a pointer to kernel object or NULL. The memory access in the interpreter has to be done via probe_kernel_read to avoid page faults. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
2019-10-17bpf: Attach raw_tp program with BTF via type nameAlexei Starovoitov
BTF type id specified at program load time has all necessary information to attach that program to raw tracepoint. Use kernel type name to find raw tracepoint. Add missing CHECK_ATTR() condition. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-8-ast@kernel.org
2019-10-17bpf: Implement accurate raw_tp context access via BTFAlexei Starovoitov
libbpf analyzes bpf C program, searches in-kernel BTF for given type name and stores it into expected_attach_type. The kernel verifier expects this btf_id to point to something like: typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc); which represents signature of raw_tracepoint "kfree_skb". Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb' and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint. In first case it passes btf_id of 'struct sk_buff *' back to the verifier core and 'void *' in second case. Then the verifier tracks PTR_TO_BTF_ID as any other pointer type. Like PTR_TO_SOCKET points to 'struct bpf_sock', PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on. PTR_TO_BTF_ID points to in-kernel structs. If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF then PTR_TO_BTF_ID#1234 points to one of in kernel skbs. When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32) the btf_struct_access() checks which field of 'struct sk_buff' is at offset 32. Checks that size of access matches type definition of the field and continues to track the dereferenced type. If that field was a pointer to 'struct net_device' the r2's type will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device' in vmlinux's BTF. Such verifier analysis prevents "cheating" in BPF C program. The program cannot cast arbitrary pointer to 'struct sk_buff *' and access it. C compiler would allow type cast, of course, but the verifier will notice type mismatch based on BPF assembly and in-kernel BTF. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
2019-10-17libbpf: Auto-detect btf_id of BTF-based raw_tracepointsAlexei Starovoitov
It's a responsiblity of bpf program author to annotate the program with SEC("tp_btf/name") where "name" is a valid raw tracepoint. The libbpf will try to find "name" in vmlinux BTF and error out in case vmlinux BTF is not available or "name" is not found. If "name" is indeed a valid raw tracepoint then in-kernel BTF will have "btf_trace_##name" typedef that points to function prototype of that raw tracepoint. BTF description captures exact argument the kernel C code is passing into raw tracepoint. The kernel verifier will check the types while loading bpf program. libbpf keeps BTF type id in expected_attach_type, but since kernel ignores this attribute for tracing programs copy it into attach_btf_id attribute before loading. Later the kernel will use prog->attach_btf_id to select raw tracepoint during bpf_raw_tracepoint_open syscall command. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-6-ast@kernel.org
2019-10-17bpf: Add attach_btf_id attribute to program loadAlexei Starovoitov
Add attach_btf_id attribute to prog_load command. It's similar to existing expected_attach_type attribute which is used in several cgroup based program types. Unfortunately expected_attach_type is ignored for tracing programs and cannot be reused for new purpose. Hence introduce attach_btf_id to verify bpf programs against given in-kernel BTF type id at load time. It is strictly checked to be valid for raw_tp programs only. In a later patches it will become: btf_id == 0 semantics of existing raw_tp progs. btd_id > 0 raw_tp with BTF and additional type safety. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-5-ast@kernel.org
2019-10-17bpf: Process in-kernel BTFAlexei Starovoitov
If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux' for further use by the verifier. In-kernel BTF is trusted just like kallsyms and other build artifacts embedded into vmlinux. Yet run this BTF image through BTF verifier to make sure that it is valid and it wasn't mangled during the build. If in-kernel BTF is incorrect it means either gcc or pahole or kernel are buggy. In such case disallow loading BPF programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
2019-10-17bpf: Add typecast to bpf helpers to help BTF generationAlexei Starovoitov
When pahole converts dwarf to btf it emits only used types. Wrap existing bpf helper functions into typedef and use it in typecast to make gcc emits this type into dwarf. Then pahole will convert it to btf. The "btf_#name_of_helper" types will be used to figure out types of arguments of bpf helpers. The generated code before and after is the same. Only dwarf and btf sections are different. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-3-ast@kernel.org
2019-10-17bpf: Add typecast to raw_tracepoints to help BTF generationAlexei Starovoitov
When pahole converts dwarf to btf it emits only used types. Wrap existing __bpf_trace_##template() function into btf_trace_##template typedef and use it in type cast to make gcc emits this type into dwarf. Then pahole will convert it to btf. The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-2-ast@kernel.org
2019-10-16bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()Song Liu
bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A deadlock on rq_lock(): rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [...] Call Trace: try_to_wake_up+0x1ad/0x590 wake_up_q+0x54/0x80 rwsem_wake+0x8a/0xb0 bpf_get_stack+0x13c/0x150 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000 bpf_overflow_handler+0x60/0x100 __perf_event_overflow+0x4f/0xf0 perf_swevent_overflow+0x99/0xc0 ___perf_sw_event+0xe7/0x120 __schedule+0x47d/0x620 schedule+0x29/0x90 futex_wait_queue_me+0xb9/0x110 futex_wait+0x139/0x230 do_futex+0x2ac/0xa50 __x64_sys_futex+0x13c/0x180 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This can be reproduced by: 1. Start a multi-thread program that does parallel mmap() and malloc(); 2. taskset the program to 2 CPUs; 3. Attach bpf program to trace_sched_switch and gather stackmap with build-id, e.g. with trace.py from bcc tools: trace.py -U -p <pid> -s <some-bin,some-lib> t:sched:sched_switch A sample reproducer is attached at the end. This could also trigger deadlock with other locks that are nested with rq_lock. Fix this by checking whether irqs are disabled. Since rq_lock and all other nested locks are irq safe, it is safe to do up_read() when irqs are not disable. If the irqs are disabled, postpone up_read() in irq_work. Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com Reproducer: ============================ 8< ============================ char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } ============================ 8< ============================
2019-10-16scripts/bpf: Emit an #error directive known types list needs updatingJakub Sitnicki
Make the compiler report a clear error when bpf_helpers_doc.py needs updating rather than rely on the fact that Clang fails to compile English: ../../../lib/bpf/bpf_helper_defs.h:2707:1: error: unknown type name 'Unrecognized' Unrecognized type 'struct bpf_inet_lookup', please add it to known types! Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191016085811.11700-1-jakub@cloudflare.com
2019-10-15selftests: bpf: Don't try to read files without read permissionJiri Pirko
Recently couple of files that are write only were added to netdevsim debugfs. Don't read these files and avoid error. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-15selftests: bpf: Add selftest for __sk_buff tstampStanislav Fomichev
Make sure BPF_PROG_TEST_RUN accepts tstamp and exports any modifications that BPF program does. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191015183125.124413-2-sdf@google.com
2019-10-15bpf: Allow __sk_buff tstamp in BPF_PROG_TEST_RUNStanislav Fomichev
It's useful for implementing EDT related tests (set tstamp, run the test, see how the tstamp is changed or observe some other parameter). Note that bpf_ktime_get_ns() helper is using monotonic clock, so for the BPF programs that compare tstamp against it, tstamp should be derived from clock_gettime(CLOCK_MONOTONIC, ...). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191015183125.124413-1-sdf@google.com
2019-10-15Merge branch 'libbpf-field-existence'Alexei Starovoitov
Andrii Nakryiko says: ==================== This patch set generalizes libbpf's CO-RE relocation support. In addition to existing field's byte offset relocation, libbpf now supports field existence relocations, which are emitted by Clang when using __builtin_preserve_field_info(<field>, BPF_FIELD_EXISTS). A convenience bpf_core_field_exists() macro is added to bpf_core_read.h BPF-side header, along the bpf_field_info_kind enum containing currently supported types of field information libbpf supports. This list will grow as libbpf gains support for other relo kinds. This patch set upgrades the format of .BTF.ext's relocation record to match latest Clang's format (12 -> 16 bytes). This is not a breaking change, as the previous format hasn't been released yet as part of official Clang version release. v1->v2: - unify bpf_field_info_kind enum and naming changes (Alexei); - added bpf_core_field_exists() to bpf_core_read.h. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-10-15selftests/bpf: Add field existence CO-RE relocs testsAndrii Nakryiko
Add a bunch of tests validating CO-RE is handling field existence relocation. Relaxed CO-RE relocation mode is activated for these new tests to prevent libbpf from rejecting BPF object for no-match relocation, even though test BPF program is not going to use that relocation, if field is missing. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191015182849.3922287-6-andriin@fb.com
2019-10-15libbpf: Add BPF-side definitions of supported field relocation kindsAndrii Nakryiko
Add enum definition for Clang's __builtin_preserve_field_info() second argument (info_kind). Currently only byte offset and existence are supported. Corresponding Clang changes introducing this built-in can be found at [0] [0] https://reviews.llvm.org/D67980 Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191015182849.3922287-5-andriin@fb.com
2019-10-15libbpf: Add support for field existance CO-RE relocationAndrii Nakryiko
Add support for BPF_FRK_EXISTS relocation kind to detect existence of captured field in a destination BTF, allowing conditional logic to handle incompatible differences between kernels. Also introduce opt-in relaxed CO-RE relocation handling option, which makes libbpf emit warning for failed relocations, but proceed with other relocations. Instruction, for which relocation failed, is patched with (u32)-1 value. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191015182849.3922287-4-andriin@fb.com
2019-10-15libbpf: Refactor bpf_object__open APIs to use common optsAndrii Nakryiko
Refactor all the various bpf_object__open variations to ultimately specify common bpf_object_open_opts struct. This makes it easy to keep extending this common struct w/ extra parameters without having to update all the legacy APIs. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191015182849.3922287-3-andriin@fb.com
2019-10-15libbpf: Update BTF reloc support to latest Clang formatAndrii Nakryiko
BTF offset reloc was generalized in recent Clang into field relocation, capturing extra u32 field, specifying what aspect of captured field needs to be relocated. This changes .BTF.ext's record size for this relocation from 12 bytes to 16 bytes. Given these format changes happened in Clang before official released version, it's ok to not support outdated 12-byte record size w/o breaking ABI. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191015182849.3922287-2-andriin@fb.com
2019-10-15net: Update address for vrf and l3mdev in MAINTAINERSDavid Ahern
Use my kernel.org address for all entries in MAINTAINERS. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15Merge branch 'Scatter-gather-SPI-for-SJA1105-DSA'David S. Miller
Vladimir Oltean says: ==================== Scatter/gather SPI for SJA1105 DSA This is a small series that reduces the stack memory usage for the sja1105 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15net: dsa: sja1105: Switch to scatter/gather API for SPIVladimir Oltean
This reworks the SPI transfer implementation to make use of more of the SPI core features. The main benefit is to avoid the memcpy in sja1105_xfer_buf(). The memcpy was only needed because the function was transferring a single buffer at a time. So it needed to copy the caller-provided buffer at buf + 4, to store the SPI message header in the "headroom" area. But the SPI core supports scatter-gather messages, comprised of multiple transfers. We can actually use those to break apart every SPI message into 2 transfers: one for the header and one for the actual payload. To keep the behavior the same regarding the chip select signal, it is necessary to tell the SPI core to de-assert the chip select after each chunk. This was not needed before, because each spi_message contained only 1 single transfer. The meaning of the per-transfer cs_change=1 is: - If the transfer is the last one of the message, keep CS asserted - Otherwise, deassert CS We need to deassert CS in the "otherwise" case, which was implicit before. Avoiding the memcpy creates yet another opportunity. The device can't process more than 256 bytes of SPI payload at a time, so the sja1105_xfer_long_buf() function used to exist, to split the larger caller buffer into chunks. But these chunks couldn't be used as scatter/gather buffers for spi_message until now, because of that memcpy (we would have needed more memory for each chunk). So we can now remove the sja1105_xfer_long_buf() function and have a single implementation for long and short buffers. Another benefit is lower usage of stack memory. Previously we had to store 2 SPI buffers for each chunk. Due to the elimination of the memcpy, we can now send pointers to the actual chunks from the caller-supplied buffer to the SPI core. Since the patch merges two functions into a rewritten implementation, the function prototype was also changed, mainly for cosmetic consistency with the structures used within it. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15net: dsa: sja1105: Move sja1105_spi_transfer into sja1105_xferVladimir Oltean
This is a cosmetic patch that reduces some boilerplate in the SPI interaction of the driver. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15net: b44: remove redundant assignment to variable regColin Ian King
The variable reg is being assigned a value that is never read and is being re-assigned in the following for-loop. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA'David S. Miller
Vladimir Oltean says: ==================== PTP driver refactoring for SJA1105 DSA This series creates a better separation between the driver core and the PTP portion. Therefore, users who are not interested in PTP can get a simpler and smaller driver by compiling it out. This is in preparation for further patches: SPI transfer timestamping, synchronizing the hardware clock (as opposed to keeping it free-running), PPS input/output, etc. ==================== Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Change the PTP command access patternVladimir Oltean
The PTP command register contains enable bits for: - Putting the 64-bit PTPCLKVAL register in add/subtract or write mode - Taking timestamps off of the corrected vs free-running clock - Starting/stopping the TTEthernet scheduling - Starting/stopping PPS output - Resetting the switch When a command needs to be issued (e.g. "change the PTPCLKVAL from write mode to add/subtract mode"), one cannot simply write to the command register setting the PTPCLKADD bit to 1, because that would zeroize the other settings. One also cannot do a read-modify-write (that would be too easy for this hardware) because not all bits of the command register are readable over SPI. So this leaves us with the only option of keeping the value of the PTP command register in the driver, and operating on that. Actually there are 2 types of PTP operations now: - Operations that modify the cached PTP command. These operate on ptp_data->cmd as a pointer. - Operations that apply all previously cached PTP settings, but don't otherwise cache what they did themselves. The sja1105_ptp_reset function is such an example. It copies the ptp_data->cmd on stack before modifying and writing it to SPI. This practically means that struct sja1105_ptp_cmd is no longer an implementation detail, since it needs to be stored in full into struct sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd) function prototype can change and take struct sja1105_ptp_cmd as second argument now. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Move PTP data to its own private structureVladimir Oltean
This is a non-functional change with 2 goals (both for the case when CONFIG_NET_DSA_SJA1105_PTP is not enabled): - Reduce the size of the sja1105_private structure. - Make the PTP code more self-contained. Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a leftover: it will be used in a future patch "net: dsa: sja1105: Restore PTP time after switch reset". Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Make all public PTP functions take dsa_switch as argumentVladimir Oltean
The new rule (as already started for sja1105_tas.h) is for functions of optional driver components (ones which may be disabled via Kconfig - PTP and TAS) to take struct dsa_switch *ds instead of struct sja1105_private *priv as first argument. This is so that forward-declarations of struct sja1105_private can be avoided. So make sja1105_ptp.h the second user of this rule. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Get rid of global declaration of struct ptp_clock_infoVladimir Oltean
We need priv->ptp_caps to hold a structure and not just a pointer, because we use container_of in the various PTP callbacks. Therefore, the sja1105_ptp_caps structure declared in the global memory of the driver serves no further purpose after copying it into priv->ptp_caps. So just populate priv->ptp_caps with the needed operations and remove sja1105_ptp_caps. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-10-14 The following pull-request contains BPF updates for your *net-next* tree. 12 days of development and 85 files changed, 1889 insertions(+), 1020 deletions(-) The main changes are: 1) auto-generation of bpf_helper_defs.h, from Andrii. 2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h and move into libbpf, from Andrii. 3) Track contents of read-only maps as scalars in the verifier, from Andrii. 4) small x86 JIT optimization, from Daniel. 5) cross compilation support, from Ivan. 6) bpf flow_dissector enhancements, from Jakub and Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13Merge tag 'mac80211-next-for-net-next-2019-10-11' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A few more small things, nothing really stands out: * minstrel improvements from Felix * a TX aggregation simplification * some additional capabilities for hwsim * minor cleanups & docs updates ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13genetlink: do not parse attributes for families with zero maxattrMichal Kubecek
Commit c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") moved attribute buffer allocation and attribute parsing from genl_family_rcv_msg_doit() into a separate function genl_family_rcv_msg_attrs_parse() which, unlike the previous code, calls __nlmsg_parse() even if family->maxattr is 0 (i.e. the family does its own parsing). The parser error is ignored and does not propagate out of genl_family_rcv_msg_attrs_parse() but an error message ("Unknown attribute type") is set in extack and if further processing generates no error or warning, it stays there and is interpreted as a warning by userspace. Dumpit requests are not affected as genl_family_rcv_msg_dumpit() bypasses the call of genl_family_rcv_msg_attrs_parse() if family->maxattr is zero. Move this logic inside genl_family_rcv_msg_attrs_parse() so that we don't have to handle it in each caller. v3: put the check inside genl_family_rcv_msg_attrs_parse() v2: adjust also argument of genl_family_rcv_msg_attrs_free() Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13tcp: improve recv_skip_hint for tcp_zerocopy_receiveSoheil Hassas Yeganeh
tcp_zerocopy_receive() rounds down the zc->length a multiple of PAGE_SIZE. This results in two issues: - tcp_zerocopy_receive sets recv_skip_hint to the length of the receive queue if the zc->length input is smaller than the PAGE_SIZE, even though the data in receive queue could be zerocopied. - tcp_zerocopy_receive would set recv_skip_hint of 0, in cases where we have a little bit of data after the perfectly-sized packets. To fix these issues, do not store the rounded down value in zc->length. Round down the length passed to zap_page_range(), and return min(inq, zc->length) when the zap_range is 0. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-12Merge branch 'selftests-bpf-Makefile-cleanup'Alexei Starovoitov
Andrii Nakryiko says: ==================== Patch #1 enforces libbpf build to have bpf_helper_defs.h ready before test BPF programs are built. Patch #2 drops obsolete BTF/pahole detection logic from Makefile. v1->v2: - drop CPU and PROBE (Martin). ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-10-12selftests/bpf: Remove obsolete pahole/BTF support detectionAndrii Nakryiko
Given lots of selftests won't work without recent enough Clang/LLVM that fully supports BTF, there is no point in maintaining outdated BTF support detection and fall-back to pahole logic. Just assume we have everything we need. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191011220146.3798961-3-andriin@fb.com
2019-10-12selftests/bpf: Enforce libbpf build before BPF programs are builtAndrii Nakryiko
Given BPF programs rely on libbpf's bpf_helper_defs.h, which is auto-generated during libbpf build, libbpf build has to happen before we attempt progs/*.c build. Enforce it as order-only dependency. Fixes: 24f25763d6de ("libbpf: auto-generate list of BPF helper definitions") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191011220146.3798961-2-andriin@fb.com
2019-10-12Merge branch 'samples-cross-compile'Alexei Starovoitov
Ivan Khoronzhuk says: ==================== This series contains mainly fixes/improvements for cross-compilation but not only, tested for arm, arm64, and intended for any arch. Also verified on native build (not cross compilation) for x86_64 and arm, arm64. Initial RFC link: https://lkml.org/lkml/2019/8/29/1665 Prev. version: https://lkml.org/lkml/2019/10/9/1045 Besides the patches given here, the RFC also contains couple patches related to llvm clang arm: include: asm: swab: mask rev16 instruction for clang arm: include: asm: unified: mask .syntax unified for clang They are necessarily to verify arm 32 build. Also, couple more fixes were added but are not merged in bpf-next yet, they can be needed for verification/configuration steps, if not in your tree the fixes can be taken here: https://www.spinics.net/lists/netdev/msg601716.html https://www.spinics.net/lists/netdev/msg601714.html https://www.spinics.net/lists/linux-kbuild/msg23468.html Now, to build samples, SAMPLE_BPF should be enabled in config. The change touches not only cross-compilation and can have impact on other archs and build environments, so might be good idea to verify it in order to add appropriate changes, some warn options could be tuned also. All is tested on x86-64 with clang installed (has to be built containing targets for arm, arm64..., see llc --version, usually it's present already) Instructions to test native on x86_64 ================================================= Native build on x86_64 is done in usual way and shouldn't have difference except HOSTCC is now printed as CC wile building the samples. Instructions to test cross compilation on arm64 ================================================= gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) I've used sdk for TI am65x