summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2020-05-21net: sgi: ioc3-eth: Fix return value check in ioc3eth_probe()Tang Bin
In the function devm_platform_ioremap_resource(), if get resource failed, the return value is ERR_PTR() not NULL. Thus it must be replaced by IS_ERR(), or else it may result in crashes if a critical error path is encountered. Fixes: 0ce5ebd24d25 ("mfd: ioc3: Add driver for SGI IOC3 chip") Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: don't return invalid table id error when we fall back to PF_UNSPECSabrina Dubroca
In case we can't find a ->dumpit callback for the requested (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're in the same situation as if userspace had requested a PF_UNSPEC dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all the registered RTM_GETROUTE handlers. The requested table id may or may not exist for all of those families. commit ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") fixed the problem when userspace explicitly requests a PF_UNSPEC dump, but missed the fallback case. For example, when we pass ipv6.disable=1 to a kernel with CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y, the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in rtnl_dump_all, and listing IPv6 routes will unexpectedly print: # ip -6 r Error: ipv4: MR table does not exist. Dump terminated commit ae677bbb4441 introduced the dump_all_families variable, which gets set when userspace requests a PF_UNSPEC dump. However, we can't simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the fallback case to get dump_all_families == true, because some messages types (for example RTM_GETRULE and RTM_GETNEIGH) only register the PF_UNSPEC handler and use the family to filter in the kernel what is dumped to userspace. We would then export more entries, that userspace would have to filter. iproute does that, but other programs may not. Instead, this patch removes dump_all_families and updates the RTM_GETROUTE handlers to check if the family that is being dumped is their own. When it's not, which covers both the intentional PF_UNSPEC dumps (as dump_all_families did) and the fallback case, ignore the missing table id error. Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: ipip: fix wrong address family in init error pathVadim Fedorenko
In case of error with MPLS support the code is misusing AF_INET instead of AF_MPLS. Fixes: 1b69e7e6c4da ("ipip: support MPLS over IPv4") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21Merge branch 'net-tls-fix-encryption-error-path'David S. Miller
Vadim Fedorenko says: ==================== net/tls: fix encryption error path The problem with data stream corruption was found in KTLS transmit path with small socket send buffers and large amount of data. bpf_exec_tx_verdict() frees open record on any type of error including EAGAIN, ENOMEM and ENOSPC while callers are able to recover this transient errors. Also wrong error code was returned to user space in that case. This patchset fixes the problems. ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net/tls: free record only on encryption errorVadim Fedorenko
We cannot free record on any transient error because it leads to losing previos data. Check socket error to know whether record must be freed or not. Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net/tls: fix encryption error checkingVadim Fedorenko
bpf_exec_tx_verdict() can return negative value for copied variable. In that case this value will be pushed back to caller and the real error code will be lost. Fix it using signed type and checking for positive value. Fixes: d10523d0b3d7 ("net/tls: free the record on encryption error") Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21Merge branch 'net-ethernet-ti-fix-some-return-value-check'David S. Miller
Wei Yongjun says: ==================== net: ethernet: ti: fix some return value check This patchset convert cpsw_ale_create() to return PTR_ERR() only, and changed all the caller to check IS_ERR() instead of NULL. Since v2: 1) rebased on net.git, as Jakub's suggest 2) split am65-cpsw-nuss.c changes, as Grygorii's suggest ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: ethernet: ti: am65-cpsw-nuss: fix error handling of am65_cpsw_nuss_probeWei Yongjun
Convert to using IS_ERR() instead of NULL test for cpsw_ale_create() error handling. Also fix to return negative error code from this error handling case instead of 0 in. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: ethernet: ti: fix some return value check of cpsw_ale_create()Wei Yongjun
cpsw_ale_create() can return both NULL and PTR_ERR(), but all of the caller only check NULL for error handling. This patch convert it to only return PTR_ERR() in all error cases, and the caller using IS_ERR() instead of NULL test. Fixes: 4b41d3436796 ("net: ethernet: ti: cpsw: allow untagged traffic on host port") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21net: qrtr: Fix passing invalid reference to qrtr_local_enqueue()Manivannan Sadhasivam
Once the traversal of the list is completed with list_for_each_entry(), the iterator (node) will point to an invalid object. So passing this to qrtr_local_enqueue() which is outside of the iterator block is erroneous eventhough the object is not used. So fix this by passing NULL to qrtr_local_enqueue(). Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-21ethtool: count header size in reply size estimateMichal Kubecek
As ethnl_request_ops::reply_size handlers do not include common header size into calculated/estimated reply size, it needs to be added in ethnl_default_doit() and ethnl_default_notify() before allocating the message. On the other hand, strset_reply_size() should not add common header size. Fixes: 728480f12442 ("ethtool: default handlers for GET requests") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20net: nlmsg_cancel() if put fails for nhmsgStephen Worley
Fixes data remnant seen when we fail to reserve space for a nexthop group during a larger dump. If we fail the reservation, we goto nla_put_failure and cancel the message. Reproduce with the following iproute2 commands: ===================== ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add dummy3 type dummy ip link add dummy4 type dummy ip link add dummy5 type dummy ip link add dummy6 type dummy ip link add dummy7 type dummy ip link add dummy8 type dummy ip link add dummy9 type dummy ip link add dummy10 type dummy ip link add dummy11 type dummy ip link add dummy12 type dummy ip link add dummy13 type dummy ip link add dummy14 type dummy ip link add dummy15 type dummy ip link add dummy16 type dummy ip link add dummy17 type dummy ip link add dummy18 type dummy ip link add dummy19 type dummy ip link add dummy20 type dummy ip link add dummy21 type dummy ip link add dummy22 type dummy ip link add dummy23 type dummy ip link add dummy24 type dummy ip link add dummy25 type dummy ip link add dummy26 type dummy ip link add dummy27 type dummy ip link add dummy28 type dummy ip link add dummy29 type dummy ip link add dummy30 type dummy ip link add dummy31 type dummy ip link add dummy32 type dummy ip link set dummy1 up ip link set dummy2 up ip link set dummy3 up ip link set dummy4 up ip link set dummy5 up ip link set dummy6 up ip link set dummy7 up ip link set dummy8 up ip link set dummy9 up ip link set dummy10 up ip link set dummy11 up ip link set dummy12 up ip link set dummy13 up ip link set dummy14 up ip link set dummy15 up ip link set dummy16 up ip link set dummy17 up ip link set dummy18 up ip link set dummy19 up ip link set dummy20 up ip link set dummy21 up ip link set dummy22 up ip link set dummy23 up ip link set dummy24 up ip link set dummy25 up ip link set dummy26 up ip link set dummy27 up ip link set dummy28 up ip link set dummy29 up ip link set dummy30 up ip link set dummy31 up ip link set dummy32 up ip link set dummy33 up ip link set dummy34 up ip link set vrf-red up ip link set vrf-blue up ip link set dummyVRFred up ip link set dummyVRFblue up ip ro add 1.1.1.1/32 dev dummy1 ip ro add 1.1.1.2/32 dev dummy2 ip ro add 1.1.1.3/32 dev dummy3 ip ro add 1.1.1.4/32 dev dummy4 ip ro add 1.1.1.5/32 dev dummy5 ip ro add 1.1.1.6/32 dev dummy6 ip ro add 1.1.1.7/32 dev dummy7 ip ro add 1.1.1.8/32 dev dummy8 ip ro add 1.1.1.9/32 dev dummy9 ip ro add 1.1.1.10/32 dev dummy10 ip ro add 1.1.1.11/32 dev dummy11 ip ro add 1.1.1.12/32 dev dummy12 ip ro add 1.1.1.13/32 dev dummy13 ip ro add 1.1.1.14/32 dev dummy14 ip ro add 1.1.1.15/32 dev dummy15 ip ro add 1.1.1.16/32 dev dummy16 ip ro add 1.1.1.17/32 dev dummy17 ip ro add 1.1.1.18/32 dev dummy18 ip ro add 1.1.1.19/32 dev dummy19 ip ro add 1.1.1.20/32 dev dummy20 ip ro add 1.1.1.21/32 dev dummy21 ip ro add 1.1.1.22/32 dev dummy22 ip ro add 1.1.1.23/32 dev dummy23 ip ro add 1.1.1.24/32 dev dummy24 ip ro add 1.1.1.25/32 dev dummy25 ip ro add 1.1.1.26/32 dev dummy26 ip ro add 1.1.1.27/32 dev dummy27 ip ro add 1.1.1.28/32 dev dummy28 ip ro add 1.1.1.29/32 dev dummy29 ip ro add 1.1.1.30/32 dev dummy30 ip ro add 1.1.1.31/32 dev dummy31 ip ro add 1.1.1.32/32 dev dummy32 ip next add id 1 via 1.1.1.1 dev dummy1 ip next add id 2 via 1.1.1.2 dev dummy2 ip next add id 3 via 1.1.1.3 dev dummy3 ip next add id 4 via 1.1.1.4 dev dummy4 ip next add id 5 via 1.1.1.5 dev dummy5 ip next add id 6 via 1.1.1.6 dev dummy6 ip next add id 7 via 1.1.1.7 dev dummy7 ip next add id 8 via 1.1.1.8 dev dummy8 ip next add id 9 via 1.1.1.9 dev dummy9 ip next add id 10 via 1.1.1.10 dev dummy10 ip next add id 11 via 1.1.1.11 dev dummy11 ip next add id 12 via 1.1.1.12 dev dummy12 ip next add id 13 via 1.1.1.13 dev dummy13 ip next add id 14 via 1.1.1.14 dev dummy14 ip next add id 15 via 1.1.1.15 dev dummy15 ip next add id 16 via 1.1.1.16 dev dummy16 ip next add id 17 via 1.1.1.17 dev dummy17 ip next add id 18 via 1.1.1.18 dev dummy18 ip next add id 19 via 1.1.1.19 dev dummy19 ip next add id 20 via 1.1.1.20 dev dummy20 ip next add id 21 via 1.1.1.21 dev dummy21 ip next add id 22 via 1.1.1.22 dev dummy22 ip next add id 23 via 1.1.1.23 dev dummy23 ip next add id 24 via 1.1.1.24 dev dummy24 ip next add id 25 via 1.1.1.25 dev dummy25 ip next add id 26 via 1.1.1.26 dev dummy26 ip next add id 27 via 1.1.1.27 dev dummy27 ip next add id 28 via 1.1.1.28 dev dummy28 ip next add id 29 via 1.1.1.29 dev dummy29 ip next add id 30 via 1.1.1.30 dev dummy30 ip next add id 31 via 1.1.1.31 dev dummy31 ip next add id 32 via 1.1.1.32 dev dummy32 i=100 while [ $i -le 200 ] do ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19 echo $i ((i++)) done ip next add id 999 group 1/2/3/4/5/6 ip next ls ======================== Fixes: ab84be7e54fc ("net: Initial nexthop code") Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20ax25: fix setsockopt(SO_BINDTODEVICE)Eric Dumazet
syzbot was able to trigger this trace [1], probably by using a zero optlen. While we are at it, cap optlen to IFNAMSIZ - 1 instead of IFNAMSIZ. [1] BUG: KMSAN: uninit-value in strnlen+0xf9/0x170 lib/string.c:569 CPU: 0 PID: 8807 Comm: syz-executor483 Not tainted 5.7.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 strnlen+0xf9/0x170 lib/string.c:569 dev_name_hash net/core/dev.c:207 [inline] netdev_name_node_lookup net/core/dev.c:277 [inline] __dev_get_by_name+0x75/0x2b0 net/core/dev.c:778 ax25_setsockopt+0xfa3/0x1170 net/ax25/af_ax25.c:654 __compat_sys_setsockopt+0x4ed/0x910 net/compat.c:403 __do_compat_sys_setsockopt net/compat.c:413 [inline] __se_compat_sys_setsockopt+0xdd/0x100 net/compat.c:410 __ia32_compat_sys_setsockopt+0x62/0x80 net/compat.c:410 do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline] do_fast_syscall_32+0x3bf/0x6d0 arch/x86/entry/common.c:398 entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f57dd9 Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffae8c1c EFLAGS: 00000217 ORIG_RAX: 000000000000016e RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000101 RDX: 0000000000000019 RSI: 0000000020000000 RDI: 0000000000000004 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Local variable ----devname@ax25_setsockopt created at: ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20Merge branch 'wireguard-fixes'David S. Miller
Jason A. Donenfeld says: ==================== wireguard fixes for 5.7-rc7 Hopefully these are the last fixes for 5.7: 1) A trivial bump in the selftest harness to support gcc-10. build.wireguard.com is still on gcc-9 but I'll probably switch to gcc-10 in the coming weeks. 2) A concurrency fix regarding userspace modifying the pre-shared key at the same time as packets are being processed, reported by Matt Dunwoodie. 3) We were previously clearing skb->hash on egress, which broke fq_codel, cake, and other things that actually make use of the flow hash for queueing, reported by Dave Taht and Toke Høiland-Jørgensen. 4) A fix for the increased memory usage caused by (3). This can be thought of as part of patch (3), but because of the separate reasoning and breadth of it I thought made it a bit cleaner to put in a standalone commit. Fixes (2), (3), and (4) are -stable material. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20wireguard: noise: separate receive counter from send counterJason A. Donenfeld
In "wireguard: queueing: preserve flow hash across packet scrubbing", we were required to slightly increase the size of the receive replay counter to something still fairly small, but an increase nonetheless. It turns out that we can recoup some of the additional memory overhead by splitting up the prior union type into two distinct types. Before, we used the same "noise_counter" union for both sending and receiving, with sending just using a simple atomic64_t, while receiving used the full replay counter checker. This meant that most of the memory being allocated for the sending counter was being wasted. Since the old "noise_counter" type increased in size in the prior commit, now is a good time to split up that union type into a distinct "noise_replay_ counter" for receiving and a boring atomic64_t for sending, each using neither more nor less memory than required. Also, since sometimes the replay counter is accessed without necessitating additional accesses to the bitmap, we can reduce cache misses by hoisting the always-necessary lock above the bitmap in the struct layout. We also change a "noise_replay_counter" stack allocation to kmalloc in a -DDEBUG selftest so that KASAN doesn't trigger a stack frame warning. All and all, removing a bit of abstraction in this commit makes the code simpler and smaller, in addition to the motivating memory usage recuperation. For example, passing around raw "noise_symmetric_key" structs is something that really only makes sense within noise.c, in the one place where the sending and receiving keys can safely be thought of as the same type of object; subsequent to that, it's important that we uniformly access these through keypair->{sending,receiving}, where their distinct roles are always made explicit. So this patch allows us to draw that distinction clearly as well. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20wireguard: queueing: preserve flow hash across packet scrubbingJason A. Donenfeld
It's important that we clear most header fields during encapsulation and decapsulation, because the packet is substantially changed, and we don't want any info leak or logic bug due to an accidental correlation. But, for encapsulation, it's wrong to clear skb->hash, since it's used by fq_codel and flow dissection in general. Without it, classification does not proceed as usual. This change might make it easier to estimate the number of innerflows by examining clustering of out of order packets, but this shouldn't open up anything that can't already be inferred otherwise (e.g. syn packet size inference), and fq_codel can be disabled anyway. Furthermore, it might be the case that the hash isn't used or queried at all until after wireguard transmits the encrypted UDP packet, which means skb->hash might still be zero at this point, and thus no hash taken over the inner packet data. In order to address this situation, we force a calculation of skb->hash before encrypting packet data. Of course this means that fq_codel might transmit packets slightly more out of order than usual. Toke did some testing on beefy machines with high quantities of parallel flows and found that increasing the reply-attack counter to 8192 takes care of the most pathological cases pretty well. Reported-by: Dave Taht <dave.taht@gmail.com> Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@toke.dk> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20wireguard: noise: read preshared key while taking lockJason A. Donenfeld
Prior we read the preshared key after dropping the handshake lock, which isn't an actual crypto issue if it races, but it's still not quite correct. So copy that part of the state into a temporary like we do with the rest of the handshake state variables. Then we can release the lock, operate on the temporary, and zero it out at the end of the function. In performance tests, the impact of this was entirely unnoticable, probably because those bytes are coming from the same cacheline as other things that are being copied out in the same manner. Reported-by: Matt Dunwoodie <ncon@noconroy.net> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20wireguard: selftests: use newer iproute2 for gcc-10Jason A. Donenfeld
gcc-10 switched to defaulting to -fno-common, which broke iproute2-5.4. This was fixed in iproute-5.6, so switch to that. Because we're after a stable testing surface, we generally don't like to bump these unnecessarily, but in this case, being able to actually build is a basic necessity. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19sctp: Don't add the shutdown timer if its already been addedNeil Horman
This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e #5 [f418de68] sctp_do_sm at f8db8c77 [sctp] #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] #7 [f418df48] inet_shutdown at c080baf9 #8 [f418df5c] sys_shutdown at c079eedf #9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: jere.leppanen@nokia.com CC: marcelo.leitner@gmail.com CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19__netif_receive_skb_core: pass skb by referenceBoris Sukholitko
__netif_receive_skb_core may change the skb pointer passed into it (e.g. in rx_handler). The original skb may be freed as a result of this operation. The callers of __netif_receive_skb_core may further process original skb by using pt_prev pointer returned by __netif_receive_skb_core thus leading to unpleasant effects. The solution is to pass skb by reference into __netif_receive_skb_core. v2: Added Fixes tag and comment regarding ppt_prev and skb invariant. Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup") Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19net: inet_csk: Fix so_reuseport bind-address cache in tb->fast*Martin KaFai Lau
The commit 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") added a bind-address cache in tb->fast*. The tb->fast* caches the address of a sk which has successfully been binded with SO_REUSEPORT ON. The idea is to avoid the expensive conflict search in inet_csk_bind_conflict(). There is an issue with wildcard matching where sk_reuseport_match() should have returned false but it is currently returning true. It ends up hiding bind conflict. For example, bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::2]:443"); /* with SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. Still Succeed where it shouldn't */ The last bind("[::]:443") with SO_REUSEPORT on should have failed because it should have a conflict with the very first bind("[::1]:443") which has SO_REUSEPORT off. However, the address "[::2]" is cached in tb->fast* in the second bind. In the last bind, the sk_reuseport_match() returns true because the binding sk's wildcard addr "[::]" matches with the "[::2]" cached in tb->fast*. The correct bind conflict is reported by removing the second bind such that tb->fast* cache is not involved and forces the bind("[::]:443") to go through the inet_csk_bind_conflict(): bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */ bind("[::]:443"); /* with SO_REUSEPORT. -EADDRINUSE */ The expected behavior for sk_reuseport_match() is, it should only allow the "cached" tb->fast* address to be used as a wildcard match but not the address of the binding sk. To do that, the current "bool match_wildcard" arg is split into "bool match_sk1_wildcard" and "bool match_sk2_wildcard". This change only affects the sk_reuseport_match() which is only used by inet_csk (e.g. TCP). The other use cases are calling inet_rcv_saddr_equal() and this patch makes it pass the same "match_wildcard" arg twice to the "ipv[46]_rcv_saddr_equal(..., match_wildcard, match_wildcard)". Cc: Josef Bacik <jbacik@fb.com> Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19r8152: support additional Microsoft Surface Ethernet Adapter variantMarc Payne
Device id 0927 is the RTL8153B-based component of the 'Surface USB-C to Ethernet and USB Adapter' and may be used as a component of other devices in future. Tested and working with the r8152 driver. Update the cdc_ether blacklist due to the RTL8153 'network jam on suspend' issue which this device will cause (personally confirmed). Signed-off-by: Marc Payne <marc.payne@mdpsys.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19mptcp: use rightmost 64 bits in ADD_ADDR HMACTodd Malsbary
This changes the HMAC used in the ADD_ADDR option from the leftmost 64 bits to the rightmost 64 bits as described in RFC 8684, section 3.4.1. This issue was discovered while adding support to packetdrill for the ADD_ADDR v1 option. Fixes: 3df523ab582c ("mptcp: Add ADD_ADDR handling") Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19Merge tag 'wireless-drivers-2020-05-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.7 Third and most likely the last set of fixes for v5.7. Only one iwlwifi fix this time. iwlwifi * another fix for QuZ device configuration ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-19net: bmac: Fix read of MAC address from ROMJeremy Kerr
In bmac_get_station_address, We're reading two bytes at a time from ROM, but we do that six times, resulting in 12 bytes of read & writes. This means we will write off the end of the six-byte destination buffer. This change fixes the for-loop to only read/write six bytes. Based on a proposed fix from Finn Thain <fthain@telegraphics.com.au>. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reported-by: Stan Johnson <userm57@yahoo.com> Tested-by: Stan Johnson <userm57@yahoo.com> Reported-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18net sched: fix reporting the first-time use timestampRoman Mashak
When a new action is installed, firstuse field of 'tcf_t' is explicitly set to 0. Value of zero means "new action, not yet used"; as a packet hits the action, 'firstuse' is stamped with the current jiffies value. tcf_tm_dump() should return 0 for firstuse if action has not yet been hit. Fixes: 48d8ee1694dd ("net sched actions: aggregate dumping of actions timeinfo") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17net: phy: propagate an error back to the callers of phy_sfp_probeLeon Romanovsky
The compilation warning below reveals that the errors returned from the sfp_bus_add_upstream() call are not propagated to the callers. Fix it by returning "ret". 14:37:51 drivers/net/phy/phy_device.c: In function 'phy_sfp_probe': 14:37:51 drivers/net/phy/phy_device.c:1236:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable] 14:37:51 1236 | int ret; 14:37:51 | ^~~ Fixes: 298e54fa810e ("net: phy: add core phylib sfp support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17net: revert "net: get rid of an signed integer overflow in ip_idents_reserve()"Yuqi Jin
Commit adb03115f459 ("net: get rid of an signed integer overflow in ip_idents_reserve()") used atomic_cmpxchg to replace "atomic_add_return" inside the function "ip_idents_reserve". The reason was to avoid UBSAN warning. However, this change has caused performance degrade and in GCC-8, fno-strict-overflow is now mapped to -fwrapv -fwrapv-pointer and signed integer overflow is now undefined by default at all optimization levels[1]. Moreover, it was a bug in UBSAN vs -fwrapv /-fno-strict-overflow, so Let's revert it safely. [1] https://gcc.gnu.org/gcc-8/changes.html Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiong Wang <jiongwang@huawei.com> Signed-off-by: Yuqi Jin <jinyuqi@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17dt-bindings: net: dsa: b53: Add missing size and address cells to exampleKurt Kanzenbach
Add the missing size and address cells to the b53 example. Otherwise, it may not compile or issue warnings if directly copied into a device tree. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17nexthop: Fix attribute checking for groupsDavid Ahern
For nexthop groups, attributes after NHA_GROUP_TYPE are invalid, but nh_check_attr_group starts checking at NHA_GROUP. The group type defaults to multipath and the NHA_GROUP_TYPE is currently optional so this has slipped through so far. Fix the attribute checking to handle support of new group types. Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: ASSOGBA Emery <assogba.emery@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16net: ipa: don't be a hog in gsi_channel_poll()Alex Elder
The iteration count value used in gsi_channel_poll() is intended to limit poll iterations to the budget supplied as an argument. But it's never updated. Fix this bug by incrementing the count each time through the loop. Reported-by: Sharath Chandra Vurukala <sharathv@codeaurora.com> Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16net: dsa: mt7530: fix roaming from DSA user portsDENG Qingfang
When a client moves from a DSA user port to a software port in a bridge, it cannot reach any other clients that connected to the DSA user ports. That is because SA learning on the CPU port is disabled, so the switch ignores the client's frames from the CPU port and still thinks it is at the user port. Fix it by enabling SA learning on the CPU port. To prevent the switch from learning from flooding frames from the CPU port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames, and let the switch flood them instead of trapping to the CPU port. Multicast frames still need to be trapped to the CPU port for snooping, so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames to disable SA learning. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16ipv6: Fix suspicious RCU usage warning in ip6mrMadhuparna Bhowmik
This patch fixes the following warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc4-next-20200507-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!! ipmr_new_table() returns an existing table, but there is no table at init. Therefore the condition: either holding rtnl or the list is empty is used. Fixes: d1db275dd3f6e ("ipv6: ip6mr: support multiple tables") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix sk_psock reference count leak on receive, from Xiyu Yang. 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven. 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this, from Maciej Żenczykowski. 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo Abeni. 5) Fix netprio cgroup v2 leak, from Zefan Li. 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal. 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet. 8) Various bpf probe handling fixes, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits) selftests: mptcp: pm: rm the right tmp file dpaa2-eth: properly handle buffer size restrictions bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range bpf: Restrict bpf_probe_read{, str}() only to archs where they work MAINTAINERS: Mark networking drivers as Maintained. ipmr: Add lockdep expression to ipmr_for_each_table macro ipmr: Fix RCU list debugging warning drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 tcp: fix error recovery in tcp_zerocopy_receive() MAINTAINERS: Add Jakub to networking drivers. MAINTAINERS: another add of Karsten Graul for S390 networking drivers: ipa: fix typos for ipa_smp2p structure doc pppoe: only process PADT targeted at local interfaces selftests/bpf: Enforce returning 0 for fentry/fexit programs bpf: Enforce returning 0 for fentry/fexit progs net: stmmac: fix num_por initialization security: Fix the default value of secid_to_secctx hook libbpf: Fix register naming in PT_REGS s390 macros ...
2020-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A few minor bug fixes for user visible defects, and one regression: - Various bugs from static checkers and syzkaller - Add missing error checking in mlx4 - Prevent RTNL lock recursion in i40iw - Fix segfault in cxgb4 in peer abort cases - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could be lost, and wasn't delivered to all the FDs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event RDMA/iw_cxgb4: Fix incorrect function parameters RDMA/core: Fix double put of resource IB/core: Fix potential NULL pointer dereference in pkey cache IB/hfi1: Fix another case where pq is left on waitlist IB/i40iw: Remove bogus call to netdev_master_upper_dev_get() IB/mlx4: Test return value of calls to ib_get_cached_pkey RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info() i40iw: Fix error handling in i40iw_manage_arp_cache()
2020-05-15Merge tag 'linux-kselftest-5.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors - ftrace test handling when test module doesn't exist - nsfs test fix to replace zero-length array with flexible-array - dmabuf-heaps test fix to return clear error value * tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/lkdtm: Use grep -E instead of egrep selftests/lkdtm: Don't clear dmesg when running tests selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist tools/testing: Replace zero-length array with flexible-array kselftests: dmabuf-heaps: Fix confused return value on expected error testing
2020-05-15Merge tag 'riscv-for-linus-5.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "A handful of build fixes, all found by Huawei's autobuilder. None of these patches should have any functional impact on kernels that build, and they're mostly related to various features intermingling with !MMU. While some of these might be better hoisted to generic code, it seems better to have the simple fixes in the meanwhile. As far as I know these are the only outstanding patches for 5.7" * tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: mmiowb: Fix implicit declaration of function 'smp_processor_id' riscv: pgtable: Fix __kernel_map_pages build error if NOMMU riscv: Make SYS_SUPPORTS_HUGETLBFS depends on MMU riscv: Disable ARCH_HAS_DEBUG_VIRTUAL if NOMMU riscv: Add pgprot_writecombine/device and PAGE_SHARED defination if NOMMU riscv: stacktrace: Fix undefined reference to `walk_stackframe' riscv: Fix unmet direct dependencies built based on SOC_VIRT riscv: perf: RISCV_BASE_PMU should be independent riscv: perf_event: Make some funciton static
2020-05-15Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix flush_icache_range() second argument in machine_kexec() to be an address rather than size" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fix the flush_icache_range arguments in machine_kexec
2020-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-05-15 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 2 day(s) which contain a total of 14 files changed, 137 insertions(+), 43 deletions(-). The main changes are: 1) Fix secid_to_secctx LSM hook default value, from Anders. 2) Fix bug in mmap of bpf array, from Andrii. 3) Restrict bpf_probe_read to archs where they work, from Daniel. 4) Enforce returning 0 for fentry/fexit progs, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15selftests: mptcp: pm: rm the right tmp fileMatthieu Baerts
"$err" is a variable pointing to a temp file. "$out" is not: only used as a local variable in "check()" and representing the output of a command line. Fixes: eedbc685321b (selftests: add PM netlink functional tests) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15dpaa2-eth: properly handle buffer size restrictionsIoana Ciornei
Depending on the WRIOP version, the buffer size on the RX path must by a multiple of 64 or 256. Handle this restriction properly by aligning down the buffer size to the necessary value. Also, use the new buffer size dynamically computed instead of the compile time one. Fixes: 27c874867c4e ("dpaa2-eth: Use a single page per Rx buffer") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15Merge tag 'hwmon-for-v5.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix ADC access synchronization problem with da9052 driver - Fix temperature limit and status reporting in nct7904 driver - Fix drivetemp temperature reporting if SCT is supported but SCT data tables are not. * tag 'hwmon-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (da9052) Synchronize access with mfd hwmon: (nct7904) Fix incorrect range of temperature limit registers hwmon: (nct7904) Read all SMI status registers in probe function hwmon: (drivetemp) Fix SCT support if SCT data tables are not supported
2020-05-15Merge tag 'sound-5.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Things look good and calming down; the only change to ALSA core is the fix for racy rawmidi buffer accesses spotted by syzkaller, and the rest are all small device-specific quirks for HD-audio and USB-audio devices" * tag 'sound-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek - Limit int mic boost for Thinkpad T530 ALSA: hda/realtek - Add COEF workaround for ASUS ZenBook UX431DA ALSA: hda/realtek: Enable headset mic of ASUS UX581LV with ALC295 ALSA: hda/realtek - Enable headset mic of ASUS UX550GE with ALC295 ALSA: hda/realtek - Enable headset mic of ASUS GL503VM with ALC295 ALSA: hda/realtek: Add quirk for Samsung Notebook ALSA: rawmidi: Fix racy buffer resize under concurrent accesses ALSA: usb-audio: add mapping for ASRock TRX40 Creator ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse Revert "ALSA: hda/realtek: Fix pop noise on ALC225" ALSA: firewire-lib: fix 'function sizeof not defined' error of tracepoints format ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
2020-05-15Merge tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "As mentioned last week an i915 PR came in late, but I left it, so the i915 bits of this cover 2 weeks, which is why it's likely a bit larger than usual. Otherwise it's mostly amdgpu fixes, one tegra fix, one meson fix. i915: - Handle idling during i915_gem_evict_something busy loops (Chris) - Mark current submissions with a weak-dependency (Chris) - Propagate error from completed fences (Chris) - Fixes on execlist to avoid GPU hang situation (Chris) - Fixes couple deadlocks (Chris) - Timeslice preemption fixes (Chris) - Fix Display Port interrupt handling on Tiger Lake (Imre) - Reduce debug noise around Frame Buffer Compression (Peter) - Fix logic around IPC W/a for Coffee Lake and Kaby Lake (Sultan) - Avoid dereferencing a dead context (Chris) tegra: - tegra120/4 smmu fixes amdgpu: - Clockgating fixes - Fix fbdev with scatter/gather display - S4 fix for navi - Soft recovery for gfx10 - Freesync fixes - Atomic check cursor fix - Add a gfxoff quirk - MST fix amdkfd: - Fix GEM reference counting meson: - error code propogation fix" * tag 'drm-fixes-2020-05-15' of git://anongit.freedesktop.org/drm/drm: (29 commits) drm/i915: Handle idling during i915_gem_evict_something busy loops drm/meson: pm resume add return errno branch drm/amd/amdgpu: Update update_config() logic drm/amd/amdgpu: add raven1 part to the gfxoff quirk list drm/i915: Mark concurrent submissions with a weak-dependency drm/i915: Propagate error from completed fences drm/i915/gvt: Fix kernel oops for 3-level ppgtt guest drm/i915/gvt: Init DPLL/DDI vreg for virtual display instead of inheritance. drm/amd/display: add basic atomic check for cursor plane drm/amd/display: Fix vblank and pageflip event handling for FreeSync drm/amdgpu: implement soft_recovery for gfx10 drm/amdgpu: enable hibernate support on Navi1X drm/amdgpu: Use GEM obj reference for KFD BOs drm/amdgpu: force fbdev into vram drm/amd/powerplay: perform PG ungate prior to CG ungate drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate drm/amdgpu: disable MGCG/MGLS also on gfx CG ungate drm/i915/execlists: Track inflight CCID drm/i915/execlists: Avoid reusing the same logical CCID drm/i915/gem: Remove object_is_locked assertion from unpin_from_display_plane ...
2020-05-15Merge branch 'restrict-bpf_probe_read'Alexei Starovoitov
Daniel Borkmann says: ==================== Small set of fixes in order to restrict BPF helpers for tracing which are broken on archs with overlapping address ranges as per discussion in [0]. I've targetted this for -bpf tree so they can be routed as fixes. Thanks! v1 -> v2: - switch to reusable %pks, %pus format specifiers (Yonghong) - fixate %s on kernel_ds probing for archs with overlapping addr space [0] https://lore.kernel.org/bpf/CAHk-=wjJKo0GVixYLmqPn-Q22WFu0xHaBSjKEo7e7Yw72y5SPQ@mail.gmail.com/T/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-05-15bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifierDaniel Borkmann
Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on archs with overlapping address ranges. While the helpers have been addressed through work in 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need an option for bpf_trace_printk() as well to fix it. Similarly as with the helpers, force users to make an explicit choice by adding %pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS. The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended for other objects aside strings that are probed and printed under tracing, and reused out of other facilities like bpf_seq_printf() or BTF based type printing. Existing behavior of %s for current users is still kept working for archs where it is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. For archs not having this property we fall-back to pick probing under KERNEL_DS as a sensible default. Fixes: 8d3b7dce8622 ("bpf: add support for %s specifier to bpf_trace_printk()") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
2020-05-15bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_rangeDaniel Borkmann
Given bpf_probe_read{,str}() BPF helpers are now only available under CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE, we need to add the drop-in replacements of bpf_probe_read_{kernel,user}_str() to do_refine_retval_range() as well to avoid hitting the same issue as in 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200515101118.6508-3-daniel@iogearbox.net
2020-05-15bpf: Restrict bpf_probe_read{, str}() only to archs where they workDaniel Borkmann