summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-04-07netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flagPablo Neira Ayuso
Stefano originally proposed to introduce this flag, users hit EOPNOTSUPP in new binaries with old kernels when defining a set with ranges in a concatenation. Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-07netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object typePablo Neira Ayuso
EINVAL should be used for malformed netlink messages. New userspace utility and old kernels might easily result in EINVAL when exercising new set features, which is misleading. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-06xsk: Fix out of boundary write in __xsk_rcv_memcpyLi RongQing
first_len is the remainder of the first page we're copying. If this size is larger, then out of page boundary write will otherwise happen. Fixes: c05cd3645814 ("xsk: add support to allow unaligned chunk placement") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1585813930-19712-1-git-send-email-lirongqing@baidu.com
2020-04-06ipv6: rpl: fix loop iterationAlexander Aring
This patch fix the loop iteration by not walking over the last iteration. The cmpri compressing value exempt the last segment. As the code shows the last iteration will be overwritten by cmpre value handling which is for the last segment. I think this doesn't end in any bufferoverflows because we work on worst case temporary buffer sizes but it ends in not best compression settings in some cases. Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr") Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-06Merge tag '9p-for-5.7' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: "Not much new, but a few patches for this cycle: - Fix read with O_NONBLOCK to allow incomplete read and return immediately - Rest is just cleanup (indent, unused field in struct, extra semicolon)" * tag '9p-for-5.7' of git://github.com/martinetd/linux: net/9p: remove unused p9_req_t aux field 9p: read only once on O_NONBLOCK 9pnet: allow making incomplete read requests 9p: Remove unneeded semicolon 9p: Fix Kconfig indentation
2020-04-06netfilter: ipset: Pass lockdep expression to RCU listsAmol Grover
ip_set_type_list is traversed using list_for_each_entry_rcu outside an RCU read-side critical section but under the protection of ip_set_type_mutex. Hence, add corresponding lockdep expression to silence false-positive warnings, and harden RCU lists. Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05netfilter: nf_tables: do not leave dangling pointer in nf_tables_set_alloc_nameEric Dumazet
If nf_tables_set_alloc_name() frees set->name, we better clear set->name to avoid a future use-after-free or invalid-free. BUG: KASAN: double-free or invalid-free in nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148 CPU: 0 PID: 28233 Comm: syz-executor.0 Not tainted 5.6.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:374 kasan_report_invalid_free+0x61/0xa0 mm/kasan/report.c:468 __kasan_slab_free+0x129/0x140 mm/kasan/common.c:455 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45c849 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe5ca21dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fe5ca21e6d4 RCX: 000000000045c849 RDX: 0000000000000000 RSI: 0000000020000c40 RDI: 0000000000000003 RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000095b R14: 00000000004cc0e9 R15: 000000000076bf0c Allocated by task 28233: save_stack+0x1b/0x80 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:515 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:488 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc_track_caller+0x159/0x790 mm/slab.c:3671 kvasprintf+0xb5/0x150 lib/kasprintf.c:25 kasprintf+0xbb/0xf0 lib/kasprintf.c:59 nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3536 [inline] nf_tables_newset+0x1543/0x2560 net/netfilter/nf_tables_api.c:4088 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 28233: save_stack+0x1b/0x80 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:337 [inline] __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:476 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3544 [inline] nf_tables_newset+0x1f73/0x2560 net/netfilter/nf_tables_api.c:4088 nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345 ___sys_sendmsg+0x100/0x170 net/socket.c:2399 __sys_sendmsg+0xec/0x1b0 net/socket.c:2432 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a6032d00 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 0 bytes inside of 32-byte region [ffff8880a6032d00, ffff8880a6032d20) The buggy address belongs to the page: page:ffffea0002980c80 refcount:1 mapcount:0 mapping:ffff8880aa0001c0 index:0xffff8880a6032fc1 flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0002a3be88 ffffea00029b1908 ffff8880aa0001c0 raw: ffff8880a6032fc1 ffff8880a6032000 000000010000003e 0000000000000000 page dumped because: kasan: bad access detected Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05netfilter: xt_IDLETIMER: target v1 - match Android layoutMaciej Żenczykowski
Android has long had an extension to IDLETIMER to send netlink messages to userspace, see: https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42 Note: this is idletimer target rev 1, there is no rev 0 in the Android common kernel sources, see registration at: https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483 When we compare that to upstream's new idletimer target rev 1: https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46 We immediately notice that these two rev 1 structs are the same size and layout, and that while timer_type and send_nl_msg are differently named and serve a different purpose, they're at the same offset. This makes them impossible to tell apart - and thus one cannot know in a mixed Android/vanilla environment whether one means timer_type or send_nl_msg. Since this is iptables/netfilter uapi it introduces a problem between iptables (vanilla vs Android) userspace and kernel (vanilla vs Android) if the two don't match each other. Additionally when at some point in the future Android picks up 5.7+ it's not at all clear how to resolve the resulting merge conflict. Furthermore, since upgrading the kernel on old Android phones is pretty much impossible there does not seem to be an easy way out of this predicament. The only thing I've been able to come up with is some super disgusting kernel version >= 5.7 check in the iptables binary to flip between different struct layouts. By adding a dummy field to the vanilla Linux kernel header file we can force the two structs to be compatible with each other. Long term I think I would like to deprecate send_nl_msg out of Android entirely, but I haven't quite been able to figure out exactly how we depend on it. It seems to be very similar to sysfs notifications but with some extra info. Currently it's actually always enabled whenever Android uses the IDLETIMER target, so we could also probably entirely remove it from the uapi in favour of just always enabling it, but again we can't upgrade old kernels already in the field. (Also note that this doesn't change the structure's size, as it is simply fitting into the pre-existing padding, and that since 5.7 hasn't been released yet, there's still time to make this uapi visible change) Cc: Manoj Basapathi <manojbm@codeaurora.org> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05netfilter: nf_tables: do not update stateful expressions if lookup is invertedPablo Neira Ayuso
Initialize set lookup matching element to NULL. Otherwise, the NFT_LOOKUP_F_INV flag reverses the matching logic and it leads to deference an uninitialized pointer to the matching element. Make sure element data area and stateful expression are accessed if there is a matching set element. This patch undoes 24791b9aa1ab ("netfilter: nft_set_bitmap: initialize set element extension in lookups") which is not required anymore. Fixes: 339706bc21c1 ("netfilter: nft_lookup: update element stateful expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05netfilter: nft_set_rbtree: Drop spurious condition for overlap detection on ↵Stefano Brivio
insertion Case a1. for overlap detection in __nft_rbtree_insert() is not a valid one: start-after-start is not needed to detect any type of interval overlap and it actually results in a false positive if, while descending the tree, this is the only step we hit after starting from the root. This introduced a regression, as reported by Pablo, in Python tests cases ip/ip.t and ip/numgen.t: ip/ip.t: ERROR: line 124: add rule ip test-ip4 input ip hdrlength vmap { 0-4 : drop, 5 : accept, 6 : continue } counter: This rule should not have failed. ip/numgen.t: ERROR: line 7: add rule ip test-ip4 pre dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200}: This rule should not have failed. Drop case a1. and renumber others, so that they are a bit clearer. In order for these diagrams to be readily understandable, a bigger rework is probably needed, such as an ASCII art of the actual rbtree (instead of a flattened version). Shell script test sets/0044interval_overlap_0 should cover all possible cases for false negatives, so I consider that test case still sufficient after this change. v2: Fix comments for cases a3. and b3. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-04-05Bluetooth: Increment management interface revisionMarcel Holtmann
Increment the mgmt revision due to the recently added new commands. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-05Bluetooth: Add support for reading security informationMarcel Holtmann
To allow userspace to make correcty security policy decision, the kernel needs to export a few details of the supported security features and encryption key size information. This command exports this information and also allows future extensions if needed. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-05Bluetooth: Add support for Read Local Simple Pairing OptionsMarcel Holtmann
With the Read Local Simple Pairing Options command it is possible to retrieve the support for max encryption key size supported by the controller and also if the controller correctly verifies the ECDH public key during pairing. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-05Bluetooth: Add framework for Microsoft vendor extensionMiao-chen Chou
Micrsoft defined a set for HCI vendor extensions. Check the following link for details: https://docs.microsoft.com/en-us/windows-hardware/drivers/bluetooth/microsoft-defined-bluetooth-hci-commands-and-events This provides the basic framework to enable the extension and read its supported features. Drivers still have to declare support for this extension before it can be utilized by the host stack. Signed-off-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-05Bluetooth: Move debugfs configuration above the selftestsMarcel Holtmann
This is just a cosmetic clean to move the selftests configuration option to the bottom of the list of options. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-05Bluetooth: add support to notify using SCO air modeSathish Narsimman
notifying using HCI_NOTIFY_CONN_ADD for SCO connection is generic in case of mSBC audio. To differntiate SCO air mode introducing HCI_NOTIFY_ENABLE_SCO_CVSD and HCI_NOTIFY_ENABLE_SCO_TRANSP. Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2020-04-04SUNRPC: Don't start a timer on an already queued rpc taskTrond Myklebust
Move the test for whether a task is already queued to prevent corruption of the timer list in __rpc_sleep_on_priority_timeout(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-04Merge tag 'keys-fixes-20200329' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyrings fixes from David Howells: "Here's a couple of patches that fix a circular dependency between holding key->sem and mm->mmap_sem when reading data from a key. One potential issue is that a filesystem looking to use a key inside, say, ->readpages() could deadlock if the key being read is the key that's required and the buffer the key is being read into is on a page that needs to be fetched. The case actually detected is a bit more involved - with a filesystem calling request_key() and locking the target keyring for write - which could be being read" * tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: Avoid false positive ENOMEM error on key read KEYS: Don't write out to userspace while holding key semaphore
2020-04-04Merge tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds
Pull nfsd updates from Chuck Lever: - Fix EXCHANGE_ID response when NFSD runs in a container - A battery of new static trace points - Socket transports now use bio_vec to send Replies - NFS/RDMA now supports filesystems with no .splice_read method - Favor memcpy() over DMA mapping for small RPC/RDMA Replies - Add pre-requisites for supporting multiple Write chunks - Numerous minor fixes and clean-ups [ Chuck is filling in for Bruce this time while he and his family settle into a new house ] * tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6: (39 commits) svcrdma: Fix leak of transport addresses SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' SUNRPC/cache: don't allow invalid entries to be flushed nfsd: fsnotify on rmdir under nfsd/clients/ nfsd4: kill warnings on testing stateids with mismatched clientids nfsd: remove read permission bit for ctl sysctl NFSD: Fix NFS server build errors sunrpc: Add tracing for cache events SUNRPC/cache: Allow garbage collection of invalid cache entries nfsd: export upcalls must not return ESTALE when mountd is down nfsd: Add tracepoints for update of the expkey and export cache entries nfsd: Add tracepoints for exp_find_key() and exp_get_by_name() nfsd: Add tracing to nfsd_set_fh_dentry() nfsd: Don't add locks to closed or closing open stateids SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends SUNRPC: Refactor xs_sendpages() svcrdma: Avoid DMA mapping small RPC Replies svcrdma: Fix double sync of transport header buffer svcrdma: Refactor chunk list encoders SUNRPC: Add encoders for list item discriminators ...
2020-04-03mptcp: add some missing pr_fmt definesGeliang Tang
Some of the mptcp logs didn't print out the format string: [ 185.651493] DSS [ 185.651494] data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 185.651494] data_ack=13792750332298763796 [ 185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=0 skb=0000000063dc595d [ 185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 status=0 [ 185.651495] MPTCP: msk ack_seq=9bbc894565aa2f9a subflow ack_seq=9bbc894565aa2f9a [ 185.651496] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=1 skb=0000000012e809e1 So this patch added these missing pr_fmt defines. Then we can get the same format string "MPTCP" in all mptcp logs like this: [ 142.795829] MPTCP: DSS [ 142.795829] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 142.795829] MPTCP: data_ack=8089704603109242421 [ 142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=0 skb=00000000d5f230df [ 142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 status=0 [ 142.795831] MPTCP: msk ack_seq=66790290f1199d9b subflow ack_seq=66790290f1199d9b [ 142.795831] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=1 skb=00000000de5aca2e Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-03net_sched: fix a missing refcnt in tcindex_init()Cong Wang
The initial refcnt of struct tcindex_data should be 1, it is clear that I forgot to set it to 1 in tcindex_init(). This leads to a dec-after-zero warning. Reported-by: syzbot+8325e509a1bf83ec741d@syzkaller.appspotmail.com Fixes: 304e024216a8 ("net_sched: add a temporary refcnt for struct tcindex_data") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-03Merge tag 'spdx-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-04-03Bluetooth: fixing minor typo in commentAlain Michaud
This changes a simple typo in hci_event.c Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-03Bluetooth: Prioritize SCO trafficAbhishek Pandit-Subedi
When scheduling TX packets, send all SCO/eSCO packets first, check for pending SCO/eSCO packets after every ACL/LE packet and send them if any are pending. This is done to make sure that we can meet SCO deadlines on slow interfaces like UART. If we were to queue up multiple ACL packets without checking for a SCO packet, we might miss the SCO timing. For example: The time it takes to send a maximum size ACL packet (1024 bytes): t = 10/8 * 1024 bytes * 8 bits/byte * 1 packet / baudrate where 10/8 is uart overhead due to start/stop bits per byte Replace t = 3.75ms (SCO deadline), which gives us a baudrate of 2730666. At a baudrate of 3000000, if we didn't check for SCO packets within 1024 bytes, we would miss the 3.75ms timing window. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-02neigh: support smaller retrans_time setttingHangbin Liu
Currently, we limited the retrans_time to be greater than HZ/2. i.e. setting retrans_time less than 500ms will not work. This makes the user unable to achieve a more accurate control for bonding arp fast failover. Update the sanity check to HZ/100, which is 10ms, to let users have more ability on the retrans_time control. v3: sync the behavior with IPv6 and update all the timer handler v2: use HZ instead of hard code number Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entryTonghao Zhang
The struct sw_flow is protected by RCU, when traversing them, use hlist_for_each_entry_rcu. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: core: enable SO_BINDTODEVICE for non-root usersVincent Bernat
Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a non-root user to bind a socket to an interface if it is not already bound. This is useful to allow an application to bind itself to a specific VRF for outgoing or incoming connections. Currently, an application wanting to manage connections through several VRF need to be privileged. Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for Wine (76e21053b5bf3 and c4062dfc425e9) specifically for use by non-root processes. However, they are restricted to sendmsg() and not usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to get the same privilege. As for TCP servers, outside the VRF use case, SO_BINDTODEVICE would only further restrict connections a server could accept. When an application is restricted to a VRF (with `ip vrf exec`), the socket is bound to an interface at creation and therefore, a non-privileged call to SO_BINDTODEVICE to escape the VRF fails. When an application bound a socket to SO_BINDTODEVICE and transmit it to a non-privileged process through a Unix socket, a tentative to change the bound device also fails. Before: >>> import socket >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM) >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0") Traceback (most recent call last): File "<stdin>", line 1, in <module> PermissionError: [Errno 1] Operation not permitted After: >>> import socket >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM) >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0") >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0") Traceback (most recent call last): File "<stdin>", line 1, in <module> PermissionError: [Errno 1] Operation not permitted Signed-off-by: Vincent Bernat <vincent@bernat.ch> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-03net, sk_msg: Don't use RCU_INIT_POINTER on sk_user_dataJakub Sitnicki
sparse reports an error due to use of RCU_INIT_POINTER helper to assign to sk_user_data pointer, which is not tagged with __rcu: net/core/sock.c:1875:25: error: incompatible types in comparison expression (different address spaces): net/core/sock.c:1875:25: void [noderef] <asn:4> * net/core/sock.c:1875:25: void * ... and rightfully so. sk_user_data is not always treated as a pointer to an RCU-protected data. When it is used to point at an RCU-protected object, we access it with __sk_user_data to inform sparse about it. In this case, when the child socket does not inherit sk_user_data from the parent, there is no reason to treat it as an RCU-protected pointer. Use a regular assignment to clear the pointer value. Fixes: f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200402125524.851439-1-jakub@cloudflare.com
2020-04-02mptcp: fix "fn parameter not described" warningsMatthieu Baerts
Obtained with: $ make W=1 net/mptcp/token.o net/mptcp/token.c:53: warning: Function parameter or member 'req' not described in 'mptcp_token_new_request' net/mptcp/token.c:98: warning: Function parameter or member 'sk' not described in 'mptcp_token_new_connect' net/mptcp/token.c:133: warning: Function parameter or member 'conn' not described in 'mptcp_token_new_accept' net/mptcp/token.c:178: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy_request' net/mptcp/token.c:191: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy' Fixes: 79c0949e9a09 (mptcp: Add key generation and token tree) Fixes: 58b09919626b (mptcp: create msk early) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: re-check dsn before reading from subflowFlorian Westphal
mptcp_subflow_data_available() is commonly called via ssk->sk_data_ready(), in this case the mptcp socket lock cannot be acquired. Therefore, while we can safely discard subflow data that was already received up to msk->ack_seq, we cannot be sure that 'subflow->data_avail' will still be valid at the time userspace wants to read the data -- a previous read on a different subflow might have carried this data already. In that (unlikely) event, msk->ack_seq will have been updated and will be ahead of the subflow dsn. We can check for this condition and skip/resync to the expected sequence number. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: subflow: check parent mptcp socket on subflow state changeFlorian Westphal
This is needed at least until proper MPTCP-Level fin/reset signalling gets added: We wake parent when a subflow changes, but we should do this only when all subflows have closed, not just one. Schedule the mptcp worker and tell it to check eof state on all subflows. Only flag mptcp socket as closed and wake userspace processes blocking in poll if all subflows have closed. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: fix tcp fallback crashFlorian Westphal
Christoph Paasch reports following crash: general protection fault [..] CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62 RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471 [..] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline] do_raw_spin_lock include/linux/spinlock.h:181 [inline] spin_lock_bh include/linux/spinlock.h:343 [inline] __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278 mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882 [..] Problem is that mptcp_shutdown() socket isn't an mptcp socket, its a plain tcp_sk. Thus, trying to access mptcp_sk specific members accesses garbage. Root cause is that accept() returns a fallback (tcp) socket, not an mptcp one. There is code in getpeername to detect this and override the sockets stream_ops. But this will only run when accept() caller provided a sockaddr struct. "accept(fd, NULL, 0)" will therefore result in mptcp stream ops, but with sock->sk pointing at a tcp_sk. Update the existing fallback handling to detect this as well. Moreover, mptcp_shutdown did not have fallback handling, and mptcp_poll did it too late so add that there as well. Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: ipv6: rpl_iptunnel: remove redundant assignments to variable errColin Ian King
The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: dsa: dsa_bridge_mtu_normalization() can be statickbuild test robot
Fixes: f41071407c85 ("net: dsa: implement auto-normalization of MTU for bridge hardware datapath") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02Bluetooth: Always request for user confirmation for Just WorksSonny Sasaka
To improve security, always give the user-space daemon a chance to accept or reject a Just Works pairing (LE). The daemon may decide to auto-accept based on the user's intent. Signed-off-by: Sonny Sasaka <sonnysasaka@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-02Bluetooth: Add BT_MODE socket optionLuiz Augusto von Dentz
This adds BT_MODE socket option which can be used to set L2CAP modes, including modes only supported over LE which were not supported using the L2CAP_OPTIONS. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-02Bluetooth: L2CAP: Fix handling LE modes by L2CAP_OPTIONSLuiz Augusto von Dentz
L2CAP_OPTIONS shall only be used with BR/EDR modes. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-01ipv6: don't auto-add link-local address to lag portsJarod Wilson
Bonding slave and team port devices should not have link-local addresses automatically added to them, as it can interfere with openvswitch being able to properly add tc ingress. Basic reproducer, courtesy of Marcelo: $ ip link add name bond0 type bond $ ip link set dev ens2f0np0 master bond0 $ ip link set dev ens2f1np2 master bond0 $ ip link set dev bond0 up $ ip a s 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link valid_lft forever preferred_lft forever (above trimmed to relevant entries, obviously) $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0 net.ipv6.conf.ens2f0np0.addr_gen_mode = 0 $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0 net.ipv6.conf.ens2f1np2.addr_gen_mode = 0 $ ip a l ens2f0np0 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever $ ip a l ens2f1np2 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is this a slave interface?" check added by commit c2edacf80e15, and results in an address getting added, while w/the proposed patch added, no address gets added. This simply adds the same gating check to another code path, and thus should prevent the same devices from erroneously obtaining an ipv6 link-local address. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-01net_sched: add a temporary refcnt for struct tcindex_dataCong Wang
Although we intentionally use an ordered workqueue for all tc filter works, the ordering is not guaranteed by RCU work, given that tcf_queue_work() is esstenially a call_rcu(). This problem is demostrated by Thomas: CPU 0: tcf_queue_work() tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work); -> Migration to CPU 1 CPU 1: tcf_queue_work(&p->rwork, tcindex_destroy_work); so the 2nd work could be queued before the 1st one, which leads to a free-after-free. Enforcing this order in RCU work is hard as it requires to change RCU code too. Fortunately we can workaround this problem in tcindex filter by taking a temporary refcnt, we only refcnt it right before we begin to destroy it. This simplifies the code a lot as a full refcnt requires much more changes in tcindex_set_parms(). Reported-by: syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com Fixes: 3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Fix the iwlwifi regression, from Johannes Berg. 2) Support BSS coloring and 802.11 encapsulation offloading in hardware, from John Crispin. 3) Fix some potential Spectre issues in qtnfmac, from Sergey Matyukevich. 4) Add TTL decrement action to openvswitch, from Matteo Croce. 5) Allow paralleization through flow_action setup by not taking the RTNL mutex, from Vlad Buslov. 6) A lot of zero-length array to flexible-array conversions, from Gustavo A. R. Silva. 7) Align XDP statistics names across several drivers for consistency, from Lorenzo Bianconi. 8) Add various pieces of infrastructure for offloading conntrack, and make use of it in mlx5 driver, from Paul Blakey. 9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki. 10) Lots of parallelization improvements during configuration changes in mlxsw driver, from Ido Schimmel. 11) Add support to devlink for generic packet traps, which report packets dropped during ACL processing. And use them in mlxsw driver. From Jiri Pirko. 12) Support bcmgenet on ACPI, from Jeremy Linton. 13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei Starovoitov, and your's truly. 14) Support XDP meta-data in virtio_net, from Yuya Kusakabe. 15) Fix sysfs permissions when network devices change namespaces, from Christian Brauner. 16) Add a flags element to ethtool_ops so that drivers can more simply indicate which coalescing parameters they actually support, and therefore the generic layer can validate the user's ethtool request. Use this in all drivers, from Jakub Kicinski. 17) Offload FIFO qdisc in mlxsw, from Petr Machata. 18) Support UDP sockets in sockmap, from Lorenz Bauer. 19) Fix stretch ACK bugs in several TCP congestion control modules, from Pengcheng Yang. 20) Support virtual functiosn in octeontx2 driver, from Tomasz Duszynski. 21) Add region operations for devlink and use it in ice driver to dump NVM contents, from Jacob Keller. 22) Add support for hw offload of MACSEC, from Antoine Tenart. 23) Add support for BPF programs that can be attached to LSM hooks, from KP Singh. 24) Support for multiple paths, path managers, and counters in MPTCP. From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti, and others. 25) More progress on adding the netlink interface to ethtool, from Michal Kubecek" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits) net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline cxgb4/chcr: nic-tls stats in ethtool net: dsa: fix oops while probing Marvell DSA switches net/bpfilter: remove superfluous testing message net: macb: Fix handling of fixed-link node net: dsa: ksz: Select KSZ protocol tag netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write net: stmmac: add EHL 2.5Gbps PCI info and PCI ID net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID net: stmmac: create dwmac-intel.c to contain all Intel platform net: dsa: bcm_sf2: Support specifying VLAN tag egress rule net: dsa: bcm_sf2: Add support for matching VLAN TCI net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT net: dsa: bcm_sf2: Disable learning for ASP port net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge net: dsa: b53: Prevent tagged VLAN on port 7 for 7278 net: dsa: b53: Restore VLAN entries upon (re)configuration net: dsa: bcm_sf2: Fix overflow checks hv_netvsc: Remove unnecessary round_up for recv_completion_cnt ...
2020-03-31net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inlineGustavo A. R. Silva
In case memory resources for buf were allocated, release them before return. Addresses-Coverity-ID: 1492011 ("Resource leak") Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31net: dsa: fix oops while probing Marvell DSA switchesRussell King
Fix an oops in dsa_port_phylink_mac_change() caused by a combination of a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed") and the net-dsa-improve-serdes-integration series of patches 65b7a2c8e369 ("Merge branch 'net-dsa-improve-serdes-integration'"). Unable to handle kernel NULL pointer dereference at virtual address 00000124 pgd = c0004000 [00000124] *pgd=00000000 Internal error: Oops: 805 [#1] SMP ARM Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470 Hardware name: Marvell Armada 380/385 (Device Tree) PC is at phylink_mac_change+0x10/0x88 LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31net/bpfilter: remove superfluous testing messageBruno Meneguele
A testing message was brought by 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") but should've been deleted before patch submission. Although it doesn't cause any harm to the code or functionality itself, it's totally unpleasant to have it displayed on every loop iteration with no real use case. Thus remove it unconditionally. Fixes: 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") Signed-off-by: Bruno Meneguele <bmeneg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Add support to specify a stateful expression in set definitions, this allows users to specify e.g. counters per set elements. 2) Flowtable software counter support. 3) Flowtable hardware offload counter support, from wenxu. 3) Parallelize flowtable hardware offload requests, from Paul Blakey. This includes a patch to add one work entry per offload command. 4) Several patches to rework nf_queue refcount handling, from Florian Westphal. 4) A few fixes for the flowtable tunnel offload: Fix crash if tunneling information is missing and set up indirect flow block as TC_SETUP_FT, patch from wenxu. 5) Stricter netlink attribute sanity check on filters, from Romain Bellan and Florent Fourcot. 5) Annotations to make sparse happy, from Jules Irenge. 6) Improve icmp errors in debugging information, from Haishuang Yan. 7) Fix warning in IPVS icmp error debugging, from Haishuang Yan. 8) Fix endianess issue in tcp extension header, from Sergey Marinkevich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Allow setting of packet trap group parametersIdo Schimmel
The previous patch allowed device drivers to publish their default binding between packet trap policers and packet trap groups. However, some users might not be content with this binding and would like to change it. In case user space passed a packet trap policer identifier when setting a packet trap group, invoke the appropriate device driver callback and pass the new policer identifier. v2: * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in devlink_trap_group_set() and bail if not present * Add extack error message in case trap group was partially modified Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Add packet trap group parameters supportIdo Schimmel
Packet trap groups are used to aggregate logically related packet traps. Currently, these groups allow user space to batch operations such as setting the trap action of all member traps. In order to prevent the CPU from being overwhelmed by too many trapped packets, it is desirable to bind a packet trap policer to these groups. For example, to limit all the packets that encountered an exception during routing to 10Kpps. Allow device drivers to bind default packet trap policers to packet trap groups when the latter are registered with devlink. The next patch will enable user space to change this default binding. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Add packet trap policers supportIdo Schimmel
Devices capable of offloading the kernel's datapath and perform functions such as bridging and routing must also be able to send (trap) specific packets to the kernel (i.e., the CPU) for processing. For example, a device acting as a multicast-aware bridge must be able to trap IGMP membership reports to the kernel for processing by the bridge module. In most cases, the underlying device is capable of handling packet rates that are several orders of magnitude higher compared to those that can be handled by the CPU. Therefore, in order to prevent the underlying device from overwhelming the CPU, devices usually include packet trap policers that are able to police the trapped packets to rates that can be handled by the CPU. This patch allows capable device drivers to register their supported packet trap policers with devlink. User space can then tune the parameters of these policer (currently, rate and burst size) and read from the device the number of packets that were dropped by the policer, if supported. Subsequent patches in the series will allow device drivers to create default binding between these policers and packet trap groups and allow user space to change the binding. v2: * Add 'strict_start_type' in devlink policy * Have device drivers provide max/min rate/burst size for each policer. Use them to check validity of user provided parameters Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>