summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-04-06xsk: Fix out of boundary write in __xsk_rcv_memcpyLi RongQing
first_len is the remainder of the first page we're copying. If this size is larger, then out of page boundary write will otherwise happen. Fixes: c05cd3645814 ("xsk: add support to allow unaligned chunk placement") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1585813930-19712-1-git-send-email-lirongqing@baidu.com
2020-04-03net, sk_msg: Don't use RCU_INIT_POINTER on sk_user_dataJakub Sitnicki
sparse reports an error due to use of RCU_INIT_POINTER helper to assign to sk_user_data pointer, which is not tagged with __rcu: net/core/sock.c:1875:25: error: incompatible types in comparison expression (different address spaces): net/core/sock.c:1875:25: void [noderef] <asn:4> * net/core/sock.c:1875:25: void * ... and rightfully so. sk_user_data is not always treated as a pointer to an RCU-protected data. When it is used to point at an RCU-protected object, we access it with __sk_user_data to inform sparse about it. In this case, when the child socket does not inherit sk_user_data from the parent, there is no reason to treat it as an RCU-protected pointer. Use a regular assignment to clear the pointer value. Fixes: f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200402125524.851439-1-jakub@cloudflare.com
2020-04-02mptcp: fix "fn parameter not described" warningsMatthieu Baerts
Obtained with: $ make W=1 net/mptcp/token.o net/mptcp/token.c:53: warning: Function parameter or member 'req' not described in 'mptcp_token_new_request' net/mptcp/token.c:98: warning: Function parameter or member 'sk' not described in 'mptcp_token_new_connect' net/mptcp/token.c:133: warning: Function parameter or member 'conn' not described in 'mptcp_token_new_accept' net/mptcp/token.c:178: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy_request' net/mptcp/token.c:191: warning: Function parameter or member 'token' not described in 'mptcp_token_destroy' Fixes: 79c0949e9a09 (mptcp: Add key generation and token tree) Fixes: 58b09919626b (mptcp: create msk early) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: re-check dsn before reading from subflowFlorian Westphal
mptcp_subflow_data_available() is commonly called via ssk->sk_data_ready(), in this case the mptcp socket lock cannot be acquired. Therefore, while we can safely discard subflow data that was already received up to msk->ack_seq, we cannot be sure that 'subflow->data_avail' will still be valid at the time userspace wants to read the data -- a previous read on a different subflow might have carried this data already. In that (unlikely) event, msk->ack_seq will have been updated and will be ahead of the subflow dsn. We can check for this condition and skip/resync to the expected sequence number. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: subflow: check parent mptcp socket on subflow state changeFlorian Westphal
This is needed at least until proper MPTCP-Level fin/reset signalling gets added: We wake parent when a subflow changes, but we should do this only when all subflows have closed, not just one. Schedule the mptcp worker and tell it to check eof state on all subflows. Only flag mptcp socket as closed and wake userspace processes blocking in poll if all subflows have closed. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02mptcp: fix tcp fallback crashFlorian Westphal
Christoph Paasch reports following crash: general protection fault [..] CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62 RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471 [..] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline] do_raw_spin_lock include/linux/spinlock.h:181 [inline] spin_lock_bh include/linux/spinlock.h:343 [inline] __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278 mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882 [..] Problem is that mptcp_shutdown() socket isn't an mptcp socket, its a plain tcp_sk. Thus, trying to access mptcp_sk specific members accesses garbage. Root cause is that accept() returns a fallback (tcp) socket, not an mptcp one. There is code in getpeername to detect this and override the sockets stream_ops. But this will only run when accept() caller provided a sockaddr struct. "accept(fd, NULL, 0)" will therefore result in mptcp stream ops, but with sock->sk pointing at a tcp_sk. Update the existing fallback handling to detect this as well. Moreover, mptcp_shutdown did not have fallback handling, and mptcp_poll did it too late so add that there as well. Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: ipv6: rpl_iptunnel: remove redundant assignments to variable errColin Ian King
The variable err is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02net: dsa: dsa_bridge_mtu_normalization() can be statickbuild test robot
Fixes: f41071407c85 ("net: dsa: implement auto-normalization of MTU for bridge hardware datapath") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-01ipv6: don't auto-add link-local address to lag portsJarod Wilson
Bonding slave and team port devices should not have link-local addresses automatically added to them, as it can interfere with openvswitch being able to properly add tc ingress. Basic reproducer, courtesy of Marcelo: $ ip link add name bond0 type bond $ ip link set dev ens2f0np0 master bond0 $ ip link set dev ens2f1np2 master bond0 $ ip link set dev bond0 up $ ip a s 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link valid_lft forever preferred_lft forever (above trimmed to relevant entries, obviously) $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0 net.ipv6.conf.ens2f0np0.addr_gen_mode = 0 $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0 net.ipv6.conf.ens2f1np2.addr_gen_mode = 0 $ ip a l ens2f0np0 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever $ ip a l ens2f1np2 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is this a slave interface?" check added by commit c2edacf80e15, and results in an address getting added, while w/the proposed patch added, no address gets added. This simply adds the same gating check to another code path, and thus should prevent the same devices from erroneously obtaining an ipv6 link-local address. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-01net_sched: add a temporary refcnt for struct tcindex_dataCong Wang
Although we intentionally use an ordered workqueue for all tc filter works, the ordering is not guaranteed by RCU work, given that tcf_queue_work() is esstenially a call_rcu(). This problem is demostrated by Thomas: CPU 0: tcf_queue_work() tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work); -> Migration to CPU 1 CPU 1: tcf_queue_work(&p->rwork, tcindex_destroy_work); so the 2nd work could be queued before the 1st one, which leads to a free-after-free. Enforcing this order in RCU work is hard as it requires to change RCU code too. Fortunately we can workaround this problem in tcindex filter by taking a temporary refcnt, we only refcnt it right before we begin to destroy it. This simplifies the code a lot as a full refcnt requires much more changes in tcindex_set_parms(). Reported-by: syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com Fixes: 3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Fix the iwlwifi regression, from Johannes Berg. 2) Support BSS coloring and 802.11 encapsulation offloading in hardware, from John Crispin. 3) Fix some potential Spectre issues in qtnfmac, from Sergey Matyukevich. 4) Add TTL decrement action to openvswitch, from Matteo Croce. 5) Allow paralleization through flow_action setup by not taking the RTNL mutex, from Vlad Buslov. 6) A lot of zero-length array to flexible-array conversions, from Gustavo A. R. Silva. 7) Align XDP statistics names across several drivers for consistency, from Lorenzo Bianconi. 8) Add various pieces of infrastructure for offloading conntrack, and make use of it in mlx5 driver, from Paul Blakey. 9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki. 10) Lots of parallelization improvements during configuration changes in mlxsw driver, from Ido Schimmel. 11) Add support to devlink for generic packet traps, which report packets dropped during ACL processing. And use them in mlxsw driver. From Jiri Pirko. 12) Support bcmgenet on ACPI, from Jeremy Linton. 13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei Starovoitov, and your's truly. 14) Support XDP meta-data in virtio_net, from Yuya Kusakabe. 15) Fix sysfs permissions when network devices change namespaces, from Christian Brauner. 16) Add a flags element to ethtool_ops so that drivers can more simply indicate which coalescing parameters they actually support, and therefore the generic layer can validate the user's ethtool request. Use this in all drivers, from Jakub Kicinski. 17) Offload FIFO qdisc in mlxsw, from Petr Machata. 18) Support UDP sockets in sockmap, from Lorenz Bauer. 19) Fix stretch ACK bugs in several TCP congestion control modules, from Pengcheng Yang. 20) Support virtual functiosn in octeontx2 driver, from Tomasz Duszynski. 21) Add region operations for devlink and use it in ice driver to dump NVM contents, from Jacob Keller. 22) Add support for hw offload of MACSEC, from Antoine Tenart. 23) Add support for BPF programs that can be attached to LSM hooks, from KP Singh. 24) Support for multiple paths, path managers, and counters in MPTCP. From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti, and others. 25) More progress on adding the netlink interface to ethtool, from Michal Kubecek" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits) net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline cxgb4/chcr: nic-tls stats in ethtool net: dsa: fix oops while probing Marvell DSA switches net/bpfilter: remove superfluous testing message net: macb: Fix handling of fixed-link node net: dsa: ksz: Select KSZ protocol tag netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write net: stmmac: add EHL 2.5Gbps PCI info and PCI ID net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID net: stmmac: create dwmac-intel.c to contain all Intel platform net: dsa: bcm_sf2: Support specifying VLAN tag egress rule net: dsa: bcm_sf2: Add support for matching VLAN TCI net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT net: dsa: bcm_sf2: Disable learning for ASP port net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge net: dsa: b53: Prevent tagged VLAN on port 7 for 7278 net: dsa: b53: Restore VLAN entries upon (re)configuration net: dsa: bcm_sf2: Fix overflow checks hv_netvsc: Remove unnecessary round_up for recv_completion_cnt ...
2020-03-31net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inlineGustavo A. R. Silva
In case memory resources for buf were allocated, release them before return. Addresses-Coverity-ID: 1492011 ("Resource leak") Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31net: dsa: fix oops while probing Marvell DSA switchesRussell King
Fix an oops in dsa_port_phylink_mac_change() caused by a combination of a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed") and the net-dsa-improve-serdes-integration series of patches 65b7a2c8e369 ("Merge branch 'net-dsa-improve-serdes-integration'"). Unable to handle kernel NULL pointer dereference at virtual address 00000124 pgd = c0004000 [00000124] *pgd=00000000 Internal error: Oops: 805 [#1] SMP ARM Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470 Hardware name: Marvell Armada 380/385 (Device Tree) PC is at phylink_mac_change+0x10/0x88 LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31net/bpfilter: remove superfluous testing messageBruno Meneguele
A testing message was brought by 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") but should've been deleted before patch submission. Although it doesn't cause any harm to the code or functionality itself, it's totally unpleasant to have it displayed on every loop iteration with no real use case. Thus remove it unconditionally. Fixes: 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") Signed-off-by: Bruno Meneguele <bmeneg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Add support to specify a stateful expression in set definitions, this allows users to specify e.g. counters per set elements. 2) Flowtable software counter support. 3) Flowtable hardware offload counter support, from wenxu. 3) Parallelize flowtable hardware offload requests, from Paul Blakey. This includes a patch to add one work entry per offload command. 4) Several patches to rework nf_queue refcount handling, from Florian Westphal. 4) A few fixes for the flowtable tunnel offload: Fix crash if tunneling information is missing and set up indirect flow block as TC_SETUP_FT, patch from wenxu. 5) Stricter netlink attribute sanity check on filters, from Romain Bellan and Florent Fourcot. 5) Annotations to make sparse happy, from Jules Irenge. 6) Improve icmp errors in debugging information, from Haishuang Yan. 7) Fix warning in IPVS icmp error debugging, from Haishuang Yan. 8) Fix endianess issue in tcp extension header, from Sergey Marinkevich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Allow setting of packet trap group parametersIdo Schimmel
The previous patch allowed device drivers to publish their default binding between packet trap policers and packet trap groups. However, some users might not be content with this binding and would like to change it. In case user space passed a packet trap policer identifier when setting a packet trap group, invoke the appropriate device driver callback and pass the new policer identifier. v2: * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in devlink_trap_group_set() and bail if not present * Add extack error message in case trap group was partially modified Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Add packet trap group parameters supportIdo Schimmel
Packet trap groups are used to aggregate logically related packet traps. Currently, these groups allow user space to batch operations such as setting the trap action of all member traps. In order to prevent the CPU from being overwhelmed by too many trapped packets, it is desirable to bind a packet trap policer to these groups. For example, to limit all the packets that encountered an exception during routing to 10Kpps. Allow device drivers to bind default packet trap policers to packet trap groups when the latter are registered with devlink. The next patch will enable user space to change this default binding. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Add packet trap policers supportIdo Schimmel
Devices capable of offloading the kernel's datapath and perform functions such as bridging and routing must also be able to send (trap) specific packets to the kernel (i.e., the CPU) for processing. For example, a device acting as a multicast-aware bridge must be able to trap IGMP membership reports to the kernel for processing by the bridge module. In most cases, the underlying device is capable of handling packet rates that are several orders of magnitude higher compared to those that can be handled by the CPU. Therefore, in order to prevent the underlying device from overwhelming the CPU, devices usually include packet trap policers that are able to police the trapped packets to rates that can be handled by the CPU. This patch allows capable device drivers to register their supported packet trap policers with devlink. User space can then tune the parameters of these policer (currently, rate and burst size) and read from the device the number of packets that were dropped by the policer, if supported. Subsequent patches in the series will allow device drivers to create default binding between these policers and packet trap groups and allow user space to change the binding. v2: * Add 'strict_start_type' in devlink policy * Have device drivers provide max/min rate/burst size for each policer. Use them to check validity of user provided parameters Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30bpf: Don't refcount LISTEN sockets in sk_assign()Joe Stringer
Avoid taking a reference on listen sockets by checking the socket type in the sk_assign and in the corresponding skb_steal_sock() code in the the transport layer, and by ensuring that the prefetch free (sock_pfree) function uses the same logic to check whether the socket is refcounted. Suggested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-4-joe@wand.net.nz
2020-03-30net: Track socket refcounts in skb_steal_sock()Joe Stringer
Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make the determination of whether the socket is reference counted in the case where it is prefetched by earlier logic such as early_demux. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-3-joe@wand.net.nz
2020-03-30bpf: Add socket assign supportJoe Stringer
Add support for TPROXY via a new bpf helper, bpf_sk_assign(). This helper requires the BPF program to discover the socket via a call to bpf_sk*_lookup_*(), then pass this socket to the new helper. The helper takes its own reference to the socket in addition to any existing reference that may or may not currently be obtained for the duration of BPF processing. For the destination socket to receive the traffic, the traffic must be routed towards that socket via local route. The simplest example route is below, but in practice you may want to route traffic more narrowly (eg by CIDR): $ ip route add local default dev lo This patch avoids trying to introduce an extra bit into the skb->sk, as that would require more invasive changes to all code interacting with the socket to ensure that the bit is handled correctly, such as all error-handling cases along the path from the helper in BPF through to the orphan path in the input. Instead, we opt to use the destructor variable to switch on the prefetch of the socket. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
2020-03-30Merge tag 'docs-5.7' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This has been a busy cycle for documentation work. Highlights include: - Lots of RST conversion work by Mauro, Daniel ALmeida, and others. Maybe someday we'll get to the end of this stuff...maybe... - Some organizational work to bring some order to the core-api manual. - Various new docs and additions to the existing documentation. - Typo fixes, warning fixes, ..." * tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits) Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT MAINTAINERS: adjust to filesystem doc ReST conversion docs: deprecated.rst: Add BUG()-family doc: zh_CN: add translation for virtiofs doc: zh_CN: index files in filesystems subdirectory docs: locking: Drop :c:func: throughout docs: locking: Add 'need' to hardirq section docs: conf.py: avoid thousands of duplicate label warning on Sphinx docs: prevent warnings due to autosectionlabel docs: fix reference to core-api/namespaces.rst docs: fix pointers to io-mapping.rst and io_ordering.rst files Documentation: Better document the softlockup_panic sysctl docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref docs: perf: imx-ddr.rst: get rid of a warning docs: filesystems: fuse.rst: supress a Sphinx warning docs: translations: it: avoid duplicate refs at programming-language.rst docs: driver.rst: supress two ReSt warnings docs: trace: events.rst: convert some new stuff to ReST format Documentation: Add io_ordering.rst to driver-api manual Documentation: Add io-mapping.rst to driver-api manual ...
2020-03-30Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the io_uring changes for this merge window. Light on new features this time around (just splice + buffer selection), lots of cleanups, fixes, and improvements to existing support. In particular, this contains: - Cleanup fixed file update handling for stack fallback (Hillf) - Re-work of how pollable async IO is handled, we no longer require thread offload to handle that. Instead we rely using poll to drive this, with task_work execution. - In conjunction with the above, allow expendable buffer selection, so that poll+recv (for example) no longer has to be a split operation. - Make sure we honor RLIMIT_FSIZE for buffered writes - Add support for splice (Pavel) - Linked work inheritance fixes and optimizations (Pavel) - Async work fixes and cleanups (Pavel) - Improve io-wq locking (Pavel) - Hashed link write improvements (Pavel) - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)" * tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits) io_uring: cleanup io_alloc_async_ctx() io_uring: fix missing 'return' in comment io-wq: handle hashed writes in chains io-uring: drop 'free_pfile' in struct io_file_put io-uring: drop completion when removing file io_uring: Fix ->data corruption on re-enqueue io-wq: close cancel gap for hashed linked work io_uring: make spdxcheck.py happy io_uring: honor original task RLIMIT_FSIZE io-wq: hash dependent work io-wq: split hashing and enqueueing io-wq: don't resched if there is no work io-wq: remove duplicated cancel code io_uring: fix truncated async read/readv and write/writev retry io_uring: dual license io_uring.h uapi header io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled io_uring: Fix unused function warnings io_uring: add end-of-bits marker and build time verify it io_uring: provide means of removing buffers io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG ...
2020-03-30ipvs: fix uninitialized variable warningHaishuang Yan
If outer_proto is not set, GCC warning as following: In file included from net/netfilter/ipvs/ip_vs_core.c:52: net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_in_icmp': include/net/ip_vs.h:233:4: warning: 'outer_proto' may be used uninitialized in this function [-Wmaybe-uninitialized] 233 | printk(KERN_DEBUG pr_fmt(msg), ##__VA_ARGS__); \ | ^~~~~~ net/netfilter/ipvs/ip_vs_core.c:1666:8: note: 'outer_proto' was declared here 1666 | char *outer_proto; | ^~~~~~~~~~~ Fixes: 73348fed35d0 ("ipvs: optimize tunnel dumps for icmp errors") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30netfilter: nft_exthdr: fix endianness of tcp option castSergey Marinkevich
I got a problem on MIPS with Big-Endian is turned on: every time when NF trying to change TCP MSS it returns because of new.v16 was greater than old.v16. But real MSS was 1460 and my rule was like this: add rule table chain tcp option maxseg size set 1400 And 1400 is lesser that 1460, not greater. Later I founded that main causer is cast from u32 to __be16. Debugging: In example MSS = 1400(HEX: 0x578). Here is representation of each byte like it is in memory by addresses from left to right(e.g. [0x0 0x1 0x2 0x3]). LE — Little-Endian system, BE — Big-Endian, left column is type. LE BE u32: [78 05 00 00] [00 00 05 78] As you can see, u32 representation will be casted to u16 from different half of 4-byte address range. But actually nf_tables uses registers and store data of various size. Actually TCP MSS stored in 2 bytes. But registers are still u32 in definition: struct nft_regs { union { u32 data[20]; struct nft_verdict verdict; }; }; So, access like regs->data[priv->sreg] exactly u32. So, according to table presents above, per-byte representation of stored TCP MSS in register will be: LE BE (u32)regs->data[]: [78 05 00 00] [05 78 00 00] ^^ ^^ We see that register uses just half of u32 and other 2 bytes may be used for some another data. But in nft_exthdr_tcp_set_eval() it casted just like u32 -> __be16: new.v16 = src But u32 overfill __be16, so it get 2 low bytes. For clarity draw one more table(<xx xx> means that bytes will be used for cast). LE BE u32: [<78 05> 00 00] [00 00 <05 78>] (u32)regs->data[]: [<78 05> 00 00] [05 78 <00 00>] As you can see, for Little-Endian nothing changes, but for Big-endian we take the wrong half. In my case there is some other data instead of zeros, so new MSS was wrongly greater. For shooting this bug I used solution for ports ranges. Applying of this patch does not affect Little-Endian systems. Signed-off-by: Sergey Marinkevich <sergey.marinkevich@eltex-co.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-03-29 Here are a few more Bluetooth patches for the 5.7 kernel: - Fix assumption of encryption key size when reading fails - Add support for DEFER_SETUP with L2CAP Enhanced Credit Based Mode - Fix issue with auto-connected devices - Fix suspend handling when entering the state fails ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: add port policersVladimir Oltean
The approach taken to pass the port policer methods on to drivers is pragmatic. It is similar to the port mirroring implementation (in that the DSA core does all of the filter block interaction and only passes simple operations for the driver to implement) and dissimilar to how flow-based policers are going to be implemented (where the driver has full control over the flow_cls_offload data structure). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: refactor matchall mirred action to separate functionVladimir Oltean
Make room for other actions for the matchall filter by keeping the mirred argument parsing self-contained in its own function. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Add auto dump flag to health reporterEran Ben Elisha
On low memory system, run time dumps can consume too much memory. Add administrator ability to disable auto dumps per reporter as part of the error flow handle routine. This attribute is not relevant while executing DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET. By default, auto dump is activated for any reporter that has a dump method, as part of the reporter registration to devlink. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30devlink: Implicitly set auto recover flag when registering health reporterEran Ben Elisha
When health reporter is registered to devlink, devlink will implicitly set auto recover if and only if the reporter has a recover method. No reason to explicitly get the auto recover flag from the driver. Remove this flag from all drivers that called devlink_health_reporter_create. All existing health reporters set auto recovery to true if they have a recover method. Yet, administrator can unset auto recover via netlink command as prior to this patch. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: devlink: use NL_SET_ERR_MSG_MOD instead of NL_SET_ERR_MSGJiri Pirko
The rest of the devlink code sets the extack message using NL_SET_ERR_MSG_MOD. Change the existing appearances of NL_SET_ERR_MSG to NL_SET_ERR_MSG_MOD. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: sched: expose HW stats types per action used by driversJiri Pirko
It may be up to the driver (in case ANY HW stats is passed) to select which type of HW stats he is going to use. Add an infrastructure to expose this information to user. $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop $ tc -s filter show dev enp3s0np1 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.1 in_hw in_hw_count 2 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 10 sec used 10 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: introduce nla_put_bitfield32() helper and use itJiri Pirko
Introduce a helper to pass value and selector to. The helper packs them into struct and puts them into netlink message. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-03-28 1) Use kmem_cache_zalloc() instead of kmem_cache_alloc() in xfrm_state_alloc(). From Huang Zijiang. 2) esp_output_fill_trailer() is the same in IPv4 and IPv6, so share this function to avoide code duplcation. From Raed Salem. 3) Add offload support for esp beet mode. From Xin Long. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: dsa: Simplify 'dsa_tag_protocol_to_str()'Christophe JAILLET
There is no point in preparing the module name in a buffer. The format string can be passed diectly to 'request_module()'. This axes a few lines of code and cleans a few things: - max len for a driver name is MODULE_NAME_LEN wich is ~ 60 chars, not 128. It would be down-sized in 'request_module()' - we should pass the total size of the buffer to 'snprintf()', not the size minus 1 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30net: fix fraglist segmentation reference count leakFlorian Westphal
Xin Long says: On udp rx path udp_rcv_segment() may do segment where the frag skbs will get the header copied from the head skb in skb_segment_list() by calling __copy_skb_header(), which could overwrite the frag skbs' extensions by __skb_ext_copy() and cause a leak. This issue was found after loading esp_offload where a sec path ext is set in the skb. Fix this by discarding head state of the fraglist skb before replacing its contents. Fixes: 3a1296a38d0cf62 ("net: Support GRO/GSO fraglist chaining.") Cc: Steffen Klassert <steffen.klassert@secunet.com> Reported-by: Xiumei Mu <xmu@redhat.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30udp: initialize is_flist with 0 in udp_gro_receiveXin Long
Without NAPI_GRO_CB(skb)->is_flist initialized, when the dev doesn't support NETIF_F_GRO_FRAGLIST, is_flist can still be set and fraglist will be used in udp_gro_receive(). So fix it by initializing is_flist with 0 in udp_gro_receive. Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: provide timestamping information with TSINFO_GET requestMichal Kubecek
Implement TSINFO_GET request to get timestamping information for a network device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl request. Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink code use the same logic to get timestamping information from the device. v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: add timestamping related string setsMichal Kubecek
Add three string sets related to timestamping information: ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags ETH_SS_TS_TX_TYPES: timestamping Tx types ETH_SS_TS_RX_FILTERS: timestamping Rx filters These will be used for TIMESTAMP_GET request. v2: avoid compiler warning ("enumeration value not handled in switch") in net_hwtstamp_validate() v3: omit dash in Tx type names ("one-step-*" -> "onestep-*"), suggested by Richard Cochran Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: add EEE_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network device are modified using ETHTOOL_MSG_EEE_SET netlink message or ETHTOOL_SEEE ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: set EEE settings with EEE_SET requestMichal Kubecek
Implement EEE_SET netlink request to set EEE settings of a network device. These are traditionally set with ETHTOOL_SEEE ioctl request. The netlink interface allows setting the EEE status for all link modes supported by kernel but only first 32 link modes can be set at the moment as only those are supported by the ethtool_ops callback. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: provide EEE settings with EEE_GET requestMichal Kubecek
Implement EEE_GET request to get EEE settings of a network device. These are traditionally available via ETHTOOL_GEEE ioctl request. The netlink interface allows reporting EEE status for all link modes supported by kernel but only first 32 link modes are provided at the moment as only those are reported by the ethtool_ops callback and drivers. v2: fix alignment (whitespace only) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: add PAUSE_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message or ETHTOOL_SPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: set pause parameters with PAUSE_SET requestMichal Kubecek
Implement PAUSE_SET netlink request to set pause parameters of a network device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: provide pause parameters with PAUSE_GET requestMichal Kubecek
Implement PAUSE_GET request to get pause parameters of a network device. These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: add COALESCE_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink message or ETHTOOL_SCOALESCE ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: set coalescing parameters with COALESCE_SET requestMichal Kubecek
Implement COALESCE_SET netlink request to set coalescing parameters of a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl request. This commit adds only support for device coalescing parameters, not per queue coalescing parameters. Like the ioctl implementation, the generic ethtool code checks if only supported parameters are modified; if not, first offending attribute is reported using extack. v2: fix alignment (whitespace only) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29ethtool: provide coalescing parameters with COALESCE_GET requestMichal Kubecek
Implement COALESCE_GET request to get coalescing parameters of a network device. These are traditionally available via ETHTOOL_GCOALESCE ioctl request. This commit adds only support for device coalescing parameters, not per queue coalescing parameters. Omit attributes with zero values unless they are declared as supported (i.e. the corresponding bit in ethtool_ops::supported_coalesce_params is set). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>