summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-02-18net: netlabel: Use built-in RCU list checkingMadhuparna Bhowmik
list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"Paolo Abeni
This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc"). I can't find any working solution other than a plain revert. This will introduce some minor performance regressions for pfifo_fast qdisc. I plan to address them in net-next with more indirect call wrapper boilerplate for qdiscs. Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18mptcp: fix bogus socket flag valuesFlorian Westphal
Dan Carpenter reports static checker warnings due to bogus BIT() usage: net/mptcp/subflow.c:571 subflow_write_space() warn: test_bit() takes a bit number net/mptcp/subflow.c:694 subflow_state_change() warn: test_bit() takes a bit number net/mptcp/protocol.c:261 ssk_check_wmem() warn: test_bit() takes a bit number [..] This is harmless (we use bits 1 & 2 instead of 0 and 1), but would break eventually when adding BIT(5) (or 6, depends on size of 'long'). Just use 0 and 1, the values are only passed to test/set/clear_bit functions. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17sctp: move the format error check out of __sctp_sf_do_9_1_abortXin Long
When T2 timer is to be stopped, the asoc should also be deleted, otherwise, there will be no chance to call sctp_association_free and the asoc could last in memory forever. However, in sctp_sf_shutdown_sent_abort(), after adding the cmd SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the format error from __sctp_sf_do_9_1_abort() and miss adding SCTP_CMD_ASSOC_FAILED where the asoc will be deleted. This patch is to fix it by moving the format error check out of __sctp_sf_do_9_1_abort(), and do it before adding the cmd SCTP_CMD_TIMER_STOP for T2 timer. Thanks Hangbin for reporting this issue by the fuzz testing. v1->v2: - improve the comment in the code as Marcelo's suggestion. Fixes: 96ca468b86b0 ("sctp: check invalid value of length parameter in error cause") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: sched: correct flower port blockingJason Baron
tc flower rules that are based on src or dst port blocking are sometimes ineffective due to uninitialized stack data. __skb_flow_dissect() extracts ports from the skb for tc flower to match against. However, the port dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in key_control->flags. All callers of __skb_flow_dissect(), zero-out the key_control field except for fl_classify() as used by the flower classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to __skb_flow_dissect(), since key_control is allocated on the stack and may not be initialized. Since key_basic and key_control are present for all flow keys, let's make sure they are initialized. Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: netlink: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: switchdev: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17bpf, sockmap: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17NFC: digital: Replace zero-length array with flexible-array memberGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: reduce port_event schedulingUrsula Braun
IB event handlers schedule the port event worker for further processing of port state changes. This patch reduces the number of schedules to avoid duplicate processing of the same port change. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: simplify normal link terminationKarsten Graul
smc_lgr_terminate() and smc_lgr_terminate_sched() both result in soft link termination, smc_lgr_terminate_sched() is scheduling a worker for this task. Take out complexity by always using the termination worker and getting rid of smc_lgr_terminate() completely. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: remove unused parameter of smc_lgr_terminate()Karsten Graul
The soft parameter of smc_lgr_terminate() is not used and obsolete. Remove it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: do not delete lgr from list twiceKarsten Graul
When 2 callers call smc_lgr_terminate() at the same time for the same lgr, one gets the lgr_lock and deletes the lgr from the list and releases the lock. Then the second caller gets the lock and tries to delete it again. In smc_lgr_terminate() add a check if the link group lgr is already deleted from the link group list and prevent to try to delete it a second time. And add a check if the lgr is marked as freeing, which means that a termination is already pending. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: use termination worker under send_lockKarsten Graul
smc_tx_rdma_write() is called under the send_lock and should not call smc_lgr_terminate() directly. Call smc_lgr_terminate_sched() instead which schedules a worker. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net/smc: improve smc_lgr_cleanup()Karsten Graul
smc_lgr_cleanup() is called during termination processing, there is no need to send a DELETE_LINK at that time. A DELETE_LINK should have been sent before the termination is initiated, if needed. And remove the extra call to wake_up(&lnk->wr_reg_wait) because smc_llc_link_inactive() already calls the related helper function smc_wr_wakeup_reg_wait(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: bridge: teach ndo_dflt_bridge_getlink() more brport flagsJulian Wiedmann
This enables ndo_dflt_bridge_getlink() to report a bridge port's offload settings for multicast and broadcast flooding. CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: vlan: suppress "failed to kill vid" warningsJulian Wiedmann
When a real dev unregisters, vlan_device_event() also unregisters all of its vlan interfaces. For each VID this ends up in __vlan_vid_del(), which attempts to remove the VID from the real dev's VLAN filter. But the unregistering real dev might no longer be able to issue the required IOs, and return an error. Subsequently we raise a noisy warning msg that is not appropriate for this situation: the real dev is being torn down anyway, there shouldn't be any worry about cleanly releasing all of its HW-internal resources. So to avoid scaring innocent users, suppress this warning when the failed deletion happens on an unregistering device. While at it also convert the raw pr_warn() to a more fitting netdev_warn(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: sched: don't take rtnl lock during flow_action setupVlad Buslov
Refactor tc_setup_flow_action() function not to use rtnl lock and remove 'rtnl_held' argument that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17net: sched: lock action when translating it to flow_action infraVlad Buslov
In order to remove dependency on rtnl lock, take action's tcfa_lock when constructing its representation as flow_action_entry structure. Refactor tcf_sample_get_group() to assume that caller holds tcf_lock and don't take it manually. This callback is only called from flow_action infra representation translator which now calls it with tcf_lock held, so this refactoring is necessary to prevent deadlock. Allocate memory with GFP_ATOMIC flag for ip_tunnel_info copy because tcf_tunnel_info_copy() is only called from flow_action representation infra code with tcf_lock spinlock taken. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-17netfilter: conntrack: allow insertion of clashing entriesFlorian Westphal
This patch further relaxes the need to drop an skb due to a clash with an existing conntrack entry. Current clash resolution handles the case where the clash occurs between two identical entries (distinct nf_conn objects with same tuples), i.e.: Original Reply existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 ... existing handling will discard the unconfirmed clashing entry and makes skb->_nfct point to the existing one. The skb can then be processed normally just as if the clash would not have existed in the first place. For other clashes, the skb needs to be dropped. This frequently happens with DNS resolvers that send A and AAAA queries back-to-back when NAT rules are present that cause packets to get different DNAT transformations applied, for example: -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.6:5353 -m statistics --mode random ... -j DNAT --dnat-to 10.0.0.7:5353 In this case the A or AAAA query is dropped which incurs a costly delay during name resolution. This patch also allows this collision type: Original Reply existing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.6:5353 clashing: 10.2.3.4:42 -> 10.8.8.8:53 10.2.3.4:42 <- 10.0.0.7:5353 In this case, clash is in original direction -- the reply direction is still unique. The change makes it so that when the 2nd colliding packet is received, the clashing conntrack is tagged with new IPS_NAT_CLASH_BIT, gets a fixed 1 second timeout and is inserted in the reply direction only. The entry is hidden from 'conntrack -L', it will time out quickly and it can be early dropped because it will never progress to the ASSURED state. To avoid special-casing the delete code path to special case the ORIGINAL hlist_nulls node, a new helper, "hlist_nulls_add_fake", is added so hlist_nulls_del() will work. Example: CPU A: CPU B: 1. 10.2.3.4:42 -> 10.8.8.8:53 (A) 2. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA) 3. Apply DNAT, reply changed to 10.0.0.6 4. 10.2.3.4:42 -> 10.8.8.8:53 (AAAA) 5. Apply DNAT, reply changed to 10.0.0.7 6. confirm/commit to conntrack table, no collisions 7. commit clashing entry Reply comes in: 10.2.3.4:42 <- 10.0.0.6:5353 (A) -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42 10.2.3.4:42 <- 10.0.0.7:5353 (AAAA) -> Finds a conntrack, DNAT is reversed & packet forwarded to 10.2.3.4:42 The conntrack entry is deleted from table, as it has the NAT_CLASH bit set. In case of a retransmit from ORIGINAL dir, all further packets will get the DNAT transformation to 10.0.0.6. I tried to come up with other solutions but they all have worse problems. Alternatives considered were: 1. Confirm ct entries at allocation time, not in postrouting. a. will cause uneccesarry work when the skb that creates the conntrack is dropped by ruleset. b. in case nat is applied, ct entry would need to be moved in the table, which requires another spinlock pair to be taken. c. breaks the 'unconfirmed entry is private to cpu' assumption: we would need to guard all nfct->ext allocation requests with ct->lock spinlock. 2. Make the unconfirmed list a hash table instead of a pcpu list. Shares drawback c) of the first alternative. 3. Document this is expected and force users to rearrange their ruleset (e.g. by using "-m cluster" instead of "-m statistics"). nft has the 'jhash' expression which can be used instead of 'numgen'. Major drawback: doesn't fix what I consider a bug, not very realistic and I believe its reasonable to have the existing rulesets to 'just work'. 4. Document this is expected and force users to steer problematic packets to the same CPU -- this would serialize the "allocate new conntrack entry/nat table evaluation/perform nat/confirm entry", so no race can occur. Similar drawback to 3. Another advantage of this patch compared to 1) and 2) is that there are no changes to the hot path; things are handled in the udp tracker and the clash resolution path. Cc: rcu@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-16skbuff: remove stale bit mask commentsRandy Dunlap
Remove stale comments since this flag is no longer a bit mask but is a bit field. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16mptcp: select CRYPTOMatthieu Baerts
Without this modification and if CRYPTO is not selected, we have this warning: WARNING: unmet direct dependencies detected for CRYPTO_LIB_SHA256 Depends on [n]: CRYPTO [=n] Selected by [y]: - MPTCP [=y] && NET [=y] && INET [=y] MPTCP selects CRYPTO_LIB_SHA256 which seems to depend on CRYPTO. CRYPTO is now selected to avoid this issue. Even though the config system prints that warning, it looks like sha256.c is compiled and linked even without CONFIG_CRYPTO. Since MPTCP will end up needing CONFIG_CRYPTO anyway in future commits -- currently in preparation for net-next -- we propose to add it now to fix the warning. The dependency in the config system comes from the fact that CRYPTO_LIB_SHA256 is defined in "lib/crypto/Kconfig" which is sourced from "crypto/Kconfig" only if CRYPTO is selected. Fixes: 65492c5a6ab5 (mptcp: move from sha1 (v0) to sha256 (v1)) Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16openvswitch: add TTL decrement actionMatteo Croce
New action to decrement TTL instead of setting it to a fixed value. This action will decrement the TTL and, in case of expired TTL, drop it or execute an action passed via a nested attribute. The default TTL expired action is to drop the packet. Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively. Tested with a corresponding change in the userspace: # ovs-dpctl dump-flows in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},1 in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, actions:dec_ttl{ttl<=1 action:(drop)},2 in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2 in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, actions:1 # ping -c1 192.168.0.2 -t 42 IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64 # ping -c1 192.168.0.2 -t 120 IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64 # ping -c1 192.168.0.2 -t 1 # Co-developed-by: Bindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: Bindiya Kurle <bindiyakurle@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: export netdev_next_lower_dev_rcu()Taehee Yoo
netdev_next_lower_dev_rcu() will be used to implement a function, which is to walk all lower interfaces. There are already functions that they walk their lower interface. (netdev_walk_all_lower_dev_rcu, netdev_walk_all_lower_dev()). But, there would be cases that couldn't be covered by given netdev_walk_all_lower_dev_{rcu}() function. So, some modules would want to implement own function, which is to walk all lower interfaces. In the next patch, netdev_next_lower_dev_rcu() will be used. In addition, this patch removes two unused prototypes in netdevice.h. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16ethtool: fix application of verbose no_mask bitsetMichal Kubecek
A bitset without mask in a _SET request means we want exactly the bits in the bitset to be set. This works correctly for compact format but when verbose format is parsed, ethnl_update_bitset32_verbose() only sets the bits present in the request bitset but does not clear the rest. This can cause incorrect results like lion:~ # ethtool eth0 | grep Wake Supports Wake-on: pumbg Wake-on: g lion:~ # ethtool -s eth0 wol u lion:~ # ethtool eth0 | grep Wake Supports Wake-on: pumbg Wake-on: ug when the second ethtool command issues request ETHTOOL_MSG_WOL_SET ETHTOOL_A_WOL_HEADER ETHTOOL_A_HEADER_DEV_NAME = "eth0" ETHTOOL_A_WOL_MODES ETHTOOL_A_BITSET_NOMASK ETHTOOL_A_BITSET_BITS ETHTOOL_A_BITSET_BITS_BIT ETHTOOL_BITSET_BIT_INDEX = 1 Fix the logic by clearing the whole target bitmap before we start iterating through the request bits. Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.Arjun Roy
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using epoll, returning sk_err along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a spurious wakeup. Consider a multi-threaded application using epoll. A thread may awaken with EPOLLIN but another thread may already be reading. The spuriously-awoken thread does not necessarily know that another thread 'won'; rather, it may be possible that it was woken up due to the presence of an error if there is no data. A zerocopy read receiving 0 bytes thus would need to be followed up by recvmsg to be sure. Instead, we return sk_err directly with zerocopy, so the application can avoid this extra system call. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16tcp-zerocopy: Return inq along with tcp receive zerocopy.Arjun Roy
This patchset is intended to reduce the number of extra system calls imposed by TCP receive zerocopy. For ping-pong RPC style workloads, this patchset has demonstrated a system call reduction of about 30% when coupled with userspace changes. For applications using edge-triggered epoll, returning inq along with the result of tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking, since normally we would need to perform a recvmsg() call for every successful small RPC read via TCP receive zerocopy, returning inq can reduce the number of system calls performed by approximately half. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16mptcp: Protect subflow socket options before connection completesMat Martineau
Userspace should not be able to directly manipulate subflow socket options before a connection is established since it is not yet known if it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback subflows can be more directly controlled by userspace because they are regular TCP connections, while MPTCP subflow sockets need to be configured for the specific needs of MPTCP. Use the same logic as sendmsg/recvmsg to ensure that socket option calls are only passed through to known TCP fallback subflows. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: add strict checks in netdev_name_node_alt_destroy()Eric Dumazet
netdev_name_node_alt_destroy() does a lookup over all device names of a namespace. We need to make sure the name belongs to the device of interest, and that we do not destroy its primary name, since we rely on it being not deleted : dev->name_node would indeed point to freed memory. syzbot report was the following : BUG: KASAN: use-after-free in dev_net include/linux/netdevice.h:2206 [inline] BUG: KASAN: use-after-free in mld_force_mld_version net/ipv6/mcast.c:1172 [inline] BUG: KASAN: use-after-free in mld_in_v2_mode_only net/ipv6/mcast.c:1180 [inline] BUG: KASAN: use-after-free in mld_in_v1_mode+0x203/0x230 net/ipv6/mcast.c:1190 Read of size 8 at addr ffff88809886c588 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:641 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 dev_net include/linux/netdevice.h:2206 [inline] mld_force_mld_version net/ipv6/mcast.c:1172 [inline] mld_in_v2_mode_only net/ipv6/mcast.c:1180 [inline] mld_in_v1_mode+0x203/0x230 net/ipv6/mcast.c:1190 mld_send_initial_cr net/ipv6/mcast.c:2083 [inline] mld_dad_timer_expire+0x24/0x230 net/ipv6/mcast.c:2118 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1146 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 68 73 c5 f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 94 be 59 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 84 be 59 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 de 2a 74 f9 e8 09 RSP: 0018:ffffc90000d3fd68 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff136761a RBX: ffff8880a99fc340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a99fcbd4 RBP: ffffc90000d3fd98 R08: ffff8880a99fc340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8aa5a1c0 R14: 0000000000000000 R15: 0000000000000001 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 start_secondary+0x2f4/0x410 arch/x86/kernel/smpboot.c:264 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242 Allocated by task 10229: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:515 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:488 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529 __do_kmalloc_node mm/slab.c:3616 [inline] __kmalloc_node+0x4e/0x70 mm/slab.c:3623 kmalloc_node include/linux/slab.h:578 [inline] kvmalloc_node+0x68/0x100 mm/util.c:574 kvmalloc include/linux/mm.h:645 [inline] kvzalloc include/linux/mm.h:653 [inline] alloc_netdev_mqs+0x98/0xe40 net/core/dev.c:9797 rtnl_create_link+0x22d/0xaf0 net/core/rtnetlink.c:3047 __rtnl_newlink+0xf9f/0x1790 net/core/rtnetlink.c:3309 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 __sys_sendto+0x262/0x380 net/socket.c:1998 __do_compat_sys_socketcall net/compat.c:771 [inline] __se_compat_sys_socketcall net/compat.c:719 [inline] __ia32_compat_sys_socketcall+0x530/0x710 net/compat.c:719 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 Freed by task 10229: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:337 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:476 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 __netdev_name_node_alt_destroy+0x1ff/0x2a0 net/core/dev.c:322 netdev_name_node_alt_destroy+0x57/0x80 net/core/dev.c:334 rtnl_alt_ifname net/core/rtnetlink.c:3518 [inline] rtnl_linkprop.isra.0+0x575/0x6f0 net/core/rtnetlink.c:3567 rtnl_dellinkprop+0x46/0x60 net/core/rtnetlink.c:3588 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __compat_sys_sendmsg net/compat.c:642 [inline] __do_compat_sys_sendmsg net/compat.c:649 [inline] __se_compat_sys_sendmsg net/compat.c:646 [inline] __ia32_compat_sys_sendmsg+0x7a/0xb0 net/compat.c:646 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 The buggy address belongs to the object at ffff88809886c000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1416 bytes inside of 4096-byte region [ffff88809886c000, ffff88809886d000) The buggy address belongs to the page: page:ffffea0002621b00 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 flags: 0xfffe0000010200(slab|head) raw: 00fffe0000010200 ffffea0002610d08 ffffea0002607608 ffff8880aa402000 raw: 0000000000000000 ffff88809886c000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88809886c480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88809886c500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88809886c580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88809886c600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88809886c680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jiri Pirko <jiri@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: virtio_vsock: Enhance connection semanticsSebastien Boeuf
Whenever the vsock backend on the host sends a packet through the RX queue, it expects an answer on the TX queue. Unfortunately, there is one case where the host side will hang waiting for the answer and might effectively never recover if no timeout mechanism was implemented. This issue happens when the guest side starts binding to the socket, which insert a new bound socket into the list of already bound sockets. At this time, we expect the guest to also start listening, which will trigger the sk_state to move from TCP_CLOSE to TCP_LISTEN. The problem occurs if the host side queued a RX packet and triggered an interrupt right between the end of the binding process and the beginning of the listening process. In this specific case, the function processing the packet virtio_transport_recv_pkt() will find a bound socket, which means it will hit the switch statement checking for the sk_state, but the state won't be changed into TCP_LISTEN yet, which leads the code to pick the default statement. This default statement will only free the buffer, while it should also respond to the host side, by sending a packet on its TX queue. In order to simply fix this unfortunate chain of events, it is important that in case the default statement is entered, and because at this stage we know the host side is waiting for an answer, we must send back a packet containing the operation VIRTIO_VSOCK_OP_RST. One could say that a proper timeout mechanism on the host side will be enough to avoid the backend to hang. But the point of this patch is to ensure the normal use case will be provided with proper responsiveness when it comes to establishing the connection. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16Merge tag 'mac80211-next-for-net-next-2020-02-14' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A few big new things: * 802.11 frame encapsulation offload support * more HE (802.11ax) support, including some for 6 GHz band * powersave in hwsim, for better testing Of course as usual there are various cleanups and small fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: x25: convert to list_for_each_entry_safe()chenqiwu
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: rtnetlink: fix bugs in rtnl_alt_ifname()Eric Dumazet
Since IFLA_ALT_IFNAME is an NLA_STRING, we have no guarantee it is nul terminated. We should use nla_strdup() instead of kstrdup(), since this helper will make sure not accessing out-of-bounds data. BUG: KMSAN: uninit-value in strlen+0x5e/0xa0 lib/string.c:535 CPU: 1 PID: 19157 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 strlen+0x5e/0xa0 lib/string.c:535 kstrdup+0x7f/0x1a0 mm/util.c:59 rtnl_alt_ifname net/core/rtnetlink.c:3495 [inline] rtnl_linkprop+0x85d/0xc00 net/core/rtnetlink.c:3553 rtnl_newlinkprop+0x9d/0xb0 net/core/rtnetlink.c:3568 rtnetlink_rcv_msg+0x1153/0x1570 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b3b9 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ff1c7b1ac78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ff1c7b1b6d4 RCX: 000000000045b3b9 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000003 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000009cb R14: 00000000004cb3dd R15: 000000000075bf2c Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net: fib_rules: Correctly set table field when table number exceeds 8 bitsJethro Beekman
In 709772e6e06564ed94ba740de70185ac3d792773, RT_TABLE_COMPAT was added to allow legacy software to deal with routing table numbers >= 256, but the same change to FIB rule queries was overlooked. Signed-off-by: Jethro Beekman <jethro@fortanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16net/rds: Track user mapped pages through special APILeon Romanovsky
Convert net/rds to use the newly introduces pin_user_pages() API, which properly sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages. Note that this effectively changes the code's behavior: it now ultimately calls set_page_dirty_lock(), instead of set_page_dirty(). This is probably more accurate. As Christoph Hellwig put it, "set_page_dirty() is only safe if we are dealing with a file backed page where we have reference on the inode it hangs off." [1] [1] https://lore.kernel.org/r/20190723153640.GB720@lst.de Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16ipv6: Fix nlmsg_flags when splitting a multipath routeBenjamin Poirier
When splitting an RTA_MULTIPATH request into multiple routes and adding the second and later components, we must not simply remove NLM_F_REPLACE but instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink message was malformed. For example, ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \ nexthop via fe80::30:2 dev dummy0 results in the following warnings: [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to what it would get if the multipath route had been added in multiple netlink operations: ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-16ipv6: Fix route replacement with dev-only routeBenjamin Poirier
After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no longer possible to replace an ECMP-able route by a non ECMP-able route. For example, ip route add 2001:db8::1/128 via fe80::1 dev dummy0 ip route replace 2001:db8::1/128 dev dummy0 does not work as expected. Tweak the replacement logic so that point 3 in the log of the above commit becomes: 3. If the new route is not ECMP-able, and no matching non-ECMP-able route exists, replace matching ECMP-able route (if any) or add the new route. We can now summarize the entire replace semantics to: When doing a replace, prefer replacing a matching route of the same "ECMP-able-ness" as the replace argument. If there is no such candidate, fallback to the first route found. Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14Merge tag 'nfs-for-5.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Anna Schumaker: "The only stable fix this time is the DMA scatter-gather list bug fixed by Chuck. The rest fix up races and refcounting issues that have been found during testing. Stable fix: - fix DMA scatter-gather list mapping imbalance The rest: - fix directory verifier races - fix races between open and dentry revalidation - fix revalidation of dentries with delegations - fix "cachethis" setting for writes - fix delegation and delegation cred pinning" * tag 'nfs-for-5.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4: Ensure the delegation cred is pinned when we call delegreturn NFSv4: Ensure the delegation is pinned in nfs_do_return_delegation() NFSv4.1 make cachethis=no for writes xprtrdma: Fix DMA scatter-gather list mapping imbalance NFSv4: Fix revalidation of dentries with delegations NFSv4: Fix races between open and dentry revalidation NFS: Fix up directory verifier races
2020-02-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix interrupt name truncation in mv88e6xxx dsa driver, from Andrew Lunn. 2) Process generic XDP even if SKB is cloned, from Toke Høiland-Jørgensen. 3) Fix leak of kernel memory to userspace in smc, from Eric Dumazet. 4) Add some missing netlink attribute validation to matchall and flower, from Davide Caratti. 5) Send icmp responses properly when NAT has been applied to the frame before we get to the tunnel emitting the icmp, from Jason Donenfeld. 6) Make sure there is enough SKB headroom when adding dsa tags for qca and ar9331. From Per Forlin. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) netdevice.h: fix all kernel-doc and Sphinx warnings net: dsa: tag_ar9331: Make sure there is headroom for tag net: dsa: tag_qca: Make sure there is headroom for tag net, ip6_tunnel: enhance tunnel locate with link check net/smc: no peer ID in CLC decline for SMCD net/smc: transfer fasync_list in case of fallback net: hns3: fix a copying IPv6 address error in hclge_fd_get_flow_tuples() net: hns3: fix VF bandwidth does not take effect in some case net: hns3: add management table after IMP reset mac80211: fix wrong 160/80+80 MHz setting cfg80211: add missing policy for NL80211_ATTR_STATUS_CODE xfrm: interface: use icmp_ndo_send helper wireguard: device: use icmp_ndo_send helper sunvnet: use icmp_ndo_send helper gtp: use icmp_ndo_send helper icmp: introduce helper for nat'd source address in network device context net/sched: flower: add missing validation of TCA_FLOWER_FLAGS net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS net/flow_dissector: remove unexist field description page_pool: refill page when alloc.count of pool is zero ...
2020-02-14net: dsa: tag_ar9331: Make sure there is headroom for tagPer Forlin
Passing tag size to skb_cow_head will make sure there is enough headroom for the tag data. This change does not introduce any overhead in case there is already available headroom for tag. Signed-off-by: Per Forlin <perfn@axis.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14net: dsa: tag_qca: Make sure there is headroom for tagPer Forlin
Passing tag size to skb_cow_head will make sure there is enough headroom for the tag data. This change does not introduce any overhead in case there is already available headroom for tag. Signed-off-by: Per Forlin <perfn@axis.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14net, ip6_tunnel: enhance tunnel locate with link checkWilliam Dauchy
With ipip, it is possible to create an extra interface explicitly attached to a given physical interface: # ip link show tunl0 4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ipip 0.0.0.0 brd 0.0.0.0 # ip link add tunl1 type ipip dev eth0 # ip link show tunl1 6: tunl1@eth0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ipip 0.0.0.0 brd 0.0.0.0 But it is not possible with ip6tnl: # ip link show ip6tnl0 5: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/tunnel6 :: brd :: # ip link add ip6tnl1 type ip6tnl dev eth0 RTNETLINK answers: File exists This patch aims to make it possible by adding link comparaison in both tunnel locate and lookup functions; we also modify mtu calculation when attached to an interface with a lower mtu. This permits to make use of x-netns communication by moving the newly created tunnel in a given netns. Signed-off-by: William Dauchy <w.dauchy@criteo.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-14Merge tag 'mac80211-for-net-2020-02-14' of ↵David S. Miller
git://git.kernel.org/pub