summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2018-10-16ipmr: Refactor mr_rtm_dumprouteDavid Ahern
Move per-table loops from mr_rtm_dumproute to mr_table_dump and export mr_table_dump for dumps by specific table id. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv4: Plumb support for filtering route dumpsDavid Ahern
Implement kernel side filtering of routes by table id, egress device index, protocol and route type. If the table id is given in the filter, lookup the table and call fib_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Add struct for fib dump filterDavid Ahern
Add struct fib_dump_filter for options on limiting which routes are returned in a dump request. The current list is table id, protocol, route type, rtm_flags and nexthop device index. struct net is needed to lookup the net_device from the index. Declare the filter for each route dump handler and plumb the new arguments from dump handlers to ip_valid_fib_dump_req. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16netlink: Add answer_flags to netlink_callbackDavid Ahern
With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED flag is set on a message back to the user if the data returned is influenced by some input attributes. Normally this can be done as messages are added to the skb, but if the filter results in no data being returned, the user could be confused as to why. This patch adds answer_flags to the netlink_callback allowing dump handlers to set the NLM_F_DUMP_FILTERED at a minimum in the NLMSG_DONE message ensuring the flag gets back to the user. The netlink_callback space is initialized to 0 via a memset in __netlink_dump_start, so init of the new answer_flags is covered. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Convert BPF sockmap and kTLS to both use a new sk_msg API and enable sk_msg BPF integration for the latter, from Daniel and John. 2) Enable BPF syscall side to indicate for maps that they do not support a map lookup operation as opposed to just missing key, from Prashant. 3) Add bpftool map create command which after map creation pins the map into bpf fs for further processing, from Jakub. 4) Add bpftool support for attaching programs to maps allowing sock_map and sock_hash to be used from bpftool, from John. 5) Improve syscall BPF map update/delete path for map-in-map types to wait a RCU grace period for pending references to complete, from Daniel. 6) Couple of follow-up fixes for the BPF socket lookup to get it enabled also when IPv6 is compiled as a module, from Joe. 7) Fix a generic-XDP bug to handle the case when the Ethernet header was mangled and thus update skb's protocol and data, from Jesper. 8) Add a missing BTF header length check between header copies from user space, from Wenwen. 9) Minor fixups in libbpf to use __u32 instead u32 types and include proper perf_event.h uapi header instead of perf internal one, from Yonghong. 10) Allow to pass user-defined flags through EXTRA_CFLAGS and EXTRA_LDFLAGS to bpftool's build, from Jiri. 11) BPF kselftest tweaks to add LWTUNNEL to config fragment and to install with_addr.sh script from flow dissector selftest, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net: extend sk_pacing_rate to unsigned longEric Dumazet
sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tcp: do not change tcp_wstamp_ns in tcp_mstamp_refreshEric Dumazet
In EDT design, I made the mistake of using tcp_wstamp_ns to store the last tcp_clock_ns() sample and to store the pacing virtual timer. This causes major regressions at high speed flows. Introduce tcp_clock_cache to store last tcp_clock_ns(). This is needed because some arches have slow high-resolution kernel time service. tcp_wstamp_ns is only updated when a packet is sent. Note that we can remove tcp_mstamp in the future since tcp_mstamp is essentially tcp_clock_cache/1000, so the apparent socket size increase is temporary. Fixes: 9799ccb0e984 ("tcp: add tcp_wstamp_ns socket field") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI ↵Justin.Lee1@Dell.com
command The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15Merge tag 'mlx5e-updates-2018-10-10' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-10-10 IPoIB netlink support and mlx5e pre-allocated netdevice initialization IP link was broken due to the changes in IPoIB for the rdma_netdev support after commit cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks"). This patchset fixes IPoIB pkey creation and removal using rtnetlink by adding support in both IPoIB ULP layer and mlx5 layer: From Jason and Denis: 1) Introduces changes in the RDMA netdev code in order to allow allocation of the netdev to be done by the rtnl netdev code. 2) Reworks IPoIB initialization to use the two step rdma_netdev creation. From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting pre-allocated netdevs: 3) Adds support to initialize/cleanup netdevs that are not created by mlx5 driver. 4) Change mlx5e netdevice layer to accept the pre-allocated netdevice queue number. 5) Initialize mlx5e generic structures in one place to be used for all netdevs types NIC/representors/IPoIB (both mlx5 allocated and pre-allocted). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Support capturing outgoing SMT trafficMaciej W. Rozycki
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue is queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames which are queued back to the Tx queue for the RMC to send to the ring. When a network tap is attached to an FDDI interface handled by `defza' any incoming SMT frames captured are queued to our usual processing of network data received, which in turn delivers them to any listening taps. However the outgoing SMT frames produced by the firmware bypass our network protocol stack and are therefore not delivered to taps. This in turn means that taps are missing a part of network traffic sent by the adapter, which may make it more difficult to track down network problems or do general traffic analysis. Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that a network tap is attached, with a newly-created `dev_nit_active' helper wrapping the usual condition used in the transmit path. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapterMaciej W. Rozycki
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment Corporation's first-generation FDDI network interface adapter, made for TURBOchannel and based on a discrete version of what eventually became Motorola's widely used CAMEL chipset. The CAMEL chipset is present for example in the DEC FDDIcontroller TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support with the `defxx' driver, however the host bus interface logic and the firmware API are different in the DEFZA and hence a separate driver is required. There isn't much to say about the driver except that it works, but there is one peculiarity to mention. The adapter implements two Tx/Rx queue pairs. Of these one pair is the usual network Tx/Rx queue pair, in this case used by the adapter to exchange frames with the ring, via the RMC (Ring Memory Controller) chip. The Tx queue is handled directly by the RMC chip and resides in onboard packet memory. The Rx queue is maintained via DMA in host memory by adapter's firmware copying received data stored by the RMC in onboard packet memory. The other pair is used to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue must be queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames that must be queued back to the Tx queue for the RMC to send to the ring. This solution was chosen because the designers ran out of PCB space and could not squeeze in more logic onto the board that would be required to handle this SMT frame traffic without the need to involve the driver, as with the later DEFTA/DEFEA/DEFPA adapters. Finally the driver does some Frame Control byte decoding, so to avoid magic numbers some macros are added to <linux/if_fddi.h>. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15bpf: Allow sk_lookup with IPv6 moduleJoe Stringer
This is a more complete fix than d71019b54bff ("net: core: Fix build with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6 module is loaded (not just if it's compiled in). Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: add bpf support to sk_msg handlingJohn Fastabend
This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: convert to generic sk_msg interfaceDaniel Borkmann
Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf ("tls: Remove redundant vars from tls record structure") and 4e6d47206c32 ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf, sockmap: convert to generic sk_msg interfaceDaniel Borkmann
Add a generic sk_msg layer, and convert current sockmap and later kTLS over to make use of it. While sk_buff handles network packet representation from netdevice up to socket, sk_msg handles data representation from application to socket layer. This means that sk_msg framework spans across ULP users in the kernel, and enables features such as introspection or filtering of data with the help of BPF programs that operate on this data structure. Latter becomes in particular useful for kTLS where data encryption is deferred into the kernel, and as such enabling the kernel to perform L7 introspection and policy based on BPF for TLS connections where the record is being encrypted after BPF has run and came to a verdict. In order to get there, first step is to transform open coding of scatter-gather list handling into a common core framework that subsystems can use. The code itself has been split and refactored into three bigger pieces: i) the generic sk_msg API which deals with managing the scatter gather ring, providing helpers for walking and mangling, transferring application data from user space into it, and preparing it for BPF pre/post-processing, ii) the plain sock map itself where sockets can be attached to or detached from; these bits are independent of i) which can now be used also without sock map, and iii) the integration with plain TCP as one protocol to be used for processing L7 application data (later this could e.g. also be extended to other protocols like UDP). The semantics are the same with the old sock map code and therefore no change of user facing behavior or APIs. While pursuing this work it also helped finding a number of bugs in the old sockmap code that we've fixed already in earlier commits. The test_sockmap kselftest suite passes through fine as well. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: remove ulp bits from sockmapDaniel Borkmann
In order to prepare sockmap logic to be used in combination with kTLS we need to detangle it from ULP, and further split it in later commits into a generic API. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-14Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo
ath.git patches for 4.20. Major changes: ath10k * support NET_DETECT WoWLAN feature * wcn3990 basic functionality now working after we got QMI support
2018-10-13firmware: qcom: scm: Add WLAN VMID for Qualcomm SCM interfaceGovind Singh
Add WLAN related VMID's to support wlan driver to set up the remote's permissions call via TrustZone. Signed-off-by: Govind Singh <govinds@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Niklas Cassel <niklas.cassel@linaro.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12netlink: replace __NLA_ENSURE implementationJohannes Berg
We already have BUILD_BUG_ON_ZERO() which I just hadn't found before, so we should use it here instead of open-coding another implementation thereof. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'mac80211-next-for-davem-2018-10-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Highlights: * merge net-next, so I can finish the hwsim workqueue removal * fix TXQ NULL pointer issue that was reported multiple times * minstrel cleanups from Felix * simplify lib80211 code by not using skcipher, note that this will conflict with the crypto tree (and this new code here should be used) * use new netlink policy validation in nl80211 * fix up SAE (part of WPA3) in client-mode * FTM responder support in the stack ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: bridge: add support for per-port vlan statsNikolay Aleksandrov
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: Evict neighbor entries on carrier downDavid Ahern
When a link's carrier goes down it could be a sign of the port changing networks. If the new network has overlapping addresses with the old one, then the kernel will continue trying to use neighbor entries established based on the old network until the entries finally age out - meaning a potentially long delay with communications not working. This patch evicts neighbor entries on carrier down with the exception of those marked permanent. Permanent entries are managed by userspace (either an admin or a routing daemon such as FRR). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net/ipv6: Add knob to skip DELROUTE message on device downDavid Ahern
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'armsoc-fixes-4.19' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Arnd writes: "ARM: SoC fixes for 4.19 Two last minute bugfixes, both for NXP platforms: * The Layerscape 'qbman' infrastructure suffers from probe ordering bugs in some configurations, a two-patch series adds a hotfix for this. 4.20 will have a longer set of patches to rework it. * The old imx53-qsb board regressed in 4.19 after the addition of cpufreq support, adding a set of explicit operating points fixes this." * tag 'armsoc-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: soc: fsl: qman_portals: defer probe after qman's probe soc: fsl: qbman: add APIs to retrieve the probing status ARM: dts: imx53-qsb: disable 1.2GHz OPP
2018-10-12mac80211: implement ieee80211_tx_rate_update to update rateAnilkumar Kolli
Current mac80211 has provision to update tx status through ieee80211_tx_status() and ieee80211_tx_status_ext(). But drivers like ath10k updates the tx status from the skb except txrate, txrate will be updated from a different path, peer stats. Using ieee80211_tx_status_ext() in two different paths (one for the stats, one for the tx rate) would duplicate the stats instead. To avoid this stats duplication, ieee80211_tx_rate_update() is implemented. Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org> [minor commit message editing, use initializers in code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12nl80211: Add per peer statistics to compute FCS error rateAnkita Bajaj
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: Ankita Bajaj <bankita@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge tag 'gpio-v4.19-4' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Linus writes: "GPIO fix for the v4.19 series: - Fix up the interrupt parent for the irqdomains." * tag 'gpio-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: Assign gpio_irq_chip::parents to non-stack pointer
2018-10-12mac80211: support FTM responder configuration/statisticsPradeep Kumar Chitrapu
New bss param ftm_responder is used to notify the driver to enable fine timing request (FTM) responder role in AP mode. Plumb the new cfg80211 API for FTM responder statistics through to the driver API in mac80211. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge branch 'for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Dmitry writes: "Input updates for v4.19-rc7 - we added a few scheduling points into various input interfaces to ensure that large writes will not cause RCU stalls - fixed configuring PS/2 keyboards as wakeup devices on newer platforms - added a new Xbox gamepad ID." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: uinput - add a schedule point in uinput_inject_events() Input: evdev - add a schedule point in evdev_write() Input: mousedev - add a schedule point in mousedev_write() Input: i8042 - enable keyboard wakeups by default when s2idle is used Input: xpad - add support for Xbox1 PDP Camo series gamepad
2018-10-12Merge tag 'next-fixes-20181012' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes Stephen writes: "A couple of warning fixes: Two fixes from Peter Oberparleiter <oberpar@linux.ibm.com>: Commit 6b7dca401cb1 ("tracing: Allow gcov profiling on only ftrace subsystem") uncovered linker problems when using gcov kernel profiling on some architectures. These problems were likely introduced earlier, and are possibly related to compiler changes." * tag 'next-fixes-20181012' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes: vmlinux.lds.h: Fix linker warnings about orphan .LPBX sections vmlinux.lds.h: Fix incomplete .text.exit discards
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
2018-10-12vmlinux.lds.h: Fix linker warnings about orphan .LPBX sectionsPeter Oberparleiter
Enabling both CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and CONFIG_GCOV_PROFILE_ALL=y results in linker warnings: warning: orphan section `.data..LPBX1' being placed in section `.data..LPBX1'. LD_DEAD_CODE_DATA_ELIMINATION adds compiler flag -fdata-sections. This option causes GCC to create separate data sections for data objects, including those generated by GCC internally for gcov profiling. The names of these objects start with a dot (.LPBX0, .LPBX1), resulting in section names starting with 'data..'. As section names starting with 'data..' are used for specific purposes in the Linux kernel, the linker script does not automatically include them in the output data section, resulting in the "orphan section" linker warnings. Fix this by specifically including sections named "data..LPBX*" in the data section. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2018-10-12vmlinux.lds.h: Fix incomplete .text.exit discardsPeter Oberparleiter
Enabling CONFIG_GCOV_PROFILE_ALL=y causes linker errors on ARM: `.text.exit' referenced in section `.ARM.exidx.text.exit': defined in discarded section `.text.exit' `.text.exit' referenced in section `.fini_array.00100': defined in discarded section `.text.exit' And related errors on NDS32: `.text.exit' referenced in section `.dtors.65435': defined in discarded section `.text.exit' The gcov compiler flags cause certain compiler versions to generate additional destructor-related sections that are not yet handled by the linker script, resulting in references between discarded and non-discarded sections. Since destructors are not used in the Linux kernel, fix this by discarding these additional sections. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Greentime Hu <green.hu@gmail.com> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2018-10-11Merge branch 'for-4.19-fixes' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Tejun writes: "cgroup fixes for v4.19-rc7 One cgroup2 threaded mode fix for v4.19-rc7. While threaded mode isn't used widely (yet) and the bug requires somewhat convoluted sequence of operations, it causes a userland visible malfunction - EINVAL on a valid attempt to enable threaded mode. This pull request contains the fix" * 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix dom_cgrp propagation when enabling threaded mode
2018-10-11mac80211: minstrel: Enable STBC and LDPC for VHT RatesChaitanya T K
If peer support reception of STBC and LDPC, enable them for better performance. Signed-off-by: Chaitanya TK <chaitanya.mgit@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-10net: ipv4: update fnhe_pmtu when first hop's MTU changesSabrina Dubroca
Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge tag 'rxrpc-fixes-20181008' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if *we* rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/ipv6: Make ipv6_route_table_template staticDavid Ahern
ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10RDMA/netdev: Fix netlink support in IPoIBDenis Drozdov
IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such situation leads to crash in __ipoib_device_add, once trying to reuse netlink device. This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command: Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10RDMA/netdev: Hoist alloc_netdev_mqs out of the driverDenis Drozdov
netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10devlink: Add helper function for safely copy string paramMoshe Shemesh
Devlink string param buffer is allocated at the size of DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure this size is not exceeded. Renamed DEVLINK_PARAM_MAX_STRING_VALUE to __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by devlink only. The driver should use the helper function instead to verify it doesn't exceed the allowed length. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10devlink: Fix param set handling for string typeMoshe Shemesh
In case devlink param type is string, it needs to copy the string value it got from the input to devlink_param_value. Fixes: e3b7ca18ad7b ("devlink: Add param set command") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10gpio: Assign gpio_irq_chip::parents to non-stack pointerStephen Boyd
gpiochip_set_cascaded_irqchip() is passed 'parent_irq' as an argument and then the address of that argument is assigned to the gpio chips gpio_irq_chip 'parents' pointer shortly thereafter. This can't ever work, because we've just assigned some stack address to a pointer that we plan to dereference later in gpiochip_irq_map(). I ran into this issue with the KASAN report below when gpiochip_irq_map() tried to setup the parent irq with a total junk pointer for the 'parents' array. BUG: KASAN: stack-out-of-bounds in gpiochip_irq_map+0x228/0x248 Read of size 4 at addr ffffffc0dde472e0 by task swapper/0/1 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.14.72 #34 Call trace: [<ffffff9008093638>] dump_backtrace+0x0/0x718 [<ffffff9008093da4>] show_stack+0x20/0x2c [<ffffff90096b9224>] __dump_stack+0x20/0x28 [<ffffff90096b91c8>] dump_stack+0x80/0xbc [<ffffff900845a350>] print_address_description+0x70/0x238 [<ffffff900845a8e4>] kasan_report+0x1cc/0x260 [<ffffff900845aa14>] __asan_report_load4_noabort+0x2c/0x38 [<ffffff900897e098>] gpiochip_irq_map+0x228/0x248 [<ffffff900820cc08>] irq_domain_associate+0x114/0x2ec [<ffffff900820d13c>] irq_create_mapping+0x120/0x234 [<ffffff900820da78>] irq_create_fwspec_mapping+0x4c8/0x88c [<ffffff900820e2d8>] irq_create_of_mapping+0x180/0x210 [<ffffff900917114c>] of_irq_get+0x138/0x198 [<ffffff9008dc70ac>] spi_drv_probe+0x94/0x178 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca6538>] __device_attach_driver+0x148/0x20c [<ffffff9008ca14cc>] bus_for_each_drv+0x120/0x188 [<ffffff9008ca570c>] __device_attach+0x19c/0x2dc [<ffffff9008ca586c>] device_initial_probe+0x20/0x2c [<ffffff9008ca18bc>] bus_probe_device+0x80/0x154 [<ffffff9008c9b9b4>] device_add+0x9b8/0xbdc [<ffffff9008dc7640>] spi_add_device+0x1b8/0x380 [<ffffff9008dcbaf0>] spi_register_controller+0x111c/0x1378 [<ffffff9008dd6b10>] spi_geni_probe+0x4dc/0x6f8 [<ffffff9008cab058>] platform_drv_probe+0xdc/0x130 [<ffffff9008ca5168>] driver_probe_device+0x51c/0x824 [<ffffff9008ca59cc>] __driver_attach+0x100/0x194 [<ffffff9008ca0ea8>] bus_for_each_dev+0x104/0x16c [<ffffff9008ca58c0>] driver_attach+0x48/0x54 [<ffffff9008ca1edc>] bus_add_driver+0x274/0x498 [<ffffff9008ca8448>] driver_register+0x1ac/0x230 [<ffffff9008caaf6c>] __platform_driver_register+0xcc/0xdc [<ffffff9009c4b33c>] spi_geni_driver_init+0x1c/0x24 [<ffffff9008084cb8>] do_one_initcall+0x240/0x3dc [<ffffff9009c017d0>] kernel_init_freeable+0x378/0x468 [<ffffff90096e8240>] kernel_init+0x14/0x110 [<ffffff9008086fcc>] ret_from_fork+0x10/0x18 The buggy address belongs to the page: page:ffffffbf037791c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff raw: ffffffbf037791e0 ffffffbf037791e0 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffc0dde47180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47200: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2 >ffffffc0dde47280: f2 f2 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 ^ ffffffc0dde47300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0dde47380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Let's leave around one unsigned int in the gpio_irq_chip struct for the single parent irq case and repoint the 'parents' array at it. This way code is left mostly intact to setup parents and we waste an extra few bytes per structure of which there should be only a handful in a system. Cc: Evan Green <evgreen@chromium.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Fixes: e0d897289813 ("gpio: Implement tighter IRQ chip integration") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09mm, sched/numa: Remove remaining traces of NUMA rate-limitingSrikar Dronamraju
Remove the leftover pglist_data::numabalancing_migrate_lock and its initialization, we stopped using this lock with: efaffc5e40ae ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration") [ mingo: Rewrote the changelog. ] Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-MM <linux-mm@kvack.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1538824999-31230-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for matching on ipsec policy already set in the route, from Florian Westphal. 2) Split set destruction into deactivate and destroy phase to make it fit better into the transaction infrastructure, also from Florian. This includes a patch to warn on imbalance when setting the new activate and deactivate interfaces. 3) Release transaction list from the workqueue to remove expensive synchronize_rcu() from configuration plane path. This speeds up configuration plane quite a bit. From Florian Westphal. 4) Add new xfrm/ipsec extension, this new extension allows you to match for ipsec tunnel keys such as source and destination address, spi and reqid. From Máté Eckl and Florian Westphal. 5) Add secmark support, this includes connsecmark too, patches from Christian Gottsche. 6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng. One follow up patch to calm a clang warning for this one, from Nathan Chancellor. 7) Flush conntrack entries based on layer 3 family, from Kristian Evensen. 8) New revision for cgroups2 to shrink the path field. 9) Get rid of obsolete need_conntrack(), as a result from recent demodularization works. 10) Use WARN_ON instead of BUG_ON, from Florian Westphal. 11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian. 12) Remove superfluous check for timeout netlink parser and dump functions in layer 4 conntrack helpers. 13) Unnecessary redundant rcu read side locks in NAT redirect, from Taehee Yoo. 14) Pass nf_hook_state structure to error handlers, patch from Florian Westphal. 15) Remove ->new() interface from layer 4 protocol trackers. Place them in the ->packet() interface. From Florian. 16) Place conntrack ->error() handling in the ->packet() interface. Patches from Florian Westphal. 17) Remove unused parameter in the pernet initialization path, also from Florian. 18) Remove additional parameter to specify layer 3 protocol when looking up for protocol tracker. From Florian. 19) Shrink array of layer 4 protocol trackers, from Florian. 20) Check for linear skb only once from the ALG NAT mangling codebase, from Taehee Yoo. 21) Use rhashtable_walk_enter() instead of deprecated rhashtable_walk_in