summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2016-10-29netlink: Add nla_memdup() to wrap kmemdup() use on nlattrThomas Graf
Wrap several common instances of: kmemdup(nla_data(attr), nla_len(attr), GFP_KERNEL); Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: mark families as __ro_after_initJohannes Berg
Now genl_register_family() is the only thing (other than the users themselves, perhaps, but I didn't find any doing that) writing to the family struct. In all families that I found, genl_register_family() is only called from __init functions (some indirectly, in which case I've add __init annotations to clarifly things), so all can actually be marked __ro_after_init. This protects the data structure from accidental corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: use idr to track familiesJohannes Berg
Since generic netlink family IDs are small integers, allocated densely, IDR is an ideal match for lookups. Replace the existing hand-written hash-table with IDR for allocation and lookup. This lets the families only be written to once, during register, since the list_head can be removed and removal of a family won't cause any writes. It also slightly reduces the code size (by about 1.3k on x86-64). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: statically initialize familiesJohannes Berg
Instead of providing macros/inline functions to initialize the families, make all users initialize them statically and get rid of the macros. This reduces the kernel code size by about 1.6k on x86-64 (with allyesconfig). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: no longer support using static family IDsJohannes Berg
Static family IDs have never really been used, the only use case was the workaround I introduced for those users that assumed their family ID was also their multicast group ID. Additionally, because static family IDs would never be reserved by the generic netlink code, using a relatively low ID would only work for built-in families that can be registered immediately after generic netlink is started, which is basically only the control family (apart from the workaround code, which I also had to add code for so it would reserve those IDs) Thus, anything other than GENL_ID_GENERATE is flawed and luckily not used except in the cases I mentioned. Move those workarounds into a few lines of code, and then get rid of GENL_ID_GENERATE entirely, making it more robust. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: introduce and use genl_family_attrbuf()Johannes Berg
This helper function allows family implementations to access their family's attrbuf. This gets rid of the attrbuf usage in families, and also adds locking validation, since it's not valid to use the attrbuf with parallel_ops or outside of the dumpit callback. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27skbedit: allow the user to specify bitmask for markAntonio Quartulli
The user may want to use only some bits of the skb mark in his skbedit rules because the remaining part might be used by something else. Introduce the "mask" parameter to the skbedit actor in order to implement such functionality. When the mask is specified, only those bits selected by the latter are altered really changed by the actor, while the rest is left untouched. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26net: phy: broadcom: Add support for BCM54612EXo Wang
This PHY has internal delays enabled after reset. This clears the internal delay enables unless the interface specifically requests them. Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26net: phy: broadcom: Update Auxiliary Control Register macrosXo Wang
Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control shadow register and a mask for the shadow selector field. Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL. Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23net: ip, diag -- Add diag interface for raw socketsCyrill Gorcunov
In criu we are actively using diag interface to collect sockets present in the system when dumping applications. And while for unix, tcp, udp[lite], packet, netlink it works as expected, the raw sockets do not have. Thus add it. v2: - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@) - implement @destroy for diag requests (by dsa@) v3: - add export of raw_abort for IPv6 (by dsa@) - pass net-admin flag into inet_sk_diag_fill due to changes in net-next branch (by dsa@) v4: - use @pad in struct inet_diag_req_v2 for raw socket protocol specification: raw module carries sockets which may have custom protocol passed from socket() syscall and sole @sdiag_protocol is not enough to match underlied ones - start reporting protocol specifed in socket() call when sockets are raw ones for the same reason: user space tools like ss may parse this attribute and use it for socket matching v5 (by eric.dumazet@): - use sock_hold in raw_sock_get instead of atomic_inc, we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock when looking up so counter won't be zero here. v6: - use sdiag_raw_protocol() helper which will access @pad structure used for raw sockets protocol specification: we can't simply rename this member without breaking uapi v7: - sine sdiag_raw_protocol() helper is not suitable for uapi lets rather make an alias structure with proper names. __check_inet_diag_req_raw helper will catch if any of structure unintentionally changed. CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23lwt: Remove unused len fieldThomas Graf
The field is initialized by ILA and MPLS but never used. Remove it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22bpf: add helper for retrieving current numa node idDaniel Borkmann
Use case is mainly for soreuseport to select sockets for the local numa node, but since generic, lets also add this for other networking and tracing program types. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22udp: implement memory accounting helpersPaolo Abeni
Avoid using the generic helpers. Use the receive queue spin lock to protect the memory accounting operation, both on enqueue and on dequeue. On dequeue perform partial memory reclaiming, trying to leave a quantum of forward allocated memory. On enqueue use a custom helper, to allow some optimizations: - use a plain spin_lock() variant instead of the slightly costly spin_lock_irqsave(), - avoid dst_force check, since the calling code has already dropped the skb dst - avoid orphaning the skb, since skb_steal_sock() already did the work for us The above needs custom memory reclaiming on shutdown, provided by the udp_destruct_sock(). v5 -> v6: - don't orphan the skb on enqueue v4 -> v5: - replace the mem_lock with the receive queue spin lock - ensure that the bh is always allowed to enqueue at least a skb, even if sk_rcvbuf is exceeded v3 -> v4: - reworked memory accunting, simplifying the schema - provide an helper for both memory scheduling and enqueuing v1 -> v2: - use a udp specific destrctor to perform memory reclaiming - remove a couple of helpers, unneeded after the above cleanup - do not reclaim memory on dequeue if not under memory pressure - reworked the fwd accounting schema to avoid potential integer overflow Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22net/socket: factor out helpers for memory and queue manipulationPaolo Abeni
Basic sock operations that udp code can use with its own memory accounting schema. No functional change is introduced in the existing APIs. v4 -> v5: - avoid whitespace changes v2 -> v4: - avoid exporting __sock_enqueue_skb v1 -> v2: - avoid export sock_rmem_free Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in misc driversJarod Wilson
firewire-net: - set min/max_mtu - remove fwnet_change_mtu nes: - set max_mtu - clean up nes_netdev_change_mtu xpnet: - set min/max_mtu - remove xpnet_dev_change_mtu hippi: - set min/max_mtu - remove hippi_change_mtu batman-adv: - set max_mtu - remove batadv_interface_change_mtu - initialization is a little async, not 100% certain that max_mtu is set in the optimal place, don't have hardware to test with rionet: - set min/max_mtu - remove rionet_change_mtu slip: - set min/max_mtu - streamline sl_change_mtu um/net_kern: - remove pointless ndo_change_mtu hsi/clients/ssi_protocol: - use core MTU range checking - remove now redundant ssip_pn_set_mtu ipoib: - set a default max MTU value - Note: ipoib's actual max MTU can vary, depending on if the device is in connected mode or not, so we'll just set the max_mtu value to the max possible, and let the ndo_change_mtu function continue to validate any new MTU change requests with checks for CM or not. Note that ipoib has no min_mtu set, and thus, the network core's mtu > 0 check is the only lower bounds here. mptlan: - use net core MTU range checking - remove now redundant mpt_lan_change_mtu fddi: - min_mtu = 21, max_mtu = 4470 - remove now redundant fddi_change_mtu (including export) fjes: - min_mtu = 8192, max_mtu = 65536 - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to get past the core net MTU range checks so fjes_change_mtu can validate a new MTU against what it supports (see fjes_support_mtu in fjes_hw.c) hsr: - min_mtu = 0 (calls ether_setup, max_mtu is 1500) f_phonet: - min_mtu = 6, max_mtu = 65541 u_ether: - min_mtu = 14, max_mtu = 15412 phonet/pep-gprs: - min_mtu = 576, max_mtu = 65530 - remove redundant gprs_set_mtu CC: netdev@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: Stefan Richter <stefanr@s5r6.in-berlin.de> CC: Faisal Latif <faisal.latif@intel.com> CC: linux-rdma@vger.kernel.org CC: Cliff Whickman <cpw@sgi.com> CC: Robin Holt <robinmholt@gmail.com> CC: Jes Sorensen <jes@trained-monkey.org> CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sathya Prakash <sathya.prakash@broadcom.com> CC: Chaitra P B <chaitra.basappa@broadcom.com> CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> CC: MPT-FusionLinux.pdl@broadcom.com CC: Sebastian Reichel <sre@kernel.org> CC: Felipe Balbi <balbi@kernel.org> CC: Arvid Brodin <arvid.brodin@alten.se> CC: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in WAN driversJarod Wilson
- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu - sent max_mtu in lec driver, remove lec_change_mtu - set min/max_mtu in x25_asy driver CC: netdev@vger.kernel.org CC: Krzysztof Halasa <khc@pm.waw.pl> CC: Krzysztof Halasa <khalasa@piap.pl> CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz> CC: Francois Romieu <romieu@fr.zoreil.com> CC: Kevin Curtis <kevin.curtis@farsite.co.uk> CC: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ethernet: use net core MTU range checking in more driversJarod Wilson
Somehow, I missed a healthy number of ethernet drivers in the last pass. Most of these drivers either were in need of an updated max_mtu to make jumbo frames possible to enable again. In a few cases, also setting a different min_mtu to match previous lower bounds. There are also a few drivers that had no upper bounds checking, so they're getting a brand new ETH_MAX_MTU that is identical to IP_MAX_MTU, but accessible by includes all ethernet and ethernet-like drivers all have already. acenic: - min_mtu = 0, max_mtu = 9000 amazon/ena: - min_mtu = 128, max_mtu = adapter->max_mtu amd/xgbe: - min_mtu = 0, max_mtu = 9000 sb1250: - min_mtu = 0, max_mtu = 1518 cxgb3: - min_mtu = 81, max_mtu = 65535 cxgb4: - min_mtu = 81, max_mtu = 9600 cxgb4vf: - min_mtu = 81, max_mtu = 65535 benet: - min_mtu = 256, max_mtu = 9000 ibmveth: - min_mtu = 68, max_mtu = 65535 ibmvnic: - min_mtu = adapter->min_mtu, max_mtu = adapter->max_mtu - remove now redundant ibmvnic_change_mtu jme: - min_mtu = 1280, max_mtu = 9202 mv643xx_eth: - min_mtu = 64, max_mtu = 9500 mlxsw: - min_mtu = 0, max_mtu = 65535 - Basically bypassing the core checks, and instead relying on dynamic checks in the respective switch drivers' ndo_change_mtu functions ns83820: - min_mtu = 0 - remove redundant ns83820_change_mtu, only checked for mtu > 1500 netxen: - min_mtu = 0, max_mtu = 8000 (P2), max_mtu = 9600 (P3) qlge: - min_mtu = 1500, max_mtu = 9000 - driver only supports setting mtu to 1500 or 9000, so the core check only rules out < 1500 and > 9000, qlge_change_mtu still needs to check that the value is 1500 or 9000 qualcomm/emac: - min_mtu = 46, max_mtu = 9194 xilinx_axienet: - min_mtu = 64, max_mtu = 9000 Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") CC: netdev@vger.kernel.org CC: Jes Sorensen <jes@trained-monkey.org> CC: Netanel Belgazal <netanel@annapurnalabs.com> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Santosh Raspatur <santosh@chelsio.com> CC: Hariprasad S <hariprasad@chelsio.com> CC: Sathya Perla <sathya.perla@broadcom.com> CC: Ajit Khaparde <ajit.khaparde@broadcom.com> CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> CC: Somnath Kotur <somnath.kotur@broadcom.com> CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> CC: John Allen <jallen@linux.vnet.ibm.com> CC: Guo-Fu Tseng <cooldavid@cooldavid.org> CC: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> CC: Jiri Pirko <jiri@mellanox.com> CC: Ido Schimmel <idosch@mellanox.com> CC: Manish Chopra <manish.chopra@qlogic.com> CC: Sony Chacko <sony.chacko@qlogic.com> CC: Rajesh Borundia <rajesh.borundia@qlogic.com> CC: Timur Tabi <timur@codeaurora.org> CC: Anirudha Sarangi <anirudh@xilinx.com> CC: John Linn <John.Linn@xilinx.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registersThomas Graf
A BPF program is required to check the return register of a map_elem_lookup() call before accessing memory. The verifier keeps track of this by converting the type of the result register from PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional jump ensures safety. This check is currently exclusively performed for the result register 0. In the event the compiler reorders instructions, BPF_MOV64_REG instructions may be moved before the conditional jump which causes them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the verifier objects when the register is accessed: 0: (b7) r1 = 10 1: (7b) *(u64 *)(r10 -8) = r1 2: (bf) r2 = r10 3: (07) r2 += -8 4: (18) r1 = 0x59c00000 6: (85) call 1 7: (bf) r4 = r0 8: (15) if r0 == 0x0 goto pc+1 R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp 9: (7a) *(u64 *)(r4 +0) = 0 R4 invalid mem access 'map_value_or_null' This commit extends the verifier to keep track of all identical PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by assigning them an ID and then marking them all when the conditional jump is observed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18ARM: pxa: enhance smc91x platform dataRobert Jarzmik
Instead of having the smc91x driver relying on machine_is_*() calls, provide this data through platform data, ie. idp, mainstone and stargate. This way, the driver doesn't need anymore machine_is_*() calls, which wouldn't work anymore with a device-tree build. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net: phy: leds: add support for led triggers on phy link state changeZach Brown
Create an option CONFIG_LED_TRIGGER_PHY (default n), which will create a set of led triggers for each instantiated PHY device. There is one LED trigger per link-speed, per-phy. The triggers are registered during phy_attach and unregistered during phy_detach. This allows for a user to configure their system to allow a set of LEDs not controlled by the phy to represent link state changes on the phy. LEDS controlled by the phy are unaffected. For example, we have a board where some of the leds in the RJ45 socket are controlled by the phy, but others are not. Using the triggers provided by this patch the leds not controlled by the phy can be configured to show the current speed of the ethernet connection. The leds controlled by the phy are unaffected. Signed-off-by: Josh Cartwright <josh.cartwright@ni.com> Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net: phy: Create phy_supported_speeds function which lists speeds currently ↵Zach Brown
supported by a phydevice phy_supported_speeds provides a means to get a list of all the speeds a phy device currently supports. Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net: Remove all_adj_list and its referencesDavid Ahern
Only direct adjacencies are maintained. All upper or lower devices can be learned via the new walk API which recursively walks the adj_list for upper devices or lower devices. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-18net: Introduce new api for walking upper and lower devicesDavid Ahern
This patch introduces netdev_walk_all_upper_dev_rcu, netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These functions recursively walk the adj_list of devices to determine all upper and lower devices. The functions take a callback function that is invoked for each device in the list. If the callback returns non-0, the walk is terminated and the functions return that code back to callers. v3 - simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and removed typecast as suggested by Stephen v2 - fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev version. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-17net: phy: Threaded interrupts allow some simplificationAndrew Lunn
The PHY interrupts are now handled in a threaded interrupt handler, which can sleep. The work queue is no longer needed, phy_change() can be called directly. phy_mac_interrupt() still needs to be safe to call in interrupt context, so keep the work queue, and use a helper to call phy_change(). Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-15lwtunnel: Add destroy state operationTom Herbert
Users of lwt tunnels may set up some secondary state in build_state function. Add a corresponding destroy_state function to allow users to clean up state. This destroy state function is called from lwstate_free. Also, we now free lwstate using kfree_rcu so user can assume structure is not freed before rcu. Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14qed*: Allow unicast filteringYuval Mintz
Apparently qede fails to set IFF_UNICAST_FLT, and as a result is not actually performing unicast MAC filtering. While we're at it - relax a hard-coded limitation that limits each interface into using at most 15 unicast MAC addresses before turning promiscuous. Instead utilize the HW resources to their limit. Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14qed: Pass MAC hints to VFsYuval Mintz
Some hypervisors can support MAC hints to their VFs. Even though we don't have such a hypervisor API in linux, we add sufficient logic for the VF to be able to receive such hints and set the mac accordingly - as long as the VF has not been set with a MAC already. Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and ↵Shmulik Ladkani
'is_tcf_mirred_mirror' These accessors are used in various drivers that support tc offloading, to detect properties of a given 'tc_action'. 'is_tcf_mirred_redirect' tests that the action is TCA_EGRESS_REDIR. 'is_tcf_mirred_mirror' tests that the action is TCA_EGRESS_MIRROR. As a prep towards supporting INGRESS redir/mirror, rename these predicates to reflect their true meaning: s/is_tcf_mirred_redirect/is_tcf_mirred_egress_redirect/ s/is_tcf_mirred_mirror/is_tcf_mirred_egress_mirror/ Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit and make ↵Shmulik Ladkani
it a bool 'tcfm_ok_push' specifies whether a mac_len sized push is needed upon egress to the target device (if action is performed at ingress). Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of the target device (and use a bool instead of int). This allows to decouple the attribute from the action to be taken. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.Allan W. Nielsen
Edge-Rate cleanup include the following: - Updated device tree bindings documentation for edge-rate - The edge-rate is now specified as a "slowdown", meaning that it is now being specified as positive values instead of negative (both documentation and implementation wise). - Only explicitly documented values for "vsc8531,vddmac" and "vsc8531,edge-slowdown" are accepted by the device driver. - Deleted include/dt-bindings/net/mscc-phy-vsc8531.h as it was not needed. - Read/validate devicetree settings in probe instead of init Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: Raju Lakkaraju <raju.lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14Revert "net: Add driver helper functions to determine checksum offloadability"stephen hemminger
This reverts commit 6ae23ad36253a8033c5714c52b691b84456487c5. The code has been in kernel since 4.4 but there are no in tree code that uses. Unused code is broken code, remove it. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-10-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix various build warnings in tlan/qed/xen-netback drivers, from Arnd Bergmann. 2) Propagate proper error code in strparser's strp_recv(), from Geert Uytterhoeven. 3) Fix accidental broadcast of RTM_GETTFILTER responses, from Eric Dumazret. 4) Need to use list_for_each_entry_safe() in qed driver, from Wei Yongjun. 5) Openvswitch 802.1AD bug fixes from Jiri Benc. 6) Cure BUILD_BUG_ON() in mlx5 driver, from Tom Herbert. 7) Fix UDP ipv6 checksumming in netvsc driver, from Stephen Hemminger. 8) stmmac driver fixes from Giuseppe CAVALLARO. 9) Fix access to mangled IP6CB in tcp, from Eric Dumazet. 10) Fix info leaks in tipc and rtnetlink, from Dan Carpenter. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) net: bridge: add the multicast_flood flag attribute to brport_attrs net: axienet: Remove unused parameter from __axienet_device_reset liquidio: CN23XX: fix a loop timeout net: rtnl: info leak in rtnl_fill_vfinfo() tipc: info leak in __tipc_nl_add_udp_addr() net: ipv4: Do not drop to make_route if oif is l3mdev net: phy: Trigger state machine on state change and not polling. ipv6: tcp: restore IP6CB for pktoptions skbs netvsc: Remove mistaken udp.h inclusion. xen-netback: fix type mismatch warning stmmac: fix error check when init ptp stmmac: fix ptp init for gmac4 qed: fix old-style function definition netvsc: fix checksum on UDP IPV6 net_sched: reorder pernet ops and act ops registrations xen-netback: fix guest Rx stall detection (after guest Rx refactor) drivers/ptp: Fix kernel memory disclosure net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON qmi_wwan: add support for Quectel EC21 and EC25 openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev ...
2016-10-13Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - sunrpc: fix writ espace race causing stalls - NFS: Fix inode corruption in nfs_prime_dcache() - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation() - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid - NFSv4: Open state recovery must account for file permission changes - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic Features: - Add support for tracking multiple layout types with an ordered list - Add support for using multiple backchannel threads on the client - Add support for pNFS file layout session trunking - Delay xprtrdma use of DMA API (for device driver removal) - Add support for xprtrdma remote invalidation - Add support for larger xprtrdma inline thresholds - Use a scatter/gather list for sending xprtrdma RPC calls - Add support for the CB_NOTIFY_LOCK callback - Improve hashing sunrpc auth_creds by using both uid and gid Bugfixes: - Fix xprtrdma use of DMA API - Validate filenames before adding to the dcache - Fix corruption of xdr->nwords in xdr_copy_to_scratch - Fix setting buffer length in xdr_set_next_buffer() - Don't deadlock the state manager on the SEQUENCE status flags - Various delegation and stateid related fixes - Retry operations if an interrupted slot receives EREMOTEIO - Make nfs boot time y2038 safe" * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits) NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic fs: nfs: Make nfs boot time y2038 safe sunrpc: replace generic auth_cred hash with auth-specific function sunrpc: add RPCSEC_GSS hash_cred() function sunrpc: add auth_unix hash_cred() function sunrpc: add generic_auth hash_cred() function sunrpc: add hash_cred() function to rpc_authops struct Retry operation on EREMOTEIO on an interrupted slot pNFS: Fix atime updates on pNFS clients sunrpc: queue work on system_power_efficient_wq NFSv4.1: Even if the stateid is OK, we may need to recover the open modes NFSv4: If recovery failed for a specific open stateid, then don't retry NFSv4: Fix retry issues with nfs41_test/free_stateid NFSv4: Open state recovery must account for file permission changes NFSv4: Mark the lock and open stateids as invalid after freeing them NFSv4: Don't test open_stateid unless it is set NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation NFSv4: Fix a race when updating an open_stateid NFSv4: Fix a race in nfs_inode_reclaim_delegation() ...
2016-10-13Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Some RDMA work and some good bugfixes, and two new features that could benefit from user testing: - Anna Schumacker contributed a simple NFSv4.2 COPY implementation. COPY is already supported on the client side, so a call to copy_file_range() on a recent client should now result in a server-side copy that doesn't require all the data to make a round trip to the client and back. - Jeff Layton implemented callbacks to notify clients when contended locks become available, which should reduce latency on workloads with contended locks" * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux: NFSD: Implement the COPY call nfsd: handle EUCLEAN nfsd: only WARN once on unmapped errors exportfs: be careful to only return expected errors. nfsd4: setclientid_confirm with unmatched verifier should fail nfsd: randomize SETCLIENTID reply to help distinguish servers nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant nfsd: add a LRU list for blocked locks nfsd: have nfsd4_lock use blocking locks for v4.1+ locks nfsd: plumb in a CB_NOTIFY_LOCK operation NFSD: fix corruption in notifier registration svcrdma: support Remote Invalidation svcrdma: Server-side support for rpcrdma_connect_private rpcrdma: RDMA/CM private message data structure svcrdma: Skip put_page() when send_reply() fails svcrdma: Tail iovec leaves an orphaned DMA mapping nfsd: fix dprintk in nfsd4_encode_getdeviceinfo nfsd: eliminate cb_minorversion field nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-13Merge tag 'xfs-reflink-for-linus-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs < XFS has gained super CoW powers! > ---------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || || Pull XFS support for shared data extents from Dave Chinner: "This is the second part of the XFS updates for this merge cycle. This pullreq contains the new shared data extents feature for XFS. Given the complexity and size of this change I am expecting - like the addition of reverse mapping last cycle - that there will be some follow-up bug fixes and cleanups around the -rc3 stage for issues that I'm sure will show up once the code hits a wider userbase. What it is: At the most basic level we are simply adding shared data extents to XFS - i.e. a single extent on disk can now have multiple owners. To do this we have to add new on-disk features to both track the shared extents and the number of times they've been shared. This is done by the new "refcount" btree that sits in every allocation group. When we share or unshare an extent, this tree gets updated. Along with this new tree, the reverse mapping tree needs to be updated to track each owner or a shared extent. This also needs to be updated ever share/unshare operation. These interactions at extent allocation and freeing time have complex ordering and recovery constraints, so there's a significant amount of new intent-based transaction code to ensure that operations are performed atomically from both the runtime and integrity/crash recovery perspectives. We also need to break sharing when writes hit a shared extent - this is where the new copy-on-write implementation comes in. We allocate new storage and copy the original data along with the overwrite data into the new location. We only do this for data as we don't share metadata at all - each inode has it's own metadata that tracks the shared data extents, the extents undergoing CoW and it's own private extents. Of course, being XFS, nothing is simple - we use delayed allocation for CoW similar to how we use it for normal writes. ENOSPC is a significant issue here - we build on the reservation code added in 4.8-rc1 with the reverse mapping feature to ensure we don't get spurious ENOSPC issues part way through a CoW operation. These mechanisms also help minimise fragmentation due to repeated CoW operations. To further reduce fragmentation overhead, we've also introduced a CoW extent size hint, which indicates how large a region we should allocate when we execute a CoW operation. With all this functionality in place, we can hook up .copy_file_range, .clone_file_range and .dedupe_file_range and we gain all the capabilities of reflink and other vfs provided functionality that enable manipulation to shared extents. We also added a fallocate mode that explicitly unshares a range of a file, which we implemented as an explicit CoW of all the shared extents in a file. As such, it's a huge chunk of new functionality with new on-disk format features and internal infrastructure. It warns at mount time as an experimental feature and that it may eat data (as we do with all new on-disk features until they stabilise). We have not released userspace suport for it yet - userspace support currently requires download from Darrick's xfsprogs repo and build from source, so the access to this feature is really developer/tester only at this point. Initial userspace support will be released at the same time the kernel with this code in it is released. The new code causes 5-6 new failures with xfstests - these aren't serious functional failures but things the output of tests changing slightly due to perturbations in layouts, space usage, etc. OTOH, we've added 150+ new tests to xfstests that specifically exercise this new functionality so it's got far better test coverage than any functionality we've previously added to XFS. Darrick has done a pretty amazing job getting us to this stage, and special mention also needs to go to Christoph (review, testing, improvements and bug fixes) and Brian (caught several intricate bugs during review) for the effort they've also put in. Summary: - unshare range (FALLOC_FL_UNSHARE) support for fallocate - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr interface - shared extent support for XFS - copy-on-write support for shared extents - copy_file_range support - clone_file_range support (implements reflink) - dedupe_file_range support - defrag support for reverse mapping enabled filesystems" * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits) xfs: convert COW blocks to real blocks before unwritten extent conversion xfs: rework refcount cow recovery error handling xfs: clear reflink flag if setting realtime flag xfs: fix error initialization xfs: fix label inaccuracies xfs: remove isize check from unshare operation xfs: reduce stack usage of _reflink_clear_inode_flag xfs: check inode reflink flag before calling reflink functions xfs: implement swapext for rmap filesystems xfs: refactor swapext code xfs: various swapext cleanups xfs: recognize the reflink feature bit xfs: simulate per-AG reservations being critically low xfs: don't mix reflink and DAX mode for now xfs: check for invalid inode reflink flags xfs: set a default CoW extent size of 32 blocks xfs: convert unwritten status of reverse mappings for shared files xfs: use interval query for rmap alloc operations on shared files xfs: add shared rmap map/unmap/convert log item types xfs: increase log reservations for reflink ...
2016-10-13Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog updates from Wim Van Sebroeck: - a new watchdog pretimeout governor framework - support to upload the firmware on the ziirave_wdt - several fixes and cleanups * git://www.linux-watchdog.org/linux-watchdog: (26 commits) watchdog: imx2_wdt: add pretimeout function support watchdog: softdog: implement pretimeout support watchdog: pretimeout: add pretimeout_available_governors attribute watchdog: pretimeout: add option to select a pretimeout governor in runtime watchdog: pretimeout: add panic pretimeout governor watchdog: pretimeout: add noop pretimeout governor watchdog: add watchdog pretimeout governor framework watchdog: hpwdt: add support for iLO5 fs: compat_ioctl: add pretimeout functions for watchdogs watchdog: add pretimeout support to the core watchdog: imx2_wdt: use preferred BIT macro instead of open coded values watchdog: st_wdt: Remove support for obsolete platforms watchdog: bindings: Remove obsolete platforms from dt doc. watchdog: mt7621_wdt: Remove assignment of dev pointer watchdog: rt2880_wdt: Remove assignment of dev pointer watchdog: constify watchdog_ops structures watchdog: tegra: constify watchdog_ops structures watchdog: iTCO_wdt: constify iTCO_wdt_pm structure watchdog: cadence_wdt: Fix the suspend resume watchdog: txx9wdt: Add missing clock (un)prepare calls for CCF ...
2016-10-13net: ipv4: Do not drop to make_route if oif is l3mdevDavid Ahern
Commit e0d56fdd7342 was a bit aggressive removing l3mdev calls in the IPv4 stack. If the fib_lookup fails we do not want to drop to make_route if the oif is an l3mdev device. Also reverts 19664c6a0009 ("net: l3mdev: Remove netif_index_is_l3_master") which removed netif_index_is_l3_master. Fixes: e0d56fdd7342 ("net: l3mdev: remove redundant calls") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ONTom Herbert
I am hitting this in mlx5: drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function reclaim_pages_cmd.clone.0: drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:346: error: call to __compiletime_assert_346 declared with attribute error: BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_out, pas[i]) % 64 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function give_pages: drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:291: error: call to __compiletime_assert_291 declared with attribute error: BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_in, pas[i]) % 64 Problem is that this is doing a BUILD_BUG_ON on a non-constant expression because of trying to take offset of pas[i] in the structure. Fix is to create MLX5_ARRAY_SET64 that takes an additional argument that is the field index to separate between BUILD_BUG_ON on the array constant field and the indexed field to assign the value to. There are two callers of MLX5_SET64 that are trying to get a variable offset, change those to call MLX5_ARRAY_SET64 passing 'pas' and 'i' as the arguments to use in the offset check and the indexed value assignment. Fixes: a533ed5e179cd ("net/mlx5: Pages management commands via mlx5 ifc") Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13sctp: remove the old ttl expires policyXin Long
The prsctp polices include ttl expires policy already, we should remove the old ttl expires codes, and just adjust the new polices' codes to be compatible with the old one for users. This patch is to remove all the old expires codes, and if prsctp polices are not set, it will still set msg's expires_at and check the expires in sctp_check_abandoned. Note that asoc->prsctp_enable is set by default, so users can't feel any difference even if they use the old expires api in userspace. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13sctp: reuse sent_count to avoid retransmitted chunks for RTT measurementsXin Long
Now sctp uses chunk->resent to record if a chunk is retransmitted, for RTT measurements with retransmitted DATA chunks. chunk->sent_count was introduced to record how many times one chunk has been sent for prsctp RTX policy before. We actually can know if one chunk is retransmitted by checking chunk->sent_count is greater than 1. This patch is to remove resent from sctp_chunk and reuse sent_count to avoid retransmitted chunks for RTT measurements. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13net: deprecate eth_change_mtu, remove usageJarod Wilson
With centralized MTU checking, there's nothing productive done by eth_change_mtu that isn't already done in dev_set_mtu, so mark it as deprecated and remove all usage of it in the kernel. All callers have been audited for calls to alloc_etherdev* or ether_setup directly, which means they all have a valid dev->min_mtu and dev->max_mtu. Now eth_change_mtu prints out a netdev_warn about being deprecated, for the benefit of out-of-tree drivers that might be utilizing it. Of note, dvb_net.c actually had dev->mtu = 4096, while using eth_change_mtu, meaning that if you ever tried changing it's mtu, you couldn't set it above 1500 anymore. It's now getting dev->max_mtu also set to 4096 to remedy that. v2: fix up lantiq_etop, missed breakage due to drive not compiling on x86 CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13net: centralize net_device min/max MTU checkingJarod Wilson
While looking into an MTU issue with sfc, I started noticing that almost every NIC driver with an ndo_change_mtu function implemented almost exactly the same range checks, and in many cases, that was the only practical thing their ndo_change_mtu function was doing. Quite a few drivers have either 68, 64, 60 or 46 as their minimum MTU value checked, and then various sizes from 1500 to 65535 for their maximum MTU value. We can remove a whole lot of redundant code here if we simple store min_mtu and max_mtu in net_device, and check against those in net/core/dev.c's dev_set_mtu(). In theory, there should be zero functional change with this patch, it just puts the infrastructure in place. Subsequent patches will attempt to start using said infrastructure, with theoretically zero change in functionality. CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-12Merge tag 'pwm/for-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This set of changes contains support for PWM signal capture in the STi driver as well as support for the PWM controller found on Meson SoCs. There's also support added for the MediaTek MT2701 and SunXi H3 to the existing drivers. Other than that there's a fair set of miscellaneous cleanups and fixes across the board" * tag 'pwm/for-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (24 commits) pwm: meson: Handle unknown ID values pwm: sti: Take the opportunity to conduct a little house keeping pwm: sti: It's now valid for number of PWM channels to be zero pwm: sti: Add PWM capture callback pwm: sti: Add support for PWM capture interrupts pwm: sti: Initialise PWM capture device data pwm: sti: Supply PWM Capture clock handling pwm: sti: Supply PWM capture register addresses and bit locations pwm: sti: Only request clock rate when needed pwm: sti: Reorganise register names in preparation for new functionality pwm: sti: Rename channel => device dt-bindings: pwm: sti: Update DT bindings for capture support pwm: lpc-18xx: use pwm_set_chip_data pwm: sunxi: Add H3 support pwm: Add support for Meson PWM Controller dt-bindings: pwm: Add bindings for Meson PWM Controller pwm: samsung: Fix to use lowest div for large enough modulation bits pwm: pwm-tipwmss: Remove all runtime PM gets/puts pwm: cros-ec: Add __packed to prevent padding pwm: Add MediaTek MT2701 display PWM driver support ...
2016-10-12Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal managament updates from Zhang Rui: - Enhance thermal "userspace" governor to export the reason when a thermal event is triggered and delivered to user space. From Srinivas Pandruvada - Introduce a single TSENS thermal driver for the different versions of the TSENS IP that exist, on different qcom msm/apq SoCs'. Support for msm8916, msm8960, msm8974 and msm8996 families is also added. From Rajendra Nayak - Introduce hardware-tracked trip points support to the device tree thermal sensor framework. The framework supports an arbitrary number of trip points. Whenever the current temperature is changed, the trip points immediately below and above the current temperature are found, driver callback is invoked to program the hardware to get notified when either of the two trip points are triggered. Hardware-tracked trip points support for rockchip thermal driver is also added at the same time. From Sascha Hauer, Caesar Wang - Introduce a new thermal driver, which enables TMU (Thermal Monitor Unit) on QorIQ platform. From Jia Hongtao - Introduce a new thermal driver for Maxim MAX77620. From Laxman Dewangan - Introduce a new thermal driver for Intel platforms using WhiskeyCove PMIC. From Bin Gao - Add mt2701 chip support to MTK thermal driver. From Dawei Chien - Enhance Tegra thermal driver to enable soctherm node and set "critical", "hot" trips, for Tegra124, Tegra132, Tegra210. From Wei Ni - Add resume support for tango thermal driver. From Marc Gonzalez - several small fixes and improvements for rockchip, qcom, imx, rcar, mtk thermal drivers and thermal core code. From Caesar Wang, Keerthy, Rocky Hao, Wei Yongjun, Peter Robinson, Bui Duc Phuc, Axel Lin, Hugh Kang * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (48 commits) thermal: int3403: Process trip change notification thermal: int340x: New Interface to read trip and notify thermal: user_space gov: Add additional information in uevent thermal: Enhance thermal_zone_device_update for events arm64: tegra: set hot trips for Tegra210 arm64: tegra: set critical trips for Tegra210 arm64: tegra: add soctherm node for Tegra210 arm64: tegra: set hot trips for Tegra132 arm64: tegra: set critical trips for Tegra132 arm64: tegra: use tegra132-soctherm for Tegra132 arm: tegra: set hot trips for Tegra124 arm: tegra: set critical trips for Tegra124 thermal: tegra: add hw-throttle for Tegra132 thermal: tegra: add hw-throttle function of: Add bindings of hw throttle for Tegra s