summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx4/en_tx.c
AgeCommit message (Collapse)Author
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-09net/mlx4_en: Handle TX error CQEMoshe Shemesh
In case error CQE was found while polling TX CQ, the QP is in error state and all posted WQEs will generate error CQEs without any data transmitted. Fix it by reopening the channels, via same method used for TX timeout handling. In addition add some more info on error CQE and WQE for debug. Fixes: bd2f631d7c60 ("net/mlx4_en: Notify user when TX ring in error state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-11-20net/mlx4_en: Remove unused performance countersTariq Toukan
Performance analysis counters are maintained under the MLX4_EN_PERF_STAT definition, which is never set. Clean them up, with all related structures and logic. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Link: https://lore.kernel.org/r/20201118103427.4314-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-12mlx4: handle non-napi callers to napi_pollJonathan Lemon
netcons calls napi_poll with a budget of 0 to transmit packets. Handle this by: - skipping RX processing - do not try to recycle TX packets to the RX cache Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-25drivers/net/ethernet: clean up unused assignmentsJesse Brandeburg
As part of the W=1 compliation series, these lines all created warnings about unused variables that were assigned a value. Most of them are from register reads, but some are just picking up a return value from a function and never doing anything with it. Fixed warnings: .../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable] .../ethernet/brocade/bna/bnad.c:3280:6: warning: variable ‘rx_count’ set but not used [-Wunused-but-set-variable] .../ethernet/cortina/gemini.c:512:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] .../ethernet/cortina/gemini.c:2110:21: warning: variable ‘config0’ set but not used [-Wunused-but-set-variable] .../ethernet/cavium/liquidio/octeon_device.c:1327:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/cavium/liquidio/octeon_device.c:1358:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/dec/tulip/media.c:322:8: warning: variable ‘setup’ set but not used [-Wunused-but-set-variable] .../ethernet/dec/tulip/de4x5.c:4928:13: warning: variable ‘r3’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:1652:7: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:4981:6: warning: variable ‘rx_status’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:6510:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/micrel/ksz884x.c:6087: warning: cannot understand function prototype: 'struct hw_regs ' .../ethernet/microchip/lan743x_main.c:161:6: warning: variable ‘int_en’ set but not used [-Wunused-but-set-variable] .../ethernet/microchip/lan743x_main.c:1702:6: warning: variable ‘int_sts’ set but not used [-Wunused-but-set-variable] .../ethernet/microchip/lan743x_main.c:3041:6: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] .../ethernet/natsemi/ns83820.c:603:6: warning: variable ‘tbisr’ set but not used [-Wunused-but-set-variable] .../ethernet/natsemi/ns83820.c:1207:11: warning: variable ‘tanar’ set but not used [-Wunused-but-set-variable] .../ethernet/marvell/mvneta.c:754:6: warning: variable ‘dummy’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:33:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:160:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:490:6: warning: variable ‘val32’ set but not used [-Wunused-but-set-variable] .../ethernet/neterion/vxge/vxge-traffic.c:2378:6: warning: variable ‘val64’ set but not used [-Wunused-but-set-variable] .../ethernet/packetengines/yellowfin.c:1063:18: warning: variable ‘yf_size’ set but not used [-Wunused-but-set-variable] .../ethernet/realtek/8139cp.c:1242:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/mellanox/mlx4/en_tx.c:858:6: warning: variable ‘ring_cons’ set but not used [-Wunused-but-set-variable] .../ethernet/sis/sis900.c:792:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:878:11: warning: variable ‘rx_ev_pkt_type’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:877:23: warning: variable ‘rx_ev_mcast_pkt’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:877:7: warning: variable ‘rx_ev_hdr_type’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:876:7: warning: variable ‘rx_ev_other_err’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:1646:21: warning: variable ‘buftbl_min’ set but not used [-Wunused-but-set-variable] .../ethernet/sfc/falcon/farch.c:2535:32: warning: variable ‘spec’ set but not used [-Wunused-but-set-variable] .../ethernet/via/via-velocity.c:880:6: warning: variable ‘curr_status’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/tlan.c:656:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/davinci_emac.c:1230:6: warning: variable ‘num_tx_pkts’ set but not used [-Wunused-but-set-variable] .../ethernet/synopsys/dwc-xlgmac-common.c:516:8: warning: variable ‘str’ set but not used [-Wunused-but-set-variable] .../ethernet/ti/cpsw_new.c:1662:22: warning: variable ‘priv’ set but not used [-Wunused-but-set-variable] The register reads should be OK, because the current implementation of readl and friends will always execute even without an lvalue. When it makes sense, just remove the lvalue assignment and the local. Other times, just remove the offending code, and occasionally, just mark the variable as maybe unused since it could be used in an ifdef or debug scenario. Only compile tested with W=1. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-12net: ethernet: mlx4: Avoid assigning a value to ring_cons but not used it ↵Luo Jiaxing
anymore in mlx4_en_xmit() We found a set but not used variable 'ring_cons' in mlx4_en_xmit(), it will cause a warning when build the kernel. And after checking the commit record of this function, we found that it was introduced by a previous patch. So, We delete this redundant assignment code. Fixes: 488a9b48e398 ("net/mlx4_en: Wake TX queues only when there's enough room") Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Simple overlapping changes to linux/vermagic.h Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23net/mlx4_en: use napi_complete_done() in TX completionEric Dumazet
In order to benefit from the new napi_defer_hard_irqs feature, we need to use napi_complete_done() variant in this driver. RX path is already using it, this patch implements TX completion side. mlx4_en_process_tx_cq() now returns the amount of retired packets, instead of a boolean, so that mlx4_en_poll_tx_cq() can pass this value to napi_complete_done(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-18net/mlx4_en: avoid indirect call in TX completionEric Dumazet
Commit 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support") brought another indirect call in fast path. Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call when/if CONFIG_RETPOLINE=y Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22net: Use skb accessors in network driversMatthew Wilcox (Oracle)
In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-01drivers: mellanox: use netdev_xmit_more() helperFlorian Westphal
skb->xmit_more hint is now always 0. This switches the mellanox drivers to the netdev_xmit_more() helper. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Boris Pismenny <borisp@mellanox.com> Cc: Ilya Lesokhin <ilyal@mellanox.com> Cc: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20net: remove 'fallback' argument from dev->ndo_select_queue()Paolo Abeni
After the previous patch, all the callers of ndo_select_queue() provide as a 'fallback' argument netdev_pick_tx. The only exceptions are nested calls to ndo_select_queue(), which pass down the 'fallback' available in the current scope - still netdev_pick_tx. We can drop such argument and replace fallback() invocation with netdev_pick_tx(). This avoids an indirect call per xmit packet in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device drivers implementing such ndo. It also clean the code a bit. Tested with ixgbe and CONFIG_FCOE=m With pktgen using queue xmit: threads vanilla patched (kpps) (kpps) 1 2334 2428 2 4166 4278 4 7895 8100 v1 -> v2: - rebased after helper's name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17net/mlx4_en: remove fallback after kzalloc_node()Eric Dumazet
kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate memory as close as possible to the node. There is no need to fallback to kzalloc() if this has failed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03net/mlx4_en: use __netdev_tx_sent_queue()Eric Dumazet
doorbell only depends on xmit_more and netif_tx_queue_stopped() Using __netdev_tx_sent_queue() avoids messing with BQL stop flag, and is more generic. This patch increases performance on GSO workload by keeping doorbells to the minimum required. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09net: allow fallback function to pass netdevAlexander Duyck
For most of these calls we can just pass NULL through to the fallback function as the sb_dev. The only cases where we cannot are the cases where we might be dealing with either an upper device or a driver that would have configured things to support an sb_dev itself. The only driver that has any significant change in this patch set should be ixgbe as we can drop the redundant functionality that existed in both the ndo_select_queue function and the fallback function that was passed through to us. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09net: allow ndo_select_queue to pass netdevAlexander Duyck
This patch makes it so that instead of passing a void pointer as the accel_priv we instead pass a net_device pointer as sb_dev. Making this change allows us to pass the subordinate device through to the fallback function eventually so that we can keep the actual code in the ndo_select_queue call as focused on possible on the exception cases. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-29mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queueAlexander Duyck
The code in the fallback path has supported XDP in conjunction with the Tx traffic classification for TCs for over a year now. So instead of just calling skb_tx_hash for every packet we are better off using the fallback since that will record the Tx queue to the socket and then that can be used instead of having to recompute the hash every time. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-11net/mlx4_en: XDP_TX, assign constant values of TX descs on ring creaionTariq Toukan
In XDP_TX, some fields in tx_info and tx_desc are constants across all entries of the different XDP_TX rings. Assign values to these fields on ring creation time, rather than in data-path. Patchset performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------ Before | After | Gain | 13.7 Mpps | 14.0 Mpps | %2.2 | ------------------------------ Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11net/mlx4_en: Obsolete call to generic write_desc in XDP xmit flowTariq Toukan
Function mlx4_en_tx_write_desc() is not optimized to use of XDP xmit. Use the relevant parts inline instead. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11net/mlx4_en: Replace netdev parameter with priv in XDP xmit functionTariq Toukan
The struct net_device parameter was passed only to extract struct mlx4_en_priv out of it. Here we pass the priv parameter directly. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09net/mlx4_en: Use __force to fix a sparse warning in TX datapathTariq Toukan
In TX data-path, we intentionally do not byte-swap, as documented in code and in the cited commit log. This fixes sparse warning: en_tx.c:720:23: warning: incorrect type in argument 1 (different base types) en_tx.c:720:23: expected unsigned int [unsigned] [usertype] <noident> en_tx.c:720:23: got restricted __be32 [usertype] doorbell_qpn Fixes: 492f5add4be8 ("net/mlx4_en: Doorbell is byteswapped in Little Endian archs") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-08-16mlx4: sizeof style usagestephen hemminger
The kernel coding style is to treat sizeof as a function (ie. with parenthesis) not as an operator. Also use kcalloc and kmalloc_array Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24(IB, net)/mlx4: Add resource utilization supportMoshe Shemesh
Adding visibility of resource usage of QPs, CQs and counters used by virtual functions. This feature will be used to give the PF administrator more data while debugging VF status. Usage info was added to ALLOC_RES command, to notify the PF if the resource which is being reserved or allocated for the VF will be used by kernel driver or by user verbs. Updated reservation and allocation functions of QP, CQ and counter with additional usage parameter. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17{net, IB}/mlx4: Remove gfp flags argumentLeon Romanovsky
The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-29net/mlx4_en: Do not allocate redundant TX queues when TC is disabledInbar Karmy
Currently the number of TX queues that are allocated doesn't depend on the number of TCs, the module always loads with max num of UP per channel. In order to prevent the allocation of unnecessary memory, the module will load with minimum number of UPs per channel, and the user will be able to control the number of TX queues per channel by changing the number of TC to 8 using the tc command. The variable num_up will hold the information about the current number of UPs. Due to the change, needed to remove the lines that set the value of UP to be different than zero in the func "mlx4_en_select_queue", since now the num of TX queues that are allocated is only one per channel in default. In order not to force the UP to be zero in case of only one TC, added a condition before forcing it in the func "mlx4_en_fill_qp_context". Tested: After the module is loaded with minimum number of UP per channel, to increase num of TCs to 8, use: tc qdisc add dev ens8 root mqprio num_tc 8 In order to decrease the number of TCs to minimum number of UP per channel, use: tc qdisc del dev ens8 root Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Tarick Bedeir <tarick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Refactor mlx4_en_free_tx_descTariq Toukan
Some code re-ordering, functionally equivalent. - The !tx_info->inl check is evaluated anyway in both flows (common case/end case). Run it first, this might finish the flows earlier. - dma_unmap calls are identical in both flows, get it out of the if block into the common area. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Replace TXBB_SIZE multiplications with shift operationsTariq Toukan
Define LOG_TXBB_SIZE, log of TXBB_SIZE, and use it with a shift operation instead of a multiplication with TXBB_SIZE. Operations are equivalent as TXBB_SIZE is a power of two. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Poll XDP TX completion queue in RX NAPITariq Toukan
Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices, between the context that issues the transmits (RX NAPI) and the context that handles the completions (was previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 12.0 Mpps | 13.8 Mpps | 15% | IPv6 | 12.0 Mpps | 13.8 Mpps | 15% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Improve XDP xmit functionTariq Toukan
Several performance improvements in XDP TX datapath, including: - Ring a single doorbell for XDP TX ring per NAPI budget, instead of doing it per a lower threshold (was 8). This includes removing the flow of immediate doorbell ringing in case of a full TX ring. - Compiler branch predictor hints. - Calculate values in compile time rather than in runtime. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 10.3 Mpps | 12.0 Mpps | 17% | IPv6 | 10.3 Mpps | 12.0 Mpps | 17% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Improve stack xmit functionTariq Toukan
Several small code and performance improvements in stack TX datapath, including: - Compiler branch predictor hints. - Minimize variables scope. - Move tx_info non-inline flow handling to a separate function. - Calculate data_offset in compile time rather than in runtime (for !lso_header_size branch). - Avoid trinary-operator ("?") when value can be preset in a matching branch. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Improve transmit CQ pollingTariq Toukan
Several small performance improvements in TX CQ polling, including: - Compiler branch predictor hints. - Minimize variables scope. - More proper check of cq type. - Use boolean instead of int for a binary indication. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet-rate tests for both regular stack and XDP use cases: No noticeable gain, no degradation. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Remove unused argument in TX datapath functionTariq Toukan
Remove owner argument, as it is obsolete and unused. This also saves the overhead of calculating its value in data-path. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08treewide: use kv[mz]alloc* rather than opencoded variantsMichal Hocko
There are many code paths opencoding kvmalloc. Let's use the helper instead. The main difference to kvmalloc is that those users are usually not considering all the aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) are basically never failing and invoke OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback - the vmalloc. On the other hand those requests might fallback to vmalloc even when the memory allocator would succeed after several more reclaim/compaction attempts previously. There is no guarantee something like that happens though. This patch converts many of those places to kv[mz]alloc* helpers because they are more conservative. Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390 Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim Acked-by: David Sterba <dsterba@suse.com> # btrfs Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4 Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5 Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Yishai Hadas <yishaih@mellanox.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-06mlx4: trust shinfo->gso_segsEric Dumazet
mlx4 is the only driver in the tree making a point to recompute shinfo->gso_segs. Lets remove superfluous code. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09mlx4: reduce rx ring page_cache sizeEric Dumazet
We only need to store the page and dma address. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09mlx4: dma_dir is a mlx4_en_priv attributeEric Dumazet
No need to duplicate it for all queues and frags. num_frags & log_rx_info become u8 to save space. u8 accesses are a bit faster than u16 anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_core: Get num_tc using netdev_get_num_tcAlaa Hleihel
Avoid reading num_tc directly from struct net_device, but use the helper function netdev_get_num_tc. Fixes: bc6a4744b827 ("net/mlx4_en: num cores tx rings for every UP") Fixes: f5b6345ba8da ("net/mlx4_en: User prio mapping gets corrupted when changing number of channels") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Reserve headroom for receiving packet when XDP prog is activeMartin KaFai Lau
Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-24mlx4: reorganize struct mlx4_en_tx_ringEric Dumazet
Goal is to reorganize this critical structure to increase performance. ndo_start_xmit() should only dirty one cache line, and access as few cache lines as possible. Add sp_ (Slow Path) prefix to fields that are not used in fast path, to make clear what is going on. After this patch pahole reports something much better, as all ndo_start_xmit() needed fields are packed into two cache lines instead of seven or eight struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ struct netdev_queue * tx_queue; /* 0x10 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */ /* XXX 24 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ unsigned int tx_dropped; /* 0x44 0x4 */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ struct mlx4_bf bf; /* 0x70 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ __be32 doorbell_qpn; /* 0x88 0x4 */ __be32 mr_key; /* 0x8c 0x4 */ u32 size; /* 0x90 0x4 */ u32 size_mask; /* 0x94 0x4 */ u32 full_size; /* 0x98 0x4 */ u32 buf_size; /* 0x9c 0x4 */ void * buf; /* 0xa0 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */ int qpn; /* 0xb0 0x4 */ u8 queue_index; /* 0xb4 0x1 */ bool bf_enabled; /* 0xb5 0x1 */ bool bf_alloced; /* 0xb6 0x1 */ u8 hwtstamp_tx_type; /* 0xb7 0x1 */ u8 * bounce_buf; /* 0xb8 0x8 */ /* --- cacheline 3 boundary (192 bytes) --- */ long unsigned int queue_stopped; /* 0xc0 0x8 */ struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */ struct mlx4_qp sp_qp; /* 0x120 0x30 */ /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */ struct mlx4_qp_context sp_context; /* 0x150 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ cpumask_t sp_affinity_mask; /* 0x248 0x20 */ enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */ u16 sp_stride; /* 0x26c 0x2 */ u16 sp_cqn; /* 0x26e 0x2 */ /* size: 640, cachelines: 10, members: 36 */ /* sum members: 600, holes: 1, sum holes: 24 */ /* padding: 16 */ }; Instead of this silly placement : struct mlx4_en_tx_ring { u32 last_nr_txbb; /* 0 0x4 */ u32 cons; /* 0x4 0x4 */ long unsigned int wake_queue; /* 0x8 0x8 */ /* XXX 48 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 prod; /* 0x40 0x4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int bytes; /* 0x48 0x8 */ long unsigned int packets; /* 0x50 0x8 */ long unsigned int tx_csum; /* 0x58 0x8 */ long unsigned int tso_packets; /* 0x60 0x8 */ long unsigned int xmit_more; /* 0x68 0x8 */ unsigned int tx_dropped; /* 0x70 0x4 */ /* XXX 4 bytes hole, try to pack */ struct mlx4_bf bf; /* 0x78 0x18 */ /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */ long unsigned int queue_stopped; /* 0x90 0x8 */ cpumask_t affinity_mask; /* 0x98 0x10 */ struct mlx4_qp qp; /* 0xa8 0x30 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */ /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */ u32 size; /* 0x130 0x4 */ u32 size_mask; /* 0x134 0x4 */ u16 stride; /* 0x138 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 full_size; /* 0x13c 0x4 */ /* --- cacheline 5 boundary (320 bytes) --- */ u16 cqn; /* 0x140 0x2 */ /* XXX 2 bytes hole, try to pack */ u32 buf_size; /* 0x144 0x4 */ __be32 doorbell_qpn; /* 0x148 0x4 */ __be32 mr_key; /* 0x14c 0x4 */ void * buf; /* 0x150 0x8 */ struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */ struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */ u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */ u8 * bounce_buf; /* 0x170 0x8 */ struct mlx4_qp_context context; /* 0x178 0xf8 */ /* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */ int qpn; /* 0x270 0x4 */ enum mlx4_qp_state qp_state; /* 0x274 0x4 */ u8 queue_index; /* 0x278 0x1 */ bool bf_enabled; /* 0x279 0x1 */ bool bf_alloced; /* 0x27a 0x1 */ /* XXX 5 bytes hole, try to pack */ /* --- cacheline 10 boundary (640 bytes) --- */ struct netdev_queue * tx_queue; /* 0x280 0x8 */ int hwtstamp_tx_type; /* 0x288 0x4 */ /* size: 704, cachelines: 11, members: 36 */ /* sum members: 587, holes: 6, sum holes: 65 */ /* padding: 52 */ }; Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/mlx4_en: Add ethtool statistics for XDP casesTariq Toukan
XDP statistics are reported in ethtool, in total and per ring, as follows: - xdp_drop: the number of packets dropped by xdp. - xdp_tx: the number of packets forwarded by xdp. - xdp_tx_full: the number of times an xdp forward failed due to a full tx xdp ring. In addition, all packets that are dropped/forwarded by XDP are no longer accounted in rx_packets/rx_bytes of the ring, so that they count traffic that is passed to the stack. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-02net/mlx4_en: Refactor the XDP forwarding rings schemeTariq Toukan
Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with smaller #cores we will not consume the current TX rings for XDP, while we are still in the num TX limit. XDP TX rings counters are not shown in ethtool statistics. Instead, XDP counters will be added to the respective RX rings in a downstream patch. This has no performance implications. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-11net/mlx4_en: Fix panic on xmit while port is downMoshe Shemesh
When port is down, tx drop counter update is not needed. Updating the counter in this case can cause a kernel panic as when the port is down, ring can be NULL. Fixes: 63a664b7e92b ("net/mlx4_en: fix tx_dropped bug") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: add xdp forwarding and data write supportBrenden Blanco
A user will now be able to loop packets back out of the same port using a bpf program attached to xdp hook. Updates to the packet contents from the bpf program is also supported. For the packet write feature to work, the rx buffers are now mapped as bidirectional when the page is allocated. This occurs only when the xdp hook is active. When the program returns a TX action, enqueue the packet directly to a dedicated tx ring, so as to avoid completely any locking. This requires the tx ring to be allocated 1:1 for each rx ring, as well as the tx completion running in the same softirq. Upon tx completion, this dedicated tx ring recycles pages without unmapping directly back to the original rx ring. In steady state tx/drop workload, effectively 0 page allocs/frees will occur. In order to separate out the paths between free and recycle, a free_tx_desc func pointer is introduced that is optionally updated whenever recycle_ring is activated. By default the original free function is always initialized. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: break out tx_desc write into separate functionBrenden Blanco
In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco <bblanco@plu