summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2019-11-07veth: use standard dev_lstats_add() and dev_lstats_read()Eric Dumazet
This cleanup will ease u64_stats_t adoption in a single location. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: nlmon: use standard dev_lstats_add() and dev_lstats_read()Eric Dumazet
No need to hand-code the exact same functions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: provide dev_lstats_add() helperEric Dumazet
Many network drivers need it and hand-coded the same function. In order to ease u64_stats_t adoption, it is time to factorize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: provide dev_lstats_read() helperEric Dumazet
Many network drivers use hand-coded implementation of the same thing, let's factorize things so that u64_stats_t adoption is done once. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07Merge branch 'net-Demote-MTU-change-prints-to-debug'David S. Miller
Florian Fainelli says: ==================== net: Demote MTU change prints to debug This patch series demotes several drivers that printed MTU change and could therefore spam the kernel console if one has a test that it's all about testing the values. Intel drivers were not also particularly consistent in how they printed the same message, so now they are. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: qcom/emac: Demote MTU change print to debugFlorian Fainelli
Changing the MTU can be a frequent operation and it is already clear when (or not) a MTU change is successful, demote prints to debug prints. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Timur Tabi <timur@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: ethernet: intel: Demote MTU change prints to debugFlorian Fainelli
Changing a network device MTU can be a fairly frequent operation, and failure to change the MTU is reflected to user-space properly, both by an appropriate message as well as by looking at whether the device's MTU matches the configuration. Demote the prints to debug prints by using netdev_dbg(), making all Intel wired LAN drivers consistent, since they used a mixture of PCI device and network device prints before. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07ethernet: ti: cpts: use ktime_get_real_ns helperIvan Khoronzhuk
Update on more short variant for getting real clock in ns. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07Merge branch 'aquantia-next'David S. Miller
Igor Russkikh says: ==================== Aquantia Marvell atlantic driver updates 11-2019 Here is a bunch of atlantic driver new features and updates. Shortlist: - Me adding ethtool private flags for various loopback test modes, - Nikita is doing some work here on power management, implementing new PM API, He also did some checkpatch style cleanup of older driver parts. - I'm also adding a new UDP GSO offload support and flags for loopback activation - We are now Marvell, so I am changing email addresses on maintainers list. v2: styling, ip6 correct handling in udpgso ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: change email domains to MarvellIgor Russkikh
Aquantia is now part of Marvell, eventually we'll cease standalone aquantia.com domain. Thus, change the maintainers file and some other references to @marvell.com domain Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: implement UDP GSO offloadIgor Russkikh
atlantic hardware does support UDP hardware segmentation offload. This allows user to specify one large contiguous buffer with data which then will be split automagically into multiple UDP packets of specified size. Bulk sending of large UDP streams lowers CPU usage and increases bandwidth. We did estimations both with udpgso_bench_tx test tool and with modified iperf3 measurement tool (4 streams, multithread, 200b packet size) over AQC<->AQC 10G link. Flow control is disabled to prevent RX side impact on measurements. No UDP GSO: iperf3 -c 10.0.1.2 -u -b0 -l 200 -P4 --multithread UDP GSO: iperf3 -c 10.0.1.2 -u -b0 -l 12600 --udp-lso 200 -P4 --multithread Mode CPU iperf speed Line speed Packets per second ------------------------------------------------------------- NO UDP GSO 350% 3.07 Gbps 3.8 Gbps 1,919,419 SW UDP GSO 200% 5.55 Gbps 6.4 Gbps 3,286,144 HW UDP GSO 90% 6.80 Gbps 8.4 Gbps 4,273,117 Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: update flow control logicNikita Danilov
We now differentiate requested and negotiated flow control modes. Therefore `ethtool -A` now operates on local requested FC values, and regular link settings shows the negotiated FC settings. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: stylistic renamesIgor Russkikh
We are trying to follow the naming of the chip (atlantic), not company. So replace some old namings. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: code style cleanupNikita Danilov
Thats a pure checkpatck walkthrough the code with no functional changes. Reverse christmas tree, spacing, etc. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: loopback tests via private flagsIgor Russkikh
Here we add a number of ethtool private flags to allow enabling various loopbacks on HW. Thats useful for verification and bringup works. Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: add fw configuration memory areaNikita Danilov
Device FW has a separate memory area where various config fields are stored and could be used by the driver. Here we modify download/upload infrastructure to allow accessing this area. Lateron this will be used to configure various behaviours Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: adding ethtool physical identificationNikita Danilov
`ethtool -p eth0` will blink leds helping identify physical port. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: add msglevel configurationNikita Danilov
We add ethtool msglevel configuration and change some printouts to use netdev_info set of functions. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: refactoring pm logicNikita Danilov
We now implement .driver.pm callbacks, these allows driver to work correctly in hibernate usecases, especially when used in conjunction with WOL feature. Before that driver only reacted to legacy .suspend/.resume callbacks, that was a limitation in some cases. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: implement wake_phy featureNikita Danilov
Wake on PHY allows to configure device to wakeup host as soon as PHY link status is changed to active. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: atlantic: update firmware interfaceNikita Danilov
Here we improve FW interface structures layout and prepare these for the wake phy feature implementation. Signed-off-by: Nikita Danilov <ndanilov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07Merge branch 'mlxsw-Add-layer-3-devlink-trap-support'David S. Miller
Ido Schimmel says: ==================== mlxsw: Add layer 3 devlink-trap support This patch set from Amit adds support in mlxsw for layer 3 traps that can report drops and exceptions via devlink-trap. In a similar fashion to the existing layer 2 traps, these traps can send packets to the CPU that were not routed as intended by the underlying device. The traps are divided between the two types detailed in devlink-trap documentation: drops and exceptions. Unlike drops, packets received via exception traps are also injected to the kernel's receive path, as they are required for the correct functioning of the control plane. For example, packets trapped due to TTL error must be injected to kernel's receive path for traceroute to work properly. Patch set overview: Patch #1 adds the layer 3 drop traps to devlink along with their documentation. Patch #2 adds support for layer 3 drop traps in mlxsw. Patches #3-#5 add selftests for layer 3 drop traps. Patch #6 adds the layer 3 exception traps to devlink along with their documentation. Patches #7-#9 gradually add support for layer 3 exception traps in mlxsw. Patches #10-#12 add selftests for layer 3 exception traps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: mlxsw: Add test cases for devlink-trap layer 3 exceptionsAmit Cohen
Test that each supported packet trap exception is triggered under the right conditions. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: forwarding: tc_common: Add hitting checkAmit Cohen
Add an option to check that packets hit the tc filter without providing the exact number of packets that should hit it. It is useful while sending many packets in background and checking that at least one of them hit the tc filter. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: forwarding: devlink: Add functionality for trap exceptions testAmit Cohen
Add common part of all the tests - check devlink status to ensure that packets were trapped. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07mlxsw: Add layer 3 devlink-trap exceptions supportAmit Cohen
Add the trap IDs used to report layer 3 exceptions. Trapped packets are first reported to devlink and then injected to the kernel's receive path. All the packets have 'offload_fwd_mark' set in order to prevent them from potentially being forwarded by the bridge again. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07mlxsw: Add specific trap for packets routed via invalid nexthopsAmit Cohen
Currently, mlxsw does not differentiate between these two cases of routes with invalid nexthops: 1. Nexthops whose nexthop device is a mlxsw upper (has a RIF), but whose neighbour could not be resolved 2. Nexthops whose nexthop device is not a mlxsw upper (e.g., management interface) Up until now this did not matter and mlxsw trapped packets for both cases using the same trap ID. However, packets that should have been routed in hardware (case 1), but incurred a problem are considered exceptions and should be reported to the user. The two cases should therefore be split between two different trap IDs. Allocate a new adjacency entry during initialization and upon the insertion of the first route with an invalid mlxsw nexthop, program this entry to discard packets. Packets hitting this entry will be reported using new trap ID - "DISCARD_ROUTER3". In the future, the entry could be written during initialization, but currently firmware requires a valid RIF, which is not available at this stage. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07mlxsw: Add new FIB entry type for reject routesAmit Cohen
Currently, packets that cannot be routed in hardware (e.g., nexthop device is not upper of mlxsw), are trapped to the kernel for forwarding. Such packets are trapped using "RTR_INGRESS0" trap. This trap also traps packets that hit reject routes (e.g., "unreachable") so that the kernel will generate the appropriate ICMP error message for them. Subsequent patch will need to only report to devlink packets that hit a reject route, which is impossible as long as "RTR_INGRESS0" is overloaded like that. Solve this by using "RTR_INGRESS1" trap for packets that hit reject routes. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07devlink: Add layer 3 generic packet exception trapsAmit Cohen
Add layer 3 generic packet exception traps that can report trapped packets and documentation of the traps. Unlike drop traps, these exception traps also need to inject the packet to the kernel's receive path. For example, a packet that was trapped due to unreachable neighbour need to be injected into the kernel so that it will trigger an ARP request or a neighbour solicitation message. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: mlxsw: Add test cases for devlink-trap layer 3 dropsAmit Cohen
Test that each supported packet trap is triggered under the right conditions and that packets are indeed dropped and not forwarded. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: devlink: Make devlink_trap_cleanup() more genericAmit Cohen
Add proto parameter in order to enable the use of devlink_trap_cleanup() in tests that use IPv6 protocol. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: devlink: Export functions to devlink libraryAmit Cohen
l2_drops_test() is used to check that drop traps are functioning as intended. Currently it is only used in the layer 2 test, but it is also useful for the layer 3 test introduced in the subsequent patch. l2_drops_cleanup() is used to clean configurations and kill mausezahn proccess. Export the functions to the common devlink library to allow it to be re-used by future tests. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07mlxsw: Add layer 3 devlink-trap supportAmit Cohen
Add the trap IDs and trap group used to report layer 3 drops. Register layer 3 packet traps and associated layer 3 trap group with devlink during driver initialization. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07devlink: Add layer 3 generic packet trapsAmit Cohen
Add packet traps that can report packets that were dropped during layer 3 forwarding. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07enetc: fix return value for enetc_ioctl()Michael Walle
Return -EOPNOTSUPP instead of -EINVAL if the requested ioctl is not implemented. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07tcp: Remove one extra ktime_get_ns() from cookie_init_timestampEric Dumazet
tcp_make_synack() already uses tcp_clock_ns(), and can pass the value to cookie_init_timestamp() to avoid another call to ktime_get_ns() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07selftests: Add source route tests to fib_testsDavid Ahern
Add tests to verify routes with source address set are deleted when source address is deleted. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07inetpeer: fix data-race in inet_putpeer / inet_putpeerEric Dumazet
We need to explicitely forbid read/store tearing in inet_peer_gc() and inet_putpeer(). The following syzbot report reminds us about inet_putpeer() running without a lock held. BUG: KCSAN: data-race in inet_putpeer / inet_putpeer write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 0: inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 write to 0xffff888121fb2ed0 of 4 bytes by interrupt on cpu 1: inet_putpeer+0x37/0xa0 net/ipv4/inetpeer.c:240 ip4_frag_free+0x3d/0x50 net/ipv4/ip_fragment.c:102 inet_frag_destroy_rcu+0x58/0x80 net/ipv4/inet_fragment.c:228 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch+0x256/0x5b0 kernel/rcu/tree.c:2157 rcu_core+0x369/0x4d0 kernel/rcu/tree.c:2377 rcu_core_si+0x12/0x20 kernel/rcu/tree.c:2386 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 4b9d9be839fd ("inetpeer: remove unused list") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07net: phy: at803x: add missing dependency on CONFIG_REGULATORMadalin Bucur
Compilation fails on PPC targets as CONFIG_REGULATOR is not set and drivers/regulator/devres.c is not compiled in while functions exported there are used by drivers/net/phy/at803x.c. Here's the error log: LD .tmp_vmlinux1 drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_set_voltage_sel': drivers/net/phy/at803x.c:294: undefined reference to `.rdev_get_drvdata' drivers/net/phy/at803x.o: In function `at803x_rgmii_reg_get_voltage_sel': drivers/net/phy/at803x.c:306: undefined reference to `.rdev_get_drvdata' drivers/net/phy/at803x.o: In function `at8031_register_regulators': drivers/net/phy/at803x.c:359: undefined reference to `.devm_regulator_register' drivers/net/phy/at803x.c:365: undefined reference to `.devm_regulator_register' drivers/net/phy/at803x.o:(.data.rel+0x0): undefined reference to `regulator_list_voltage_table' linux/Makefile:1074: recipe for target 'vmlinux' failed make[1]: *** [vmlinux] Error 1 Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding") Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07dpaa2-eth: add ethtool MAC countersIoana Ciornei
When a DPNI is connected to a MAC, export its associated counters. Ethtool related functions are added in dpaa2_mac for returning the number of counters, their strings and also their values. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07enetc: ethtool: add wake-on-lan callbacksMichael Walle
If there is an external PHY, pass the wake-on-lan request to the PHY. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07enetc: add ioctl() support for PHY-related opsMichael Walle
If there is an attached PHY try to handle the requested ioctl with its handler, which allows the userspace to access PHY registers, for example. This will make mii-diag and similar tools work. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07mlxsw: spectrum: Fix error return code in mlxsw_sp_port_module_info_init()Wei Yongjun
Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: 4a7f970f1240 ("mlxsw: spectrum: Replace port_to_module array with array of structs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07Merge branch 'cxgb4-add-support-for-TC-MQPRIO-Qdisc-Offload'David S. Miller
Rahul Lakkireddy says: ==================== cxgb4: add support for TC-MQPRIO Qdisc Offload This series of patches add support for offloading TC-MQPRIO Qdisc to Chelsio T5/T6 NICs. Offloading QoS traffic shaping and pacing requires using Ethernet Offload (ETHOFLD) resources available on Chelsio NICs. The ETHOFLD resources are configured by firmware and taken from the resource pool shared with other Chelsio Upper Layer Drivers. Traffic flowing through ETHOFLD region requires a software netdev Tx queue (EOSW_TXQ) exposed to networking stack, and an underlying hardware Tx queue (EOHW_TXQ) used for sending packets through hardware. ETHOFLD region is addressed using EOTIDs, which are per-connection resource. Hence, EOTIDs are capable of storing only a very small number of packets in flight. To allow more connections to share the the QoS rate limiting configuration, multiple EOTIDs must be allocated to reduce packet drops. EOTIDs are 1-to-1 mapped with software EOSW_TXQ. Several software EOSW_TXQs can post packets to a single hardware EOHW_TXQ. The series is broken down as follows: Patch 1 queries firmware for maximum available traffic classes, as well as, start and maximum available indices (EOTID) into ETHOFLD region, supported by the underlying device. Patch 2 reworks queue configuration and simplifies MSI-X allocation logic in preparation for ETHOFLD queues support. Patch 3 adds skeleton for validating and configuring TC-MQPRIO Qdisc offload. Also, adds support for software EOSW_TXQs and exposes them to network stack. Updates Tx queue selection to use fallback NIC Tx path for unsupported traffic that can't go through ETHOFLD queues. Patch 4 adds support for managing hardware queues to rate limit traffic flowing through them. The queues are allocated/removed based on enabling/disabling TC-MQPRIO Qdisc offload, respectively. Patch 5 adds Tx path for traffic flowing through software EOSW_TXQ and EOHW_TXQ. Also, adds Rx path to handle Tx completions. Patch 6 updates exisiting SCHED API to configure FLOWC based QoS offload. In the existing QUEUE based rate limiting, multiple queues sharing a traffic class get the aggreagated max rate limit value. On the other hand, in FLOWC based rate limiting, multiple queues sharing a traffic class get their own individual max rate limit value. For example, if 2 queues are bound to class 0, which is rate limited to 1 Gbps, then in QUEUE based rate limiting, both the queues get the aggregate max output of 1 Gbps only. In FLOWC based rate limiting, each queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps rate limit = 2 Gbps max output. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: add FLOWC based QoS offloadRahul Lakkireddy
Rework SCHED API to allow offloading TC-MQPRIO QoS configuration. The existing QUEUE based rate limiting throttles all queues sharing a traffic class, to the specified max rate limit value. So, if multiple queues share a traffic class, then all the queues get the aggregate specified max rate limit. So, introduce the new FLOWC based rate limiting, where multiple queues can share a traffic class with each queue getting its own individual specified max rate limit. For example, if 2 queues are bound to class 0, which is rate limited to 1 Gbps, then 2 queues using QUEUE based rate limiting, get the aggregate output of 1 Gbps only. In FLOWC based rate limiting, each queue gets its own output of max 1 Gbps each; i.e. 2 queues * 1 Gbps rate limit = 2 Gbps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: add Tx and Rx path for ETHOFLD trafficRahul Lakkireddy
Implement Tx path for traffic flowing through software EOSW_TXQ and EOHW_TXQ. Since multiple EOSW_TXQ can post packets to a single EOHW_TXQ, protect the hardware queue with necessary spinlock. Also, move common code used to generate TSO work request to a common function. Implement Rx path to handle Tx completions for successfully transmitted packets. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: add ETHOFLD hardware queue supportRahul Lakkireddy
Add support for configuring and managing ETHOFLD hardware queues. Keep the queue count and MSI-X allocation scheme same as NIC queues. ETHOFLD hardware queues are dynamically allocated/destroyed as TC-MQPRIO Qdisc offload is enabled/disabled on the corresponding interface, respectively. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: parse and configure TC-MQPRIO offloadRahul Lakkireddy
Add logic for validation and configuration of TC-MQPRIO Qdisc offload. Also, add support to manage EOSW_TXQ, which have 1-to-1 mapping with EOTIDs, and expose them to network stack. Move common skb validation in Tx path to a separate function and add minimal Tx path for ETHOFLD. Update Tx queue selection to return normal NIC Txq to send traffic pattern that can't go through ETHOFLD Tx path. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: rework queue config and MSI-X allocationRahul Lakkireddy
Simplify queue configuration and MSI-X allocation logic. Use a single MSI-X information table for both NIC and ULDs. Remove hard-coded MSI-X indices for firmware event queue and non data interrupts. Instead, use the MSI-X bitmap to obtain a free MSI-X index dynamically. Save each Rxq's index into the MSI-X information table, within the Rxq structures themselves, for easier cleanup. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07cxgb4: query firmware for QoS offload resourcesRahul Lakkireddy
QoS offload needs Ethernet Offload (ETHOFLD) resources present in the NIC. These resources are shared with other ULDs. So, query firmware for the available number of traffic classes, as well as, start and end indices (EOTID) of the ETHOFLD region. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>