summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2020-07-02bridge: uapi: mrp: Extend MRP attributes to get the statusHoratiu Vultur
Add MRP attribute IFLA_BRIDGE_MRP_INFO to allow the userspace to get the current state of the MRP instances. This is a nested attribute that contains other attributes like, ring id, index of primary and secondary port, priority, ring state, ring role. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-02bpf: selftests: Restore netns after each testMartin KaFai Lau
It is common for networking tests creating its netns and making its own setting under this new netns (e.g. changing tcp sysctl). If the test forgot to restore to the original netns, it would affect the result of other tests. This patch saves the original netns at the beginning and then restores it after every test. Since the restore "setns()" is not expensive, it does it on all tests without tracking if a test has created a new netns or not. The new restore_netns() could also be done in test__end_subtest() such that each subtest will get an automatic netns reset. However, the individual test would lose flexibility to have total control on netns for its own subtests. In some cases, forcing a test to do unnecessary netns re-configure for each subtest is time consuming. e.g. In my vm, forcing netns re-configure on each subtest in sk_assign.c increased the runtime from 1s to 8s. On top of that, test_progs.c is also doing per-test (instead of per-subtest) cleanup for cgroup. Thus, this patch also does per-test restore_netns(). The only existing per-subtest cleanup is reset_affinity() and no test is depending on this. Thus, it is removed from test__end_subtest() to give a consistent expectation to the individual tests. test_progs.c only ensures any affinity/netns/cgroup change made by an earlier test does not affect the following tests. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200702004858.2103728-1-kafai@fb.com
2020-07-02bpf: selftests: A few improvements to network_helpers.cMartin KaFai Lau
This patch makes a few changes to the network_helpers.c 1) Enforce SO_RCVTIMEO and SO_SNDTIMEO This patch enforces timeout to the network fds through setsockopt SO_RCVTIMEO and SO_SNDTIMEO. It will remove the need for SOCK_NONBLOCK that requires a more demanding timeout logic with epoll/select, e.g. epoll_create, epoll_ctrl, and then epoll_wait for timeout. That removes the need for connect_wait() from the cgroup_skb_sk_lookup.c. The needed change is made in cgroup_skb_sk_lookup.c. 2) start_server(): Add optional addr_str and port to start_server(). That removes the need of the start_server_with_port(). The caller can pass addr_str==NULL and/or port==0. I have a future tcp-hdr-opt test that will pass a non-NULL addr_str and it is in general useful for other future tests. "int timeout_ms" is also added to control the timeout on the "accept(listen_fd)". 3) connect_to_fd(): Fully use the server_fd. The server sock address has already been obtained from getsockname(server_fd). The sockaddr includes the family, so the "int family" arg is redundant. Since the server address is obtained from server_fd, there is little reason not to get the server's socket type from the server_fd also. getsockopt(server_fd) can be used to do that, so "int type" arg is also removed. "int timeout_ms" is added. 4) connect_fd_to_fd(): "int timeout_ms" is added. Some code is also refactored to connect_fd_to_addr() which is shared with connect_to_fd(). 5) Preserve errno: Some callers need to check errno, e.g. cgroup_skb_sk_lookup.c. Make changes to do it more consistently in save_errno_close() and log_err(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200702004852.2103003-1-kafai@fb.com
2020-07-01Merge branch 'mptcp-add-receive-buffer-auto-tuning'David S. Miller
Florian Westphal says: ==================== mptcp: add receive buffer auto-tuning First patch extends the test script to allow for reproducible results. Second patch adds receive auto-tuning. Its based on what TCP is doing, only difference is that we use the largest RTT of any of the subflows and that we will update all subflows with the new value. Else, we get spurious packet drops because the mptcp work queue might not be able to move packets from subflow socket to master socket fast enough. Without the adjustment, TCP may drop the packets because the subflow socket is over its rcvbuffer limit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01mptcp: add receive buffer auto-tuningFlorian Westphal
When mptcp is used, userspace doesn't read from the tcp (subflow) socket but from the parent (mptcp) socket receive queue. skbs are moved from the subflow socket to the mptcp rx queue either from 'data_ready' callback (if mptcp socket can be locked), a work queue, or the socket receive function. This means tcp_rcv_space_adjust() is never called and thus no receive buffer size auto-tuning is done. An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the function that moves skbs from subflow to mptcp socket. While this enabled autotuning, it also meant tuning was done even if userspace was reading the mptcp socket very slowly. This adds mptcp_rcv_space_adjust() and calls it after userspace has read data from the mptcp socket rx queue. Its very similar to tcp_rcv_space_adjust, with two differences: 1. The rtt estimate is the largest one observed on a subflow 2. The rcvbuf size and window clamp of all subflows is adjusted to the mptcp-level rcvbuf. Otherwise, we get spurious drops at tcp (subflow) socket level if the skbs are not moved to the mptcp socket fast enough. Before: time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap [..] ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 40823ms) [ OK ] ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 23119ms) [ OK ] ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5421ms) [ OK ] ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 41446ms) [ OK ] ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 23427ms) [ OK ] ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5426ms) [ OK ] Time: 1396 seconds After: ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 5417ms) [ OK ] ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 5427ms) [ OK ] ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5422ms) [ OK ] ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 5415ms) [ OK ] ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 5422ms) [ OK ] ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5423ms) [ OK ] Time: 296 seconds Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01selftests: mptcp: add option to specify size of file to transferFlorian Westphal
The script generates two random files that are then sent via tcp and mptcp connections. In order to compare throughput over consecutive runs add an option to provide the file size on the command line: "-f 128000". Also add an option, -t, to enable tcp tests. This is useful to compare throughput of mptcp connections and tcp connections. Example: run tests with a 4mb file size, 300ms delay 0.01% loss, default gso/tso/gro settings and with large write/blocking io: mptcp_connect.sh -t -f $((4 * 1024 * 1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: sched: Allow changing default qdisc to FQ-PIEDanny Lin
Similar to fq_codel and the other qdiscs that can set as default, fq_pie is also suitable for general use without explicit configuration, which makes it a valid choice for this. This is useful in situations where a painless out-of-the-box solution for reducing bufferbloat is desired but fq_codel is not necessarily the best choice. For example, fq_pie can be better for DASH streaming, but there could be more cases where it's the better choice of the two simple AQMs available in the kernel. Signed-off-by: Danny Lin <danny@kdrag0n.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2020-07-01 This series contains updates to all Intel drivers, but a majority of the changes are to the i40e driver. Jeff converts 'fall through' comments to the 'fallthrough;' keyword for all Intel drivers. Removed unnecessary delay in the ixgbe ethtool diagnostics test. Arkadiusz implements Total Port Shutdown for i40e. This is the revised patch based on Jakub's feedback from an earlier submission of this patch, where additional code comments and description was needed to describe the functionality. Wei Yongjun fixes return error code for iavf_init_get_resources(). Magnus optimizes XDP code in i40e; starting with AF_XDP zero-copy transmit completion path. Then by only executing a division when necessary in the napi_poll data path. Move the check for transmit ring full outside the send loop to increase performance. Ciara add XDP ring statistics to i40e and the ability to dump these statistics and descriptors. Tony fixes reporting iavf statistics. Radoslaw adds support for 2.5 and 5 Gbps by implementing the newer ethtool ksettings API in ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-07-01 This series contains updates to the ice driver only. Jacob implements a devlink region for device capabilities. Bruce removes structs containing only one-element arrays that are either unused or only used for indexing. Instead, use pointer arithmetic or other indexing to access the elements. Converts "C struct hack" variable-length types to the preferred C99 flexible array member. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01ice: replace single-element array used for C struct hackBruce Allan
Convert the pre-C90-extension "C struct hack" method (using a single- element array at the end of a structure for implementing variable-length types) to the preferred use of C99 flexible array member. Additional code cleanups were done near areas affected by this change. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01ice: avoid unnecessary single-member variable-length structsBruce Allan
There are a number of structures that consist of a one-element array as the only struct member. Some of those are unused so remove them. Others are used to index into a buffer/array consisting of a variable number of a different data or structure type. Those are unnecessary since we can use simple pointer arithmetic or index directly into the buffer to access individual elements of the buffer/array. Additional code cleanups were done near areas affected by this change. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01bonding: allow xfrm offload setup post-module-loadJarod Wilson
At the moment, bonding xfrm crypto offload can only be set up if the bonding module is loaded with active-backup mode already set. We need to be able to make this work with bonds set to AB after the bonding driver has already been loaded. So what's done here is: 1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used by both bond_main.c and bond_options.c 2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than only when loading in AB mode 3) wire up xfrmdev_ops universally too 4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB 5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a performance hit from traversing into the underlying drivers 5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call netdev_change_features() from bond_option_mode_set() In my local testing, I can change bonding modes back and forth on the fly, have hardware offload work when I'm in AB, and see no performance penalty to non-AB software encryption, despite having xfrm bits all wired up for all modes now. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Reported-by: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01ice: implement snapshot for device capabilitiesJacob Keller
Add a new devlink region used for capturing a snapshot of the device capabilities buffer which is reported by the firmware over the AdminQ. This information can useful in debugging driver and firmware interactions. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01Merge branch 'net-ipa-endpoint-configuration-updates'David S. Miller
Alex Elder says: ==================== net: ipa: endpoint configuration updates This series updates code that configures IPA endpoints. The changes made mainly affect access to registers that are valid only for RX, or only for TX endpoints. The first three patches avoid writing endpoint registers if they are not defined to be valid. The fourth patch slightly modifies the parameters for the offset macros used for these endpoint registers, to make it explicit when only some endpoints are valid. The last patch just tweaks one line of code so it uses a convention used everywhere else in the driver. Version 2 of this series eliminates some of the "assert()" comments that Jakub inquired about. The ones removed will actually go away in an upcoming (not-yet-posted) patch series anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: HOL_BLOCK_EN_FMASK is a 1-bit maskAlex Elder
The convention throughout the IPA driver is to directly use single-bit field mask values, rather than using (for example) u32_encode_bits() to set or clear them. Fix the one place that doesn't follow that convention, which sets HOL_BLOCK_EN_FMASK in ipa_endpoint_init_hol_block_enable(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: clarify endpoint register macro constraintsAlex Elder
A handful of registers are valid only for RX endpoints, and some others are valid only for TX endpoints. For these endpoints, add a comment above their defined offset macro that indicates the endpoints to which they apply. Extend the endpoint parameter naming convention as well, to make these constraints more explicit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: mode register is TX onlyAlex Elder
The INIT_MODE endpoint configuration register is only valid for TX endpoints. Rather than writing a zero to that register for RX endpoints, avoid writing the register at all. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: metadata_mask register is RX onlyAlex Elder
The INIT_HDR_METADATA_MASK endpoint configuration register is only valid for RX endpoints. Rather than writing a zero to that register for TX endpoints, avoid writing the register at all. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: head-of-line block registers are RX onlyAlex Elder
The INIT_HOL_BLOCK_EN and INIT_HOL_BLOCK_TIMER endpoint registers are only valid for RX endpoints. Have ipa_endpoint_modem_hol_block_clear_all() skip writing these registers for TX endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch 'net-ipa-small-improvements'David S. Miller
Alex Elder says: ==================== net: ipa: small improvements This series contains two patches that improve the error output that's reported when an error occurs while changing the state of a GSI channel or event ring. The first ensures all such error conditions report an error, and the second simplifies the messages a little and ensures they are all consistent. A third (independent) patch gets rid of an unused symbol in the microcontroller code. Version 2 fixes two alignment problems pointed out by checkpatch.pl, as requested by Jakub Kicinski. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: kill IPA_MEM_UC_OFFSETAlex Elder
The microcontroller shared memory area is at the beginning of the IPA resident memory. IPA_MEM_UC_OFFSET was defined as the offset within that region where it's found, but it's 0, and it's never actually used. Just get rid of the definition, and move some of the description it had to be above the definition of the ipa_uc_mem_area structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: standarize more GSI error messagesAlex Elder
Make minor updates to error messages reported in "gsi.c": - Use local variables to reduce multi-line function calls - Don't use parentheses in messages - Do some slight rewording in a few cases Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: always report GSI state errorsAlex Elder
We check the state of an event ring or channel both before and after any GSI command issued that will change that state. In most--but not all--cases, if the state is something different than expected we report an error message. Add error messages where missing, so that all unexpected states provide information about what went wrong. Drop the parentheses around the state value shown in all cases. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch 'net-ipa-simple-refactorizations'David S. Miller
Alex Elder says: ==================== net: ipa: simple refactorizations This series makes three small changes to some endpoint configuration code. The first uses a constant to represent the frequency of an internal clock used for timers in the IPA. The second modifies a limit used so it matches Qualcomm's internal code. And the third reworks a few lines of code, eliminating a multi-line function call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: reuse a local variable in ipa_endpoint_init_aggr()Alex Elder
Reuse the "limit" local variable in ipa_endpoint_init_aggr() when setting the aggregation size limit. Simple cleanup. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: reduce aggregation time limitAlex Elder
Halve the time limit used when aggregation is enabled on an RX endpoint, to half a millisecond. Use DIV_ROUND_CLOSEST() to compute the value that represents the time period, to get better accuracy in the event the time limit is not an even multiple of the granularity. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01net: ipa: rework ipa_aggr_granularity_val()Alex Elder
The timer used for aggregation makes use of an internal 32 KHz clock. The granularity of the timer is programmed by a field whose value is computed by ipa_aggr_granularity_val(). Redefine the way that value is computed by using a new TIMER_FREQUENCY constant representing the underlying clock frequency. Add two BUILD_BUG_ON() calls to ensure the value used is valid. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch 'add-XDP-support-to-xen-netfront'David S. Miller
Denis Kirjanov says: ==================== xen networking: add XDP support to xen-netfront The first patch adds a new extra type to enable proper synchronization between an RX request/response pair. The second patch implements BFP interface for xen-netfront. The third patch enables extra space for XDP processing. v14: - fixed compilation warnings v13: - fixed compilation due to previous rename v12: - xen-netback: rename netfront_xdp_headroom to xdp_headroom v11: - add the new headroom constant to netif.h - xenbus_scanf check - lock a bulk of puckets in xennet_xdp_xmit() v10: - add a new xen_netif_extra_info type to enable proper synchronization between an RX request/response pair. - order local variable declarations v9: - assign an xdp program before switching to Reconfiguring - minor cleanups - address checkpatch issues v8: - add PAGE_POOL config dependency - keep the state of XDP processing in netfront_xdp_enabled - fixed allocator type in xdp_rxq_info_reg_mem_model() - minor cleanups in xen-netback v7: - use page_pool_dev_alloc_pages() on page allocation - remove the leftover break statement from netback_changed v6: - added the missing SOB line - fixed subject v5: - split netfront/netback changes - added a sync point between backend/frontend on switching to XDP - added pagepool API v4: - added verbose patch descriprion - don't expose the XDP headroom offset to the domU guest - add a modparam to netback to toggle XDP offset - don't process jumbo frames for now v3: - added XDP_TX support (tested with xdping echoserver) - added XDP_REDIRECT support (tested with modified xdp_redirect_kern) - moved xdp negotiation to xen-netback v2: - avoid data copying while passing to XDP - tell xen-netback that we need the headroom space ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01xen networking: add XDP offset adjustment to xen-netbackDenis Kirjanov
the patch basically adds the offset adjustment and netfront state reading to make XDP work on netfront side. Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01xen networking: add basic XDP support for xen-netfrontDenis Kirjanov
The patch adds a basic XDP processing to xen-netfront driver. We ran an XDP program for an RX response received from netback driver. Also we request xen-netback to adjust data offset for bpf_xdp_adjust_head() header space for custom headers. synchronization between frontend and backend parts is done by using xenbus state switching: Reconfiguring -> Reconfigured- > Connected UDP packets drop rate using xdp program is around 310 kpps using ./pktgen_sample04_many_flows.sh and 160 kpps without the patch. Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01xen: netif.h: add a new extra type for XDPDenis Kirjanov
The patch adds a new extra type to be able to diffirentiate between RX responses on xen-netfront side with the adjusted offset required for XDP processing. The offset value from a guest is passed via xenstore. Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01Merge branch 'test_progs-improvements'Alexei Starovoitov
Jesper Dangaard Brouer says: ==================== V3: Reorder patches to cause less code churn. The BPF selftest 'test_progs' contains many tests, that cover all the different areas of the kernel where BPF is used. The CI system sees this as one test, which is impractical for identifying what team/engineer is responsible for debugging the problem. This patchset add some options that makes it easier to create a shell for-loop that invoke each (top-level) test avail in test_progs. Then each test FAIL/PASS result can be presented the CI system to have a separate bullet. (For Red Hat use-case in Beaker https://beaker-project.org/) Created a public script[1] that uses these features in an advanced way. Demonstrating howto reduce the number of (top-level) tests by grouping tests together via using the existing test pattern selection feature, and then using the new --list feature combined with exclude (-b) to get a list of remaining test names that was not part of the groups. [1] https://github.com/netoptimizer/prototype-kernel/blob/master/scripts/bpf_selftests_grouping.sh ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-07-01selftests/bpf: Test_progs option for listing test namesJesper Dangaard Brouer
The program test_progs have some very useful ability to specify a list of test name substrings for selecting which tests to run. This patch add the ability to list the selected test names without running them. This is practical for seeing which tests gets selected with given select arguments (which can also contain a exclude list via --name-blacklist). This output can also be used by shell-scripts in a for-loop: for N in $(./test_progs --list -t xdp); do \ ./test_progs -t $N 2>&1 > result_test_${N}.log & \ done ; wait This features can also be used for looking up a test number and returning a testname. If the selection was empty then a shell EXIT_FAILURE is returned. This is useful for scripting. e.g. like this: n=1; while [ $(./test_progs --list -n $n) ] ; do \ ./test_progs -n $n ; n=$(( n+1 )); \ done Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363985751.930467.9610992940793316982.stgit@firesoul
2020-07-01selftests/bpf: Test_progs option for getting number of testsJesper Dangaard Brouer
It can be practial to get the number of tests that test_progs contain. This could for example be used to create a shell for-loop construct that runs the individual tests. Like: for N in $(seq 1 $(./test_progs -c)); do ./test_progs -n $N 2>&1 > result_test_${N}.log & done ; wait V2: Add the ability to return the count for the selected tests. This is useful for getting a count e.g. after excluding some tests with option -b. The current beakers test script like to report the max test count upfront. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363985244.930467.12617117873058936829.stgit@firesoul
2020-07-01selftests/bpf: Test_progs indicate to shell on non-actionsJesper Dangaard Brouer
When a user selects a non-existing test the summary is printed with indication 0 for all info types, and shell "success" (EXIT_SUCCESS) is indicated. This can be understood by a human end-user, but for shell scripting is it useful to indicate a shell failure (EXIT_FAILURE). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159363984736.930467.17956007131403952343.stgit@firesoul
2020-07-02tools/bpftool: Turn off -Wnested-externs warningAndrii Nakryiko
Turn off -Wnested-externs to avoid annoying warnings in BUILD_BUG_ON macro when compiling bpftool: In file included from /data/users/andriin/linux/tools/include/linux/build_bug.h:5, from /data/users/andriin/linux/tools/include/linux/kernel.h:8, from /data/users/andriin/linux/kernel/bpf/disasm.h:10, from /data/users/andriin/linux/kernel/bpf/disasm.c:8: /data/users/andriin/linux/kernel/bpf/disasm.c: In function ‘__func_get_name’: /data/users/andriin/linux/tools/include/linux/compiler.h:37:38: warning: nested extern declaration of ‘__compiletime_assert_0’ [-Wnested-externs] _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^~~~~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/compiler.h:16:15: note: in definition of macro ‘__compiletime_assert’ extern void prefix ## suffix(void) __compiletime_error(msg); \ ^~~~~~ /data/users/andriin/linux/tools/include/linux/compiler.h:37:2: note: in expansion of macro ‘_compiletime_assert’ _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^~~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’ #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~~~~~~~~~~~~~~~~~ /data/users/andriin/linux/tools/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’ BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~~~~~~~~~~~~~ /data/users/andriin/linux/kernel/bpf/disasm.c:20:2: note: in expansion of macro ‘BUILD_BUG_ON’ BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID); ^~~~~~~~~~~~ Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200701212816.2072340-1-andriin@fb.com
2020-07-01selftests/bpf: Switch test_vmlinux to use hrtimer_range_start_ns.Hao Luo
The test_vmlinux test uses hrtimer_nanosleep as hook to test tracing programs. But in a kernel built by clang, which performs more aggresive inlining, that function gets inlined into its caller SyS_nanosleep. Therefore, even though fentry and kprobe do hook on the function, they aren't triggered by the call to nanosleep in the test. A possible fix is switching to use a function that is less likely to be inlined, such as hrtimer_range_start_ns. The EXPORT_SYMBOL functions shouldn't be inlined based on the description of [1], therefore safe to use for this test. Also the arguments of this function include the duration of sleep, therefore suitable for test verification. [1] af3b56289be1 time: don't inline EXPORT_SYMBOL functions Tested: In a clang build kernel, before this change, the test fails: test_vmlinux:PASS:skel_open 0 nsec test_vmlinux:PASS:skel_attach 0 nsec test_vmlinux:PASS:tp 0 nsec test_vmlinux:PASS:raw_tp 0 nsec test_vmlinux:PASS:tp_btf 0 nsec test_vmlinux:FAIL:kprobe not called test_vmlinux:FAIL:fentry not called After switching to hrtimer_range_start_ns, the test passes: test_vmlinux:PASS:skel_open 0 nsec test_vmlinux:PASS:skel_attach 0 nsec test_vmlinux:PASS:tp 0 nsec test_vmlinux:PASS:raw_tp 0 nsec test_vmlinux:PASS:tp_btf 0 nsec test_vmlinux:PASS:kprobe 0 nsec test_vmlinux:PASS:fentry 0 nsec Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200701175315.1161242-1-haoluo@google.com
2020-07-01ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps supportRadoslaw Tyl
Added full support for new version Ethtool API. New API allow use 2500Gbase-T and 5000base-T supported and advertised link speed modes. Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01ixgbe: Cleanup unneeded delay in ethtool testJeff Kirsher
There is a 4 seconds delay in ixgbe_diag_test() that is holding up other ioctls such as SIOCGIFCONF that Oracle database applications use. One of Oracle's product runs "ethtool -t ethX online" periodically for system monitoring and that is impacting database applications that use SIOCGIFCONF at that same time. This 4 second delay was needed in out early 1GbE parts to give the PHY time to recover from a reset. This code was carried forward to the 10 GbE driver even it was not needed for the supported PHYs in the ixgbe driver. CC: Aleksandr Loktionov <aleksandr.loktionov@intel.com> CC: Jack Vogel <jack.vogel@oracle.com> Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01iavf: Fix updating statisticsTony Nguyen
Commit bac8486116b0 ("iavf: Refactor the watchdog state machine") inverted the logic for when to update statistics. Statistics should be updated when no other commands are pending, instead they were only requested when a command was processed. iavf_request_stats() would see a pending request and not request statistics to be updated. This caused statistics to never be updated; fix the logic. Fixes: bac8486116b0 ("iavf: Refactor the watchdog state machine") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2020-07-01i40e: introduce new dump desc XDP commandCiara Loftus
Interfaces already exist for dumping Rx and Tx descriptor information. Introduce another for doing the same for XDP descriptors. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: add XDP ring statistics to dump VSI debug outputCiara Loftus
Prior to this, only the Rx and Tx ring statistics were dumped. The XDP ring statistics are now dumped as well. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: add XDP ring statistics to VSI statsCiara Loftus
Prior to this, only Rx and Tx ring statistics were accounted for. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: move check of full Tx ring to outside of send loopMagnus Karlsson
Move the check if the HW Tx ring is full to outside the send loop. Currently it is checked for every single descriptor that we send. Instead, tell the send loop to only process a maximum number of packets equal to the number of available slots in the Tx ring. This way, we can remove the check inside the send loop to and gain some performance. Suggested-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: eliminate division in napi_poll data pathMagnus Karlsson
Eliminate a division in the napi_poll data path. This division is executed even though it is only needed in the rare case when there are not enough interrupt lines so they have to be shared between queue pairs. Instead, just test for this case and only execute the division if needed. The code has been lifted from the ice driver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: optimize AF_XDP Tx completion pathMagnus Karlsson
Improve the performance of the AF_XDP zero-copy Tx completion path. When there are no XDP buffers being sent using XDP_TX or XDP_REDIRECT, we do not have go through the SW ring to clean up any entries since the AF_XDP path does not use these. In these cases, just fast forward the next-to-use counter and skip going through the SW ring. The limit on the maximum number of entries to complete is also removed since the algorithm is now O(1). To simplify the code path, the maximum number of entries to complete for the XDP path is therefore also increased from 256 to 512 (the default number of Tx HW descriptors). This should be fine since the completion in the XDP path is faster than in the SKB path that has 256 as the maximum number. This patch provides around 4% throughput improvement for the l2fwd application in xdpsock on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01iavf: fix error return code in iavf_init_get_resources()Wei Yongjun
Fix to return negative error code -ENOMEM from the error handling case instead of 0, as done elsewhere in this function. Fixes: b66c7bc1cd4d ("iavf: Refactor init state machine") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01i40e: Add support for a new feature Total Port ShutdownArkadiusz Kubalewski
After OS requests to down a link on a physical network port, the traffic is no longer being processed but the physical link with a link partner is still established. Currently there is a feature (Link down on close) which allows to physically bring the link down (after OS request). With this patch new feature with similar capability is introduced: TOTAL_PORT_SHUTDOWN Allows to physically disable the link on the NIC's port. If enabled, (after link down request from the OS) no link, traffic or led activity is possible on that port. If I40E_FLAG_TOTAL_PORT_SHUTDOWN is enabled, the I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED must be explicitly forced to true and cannot be disabled at that time. The functionalities are exclusive in terms of configuration, but they also have similar behavior (allowing to disable physical link of the port), with following differences: - LINK_DOWN_ON_CLOSE_ENABLED is configurable at host OS run-time and is supported by whole family of 7xx Intel Ethernet Controllers - TOTAL_PORT_SHUTDOWN may be enabled only before OS loads (in BIOS) only if motherboard's BIOS and NIC's FW has support of it - when LINK_DOWN_ON_CLOSE_ENABLED is used, the link is being brought down by sending phy_type=0 to NIC's FW - when TOTAL_PORT_SHUTDOWN is used, phy_type is not altered, instead the link is being brought down by clearing bit (I40E_AQ_PHY_ENABLE_LINK) in abilities field of i40e_aq_set_phy_config structure Introduced changes: - new private flag I40E_FLAG_TOTAL_PORT_SHUTDOWN for handling the feature - probe of NVM if the feature was enabled at driver's port initialization - special handling on link-down procedure to let FW physically shutdown the port if the feature was enabled Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01ethernet/intel: Convert fallthrough code commentsJeff Kirsher
Convert all the remaining 'fall through" code comments to the newer 'fallthrough;' keyword. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01Merge branch 'net-ethernet-use-generic-power-management'David S. Miller
Vaibhav Gupta says: ==================== net: ethernet: use generic power management Linux Kernel Mentee: Remove Legacy Power Management. The purpose of this patch series is to remove legacy power management callbacks from net ethernet drivers. The callbacks performing suspend() and resume() operations are still calling pci_save_state(), pci_set_power_state(), etc. and handling the power management themselves, which is not recommended. The conversion requires the removal of the those function calls and change the callback definition accordingly and make use of dev_pm_ops structure. All patches are compile-tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>