summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2020-05-03net/smc: enqueue local LLC messagesKarsten Graul
As SMC server, when a second link was deleted, trigger the setup of an asymmetric link. Do this by enqueueing a local ADD_LINK message which is processed by the LLC layer as if it were received from peer. Do the same when a new IB port became active and a new link could be created. smc_llc_srv_add_link_local() enqueues a local ADD_LINK message. And smc_llc_srv_delete_link_local() is used the same way to enqueue a local DELETE_LINK message. This is used when an IB port is no longer active. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete link processing as SMC serverKarsten Graul
Add smc_llc_process_srv_delete_link() to process a DELETE_LINK request as SMC server. When the request is to delete ALL links then terminate the whole link group. If not, find the link to delete by its link_id, send the DELETE_LINK request LLC message and wait for the response. No matter if a response was received, clear the deleted link and update the link group state. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete link processing as SMC clientKarsten Graul
Add smc_llc_process_cli_delete_link() to process a DELETE_LINK request as SMC client. When the request is to delete ALL links then terminate the whole link group. If not, find the link to delete by its link_id, send the DELETE_LINK response LLC message and then clear the deleted link. Finally determine and update the link group state. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: llc_del_link_work and use the LLC flow for delete linkKarsten Graul
Introduce a work that is scheduled when a new DELETE_LINK LLC request is received. The work will call either the SMC client or SMC server DELETE_LINK processing. And use the LLC flow framework to process incoming DELETE_LINK LLC messages, scheduling the llc_del_link_work for those events. With these changes smc_lgr_forget() is only called by one function and can be migrated into smc_lgr_cleanup_early(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete an asymmetric link as SMC serverKarsten Graul
When a link group moved from asymmetric to symmetric state then the dangling asymmetric link can be deleted. Add smc_llc_find_asym_link() to find the respective link and add smc_llc_delete_asym_link() to delete it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: final part of add link processing as SMC serverKarsten Graul
This patch finalizes the ADD_LINK processing of new links. Send the CONFIRM_LINK request to the peer, receive the response and set link state to ACTIVE. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: rkey processing for a new link as SMC serverKarsten Graul
Part of SMC server new link establishment is the exchange of rkeys for used buffers. Loop over all used RMB buffers and send ADD_LINK_CONTINUE LLC messages to the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: first part of add link processing as SMC serverKarsten Graul
First set of functions to process an ADD_LINK LLC request as an SMC server. Find an alternate IB device, determine the new link group type and get the index for the new link. Then initialize the link and send the ADD_LINK LLC message to the peer. Save the contents of the response, ready the link, map all used buffers and register the buffers with the IB device. If any error occurs, stop the processing and clear the link. And call smc_llc_srv_add_link() in af_smc.c to start second link establishment after the initial link of a link group was created. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: final part of add link processing as SMC clientKarsten Graul
This patch finalizes the ADD_LINK processing of new links. Receive the CONFIRM_LINK request from peer, complete the link initialization, register all used buffers with the IB device and finally send the CONFIRM_LINK response, which completes the ADD_LINK processing. And activate smc_llc_cli_add_link() in af_smc.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: rkey processing for a new link as SMC clientKarsten Graul
Part of the SMC client new link establishment process is the exchange of rkeys for all used buffers. Add new LLC message type ADD_LINK_CONTINUE which is used to exchange rkeys of all current RMB buffers. Add functions to iterate over all used RMB buffers of the link group, and implement the ADD_LINK_CONTINUE processing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: first part of add link processing as SMC clientKarsten Graul
First set of functions to process an ADD_LINK LLC request as an SMC client. Find an alternate IB device, determine the new link group type and get the index for the new link. Then ready the link, map the buffers and send an ADD_LINK LLC response. If any error occurs, send a reject LLC message and terminate the processing. Add smc_llc_alloc_alt_link() to find a free link index for a new link, depending on the new link group type. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net_sched: sch_fq: perform a prefetch() earlierEric Dumazet
The prefetch() done in fq_dequeue() can be done a bit earlier after the refactoring of the code done in the prior patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net_sched: sch_fq: do not call fq_peek() twice per packetEric Dumazet
This refactors the code to not call fq_peek() from fq_dequeue_head() since the caller can provide the skb. Also rename fq_dequeue_head() to fq_dequeue_skb() because 'head' is a bit vague, given the skb could come from t_root rb-tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net_sched: sch_fq: use bulk freeing in fq_gc()Eric Dumazet
fq_gc() already builds a small array of pointers, so using kmem_cache_free_bulk() needs very little change. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net_sched: sch_fq: change fq_flow size/layoutEric Dumazet
sizeof(struct fq_flow) is 112 bytes on 64bit arches. This means that half of them use two cache lines, but 50% use three cache lines. This patch adds cache line alignment, and makes sure that only the first cache line is touched by fq_enqueue(), which is more expensive that fq_dequeue() in general. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net_sched: sch_fq: avoid touching f->next from fq_gc()Eric Dumazet
A significant amount of cpu cycles is spent in fq_gc() When fq_gc() does its lookup in the rb-tree, it needs the following fields from struct fq_flow : f->sk (lookup key in the rb-tree) f->fq_node (anchor in the rb-tree) f->next (used to determine if the flow is detached) f->age (used to determine if the flow is candidate for gc) This unfortunately spans two cache lines (assuming 64 bytes cache lines) We can avoid using f->next, if we use the low order bit of f->{age|tail} This low order bit is 0, if f->tail points to an sk_buff. We set the low order bit to 1, if the union contains a jiffies value. Combined with the following patch, this makes sure we only need to bring into cpu caches one cache line per flow. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02inet_diag: bc: read cgroup id only for full socketsDmitry Yakunin
Fix bug introduced by commit b1f3e43dbfac ("inet_diag: add support for cgroup filter"). Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reported-by: syzbot+ee80f840d9bf6893223b@syzkaller.appspotmail.com Reported-by: syzbot+13bef047dbfffa5cd1af@syzkaller.appspotmail.com Fixes: b1f3e43dbfac ("inet_diag: add support for cgroup filter") Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-02smc: Remove unused function.David S. Miller
net/smc/smc_llc.c:544:12: warning: ‘smc_llc_alloc_alt_link’ defined but not used [-Wunused-function] static int smc_llc_alloc_alt_link(struct smc_link_group *lgr, ^~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-01 (v2) The following pull-request contains BPF updates for your *net-next* tree. We've added 61 non-merge commits during the last 6 day(s) which contain a total of 153 files changed, 6739 insertions(+), 3367 deletions(-). The main changes are: 1) pulled work.sysctl from vfs tree with sysctl bpf changes. 2) bpf_link observability, from Andrii. 3) BTF-defined map in map, from Andrii. 4) asan fixes for selftests, from Andrii. 5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub. 6) production cloudflare classifier as a selftes, from Lorenz. 7) bpf_ktime_get_*_ns() helper improvements, from Maciej. 8) unprivileged bpftool feature probe, from Quentin. 9) BPF_ENABLE_STATS command, from Song. 10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav. 11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: llc_add_link_work to handle ADD_LINK LLC requestsKarsten Graul
Introduce a work that is scheduled when a new ADD_LINK LLC request is received. The work will call either the SMC client or SMC server ADD_LINK processing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: allocate index for a new linkKarsten Graul
Add smc_llc_alloc_alt_link() to find a free link index for a new link, depending on the new link group type. And update constants for the maximum number of links to 3 (2 symmetric and 1 dangling asymmetric link). These maximum numbers are the same as used by other implementations of the SMC-R protocol. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: introduce smc_pnet_find_alt_roce()Karsten Graul
Introduce a new function in smc_pnet.c that searches for an alternate IB device, using an existing link group and a primary IB device. The alternate IB device needs to be active and must have the same PNETID as the link group. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: remove DELETE LINK processing from smc_core.cKarsten Graul
Support for multiple links makes the former DELETE LINK processing obsolete which sent one DELETE_LINK LLC message for each single link. Remove this processing from smc_core.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: take link down instead of terminating the link groupKarsten Graul
Use the introduced link down processing in all places where the link group is terminated and take down the affected link only. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: add smcr_port_err() and smcr_link_down() processingKarsten Graul
Call smcr_port_err() when an IB event reports an inactive IB device. smcr_port_err() calls smcr_link_down() for all affected links. smcr_link_down() either triggers the local DELETE_LINK processing, or sends an DELETE_LINK LLC message to the SMC server to initiate the processing. The old handler function smc_port_terminate() is removed. Add helper smcr_link_down_cond() to take a link down conditionally, and smcr_link_down_cond_sched() to schedule the link_down processing to a work. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: add smcr_port_add() and smcr_link_up() processingKarsten Graul
Call smcr_port_add() when an IB event reports a new active IB device. smcr_port_add() will start a work which either triggers the local ADD_LINK processing, or send an ADD_LINK LLC message to the SMC server to initiate the processing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: remember PNETID of IB device for later device matchingKarsten Graul
The PNETID is needed to find an alternate link for a link group. Save the PNETID of the link that is used to create the link group for later device matching. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: mutex to protect the lgr against parallel reconfigurationsKarsten Graul
Introduce llc_conf_mutex in the link group which is used to protect the buffers and lgr states against parallel link reconfiguration. This ensures that new connections do not start to register buffers with the links of a link group when link creation or termination is running. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: extend smc_llc_send_add_link() and smc_llc_send_delete_link()Karsten Graul
All LLC sends are done from worker context only, so remove the prep functions which were used to build the message before it was sent, and add the function content into the respective send function smc_llc_send_add_link() and smc_llc_send_delete_link(). Extend smc_llc_send_add_link() to include the qp_mtu value in the LLC message, which is needed to establish a link after the initial link was created. Extend smc_llc_send_delete_link() to contain a link_id and a reason code for the link deletion in the LLC message, which is needed when a specific link should be deleted. And add the list of existing DELETE_LINK reason codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: map and register buffers for a new linkKarsten Graul
Introduce support to map and register all current buffers for a new link. smcr_buf_map_lgr() will map used buffers for a new link and smcr_buf_reg_lgr() can be called to register used buffers on the IB device of the new link. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: unmapping of buffers to support multiple linksKarsten Graul
With the support of multiple links that are created and cleared there is a need to unmap one link from all current buffers. Add unmapping by link and by rmb. And make smcr_link_clear() available to be called from the LLC layer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: multiple link support for rmb buffer registrationKarsten Graul
The CONFIRM_RKEY LLC processing handles all links in one LLC message. Move the call to this processing out of smcr_link_reg_rmb() which does processing per link, into smcr_lgr_reg_rmbs() which is responsible for link group level processing. Move smcr_link_reg_rmb() into module smc_core.c. >From af_smc.c now call smcr_lgr_reg_rmbs() to register new rmbs on all available links. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: schedule: add action gate offloadingPo Liu
Add the gate action to the flow action entry. Add the gate parameters to the tc_setup_flow_action() queueing to the entries of flow_action_entry array provide to the driver. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: qos: introduce a gate control flow actionPo Liu
Introduce a ingress frame gate control flow action. Tc gate action does the work like this: Assume there is a gate allow specified ingress frames can be passed at specific time slot, and be dropped at specific time slot. Tc filter chooses the ingress frames, and tc gate action would specify what slot does these frames can be passed to device and what time slot would be dropped. Tc gate action would provide an entry list to tell how much time gate keep open and how much time gate keep state close. Gate action also assign a start time to tell when the entry list start. Then driver would repeat the gate entry list cyclically. For the software simulation, gate action requires the user assign a time clock type. Below is the setting example in user space. Tc filter a stream source ip address is 192.168.0.20 and gate action own two time slots. One is last 200ms gate open let frame pass another is last 100ms gate close let frames dropped. When the ingress frames have reach total frames over 8000000 bytes, the excessive frames will be dropped in that 200000000ns time slot. > tc qdisc add dev eth0 ingress > tc filter add dev eth0 parent ffff: protocol ip \ flower src_ip 192.168.0.20 \ action gate index 2 clockid CLOCK_TAI \ sched-entry open 200000000 -1 8000000 \ sched-entry close 100000000 -1 -1 > tc chain del dev eth0 ingress chain 0 "sched-entry" follow the name taprio style. Gate state is "open"/"close". Follow with period nanosecond. Then next item is internal priority value means which ingress queue should put. "-1" means wildcard. The last value optional specifies the maximum number of MSDU octets that are permitted to pass the gate during the specified time interval. Base-time is not set will be 0 as default, as result start time would be ((N + 1) * cycletime) which is the minimal of future time. Below example shows filtering a stream with destination mac address is 10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate action would run with one close time slot which means always keep close. The time cycle is total 200000000ns. The base-time would calculate by: 1357000000000 + (N + 1) * cycletime When the total value is the future time, it will be the start time. The cycletime here would be 200000000ns for this case. > tc filter add dev eth0 parent ffff: protocol ip \ flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \ action gate index 12 base-time 1357000000000 \ sched-entry close 200000000 -1 -1 \ clockid CLOCK_TAI Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: fix skb_panic to output real addressJesper Dangaard Brouer
In skb_panic() the real pointer values are really needed to diagnose issues, e.g. data and head are related (to calculate headroom). The hashed versions of the addresses doesn't make much sense here. The patch use the printk specifier %px to print the actual address. The printk documentation on %px: https://www.kernel.org/doc/html/latest/core-api/printk-formats.html#unmodified-addresses Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAXCambda Zhu
This patch changes the behavior of TCP_LINGER2 about its limit. The sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added as the limit of TCP_LINGER2, which is 2 minutes. Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value and the limit in the past, the system administrator cannot set the default value for most of sockets and let some sockets have a greater timeout. It might be a mistake that let the sysctl to be the limit of the TCP_LINGER2. Maybe we can add a new sysctl to set the max of TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long and 2 minutes are legal considering TCP specs. Changes in v3: - Remove the new socket option and change the TCP_LINGER2 behavior so that the timeout can be set to value between sysctl_tcp_fin_timeout and 2 minutes. Changes in v2: - Add int overflow check for the new socket option. Changes in v1: - Add a new socket option to set timeout greater than sysctl_tcp_fin_timeout. Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addrStanislav Fomichev
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the 'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program. Let's generalize them and make them available for 'struct bpf_sock_addr'. That way, in the future, we can allow those helpers in more places. As an example, let's expose those 'struct bpf_sock_addr' based helpers to BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the connection is made. v3: * Expose custom helpers for bpf_sock_addr context instead of doing generic bpf_sock argument (as suggested by Daniel). Even with try_socket_lock that doesn't sleep we have a problem where context sk is already locked and socket lock is non-nestable. v2: * s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/ Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
2020-05-01docs: networking: convert x25.txt to ReSTMauro Carvalho Chehab
Not much to be done here: - add SPDX header; - add a document title; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01docs: networking: convert x25-iface.txt to ReSTMauro Carvalho Chehab
Not much to be done here: - add SPDX header; - adjust title markup; - remove a tail whitespace; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30netlink: add infrastructure to expose policies to userspaceJohannes Berg
Add, and use in generic netlink, helpers to dump out a netlink policy to userspace, including all the range validation data, nested policies etc. This lets userspace discover what the kernel understands. For families/commands other than generic netlink, the helpers need to be used directly in an appropriate command, or we can add some infrastructure (a new netlink family) that those can register their policies with for introspection. I'm not that familiar with non-generic netlink, so that's left out for now. The data exposed to userspace also includes min and max length for binary/string data, I've done that instead of letting the userspace tools figure out whether min/max is intended based on the type so that we can extend this later in the kernel, we might want to just use the range data for example. Because of this, I opted to not directly expose the NLA_* values, even if some of them are already exposed via BPF, as with min/max length we don't need to have different types here for NLA_BINARY/NLA_MIN_LEN/NLA_EXACT_LEN, we just make them all NL_ATTR_TYPE_BINARY with min/max length optionally set. Similarly, we don't really need NLA_MSECS, and perhaps can remove it in the future - but not if we encode it into the userspace API now. It gets mapped to NL_ATTR_TYPE_U64 here. Note that the exposing here corresponds to the strict policy interpretation, and NLA_UNSPEC items are omitted entirely. To get those, change them to NLA_MIN_LEN which behaves in exactly the same way, but is exposed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30netlink: remove NLA_EXACT_LEN_WARNJohannes Berg
Use a validation type instead, so we can later expose the NLA_* values to userspace for policy descriptions. Some transformations were done with this spatch: @@ identifier p; expression X, L, A; @@ struct nla_policy p[X] = { [A] = -{ .type = NLA_EXACT_LEN_WARN, .len = L }, +NLA_POLICY_EXACT_LEN_WARN(L), ... }; Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30nl80211: link recursive netlink nested policyJohannes Berg
Now that we have limited recursive policy validation to avoid stack overflows, change nl80211 to actually link the nested policy (linking back to itself eventually), which allows some code cleanups. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30netlink: remove type-unsafe validation_data pointerJohannes Berg
In the netlink policy, we currently have a void *validation_data that's pointing to different things: * a u32 value for bitfield32, * the netlink policy for nested/nested array * the string for NLA_REJECT Remove the pointer and place appropriate type-safe items in the union instead. While at it, completely dissolve the pointer for the bitfield32 case and just put the value there directly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30hsr: remove hsr interface if all slaves are removedTaehee Yoo
When all hsr slave interfaces are removed, hsr interface doesn't work. At that moment, it's fine to remove an unused hsr interface automatically for saving resources. That's a common behavior of virtual interfaces. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30tcp: add hrtimer slack to sack compressionEric Dumazet
Add a sysctl to control hrtimer slack, default of 100 usec. This gives the opportunity to reduce system overhead, and help very short RTT flows. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30tcp: tcp_sack_new_ofo_skb() should be more conservativeEric Dumazet
Currently, tcp_sack_new_ofo_skb() sends an ack if prior acks were 'compressed', if room has to be made in tp->selective_acks[] But there is no guarantee all four sack ranges can be included in SACK option. As a matter of fact, when TCP timestamps option is used, only three SACK ranges can be included. Lets assume only two ranges can be included, and force the ack: - When we touch more than 2 ranges in the reordering done if tcp_sack_extend() could be done. - If we have at least 2 ranges when adding a new one. This enforces that before a range is in third or fourth position, at least one ACK packet included it in first/second position. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30tcp: add tp->dup_ack_counterEric Dumazet
In commit 86de5921a3d5 ("tcp: defer SACK compression after DupThresh") I added a TCP_FASTRETRANS_THRESH bias to tp->compressed_ack in order to enable sack compression only after 3 dupacks. Since we plan to relax this rule for flows that involve stacks not requiring this old rule, this patch adds a distinct tp->dup_ack_counter. This means the TCP_FASTRETRANS_THRESH value is now used in a single location that a future patch can adjust: if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) { tp->dup_ack_counter++; goto send_now; } This patch also introduces tcp_sack_compress_send_ack() helper to ease following patch comprehension. This patch refines LINUX_MIB_TCPACKCOMPRESSED to not count the acks that we had to send if the timer expires or tcp_sack_compress_send_ack() is sending an ack. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30docs: networking: convert tproxy.txt to ReSTMauro Carvalho Chehab
- add SPDX header; - adjust title markup; - mark code blocks and literals as such; - adjust identation, whitespaces and blank lines where needed; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30docs: networking: convert rxrpc.txt to ReSTMauro Carvalho Chehab
- add SPDX header; - adjust title markup; - use autonumbered list markups; - mark code blocks and literals as such; - mark tables as such; - adjust identation, whitespaces and blank lines where needed; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30docs: networking: convert radiotap-headers.txt to ReSTMauro Carvalho Chehab
- add SPDX header; - adjust title markup; - mark code blocks and literals as such; - adjust identation, whitespaces and blank lines where needed; - add to networking/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>