summaryrefslogtreecommitdiffstats
path: root/net/sched
AgeCommit message (Collapse)Author
2018-08-11net: sched: act_gact: remove dependency on rtnl lockVlad Buslov
Use tcf spinlock to protect gact action private state from concurrent modification during dump and init. Remove rtnl assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_csum: remove dependency on rtnl lockVlad Buslov
Use tcf lock to protect csum action struct private data from concurrent modification in init and dump. Use rcu swap operation to reassign params pointer under protection of tcf lock. (old params value is not used by init, so there is no need of standalone rcu dereference step) Remove rtnl assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_bpf: remove dependency on rtnl lockVlad Buslov
Use tcf spinlock to protect bpf action private data from concurrent modification during dump and init. Remove rtnl lock assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09net: sched: fix block->refcnt decrementJiri Pirko
Currently the refcnt is never decremented in case the value is not 1. Fix it by adding decrement in case the refcnt is not 1. Reported-by: Vlad Buslov <vladbu@mellanox.com> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07net: sched: cls_flower: set correct offload data in fl_reoffloadVlad Buslov
fl_reoffload implementation sets following members of struct tc_cls_flower_offload incorrectly: - masked key instead of mask - key instead of masked key Fix fl_reoffload to provide correct data to offload callback. Fixes: 31533cba4327 ("net: sched: cls_flower: implement offload tcf_proto_op") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07net/sched: allow flower to match tunnel optionsPieter Jansen van Vuuren
Allow matching on options in Geneve tunnel headers. This makes use of existing tunnel metadata support. The options can be described in the form CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit hexadecimal value and DATA as a variable length hexadecimal value. e.g. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev geneve0 ingress # tc filter add dev geneve0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \ ip_proto udp \ action mirred egress redirect dev eth1 This patch adds support for matching Geneve options in the order supplied by the user. This leads to an efficient implementation in the software datapath (and in our opinion hardware datapaths that offload this feature). It is also compatible with Geneve options matching provided by the Open vSwitch kernel datapath which is relevant here as the Flower classifier may be used as a mechanism to program flows into hardware as a form of Open vSwitch datapath offload (sometimes referred to as OVS-TC). The netlink Kernel/Userspace API may be extended, for example by adding a flag, if other matching options are desired, for example matching given options in any order. This would require an implementation in the TC software datapath. And be done in a way that drivers that facilitate offload of the Flower classifier can reject or accept such flows based on hardware datapath capabilities. This approach was discussed and agreed on at Netconf 2017 in Seoul. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-05net: sched: cls_flower: Fix an error code in fl_tmplt_create()Dan Carpenter
We forgot to set the error code on this path, so we return NULL instead of an error pointer. In the current code kzalloc() won't fail for small allocations so this doesn't really affect runtime. Fixes: b95ec7eb3b4d ("net: sched: cls_flower: implement chain templates") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-03net: sched: fix flush on non-existing chainJiri Pirko
User was able to perform filter flush on chain 0 even if it didn't have any filters in it. With the patch that avoided implicit chain 0 creation, this changed. So in case user wants filter flush on chain which does not exist, just return success. There's no reason for non-0 chains to behave differently than chain 0, so do the same for them. Reported-by: Ido Schimmel <idosch@mellanox.com> Fixes: f71e0ca4db18 ("net: sched: Avoid implicit chain 0 creation") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: make tcf_chain_{get,put}() staticJiri Pirko
These are no longer used outside of cls_api.c so make them static. Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put(). Signed-off-by: Jiri Pirko <jiri@mellanox.com> v1->v2: - new patch Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: fix notifications for action-held chainsJiri Pirko
Chains that only have action references serve as placeholders. Until a non-action reference is created, user should not be aware of the chain. Also he should not receive any notifications about it. So send notifications for the new chain only in case the chain gets the first non-action reference. Symmetrically to that, when the last non-action reference is dropped, send the notification about deleted chain. Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> v1->v2: - made __tcf_chain_{get,put}() static as suggested by Cong Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01net: sched: change name of zombie chain to "held_by_acts_only"Jiri Pirko
As mentioned by Cong and Jakub during the review process, it is a bit odd to sometimes (act flow) create a new chain which would be immediately a "zombie". So just rename it to "held_by_acts_only". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30act_mirred: use TC_ACT_REINSERT when possiblePaolo Abeni
When mirred is invoked from the ingress path, and it wants to redirect the processed packet, it can now use the TC_ACT_REINSERT action, filling the tcf_result accordingly, and avoiding a per packet skb_clone(). Overall this gives a ~10% improvement in forwarding performance for the TC S/W data path and TC S/W performances are now comparable to the kernel openvswitch datapath. v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT v2 -> v3: updated after action rename, fixed typo into the commit message v3 -> v4: updated again after action rename, added more comments to the code (JiriP), skip the optimization if the control action need to touch the tcf_result (Paolo) v4 -> v5: fix sparse warning (kbuild bot) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30tc/act: remove unneeded RCU lock in action callbackPaolo Abeni
Each lockless action currently does its own RCU locking in ->act(). This allows using plain RCU accessor, even if the context is really RCU BH. This change drops the per action RCU lock, replace the accessors with the _bh variant, cleans up a bit the surrounding code and documents the RCU status in the relevant header. No functional nor performance change is intended. The goal of this patch is clarifying that the RCU critical section used by the tc actions extends up to the classifier's caller. v1 -> v2: - preserve rcu lock in act_bpf: it's needed by eBPF helpers, as pointed out by Daniel v3 -> v4: - fixed some typos in the commit message (JiriP) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-30net/sched: user-space can't set unknown tcfa_action valuesPaolo Abeni
Currently, when initializing an action, the user-space can specify and use arbitrary values for the tcfa_action field. If the value is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC. This change explicitly checks for unknown values at action creation time, and explicitly convert them to TC_ACT_UNSPEC. No functional changes are introduced, but this will allow introducing tcfa_action values not exposed to user-space in a later patch. Note: we can't use the above to hide TC_ACT_REDIRECT from user-space, as the latter is already part of uAPI. v3 -> v4: - use an helper to check for action validity (JiriP) - emit an extack for invalid actions (JiriP) v4 -> v5: - keep messages on a single line, drop net_warn (Marcelo) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29act_bpf: Use kmemdup instead of duplicating it in tcf_bpf_init_from_opsYueHaibing
Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29cls_bpf: Use kmemdup instead of duplicating it in cls_bpf_prog_from_opsYueHaibing
Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29act_pedit: remove unnecessary semicolonYueHaibing
net/sched/act_pedit.c:289:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27sch_cake: Make gso-splitting configurable from userspaceDave Taht
This patch restores cake's deployed behavior at line rate to always split gso, and makes gso splitting configurable from userspace. running cake unlimited (unshaped) at 1gigE, local traffic: no-split-gso bql limit: 131966 split-gso bql limit: ~42392-45420 On this 4 stream test splitting gso apart results in halving the observed interpacket latency at no loss in throughput. Summary of tcp_nup test run 'gso-split' (at 2018-07-26 16:03:51.824728): Ping (ms) ICMP : 0.83 0.81 ms 341 TCP upload avg : 235.43 235.39 Mbits/s 301 TCP upload sum : 941.71 941.56 Mbits/s 301 TCP upload::1 : 235.45 235.43 Mbits/s 271 TCP upload::2 : 235.45 235.41 Mbits/s 289 TCP upload::3 : 235.40 235.40 Mbits/s 288 TCP upload::4 : 235.41 235.40 Mbits/s 291 verses Summary of tcp_nup test run 'no-split-gso' (at 2018-07-26 16:37:23.563960): avg median # data pts Ping (ms) ICMP : 1.67 1.73 ms 348 TCP upload avg : 234.56 235.37 Mbits/s 301 TCP upload sum : 938.24 941.49 Mbits/s 301 TCP upload::1 : 234.55 235.38 Mbits/s 285 TCP upload::2 : 234.57 235.37 Mbits/s 286 TCP upload::3 : 234.58 235.37 Mbits/s 274 TCP upload::4 : 234.54 235.42 Mbits/s 288 Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-27net: sched: don't dump chains only held by actionsJiri Pirko
In case a chain is empty and not explicitly created by a user, such chain should not exist. The only exception is if there is an action "goto chain" pointing to it. In that case, don't show the chain in the dump. Track the chain references held by actions and use them to find out if a chain should or should not be shown in chain dump. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: sched: unmark chain as explicitly created on deleteJiri Pirko
Once user manually deletes the chain using "chain del", the chain cannot be marked as explicitly created anymore. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26net: sched: cls_api: fix dead code in switchGustavo A. R. Silva
Code at line 1850 is unreachable. Fix this by removing the break statement above it, so the code for case RTM_GETCHAIN can be properly executed. Addresses-Coverity-ID: 1472050 ("Structurally dead code") Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-26cbs: Add support for the graft functionVinicius Costa Gomes
This will allow to install a child qdisc under cbs. The main use case is to install ETF (Earliest TxTime First) qdisc under cbs, so there's another level of control for time-sensitive traffic. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25net/sched: cls_flower: Use correct inline function for assignment of vlan tpidJianbo Liu
This fixes the following sparse warning: net/sched/cls_flower.c:1356:36: warning: incorrect type in argument 3 (different base types) net/sched/cls_flower.c:1356:36: expected unsigned short [unsigned] [usertype] value net/sched/cls_flower.c:1356:36: got restricted __be16 [usertype] vlan_tpid Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24net/sched: add skbprio schedulerNishanth Devarajan
Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets according to their skb->priority field. Under congestion, already-enqueued lower priority packets will be dropped to make space available for higher priority packets. Skbprio was conceived as a solution for denial-of-service defenses that need to route packets with different priorities as a means to overcome DoS attacks. v5 *Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set default sch->limit to 64. v4 *Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man page for Skbprio, in iproute2. v3 *Drop max_limit parameter in struct skbprio_sched_data and instead use sch->limit. *Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for qdisc (previously being referenced every time qdisc changes). *Move qdisc's detailed description from in-code to Documentation/networking. *When qdisc is saturated, enqueue incoming packet first before dequeueing lowest priority packet in queue - improves usage of call stack registers. *Introduce and use overlimit stat to keep track of number of dropped packets. v2 *Use skb->priority field rather than DS field. Rename queueing discipline as SKB Priority Queue (previously Gatekeeper Priority Queue). *Queueing discipline is made classful to expose Skbprio's internal priority queues. Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com> Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com> Reviewed-by: Cody Doucette <doucette@bu.edu> Reviewed-by: Michel Machado <michel@digirati.com.br> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24sched: fix trailing whitespaceStephen Hemminger
Remove trailing whitespace and blank lines at EOF Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: cls_flower: propagate chain teplate creation and destruction to ↵Jiri Pirko
drivers Introduce a couple of flower offload commands in order to propagate template creation/destruction events down to device drivers. Drivers may use this information to prepare HW in an optimal way for future filter insertions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: cls_flower: implement chain templatesJiri Pirko
Use the previously introduced template extension and implement callback to create, destroy and dump chain template. The existing parsing and dumping functions are re-used. Also, check if newly added filters fit the template if it is set. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: cls_flower: change fl_init_dissector to accept mask and dissectorJiri Pirko
This function is going to be used for templates as well, so we need to pass the pointer separately. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: cls_flower: move key/mask dumping into a separate functionJiri Pirko
Push key/mask dumping from fl_dump() into a separate function fl_dump_key(), that will be reused for template dumping. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain templatesJiri Pirko
Allow user to set a template for newly created chains. Template lock down the chain for particular classifier type/options combinations. The classifier needs to support templates, otherwise kernel would reply with error. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: introduce chain object to uapiJiri Pirko
Allow user to create, destroy, get and dump chain objects. Do that by extending rtnl commands by the chain-specific ones. User will now be able to explicitly create or destroy chains (so far this was done only automatically according the filter/act needs and refcounting). Also, the user will receive notification about any chain creation or destuction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: Avoid implicit chain 0 creationJiri Pirko
Currently, chain 0 is implicitly created during block creation. However that does not align with chain object exposure, creation and destruction api introduced later on. So make the chain 0 behave the same way as any other chain and only create it when it is needed. Since chain 0 is somehow special as the qdiscs need to hold pointer to the first chain tp, this requires to move the chain head change callback infra to the block structure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23net: sched: push ops lookup bits into tcf_proto_lookup_ops()Jiri Pirko
Push all bits that take care of ops lookup, including module loading outside tcf_proto_create() function, into tcf_proto_lookup_ops() Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-21net: sched: use PTR_ERR_OR_ZERO macro in tcf_block_cb_registerGustavo A. R. Silva
This line makes up what macro PTR_ERR_OR_ZERO already does. So, make use of PTR_ERR_OR_ZERO rather than an open-code version. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller
All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19net/sched: cls_flower: Support matching on ip tos and ttl for tunnelsOr Gerlitz
Allow users to set rules matching on ipv4 tos and ttl or ipv6 traffic-class and hoplimit of tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19net/sched: tunnel_key: Allow to set tos and ttl for tc based ip tunnelsOr Gerlitz
Allow user-space to provide tos and ttl to be set for the tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18net: sched: Using NULL instead of plain integerYueHaibing
Fixes the following sparse warnings: net/sched/cls_api.c:1101:43: warning: Using plain integer as NULL pointer net/sched/cls_api.c:1492:75: warning: Using plain integer as NULL pointer Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16sch_cake: Fix tin order when set through skb->priorityToke Høiland-Jørgensen
In diffserv mode, CAKE stores tins in a different order internally than the logical order exposed to userspace. The order remapping was missing in the handling of 'tc filter' priority mappings through skb->priority, resulting in bulk and best effort mappings being reversed relative to how they are displayed. Fix this by adding the missing mapping when reading skb->priority. Fixes: 83f8fd69af4f ("sch_cake: Add DiffServ handling") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-13net: sched: refactor flower walk to iterate over idrVlad Buslov
Extend struct tcf_walker with additional 'cookie' field. It is intended to be used by classifier walk implementations to continue iteration directly from particular filter, instead of iterating 'skip' number of times. Change flower walk implementation to save filter handle in 'cookie'. Each time flower walk is called, it looks up filter with saved handle directly with idr, instead of iterating over filter linked list 'skip' number of times. This change improves complexity of dumping flower classifier from quadratic to linearithmic. (assuming idr lookup has logarithmic complexity) Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net/sched: act_skbedit: don't use spinlock in the data pathDavide Caratti
use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on act_skbedit configuration. This reduces the effects of contention in the data path, in case multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12net/sched: skbedit: use per-cpu countersDavide Caratti
use per-CPU counters, instead of sharing a single set of stats with all cores: this removes the need of spinlocks when stats are read/updated. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12sch_fq_codel: zero q->flows_cnt when fq_codel_init failsJacob Keller
When fq_codel_init fails, qdisc_create_dflt will cleanup by using qdisc_destroy. This function calls the ->reset() op prior to calling the ->destroy() op. Unfortunately, during the failure flow for sch_fq_codel, the ->flows parameter is not initialized, so the fq_codel_reset function will null pointer dereference. kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP PTI kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci kernel: mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e] kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G OE 4.16.13custom-fq-codel-test+ #3 kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015 kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: RSP: 0018:ffffbfbf4c1fb620 EFLAGS: 00010246 kernel: RAX: 0000000000000400 RBX: 0000000000000000 RCX: 00000000000005b9 kernel: RDX: 0000000000000000 RSI: ffff9d03264a60c0 RDI: ffff9cfd17b31c00 kernel: RBP: 0000000000000001 R08: 00000000000260c0 R09: ffffffffb679c3e9 kernel: R10: fffff1dab06a0e80 R11: ffff9cfd163af800 R12: ffff9cfd17b31c00 kernel: R13: 0000000000000001 R14: ffff9cfd153de600 R15: 0000000000000001 kernel: FS: 00007fdec2f92800(0000) GS:ffff9d0326480000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000008 CR3: 0000000c1956a006 CR4: 00000000000606e0 kernel: Call Trace: kernel: qdisc_destroy+0x56/0x140 kernel: qdisc_create_dflt+0x8b/0xb0 kernel: mq_init+0xc1/0xf0 kernel: qdisc_create_dflt+0x5a/0xb0 kernel: dev_activate+0x205/0x230 kernel: __dev_open+0xf5/0x160 kernel: __dev_change_flags+0x1a3/0x210 kernel: dev_change_flags+0x21/0x60 kernel: do_setlink+0x660/0xdf0 kernel: ? down_trylock+0x25/0x30 kernel: ? xfs_buf_trylock+0x1a/0xd0 [xfs] kernel: ? rtnl_newlink+0x816/0x990 kernel: ? _xfs_buf_find+0x327/0x580 [xfs] kernel: ? _cond_resched+0x15/0x30 kernel: ? kmem_cache_alloc+0x20/0x1b0 kernel: ? rtnetlink_rcv_msg+0x200/0x2f0 kernel: ? rtnl_calcit.isra.30+0x100/0x100 kernel: ? netlink_rcv_skb+0x4c/0x120 kernel: ? netlink_unicast+0x19e/0x260 kernel: ? netlink_sendmsg+0x1ff/0x3c0 kernel: ? sock_sendmsg+0x36/0x40 kernel: ? ___sys_sendmsg+0x295/0x2f0 kernel: ? ebitmap_cmp+0x6d/0x90 kernel: ? dev_get_by_name_rcu+0x73/0x90 kernel: ? skb_dequeue+0x52/0x60 kernel: ? __inode_wait_for_writeback+0x7f/0xf0 kernel: ? bit_waitqueue+0x30/0x30 kernel: ? fsnotify_grab_connector+0x3c/0x60 kernel: ? __sys_sendmsg+0x51/0x90 kernel: ? do_syscall_64+0x74/0x180 kernel: ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00 kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: ffffbfbf4c1fb620 kernel: CR2: 0000000000000008 kernel: ---[ end trace e81a62bede66274e ]--- This is caused because flows_cnt is non-zero, but flows hasn't been initialized. fq_codel_init has left the private data in a partially initialized state. To fix this, reset flows_cnt to 0 when we fail to initialize. Additionally, to make the state more consistent, also cleanup the flows pointer when the allocation of backlogs fails. This fixes the NULL pointer dereference, since both the for-loop and memset in fq_codel_reset will be no-ops when flow_cnt is zero. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: sched: fix unprotected access to rcu cookie pointerVlad Buslov
Fix action attribute size calculation function to take rcu read lock and access act_cookie pointer with rcu dereference. Fixes: eec94fdb0480 ("net: sched: use rcu for action cookie update") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net: sched: act_ife: fix memory leak in ife initVlad Buslov
Free params if tcf_idr_check_alloc() returned error. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11net/sched: flower: Fix null pointer dereference when run tc vlan commandJianbo Liu
Zahari issued tc vlan command without setting vlan_ethtype, which will crash kernel. To avoid this, we must check tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE] is not null before use it. Also we don't need to dump vlan_ethtype or cvlan_ethtype in this case. Fixes: d64efd0926ba ('net/sched: flower: Add supprt for matching on QinQ vlan headers') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Zahari Doychev <zahari.doychev@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-10sch_cake: Conditionally split GSO segmentsToke Høiland-Jørgensen
At lower bandwidths, the transmission time of a single GSO segment can add an unacceptable amount of latency due to HOL blocking. Furthermore, with a software shaper, any tuning mechanism employed by the kernel to control the maximum size of GSO segments is thrown off by the artificial limit on bandwidth. For this reason, we split GSO segments into their individual packets iff the shaper is active and configured to a bandwidth <= 1 Gbps. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-10sch_cake: Add overhead compensation support to the rate shaperToke Høiland-Jørgensen
This commit adds configurable overhead compensation support to the rate shaper. With this feature, userspace can configure the actual bottleneck link overhead and encapsulation mode used, which will be used by the shaper to calculate the precise duration of each packet on the wire. This feature is needed because CAKE is often deployed one or two hops upstream of the actual bottleneck (which can be, e.g., inside a DSL or cable modem). In this case, the link layer characteristics and overhead reported by the kernel does not match the actual bottleneck. Being able to set the actual values in use makes it possible to configure the shaper rate much closer to the actual bottleneck rate (our experience shows it is possible to get with 0.1% of the actual physical bottleneck rate), thus keeping latency low without sacrificing bandwidth. The overhead compensation has three tunables: A fixed per-packet overhead size (which, if set, will be accounted from the IP packet header), a minimum packet size (MPU) and a framing mode supporting either ATM or PTM framing. We include a set of common keywords in TC to help users configure the right parameters. If no overhead value is set, the value reported by the kernel is used. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-10sch_cake: Add DiffServ handlingToke Høiland-Jørgensen
This adds support for DiffServ-based priority queueing to CAKE. If the shaper is in use, each priority tier gets its own virtual clock, which limits that tier's rate to a fraction of the overall shaped rate, to discourage trying to game the priority mechanism. CAKE defaults to a simple, three-tier mode that interprets most code points as "best effort", but places CS1 traffic into a low-priority "bulk" tier which is assigned 1/16 of the total rate, and a few code points indicating latency-sensitive or control traffic (specifically TOS4, VA, EF, CS6, CS7) into a "latency sensitive" high-priority tier, which is assigned 1/4 rate. The other supported DiffServ modes are a 4-tier mode matching the 802.11e precedence rules, as well as two 8-tier modes, one of which implements strict precedence of the eight priority levels. This commit also adds an optional DiffServ 'wash' mode, which will zero out the DSCP fields of any packet passing through CAKE. While this can technically be done with other mechanisms in the kernel, having the feature available in CAKE significantly decreases configuration complexity; and the implementation cost is low on top of the other DiffServ-handling code. Filters and applications can set the skb->priority field to override the DSCP-based classification into tiers. If TC_H_MAJ(skb->priority) matches CAKE's qdisc handle, the minor number will be interpreted as a priority tier if it is less than or equal to the number of configured priority tiers. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-10sch_cake: Add NAT awareness to packet classifierToke Høiland-Jørgensen
When CAKE is deployed on a gateway that also performs NAT (which is a common deployment mode), the host fairness mechanism cannot distinguish internal hosts from each other, and so fails to work correctly. To fix this, we add an optional NAT awareness mode, which will query the kernel conntrack mechanism to obtain the pre-NAT addresses for each packet and use that in the flow and host hashing. When the shaper is enabled and the host is already performing NAT, the cost of this lookup is negligible. However, in unlimited mode with no NAT being performed, there is a significant CPU cost at higher bandwidths. For this reason, the feature is turned off by default. Cc: netfilter-devel@vger.kernel.org Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>