summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2017-11-03net_sched: check NULL in tcf_block_put()Cong Wang
Callers of tcf_block_put() could pass NULL so we can't use block->q before checking if block is NULL or not. tcf_block_put_ext() callers are fine, it is always non-NULL. Fixes: 8c4083b30e56 ("net: sched: add block bind/unbind notif. and extended block_get/put") Reported-by: Dave Taht <dave.taht@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tcp: tcp_fragment() should not assume rtx skbsEric Dumazet
While stress testing MTU probing, we had crashes in list_del() that we root-caused to the fact that tcp_fragment() is unconditionally inserting the freshly allocated skb into tsorted_sent_queue list. But this list is supposed to contain skbs that were sent. This was mostly harmless until MTU probing was enabled. Fortunately we can use the tcp_queue enum added later (but in same linux version) for rtx-rb-tree to fix the bug. Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK recovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Priyaranjan Jha <priyarjha@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mISDN: hfcpci: Convert timers to use timer_setup()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Karsten Keil <isdn@linux-pingi.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Geliang Tang <geliangtang@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03ISDN: eicon: message: mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114780 Addresses-Coverity-ID: 114781 Addresses-Coverity-ID: 114782 Addresses-Coverity-ID: 114783 Addresses-Coverity-ID: 114784 Addresses-Coverity-ID: 114785 Addresses-Coverity-ID: 114786 Addresses-Coverity-ID: 114787 Addresses-Coverity-ID: 114788 Addresses-Coverity-ID: 114789 Addresses-Coverity-ID: 114790 Addresses-Coverity-ID: 114791 Addresses-Coverity-ID: 114792 Addresses-Coverity-ID: 114793 Addresses-Coverity-ID: 114794 Addresses-Coverity-ID: 114795 Addresses-Coverity-ID: 200521 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: sched: cls_bpf: use bitwise & rather than logical && on gen_flagsColin Ian King
Currently gen_flags is being operated on by a logical && operator rather than a bitwise & operator. This looks incorrect as these should be bit flag operations. Fix this. Detected by CoverityScan, CID#1460305 ("Logical vs. bitwise operator") Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload) Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tcp: fix a lockdep issue in tcp_fastopen_reset_cipher()Eric Dumazet
icsk_accept_queue.fastopenq.lock is only fully initialized at listen() time. LOCKDEP is not happy if we attempt a spin_lock_bh() on it, because of missing annotation. (Although kernel runs just fine) Lets use net->ipv4.tcp_fastopen_ctx_lock to protect ctx access. Fixes: 1fba70e5b6be ("tcp: socket option to set TCP fast open key") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03bpf: fix verifier NULL pointer dereferenceCraig Gallek
do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 27062 Comm: syz-executor5 Not tainted 4.14.0-rc7+ #106 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801c2c74700 task.stack: ffff8801c3e28000 RIP: 0010:free_verifier_state kernel/bpf/verifier.c:347 [inline] RIP: 0010:bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: 0018:ffff8801c3e2f5c8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 00000000fffffff4 RCX: 0000000000000000 RDX: 0000000000000070 RSI: ffffffff817d5aa9 RDI: 0000000000000380 RBP: ffff8801c3e2f668 R08: 0000000000000000 R09: 1ffff100387c5d9f R10: 00000000218c4e80 R11: ffffffff85b34380 R12: ffff8801c4dc6a28 R13: 0000000000000000 R14: ffff8801c4dc6a00 R15: ffff8801c4dc6a20 FS: 00007f311079b700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004d4a24 CR3: 00000001cbcd0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: bpf_prog_load+0xcbb/0x18e0 kernel/bpf/syscall.c:1166 SYSC_bpf kernel/bpf/syscall.c:1690 [inline] SyS_bpf+0xae9/0x4620 kernel/bpf/syscall.c:1652 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x452869 RSP: 002b:00007f311079abe8 EFLAGS: 00000212 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452869 RDX: 0000000000000030 RSI: 0000000020168000 RDI: 0000000000000005 RBP: 00007f311079aa20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: 00000000004b7550 R13: 00007f311079ab58 R14: 00000000004b7560 R15: 0000000000000000 Code: df 48 c1 ea 03 80 3c 02 00 0f 85 e6 0b 00 00 4d 8b 6e 20 48 b8 00 00 00 00 00 fc ff df 49 8d bd 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b6 0b 00 00 49 8b bd 80 03 00 00 e8 d6 0c 26 RIP: free_verifier_state kernel/bpf/verifier.c:347 [inline] RSP: ffff8801c3e2f5c8 RIP: bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: ffff8801c3e2f5c8 ---[ end trace c8d37f339dc64004 ]--- Fixes: 638f5b90d460 ("bpf: reduce verifier memory consumption") Fixes: 1969db47f8d0 ("bpf: fix verifier memory leaks") Signed-off-by: Craig Gallek <kraig@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tipc: eliminate unnecessary probingJon Maloy
The neighbor monitor employs a threshold, default set to 32 peer nodes, where it activates the "Overlapping Neighbor Monitoring" algorithm. Below that threshold, monitoring is full-mesh, and no "domain records" are passed between the nodes. Because of this, a node never received a peer's ack that it has received the most recent update of the own domain. Hence, the field 'acked_gen' in struct tipc_monitor_state remains permamently at zero, whereas the own domain generation is incremented for each added or removed peer. This has the effect that the function tipc_mon_get_state() always sets the field 'probing' in struct tipc_monitor_state true, again leading the tipc_link_timeout() of the link in question to always send out a probe, even when link->silent_intv_count is zero. This is functionally harmless, but leads to some unncessary probing, which can easily be eliminated by setting the 'probing' field of the said struct correctly in such cases. At the same time, we explictly invalidate the sent domain records when the algorithm is not activated. This will eliminate any risk that an invalid domain record might be inadverently accepted by the peer. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: sched: move block offload unbind after all chains are flushedJiri Pirko
Currently, the offload unbind is done before the chains are flushed. That causes driver to unregister block callback before it can get all the callback calls done during flush, leaving the offloaded tps inside the HW. So fix the order to prevent this situation and restore the original behaviour. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Reported-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03cxgb4vf: define get_fecparam ethtool callbackGanesh Goudar
Add support to new ethtool operation get_fecparam to fetch FEC parameters. Original Work by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03cxgb4: add new T6 pci device id'sGanesh Goudar
Add 0x6086 T6 device id. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net/ncsi: Make local function ncsi_get_filter() staticWei Yongjun
Fixes the following sparse warnings: net/ncsi/ncsi-manage.c:41:5: warning: symbol 'ncsi_get_filter' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: bridge: Convert timers to use timer_setup()Allen Pais
switch to using the new timer_setup() and from_timer() api's. Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: bridge: Convert timers to use timer_setup()Allen Pais
switch to using the new timer_setup() and from_timer() api's. Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03Merge branch 'mlxsw-Align-multipath-hash-parameters-with-kernels'David S. Miller
Jiri Pirko says: ==================== mlxsw: Align multipath hash parameters with kernel's Ido says: This set makes sure the device is using the same parameters as the kernel when it computes the multipath hash during IP forwarding. First patch adds a new netevent to let interested listeners know that the multipath hash policy has changed. Next two patches do small and non-functional changes in the mlxsw driver. Last patches configure the multipath hash policy upon driver initialization and as a response to netevents. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mlxsw: spectrum_router: Update multipath hash parameters upon neteventsIdo Schimmel
Make sure the device and the kernel are performing the multipath hash according to the same parameters by updating the device whenever the relevant netevent is generated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mlxsw: spectrum_router: Align multipath hash parameters with kernel'sIdo Schimmel
Up until now we used the hardware's defaults for multipath hash computation. This patch aligns the hardware's multipath parameters with the kernel's. For IPv4 packets, the parameters are determined according to the 'fib_multipath_hash_policy' sysctl during module initialization. In case L3-mode is requested, only the source and destination IP addresses are used. There is no special handling of ICMP error packets. In case L4-mode is requested, a 5-tuple is used: source and destination IP addresses, source and destination ports and IP protocol. Note that the layer 4 fields are not considered for fragmented packets. For IPv6 packets, the source and destination IP addresses are used, as well as the flow label and the next header fields. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mlxsw: reg: Add Router ECMP Configuration Register Version 2Ido Schimmel
The RECRv2 register is used for setting up the router's ECMP hash configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mlxsw: spectrum_router: Properly name netevent work structIdo Schimmel
The struct containing the work item queued from the netevent handler is named after the only event it is currently used for, which is neighbour updates. Use a more appropriate name for the struct, as we are going to use it for more events. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03mlxsw: spectrum_router: Embed netevent notifier block in router structIdo Schimmel
We are going to need to respond to netevents notifying us about multipath hash updates by configuring the device's hash parameters. Embed the netevent notifier in the router struct so that we could retrieve it upon notifications and use it to configure the device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03ipv4: Send a netevent whenever multipath hash policy is changedIdo Schimmel
Devices performing IPv4 forwarding need to update their multipath hash policy whenever it is changed. Inform these devices by generating a netevent. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: systemport: Correct IPG length settingsFlorian Fainelli
Due to a documentation mistake, the IPG length was set to 0x12 while it should have been 12 (decimal). This would affect short packet (64B typically) performance since the IPG was bigger than necessary. Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tcp: do not mangle skb->cb[] in tcp_make_synack()Eric Dumazet
Christoph Paasch sent a patch to address the following issue : tcp_make_synack() is leaving some TCP private info in skb->cb[], then send the packet by other means than tcp_transmit_skb() tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse IPv4/IPV6 stacks, but we have no such cleanup for SYNACK. tcp_make_synack() should not use tcp_init_nondata_skb() : tcp_init_nondata_skb() really should be limited to skbs put in write/rtx queues (the ones that are only sent via tcp_transmit_skb()) This patch fixes the issue and should even save few cpu cycles ;) Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03fib: fib_dump_info can no longer use __in_dev_get_rtnlFlorian Westphal
syzbot reported yet another regression added with DOIT_UNLOCKED. When nexthop is marked as dead, fib_dump_info uses __in_dev_get_rtnl(): ./include/linux/inetdevice.h:230 suspicious rcu_dereference_protected() usage! rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz-executor2/23859: #0: (rcu_read_lock){....}, at: [<ffffffff840283f0>] inet_rtm_getroute+0xaa0/0x2d70 net/ipv4/route.c:2738 [..] lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4665 __in_dev_get_rtnl include/linux/inetdevice.h:230 [inline] fib_dump_info+0x1136/0x13d0 net/ipv4/fib_semantics.c:1377 inet_rtm_getroute+0xf97/0x2d70 net/ipv4/route.c:2785 .. This isn't safe anymore, callers either hold RTNL mutex or rcu read lock, so these spots must use rcu_dereference_rtnl() or plain rcu_derefence() (plus unconditional rcu read lock). This does the latter. Fixes: 394f51abb3d04f ("ipv4: route: set ipv4 RTM_GETROUTE to not use rtnl") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03cxgb4: fix error return code in cxgb4_set_hash_filter()Wei Yongjun
Fix to return a negative error code from thecxgb4_alloc_atid() error handling case instead of 0. Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-By: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03bpf: fix out-of-bounds access warning in bpf_checkArnd Bergmann
The bpf_verifer_ops array is generated dynamically and may be empty depending on configuration, which then causes an out of bounds access: kernel/bpf/verifier.c: In function 'bpf_check': kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds] This adds a check to the start of the function as a workaround. I would assume that the function is never called in that configuration, so the warning is probably harmless. Fixes: 00176a34d9e2 ("bpf: remove the verifier ops from program structure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03bpf: fix link error without CONFIG_NETArnd Bergmann
I ran into this link error with the latest net-next plus linux-next trees when networking is disabled: kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops' kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops' It seems that the code was written to deal with varying contents of the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops and xdp_analyzer_ops are defined in the core networking code, so adding a check for CONFIG_NET seems appropriate here, and I've verified this with many randconfig builds Fixes: 4f9218aaf8a4 ("bpf: move knowledge about post-translation offsets out of verifier") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: Define eth_stp_addr in linux/etherdevice.hEgil Hjelmeland
The lan9303 driver defines eth_stp_addr as a synonym to eth_reserved_addr_base to get the STP ethernet address 01:80:c2:00:00:00. eth_reserved_addr_base is also used to define the start of Bridge Reserved ethernet address range, which happen to be the STP address. br_dev_setup refer to eth_reserved_addr_base as a definition of STP address. Clean up by: - Move the eth_stp_addr definition to linux/etherdevice.h - Use eth_stp_addr instead of eth_reserved_addr_base in br_dev_setup. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03stmmac: use of_property_read_u32 instead of read_u8Bhadram Varka
Numbers in DT are stored in “cells” which are 32-bits in size. of_property_read_u8 does not work properly because of endianness problem. This causes it to always return 0 with little-endian architectures. Fix it by using of_property_read_u32() OF API. Signed-off-by: Bhadram Varka <vbhadram@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03liquidio: bump up driver version to 1.7.0 to match newer NIC firmwareFelix Manlunas
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03Merge branch 'net-sched-use-after-free'David S. Miller
Cong Wang says: ==================== net_sched: fix a use-after-free for tc actions This patchset fixes a use-after-free reported by Lucas and closes potential races too. Please see each patch for details. ==================== Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net_sched: hold netns refcnt for each actionCong Wang
TC actions have been destroyed asynchronously for a long time, previously in a RCU callback and now in a workqueue. If we don't hold a refcnt for its netns, we could use the per netns data structure, struct tcf_idrinfo, after it has been freed by netns workqueue. Hold refcnt to ensure netns destroy happens after all actions are gone. Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Reported-by: Lucas Bates <lucasb@mojatatu.com> Tested-by: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net_sched: acquire RTNL in tc_action_net_exit()Cong Wang
I forgot to acquire RTNL in tc_action_net_exit() which leads that action ops->cleanup() is not always called with RTNL. This usually is not a big deal because this function is called after all netns refcnt are gone, but given RTNL protects more than just actions, add it for safety and consistency. Also add an assertion to catch other potential bugs. Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions") Reported-by: Lucas Bates <lucasb@mojatatu.com> Tested-by: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tcp: add tracepoint trace_tcp_retransmit_synack()Song Liu
This tracepoint can be used to trace synack retransmits. It maintains pointer to struct request_sock. We cannot simply reuse trace_tcp_retransmit_skb() here, because the sk here is the LISTEN socket. The IP addresses and ports should be extracted from struct request_sock. Note that, like many other tracepoints, this patch uses IS_ENABLED in TP_fast_assign macro, which triggers sparse warning like: ./include/trace/events/tcp.h:274:1: error: directive in argument list ./include/trace/events/tcp.h:281:1: error: directive in argument list However, there is no good solution to avoid these warnings. To the best of our knowledge, these warnings are harmless. Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03ipv6: Implement limits on Hop-by-Hop and Destination optionsTom Herbert
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options extension headers. Both of these carry a list of TLVs which is only limited by the maximum length of the extension header (2048 bytes). By the spec a host must process all the TLVs in these options, however these could be used as a fairly obvious denial of service attack. I think this could in fact be a significant DOS vector on the Internet, one mitigating factor might be that many FWs drop all packets with EH (and obviously this is only IPv6) so an Internet wide attack might not be so effective (yet!). By my calculation, the worse case packet with TLVs in a standard 1500 byte MTU packet that would be processed by the stack contains 1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I wrote a quick test program that floods a whole bunch of these packets to a host and sure enough there is substantial time spent in ip6_parse_tlv. These packets contain nothing but unknown TLVS (that are ignored), TLV padding, and bogus UDP header with zero payload length. 25.38% [kernel] [k] __fib6_clean_all 21.63% [kernel] [k] ip6_parse_tlv 4.21% [kernel] [k] __local_bh_enable_ip 2.18% [kernel] [k] ip6_pol_route.isra.39 1.98% [kernel] [k] fib6_walk_continue 1.88% [kernel] [k] _raw_write_lock_bh 1.65% [kernel] [k] dst_release This patch adds configurable limits to Destination and Hop-by-Hop options. There are three limits that may be set: - Limit the number of options in a Hop-by-Hop or Destination options extension header. - Limit the byte length of a Hop-by-Hop or Destination options extension header. - Disallow unrecognized options in a Hop-by-Hop or Destination options extension header. The limits are set in corresponding sysctls: ipv6.sysctl.max_dst_opts_cnt ipv6.sysctl.max_hbh_opts_cnt ipv6.sysctl.max_dst_opts_len ipv6.sysctl.max_hbh_opts_len If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed. The number of known TLVs that are allowed is the absolute value of this number. If a limit is exceeded when processing an extension header the packet is dropped. Default values are set to 8 for options counts, and set to INT_MAX for maximum length. Note the choice to limit options to 8 is an arbitrary guess (roughly based on the fact that the stack supports three HBH options and just one destination option). These limits have being proposed in draft-ietf-6man-rfc6434-bis. Tested (by Martin Lau) I tested out 1 thread (i.e. one raw_udp process). I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048. With sysctls setting to 2048, the softirq% is packed to 100%. With 8, the softirq% is almost unnoticable from mpstat. v2; - Code and documention cleanup. - Change references of RFC2460 to be RFC8200. - Add reference to RFC6434-bis where the limits will be in standard. Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03powerpc/perf: Fix core-imc hotplug callback failure during imc initializationMadhavan Srinivasan
Call trace observed during boot: nest_capp0_imc performance monitor hardware support registered nest_capp1_imc performance monitor hardware support registered core_imc memory allocation for cpu 56 failed Unable to handle kernel paging request for data at address 0xffa400010 Faulting instruction address: 0xc000000000bf3294 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0] pc: c000000000bf3294: mutex_lock+0x34/0x90 lr: c000000000bf3288: mutex_lock+0x28/0x90 sp: c000000ff38ffb50 msr: 9000000002009033 dar: ffa400010 dsisr: 80000 current = 0xc000000ff383de00 paca = 0xc000000007ae0000 softe: 0 irq_happened: 0x01 pid = 13, comm = cpuhp/0 Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017 0:mon> t [c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470 [c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0 [c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0 [c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0 [c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0 [c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0 [c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74 While registering the cpuhoplug callbacks for core-imc, if we fails in the cpuhotplug online path for any random core (either because opal call to initialize the core-imc counters fails or because memory allocation fails for that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who successfully returned from cpuhotplug online path. But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event context, when core-imc counters are not even initialized. Thus creating the above stack dump. Add a check to see if core-imc counters are enabled or not in the cpuhotplug offline path before migrating the context to handle this failing scenario. Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-02Kbuild: don't pass "-C" to preprocessor when processing linker scriptsLinus Torvalds
For some odd historical reason, we preprocessed the linker scripts with "-C", which keeps comments around. That makes no sense, since the comments are not meaningful for the build anyway. And it actually breaks things, since linker scripts can't have C++ style "//" comments in them, so keeping comments after preprocessing now limits us in odd and surprising ways in our header files for no good reason. The -C option goes back to pre-git and pre-bitkeeper times, but seems to have been historically used (along with "-traditional") for some odd-ball architectures (ia64, MIPS and SH). It probably didn't matter back then either, but might possibly have been used to minimize the difference between the original file and the pre-processed result. The reason for this may be lost in time, but let's not perpetuate it only because we can't remember why we did this crazy thing. This was triggered by the recent addition of SPDX lines to the source tree, where people apparently were confused about why header files couldn't use the C++ comment format. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz""Linus Torvalds
This reverts commit 51204e0639c49ada02fd823782ad673b6326d748. There wasn't really any good reason for it, and people are complaining (rightly) that it broke existing practice. Cc: Len Brown <len.brown@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Check addr_limit in arm64 __dump_instr()" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: ensure __dump_instr() checks addr_limit
2017-11-02arm64: ensure __dump_instr() checks addr_limitMark Rutland
It's possible for a user to deliberately trigger __dump_instr with a chosen kernel address. Let's avoid problems resulting from this by using get_user() rather than __get_user(), ensuring that we don't erroneously access kernel memory. Where we use __dump_instr() on kernel text, we already switch to KERNEL_DS, so this shouldn't adversely affect those cases. Fixes: 60ffc30d5652810d ("arm64: Exception handling") Cc: stable@vger.kernel.org Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-11-02Merge tag 'spdx_identifiers-4.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull initial SPDX identifiers from Greg KH: "License cleanup: add SPDX license identifiers to some files Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: License cleanup: add SPDX license identifier to uapi header files with a license License cleanup: add SPDX license identifier to uapi header files with no license License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02Merge tag 'linux-kselftest-4.14-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "This consists of a single fix to a regression to printing individual test results to the console. An earlier commit changed it to printing just the summary of results, which will negatively impact users that rely on console log to look at the individual test failures. This fix makes it optional to print summary and by default results get printed to the console" * tag 'linux-kselftest-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: lib.mk: print individual test results to console by default
2017-11-02Merge tag 'sound-4.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Unfortunately we still have received a significant amount of changes at the late stage, but at least all are small and clear fixes. There are two fixes for ALSA core stuff, yet another timer race fix and sequencer lockdep annotation fix. Both are spotted by syzkaller, and not too serious but better to paper over quickly. All other commits are about ASoC drivers, most notably, a revert of RT5514 hotword control that was included in 4.14-rc (due to a kind of abuse of kctl TLV ABI), together with topology API fixes and other device-specific small fixes that should go for stable, too" * tag 'sound-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: seq: Fix nested rwsem annotation for lockdep splat ALSA: timer: Add missing mutex lock for compat ioctls ASoC: rt5616: fix 0x91 default value ASoC: rt5659: connect LOUT Amp with Charge Pump ASoC: rt5659: register power bit of LOUT Amp ASoC: rt5663: Change the dev getting function in rt5663_irq ASoC: rt5514: Revert Hotword Model control ASoC: topology: Fix a potential memory leak in 'soc_tplg_dapm_widget_denum_create()' ASoC: topology: Fix a potential NULL pointer dereference in 'soc_tplg_dapm_widget_denum_create()' ASoC: rt5514-spi: check irq status to schedule data copy ASoC: adau17x1: Workaround for noise bug in ADC
2017-11-02Merge branch 'fixes-v4.14-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull key handling fixes from James Morris: "Fixes for the Keys subsystem by Eric Biggers" * 'fixes-v4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: KEYS: fix out-of-bounds read during ASN.1 parsing KEYS: trusted: fix writing past end of buffer in trusted_read() KEYS: return full count in keyring_read() if buffer is too small
2017-11-02futex: futex_wake_op, do not fail on invalid opJiri Slaby
In commit 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour"), I let FUTEX_WAKE_OP to fail on invalid op. Namely when op should be considered as shift and the shift is out of range (< 0 or > 31). But strace's test suite does this madness: futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee); futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced); futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff); When I pick the first 0xa0caffee, it decodes as: 0x80000000 & 0xa0caffee: oparg is shift 0x70000000 & 0xa0caffee: op is FUTEX_OP_OR 0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ 0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849 0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18 That means the op tries to do this: (futex |= (1 << (-849))) == -18 which is completely bogus. The new check of op in the code is: if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) { if (oparg < 0 || oparg > 31) return -EINVAL; oparg = 1 << oparg; } which results obviously in the "Invalid argument" errno: FAIL: futex =========== futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument futex.test: failed test: ../futex failed with code 1 So let us soften the failure to print only a (ratelimited) message, crop the value and continue as if it were right. When userspace keeps up, we can switch this to return -EINVAL again. [v2] Do not return 0 immediatelly, proceed with the cropped value. Fixes: 30d6e0a4190d ("futex: Remove duplicated code and fix undefined behaviour") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-02Merge branch 'hns3-add-support-for-reset'David S. Miller
Lipeng says: ==================== net: hns3: add support for reset There are 4 reset types for HNS3 PF driver, include global reset, core reset, IMP reset, PF reset.The core reset will reset all datapath of all functions except IMP, MAC and PCI interface. Global reset is equal with the core reset plus all MAC reset. IMP reset is caused by watchdog timer expiration, the same range with core reset. PF reset will reset whole physical function. This patchset adds reset support for hns3 driver and fix some related bugs. --- Change log: V1 -> V2: 1, fix some comments from Yunsheng Lin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02net: hns3: hns3:fix a bug about statistic counter in reset processqumingguang
All member of Struct hdev->hw_stats is initialized to 0 as hdev is allocated by devm_kzalloc. But in reset process, hdev will not be allocated again, so need clear hdev->hw_stats in reset process, otherwise the statistic will be wrong after reset. This patch set all of the statistic counters to zero after reset. Signed-off-by: qumingguang <qumingguang@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02net: hns3: Fix a misuse to devm_free_irqqumingguang
we should use free_irq to free the nic irq during the unloading time. because we use request_irq to apply it when nic up. It will crash if up net device after reset the port. This patch fixes the issue. Signed-off-by: qumingguang <qumingguang@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02net: hns3: Add reset interface implementation in clientLipeng
This patch implement the interface of reset notification in hns3_enet, it will do resetting business which include shutdown nic device, free and initialize client side resource. Signed-off-by: qumingguang <qumingguang@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02net: hns3: Add timeout process in hns3_enetLipeng
This patch add timeout handler in hns3_enet.c to handle TX side timeout event, when TX timeout event occur, it will triger NIC driver into reset process. Signed-off-by: qumingguang <qumingguang@huawei.com> Signed-off-by: Lipeng <lipeng321@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>