summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2017-03-24ping: implement proper lockingEric Dumazet
We got a report of yet another bug in ping http://www.openwall.com/lists/oss-security/2017/03/24/6 ->disconnect() is not called with socket lock held. Fix this by acquiring ping rwlock earlier. Thanks to Daniel, Alexander and Andrey for letting us know this problem. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Daniel Jiang <danieljiang0415@gmail.com> Reported-by: Solar Designer <solar@openwall.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24secure_seq: downgrade to per-host timestamp offsetsFlorian Westphal
Unfortunately too many devices (not under our control) use tcp_tw_recycle=1, which depends on timestamps being identical of the same saddr. Although tcp_tw_recycle got removed in net-next we can't make such end hosts disappear so downgrade to per-host timestamp offsets. Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Reported-by: Yvan Vanrossomme <yvan@vanrossomme.net> Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Do not allow negative values for busy_read and busy_poll sysctl interfacesAlexander Duyck
This change basically codifies what I think was already the limitations on the busy_poll and busy_read sysctl interfaces. We weren't checking the lower bounds and as such could input negative values. The behavior when that was used was dependent on the architecture. In order to prevent any issues with that I am just disabling support for values less than 0 since this way we don't have to worry about any odd behaviors. By limiting the sysctl values this way it also makes it consistent with how we handle the SO_BUSY_POLL socket option since the value appears to be reported as a signed integer value and negative values are rejected. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24kcm: return immediately after copy_from_user() failureWANG Cong
There is no reason to continue after a copy_from_user() failure. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23net: neigh: guard against NULL solicit() methodEric Dumazet
Dmitry posted a nice reproducer of a bug triggering in neigh_probe() when dereferencing a NULL neigh->ops->solicit method. This can happen for arp_direct_ops/ndisc_direct_ops and similar, which can be used for NUD_NOARP neighbours (created when dev->header_ops is NULL). Admin can then force changing nud_state to some other state that would fire neigh timer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several netfilter fixes from Pablo and the crew: - Handle fragmented packets properly in netfilter conntrack, from Florian Westphal. - Fix SCTP ICMP packet handling, from Ying Xue. - Fix big-endian bug in nftables, from Liping Zhang. - Fix alignment of fake conntrack entry, from Steven Rostedt. 2) Fix feature flags setting in fjes driver, from Taku Izumi. 3) Openvswitch ipv6 tunnel source address not set properly, from Or Gerlitz. 4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky. 5) sk->sk_frag.page not released properly in some cases, from Eric Dumazet. 6) Fix RTNL deadlocks in nl80211, from Johannes Berg. 7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu. 8) Cure improper inflight handling during AF_UNIX GC, from Andrey Ulanov. 9) sch_dsmark doesn't write to packet headers properly, from Eric Dumazet. 10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas Yeganeh. 11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren. 12) Fix nametbl deadlock in tipc, from Ying Xue. 13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal Pressman. 14) Fix reset of internal PHYs in bcmgenet, from Doug Berger. 15) Fix hashmap allocation handling, from Alexei Starovoitov. 16) nl_fib_input() needs stronger netlink message length checking, from Eric Dumazet. 17) Fix double-free of sk->sk_filter during sock clone, from Daniel Borkmann. 18) Fix RX checksum offloading in aquantia driver, from Pavel Belous. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) net:ethernet:aquantia: Fix for RX checksum offload. amd-xgbe: Fix the ECC-related bit position definitions sfc: cleanup a condition in efx_udp_tunnel_del() Bluetooth: btqcomsmd: fix compile-test dependency inet: frag: release spinlock before calling icmp_send() tcp: initialize icsk_ack.lrcvtime at session start time genetlink: fix counting regression on ctrl_dumpfamily() socket, bpf: fix sk_filter use after free in sk_clone_lock ipv4: provide stronger user input validation in nl_fib_input() bpf: fix hashmap extra_elems logic enic: update enic maintainers net: bcmgenet: remove bcmgenet_internal_phy_setup() ipv6: make sure to initialize sockc.tsflags before first use fjes: Do not load fjes driver if extended socket device is not power on. fjes: Do not load fjes driver if system does not have extended socket device. net/mlx5e: Count LRO packets correctly net/mlx5e: Count GSO packets correctly net/mlx5: Increase number of max QPs in default profile net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps net/mlx5e: Use the proper UAPI values when offloading TC vlan actions ...
2017-03-22inet: frag: release spinlock before calling icmp_send()Eric Dumazet
Dmitry reported a lockdep splat [1] (false positive) that we can fix by releasing the spinlock before calling icmp_send() from ip_expire() This is a false positive because sending an ICMP message can not possibly re-enter the IP frag engine. [1] [ INFO: possible circular locking dependency detected ] 4.10.0+ #29 Not tainted ------------------------------------------------------- modprobe/12392 is trying to acquire lock: (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock include/linux/spinlock.h:299 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock include/linux/netdevice.h:3486 [inline] (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 but task is already holding lock: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&q->lock)->rlock){+.-...}: validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669 ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713 packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459 deliver_skb net/core/dev.c:1834 [inline] dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890 xmit_one net/core/dev.c:2903 [inline] dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923 sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308 neigh_output include/net/neighbour.h:478 [inline] ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228 ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672 ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545 ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985 __sys_sendmmsg+0x25c/0x750 net/socket.c:2075 SYSC_sendmmsg net/socket.c:2106 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2101 do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281 return_from_SYSCALL_64+0x0/0x7a -> #0 (_xmit_ETHER#2){+.-...}: check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 __read_once_size include/linux/compiler.h:254 [inline] atomic_read arch/x86/include/asm/atomic.h:26 [inline] rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] __rcu_is_watching kernel/rcu/tree.c:1133 [inline] rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); lock(&(&q->lock)->rlock); lock(_xmit_ETHER#2); *** DEADLOCK *** 10 locks held by modprobe/12392: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>] __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336 #1: (rcu_read_lock){......}, at: [<ffffffff8188cab6>] filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324 #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] spin_lock include/linux/spinlock.h:299 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] pte_alloc_one_map mm/memory.c:2944 [inline] #2: (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>] alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072 #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] lockdep_copy_map include/linux/lockdep.h:175 [inline] #3: (((&q->timer))){+.-...}, at: [<ffffffff81627e72>] call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258 #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock include/linux/spinlock.h:299 [inline] #4: (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201 #5: (rcu_read_lock){......}, at: [<ffffffff8389a633>] ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216 #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock include/linux/spinlock.h:309 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock net/ipv4/icmp.c:219 [inline] #6: (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681 #7: (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>] ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198 #8: (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>] __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324 #9: (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 stack backtrace: CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204 check_prev_add kernel/locking/lockdep.c:1830 [inline] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940 validate_chain kernel/locking/lockdep.c:2267 [inline] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:299 [inline] __netif_tx_lock include/linux/netdevice.h:3486 [inline] sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180 __dev_xmit_skb net/core/dev.c:3092 [inline] __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358 dev_queue_xmit+0x17/0x20 net/core/dev.c:3423 neigh_hh_output include/net/neighbour.h:468 [inline] neigh_output include/net/neighbour.h:476 [inline] ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228 ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404 dst_output include/net/dst.h:486 [inline] ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512 icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394 icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754 ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239 call_timer_fn+0x241/0x820 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 [inline] __run_timers+0x960/0xcf0 kernel/time/timer.c:1601 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:657 [inline] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707 RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline] RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline] RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147 RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10 RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000 R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25 R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000 </IRQ> rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293 radix_tree_deref_slot include/linux/radix-tree.h:238 [inline] filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335 do_fault_around mm/memory.c:3231 [inline] do_read_fault mm/memory.c:3265 [inline] do_fault+0xbd5/0x2080 mm/memory.c:3370 handle_pte_fault mm/memory.c:3600 [inline] __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714 handle_mm_fault+0x1e2/0x480 mm/memory.c:3751 __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397 do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460 page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011 RIP: 0033:0x7f83172f2786 RSP: 002b:00007fffe859ae80 EFLAGS: 00010293 RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970 RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000 R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040 R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22tcp: initialize icsk_ack.lrcvtime at session start timeEric Dumazet
icsk_ack.lrcvtime has a 0 value at socket creation time. tcpi_last_data_recv can have bogus value if no payload is ever received. This patch initializes icsk_ack.lrcvtime for active sessions in tcp_finish_connect(), and for passive sessions in tcp_create_openreq_child() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22genetlink: fix counting regression on ctrl_dumpfamily()Stanislaw Gruszka
Commit 2ae0f17df1cd ("genetlink: use idr to track families") replaced if (++n < fams_to_skip) continue; into: if (n++ < fams_to_skip) continue; This subtle change cause that on retry ctrl_dumpfamily() call we omit one family that failed to do ctrl_fill_info() on previous call, because cb->args[0] = n number counts also family that failed to do ctrl_fill_info(). Patch fixes the problem and avoid confusion in the future just decrease n counter when ctrl_fill_info() fail. User visible problem caused by this bug is failure to get access to some genetlink family i.e. nl80211. However problem is reproducible only if number of registered genetlink families is big enough to cause second call of ctrl_dumpfamily(). Cc: Xose Vazquez Perez <xose.vazquez@gmail.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Johannes Berg <johannes@sipsolutions.net> Fixes: 2ae0f17df1cd ("genetlink: use idr to track families") Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22socket, bpf: fix sk_filter use after free in sk_clone_lockDaniel Borkmann
In sk_clone_lock(), we create a new socket and inherit most of the parent's members via sock_copy() which memcpy()'s various sections. Now, in case the parent socket had a BPF socket filter attached, then newsk->sk_filter points to the same instance as the original sk->sk_filter. sk_filter_charge() is then called on the newsk->sk_filter to take a reference and should that fail due to hitting max optmem, we bail out and release the newsk instance. The issue is that commit 278571baca2a ("net: filter: simplify socket charging") wrongly combined the dismantle path with the failure path of xfrm_sk_clone_policy(). This means, even when charging failed, we call sk_free_unlock_clone() on the newsk, which then still points to the same sk_filter as the original sk. Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually where it tests for present sk_filter and calls sk_filter_uncharge() on it, which potentially lets sk_omem_alloc wrap around and releases the eBPF prog and sk_filter structure from the (still intact) parent. Fix it by making sure that when sk_filter_charge() failed, we reset newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(), so that we don't mess with the parents sk_filter. Only if xfrm_sk_clone_policy() fails, we did reach the point where either the parent's filter was NULL and as a result newsk's as well or where we previously had a successful sk_filter_charge(), thus for that case, we do need sk_filter_uncharge() to release the prior taken reference on sk_filter. Fixes: 278571baca2a ("net: filter: simplify socket charging") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22ipv4: provide stronger user input validation in nl_fib_input()Eric Dumazet
Alexander reported a KMSAN splat caused by reads of uninitialized field (tb_id_in) from user provided struct fib_result_nl It turns out nl_fib_input() sanity tests on user input is a bit wrong : User can pretend nlh->nlmsg_len is big enough, but provide at sendmsg() time a too small buffer. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22ipv6: make sure to initialize sockc.tsflags before first useAlexander Potapenko
In the case udp_sk(sk)->pending is AF_INET6, udpv6_sendmsg() would jump to do_append_data, skipping the initialization of sockc.tsflags. Fix the problem by moving sockc.tsflags initialization earlier. The bug was detected with KMSAN. Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22tipc: fix nametbl deadlock at tipc_nametbl_unsubscribeYing Xue
Until now, tipc_nametbl_unsubscribe() is called at subscriptions reference count cleanup. Usually the subscriptions cleanup is called at subscription timeout or at subscription cancel or at subscriber delete. We have ignored the possibility of this being called from other locations, which causes deadlock as we try to grab the tn->nametbl_lock while holding it already. CPU1: CPU2: ---------- ---------------- tipc_nametbl_publish spin_lock_bh(&tn->nametbl_lock) tipc_nametbl_insert_publ tipc_nameseq_insert_publ tipc_subscrp_report_overlap tipc_subscrp_get tipc_subscrp_send_event tipc_close_conn tipc_subscrb_release_cb tipc_subscrb_delete tipc_subscrp_put tipc_subscrp_put tipc_subscrp_kref_release tipc_nametbl_unsubscribe spin_lock_bh(&tn->nametbl_lock) <<grab nametbl_lock again>> CPU1: CPU2: ---------- ---------------- tipc_nametbl_stop spin_lock_bh(&tn->nametbl_lock) tipc_purge_publications tipc_nameseq_remove_publ tipc_subscrp_report_overlap tipc_subscrp_get tipc_subscrp_send_event tipc_close_conn tipc_subscrb_release_cb tipc_subscrb_delete tipc_subscrp_put tipc_subscrp_put tipc_subscrp_kref_release tipc_nametbl_unsubscribe spin_lock_bh(&tn->nametbl_lock) <<grab nametbl_lock again>> In this commit, we advance the calling of tipc_nametbl_unsubscribe() from the refcount cleanup to the intended callers. Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete") Reported-by: John Thompson <thompa.atl@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22sctp: remove useless err from sctp_association_initXin Long
This patch is to remove the unnecessary temporary variable 'err' from sctp_association_init. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22cgroup, net_cls: iterate the fds of only the tasks which are being migratedTejun Heo
The net_cls controller controls the classid field of each socket which is associated with the cgroup. Because the classid is per-socket attribute, when a task migrates to another cgroup or the configured classid of the cgroup changes, the controller needs to walk all sockets and update the classid value, which was implemented by 3b13758f51de ("cgroups: Allow dynamically changing net_classid"). While the approach is not scalable, migrating tasks which have a lot of fds attached to them is rare and the cost is born by the ones initiating the operations. However, for simplicity, both the migration and classid config change paths call update_classid() which scans all fds of all tasks in the target css. This is an overkill for the migration path which only needs to cover a much smaller subset of tasks which are actually getting migrated in. On cgroup v1, this can lead to unexpected scalability issues when one tries to migrate a task or process into a net_cls cgroup which already contains a lot of fds. Even if the migration traget doesn't have many to get scanned, update_classid() ends up scanning all fds in the target cgroup which can be extremely numerous. Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is even worse. Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only for subsystems which are actually affected by migration"), cgroup core would call the ->css_attach callback even for controllers which don't see actual migration to a different css. As net_cls is always disabled but still mounted on cgroup v2, whenever a process is migrated on the cgroup v2 hierarchy, net_cls sees identity migration from root to root and cgroup core used to call ->css_attach callback for those. The net_cls ->css_attach ends up calling update_classid() on the root net_cls css to which all processes on the system belong to as the controller isn't used. This makes any cgroup v2 migration O(total_number_of_fds_on_the_system) which is horrible and easily leads to noticeable stalls triggering RCU stall warnings and so on. The worst symptom is already fixed in upstream by bfc2cf6f61fc ("cgroup: call subsys->*attach() only for subsystems which are actually affected by migration"); however, backporting that commit is too invasive and we want to avoid other cases too. This patch updates net_cls's cgrp_attach() to iterate fds of only the processes which are actually getting migrated. This removes the surprising migration cost which is dependent on the total number of fds in the target cgroup. As this leaves write_classid() the only user of update_classid(), open-code the helper into write_classid(). Reported-by: David Goode <dgoode@fb.com> Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid") Cc: stable@vger.kernel.org # v4.4+ Cc: Nina Schiff <ninasc@fb.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATSSoheil Hassas Yeganeh
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled while packets are collected on the error queue. So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags is not enough to safely assume that the skb contains OPT_STATS data. Add a bit in sock_exterr_skb to indicate whether the skb contains opt_stats data. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: JongHwan Kim <zzoru007@gmail.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbsSoheil Hassas Yeganeh
__sock_recv_timestamp can be called for both normal skbs (for receive timestamps) and for skbs on the error queue (for transmit timestamps). Commit 1c885808e456 (tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING) assumes any skb passed to __sock_recv_timestamp are from the error queue, containing OPT_STATS in the content of the skb. This results in accessing invalid memory or generating junk data. To fix this, set skb->pkt_type to PACKET_OUTGOING for packets on the error queue. This is safe because on the receive path on local sockets skb->pkt_type is never set to PACKET_OUTGOING. With that, copy OPT_STATS from a packet, only if its pkt_type is PACKET_OUTGOING. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: JongHwan Kim <zzoru007@gmail.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21sctp: out_qlen should be updated when pruning unsent queueXin Long
This patch is to fix the issue that sctp_prsctp_prune_sent forgot to update q->out_qlen when removing a chunk from unsent queue. Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21sctp: remove temporary variable confirm from sctp_packet_transmitXin Long
Commit c86a773c7802 ("sctp: add dst_pending_confirm flag") introduced a temporary variable "confirm" in sctp_packet_transmit. But it broke the rule that longer lines should be above shorter ones. Besides, this variable is not necessary, so this patch is to just remove it and use tp->dst_pending_confirm directly. Fixes: c86a773c7802 ("sctp: add dst_pending_confirm flag") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21sch_dsmark: fix invalid skb_cow() usageEric Dumazet
skb_cow(skb, sizeof(ip header)) is not very helpful in this context. First we need to use pskb_may_pull() to make sure the ip header is in skb linear part, then use skb_try_make_writable() to address clones issues. Fixes: 4c30719f4f55 ("[PKT_SCHED] dsmark: handle cloned and non-linear skb's") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21net: unix: properly re-increment inflight counter of GC discarded candidatesAndrey Ulanov
Dmitry has reported that a BUG_ON() condition in unix_notinflight() may be triggered by a simple code that forwards unix socket in an SCM_RIGHTS message. That is caused by incorrect unix socket GC implementation in unix_gc(). The GC first collects list of candidates, then (a) decrements their "children's" inflight counter, (b) checks which inflight counters are now 0, and then (c) increments all inflight counters back. (a) and (c) are done by calling scan_children() with inc_inflight or dec_inflight as the second argument. Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage collector") changed scan_children() such that it no longer considers sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block of code that that unsets this flag _before_ invoking scan_children(, dec_iflight, ). This may lead to incorrect inflight counters for some sockets. This change fixes this bug by changing order of operations: UNIX_GC_CANDIDATE is now unset only after all inflight counters are restored to the original state. kernel BUG at net/unix/garbage.c:149! RIP: 0010:[<ffffffff8717ebf4>] [<ffffffff8717ebf4>] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149 Call Trace: [<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487 [<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496 [<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655 [<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668 [<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684 [<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705 [<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559 [<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836 [<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570 [<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017 [<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208 [<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244 [<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828 [<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931 [<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307 [<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807 [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259 [<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6 Link: https://lkml.org/lkml/2017/3/6/252 Signed-off-by: Andrey Ulanov <andreyu@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21vsock: cancel packets when failing to connectPeng Tao
Otherwise we'll leave the packets queued until releasing vsock device. E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest will get the connect requests from failed host sockets. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21vsock: add pkt cancel capabilityPeng Tao
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21vsock: track pkt owner vsockPeng Tao
So that we can cancel a queued pkt later if necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutexHerbert Xu
On Tue, Mar 14, 2017 at 10:44:10AM +0100, Dmitry Vyukov wrote: > > Yes, please. > Disregarding some reports is not a good way long term. Please try this patch. ---8<--- Subject: netlink: Annotate nlk cb_mutex by protocol Currently all occurences of nlk->cb_mutex are annotated by lockdep as a single class. This causes a false lcokdep cycle involving genl and crypto_user. This patch fixes it by dividing cb_mutex into individual classes based on the netlink protocol. As genl and crypto_user do not use the same netlink protocol this breaks the false dependency loop. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-17Merge tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "We have a handful of stable fixes to fix kernel warnings and other bugs that have been around for a while. We've also found a few other reference counting bugs and memory leaks since the initial 4.11 pull. Stable Bugfixes: - Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings - Prevent a double free in async nfs4_exchange_id() - Squelch a kbuild sparse complaint for xprtrdma Other Bugfixes: - Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak - Fix a reference leak that causes kernel warnings - Make nfs4_cb_sv_ops static to fix a sparse warning - Respect a server's max size in CREATE_SESSION - Handle errors from nfs4_pnfs_ds_connect - Flexfiles layout shouldn't mark devices as unavailable" * tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: pNFS/flexfiles: never nfs4_mark_deviceid_unavailable pNFS: return status from nfs4_pnfs_ds_connect NFSv4.1 respect server's max size in CREATE_SESSION NFS prevent double free in async nfs4_exchange_id nfs: make nfs4_cb_sv_ops static xprtrdma: Squelch kbuild sparse complaint NFS: fix the fault nrequests decreasing for nfs_inode COPY NFSv4: fix a reference leak caused WARNING messages nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME
2017-03-17xprtrdma: Squelch kbuild sparse complaintChuck Lever
New complaint from kbuild for 4.9.y: net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in comparison expression (different type sizes) verbs.c: 489 max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES); I can't reproduce this running sparse here. Likewise, "make W=1 net/sunrpc/xprtrdma/verbs.o" never indicated any issue. A little poking suggests that because the range of its values is small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES smaller than the width of an unsigned integer. Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-03-16tcp: tcp_get_info() should read tcp_time_stamp laterEric Dumazet
Commit b369e7fd41f7 ("tcp: make TCP_INFO more consistent") moved lock_sock_fast() earlier in tcp_get_info() This has the minor effect that jiffies value being sampled at the beginning of tcp_get_info() is more likely to be off by one, and we report big tcpi_last_data_sent values (like 0xFFFFFFFF). Since we lock the socket, fetching tcp_time_stamp right before doing the jiffies_to_msecs() calls is enough to remove these wrong values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16bridge: resolve a false alarm of lockdepWANG Cong
Andrei reported a false alarm of lockdep at net/bridge/br_fdb.c:109, this is because in Andrei's case, a spin_bug() was already triggered before this, therefore the debug_locks is turned off, lockdep_is_held() is no longer accurate after that. We should use lockdep_assert_held_once() instead of lockdep_is_held() to respect debug_locks. Fixes: 410b3d48f5111 ("bridge: fdb: add proper lock checks in searching functions") Reported-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16rxrpc: Ignore BUSY packets on old callsDavid Howells
If we receive a BUSY packet for a call we think we've just completed, the packet is handed off to the connection processor to deal with - but the connection processor doesn't expect a BUSY packet and so flags a protocol error. Fix this by simply ignoring the BUSY packet for the moment. The symptom of this may appear as a system call failing with EPROTO. This may be triggered by pressing ctrl-C under some circumstances. This comes about we abort calls due to interruption by a signal (which we shouldn't do, but that's going to be a large fix and mostly in fs/afs/). What happens is that we abort the call and may also abort follow up calls too (this needs offloading somehoe). So we see a transmission of something like the following sequence of packets: DATA for call N ABORT call N DATA for call N+1 ABORT call N+1 in very quick succession on the same channel. However, the peer may have deferred the processing of the ABORT from the call N to a background thread and thus sees the DATA message from the call N+1 coming in before it has cleared the channel. Thus it sends a BUSY packet[*]. [*] Note that some implementations (OpenAFS, for example) mark the BUSY packet with one plus the callNumber of the call prior to call N. Ordinarily, this would be call N, but there's no requirement for the calls on a channel to be numbered strictly sequentially (the number is required to increase). This is wrong and means that the callNumber in the BUSY packet should be ignored (it really ought to be N+1 since that's what it's in response to). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16net: ipv6: set route type for anycast routesDavid Ahern
Anycast routes have the RTF_ANYCAST flag set, but when dumping routes for userspace the route type is not set to RTN_ANYCAST. Make it so. Fixes: 58c4fb86eabcb ("[IPV6]: Flag RTF_ANYCAST for anycast routes") CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16net: mpls: Fix nexthop alive tracking on down eventsDavid Ahern
Alive tracking of nexthops can account for a link twice if the carrier goes down followed by an admin down of the same link rendering multipath routes useless. This is similar to 79099aab38c8 for UNREGISTER events and DOWN events. Fix by tracking number of alive nexthops in mpls_ifdown similar to the logic in mpls_ifup. Checking the flags per nexthop once after all events have been processed is simpler than trying to maintian a running count through all event combinations. Also, WRITE_ONCE is used instead of ACCESS_ONCE to set rt_nhn_alive per a comment from checkpatch: WARNING: Prefer WRITE_ONCE(<FOO>, <BAR>) over ACCESS_ONCE(<FOO>) = <BAR> Fixes: c89359a42e2a4 ("mpls: support for dead routes") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16Merge tag 'batadv-net-for-davem-20170316' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - Keep fragments equally sized, avoids some problems with too small fragments, by Sven Eckelmann - Initialize gateway class correctly when BATMAN V is compiled in, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PADKris Murphy
Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement in ip_tun_from_nlattr in order to prevent the default case returning an error. Fixes: b46f6ded906e ("libnl: nla_put_be64(): align on a 64-bit area") Signed-off-by: Kris Murphy <kriskend@linux.vnet.ibm.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16nl80211: fix dumpit error path RTNL deadlocksJohannes Berg
Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out to be perfectly accurate - there are various error paths that miss unlock of the RTNL. To fix those, change the locking a bit to not be conditional in all those nl80211_prepare_*_dump() functions, but make those require the RTNL to start with, and fix the buggy error paths. This also let me use sparse (by appropriately overriding the rtnl_lock/rtnl_unlock functions) to validate the changes. Cc: stable@vger.kernel.org Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-03-15net: properly release sk_frag.pageEric Dumazet
I mistakenly added the code to release sk->sk_frag in sk_common_release() instead of sk_destruct() TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call sk_common_release() at close time, thus leaking one (order-3) page. iSCSI is using such sockets. Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, a rather large batch of fixes targeted to nf_tables, conntrack and bridge netfilter. More specifically, they are: 1) Don't track fragmented packets if the socket option IP_NODEFRAG is set. From Florian Westphal. 2) SCTP protocol tracker assumes that ICMP error messages contain the checksum field, what results in packet drops. From Ying Xue. 3) Fix inconsistent handling of AH traffic from nf_tables. 4) Fix new bitmap set representation with big endian. Fix mismatches in nf_tables due to incorrect big endian handling too. Both patches from Liping Zhang. 5) Bridge netfilter doesn't honor maximum fragment size field, cap to largest fragment seen. From Florian Westphal. 6) Fake conntrack entry needs to be aligned to 8 bytes since the 3 LSB bits are now used to store the ctinfo. From Steven Rostedt. 7) Fix element comments with the bitmap set type. Revert the flush field in the nft_set_iter structure, not required anymore after fixing up element comments. 8) Missing error on invalid conntrack direction from nft_ct, also from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15net/openvswitch: Set the ipv6 source tunnel key address attribute correctlyOr Gerlitz
When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15netfilter: nft_ct: do cleanup work when NFTA_CT_DIRECTION is invalidLiping Zhang
We should jump to invoke __nft_ct_set_destroy() instead of just return error. Fixes: edee4f1e9245 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver, from Steffen Klassert. 2) Fix crashes when user tries to get_next_key on an LPM bpf map, from Alexei Starovoitov. 3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from Michal Schmidt. 4) We can get a divide by zero when TCP socket are morphed into listening state, fix from Eric Dumazet. 5) Fix socket refcounting bugs in skb_complete_wifi_ack() and skb_complete_tx_timestamp(). From Eric Dumazet. 6) Use after free in dccp_feat_activate_values(), also from Eric Dumazet. 7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from Jarod Wilson. 8) Fix use after free in vrf_xmit(), from David Ahern. 9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from Alexey Kodanev. 10) Properly check napi_complete_done() return value in order to decide whether to re-enable IRQs or not in amd-xgbe driver, from Thomas Lendacky. 11) Fix double free of hwmon device in marvell phy driver, from Andrew Lunn. 12) Don't crash on malformed netlink attributes in act_connmark, from Etienne Noss. 13) Don't remove routes with a higher metric in ipv6 ECMP route replace, from Sabrina Dubroca. 14) Don't write into a cloned SKB in ipv6 fragmentation handling, from Florian Westphal. 15) Fix routing redirect races in dccp and tcp, basically the ICMP handler can't modify the socket's cached route in it's locked by the user at this moment. From Jon Maxwell. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits) qed: Enable iSCSI Out-of-Order qed: Correct out-of-bound access i