summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2019-03-12Merge tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull NFS server updates from Bruce Fields: "Miscellaneous NFS server fixes. Probably the most visible bug is one that could artificially limit NFSv4.1 performance by limiting the number of oustanding rpcs from a single client. Neil Brown also gets a special mention for fixing a 14.5-year-old memory-corruption bug in the encoding of NFSv3 readdir responses" * tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux: nfsd: allow nfsv3 readdir request to be larger. nfsd: fix wrong check in write_v4_end_grace() nfsd: fix memory corruption caused by readdir nfsd: fix performance-limiting session calculation svcrpc: fix UDP on servers with lots of threads svcrdma: Remove syslog warnings in work completion handlers svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled svcrdma: Use struct_size() in kmalloc() svcrpc: fix unlikely races preventing queueing of sockets svcrpc: svc_xprt_has_something_to_do seems a little long SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot() nfsd: fix an IS_ERR() vs NULL check
2019-03-12Merge tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights are: - rbd will now ignore discards that aren't aligned and big enough to actually free up some space (myself). This is controlled by the new alloc_size map option and can be disabled if needed. - support for rbd deep-flatten feature (myself). Deep-flatten allows "rbd flatten" to fully disconnect the clone image and its snapshots from the parent and make the parent snapshot removable. - a new round of cap handling improvements (Zheng Yan). The kernel client should now be much more prompt about releasing its caps and it is possible to put a limit on the number of caps held. - support for getting ceph.dir.pin extended attribute (Zheng Yan)" * tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits) Documentation: modern versions of ceph are not backed by btrfs rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN rbd: whole-object write and zeroout should copyup when snapshots exist rbd: copyup with an empty snapshot context (aka deep-copyup) rbd: introduce rbd_obj_issue_copyup_ops() rbd: stop copying num_osd_ops in rbd_obj_issue_copyup() rbd: factor out __rbd_osd_req_create() rbd: clear ->xferred on error from rbd_obj_issue_copyup() rbd: remove experimental designation from kernel layering ceph: add mount option to limit caps count ceph: periodically trim stale dentries ceph: delete stale dentry when last reference is dropped ceph: remove dentry_lru file from debugfs ceph: touch existing cap when handling reply ceph: pass inclusive lend parameter to filemap_write_and_wait_range() rbd: round off and ignore discards that are too small rbd: handle DISCARD and WRITE_ZEROES separately rbd: get rid of obj_req->obj_request_count libceph: use struct_size() for kmalloc() in crush_decode() ceph: send cap releases more aggressively ...
2019-03-12Merge tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fixes for NFS I/O request leakages - Fix error handling paths in the NFS I/O recoalescing code - Reinitialise NFSv4.1 sequence results before retransmitting a request - Fix a soft lockup in the delegation recovery code - Bulk destroy of layouts needs to be safe w.r.t. umount - Prevent thundering herd issues when the SUNRPC socket is not connected - Respect RPC call timeouts when retrying transmission Features: - Convert rpc auth layer to use xdr_streams - Config option to disable insecure RPCSEC_GSS crypto types - Reduce size of RPC receive buffers - Readdirplus optimization by cache mechanism - Convert SUNRPC socket send code to use iov_iter() - SUNRPC micro-optimisations to avoid indirect calls - Add support for the pNFS LAYOUTERROR operation and use it with the pNFS/flexfiles driver - Add trace events to report non-zero NFS status codes - Various removals of unnecessary dprintks Bugfixes and cleanups: - Fix a number of sparse warnings and documentation format warnings - Fix nfs_parse_devname to not modify it's argument - Fix potential corruption of page being written through pNFS/blocks - fix xfstest generic/099 failures on nfsv3 - Avoid NFSv4.1 "false retries" when RPC calls are interrupted - Abort I/O early if the pNFS/flexfiles layout segment was invalidated - Avoid unnecessary pNFS/flexfiles layout invalidations" * tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits) SUNRPC: Take the transport send lock before binding+connecting SUNRPC: Micro-optimise when the task is known not to be sleeping SUNRPC: Check whether the task was transmitted before rebind/reconnect SUNRPC: Remove redundant calls to RPC_IS_QUEUED() SUNRPC: Clean up SUNRPC: Respect RPC call timeouts when retrying transmission SUNRPC: Fix up RPC back channel transmission SUNRPC: Prevent thundering herd when the socket is not connected SUNRPC: Allow dynamic allocation of back channel slots NFSv4.1: Bump the default callback session slot count to 16 SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc NFS/flexfiles: Clean up mirror DS initialisation NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() NFS/flexfile: Simplify nfs4_ff_layout_ds_version() NFS/flexfiles: Simplify ff_layout_get_ds_cred() NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() NFS/flexfiles: Speed up read failover when DSes are down NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive ...
2019-03-12Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted fixes (really no common topic here)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Make __vfs_write() static vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1 pipe: stop using ->can_merge splice: don't merge into linked buffers fs: move generic stat response attr handling to vfs_getattr_nosec orangefs: don't reinitialize result_mask in ->getattr fs/devpts: always delete dcache dentry-s in dput()
2019-03-12sctp: convert to genradixKent Overstreet
This also makes sctp_stream_alloc_(out|in) saner, in that they no longer allocate new flex_arrays/genradixes, they just preallocate more elements. This code does however have a suspicious lack of locking. Link: http://lkml.kernel.org/r/20181217131929.11727-7-kent.overstreet@gmail.com Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Shaohua Li <shli@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12openvswitch: convert to kvmallocKent Overstreet
Patch series "generic radix trees; drop flex arrays". This patch (of 7): There was no real need for this code to be using flexarrays, it's just implementing a hash table - ideally it would be using rhashtables, but that conversion would be significantly more complicated. Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Shaohua Li <shli@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "First batch of fixes in the new merge window: 1) Double dst_cache free in act_tunnel_key, from Wenxu. 2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the ip_route_input_rcu() path, from Paolo Abeni. 3) Fix appletalk compile regression, from Arnd Bergmann. 4) If SLAB objects reach the TCP sendpage method we are in serious trouble, so put a debugging check there. From Vasily Averin. 5) Memory leak in hsr layer, from Mao Wenan. 6) Only test GSO type on GSO packets, from Willem de Bruijn. 7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet. 8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe. 9) Fix race in ipv4 route exception handling, from Xin Long. 10) Missing DMA memory barrier in hns3 driver, from Jian Shen. 11) Use after free in __tcf_chain_put(), from Vlad Buslov. 12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault. 13) Return value correction when ip_mc_may_pull() fails, from Eric Dumazet. 14) Use after free in x25_device_event(), also from Eric" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits) gro_cells: make sure device is up in gro_cells_receive() vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() net/x25: fix use-after-free in x25_device_event() isdn: mISDNinfineon: fix potential NULL pointer dereference net: hns3: fix to stop multiple HNS reset due to the AER changes ip: fix ip_mc_may_pull() return value net: keep refcount warning in reqsk_free() net: stmmac: Avoid one more sometimes uninitialized Clang warning net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports rxrpc: Fix client call queueing, waiting for channel tcp: handle inet_csk_reqsk_queue_add() failures net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry 8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC fou, fou6: avoid uninit-value in gue_err() and gue6_err() net: sched: fix potential use-after-free in __tcf_chain_put() vhost: silence an unused-variable warning vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock connector: fix unsafe usage of ->real_parent vxlan: do not need BH again in vxlan_cleanup() net: hns3: add dma_rmb() for rx description ...
2019-03-10SUNRPC: Take the transport send lock before binding+connectingTrond Myklebust
Before trying to bind a port, ensure we grab the send lock to ensure that we don't change the port while another task is busy transmitting requests. The connect code already takes the send lock in xprt_connect(), but it is harmless to take it before that. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-10SUNRPC: Micro-optimise when the task is known not to be sleepingTrond Myklebust
In cases where we know the task is not sleeping, try to optimise away the indirect call to task->tk_action() by replacing it with a direct call. Only change tail calls, to allow gcc to perform tail call elimination. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-10gro_cells: make sure device is up in gro_cells_receive()Eric Dumazet
We keep receiving syzbot reports [1] that show that tunnels do not play the rcu/IFF_UP rules properly. At device dismantle phase, gro_cells_destroy() will be called only after a full rcu grace period is observed after IFF_UP has been cleared. This means that IFF_UP needs to be tested before queueing packets into netif_rx() or gro_cells. This patch implements the test in gro_cells_receive() because too many callers do not seem to bother enough. [1] BUG: unable to handle kernel paging request at fffff4ca0b9ffffe PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.0.0+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline] RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline] RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline] RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline] RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78 Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03 RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02 RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0 RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645 R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000 R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75 kobject: 'loop2' (000000004bd7d84a): kobject_uevent_env FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0 Call Trace: kobject: 'loop2' (000000004bd7d84a): fill_kobj_path: path = '/devices/virtual/block/loop2' ip_tunnel_dev_free+0x19/0x60 net/ipv4/ip_tunnel.c:1010 netdev_run_todo+0x51c/0x7d0 net/core/dev.c:8970 rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:116 ip_tunnel_delete_nets+0x423/0x5f0 net/ipv4/ip_tunnel.c:1124 vti_exit_batch_net+0x23/0x30 net/ipv4/ip_vti.c:495 ops_exit_list.isra.0+0x105/0x160 net/core/net_namespace.c:156 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:551 process_one_work+0x98e/0x1790 kernel/workqueue.c:2173 worker_thread+0x98/0xe40 kernel/workqueue.c:2319 kthread+0x357/0x430 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352 Modules linked in: CR2: fffff4ca0b9ffffe [ end trace 513fc9c1338d1cb3 ] RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline] RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline] RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline] RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline] RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78 Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 <42> 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03 RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02 RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0 RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645 R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000 kobject: 'loop3' (00000000e4ee57a6): kobject_uevent_env R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0 Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-10net/x25: fix use-after-free in x25_device_event()Eric Dumazet
In case of failure x25_connect() does a x25_neigh_put(x25->neighbour) but forgets to clear x25->neighbour pointer, thus triggering use-after-free. Since the socket is visible in x25_list, we need to hold x25_list_lock to protect the operation. syzbot report : BUG: KASAN: use-after-free in x25_kill_by_device net/x25/af_x25.c:217 [inline] BUG: KASAN: use-after-free in x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252 Read of size 8 at addr ffff8880a030edd0 by task syz-executor003/7854 CPU: 0 PID: 7854 Comm: syz-executor003 Not tainted 5.0.0+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 x25_kill_by_device net/x25/af_x25.c:217 [inline] x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252 notifier_call_chain+0xc7/0x240 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739 call_netdevice_notifiers_extack net/core/dev.c:1751 [inline] call_netdevice_notifiers net/core/dev.c:1765 [inline] __dev_notify_flags+0x1e9/0x2c0 net/core/dev.c:7607 dev_change_flags+0x10d/0x170 net/core/dev.c:7643 dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237 dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488 sock_do_ioctl+0x1bd/0x300 net/socket.c:995 sock_ioctl+0x32b/0x610 net/socket.c:1096 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4467c9 Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fdbea222d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000006dbc58 RCX: 00000000004467c9 RDX: 0000000020000340 RSI: 0000000000008914 RDI: 0000000000000003 RBP: 00000000006dbc50 R08: 00007fdbea223700 R09: 0000000000000000 R10: 00007fdbea223700 R11: 0000000000000246 R12: 00000000006dbc5c R13: 6000030030626669 R14: 0000000000000000 R15: 0000000030626669 Allocated by task 7843: save_stack+0x45/0xd0 mm/kasan/common.c:73 set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc mm/kasan/common.c:495 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:468 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:509 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3615 kmalloc include/linux/slab.h:545 [inline] x25_link_device_up+0x46/0x3f0 net/x25/x25_link.c:249 x25_device_event+0x116/0x2b0 net/x25/af_x25.c:242 notifier_call_chain+0xc7/0x240 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739 call_netdevice_notifiers_extack net/core/dev.c:1751 [inline] call_netdevice_notifiers net/core/dev.c:1765 [inline] __dev_notify_flags+0x121/0x2c0 net/core/dev.c:7605 dev_change_flags+0x10d/0x170 net/core/dev.c:7643 dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237 dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488 sock_do_ioctl+0x1bd/0x300 net/socket.c:995 sock_ioctl+0x32b/0x610 net/socket.c:1096 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 7865: save_stack+0x45/0xd0 mm/kasan/common.c:73 set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:457 kasan_slab_free+0xe/0x10 mm/kasan/common.c:465 __cache_free mm/slab.c:3494 [inline] kfree+0xcf/0x230 mm/slab.c:3811 x25_neigh_put include/net/x25.h:253 [inline] x25_connect+0x8d8/0xde0 net/x25/af_x25.c:824 __sys_connect+0x266/0x330 net/socket.c:1685 __do_sys_connect net/socket.c:1696 [inline] __se_sys_connect net/socket.c:1693 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1693 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a030edc0 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 16 bytes inside of 256-byte region [ffff8880a030edc0, ffff8880a030eec0) The buggy address belongs to the page: page:ffffea000280c380 count:1 mapcount:0 mapping:ffff88812c3f07c0 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea0002806788 ffffea00027f0188 ffff88812c3f07c0 raw: 0000000000000000 ffff8880a030e000 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+04babcefcd396fabec37@syzkaller.appspotmail.com Cc: andrew hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-10SUNRPC: Check whether the task was transmitted before rebind/reconnectTrond Myklebust
Before initiating transport actions that require putting the task to sleep, such as rebinding or reconnecting, we should check whether or not the task was already transmitted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a slightly more active cycle than normal with ongoing core changes and quite a lot of collected driver updates. - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe - A new data transfer mode for HFI1 giving higher performance - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR feature - A chip hang reset recovery system for hns - Change mm->pinned_vm to an atomic64 - Update bnxt_re to support a new 57500 chip - A sane netlink 'rdma link add' method for creating rxe devices and fixing the various unregistration race conditions in rxe's unregister flow - Allow lookup up objects by an ID over netlink - Various reworking of the core to driver interface: - drivers should not assume umem SGLs are in PAGE_SIZE chunks - ucontext is accessed via udata not other means - start to make the core code responsible for object memory allocation - drivers should convert struct device to struct ib_device via a helper - drivers have more tools to avoid use after unregister problems" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits) net/mlx5: ODP support for XRC transport is not enabled by default in FW IB/hfi1: Close race condition on user context disable and close RDMA/umem: Revert broken 'off by one' fix RDMA/umem: minor bug fix in error handling path RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp cxgb4: kfree mhp after the debug print IB/rdmavt: Fix concurrency panics in QP post_send and modify to error IB/rdmavt: Fix loopback send with invalidate ordering IB/iser: Fix dma_nents type definition IB/mlx5: Set correct write permissions for implicit ODP MR bnxt_re: Clean cq for kernel consumers only RDMA/uverbs: Don't do double free of allocated PD RDMA: Handle ucontext allocations by IB/core RDMA/core: Fix a WARN() message bnxt_re: fix the regression due to changes in alloc_pbl IB/mlx4: Increase the timeout for CM cache IB/core: Abort page fault handler silently during owning process exit IB/mlx5: Validate correct PD before prefetch MR IB/mlx5: Protect against prefetch of invalid MR RDMA/uverbs: Store PR pointer before it is overwritten ...
2019-03-09SUNRPC: Remove redundant calls to RPC_IS_QUEUED()Trond Myklebust
The RPC task wakeup calls all check for RPC_IS_QUEUED() before taking any locks. In addition, rpc_exit() already calls rpc_wake_up_queued_task(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-09SUNRPC: Clean upTrond Myklebust
Replace remaining callers of call_timeout() with rpc_check_timeout(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-08rxrpc: Fix client call queueing, waiting for channelDavid Howells
rxrpc_get_client_conn() adds a new call to the front of the waiting_calls queue if the connection it's going to use already exists. This is bad as it allows calls to get starved out. Fix this by adding to the tail instead. Also change the other enqueue point in the same function to put it on the front (ie. when we have a new connection). This makes the point that in the case of a new connection the new call goes at the front (though it doesn't actually matter since the queue should be unoccupied). Fixes: 45025bceef17 ("rxrpc: Improve management and caching of client connection objects") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-03-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a crash in AF_XDP's xsk_diag_put_ring() which was passing wrong queue argument, from Eric. 2) Fix a regression due to wrong test for TCP GSO packets used in various BPF helpers like NAT64, from Willem. 3) Fix a sk_msg strparser warning which asserts that strparser must be stopped first, from Jakub. 4) Fix rejection of invalid options/bind flags in AF_XDP, from Björn. 5) Fix GSO in bpf_lwt_push_ip_encap() which must properly set inner headers and inner protocol, from Peter. 6) Fix a libbpf leak when kernel does not support BTF, from Nikita. 7) Various BPF selftest and libbpf build fixes to make out-of-tree compilation work and to properly resolve dependencies via fixdep target, from Stanislav. 8) Fix rejection of invalid ldimm64 imm field, from Daniel. 9) Fix bpf stats sysctl compile warning of unused helper function proc_dointvec_minmax_bpf_stats() under some configs, from Arnd. 10) Fix couple of warnings about using plain integer as NULL, from Bo. 11) Fix some BPF sample spelling mistakes, from Colin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08tcp: handle inet_csk_reqsk_queue_add() failuresGuillaume Nault
Commit 7716682cc58e ("tcp/dccp: fix another race at listener dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted {tcp,dccp}_check_req() accordingly. However, TFO and syncookies weren't modified, thus leaking allocated resources on error. Contrary to tcp_check_req(), in both syncookies and TFO cases, we need to drop the request socket. Also, since the child socket is created with inet_csk_clone_lock(), we have to unlock it and drop an extra reference (->sk_refcount is initially set to 2 and inet_csk_reqsk_queue_add() drops only one ref). For TFO, we also need to revert the work done by tcp_try_fastopen() (with reqsk_fastopen_remove()). Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08fou, fou6: avoid uninit-value in gue_err() and gue6_err()Eric Dumazet
My prior commit missed the fact that these functions were using udp_hdr() (aka skb_transport_header()) to get access to GUE header. Since pskb_transport_may_pull() does not exist yet, we have to add transport_offset to our pskb_may_pull() calls. BUG: KMSAN: uninit-value in gue_err+0x514/0xfa0 net/ipv4/fou.c:1032 CPU: 1 PID: 10648 Comm: syz-executor.1 Not tainted 5.0.0+ #11 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313 gue_err+0x514/0xfa0 net/ipv4/fou.c:1032 __udp4_lib_err_encap_no_sk net/ipv4/udp.c:571 [inline] __udp4_lib_err_encap net/ipv4/udp.c:626 [inline] __udp4_lib_err+0x12e6/0x1d40 net/ipv4/udp.c:665 udp_err+0x74/0x90 net/ipv4/udp.c:737 icmp_socket_deliver net/ipv4/icmp.c:767 [inline] icmp_unreach+0xb65/0x1070 net/ipv4/icmp.c:884 icmp_rcv+0x11a1/0x1950 net/ipv4/icmp.c:1066 ip_protocol_deliver_rcu+0x584/0xbb0 net/ipv4/ip_input.c:208 ip_local_deliver_finish net/ipv4/ip_input.c:234 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] ip_local_deliver+0x624/0x7b0 net/ipv4/ip_input.c:255 dst_input include/net/dst.h:450 [inline] ip_rcv_finish net/ipv4/ip_input.c:414 [inline] NF_HOOK include/linux/netfilter.h:289 [inline] ip_rcv+0x6bd/0x740 net/ipv4/ip_input.c:524 __netif_receive_skb_one_core net/core/dev.c:4973 [inline] __netif_receive_skb net/core/dev.c:5083 [inline] process_backlog+0x756/0x10e0 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x78b/0x1a60 net/core/dev.c:6412 __do_softirq+0x53f/0x93a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:375 [inline] irq_exit+0x214/0x250 kernel/softirq.c:416 exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:536 smp_apic_timer_interrupt+0x48/0x70 arch/x86/kernel/apic/apic.c:1064 apic_timer_interrupt+0x2e/0x40 arch/x86/entry/entry_64.S:814 </IRQ> RIP: 0010:finish_lock_switch+0x2b/0x40 kernel/sched/core.c:2597 Code: 48 89 e5 53 48 89 fb e8 63 e7 95 00 8b b8 88 0c 00 00 48 8b 00 48 85 c0 75 12 48 89 df e8 dd db 95 00 c6 00 00 c6 03 00 fb 5b <5d> c3 e8 4e e6 95 00 eb e7 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 RSP: 0018:ffff888081a0fc80 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff13 RAX: ffff88821fd6bd80 RBX: ffff888027898000 RCX: ccccccccccccd000 RDX: ffff88821fca8d80 RSI: ffff888000000000 RDI: 00000000000004a0 RBP: ffff888081a0fc80 R08: 0000000000000002 R09: ffff888081a0fb08 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffff88811130e388 R14: ffff88811130da00 R15: ffff88812fdb7d80 finish_task_switch+0xfc/0x2d0 kernel/sched/core.c:2698 context_switch kernel/sched/core.c:2851 [inline] __schedule+0x6cc/0x800 kernel/sched/core.c:3491 schedule+0x15b/0x240 kernel/sched/core.c:3535 freezable_schedule include/linux/freezer.h:172 [inline] do_nanosleep+0x2ba/0x980 kernel/time/hrtimer.c:1679 hrtimer_nanosleep kernel/time/hrtimer.c:1733 [inline] __do_sys_nanosleep kernel/time/hrtimer.c:1767 [inline] __se_sys_nanosleep+0x746/0x960 kernel/time/hrtimer.c:1754 __x64_sys_nanosleep+0x3e/0x60 kernel/time/hrtimer.c:1754 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x4855a0 Code: 00 00 48 c7 c0 d4 ff ff ff 64 c7 00 16 00 00 00 31 c0 eb be 66 0f 1f 44 00 00 83 3d b1 11 5d 00 00 75 14 b8 23 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 04 e2 f8 ff c3 48 83 ec 08 e8 3a 55 fd ff RSP: 002b:0000000000a4fd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000023 RAX: ffffffffffffffda RBX: 0000000000085780 RCX: 00000000004855a0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000a4fd60 RBP: 00000000000007ec R08: 0000000000000001 R09: 0000000000ceb940 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008 R13: 0000000000a4fdb0 R14: 0000000000085711 R15: 0000000000a4fdc0 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline] kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2773 [inline] __kmalloc_node_track_caller+0xe9e/0xff0 mm/slub.c:4398 __kmalloc_reserve net/core/skbuff.c:140 [inline] __alloc_skb+0x309/0xa20 net/core/skbuff.c:208 alloc_skb include/linux/skbuff.h:1012 [inline] alloc_skb_with_frags+0x186/0xa60 net/core/skbuff.c:5287 sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2091 sock_alloc_send_skb+0xca/0xe0 net/core/sock.c:2108 __ip_append_data+0x34cd/0x5000 net/ipv4/ip_output.c:998 ip_append_data+0x324/0x480 net/ipv4/ip_output.c:1220 icmp_push_reply+0x23d/0x7e0 net/ipv4/icmp.c:375 __icmp_send+0x2ea3/0x30f0 net/ipv4/icmp.c:737 icmp_send include/net/icmp.h:47 [inline] ipv4_link_failure+0x6d/0x230 net/ipv4/route.c:1190 dst_link_failure include/net/dst.h:427 [inline] arp_error_report+0x106/0x1a0 net/ipv4/arp.c:297 neigh_invalidate+0x359/0x8e0 net/core/neighbour.c:992 neigh_timer_handler+0xdf2/0x1280 net/core/neighbour.c:1078 call_timer_fn+0x285/0x600 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers+0xdb4/0x11d0 kernel/time/timer.c:1681 run_timer_softirq+0x2e/0x50 kernel/time/timer.c:1694 __do_softirq+0x53f/0x93a kernel/softirq.c:293 Fixes: 26fc181e6cac ("fou, fou6: do not assume linear skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Stefano Brivio <sbrivio@redhat.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08net: sched: fix potential use-after-free in __tcf_chain_put()Vlad Buslov
When used with unlocked classifier that have filters attached to actions with goto chain, __tcf_chain_put() for last non action reference can race with calls to same function from action cleanup code that releases last action reference. In this case action cleanup handler could free the chain if it executes after all references to chain were released, but before all concurrent users finished using it. Modify __tcf_chain_put() to only access tcf_chain fields when holding block->lock. Remove local variables that were used to cache some tcf_chain fields and are no longer needed because their values can now be obtained directly from chain under block->lock protection. Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08vsock/virtio: fix kernel panic from virtio_transport_reset_no_sockAdalbert Lazăr
Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug"), vsock_core_init() was called from virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called before vsock_core_init() has the chance to run. [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault] [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0 [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390 [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28 [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282 [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003 [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080 [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500 [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240 [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80 [Wed Feb 27 14:17:09 2019] FS: 0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000 [Wed Feb 27 14:17:09 2019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0 [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [Wed Feb 27 14:17:09 2019] Call Trace: [Wed Feb 27 14:17:09 2019] virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common] [Wed Feb 27 14:17:09 2019] ? kfree+0x17e/0x190 [Wed Feb 27 14:17:09 2019] ? detach_buf_split+0x145/0x160 [Wed Feb 27 14:17:09 2019] ? __switch_to_asm+0x40/0x70 [Wed Feb 27 14:17:09 2019] virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport] [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40 [Wed Feb 27 14:17:09 2019] process_one_work+0x167/0x410 [Wed Feb 27 14:17:09 2019] worker_thread+0x4d/0x460 [Wed Feb 27 14:17:09 2019] kthread+0x105/0x140 [Wed Feb 27 14:17:09 2019] ? rescuer_thread+0x360/0x360 [Wed Feb 27 14:17:09 2019] ? kthread_destroy_worker+0x50/0x50 [Wed Feb 27 14:17:09 2019] ret_from_fork+0x35/0x40 [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug") Reported-by: Alexandru Herghelegiu <aherghelegiu@bitdefender.com> Signed-off-by: Adalbert Lazăr <alazar@bitdefender.com> Co-developed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08Merge tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring IO interface from Jens Axboe: "Second attempt at adding the io_uring interface. Since the first one, we've added basic unit testing of the three system calls, that resides in liburing like the other unit tests that we have so far. It'll take a while to get full coverage of it, but we're working towards it. I've also added two basic test programs to tools/io_uring. One uses the raw interface and has support for all the various features that io_uring supports outside of standard IO, like fixed files, fixed IO buffers, and polled IO. The other uses the liburing API, and is a simplified version of cp(1). This adds support for a new IO interface, io_uring. io_uring allows an application to communicate with the kernel through two rings, the submission queue (SQ) and completion queue (CQ) ring. This allows for very efficient handling of IOs, see the v5 posting for some basic numbers: https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/ Outside of just efficiency, the interface is also flexible and extendable, and allows for future use cases like the upcoming NVMe key-value store API, networked IO, and so on. It also supports async buffered IO, something that we've always failed to support in the kernel. Outside of basic IO features, it supports async polled IO as well. This particular feature has already been tested at Facebook months ago for flash storage boxes, with 25-33% improvements. It makes polled IO actually useful for real world use cases, where even basic flash sees a nice win in terms of efficiency, latency, and performance. These boxes were IOPS bound before, now they are not. This series adds three new system calls. One for setting up an io_uring instance (io_uring_setup(2)), one for submitting/completing IO (io_uring_enter(2)), and one for aux functions like registrating file sets, buffers, etc (io_uring_register(2)). Through the help of Arnd, I've coordinated the syscall numbers so merge on that front should be painless. Jon did a writeup of the interface a while back, which (except for minor details that have been tweaked) is still accurate. Find that here: https://lwn.net/Articles/776703/ Huge thanks to Al Viro for helping getting the reference cycle code correct, and to Jann Horn for his extensive reviews focused on both security and bugs in general. There's a userspace library that provides basic functionality for applications that don't need or want to care about how to fiddle with the rings directly. It has helpers to allow applications to easily set up an io_uring instance, and submit/complete IO through it without knowing about the intricacies of the rings. It also includes man pages (thanks to Jeff Moyer), and will continue to grow support helper functions and features as time progresses. Find it here: git://git.kernel.dk/liburing Fio has full support for the raw interface, both in the form of an IO engine (io_uring), but also with a small test application (t/io_uring) that can exercise and benchmark the interface" * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block: io_uring: add a few test tools io_uring: allow workqueue item to handle multiple buffered requests io_uring: add support for IORING_OP_POLL io_uring: add io_kiocb ref count io_uring: add submission polling io_uring: add file set registration net: split out functions related to registering inflight socket files io_uring: add support for pre-mapped user IO buffers block: implement bio helper to add iter bvec pages to bio io_uring: batch io_kiocb allocation io_uring: use fget/fput_many() for file references fs: add fget_many() and fput_many() io_uring: support for IO polling io_uring: add fsync support Add io_uring IO interface
2019-03-08bpf: fix warning about using plain integer as NULLBo YU
Sparse warning below: sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/ CHECK net/bpf//test_run.c net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL pointer Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Bo YU <tsu.yubo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-08xsk: fix to reject invalid options in Tx descriptorBjörn Töpel
Passing a non-existing option in the options member of struct xdp_desc was, incorrectly, silently ignored. This patch addresses that behavior, and drops any Tx descriptor with non-existing options. We have examined existing user space code, and to our best knowledge, no one is relying on the current incorrect behavior. AF_XDP is still in its infancy, so from our perspective, the risk of breakage is very low, and addressing this problem now is important. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-08xsk: fix to reject invalid flags in xsk_bindBjörn Töpel
Passing a non-existing flag in the sxdp_flags member of struct sockaddr_xdp was, incorrectly, silently ignored. This patch addresses that behavior, and rejects any non-existing flags. We have examined existing user space code, and to our best knowledge, no one is relying on the current incorrect behavior. AF_XDP is still in its infancy, so from our perspective, the risk of breakage is very low, and addressing this problem now is important. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-08sctp: call sctp_auth_init_hmacs() in sctp_sock_migrate()Xin Long
New ep's auth_hmacs should be set if old ep's is set, in case that net->sctp.auth_enable has been changed to 0 by users and new ep's auth_hmacs couldn't be set in sctp_endpoint_init(). It can even crash kernel by doing: 1. on server: sysctl -w net.sctp.auth_enable=1, sysctl -w net.sctp.addip_enable=1, sysctl -w net.sctp.addip_noauth_enable=0, listen() on server, sysctl -w net.sctp.auth_enable=0. 2. on client: connect() to server. 3. on server: accept() the asoc, sysctl -w net.sctp.auth_enable=1. 4. on client: send() asconf packet to server. The call trace: [ 245.280251] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 245.286872] RIP: 0010:sctp_auth_calculate_hmac+0xa3/0x140 [sctp] [ 245.304572] Call Trace: [ 245.305091] <IRQ> [ 245.311287] sctp_sf_authenticate+0x110/0x160 [sctp] [ 245.312311] sctp_sf_eat_auth+0xf2/0x230 [sctp] [ 245.313249] sctp_do_sm+0x9a/0x2d0 [sctp] [ 245.321483] sctp_assoc_bh_rcv+0xed/0x1a0 [sctp] [ 245.322495] sctp_rcv+0xa66/0xc70 [sctp] It's because the old ep->auth_hmacs wasn't copied to the new ep while ep->auth_hmacs is used in sctp_auth_calculate_hmac() when processing the incoming auth chunks, and it should have been done when migrating sock. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08sctp: move up sctp_auth_init_hmacs() in sctp_endpoint_init()Xin Long
sctp_auth_init_hmacs() is called only when ep->auth_enable is set. It better to move up sctp_auth_init_hmacs() and remove auth_enable check in it and check auth_enable only once in sctp_endpoint_init(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() failsXin Long
It should fail to create the new sk if sctp_bind_addr_dup() fails when accepting or peeloff an association. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08rxrpc: Fix client call connect/disconnect raceDavid Howells
rxrpc_disconnect_client_call() reads the call's connection ID protocol value (call->cid) as part of that function's variable declarations. This is bad because it's not inside the locked section and so may race with someone granting use of the channel to the call. This manifests as an assertion failure (see below) where the call in the presumed channel (0 because call->cid wasn't set when we read it) doesn't match the call attached to the channel we were actually granted (if 1, 2 or 3). Fix this by moving the read and dependent calculations inside of the channel_lock section. Also, only set the channel number and pointer variables if cid is not zero (ie. unset). This problem can be induced by injecting an occasional error in rxrpc_wait_for_channel() before the call to schedule(). Make two further changes also: (1) Add a trace for wait failure in rxrpc_connect_call(). (2) Drop channel_lock before BUG'ing in the case of the assertion failure. The failure causes a trace akin to the following: rxrpc: Assertion failed - 18446612685268945920(0xffff8880beab8c00) == 18446612685268621312(0xffff8880bea69800) is false ------------[ cut here ]------------ kernel BUG at net/rxrpc/conn_client.c:824! ... RIP: 0010:rxrpc_disconnect_client_call+0x2bf/0x99d ... Call Trace: rxrpc_connect_call+0x902/0x9b3 ? wake_up_q+0x54/0x54 rxrpc_new_client_call+0x3a0/0x751 ? rxrpc_kernel_begin_call+0x141/0x1bc ? afs_alloc_call+0x1b5/0x1b5 rxrpc_kernel_begin_call+0x141/0x1bc afs_make_call+0x20c/0x525 ? afs_alloc_call+0x1b5/0x1b5 ? __lock_is_held+0x40/0x71 ? lockdep_init_map+0xaf/0x193 ? lockdep_init_map+0xaf/0x193 ? __lock_is_held+0x40/0x71 ? yfs_fs_fetch_data+0x33b/0x34a yfs_fs_fetch_data+0x33b/0x34a afs_fetch_data+0xdc/0x3b7 afs_read_dir+0x52d/0x97f afs_dir_iterate+0xa0/0x661 ? iterate_dir+0x63/0x141 iterate_dir+0xa2/0x141 ksys_getdents64+0x9f/0x11b ? filldir+0x111/0x111 ? do_syscall_64+0x3e/0x1a0 __x64_sys_getdents64+0x16/0x19 do_syscall_64+0x7d/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 45025bceef17 ("rxrpc: Improve management and caching of client connection objects") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08sctp: remove sched init from sctp_stream_initXin Long
syzbot reported a NULL-ptr deref caused by that sched->init() in sctp_stream_init() set stream->rr_next = NULL. kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:sctp_sched_rr_dequeue+0xd3/0x170 net/sctp/stream_sched_rr.c:141 Call Trace: sctp_outq_dequeue_data net/sctp/outqueue.c:90 [inline] sctp_outq_flush_data net/sctp/outqueue.c:1079 [inline] sctp_outq_flush+0xba2/0x2790 net/sctp/outqueue.c:1205 All sched info is saved in sout->ext now, in sctp_stream_init() sctp_stream_alloc_out() will not change it, there's no need to call sched->init() again, since sctp_outq_init() has already done it. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: syzbot+4c9934f20522c0efd657@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-08route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a raceXin Long
The race occurs in __mkroute_output() when 2 threads lookup a dst: CPU A CPU B find_exception() find_exception() [fnhe expires] ip_del_fnhe() [fnhe is deleted] rt_bind_exception() In rt_bind_exception() it will bind a deleted fnhe with the new dst, and this dst will get no chance to be freed. It causes a dev defcnt leak and consecutive dmesg warnings: unregister_netdevice: waiting for ethX to become free. Usage count = 1 Especially thanks Jon to identify the issue. This patch fixes it by setting fnhe_daddr to 0 in ip_del_fnhe() to stop binding the deleted fnhe with a new dst when checking fnhe's fnhe_daddr and daddr in rt_bind_exception(). It works as both ip_del_fnhe() and rt_bind_exception() are protected by fnhe_lock and the fhne is freed by kfree_rcu(). Fixes: deed49df7390 ("route: check and remove route cache when we get route") Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-07SUNRPC: Respect RPC call timeouts when retrying transmissionTrond Myklebust
Fix a regression where soft and softconn requests are not timing out as expected. Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-07SUNRPC: Fix up RPC back channel transmissionTrond Myklebust
Now that transmissions happen through a queue, we require the RPC tasks to handle error conditions that may have been set while they were sleeping. The back channel does not currently do this, but assumes that any error condition happens during its own call to xprt_transmit(). The solution is to ensure that the back channel splits out the error handling just like the forward channel does. Fixes: 89f90fe1ad8b ("SUNRPC: Allow calls to xprt_transmit() to drain...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-07SUNRPC: Prevent thundering herd when the socket is not conn