summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2017-06-30sched/cputime: Refactor the cputime_adjust() codeGustavo A. R. Silva
Address a Coverity false positive, which is caused by overly convoluted code: Value assigned to variable 'utime' at line 619:utime = rtime; is overwritten at line 642:utime = rtime - stime; before it can be used. This makes such variable assignment useless. Remove this variable assignment and refactor the code related. Addresses-Coverity-ID: 1371643 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Frans Klaver <fransklaver@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/20170629184128.GA5271@embeddedgus Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30sched/debug: Expose the number of RT/DL tasks that can migrateDaniel Bristot de Oliveira
Add the value of the rt_rq.rt_nr_migratory and dl_rq.dl_nr_migratory to the sched_debug output, for instance: rt_rq[0]: .rt_nr_running : 2 .rt_nr_migratory : 1 <--- Like this .rt_throttled : 0 .rt_time : 828.645877 .rt_runtime : 1000.000000 This is useful to debug problems related to the RT/DL schedulers. This also fixes the format of some variables, that were unsigned, rather than signed. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/7896f71cada54ee7dd8507bb666063a2e051c3d4.1498482127.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-29sched/numa: Hide numa_wake_affine() from UP buildThomas Gleixner
Stephen reported the following build warning in UP: kernel/sched/fair.c:2657:9: warning: 'struct sched_domain' declared inside parameter list ^ /home/sfr/next/next/kernel/sched/fair.c:2657:9: warning: its scope is only this definition or declaration, which is probably not what you want Hide the numa_wake_affine() inline stub on UP builds to get rid of it. Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
2017-06-24sched/fair: Remove effective_load()Rik van Riel
The effective_load() function was only used by the NUMA balancing code, and not by the regular load balancing code. Now that the NUMA balancing code no longer uses it either, get rid of it. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20170623165530.22514-5-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-24sched/numa: Implement NUMA node level wake_affine()Rik van Riel
Since select_idle_sibling() can place a task anywhere on a socket, comparing loads between individual CPU cores makes no real sense for deciding whether to do an affine wakeup across sockets, either. Instead, compare the load between the sockets in a similar way the load balancer and the numa balancing code do. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20170623165530.22514-4-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-24sched/fair: Simplify wake_affine() for the single socket caseRik van Riel
Then 'this_cpu' and 'prev_cpu' are in the same socket, select_idle_sibling() will do its thing regardless of the return value of wake_affine(). Just return true and don't look at all the other things. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jhladky@redhat.com Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20170623165530.22514-3-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-24sched/numa: Override part of migrate_degrades_locality() when idle balancingRik van Riel
Several tests in the NAS benchmark seem to run a lot slower with NUMA balancing enabled, than with NUMA balancing disabled. The slower run time corresponds with increased idle time. Overriding the final test of migrate_degrades_locality (but still doing the other NUMA tests first) seems to improve performance of those benchmarks. Reported-by: Jirka Hladky <jhladky@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20170623165530.22514-2-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-24Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-23Merge tag 'powerpc-4.12-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.12. Most of these actually came in last week but got held up for some more testing. - three fixes for kprobes/ftrace/livepatch interactions. - properly handle data breakpoints when using the Radix MMU. - fix for perf sampling of registers during call_usermodehelper(). - properly initialise the thread_info on our emergency stacks - add an explicit flush when doing TLB invalidations for a process using NPU2. Thanks to: Alistair Popple, Naveen N. Rao, Nicholas Piggin, Ravi Bangoria, Masami Hiramatsu" * tag 'powerpc-4.12-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Initialise thread_info for emergency stacks powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD powerpc/perf: Fix oops when kthread execs user process powerpc/64s: Handle data breakpoints in Radix mode powerpc/kprobes: Skip livepatch_handler() for jprobes powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS powerpc/kprobes: Pause function_graph tracing during jprobes handling
2017-06-23Merge tag 'acpi-4.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "This fixes the ACPI-based enumeration of some I2C and SPI devices broken in 4.11. Specifics: - I2C and SPI devices are expected to be enumerated by the I2C and SPI subsystems, respectively, but due to a change made during the 4.11 cycle, in some cases the ACPI core marks them as already enumerated which causes the I2C and SPI subsystems to overlook them, so fix that (Jarkko Nikula)" * tag 'acpi-4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / scan: Fix enumeration for special SPI and I2C devices
2017-06-23Merge branch 'i2c/for-current-fixed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang. * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: Use correct function to write to register
2017-06-23Merge tag 'gpio-v4.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fix from Linus Walleij: "A single GPIO patch fixing the compatible string for the MVEBU PWM controller embedded in the GPIO controller before we release v4.12. Hopefully" * tag 'gpio-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: mvebu: change compatible string for PWM support
2017-06-23Merge tag 'sound-4.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Nothing exciting here, just a few stable fixes: - suppress spurious kernel WARNING in PCM core - fix potential spin deadlock at error handling in firewire - HD-audio PCI ID addition / fixup" * tag 'sound-4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Apply quirks to Broxton-T, too ALSA: firewire-lib: Fix stall of process context at packet error ALSA: pcm: Don't treat NULL chmap as a fatal error ALSA: hda - Add Coffelake PCI ID
2017-06-23Merge tag 'drm-fixes-for-v4.12-rc7' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A varied bunch of fixes, one for an API regression with connectors. Otherwise amdgpu and i915 have a bunch of varied fixes, the shrinker ones being the most important" * tag 'drm-fixes-for-v4.12-rc7' of git://people.freedesktop.org/~airlied/linux: drm: Fix GETCONNECTOR regression drm/radeon: add a quirk for Toshiba Satellite L20-183 drm/radeon: add a PX quirk for another K53TK variant drm/amdgpu: adjust default display clock drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating drm/amdgpu: add Polaris12 DID drm/i915: Don't enable backlight at setup time. drm/i915: Plumb the correct acquire ctx into intel_crtc_disable_noatomic() drm/i915: Fix deadlock witha the pipe A quirk during resume drm/i915: Remove __GFP_NORETRY from our buffer allocator drm/i915: Encourage our shrinker more when our shmemfs allocations fails drm/i915: Differentiate between sw write location into ring and last hw read
2017-06-23Merge tag 'random_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random fixes from Ted Ts'o: "Fix some locking and gcc optimization issues from the most recent random_for_linus_stable pull request" * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: silence compiler warnings and fix race
2017-06-23Merge tag 'for-4.12/dm-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a revert of a DM mirror commit that has proven to make the code prone to crash - a DM io reference count fix that resolves a NULL pointer seen when issuing discards to a DM mirror target's device whose mirror legs do not all support discards - a couple DM integrity fixes * tag 'for-4.12/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm io: fix duplicate bio completion due to missing ref count dm integrity: fix to not disable/enable interrupts from interrupt context Revert "dm mirror: use all available legs on multiple failures" dm integrity: reject mappings too large for device
2017-06-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: fs/exec.c: account for argv/envp pointers ocfs2: fix deadlock caused by recursive locking in xattr slub: make sysfs file removal asynchronous lib/cmdline.c: fix get_options() overflow while parsing ranges fs/dax.c: fix inefficiency in dax_writeback_mapping_range() autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings mm, thp: remove cond_resched from __collapse_huge_page_copy
2017-06-23fs/exec.c: account for argv/envp pointersKees Cook
When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: b6a2fea39318 ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23ocfs2: fix deadlock caused by recursive locking in xattrEric Ren
Another deadlock path caused by recursive locking is reported. This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in incremental way, because it's hard to find out all possible paths at once. This one can be reproduced like this. On node1, cp a large file from home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces: On node1: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_write_begin+0x43/0x1a0 [ocfs2] generic_perform_write+0xa9/0x180 __generic_file_write_iter+0x1aa/0x1d0 ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2] __vfs_write+0xc3/0x130 vfs_write+0xb1/0x1a0 SyS_write+0x46/0xa0 On node2: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_xattr_set+0x12e/0xe80 [ocfs2] ocfs2_set_acl+0x22d/0x260 [ocfs2] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2] set_posix_acl+0x75/0xb0 posix_acl_xattr_set+0x49/0xa0 __vfs_setxattr+0x69/0x80 __vfs_setxattr_noperm+0x72/0x1a0 vfs_setxattr+0xa7/0xb0 setxattr+0x12d/0x190 path_setxattr+0x9f/0xb0 SyS_setxattr+0x14/0x20 Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock"). Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren <zren@suse.com> Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23slub: make sysfs file removal asynchronousTejun Heo
Commit bf5eb3de3847 ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()") made slub sysfs file removals synchronous to kmem_cache shutdown. Unfortunately, this created a possible ABBA deadlock between slab_mutex and sysfs draining mechanism triggering the following lockdep warning. ====================================================== [ INFO: possible circular locking dependency detected ] 4.10.0-test+ #48 Not tainted ------------------------------------------------------- rmmod/1211 is trying to acquire lock: (s_active#120){++++.+}, at: [<ffffffff81308073>] kernfs_remove+0x23/0x40 but task is already holding lock: (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (slab_mutex){+.+.+.}: lock_acquire+0xf6/0x1f0 __mutex_lock+0x75/0x950 mutex_lock_nested+0x1b/0x20 slab_attr_store+0x75/0xd0 sysfs_kf_write+0x45/0x60 kernfs_fop_write+0x13c/0x1c0 __vfs_write+0x28/0x120 vfs_write+0xc8/0x1e0 SyS_write+0x49/0xa0 entry_SYSCALL_64_fastpath+0x1f/0xc2 -> #0 (s_active#120){++++.+}: __lock_acquire+0x10ed/0x1260 lock_acquire+0xf6/0x1f0 __kernfs_remove+0x254/0x320 kernfs_remove+0x23/0x40 sysfs_remove_dir+0x51/0x80 kobject_del+0x18/0x50 __kmem_cache_shutdown+0x3e6/0x460 kmem_cache_destroy+0x1fb/0x2d0 kvm_exit+0x2d/0x80 [kvm] vmx_exit+0x19/0xa1b [kvm_intel] SyS_delete_module+0x198/0x1f0 entry_SYSCALL_64_fastpath+0x1f/0xc2 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(s_active#120); lock(slab_mutex); lock(s_active#120); *** DEADLOCK *** 2 locks held by rmmod/1211: #0: (cpu_hotplug.dep_map){++++++}, at: [<ffffffff810a7877>] get_online_cpus+0x37/0x80 #1: (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0 stack backtrace: CPU: 3 PID: 1211 Comm: rmmod Not tainted 4.10.0-test+ #48 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: print_circular_bug+0x1be/0x210 __lock_acquire+0x10ed/0x1260 lock_acquire+0xf6/0x1f0 __kernfs_remove+0x254/0x320 kernfs_remove+0x23/0x40 sysfs_remove_dir+0x51/0x80 kobject_del+0x18/0x50 __kmem_cache_shutdown+0x3e6/0x460 kmem_cache_destroy+0x1fb/0x2d0 kvm_exit+0x2d/0x80 [kvm] vmx_exit+0x19/0xa1b [kvm_intel] SyS_delete_module+0x198/0x1f0 ? SyS_delete_module+0x5/0x1f0 entry_SYSCALL_64_fastpath+0x1f/0xc2 It'd be the cleanest to deal with the issue by removing sysfs files without holding slab_mutex before the rest of shutdown; however, given the current code structure, it is pretty difficult to do so. This patch punts sysfs file removal to a work item. Before commit bf5eb3de3847, the removal was punted to a RCU delayed work item which is executed after release. Now, we're punting to a different work item on shutdown which still maintains the goal removing the sysfs files earlier when destroying kmem_caches. Link: http://lkml.kernel.org/r/20170620204512.GI21326@htj.duckdns.org Fixes: bf5eb3de3847 ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23lib/cmdline.c: fix get_options() overflow while parsing rangesIlya Matveychikov
When using get_options() it's possible to specify a range of numbers, like 1-100500. The problem is that it doesn't track array size while calling internally to get_range() which iterates over the range and fills the memory with numbers. Link: http://lkml.kernel.org/r/2613C75C-B04D-4BFF-82A6-12F97BA0F620@gmail.com Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23fs/dax.c: fix inefficiency in dax_writeback_mapping_range()Jan Kara
dax_writeback_mapping_range() fails to update iteration index when searching radix tree for entries needing cache flushing. Thus each pagevec worth of entries is searched starting from the start which is inefficient and prone to livelocks. Update index properly. Link: http://lkml.kernel.org/r/20170619124531.21491-1-jack@suse.cz Fixes: 9973c98ecfda3 ("dax: add support for fsync/sync") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAILNeilBrown
If a positive status is passed with the AUTOFS_DEV_IOCTL_FAIL ioctl, autofs4_d_automount() will return ERR_PTR(status) with that status to follow_automount(), which will then dereference an invalid pointer. So treat a positive status the same as zero, and map to ENOENT. See comment in systemd src/core/automount.c::automount_send_ready(). Link: http://lkml.kernel.org/r/871sqwczx5.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Cc: Ian Kent <raven@themaw.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappingsArd Biesheuvel
Existing code that uses vmalloc_to_page() may assume that any address for which is_vmalloc_addr() returns true may be passed into vmalloc_to_page() to retrieve the associated struct page. This is not un unreasonable assumption to make, but on architectures that have CONFIG_HAVE_ARCH_HUGE_VMAP=y, it no longer holds, and we need to ensure that vmalloc_to_page() does not go off into the weeds trying to dereference huge PUDs or PMDs as table entries. Given that vmalloc() and vmap() themselves never create huge mappings or deal with compound pages at all, there is no correct answer in this case, so return NULL instead, and issue a warning. When reading /proc/kcore on arm64, you will hit an oops as soon as you hit the huge mappings used for the various segments that make up the mapping of vmlinux. With this patch applied, you will no longer hit the oops, but the kcore contents willl be incorrect (these regions will be zeroed out) We are fixing this for kcore specifically, so it avoids vread() for those regions. At least one other problematic user exists, i.e., /dev/kmem, but that is currently broken on arm64 for other reasons. Link: http://lkml.kernel.org/r/20170609082226.26152-1-ard.biesheuvel@linaro.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: zhong jiang <zhongjiang@huawei.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23mm, thp: remove cond_resched from __collapse_huge_page_copyDavid Rientjes
This is a partial revert of commit 338a16ba1549 ("mm, thp: copying user pages must schedule on collapse") which added a cond_resched() to __collapse_huge_page_copy(). On x86 with CONFIG_HIGHPTE, __collapse_huge_page_copy is called in atomic context and thus scheduling is not possible. This is only a possible config on arm and i386. Although need_resched has been shown to be set for over 100 jiffies while doing the iteration in __collapse_huge_page_copy, this is better than doing if (in_atomic()) cond_resched() to cover only non-CONFIG_HIGHPTE configs. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two fixes to remove spurious WARN_ONs from the new(ish) qedi driver. The driver already prints a warning message, there's no need to panic users by printing something that looks like an oops as well" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qedi: Remove WARN_ON from clear task context. scsi: qedi: Remove WARN_ON for untracked cleanup.
2017-06-23Merge tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Darrick Wong: "I have one more bugfix for you for 4.12-rc7 to fix a disk corruption problem: - don't allow swapon on files on the realtime device, because the swap code will swap pages out to blocks on the data device, thereby corrupting the filesystem" * tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allow bmap on rt files
2017-06-23sched/rt: Move RT related code from sched/core.c to sched/rt.cNicolas Pitre
This helps making sched/core.c smaller and hopefully easier to understand and maintain. Signed-off-by: Nicolas Pitre <nico@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170621182203.30626-3-nicolas.pitre@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-23sched/deadline: Move DL related code from sched/core.c to sched/deadline.cNicolas Pitre
This helps making sched/core.c smaller and hopefully easier to understand and maintain. Signed-off-by: Nicolas Pitre <nico@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170621182203.30626-2-nicolas.pitre@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-23sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabledNicolas Pitre
Make CONFIG_CPUSETS=y depend on SMP as this feature makes no sense on UP. This allows for configuring out cpuset_cpumask_can_shrink() and task_can_attach() entirely, which shrinks the kernel a bit. Signed-off-by: Nicolas Pitre <nico@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614171926.8345-2-nicolas.pitre@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-23powerpc/64: Initialise thread_info for emergency stacksNicholas Piggin
Emergency stacks have their thread_info mostly uninitialised, which in particular means garbage preempt_count values. Emergency stack code runs with interrupts disabled entirely, and is used very rarely, so this has been unnoticed so far. It was found by a proposed new powerpc watchdog that takes a soft-NMI directly from the masked_interrupt handler and using the emergency stack. That crashed at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be garbage. To fix this, zero the entire THREAD_SIZE allocation, and initialize the thread_info. Cc: stable@vger.kernel.org Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move it all into setup_64.c, use a function not a macro. Fix crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23Merge tag 'drm-misc-fixes-2017-06-22' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-fixes UAPI Changes: - drm: Fix regression in GETCONNECTOR ioctl returning stale properties (Daniel) Cc: Daniel Vetter <daniel.vetter@ffwll.ch> * tag 'drm-misc-fixes-2017-06-22' of git://anongit.freedesktop.org/git/drm-misc: drm: Fix GETCONNECTOR regression
2017-06-22Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Various small fixes for stable" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix some return values in case of error in 'crypt_message' cifs: remove redundant return in cifs_creation_time_get CIFS: Improve readdir verbosity CIFS: check if pages is null rather than bv for a failed allocation CIFS: Set ->should_dirty in cifs_user_readv()
2017-06-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "MIPS: - Fix build with KVM, DYNAMIC_DEBUG and JUMP_LABEL. PPC: - Fix host crashes/hangs on POWER9. - Properly restore userspace state after KVM_RUN ioctl. s390: - Fix address translation in odd-ball cases (real-space designation ASCEs). x86: - Fix privilege escalation in 64-bit Windows guests All patches are for stable and the x86 also has a CVE" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix singlestepping over syscall KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows KVM: MIPS: Fix maybe-uninitialized build failure KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1 KVM: PPC: Book3S HV: Save/restore host values of debug registers KVM: PPC: Book3S HV: Preserve userspace HTM state properly KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit KVM: PPC: Book3S HV: Context-switch EBB registers properly KVM: PPC: Book3S HV: Cope with host using large decrementer mode
2017-06-22Merge tag 'mfd-fixes-4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fixes from Lee Jones: - arizona: use address passed in, rather than hard coded value - correct STM32 clock-names value in DT binding documentation * tag 'mfd-fixes-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: dt-bindings: mfd: Update STM32 timers clock names mfd: arizona: Fix typo using hard-coded register
2017-06-22KVM: x86: fix singlestepping over syscallPaolo Bonzini
TF is handled a bit differently for syscall and sysret, compared to the other instructions: TF is checked after the instruction completes, so that the OS can disable #DB at a syscall by adding TF to FMASK. When the sysret is executed the #DB is taken "as if" the syscall insn just completed. KVM emulates syscall so that it can trap 32-bit syscall on Intel processors. Fix the behavior, otherwise you could get #DB on a user stack which is not nice. This does not affect Linux guests, as they use an IST or task gate for #DB. This fixes CVE-2017-7518. Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-06-22Merge tag 'kvm-s390-master-4.12-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: fix shadow table handling for nested guests Some odd-ball cases (real-space designation ASCEs) are handled wrong for the shadow page tables. Fix it.
2017-06-22powerpc/powernv/npu-dma: Add explicit flush when sending an ATSDAlistair Popple
NPU2 requires an extra explicit flush to an active GPU PID when sending address translation shoot downs (ATSDs) to reliably flush the GPU TLB. This patch adds just such a flush at the end of each sequence of ATSDs. We can safely use PID 0 which is always reserved and active on the GPU. PID 0 is only used for init_mm which will never be a user mm on the GPU. To enforce this we add a check in pnv_npu2_init_context() just in case someone tries to use PID 0 on the GPU. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Use true/false for bool literals] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-22KVM: s390: gaccess: fix real-space designation asce handling for gmap shadowsHeiko Carstens
For real-space designation asces the asce origin part is only a token. The asce token origin must not be used to generate an effective address for storage references. This however is erroneously done within kvm_s390_shadow_tables(). Furthermore within the same function the wrong parts of virtual addresses are used to generate a corresponding real address (e.g. the region second index is used as region first index). Both of the above can result in incorrect address translations. Only for real space designations with a token origin of zero and addresses below one megabyte the translation was correct. Furthermore replace a "!asce.r" statement with a "!*fake" statement to make it more obvious that a specific condition has nothing to do with the architecture, but with the fake handling of real space designations. Fixes: 3218f7094b6b ("s390/mm: support real-space for gmap shadows") Cc: David Hildenbrand <david@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-06-22sched/fair: Spare idle load balancing on nohz_full CPUsFrederic Weisbecker
Although idle load balancing obviously only concerns idle CPUs, it can be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get rid of an idle load balancing duty once a tick fires while it runs a task and this can take a while on a nohz_full CPU. We could fix that and escape the idle load balancing duty from the very idle exit path but that would bring unecessary overhead. Lets just not bother and leave that job to housekeeping CPUs (those outside nohz_full range). The nohz_full CPUs simply don't want any disturbance. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497838322-10913-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22nohz: Move idle balancer registration to the idle pathFrederic Weisbecker
The idle load balancing registration path assumes that we only stop the tick when the CPU is idle, ignoring the nohz full case. As a result, a nohz full CPU that is running a task may be chosen to perform idle load balancing. Lets make sure that only CPUs in dynticks idle mode can be picked as idle load balancers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497838322-10913-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22sched/loadavg: Generalize "_idle" naming to "_nohz"Frederic Weisbecker
The loadavg naming code still assumes that nohz == idle whereas its code is actually handling well both nohz idle and nohz full. So lets fix the naming according to what the code actually does, to unconfuse the reader. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1497838322-10913-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22i2c: imx: Use correct function to write to registerMichail Georgios Etairidis
The i2c-imx driver incorrectly uses readb()/writeb() to read and write to the appropriate registers when performing a repeated start. The appropriate imx_i2c_read_reg()/imx_i2c_write_reg() functions should be used instead. Performing a repeated start results in a kernel panic. The platform is imx. Signed-off-by: Michail G Etairidis <m.etairidis@beck-ipc.com> Fixes: ce1a78840ff7 ("i2c: imx: add DMA support for freescale i2c driver") Fixes: 054b62d9f25c ("i2c: imx: fix the i2c bus hang issue when do repeat restart") Acked-by: Fugang Duan <fugang.duan@nxp.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-06-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "This contains a set of fixes for xen-blkback by way of Konrad, and a performance regression fix for blk-mq for shared tags. The latter could account for as much as a 50x reduction in performance, with the test case from the user with 500 name spaces. A more realistic setup on my end with 32 drives showed a 3.5x drop. The fix has been thoroughly tested before being committed" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: fix performance regression with shared tags xen-blkback: don't leak stack data via response ring xen/blkback: don't use xen_blkif_get() in xen-blkback kthread xen/blkback: don't free be structure too early xen/blkback: fix disconnect while I/Os in flight
2017-06-21xfs: don't allow bmap on rt filesDarrick J. Wong
bmap returns a dumb LBA address but not the block device that goes with that LBA. Swapfiles don't care about this and will blindly assume that the data volume is the correct blockdev, which is totally bogus for files on the rt subvolume. This results in the swap code doing IOs to arbitrary locations on the data device(!) if the passed in mapping is a realtime file, so just turn off bmap for rt files. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-21ACPI / scan: Fix enumeration for special SPI and I2C devicesJarkko Nikula
Commit f406270bf73d ("ACPI / scan: Set the visited flag for all enumerated devices") caused that two group of special SPI or I2C devices do not enumerate. SPI and I2C devices are expected to be enumerated by the SPI and I2C subsystems but change caused that acpi_bus_attach() marks those devices with acpi_device_set_enumerated(). First group of devices are matched using Device Tree compatible property with special _HID "PRP0001". Those devices have matched scan handler, acpi_scan_attach_handler() retuns 1 and acpi_bus_attach() marks them with acpi_device_set_enumerated(). Second group of devices without valid _HID such as "LNXVIDEO" have device->pnp.type.platform_id set to zero and change again marks them with acpi_device_set_enumerated(). Fix this by flagging the SPI and I2C devices during struct acpi_device object initialization time and let the code in acpi_bus_attach() to go through the device_attach() and acpi_default_enumeration() path for all SPI and I2C devices. Fixes: f406270bf73d (ACPI / scan: Set the visited flag for all enumerated devices) Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: 4.11+ <stable@vger.kernel.org> # 4.11+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix refcounting wrt timers which hold onto inet6 address objects, from Xin Long. 2) Fix an ancient bug in wireless wext ioctls, from Johannes Berg. 3) Firmware handling fixes in brcm80211 driver, from Arend Van Spriel. 4) Several mlx5 driver fixes (firmware readiness, timestamp cap reporting, devlink command validity checking, tc offloading, etc.) From Eli Cohen, Maor Dickman, Chris Mi, and Or Gerlitz. 5) Fix dst leak in IP/IP6 tunnels, from Haishuang Yan. 6) Fix dst refcount bug in decnet, from Wei Wang. 7) Netdev can be double freed in register_vlan_device(). Fix from Gao Feng. 8) Don't allow object to be destroyed while it is being dumped in SCTP, from Xin Long. 9) Fix dpaa_eth build when modular, from Madalin Bucur. 10) Fix throw route leaks, from Serhey Popovych. 11) IFLA_GROUP missing from if_nlmsg_size() and ifla_policy[] table, also from Serhey Popovych. 12) Fix premature TX SKB free in stmmac, from Niklas Cassel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits) igmp: add a missing spin_lock_init() net: stmmac: free an skb first when there are no longer any descriptors using it sfc: remove duplicate up_write on VF filter_sem rtnetlink: add IFLA_GROUP to ifla_policy ipv6: Do not leak throw route references dt-bindings: net: sms911x: Add missing optional VDD regulators dpaa_eth: reuse the dma_ops provided by the FMan MAC device fsl/fman: propagate dma_ops net/core: remove explicit do_softirq() from busy_poll_stop() fib_rules: Resolve goto rules target on delete sctp: ensure ep is not destroyed before doing the dump net/hns:bugfix of ethtool -t phy self_test net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev cxgb4: notify uP to route ctrlq compl to rdma rspq ip6_tunnel: Correct tos value in collect_md mode decnet: always not take dst->__refcnt when inserting dst into hash table ip6_tunnel: fix potential issue in __ip6_tnl_rcv ip_tunnel: fix potential issue in ip_tunnel_rcv brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2() net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it ...
2017-06-21Merge tag 'pinctrl-v4.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull more pin control fixes from Linus Walleij: "Some late arriving fixes. I should have sent earlier, just swamped with work as usual. Thomas patch makes AMD systems usable despite firmware bugs so it is fairly important. - Make the AMD driver use a regular interrupt rather than a chained one, so the system does not lock up. - Fix a function call error deep inside the STM32 driver" * tag 'pinctrl-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: stm32: Fix bad function call pinctrl/amd: Use regular interrupt instead of chained
2017-06-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - revert of a commit to magicmouse driver that regressess certain devices, from Daniel Stone - quirk for a specific Dell mouse, from Sebastian Parschauer * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: Revert "HID: magicmouse: Set multi-touch keybits for Magic Mouse" HID: Add quirk for Dell PIXART OEM mouse
2017-06-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "Fix the way how livepatches are being stacked with respect to RCU, from Petr Mladek" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Fix stacking of patches with respect to RCU