summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)Author
2015-02-13x86: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Unnecessary buffer size calculation and condition on the lenght removed from intel_cacheinfo.c::show_shared_cpu_map_func(). * uv_nmi_nr_cpus_pr() got overly smart and implemented "..." abbreviation if the output stretched over the predefined 1024 byte buffer. Replaced with plain printk. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13tile: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. * Spurious if (len > 1) test dropped from shared_cpu_map_show(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13mips: use %*pb[l] to print bitmaps including cpumasks and nodemasksTejun Heo
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds
Pull arch/tile changes from Chris Metcalf: "Not much in this batch, just some minor cleanups" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: change MAINTAINERS website from tilera.com to ezchip.com tile: enable sparse checks for get/put_user tile: fix put_user sparse errors tile: default to little endian on older toolchains
2015-02-13Revert "x86/apic: Only disable CPU x2apic mode when necessary"Linus Torvalds
This reverts commit 5fcee53ce705d49c766f8a302c7e93bdfc33c124. It causes the suspend to fail on at least the Chromebook Pixel, possibly other platforms too. Joerg Roedel points out that the logic should probably have been if (max_physical_apicid > 255 || !(IS_ENABLED(CONFIG_HYPERVISOR_GUEST) && hypervisor_x2apic_available())) { instead, but since the code is not in any fast-path, so we can just live without that optimization and just revert to the original code. Acked-by: Joerg Roedel <joro@8bytes.org> Acked-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge third set of updates from Andrew Morton: - the rest of MM [ This includes getting rid of the numa hinting bits, in favor of just generic protnone logic. Yay. - Linus ] - core kernel - procfs - some of lib/ (lots of lib/ material this time) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits) lib/lcm.c: replace include lib/percpu_ida.c: remove redundant includes lib/strncpy_from_user.c: replace module.h include lib/stmp_device.c: replace module.h include lib/sort.c: move include inside #if 0 lib/show_mem.c: remove redundant include lib/radix-tree.c: change to simpler include lib/plist.c: remove redundant include lib/nlattr.c: remove redundant include lib/kobject_uevent.c: remove redundant include lib/llist.c: remove redundant include lib/md5.c: simplify include lib/list_sort.c: rearrange includes lib/genalloc.c: remove redundant include lib/idr.c: remove redundant include lib/halfmd4.c: simplify includes lib/dynamic_queue_limits.c: simplify includes lib/sort.c: use simpler includes lib/interval_tree.c: simplify includes hexdump: make it return number of bytes placed in buffer ...
2015-02-12lib/string.c: remove strnicmp()Rasmus Villemoes
Now that all in-tree users of strnicmp have been converted to strncasecmp, the wrapper can be removed. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: David Howells <dhowells@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12kernel.h: remove ancient __FUNCTION__ hackRasmus Villemoes
__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so this only helps people who only test-compile using 3.3 (compiler-gcc3.h barks at anything older than that). Besides, there are almost no occurrences of __FUNCTION__ left in the tree. [akpm@linux-foundation.org: convert remaining __FUNCTION__ references] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12powerpc: add running_clock for powerpc to prevent spurious softlockup warningsCyril Bur
On POWER8 virtualised kernels the VTB register can be read to have a view of time that only increases while the guest is running. This will prevent guests from seeing time jump if a guest is paused for significant amounts of time. On POWER7 and below virtualised kernels stolen time is subtracted from local_clock as a best effort approximation. This will not eliminate spurious warnings in the case of a suspended guest but may reduce the occurance in the case of softlockups due to host over commit. Bare metal kernels should avoid reading the VTB as KVM does not restore sane values when not executing, the approxmation is fine as host kernels won't observe any stolen time. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Jones <drjones@redhat.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: chai wen <chaiw.fnst@cn.fujitsu.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Ben Zhang <benzh@chromium.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12all arches, signal: move restart_block to struct task_structAndy Lutomirski
If an attacker can cause a controlled kernel stack overflow, overwriting the restart block is a very juicy exploit target. This is because the restart_block is held in the same memory allocation as the kernel stack. Moving the restart block to struct task_struct prevents this exploit by making the restart_block harder to locate. Note that there are other fields in thread_info that are also easy targets, at least on some architectures. It's also a decent simplification, since the restart code is more or less identical on all architectures. [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Acked-by: Richard Weinberger <richard@nod.at> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Steven Miao <realmz6@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12arch/frv/mm/extable.c: remove unused functionRickard Strandqvist
Remove the function search_one_table() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12x86: mm: restore original pte_special checkMel Gorman
Commit b38af4721f59 ("x86,mm: fix pte_special versus pte_numa") adjusted the pte_special check to take into account that a special pte had SPECIAL and neither PRESENT nor PROTNONE. Now that NUMA hinting PTEs are no longer modifying _PAGE_PRESENT it should be safe to restore the original pte_special behaviour. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12mm: remove remaining references to NUMA hinting bits and helpersMel Gorman
This patch removes the NUMA PTE bits and associated helpers. As a side-effect it increases the maximum possible swap space on x86-64. One potential source of problems is races between the marking of PTEs PROT_NONE, NUMA hinting faults and migration. It must be guaranteed that a PTE being protected is not faulted in parallel, seen as a pte_none and corrupting memory. The base case is safe but transhuge has problems in the past due to an different migration mechanism and a dependance on page lock to serialise migrations and warrants a closer look. task_work hinting update parallel fault ------------------------ -------------- change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault pmd_none do_huge_pmd_anonymous_page read? pmd_lock blocks until hinting complete, fail !pmd_none test write? __do_huge_pmd_anonymous_page acquires pmd_lock, checks pmd_none pmd_modify set_pmd_at task_work hinting update parallel migration ------------------------ ------------------ change_pmd_range change_huge_pmd __pmd_trans_huge_lock pmdp_get_and_clear __handle_mm_fault do_huge_pmd_numa_page migrate_misplaced_transhuge_page pmd_lock waits for updates to complete, recheck pmd_same pmd_modify set_pmd_at Both of those are safe and the case where a transhuge page is inserted during a protection update is unchanged. The case where two processes try migrating at the same time is unchanged by this series so should still be ok. I could not find a case where we are accidentally depending on the PTE not being cleared and flushed. If one is missed, it'll manifest as corruption problems that start triggering shortly after this series is merged and only happen when NUMA balancing is enabled. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12mm: convert p[te|md]_mknonnuma and remaining page table manipulationsMel Gorman
With PROT_NONE, the traditional page table manipulation functions are sufficient. [andre.przywara@arm.com: fix compiler warning in pmdp_invalidate()] [akpm@linux-foundation.org: fix build with STRICT_MM_TYPECHECKS] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12ppc64: add paranoid warnings for unexpected DSISR_PROTFAULTMel Gorman
ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if they are triggered. This patch adds warnings just in case they are being accidentally depended upon. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12mm: convert p[te|md]_numa users to p[te|md]_protnone_numaMel Gorman
Convert existing users of pte_numa and friends to the new helper. Note that the kernel is broken after this patch is applied until the other page table modifiers are also altered. This patch layout is to make review easier. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12mm: add p[te|md] protnone helpers for use by NUMA balancingMel Gorman
This is a preparatory patch that introduces protnone helpers for automatic NUMA balancing. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Jones <davej@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12Merge branch 'for-3.20/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver changes from Jens Axboe: "This contains: - The 4k/partition fixes for brd from Boaz/Matthew. - A few xen front/back block fixes from David Vrabel and Roger Pau Monne. - Floppy changes from Takashi, cleaning the device file creation. - Switching libata to use the new blk-mq tagging policy, removing code (and a suboptimal implementation) from libata. This will throw you a merge conflict, since a bug in the original libata tagging code was fixed since this code was branched. Trivial. From Shaohua. - Conversion of loop to blk-mq, from Ming Lei. - Cleanup of the io_schedule() handling in bsg from Peter Zijlstra. He claims it improves on unreadable code, which will cost him a beer. - Maintainer update or NDB, now handled by Markus Pargmann. - NVMe: - Optimization from me that avoids a kmalloc/kfree per IO for smaller (<= 8KB) IO. This cuts about 1% of high IOPS CPU overhead. - Removal of (now) dead RCU code, a relic from before NVMe was converted to blk-mq" * 'for-3.20/drivers' of git://git.kernel.dk/linux-block: xen-blkback: default to X86_32 ABI on x86 xen-blkfront: fix accounting of reqs when migrating xen-blkback,xen-blkfront: add myself as maintainer block: Simplify bsg complete all floppy: Avoid manual call of device_create_file() NVMe: avoid kmalloc/kfree for smaller IO MAINTAINERS: Update NBD maintainer libata: make sata_sil24 use fifo tag allocator libata: move sas ata tag allocation to libata-scsi.c libata: use blk taging NVMe: within nvme_free_queues(), delete RCU sychro/deferred free null_blk: suppress invalid partition info brd: Request from fdisk 4k alignment brd: Fix all partitions BUGs axonram: Fix bug in direct_access loop: add blk-mq.h include block: loop: don't handle REQ_FUA explicitly block: loop: introduce lo_discard() and lo_req_flush() block: loop: say goodby to bio block: loop: improve performance via blk-mq
2015-02-12Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO changes from Jens Axboe: "This contains: - A series from Christoph that cleans up and refactors various parts of the REQ_BLOCK_PC handling. Contributions in that series from Dongsu Park and Kent Overstreet as well. - CFQ: - A bug fix for cfq for realtime IO scheduling from Jeff Moyer. - A stable patch fixing a potential crash in CFQ in OOM situations. From Konstantin Khlebnikov. - blk-mq: - Add support for tag allocation policies, from Shaohua. This is a prep patch enabling libata (and other SCSI parts) to use the blk-mq tagging, instead of rolling their own. - Various little tweaks from Keith and Mike, in preparation for DM blk-mq support. - Minor little fixes or tweaks from me. - A double free error fix from Tony Battersby. - The partition 4k issue fixes from Matthew and Boaz. - Add support for zero+unprovision for blkdev_issue_zeroout() from Martin" * 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits) block: remove unused function blk_bio_map_sg block: handle the null_mapped flag correctly in blk_rq_map_user_iov blk-mq: fix double-free in error path block: prevent request-to-request merging with gaps if not allowed blk-mq: make blk_mq_run_queues() static dm: fix multipath regression due to initializing wrong request cfq-iosched: handle failure of cfq group allocation block: Quiesce zeroout wrapper block: rewrite and split __bio_copy_iov() block: merge __bio_map_user_iov into bio_map_user_iov block: merge __bio_map_kern into bio_map_kern block: pass iov_iter to the BLOCK_PC mapping functions block: add a helper to free bio bounce buffer pages block: use blk_rq_map_user_iov to implement blk_rq_map_user block: simplify bio_map_kern block: mark blk-mq devices as stackable block: keep established cmd_flags when cloning into a blk-mq request block: add blk-mq support to blk_insert_cloned_request() block: require blk_rq_prep_clone() be given an initialized clone request blk-mq: add tag allocation policy ...
2015-02-12Merge tag 'md/3.20' of git://neil.brown.name/mdLinus Torvalds
Pull md updates from Neil Brown: - assorted locking changes so that access to /proc/mdstat and much of /sys/block/mdXX/md/* is protected by a spinlock rather than a mutex and will never block indefinitely. - Make an 'if' condition in RAID5 - which has been implicated in recent bugs - more readable. - misc minor fixes * tag 'md/3.20' of git://neil.brown.name/md: (28 commits) md/raid10: fix conversion from RAID0 to RAID10 md: wakeup thread upon rdev_dec_pending() md: make reconfig_mutex optional for writes to md sysfs files. md: move mddev_lock and related to md.h md: use mddev->lock to protect updates to resync_{min,max}. md: minor cleanup in safe_delay_store. md: move GET_BITMAP_FILE ioctl out from mddev_lock. md: tidy up set_bitmap_file md: remove unnecessary 'buf' from get_bitmap_file. md: remove mddev_lock from rdev_attr_show() md: remove mddev_lock() from md_attr_show() md/raid5: use ->lock to protect accessing raid5 sysfs attributes. md: remove need for mddev_lock() in md_seq_show() md/bitmap: protect clearing of ->bitmap by mddev->lock md: protect ->pers changes with mddev->lock md: level_store: group all important changes into one place. md: rename ->stop to ->free md: split detach operation out from ->stop. md/linear: remove rcu protections in favour of suspend/resume md: make merge_bvec_fn more robust in face of personality changes. ...
2015-02-12Merge tag 'iommu-updates-v3.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This time with: - Generic page-table framework for ARM IOMMUs using the LPAE page-table format, ARM-SMMU and Renesas IPMMU make use of it already. - Break out the IO virtual address allocator from the Intel IOMMU so that it can be used by other DMA-API implementations too. The first user will be the ARM64 common DMA-API implementation for IOMMUs - Device tree support for Renesas IPMMU - Various fixes and cleanups all over the place" * tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits) iommu/amd: Convert non-returned local variable to boolean when relevant iommu: Update my email address iommu/amd: Use wait_event in put_pasid_state_wait iommu/amd: Fix amd_iommu_free_device() iommu/arm-smmu: Avoid build warning iommu/fsl: Various cleanups iommu/fsl: Use %pa to print phys_addr_t iommu/omap: Print phys_addr_t using %pa iommu: Make more drivers depend on COMPILE_TEST iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered iommu: Disable on !MMU builds iommu/fsl: Remove unused fsl_of_pamu_ids[] iommu/fsl: Fix section mismatch iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator iommu: Fix trace_map() to report original iova and original size iommu/arm-smmu: add support for iova_to_phys through ATS1PR iopoll: Introduce memory-mapped IO polling macros iommu/arm-smmu: don't touch the secure STLBIALL register iommu/arm-smmu: make use of generic LPAE allocator iommu: io-pgtable-arm: add non-secure quirk ...
2015-02-12Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: - clang assembly fixes from Ard - optimisations and cleanups for Aurora L2 cache support - efficient L2 cache support for secure monitor API on Exynos SoCs - debug menu cleanup from Daniel Thompson to allow better behaviour for multiplatform kernels - StrongARM SA11x0 conversion to irq domains, and pxa_timer - kprobes updates for older ARM CPUs - move probes support out of arch/arm/kernel to arch/arm/probes - add inline asm support for the rbit (reverse bits) instruction - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs) - remove the unused ARMv3 user access code - add driver_override support to AMBA Primecell bus * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits) ARM: 8256/1: driver coamba: add device binding path 'driver_override' ARM: 8301/1: qcom: Use secondary_startup_arm() ARM: 8302/1: Add a secondary_startup that assumes ARM mode ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip ARM: kprobes: Fix compilation error caused by superfluous '*' ARM: 8297/1: cache-l2x0: optimize aurora range operations ARM: 8296/1: cache-l2x0: clean up aurora cache handling ARM: 8284/1: sa1100: clear RCSR_SMR on resume ARM: 8283/1: sa1100: collie: clear PWER register on machine init ARM: 8282/1: sa1100: use handle_domain_irq ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver ARM: 8280/1: sa1100: switch to irq_domain_add_simple() ARM: 8279/1: sa1100: merge both GPIO irqdomains ARM: 8278/1: sa1100: split irq handling for low GPIOs ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code ARM: 8290/1: decompressor: fix a wrong comment ARM: 8286/1: mm: Fix dma_contiguous_reserve comment ARM: 8248/1: pm: remove outdated comment ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X) ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX ...
2015-02-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 update from Hans-Christian Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: avr32: update all default configurations avr32: remove fake at91 cpu identification avr32: wire up missing syscalls
2015-02-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second set of updates from Andrew Morton: "More of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() vmstat: Reduce time interval to stat update on idle cpu mm/page_owner.c: remove unnecessary stack_trace field Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files mm: incorporate read-only pages into transparent huge pages vmstat: do not use deferrable delayed work for vmstat_update mm: more aggressive page stealing for UNMOVABLE allocations mm: always steal split buddies in fallback allocations mm: when stealing freepages, also take pages created by splitting buddy page mincore: apply page table walker on do_mincore() mm: /proc/pid/clear_refs: avoid split_huge_page() mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP) mempolicy: apply page table walker on queue_pages_range() arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma() memcg: cleanup preparation for page table walk numa_maps: remove numa_maps->vma numa_maps: fix typo in gather_hugetbl_stats pagemap: use walk->vma instead of calling find_vma() clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk() ...
2015-02-11Merge tag 'powerpc-3.20-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Update of all defconfigs - Addition of a bunch of config options to modernise our defconfigs - Some PS3 updates from Geoff - Optimised memcmp for 64 bit from Anton - Fix for kprobes that allows 'perf probe' to work from Naveen - Several cxl updates from Ian & Ryan - Expanded support for the '24x7' PMU from Cody & Sukadev - Freescale updates from Scott: "Highlights include 8xx optimizations, some more work on datapath device tree content, e300 machine check support, t1040 corenet error reporting, and various cleanups and fixes" * tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits) cxl: Add missing return statement after handling AFU errror cxl: Fail AFU initialisation if an invalid configuration record is found cxl: Export optional AFU configuration record in sysfs powerpc/mm: Warn on flushing tlb page in kernel context powerpc/powernv: Add OPAL soft-poweroff routine powerpc/perf/hv-24x7: Document sysfs event description entries powerpc/perf/hv-gpci: add the remaining gpci requests powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated powerpc/perf/hv-24x7: parse catalog and populate sysfs with events perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper perf: add PMU_EVENT_ATTR_STRING() helper perf: provide sysfs_show for struct perf_pmu_events_attr powerpc/kernel: Avoid initializing device-tree pointer twice powerpc: Remove old compile time disabled syscall tracing code powerpc/kernel: Make syscall_exit a local label cxl: Fix device_node reference counting powerpc/mm: bail out early when flushing TLB page powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80) perf/powerpc: reset event hw state when adding it to the PMU powerpc/qe: Use strlcpy() ...
2015-02-11Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "arm64 updates for 3.20: - reimplementation of the virtual remapping of UEFI Runtime Services in a way that is stable across kexec - emulation of the "setend" instruction for 32-bit tasks (user endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set accordingly) - compat_sys_call_table implemented in C (from asm) and made it a constant array together with sys_call_table - export CPU cache information via /sys (like other architectures) - DMA API implementation clean-up in preparation for IOMMU support - macros clean-up for KVM - dropped some unnecessary cache+tlb maintenance - CONFIG_ARM64_CPU_SUSPEND clean-up - defconfig update (CPU_IDLE) The EFI changes going via the arm64 tree have been acked by Matt Fleming. There is also a patch adding sys_*stat64 prototypes to include/linux/syscalls.h, acked by Andrew Morton" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits) arm64: compat: Remove incorrect comment in compat_siginfo arm64: Fix section mismatch on alloc_init_p[mu]d() arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros arm64: mm: use *_sect to check for section maps arm64: drop unnecessary cache+tlb maintenance arm64:mm: free the useless initial page table arm64: Enable CPU_IDLE in defconfig arm64: kernel: remove ARM64_CPU_SUSPEND config option arm64: make sys_call_table const arm64: Remove asm/syscalls.h arm64: Implement the compat_sys_call_table in C syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64 compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes arm64: uapi: expose our struct ucontext to the uapi headers smp, ARM64: Kill SMP single function call interrupt arm64: Emulate SETEND for AArch32 tasks arm64: Consolidate hotplug notifier for instruction emulation arm64: Track system support for mixed endian EL0 arm64: implement generic IOMMU configuration arm64: Combine coherent and non-coherent swiotlb dma_ops ...
2015-02-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - The remaining patches for the z13 machine support: kernel build option for z13, the cache synonym avoidance, SMT support, compare-and-delay for spinloops and the CES5S crypto adapater. - The ftrace support for function tracing with the gcc hotpatch option. This touches common code Makefiles, Steven is ok with the changes. - The hypfs file system gets an extension to access diagnose 0x0c data in user space for performance analysis for Linux running under z/VM. - The iucv hvc console gets wildcard spport for the user id filtering. - The cacheinfo code is converted to use the generic infrastructure. - Cleanup and bug fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/process: free vx save area when releasing tasks s390/hypfs: Eliminate hypfs interval s390/hypfs: Add diagnose 0c support s390/cacheinfo: don't use smp_processor_id() in preemptible context s390/zcrypt: fixed domain scanning problem (again) s390/smp: increase maximum value of NR_CPUS to 512 s390/jump label: use different nop instruction s390/jump label: add sanity checks s390/mm: correct missing space when reporting user process faults s390/dasd: cleanup profiling s390/dasd: add locking for global_profile access s390/ftrace: hotpatch support for function tracing ftrace: let notrace function attribute disable hotpatching if necessary ftrace: allow architectures to specify ftrace compile options s390: reintroduce diag 44 calls for cpu_relax() s390/zcrypt: Add support for new crypto express (CEX5S) adapter. s390/zcrypt: Number of supported ap domains is not retrievable. s390/spinlock: add compare-and-delay to lock wait loops s390/tape: remove redundant if statement s390/hvc_iucv: add simple wildcard matches to the iucv allow filter ...
2015-02-11arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()Naoya Horiguchi
We don't have to use mm_walk->private to pass vma to the callback function because of mm_walk->vma. And walk_page_vma() is useful if we walk over a single vma. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: gup: use get_user_pages_unlocked within get_user_pages_fastAndrea Arcangeli
This allows the get_user_pages_fast slow path to release the mmap_sem before blocking. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: fix false-positive warning on exit due mm_nr_pmds(mm)Kirill A. Shutemov
The problem is that we check nr_ptes/nr_pmds in exit_mmap() which happens *before* pgd_free(). And if an arch does pte/pmd allocation in pgd_alloc() and frees them in pgd_free() we see offset in counters by the time of the checks. We tried to workaround this by offsetting expected counter value according to FIRST_USER_ADDRESS for both nr_pte and nr_pmd in exit_mmap(). But it doesn't work in some cases: 1. ARM with LPAE enabled also has non-zero USER_PGTABLES_CEILING, but upper addresses occupied with huge pmd entries, so the trick with offsetting expected counter value will get really ugly: we will have to apply it nr_pmds, but not nr_ptes. 2. Metag has non-zero FIRST_USER_ADDRESS, but doesn't do allocation pte/pmd page tables allocation in pgd_alloc(), just setup a pgd entry which is allocated at boot and shared accross all processes. The proposal is to move the check to check_mm() which happens *after* pgd_free() and do proper accounting during pgd_alloc() and pgd_free() which would bring counters to zero if nothing leaked. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tyler Baker <tyler.baker@linaro.org> Tested-by: Tyler Baker <tyler.baker@linaro.org> Tested-by: Nishanth Menon <nm@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Hogan <james.hogan@imgtec.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: account pmd page tables to the processKirill A. Shutemov
Dave noticed that unprivileged process can allocate significant amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PMD page tables. Linux kernel doesn't account PMD tables to the process, only PTE. The use-cases below use few tricks to allocate a lot of PMD page tables while keeping VmRSS and VmPTE low. oom_score for the process will be 0. #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/prctl.h> #define PUD_SIZE (1UL << 30) #define PMD_SIZE (1UL << 21) #define NR_PUD 130000 int main(void) { char *addr = NULL; unsigned long i; prctl(PR_SET_THP_DISABLE); for (i = 0; i < NR_PUD ; i++) { addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); break; } *addr = 'x'; munmap(addr, PMD_SIZE); mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0); if (addr == MAP_FAILED) perror("re-mmap"), exit(1); } printf("PID %d consumed %lu KiB in PMD page tables\n", getpid(), i * 4096 >> 10); return pause(); } The patch addresses the issue by account PMD tables to the process the same way we account PTE. The main place where PMD tables is accounted is __pmd_alloc() and free_pmd_range(). But there're few corner cases: - HugeTLB can share PMD page tables. The patch handles by accounting the table to all processes who share it. - x86 PAE pre-allocates few PMD tables on fork. - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity check on exit(2). Accounting only happens on configuration where PMD page table's level is present (PMD is not folded). As with nr_ptes we use per-mm counter. The counter value is used to calculate baseline for badness score by oom-killer. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: David Rientjes <rientjes@google.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11arm: define __PAGETABLE_PMD_FOLDED for !LPAEKirill A. Shutemov
ARM uses custom implementation of PMD folding in 2-level page table case. Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is folded, but ARM doesn't do this. Let's fix it. Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc(). It also fixes problems with recently-introduced pmd accounting on ARM without LPAE. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Nishanth Menon <nm@ti.com> Reported-by: Simon Horman <horms@verge.net.au> Tested-by: Simon Horman <horms+renesas@verge.net.au> Tested-by: Fabio Estevam <festevam@gmail.com> Tested-by: Felipe Balbi <balbi@ti.com> Tested-by: Nishanth Menon <nm@ti.com> Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: David Rientjes <rientjes@google.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm: make FIRST_USER_ADDRESS unsigned long on all archsKirill A. Shutemov
LKP has triggered a compiler warning after my recent patch "mm: account pmd page tables to the process": mm/mmap.c: In function 'exit_mmap': >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default] The code: > 2857 WARN_ON(mm_nr_pmds(mm) > 2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT); In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has the same type -- int. PUD_SHIFT. I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned long. On every arch for consistency. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11microblaze: define __PAGETABLE_PMD_FOLDEDKirill A. Shutemov
Microblaze uses custom implementation of PMD folding, but doesn't define __PAGETABLE_PMD_FOLDED, which generic code expects to see. Let's fix it. Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc(). It also fixes problems with recently-introduced pmd accounting. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11sparc32: fix broken set_pte()Kirill A. Shutemov
32-bit sparc uses swap instruction to implement set_pte(). It called using GCC inline assembler. But it misses the "memory" clobber to indicate that pte value will be updated in memory. As result GCC doesn't know that it cannot postpone pte pointer dereference which occurs before set_pte() to post-set_pte() time. It leads to real-world bugs -- [1]. In this situation we have code: ptent = ptep_modify_prot_start(mm, addr, pte); ptent = pte_modify(ptent, newprot); ... ptep_modify_prot_commit(mm, addr, pte, ptent); ptep_modify_prot_start() in sparc case is just 'pte' dereference plus pte_clear(). pte_clear() calls broken set_pte(). GCC thinks it's valid to dereference 'pte' again on pte_modify() and gets cleared pte. ptep_modify_prot_commit() puts 'pteent' with pfn==0 back to page table, which eventually leads to the crash. [1] http://lkml.kernel.org/r/54C06B19.8060305@roeck-us.net Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Paul Moore <pmoore@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm/hugetlb: pmd_huge() returns true for non-present hugepageNaoya Horiguchi
Migrating hugepages and hwpoisoned hugepages are considered as non-present hugepages, and they are referenced via migration entries and hwpoison entries in their page table slots. This behavior causes race condition because pmd_huge() doesn't tell non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is one example where the kernel would call follow_page_pte() for such hugepage while this function is supposed to handle only normal pages. To avoid this, this patch makes pmd_huge() return true when pmd_none() is true *and* pmd_present() is false. We don't have to worry about mixing up non-present pmd entry with normal pmd (pointing to leaf level pte entry) because pmd_present() is true in normal pmd. The same race condition could happen in (x86-specific) gup_pmd_range(), where this patch simply adds pmd_present() check instead of pmd_huge(). This is because gup_pmd_range() is fast path. If we have non-present hugepage in this function, we will go into gup_huge_pmd(), then return 0 at flag mask check, and finally fall back to the slow path. Fixes: 290408d4a2 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm/hugetlb: reduce arch dependent code around follow_huge_*Naoya Horiguchi
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default imple