summaryrefslogtreecommitdiffstats
path: root/arch/s390/mm
AgeCommit message (Collapse)Author
2016-07-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - a few misc bits - ocfs2 - most(?) of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits) thp: fix comments of __pmd_trans_huge_lock() cgroup: remove unnecessary 0 check from css_from_id() cgroup: fix idr leak for the first cgroup root mm: memcontrol: fix documentation for compound parameter mm: memcontrol: remove BUG_ON in uncharge_list mm: fix build warnings in <linux/compaction.h> mm, thp: convert from optimistic swapin collapsing to conservative mm, thp: fix comment inconsistency for swapin readahead functions thp: update Documentation/{vm/transhuge,filesystems/proc}.txt shmem: split huge pages beyond i_size under memory pressure thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE khugepaged: add support of collapse for tmpfs/shmem pages shmem: make shmem_inode_info::lock irq-safe khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page() thp: extract khugepaged from mm/huge_memory.c shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings shmem: add huge pages support shmem: get_unmapped_area align huge page shmem: prepare huge= mount option and sysfs knob mm, rmap: account shmem thp pages ...
2016-07-26mm: do not pass mm_struct into handle_mm_faultKirill A. Shutemov
We always have vma->vm_mm around. Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There are a couple of new things for s390 with this merge request: - a new scheduling domain "drawer" is added to reflect the unusual topology found on z13 machines. Performance tests showed up to 8 percent gain with the additional domain. - the new crc-32 checksum crypto module uses the vector-galois-field multiply and sum SIMD instruction to speed up crc-32 and crc-32c. - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in the generic vmlinux.lds linker script definitions. - kcov instrumentation support. A prerequisite for that is the inline assembly basic block cleanup, which is the reason for the net/iucv/iucv.c change. - support for 2GB pages is added to the hugetlbfs backend. Then there are two removals: - the oprofile hardware sampling support is dead code and is removed. The oprofile user space uses the perf interface nowadays. - the ETR clock synchronization is removed, this has been superseeded be the STP clock synchronization. And it always has been "interesting" code.. And the usual bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits) s390/pci: Delete an unnecessary check before the function call "pci_dev_put" s390/smp: clean up a condition s390/cio/chp : Remove deprecated create_singlethread_workqueue s390/chsc: improve channel path descriptor determination s390/chsc: sanitize fmt check for chp_desc determination s390/cio: make fmt1 channel path descriptor optional s390/chsc: fix ioctl CHSC_INFO_CU command s390/cio/device_ops: fix kernel doc s390/cio: allow to reset channel measurement block s390/console: Make preferred console handling more consistent s390/mm: fix gmap tlb flush issues s390/mm: add support for 2GB hugepages s390: have unique symbol for __switch_to address s390/cpuinfo: show maximum thread id s390/ptrace: clarify bits in the per_struct s390: stack address vs thread_info s390: remove pointless load within __switch_to s390: enable kcov support s390/cpumf: use basic block for ecctr inline assembly s390/hypfs: use basic block for diag inline assembly ...
2016-07-13s390/mm: fix gmap tlb flush issuesDavid Hildenbrand
__tlb_flush_asce() should never be used if multiple asce belong to a mm. As this function changes mm logic determining if local or global tlb flushes will be neded, we might end up flushing only the gmap asce on all CPUs and a follow up mm asce flushes will only flush on the local CPU, although that asce ran on multiple CPUs. The missing tlb flushes will provoke strange faults in user space and even low address protections in user space, crashing the kernel. Fixes: 1b948d6caec4 ("s390/mm,tlb: optimize TLB flushing for zEC12") Cc: stable@vger.kernel.org # 3.15+ Reported-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-06s390/mm: add support for 2GB hugepagesGerald Schaefer
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-28s390/mm: use basic block for essa inline assemblyHeiko Carstens
Use only simple inline assemblies which consist of a single basic block if the register asm construct is being used. Otherwise gcc would generate broken code if the compiler option --sanitize-coverage=trace-pc would be used. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-24s390: get rid of superfluous __GFP_REPEATMichal Hocko
__GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. page_table_alloc then uses the flag for a single page allocation. This means that this flag has never been actually useful here because it has always been used only for PAGE_ALLOC_COSTLY requests. Link: http://lkml.kernel.org/r/1464599699-30131-14-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-14s390/mm: fix compile for PAGE_DEFAULT_KEY != 0Heiko Carstens
The usual problem for code that is ifdef'ed out is that it doesn't compile after a while. That's also the case for the storage key initialisation code, if it would be used (set PAGE_DEFAULT_KEY to something not zero): ./arch/s390/include/asm/page.h: In function 'storage_key_init_range': ./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range' Since the code itself has been useful for debugging purposes several times, remove the ifdefs and make sure the code gets compiler coverage. The cost for this is eight bytes. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390: avoid extable collisionsHeiko Carstens
We have some inline assemblies where the extable entry points to a label at the end of an inline assembly which is not followed by an instruction. On the other hand we have also inline assemblies where the extable entry points to the first instruction of an inline assembly. If a first type inline asm (extable point to empty label at the end) would be directly followed by a second type inline asm (extable points to first instruction) then we would have two different extable entries that point to the same instruction but would have a different target address. This can lead to quite random behaviour, depending on sorting order. I verified that we currently do not have such collisions within the kernel. However to avoid such subtle bugs add a couple of nop instructions to those inline assemblies which contain an extable that points to an empty label. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390: add proper __ro_after_init supportHeiko Carstens
On s390 __ro_after_init is currently mapped to __read_mostly which means that data marked as __ro_after_init will not be protected. Reason for this is that the common code __ro_after_init implementation is x86 centric: the ro_after_init data section was added to rodata, since x86 enables write protection to kernel text and rodata very late. On s390 we have write protection for these sections enabled with the initial page tables. So adding the ro_after_init data section to rodata does not work on s390. In order to make __ro_after_init work properly on s390 move the ro_after_init data, right behind rodata. Unlike the rodata section it will be marked read-only later after all init calls happened. This s390 specific implementation adds new __start_ro_after_init and __end_ro_after_init labels. Everything in between will be marked read-only after the init calls happened. In addition to the __ro_after_init data move also the exception table there, since from a practical point of view it fits the __ro_after_init requirements. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: simplify the TLB flushing codeMartin Schwidefsky
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to decide between a lazy TLB flush vs an immediate TLB flush. The field contains two 16-bit counters, the number of CPUs that have the mm attached and can create TLB entries for it and the number of CPUs in the middle of a page table update. The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions use the attach counter and a mask check with mm_cpumask(mm) to decide between a local flush local of the current CPU and a global flush. For all these functions the decision between lazy vs immediate and local vs global TLB flush can be based on CPU masks. There are two masks: the mm->context.cpu_attach_mask with the CPUs that are actively using the mm, and the mm_cpumask(mm) with the CPUs that have used the mm since the last full flush. The decision between lazy vs immediate flush is based on the mm->context.cpu_attach_mask, to decide between local vs global flush the mm_cpumask(mm) is used. With this patch all checks will use the CPU masks, the old counter mm->context.attach_count with its two 16-bit values is turned into a single counter mm->context.flush_count that keeps track of the number of CPUs with incomplete page table updates. The sole user of this counter is finish_arch_post_lock_switch() which waits for the end of all page table updates. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: fix vunmap vs finish_arch_post_lock_switchMartin Schwidefsky
The vunmap_pte_range() function calls ptep_get_and_clear() without any locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct() for the page table update. ptep_flush_direct requires that preemption is disabled, but without any locking this is not the case. If the kernel preempts the task while the attach_counter is increased an endless loop in finish_arch_post_lock_switch() will occur the next time the task is scheduled. Add explicit preempt_disable()/preempt_enable() calls to the relevant functions in arch/s390/mm/pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: align swapper_pg_dir to 16kHeiko Carstens
The segment/region table that is part of the kernel image must be properly aligned to 16k in order to make the crdte inline assembly work. Otherwise it will calculate a wrong segment/region table start address and access incorrect memory locations if the swapper_pg_dir is not aligned to 16k. Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir at the beginning of the bss section and also align the bss section to 16k just like other architectures did. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: add mapping statisticsHeiko Carstens
Add statistics that show how memory is mapped within the kernel identity mapping. This is more or less the same like git commit ce0c0e50f94e ("x86, generic: CPA add statistics about state of direct mapping v4") for x86. I also intentionally copied the lower case "k" within DirectMap4k vs the upper case "M" and "G" within the two other lines. Let's have consistent inconsistencies across architectures. The output of /proc/meminfo now contains these additional lines: DirectMap4k: 2048 kB DirectMap1M: 3991552 kB DirectMap2G: 4194304 kB The implementation on s390 is lockless unlike the x86 version, since I assume changes to the kernel mapping are a very rare event. Therefore it really doesn't matter if these statistics could potentially be inconsistent if read while kernel pages tables are being changed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: simplify vmem code for read-only mappingsHeiko Carstens
For the kernel identity mapping map everything read-writeable and subsequently call set_memory_ro() to make the ro section read-only. This simplifies the code a lot. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pageattr: allow kernel page table splittingHeiko Carstens
set_memory_ro() and set_memory_rw() currently only work on 4k mappings, which is good enough for module code aka the vmalloc area. However we stumbled already twice into the need to make this also work on larger mappings: - the ro after init patch set - the crash kernel resize code Therefore this patch implements automatic kernel page table splitting if e.g. set_memory_ro() would be called on parts of a 2G mapping. This works quite the same as the x86 code, but is much simpler. In order to make this work and to be architecturally compliant we now always use the csp, cspg or crdte instructions to replace valid page table entries. This means that set_memory_ro() and set_memory_rw() will be much more expensive than before. In order to avoid huge latencies the code contains a couple of cond_resched() calls. The current code only splits page tables, but does not merge them if it would be possible. The reason for this is that currently there is no real life scenarion where this would really happen. All current use cases that I know of only change access rights once during the life time. If that should change we can still implement kernel page table merging at a later time. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/mm: always use PAGE_KERNEL when mapping pagesHeiko Carstens
Always use PAGE_KERNEL when re-enabling pages within the kernel mapping due to debug pagealloc. Without using this pgprot value pte_mkwrite() and pte_wrprotect() won't work on such mappings after an unmap -> map cycle anymore. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: make use of pte_clear()Heiko Carstens
Use pte_clear() instead of open-coding it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/pgtable: get rid of _REGION3_ENTRY_ROHeiko Carstens
_REGION3_ENTRY_RO is a duplicate of _REGION_ENTRY_PROTECT. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: introduce and use SEGMENT_KERNEL and REGION3_KERNELHeiko Carstens
Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments use defines. Also to make e.g. pmd_wrprotect() work on the kernel mapping a couple more flags must be set. Therefore add the missing flags also. In order to make everything symmetrical this patch also adds software dirty, young, read and write bits for region 3 table entries. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13s390/vmem: align segment and region tables to 16kHeiko Carstens
Usually segment and region tables are 16k aligned due to the way the buddy allocator works. This is not true for the vmem code which only asks for a 4k alignment. In order to be consistent enforce a 16k alignment here as well. This alignment will be assumed and therefore is required by the pageattr code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13KVM: s390/mm: Fix CMMA reset during rebootChristian Borntraeger
commit 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") factored out the page table handling code from __gmap_zap and __s390_reset_cmma into ptep_zap_unused and added a simple flag that tells which one of the function (reset or not) is to be made. This also changed the behaviour, as it also zaps unused page table entries on reset. Turns out that this is wrong as s390_reset_cmma uses the page walker, which DOES NOT take the ptl lock. The most simple fix is to not do the zapping part on reset (which uses the walker) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-23s390: fix info leak in do_sigsegvMichal Hocko
Aleksa has reported incorrect si_errno value when stracing task which received SIGSEGV: [pid 20799] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_errno=2510266, si_addr=0x100000000000000} The reason seems to be that do_sigsegv is not initializing siginfo structure defined on the stack completely so it will leak 4B of the previous stack content. Fix it simply by initializing si_errno to 0 (same as do_sigbus does already). Cc: stable # introduced pre-git times Reported-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "The s390 patches for the 4.7 merge window have the usual bug fixes and cleanups, and the following new features: - An interface for dasd driver to query if a volume is online to another operating system - A new ioctl for the dasd driver to verify the format for a range of tracks - Following the example of x86 the struct fpu is now allocated with the task_struct - The 'report_error' interface for the PCI bus to send an adapter-error notification from user space to the service element of the machine" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits) s390/vmem: remove unused function parameter s390/vmem: fix identity mapping s390: add missing include statements s390: add missing declarations s390: make couple of variables and functions static s390/cache: remove superfluous locking s390/cpuinfo: simplify locking and skip offline cpus early s390/3270: hangup the 3270 tty after a disconnect s390/3270: handle reconnect of a tty with a different size s390/3270: avoid endless I/O loop with disconnected 3270 terminals s390/3270: fix garbled output on 3270 tty view s390/3270: fix view reference counting s390/3270: add missing tty_kref_put s390/dumpstack: implement and use return_address() s390/cpum_sf: Remove superfluous SMP function call s390/cpum_cf: Remove superfluous SMP function call s390/Kconfig: make z196 the default processor type s390/sclp: avoid compile warning in sclp_pci_report s390/fpu: allocate 'struct fpu' with the task_struct s390/crypto: cleanup and move the header with the cpacf definitions ...
2016-05-11s390/vmem: remove unused function parameterHeiko Carstens
vmem_pte_alloc() has an unused function parameter. Let's remove it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-11s390/vmem: fix identity mappingHeiko Carstens
The identity mapping is suboptimal for the last 2GB frame. The mapping will be established with a mix of 4KB and 1MB mappings instead of a single 2GB mapping. This happens because of a off-by-one bug introduced with commit 50be63450728 ("s390/mm: Convert bootmem to memblock"). Currently the identity mapping looks like this: 0x0000000080000000-0x0000000180000000 4G PUD RW 0x0000000180000000-0x00000001fff00000 2047M PMD RW 0x00000001fff00000-0x0000000200000000 1M PTE RW With the bug fixed it looks like this: 0x0000000080000000-0x0000000200000000 6G PUD RW Fixes: 50be63450728 ("s390/mm: Convert bootmem to memblock") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-10s390: add missing include statementsHeiko Carstens
arch_mmap_rnd, cpu_have_feature, and arch_randomize_brk are all defined as globally visible variables. However the files they are defined in do not include the header files with the declaration. To avoid a possible mismatch add the missing include statements so we have proper type checking in place. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-21s390/mm: fix asce_bits handling with dynamic pagetable levelsGerald Schaefer
There is a race with multi-threaded applications between context switch and pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a pagetable upgrade on another thread in crst_table_upgrade() could already have set new asce_bits, but not yet the new mm->pgd. This would result in a corrupt user_asce in switch_mm(), and eventually in a kernel panic from a translation exception. Fix this by storing the complete asce instead of just the asce_bits, which can then be read atomically from switch_mm(), so that it either sees the old value or the new value, but no mixture. Both cases are OK. Having the old value would result in a page fault on access to the higher level memory, but the fault handler would see the new mm->pgd, if it was a valid access after the mmap on the other thread has completed. So as worst-case scenario we would have a page fault loop for the racing thread until the next time slice. Also remove dead code and simplify the upgrade/downgrade path, there are no upgrades from 2 levels, and only downgrades from 3 levels for compat tasks. There are also no concurrent upgrades, because the mmap_sem is held with down_write() in do_mmap, so the flush and table checks during upgrade can be removed. Reported-by: Michael Munday <munday@ca.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-15s390: Clarify pagefault interruptPeter Zijlstra
While looking at set_task_state() users I stumbled over the s390 pfault interrupt code. Since Heiko provided a great explanation on how it worked, I figured we ought to preserve this. Also make a few little tweaks to the code to aid in readability and explicitly comment the unusual blocking scheme. Based-on-text-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "Miscellaneous bugfixes. The ARM and s390 fixes are for new regressions from the merge window, others are usual stable material" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: compiler-gcc: disable -ftracer for __noclone functions kvm: x86: make lapic hrtimer pinned s390/mm/kvm: fix mis-merge in gmap handling kvm: set page dirty only if page has been writable KVM: x86: reduce default value of halt_poll_ns parameter KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled KVM: x86: Inject pending interrupt even if pending nmi exist arm64: KVM: Register CPU notifiers when the kernel runs at HYP arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
2016-04-05s390/mm/kvm: fix mis-merge in gmap handlingChristian Borntraeger
commit 1e133ab296f3 ("s390/mm: split arch/s390/mm/pgtable.c") dropped some changes from commit a3a92c31bf0b ("KVM: s390: fix mismatch between user and in-kernel guest limit") - this breaks KVM for some memory sizes (kvm-s390: failed to commit memory region) like exactly 2GB. Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - A proper fix for the locking issue in the dasd driver - Wire up the new preadv2 nad pwritev2 system calls - Add the mark_rodata_ro function and set DEBUG_RODATA=y - A few more bug fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: wire up preadv2/pwritev2 syscalls s390/pci: PCI function group 0 is valid for clp_query_pci_fn s390/crypto: provide correct file mode at device register. s390/mm: handle PTE-mapped tail pages in fast gup s390: add DEBUG_RODATA support s390: disable postinit-readonly for now s390/dasd: reorder lcu and device lock s390/cpum_sf: Fix cpu hotplug notifier transitions s390/cpum_cf: Fix missing cpu hotplug notifier transition
2016-03-22s390/extable: use generic search and sort routinesArd Biesheuvel
Replace the arch specific versions of search_extable() and sort_extable() with calls to the generic ones, which now support relative exception tables as well. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-20Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...
2016-03-17s390/mm: handle PTE-mapped tail pages in fast gupGerald Schaefer
With the THP refcounting rework it is possible to see THP compound tail pages mapped with PTEs during a THP split. This needs to be considered when using page_cache_get_speculative(), which will always fail on tail pages because ->_count is always zero. commit 7aef4172 "mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton" fixed it for the generic fast gup code by using compound_head(page) instead of page, but not for s390. This patch is a 1:1 adaption of commit 7aef4172 for the s390 fast gup code. Without this fix, gup will fall back to the slow path or fail in the unlikely scenario that we hit a THP under splitting in-between the page table split and the compound page split. Cc: stable@vger.kernel.org # v4.5 Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-17s390: add DEBUG_RODATA supportHeiko Carstens
git commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings") adds a bogus warning to the console which states that s390 does not support kernel memory protection. This however is not true. We do support that since a couple of years however in a different way than the author of the above named patch expected. To get rid of the misleading message implement the mark_rodata_ro function and emit a message which states the amount of memory which was write protected already earlier. This is the same what parisc currently does. We currently do not support the kernel parameter "rodata=off" which would allow to write to the rodata section again. However since we have this feature since years without any problems there is no reason to add support for this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - some misc things - ofs2 updates - about half of MM - checkpatch updates - autofs4 update * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) autofs4: fix string.h include in auto_dev-ioctl.h autofs4: use pr_xxx() macros directly for logging autofs4: change log print macros to not insert newline autofs4: make autofs log prints consistent autofs4: fix some white space errors autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked() autofs4: fix coding style line length in autofs4_wait() autofs4: fix coding style problem in autofs4_get_set_timeout() autofs4: coding style fixes autofs: show pipe inode in mount options kallsyms: add support for relative offsets in kallsyms address table kallsyms: don't overload absolute symbol type for percpu symbols x86: kallsyms: disable absolute percpu symbols on !SMP checkpatch: fix another left brace warning checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses checkpatch: warn on bare unsigned or signed declarations without int checkpatch: exclude asm volatile from complex macro check mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate() mm: migrate: consolidate mem_cgroup_migrate() calls mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous ...
2016-03-15s390: query dynamic DEBUG_PAGEALLOC settingChristian Borntraeger
We can use debug_pagealloc_enabled() to check if we can map the identity mapping with 1MB/2GB pages as well as to print the current setting in dump_stack. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky
The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline pmdp_xxx functions from pgtable.hMartin Schwidefsky
The pmdp_xxx function are smaller than their ptep_xxx counterparts but to keep things symmetrical unline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline ptep_xxx functions from pgtable.hMartin Schwidefsky
The code in the various ptep_xxx functions has grown quite large, consolidate them to four out-of-line functions: ptep_xchg_direct to exchange a pte with another with immediate flushing ptep_xchg_lazy to exchange a pte with another in a batched update ptep_modify_prot_start to begin a protection flags update ptep_modify_prot_commit to commit a protection flags update Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-07s390: Use pr_warn instead of pr_warningJoe Perches
Convert the uses of pr_warning to pr_warn so there are fewer uses of the old pr_warning. Miscellanea: o Align arguments o Coalesce formats Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02s390/fault: merge report_user_fault implementationsHeiko Carstens
We have two close to identical report_user_fault functions. Add a parameter to one and get rid of the other one in order to reduce code duplication. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02s390/kvm: simplify set_guest_storage_keyMartin Schwidefsky
Git commit ab3f285f227fec62868037e9b1b1fd18294a83b8 "KVM: s390/mm: try a cow on read only pages for key ops" added a fixup_user_fault to set_guest_storage_key force a copy on write if the page is mapped read-only. This is supposed to fix the problem of differing storage keys for shared mappings, e.g. the empty_zero_page. But if the storage key is set before the pte is mapped the storage key update is done on the pgste. A later fault will happily map the shared page with the key from the pgste. Eventually git commit 2faee8ff9dc6f4bfe46f6d2d110add858140fb20 "s390/mm: prevent and break zero page mappings in case of storage keys" fixed this problem for the empty_zero_page. The commit makes sure that guests enabled for storage keys will not use the empty_zero_page at all. As the call to fixup_user_fault in set_guest_storage_key depends on the order of the storage key operation vs. the fault that maps the pte it does not really fix anything. Just remove it. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23s390/pageattr: do a single TLB flush for change_page_attrMartin Schwidefsky
The change of the access rights for an address range in the kernel address space is currently done with a loop of IPTE + a store of the modified PTE. Between the IPTE and the store the PTE will be invalid, this intermediate state can cause problems with concurrent accesses. Consider a change of a kernel area from read-write to read-only, a concurrent reader of that area should be fine but with the invalid PTE it might get an unexpected exception. Remove the IPTEs for each PTE and do a global flush after all PTEs have been modified. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-17s390/maccess: reduce stnsm instructionsHeiko Carstens
When fixing the DAT off bug ("s390: fix DAT off memory access, e.g. on kdump") both Christian and I missed that we can save an additional stnsm instruction. This saves us a couple of cycles which could improve the speed of memcpy_real. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-16mm/gup: Switch all callers of get_user_pages() to not pass tsk/mmDave Hansen
We will soon modify the vanilla get_user_pages() so it can no longer be used on mm/tasks other than 'current/current->mm', which is by far the most common way it is called. For now, we allow the old-style calls, but warn when they are used. (implemented in previous patch) This patch switches all callers of: get_user_pages() get_user_pages_unlocked() get_user_pages_locked() to stop passing tsk/mm so they will no longer see the warnings. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: jack@suse.cz Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-11s390: fix DAT off memory access, e.g. on kdumpChristian Borntraeger
commit 204ee2c56431 ("s390/irqflags: optimize irq restore") optimized irqrestore to really only care about interrupts and adapted the remaining low level users. One spot (memcpy_real) was not touched, though - fix it. Otherwise a kdump kernel will fail while reading the old kernel. As we re-enable irqs with a non-standard function we have to tell lockdep about that. Fixes: 204ee2c56431 ("s390/irqflags: optimize irq restore") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19s390: remove all usages of PSW_ADDR_INSNHeiko Carstens
Yet another leftover from the 31 bit era. The usual operation "y = x & PSW_ADDR_INSN" with the PSW_ADDR_INSN mask is a nop for CONFIG_64BIT. Therefore remove all usages and hope the code is a bit less confusing. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2016-01-19s390: remove all usages of PSW_ADDR_AMODEHeiko Carstens
This is a leftover from the 31 bit area. For CONFIG_64BIT the usual operation "y = x | PSW_ADDR_AMODE" is a nop. Therefore remove all usages of PSW_ADDR_AMODE and make the code a bit less confusing. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>