summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/exceptions-64s.S
AgeCommit message (Collapse)Author
2018-01-16powerpc: Remove useless EXC_COMMON_HVBenjamin Herrenschmidt
The only difference between EXC_COMMON_HV and EXC_COMMON is that the former adds "2" to the trap number which is supposed to represent the fact that this is an "HV" interrupt which uses HSRR0/1. However KVM is the only one who cares and it has its own separate macros. In fact, we only have one user of EXC_COMMON_HV and it's for an unknown interrupt case. All the other ones already using EXC_COMMON. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-16Merge tag 'powerpc-4.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "A bit of a small release, I suspect in part due to me travelling for KS. But my backlog of patches to review is smaller than usual, so I think in part folks just didn't send as much this cycle. Non-highlights: - Five fixes for the >128T address space handling, both to fix bugs in our implementation and to bring the semantics exactly into line with x86. Highlights: - Support for a new OPAL call on bare metal machines which gives us a true NMI (ie. is not masked by MSR[EE]=0) for debugging etc. - Support for Power9 DD2 in the CXL driver. - Improvements to machine check handling so that uncorrectable errors can be reported into the generic memory_failure() machinery. - Some fixes and improvements for VPHN, which is used under PowerVM to notify the Linux partition of topology changes. - Plumbing to enable TM (transactional memory) without suspend on some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND). - Support for emulating vector loads form cache-inhibited memory, on some Power9 revisions. - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we believe it has never had any users. - A major rework of the API drivers use when initiating and waiting for long running operations performed by OPAL firmware, and changes to the powernv_flash driver to use the new API. - Several fixes for the handling of FP/VMX/VSX while processes are using transactional memory. - Optimisations of TLB range flushes when using the radix MMU on Power9. - Improvements to the VAS facility used to access coprocessors on Power9, and related improvements to the way the NX crypto driver handles requests. - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit. Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A. Kennington III" * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits) powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature powerpc/64s: Fix masking of SRR1 bits on instruction fault powerpc/64s: mm_context.addr_limit is only used on hash powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary powerpc/64s/hash: Fix fork() with 512TB process address space powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation powerpc/64s/hash: Fix 512T hint detection to use >= 128T powerpc: Fix DABR match on hash based systems powerpc/signal: Properly handle return value from uprobe_deny_signal() powerpc/fadump: use kstrtoint to handle sysfs store powerpc/lib: Implement UACCESS_FLUSHCACHE API powerpc/lib: Implement PMEM API powerpc/powernv/npu: Don't explicitly flush nmmu tlb powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm() powerpc/powernv/idle: Round up latency and residency values powerpc/kprobes: refactor kprobe_lookup_name for safer string operations powerpc/kprobes: Blacklist emulate_update_regs() from kprobes powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace powerpc/kprobes: Disable preemption before invoking probe handler for optprobes ...
2017-11-14powerpc/64s: Fix masking of SRR1 bits on instruction faultMichael Ellerman
On 64-bit Book3s, when we take an instruction fault the reason for the fault may be reported in SRR1. For data faults the reason is reported in DSISR (Data Storage Instruction Status Register). The reasons reported in each do not necessarily correspond, so we mask the SRR1 bits before copying them to the DSISR, which is then used by the page fault code. Prior to commit b4c001dc44f0 ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000, which corresponds to: DSISR_NOHPTE 0x40000000 /* no translation found */ DSISR_NOEXEC_OR_G 0x10000000 /* exec of no-exec or guarded */ DSISR_PROTFAULT 0x08000000 /* protection fault */ DSISR_KEYFAULT 0x00200000 /* Storage Key fault */ That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but incorrectly used a different similarly named DSISR_BAD_FAULT_64S. This had the effect of changing the mask to 0xa43a0000, which omits everything but DSISR_KEYFAULT. Luckily this had no visible effect, because in practice we hardly use the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G and DSISR_PROTFAULT are both only used to trigger rare warnings. So we got lucky, but let's fix it. The new value only has bits between 17 and 30 set, so we can continue to use andis. Fixes: b4c001dc44f0 ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-13powerpc: Fix DABR match on hash based systemsBenjamin Herrenschmidt
Commit 398a719d34a1 ("powerpc/mm: Update bits used to skip hash_page") mistakenly dropped the DSISR_DABRMATCH bit from the mask of bit tested to skip trying to hash a page. As a result, the DABR matches would no longer be detected. This adds it back. We open code it in the 2 places where it matters rather than fold it into DSISR_BAD_FAULT_32S/64S because this isn't technically a bad fault and while we would never hit it with the current code, I prefer if page_fault_is_bad() didn't trigger on these. Fixes: 398a719d34a1 ("powerpc/mm: Update bits used to skip hash_page") Cc: stable@vger.kernel.org # v4.14 Tested-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2017-11-06powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64Michael Ellerman
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly. Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one. However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true. So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06KVM: PPC: Book3S HV: Handle host system reset in guest modeNicholas Piggin
If the host takes a system reset interrupt while a guest is running, the CPU must exit the guest before processing the host exception handler. After this patch, taking a sysrq+x with a CPU running in a guest gives a trace like this: cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0] pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv] lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv] sp: c000000fdf577850 msr: 9000000002803033 current = 0xc000000fdf4b1e00 paca = 0xc00000000fd4d680 softe: 3 irq_happened: 0x01 pid = 6608, comm = qemu-system-ppc Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv] [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm] [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm] [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm] [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0 [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100 [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c --- Exception: c01 (System Call) at 00007fff76868df0 SP (7fff7069baf0) is in userspace Fixes: e36d0a2ed5 ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-22powerpc: Disable the fast-endian switch syscall by defaultMichael Ellerman
Back in 2008 we added support for "fast little-endian switch" in the syscall path. This added a special case syscall number 0x1ebe, which is caught very early in the system call exception and switches endian with as little overhead as possible. See commit 745a14cc264b ("[POWERPC] Add fast little-endian switch system call") for full details. Although it is fast, it's also completely non standard. The "syscall number" is out of the range of normal syscalls, it can't be traced or audited, and it's a bit of a wart. To the best of our knowledge it was only used by one program, now long since discontinued. So in an effort to shake out any current users, put it behind a config option, and make it default n. If anyone *is* using it they can quickly reinstate it with a rebuild, and we can flip it to default y. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-22powerpc/64s: Move the two FAST_ENDIAN macros next to each otherMichael Ellerman
So we can #ifdef them in the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16powerpc/mce: Hookup derror (load/store) UE errorsBalbir Singh
Extract physical_address for UE errors by walking the page tables for the mm and address at the NIP, to extract the instruction. Then use the instruction to find the effective address via analyse_instr(). We might have page table walking races, but we expect them to be rare, the physical address extraction is best effort. The idea is to then hook up this infrastructure to memory failure eventually. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06powerpc/64s: Use emergency stack for kernel TM Bad Thing program checksCyril Bur
When using transactional memory (TM), the CPU can be in one of six states as far as TM is concerned, encoded in the Machine State Register (MSR). Certain state transitions are illegal and if attempted trigger a "TM Bad Thing" type program check exception. If we ever hit one of these exceptions it's treated as a bug, ie. we oops, and kill the process and/or panic, depending on configuration. One case where we can trigger a TM Bad Thing, is when returning to userspace after a system call or interrupt, using RFID. When this happens the CPU first restores the user register state, in particular r1 (the stack pointer) and then attempts to update the MSR. However the MSR update is not allowed and so we take the program check with the user register state, but the kernel MSR. This tricks the exception entry code into thinking we have a bad kernel stack pointer, because the MSR says we're coming from the kernel, but r1 is pointing to userspace. To avoid this we instead always switch to the emergency stack if we take a TM Bad Thing from the kernel. That way none of the user register values are used, other than for printing in the oops message. This is the fix for CVE-2017-1000255. Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Rewrite change log & comments, tweak asm slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-27powerpc/64s: Add workaround for P9 vector CI load issueMichael Neuling
POWER9 DD2.1 and earlier has an issue where some cache inhibited vector load will return bad data. The workaround is two part, one firmware/microcode part triggers HMI interrupts when hitting such loads, the other part is this patch which then emulates the instructions in Linux. The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and lxvh8x. When an instruction triggers the HMI, all threads in the core will be sent to the HMI handler, not just the one running the vector load. In general, these spurious HMIs are detected by the emulation code and we just return back to the running process. Unfortunately, if a spurious interrupt occurs on a vector load that's to normal memory we have no way to detect that it's spurious (unless we walk the page tables, which is very expensive). In this case we emulate the load but we need do so using a vector load itself to ensure 128bit atomicity is preserved. Some additional debugfs emulated instruction counters are added also. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Remove spurious IRQ reason in IRQ replayNicholas Piggin
HVI interrupts have always used 0x500, so remove the dead branch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Use the HV handler for external IRQ replay in HV mode on POWER9Nicholas Piggin
POWER9 host external interrupts use the h_virt_irq_common handler, so use that to replay them rather than using the hardware_interrupt_common handler. Both call do_IRQ, but using the correct handler reduces i-cache footprint. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replayNicholas Piggin
This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: masked_interrupt() returns to kernel so avoid restoring r13Nicholas Piggin
Places in the kernel where r13 is not the PACA pointer must have maskable interrupts disabled, so r13 does not have to be restored when returning from a soft-masked interrupt. We should never have interrupts soft disabled when we're in user space. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Optimise clearing of MSR_EE in masked_[H]interrupt()Nicholas Piggin
MSR_EE is always enabled in SRR1 for masked interrupts, so we can use xor to clear it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Avoid a branch in masked_[H]interrupt()Nicholas Piggin
Interrupts which do not require EE to be cleared can all be tested with a single bitwise test. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23powerpc/64s: Fix replay interrupt return label nameMichael Ellerman
In __replay_interrupt() we take the address of a local label so we can return to it later. However the assembler turns the local label into a symbol with a name like ".L1^B42" - where "^B" is literally "\002". This does not make for pleasant stack traces. Fix it by giving the label a sensible name. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23Merge branch 'fixes' into nextMichael Ellerman
There's a non-trivial dependency between some commits we want to put in next and the KVM prefetch work around that went into fixes. So merge fixes into next.
2017-08-10powerpc: Fix powerpc-specific watchdog build configurationNicholas Piggin
The powerpc kernel/watchdog.o should be built when HARDLOCKUP_DETECTOR and HAVE_HARDLOCKUP_DETECTOR_ARCH are both selected. If only the former is selected, then the generic perf watchdog has been selected. To simplify this check, introduce a new Kconfig symbol PPC_WATCHDOG that depends on both. This Kconfig option means the powerpc specific watchdog is enabled. Without this patch, Book3E will attempt to build the powerpc watchdog. Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIsBenjamin Herrenschmidt
This uses the newly defined constants for this rather than open-coded numbers. There is a side effect on 64-bit which is to pass through some of the new P9 bits which we didn't before. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Update bits used to skip hash_pageBenjamin Herrenschmidt
We test a number of bits from DSISR/SRR1 before deciding to call hash_page(). If any of these is set, we go directly to do_page_fault() as the bit indicate a fault that needs to be handled there (no hashing needed). This updates the current open-coded masks to use the new DSISR definitions. This *does* change the masks actually used in two ways: - We used to test various bits that were defined as "always 0" in the architecture and could be repurposed for something else. From now on, we just ignore such bits. - We were missing some new bits defined on P9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-31powerpc/64s: Fix stack setup in watchdog soft_nmi_common()Nicholas Piggin
The watchdog soft-NMI exception stack setup loads a stack pointer twice, which is an obvious error. It ends up using the system reset interrupt (true-NMI) stack, which is also a bug because the watchdog could be preempted by a system reset interrupt that overwrites the NMI stack. Change the soft-NMI to use the "emergency stack". The current kernel stack is not used, because of the longer-term goal to prevent asynchronous stack access using soft-disable. Fixes: 2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-31Merge tag 'v4.13-rc1' into fixesMichael Ellerman
The fixes branch is based off a random pre-rc1 commit, because we had some fixes that needed to go in before rc1 was released. However we now need to fix some code that went in after that point, but before rc1, so merge rc1 to get that code into fixes so we can fix it!
2017-07-21Merge tag 'powerpc-4.13-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A handful of fixes, mostly for new code: - some reworking of the new STRICT_KERNEL_RWX support to make sure we also remove executable permission from __init memory before it's freed. - a fix to some recent optimisations to the hypercall entry where we were clobbering r12, this was breaking nested guests (PR KVM). - a fix for the recent patch to opal_configure_cores(). This could break booting on bare metal Power8 boxes if the kernel was built without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG. - .. and finally a workaround for spurious PMU interrupts on Power9 DD2. Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh" * tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y powerpc/mm/hash: Refactor hash__mark_rodata_ro() powerpc/mm/radix: Refactor radix__mark_rodata_ro() powerpc/64s: Fix hypercall entry clobbering r12 input powerpc/perf: Avoid spurious PMU interrupts after idle powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
2017-07-18powerpc/64s: Fix hypercall entry clobbering r12 inputNicholas Piggin
A previous optimisation incorrectly assumed the PAPR hcall does not use r12, and clobbers it upon entry. In fact it is used as an input. This can result in KVM guests crashing (observed with PR KVM). Instead of using r12 to save r13, tihs patch saves r13 in ctr. This is more costly, but not as slow as using the SPRG. Fixes: acd7d8cef0153 ("powerpc/64s: Optimize hypercall/syscall entry") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-12powerpc/64s: implement arch-specific hardlockup watchdogNicholas Piggin
Implement an arch-speicfic watchdog rather than use the perf-based hardlockup detector. The new watchdog takes the soft-NMI directly, rather than going through perf. Perf interrupts are to be made maskable in future, so that would prevent the perf detector from working in those regions. Additionally, implement a SMP based detector where all CPUs watch one another by pinging a shared cpumask. This is because powerpc Book3S does not have a true periodic local NMI, but some platforms do implement a true NMI IPI. If a CPU is stuck with interrupts hard disabled, the soft-NMI watchdog does not work, but the SMP watchdog will. Even on platforms without a true NMI IPI to get a good trace from the stuck CPU, other CPUs will notice the lockup sufficiently to report it and panic. [npiggin@gmail.com: honor watchdog disable at boot/hotplug] Link: http://lkml.kernel.org/r/20170621001346.5bb337c9@roar.ozlabs.ibm.com [npiggin@gmail.com: fix false positive warning at CPU unplug] Link: http://lkml.kernel.org/r/20170630080740.20766-1-npiggin@gmail.com [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170616065715.18390-6-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Don Zickus <dzickus@redhat.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-03powerpc/64s: Blacklist functions invoked on a trapNaveen N. Rao
Blacklist all functions involved while handling a trap. We: - convert some of the symbols into private symbols, and - blacklist most functions involved while handling a trap. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03powerpc/64s: Convert .L__replay_interrupt_return to a local labelNaveen N. Rao
Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") introduced __replay_interrupt_return symbol with '.L' prefix in hopes of keeping it private. However, due to the use of LOAD_REG_ADDR(), the assembler kept this symbol visible. Fix the same by instead using the local label '1'. Fixes: Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch, a few of them are tripping people up while working on top of next, and we also have a dependency between the CXL fixes and new CXL code we want to merge into next.
2017-06-27powerpc/64s: Invalidate ERAT on powersave wakeup for POWER9Benjamin Herrenschmidt
On POWER9 the ERAT may be incorrect on wakeup from some stop states that lose state. This causes random segvs and illegal instructions when these stop states are enabled. This patch invalidates the ERAT on wakeup on POWER9 to prevent this from causing a problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge comment change with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-21powerpc/64s: Rename slb_allocate_realmode() to slb_allocate()Michael Ellerman
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid confusion over whether it runs in real or virtual mode - it runs in both. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-21powerpc/64s: Rename slb_miss_realmode() to slb_miss_common()Michael Ellerman
slb_miss_realmode() doesn't always runs in real mode, which is what the name implies. So rename it to avoid confusing people. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-21powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmodeMichael Ellerman
All the callers of slb_miss_realmode currently open code the #ifndef CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. We have a macro to do this, BRANCH_TO_COMMON(), so use it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2017-06-20powerpc/64s: Avoid r3 save/restore in SLB miss handlerNicholas Piggin
The SLB miss handler uses r3 for the faulting address but r12 is mostly able to be freed up to save r3 in. It just requires SRR1 be reloaded again on error. It would be more conventional to use r12 for SRR1 (and use r11 to save r3), but slb_allocate_realmode clobbers r11 and not r12. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20powerpc/64s: SLB miss already has CTR saved for relocatable kernelNicholas Piggin
The EXCEPTION_PROLOG_1 used by SLB miss already saves CTR when the kernel is built with CONFIG_RELOCATABLE. So it does not have to be saved and reloaded when branching to slb_miss_realmode. It can be restored from the PACA as usual. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20powerpc/64s: Avoid saving faulting address into EX_DAR in SLB missNicholas Piggin
The EX_DAR save area is only used in exceptional cases. With r3 no longer clobbered by slb_allocate_realmode, saving faulting address to EX_DAR can be deferred to those cases. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s/idle: Avoid SRR usage in idle sleep/wake pathsNicholas Piggin
Idle code now always runs at the 0xc... effective address whether in real or virtual mode. This means rfid can be ditched, along with a lot of SRR manipulations. In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change MSR states as required. This also balances the return prediction for the idle call, by doing blr rather than rfid to return to the idle caller. On POWER9, 2-process context switch on different cores, with snooze disabled, increases performance by 2%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Incorporate v2 fixes from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s/idle: Branch to handler with virtual mode offsetNicholas Piggin
Have the system reset idle wakeup handlers branched to in real mode with the 0xc... kernel address applied. This allows simplifications of avoiding rfid when switching to virtual mode in the wakeup handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()Nicholas Piggin
The __replay_interrupt() code is branched to with bl, but the caller is returned to directly with rfid from the interrupt. Instead, rfid to a stub that returns to the caller with blr, which should keep the return branch predictor balanced. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s: msgclr when handling doorbell exceptions from system resetNicholas Piggin
msgsnd doorbell exceptions are cleared when the doorbell interrupt is taken. However if a doorbell exception causes a system reset interrupt wake from power saving state, the message is not cleared. Processing the doorbell from the system reset interrupt requires msgclr to avoid taking the exception again. Testing this plus the previous wakup direct patch gives: original wakeup direct msgclr Different threads, same core: 315k/s 264k/s 345k/s Different cores: 235k/s 242k/s 242k/s Net speedup is +10% for same core, and +3% for different core. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16powerpc/64s: Handle data breakpoints in Radix modeNaveen N. Rao
On Power9, trying to use data breakpoints throws the splat shown below. This is because the check for a data breakpoint in DSISR is in do_hash_page(), which is not called when in Radix mode. Unable to handle kernel paging request for data at address 0xc000000000e19218 Faulting instruction address: 0xc0000000001155e8 cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20] pc: c0000000001155e8: find_pid_ns+0x48/0xe0 lr: c000000000116ac4: find_task_by_vpid+0x44/0x90 sp: c0000000ef1e7da0 msr: 9000000000009033 dar: c000000000e19218 dsisr: 400000 Move the check to handle_page_fault() so as to catch data breakpoints in both Hash and Radix MMU modes. We have to change the check in do_hash_page() against 0xa410 to use 0xa450, so as to include the value of (DSISR_DABRMATCH << 16). There are two sites that call handle_page_fault() when in Radix, both already pass DSISR in r4. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Cc: stable@vger.kernel.org # v4.7+ Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Fix the fall-through case on hash, we need to reload DSISR] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15powerpc/64s: Optimize hypercall/syscall entryNicholas Piggin
After bc3551257a ("powerpc/64: Allow for relocation-on interrupts from guest to host"), a getppid() system call goes from 307 cycles to 358 cycles (+17%) on POWER8. This is due significantly to the scratch SPR used by the hypercall check. It turns out there are a some volatile registers common to both system call and hypercall (in particular, r12, cr0, ctr), which can be used to avoid the SPR and some other overheads. This brings getppid to 320 cycles (+4%). Testing hcall entry performance by running "sc 1" in guest userspace before this patch is 854 cycles, afterwards is 826. Also a small win there. POWER9 syscall is improved by about the same amount, hcall not tested. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-12Merge tag 'powerpc-4.12-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The change to the Linux page table geometry was delayed for more testing with 16G pages, and there's the new CPU features stuff which just needed one more polish before going in. Plus a few changes from Scott which came in a bit late. And then various fixes, mostly minor. Summary highlights: - rework the Linux page table geometry to lower memory usage on 64-bit Book3S (IBM chips) using the Hash MMU. - support for a new device tree binding for discovering CPU features on future firmwares. - Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression on 64e, a fix for a kernel hang on 64e when using a debugger inside a relocated kernel, a qman fix, and misc qe improvements." Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong, Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp" * tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Support new device tree binding for discovering CPU features powerpc: Don't print cpu_spec->cpu_name if it's NULL of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle powerpc/64s: Fix unnecessary machine check handler relocation branch powerpc/mm/book3s/64: Rework page table geometry for lower memory usage powerpc: Fix distclean with Makefile.postlink powerpc/64e: Don't place the stack beyond TASK_SIZE powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery powerpc/8xx: Adding support of IRQ in MPC8xx GPIO soc/fsl/qbman: Disable IRQs for deferred QBMan work soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions soc/fsl/qe: only apply QE_General4 workaround on affected SoCs soc/fsl/qe: round brg_freq to 1kHz granularity soc/fsl/qe: get rid of immrbar_virt_to_phys() net: ethernet: ucc_geth: fix MEM_PART_MURAM mode powerpc/64e: Fix hang when debugging programs with relocated kernel
2017-05-09powerpc/64s: Fix unnecessary machine check handler relocation branchNicholas Piggin
Similarly to commit 2563a70c3b ("powerpc/64s: Remove unnecessary relocation branch from idle handler"), the machine check handler has a BRANCH_TO from relocated to relocated code, which is unnecessary. It has also caused build errors with some toolchains: arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range (0xffffffffffff8280 is not between 0x0000000000000000 and 0x000000000000ffff) Fixes: 1945bc4549e5 ("powerpc/64s: Fix POWER9 machine check handler from stop state") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reported-and-tested-by : Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-05Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerp