summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms
AgeCommit message (Collapse)Author
2020-12-09powerpc/8xx: Always pin kernel text TLBChristophe Leroy
There is no big poing in not pinning kernel text anymore, as now we can keep pinned TLB even with things like DEBUG_PAGEALLOC. Remove CONFIG_PIN_TLB_TEXT, making it always right. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop ifdef around mmu_pin_tlb() to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/203b89de491e1379f1677a2685211b7c32adfff0.1606231483.git.christophe.leroy@csgroup.eu
2020-12-09powerpc/8xx: Fix early debug when SMC1 is relocatedChristophe Leroy
When SMC1 is relocated and early debug is selected, the board hangs is ppc_md.setup_arch(). This is because ones the microcode has been loaded and SMC1 relocated, early debug writes in the weed. To allow smooth continuation, the SMC1 parameter RAM set up by the bootloader have to be copied into the new location. Fixes: 43db76f41824 ("powerpc/8xx: Add microcode patch to move SMC parameter RAM.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b2f71f39eca543f1e4ec06596f09a8b12235c701.1607076683.git.christophe.leroy@csgroup.eu
2020-12-09powerpc/32s: Make support for 603 and 604+ selectableChristophe Leroy
book3s/32 has two main families: - CPU with 603 cores that don't have HASH PTE table and perform SW TLB loading. - Other CPUs based on 604+ cores that have HASH PTE table. This leads to some complex logic and additionnal code to support both. This makes sense for distribution kernels that aim at running on any CPU, but when you are fine tuning a kernel for an embedded 603 based board you don't need all the HASH logic. Allow selection of support for each family, in order to opt out unneeded parts of code. At least one must be selected. Note that some of the CPU supporting HASH also support SW TLB loading, however it is not supported by Linux kernel at the time being, because they do not have alternate registers in the TLB miss exception handlers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8dde0cdb629a71abc29b0d85a52a86e920376cb6.1603348103.git.christophe.leroy@csgroup.eu
2020-12-09powerpc/32s: Remove CONFIG_PPC_BOOK3S_6xxChristophe Leroy
As 601 is gone, CONFIG_PPC_BOO3S_6xx and CONFIG_PPC_BOOK3S_32 are dedundant. Remove CONFIG_PPC_BOOK3S_6xx. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f18c16af37f6f77b577bed8d9e12831b695617ae.1603348103.git.christophe.leroy@csgroup.eu
2020-12-08powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACKChristophe Leroy
low_sleep_handler() can't restore the context from standard stack because the stack can hardly be accessed with MMU OFF. Store everything in a global storage area instead of storing a pointer to the stack in that global storage area. To avoid a complete churn of the function, still use r1 as the pointer to the storage area during restore. Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK") Reported-by: Giuseppe Sacco <giuseppe@sguazz.it> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Giuseppe Sacco <giuseppe@sguazz.it> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e3e0d8042a3ba75cb4a9546c19c408b5b5b28994.1607404931.git.christophe.leroy@csgroup.eu
2020-12-08powerpc/pseries/mobility: refactor node lookup during DT updateNathan Lynch
In pseries_devicetree_update(), with each call to ibm,update-nodes the partition firmware communicates the node to be deleted or updated by placing its phandle in the work buffer. Each of delete_dt_node(), update_dt_node(), and add_dt_node() have duplicate lookups using the phandle value and corresponding refcount management. Move the lookup and of_node_put() into pseries_devicetree_update(), and emit a warning on any failed lookups. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-29-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: remove prepare_late() callbackNathan Lynch
The pseries hibernate code no longer calls into the original join/suspend code in kernel/rtas.c, so pseries_prepare_late() and related code don't accomplish anything now. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-27-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: perform post-suspend fixups laterNathan Lynch
The pseries hibernate code calls post_mobility_fixup() which is sort of a dumping ground of fixups that need to run after resuming from suspend regardless of whether suspend was a hibernation or a migration. Calling post_mobility_fixup() from pseries_suspend_enable_irqs() runs this code early in resume with devices suspended and only one CPU up, while the much more commonly used migration case runs these fixups in a more typical process context. Call post_mobility_fixup() after the suspend core returns a success status to the hibernate sysfs store method and remove pseries_suspend_enable_irqs(). Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-26-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: remove redundant cacheinfo updateNathan Lynch
Partitions with cache nodes in the device tree can encounter the following warning on resume: CPU 0 already accounted in PowerPC,POWER9@0(Data) WARNING: CPU: 0 PID: 3177 at arch/powerpc/kernel/cacheinfo.c:197 cacheinfo_cpu_online+0x640/0x820 These calls to cacheinfo_cpu_offline/online have been redundant since commit e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration"). Fixes: e610a466d16a ("powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-25-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: switch to rtas_ibm_suspend_me()Nathan Lynch
rtas_suspend_last_cpu() and related code perform a lot of work that isn't relevant to the hibernation workflow. All other CPUs are offline when called so there is no need to place them in H_JOIN or prod them on resume, nor is there need for retries or operations on shared state. Call the rtas_ibm_suspend_me() wrapper function directly from pseries_suspend_enter() instead of using rtas_suspend_last_cpu(). Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-23-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: remove pseries_suspend_cpu()Nathan Lynch
Since commit 48f6e7f6d948 ("powerpc/pseries: remove cede offline state for CPUs"), ppc_md.suspend_disable_cpu() is no longer used and all CPUs (save one) are placed into true offline state as opposed to H_JOIN. So pseries_suspend_cpu() is effectively unused; remove it. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-20-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: pass stream id via function argumentsNathan Lynch
There is no need for the stream id to be a file-global variable; pass it from hibernate_store() to pseries_suspend_begin() for the H_VASI_STATE call. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-19-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/hibernation: drop pseries_suspend_begin() from suspend opsNathan Lynch
There are three ways pseries_suspend_begin() can be reached: 1. When "mem" is written to /sys/power/state: kobj_attr_store() -> state_store() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This never works because there is no way to supply a valid stream id using this interface, and H_VASI_STATE is called with a stream id of zero. So this call path is useless at best. 2. When a stream id is written to /sys/devices/system/power/hibernate. pseries_suspend_begin() is polled directly from store_hibernate() until the stream is in the "Suspending" state (i.e. the platform is ready for the OS to suspend execution): dev_attr_store() -> store_hibernate() -> pseries_suspend_begin() 3. When a stream id is written to /sys/devices/system/power/hibernate (continued). After #2, pseries_suspend_begin() is called once again from the pm core: dev_attr_store() -> store_hibernate() -> pm_suspend() -> suspend_devices_and_enter() -> pseries_suspend_begin() This is redundant because the VASI suspend state is already known to be Suspending. The begin() callback of platform_suspend_ops is optional, so we can simply remove that assignment with no loss of function. Fixes: 32d8ad4e621d ("powerpc/pseries: Partition hibernation support") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-18-nathanl@linux.ibm.com
2020-12-08powerpc/rtas: dispatch partition migration requests to pseriesNathan Lynch
sys_rtas() cannot call ibm,suspend-me directly in the same way it handles other inputs. Instead it must dispatch the request to code that can first perform the H_JOIN sequence before any call to ibm,suspend-me can succeed. Over time kernel/rtas.c has accreted a fair amount of platform-specific code to implement this. Since a different, more robust implementation of the suspend sequence is now in the pseries platform code, we want to dispatch the request there. Note that invoking ibm,suspend-me via the RTAS syscall is all but deprecated; this change preserves ABI compatibility for old programs while providing to them the benefit of the new partition suspend implementation. This is a behavior change in that the kernel performs the device tree update and firmware activation before returning, but experimentation indicates this is tolerated fine by legacy user space. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-16-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: retry partition suspend after errorNathan Lynch
This is a mitigation for the relatively rare occurrence where a virtual IOA can be in a transient state that prevents the suspend/migration from succeeding, resulting in an error from ibm,suspend-me. If the join/suspend sequence returns an error, it is acceptable to retry as long as the VASI suspend session state is still "Suspending" (i.e. the platform is still waiting for the OS to suspend). Retry a few times on suspend failure while this condition holds, progressively increasing the delay between attempts. We don't want to retry indefinitey because firmware emits an error log event on each unsuccessful attempt. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-15-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: signal suspend cancellation to platformNathan Lynch
If we're returning an error to user space, use H_VASI_SIGNAL to send a cancellation request to the platform. This isn't strictly required but it communicates that Linux will not attempt to complete the suspend, which allows the various entities involved to promptly end the operation in progress. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-14-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: use stop_machine for join/suspendNathan Lynch
The partition suspend sequence as specified in the platform architecture requires that all active processor threads call H_JOIN, which: - suspends the calling thread until it is the target of an H_PROD; or - immediately returns H_CONTINUE, if the calling thread is the last to call H_JOIN. This thread is expected to call ibm,suspend-me to completely suspend the partition. Upon returning from ibm,suspend-me the calling thread must wake all others using H_PROD. rtas_ibm_suspend_me_unsafe() uses on_each_cpu() to implement this protocol, but because of its synchronizing nature this is susceptible to deadlock versus users of stop_machine() or other callers of on_each_cpu(). Not only is stop_machine() intended for use cases like this, it handles error propagation and allows us to keep the data shared between CPUs minimal: a single atomic counter which ensures exactly one CPU will wake the others from their joined states. Switch the migration code to use stop_machine() and a less complex local implementation of the H_JOIN/ibm,suspend-me logic, which carries additional benefits: - more informative error reporting, appropriately ratelimited - resets the lockup detector / watchdog on resume to prevent lockup warnings when the OS has been suspended for a time exceeding the threshold. Fixes: 91dc182ca6e2 ("[PATCH] powerpc: special-case ibm,suspend-me RTAS call") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-13-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: extract VASI session polling logicNathan Lynch
The behavior of rtas_ibm_suspend_me_unsafe() is to return -EAGAIN to the caller until the specified VASI suspend session state makes the transition from H_VASI_ENABLED to H_VASI_SUSPENDING. In the interest of separating concerns to prepare for a new implementation of the join/suspend sequence, extract VASI session polling logic into a couple of local functions. Waiting for the session state to reach H_VASI_SUSPENDING before calling rtas_ibm_suspend_me_unsafe() ensures that we will never get an EAGAIN result necessitating a retry. No user-visible change in behavior is intended. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-12-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: use rtas_activate_firmware() on resumeNathan Lynch
It's incorrect to abort post-suspend processing if ibm,activate-firmware isn't available. Use rtas_activate_firmware(), which logs this condition appropriately and allows us to proceed. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-11-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: error message improvementsNathan Lynch
- Convert printk(KERN_ERR) to pr_err(). - Include errno in property update failure message. - Remove reference to "Post-mobility" from device tree update message: with pr_err() it will have a "mobility:" prefix. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-10-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: add missing break to default caseNathan Lynch
update_dt_node() has a switch statement where the default case lacks a break statement. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-9-nathanl@linux.ibm.com
2020-12-08powerpc/pseries/mobility: don't error on absence of ibm, update-nodesNathan Lynch
Treat the absence of the ibm,update-nodes function as benign instead of reporting an error. If the platform does not provide that facility, it's not a problem for Linux. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-8-nathanl@linux.ibm.com
2020-12-08powerpc/rtas: rtas_ibm_suspend_me -> rtas_ibm_suspend_me_unsafeNathan Lynch
The pseries partition suspend sequence requires that all active CPUs call H_JOIN, which suspends all but one of them with interrupts disabled. The "chosen" CPU is then to call ibm,suspend-me to complete the suspend. Upon returning from ibm,suspend-me, the chosen CPU is to use H_PROD to wake the joined CPUs. Using on_each_cpu() for this, as rtas_ibm_suspend_me() does to implement partition migration, is susceptible to deadlock with other users of on_each_cpu() and with users of stop_machine APIs. The callback passed to on_each_cpu() is not allowed to synchronize with other CPUs in the way it is used here. Complicating the fix is the fact that rtas_ibm_suspend_me() also occupies the function name that should be used to provide a more conventional wrapper for ibm,suspend-me. Rename rtas_ibm_suspend_me() to rtas_ibm_suspend_me_unsafe() to free up the name and indicate that it should not gain users. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207215200.1785968-4-nathanl@linux.ibm.com
2020-12-07powerpc/powernv/idle: Restore CIABR after idle for Power9Jordan Niethe
On Power9, CIABR is lost after idle. This means that instruction breakpoints set by xmon which use CIABR do not work. Fix this by restoring CIABR after idle. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201207010519.15597-2-jniethe5@gmail.com
2020-12-05powerpc: Retire e200 core (mpc555x processor)Christophe Leroy
There is no defconfig selecting CONFIG_E200, and no platform. e200 is an earlier version of booke, a predecessor of e500, with some particularities like an unified cache instead of both an instruction cache and a data cache. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu
2020-12-04powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPUAlexey Kardashevskiy
We execute certain NPU2 setup code (such as mapping an LPID to a device in NPU2) unconditionally if an Nvlink bridge is detected. However this cannot succeed on POWER8NVL machines and errors appear in dmesg. This is harmless as skiboot returns an error and the only place we check it is vfio-pci but that code does not get called on P8+ either. This adds a check if pnv_npu2_xxx helpers are called on a machine with NPU2 which initializes pnv_phb::npu in pnv_npu2_init(); pnv_phb::npu==NULL on POWER8/NVL (Naples). While at this, fix NULL derefencing in pnv_npu_peers_take_ownership/ pnv_npu_peers_release_ownership which occurs when GPUs on mentioned P8s cause EEH which happens if "vfio-pci" disables devices using the D3 power state; the vfio-pci's disable_idle_d3 module parameter controls this and must be set on Naples. The EEH handling clears the entire pnv_ioda_pe struct in pnv_ioda_free_pe() hence the NULL derefencing. We cannot recover from that but at least we stop crashing. Tested on - POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm, NVIDIA GV100 10de:1db1 driver 418.39 - POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm, NVIDIA P100 10de:15f9 driver 396.47 Fixes: 1b785611e119 ("powerpc/powernv/npu: Add release_ownership hook") Cc: stable@vger.kernel.org # 5.0 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201122073828.15446-1-aik@ozlabs.ru
2020-12-04powernv/pci: Print an error when device enable is blockedOliver O'Halloran
If the platform decides to block enabling the device nothing is printed currently. This can lead to some confusion since the dmesg output will usually print an error with no context e.g. e1000e: probe of 0022:01:00.0 failed with error -22 This shouldn't be spammy since pci_enable_device() already prints a messages when it succeeds. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200409061337.9187-1-oohall@gmail.com
2020-12-04ocxl: Initiate a TLB invalidate commandChristophe Lombard
When a TLB Invalidate is required for the Logical Partition, the following sequence has to be performed: 1. Load MMIO ATSD AVA register with the necessary value, if required. 2. Write the MMIO ATSD launch register to initiate the TLB Invalidate command. 3. Poll the MMIO ATSD status register to determine when the TLB Invalidate has been completed. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201125155013.39955-3-clombard@linux.vnet.ibm.com
2020-12-04ocxl: Assign a register set to a Logical PartitionChristophe Lombard
Platform specific function to assign a register set to a Logical Partition. The "ibm,mmio-atsd" property, provided by the firmware, contains the 16 base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO registers (XTS MMIO ATSDx LPARID/AVA/launch/status register). For the time being, the ATSD0 set of registers is used by default. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201125155013.39955-2-clombard@linux.vnet.ibm.com
2020-12-04powerpc/book3s64/kuap/kuep: Add PPC_PKEY config on book3s64Aneesh Kumar K.V
The config CONFIG_PPC_PKEY is used to select the base support that is required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency reduces the code complexity(in terms of #ifdefs) and enables us to move some of the initialization code to pkeys.c Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-4-aneesh.kumar@linux.ibm.com
2020-12-04powerpc/64s/pseries: Add ERAT specific machine check handlerNicholas Piggin
Don't treat ERAT MCEs as SLB, don't save the SLB and use a specific ERAT flush to recover it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-7-npiggin@gmail.com
2020-12-04powerpc/64s/powernv: Ratelimit harmless HMI error printingNicholas Piggin
Harmless HMI errors can be triggered by guests in some cases, and don't contain much useful information anyway. Ratelimit these to avoid flooding the console/logs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use dedicated ratelimit state, not printk_ratelimit()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-6-npiggin@gmail.com
2020-12-04powerpc/ps3: make system bus's remove and shutdown callbacks return voidUwe Kleine-König
The driver core ignores the return value of struct device_driver::remove because there is only little that can be done. For the shutdown callback it's ps3_system_bus_shutdown() which ignores the return value. To simplify the quest to make struct device_driver::remove return void, let struct ps3_system_bus_driver::remove return void, too. All users already unconditionally return 0, this commit makes it obvious that returning an error code is a bad idea and ensures future users behave accordingly. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126165950.2554997-2-u.kleine-koenig@pengutronix.de
2020-12-04powerpc: Rename is_kvm_guest() to check_kvm_guest()Srikar Dronamraju
We want to reuse the is_kvm_guest() name in a subsequent patch but with a new body. Hence rename is_kvm_guest() to check_kvm_guest(). No additional changes. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: kernel test robot <lkp@intel.com> # int -> bool fix [mpe: Fold in fix from lkp to use true/false not 0/1] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201202050456.164005-3-srikar@linux.vnet.ibm.com
2020-12-04powerpc: Refactor is_kvm_guest() declaration to new headerSrikar Dronamraju
Only code/declaration movement, in anticipation of doing a KVM-aware vcpu_is_preempted(). No additional changes. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201202050456.164005-2-srikar@linux.vnet.ibm.com
2020-12-04powerpc/pseries: Define PCI bus speed for Gen4 and Gen5Frederic Barrat
Update bus speed definition for PCI Gen4 and 5. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201130152949.26467-1-fbarrat@linux.ibm.com
2020-12-04powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832xChristophe Leroy
The e300c2 core which is embedded in mpc832x CPU doesn't have an FPU. Make it possible to not select CONFIG_PPC_FPU when building a kernel dedicated to that target. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fcdc60d85baf80eaa0a7f3261d9d889282068216.1597770847.git.christophe.leroy@csgroup.eu
2020-12-04powerpc/signal: Don't manage floating point regs when no FPUChristophe Leroy
There is no point in copying floating point regs when there is no FPU and MATH_EMULATION is not selected. Create a new CONFIG_PPC_FPU_REGS bool that is selected by CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to opt out everything related to fp_state in thread_struct. The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU as fpu.S build is conditionnal to CONFIG_PPC_FPU. The following app spends approx 8.1 seconds system time on an 8xx without the patch, and 7.0 seconds with the patch (13.5% reduction). On an 832x, it spends approx 2.6 seconds system time without the patch and 2.1 seconds with the patch (19% reduction). void sigusr1(int sig) { } int main(int argc, char **argv) { int i = 100000; signal(SIGUSR1, sigusr1); for (;i--;) raise(SIGUSR1); exit(0); } Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
2020-11-25Merge branch 'fixes' into nextMichael Ellerman
Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.
2020-11-19powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigationsDaniel Axtens
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and we just added entry and uaccess flushes. So the name is not very accurate any more. In both platforms we then also immediately setup the STF flush. Rename them to _setup_security_mitigations and fold the STF flush in. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory ↵David Hildenbrand
allocations Let's use alloc_contig_pages() for allocating memory and remove the linear mapping manually via arch_remove_linear_mapping(). Mark all pages PG_offline, such that they will definitely not get touched - e.g., when hibernating. When freeing memory, try to revert what we did. The original idea was discussed in: https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com This is similar to CONFIG_DEBUG_PAGEALLOC handling on other architectures, whereby only single pages are unmapped from the linear mapping. Let's mimic what memory hot(un)plug would do with the linear mapping. We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO that we want to use __GFP_ZERO for clearing once alloc_contig_pages() understands that. Tested with in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages [ 145.049019][ T1080] memtrace: Freed trace memory back on node 0 [ 145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages [ 213.613855][ T1080] memtrace: Freed trace memory back on node 0 [ 214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 [root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages [ 234.886974][ T1080] memtrace: Freed trace memory back on node 0 [ 234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000 I also made sure allocated memory is properly zeroed. Note 1: We currently won't be allocating from ZONE_MOVABLE - because our pages are not movable. However, as we don't run with any memory hot(un)plug mechanism around, we could make an exception to increase the chance of allocations succeeding. Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used along PG_reserved in hibernation code will always return "true" on powerpc, resulting in the pages getting touched. It's too generic - e.g., indicates boot allocations. Note 3: For now, we keep using memory_block_size_bytes() as minimum granularity. Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
2020-11-19powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrentlyDavid Hildenbrand
It's very easy to crash the kernel right now by simply trying to enable memtrace concurrently, hammering on the "enable" interface loop.sh: #!/bin/bash dmesg --console-off while true; do echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable done [root@localhost ~]# loop.sh & [root@localhost ~]# loop.sh & Resulting quickly in a kernel crash. Let's properly protect using a mutex. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org# v4.14+ Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
2020-11-19powerpc/powernv/memtrace: Don't leak kernel memory to user spaceDavid Hildenbrand
We currently leak kernel memory to user space, because memory offlining doesn't do any implicit clearing of memory and we are missing explicit clearing of memory. Let's keep it simple and clear pages before removing the linear mapping. Reproduced in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null [... wait until "free -m" used counter no longer changes and cancel] 19665802+0 records in 1+0 records out 9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes 40000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900 [ 402.980063][ T1086] flags: 0x7ffff000001000(reserved) [ 402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000 [ 402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 402.980845][ T1086] page dumped because: unmovable page [ 402.989608][ T1086] Offlined Pages 16384 [ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000 Before this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 |.%rQM&6.\.V.....| 00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 |..P...`.....K<9.|$ 00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 |NZL..&..7y.g$..W|$ 00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d |.>..o.j..R.n...]|$ 00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 |v.......d#.,....|$ 00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 |.....P'i.d.o..Ey|$ 00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 |jo.aq.....(&...U|$ 00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 |.?......{..27.$S|$ 00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 |.p0EV.....L.....|$ 00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c |...y.`.\yYM.6.J\|$ After this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 40000000 Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
2020-11-19powerpc/powernv/sriov: fix unsigned int win compared to less than zeroKaixu Xia
Fix coccicheck warning: arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10: WARNING: Unsigned expression compared with zero: win < 0 arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10: WARNING: Unsigned expression compared with zero: win < 0 Fixes: 39efc03e3ee8 ("powerpc/powernv/sriov: Move M64 BAR allocation into a helper") Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1605007170-22171-1-git-send-email-kaixuxia@tencent.com
2020-11-19Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"Zhang Xiaoxu
This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980. Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property"), the 'cpu_drcs' wouldn't be double freed when the 'cpus' node not found. So we needn't apply this patch, otherwise, the memory will be leaked. Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> [mpe: Caused by me applying a patch to a function that had changed in the interim] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
2020-11-19powerpc/85xx: Fix declaration made after definitionMichael Ellerman
Currently the clang build of corenet64_smp_defconfig fails with: arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error: attribute declaration must precede definition machine_arch_initcall(corenet_generic, corenet_gen_publish_devices); Fix it by moving the initcall definition prior to the machine definition, and directly below the function it calls, which is the usual style anyway. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
2020-10-24Merge tag 'powerpc-5.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A fix for undetected data corruption on Power9 Nimbus <= DD2.1 in the emulation of VSX loads. The affected CPUs were not widely available. - Two fixes for machine check handling in guests under PowerVM. - A fix for our recent changes to SMP setup, when CONFIG_CPUMASK_OFFSTACK=y. - Three fixes for races in the handling of some of our powernv sysfs attributes. - One change to remove TM from the set of Power10 CPU features. - A couple of other minor fixes. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Ganesh Goudar, Jordan Niethe, Mahesh Salgaonkar, Michael Neuling, Oliver O'Halloran, Qian Cai, Srikar Dronamraju, Vasant Hegde. * tag 'powerpc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Avoid using addr_to_pfn in real mode powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9 powerpc/eeh: Fix eeh_dev_check_failure() for PE#0 powerpc/64s: Remove TM from Power10 features selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation powerpc/powernv/dump: Handle multiple writes to ack attribute powerpc/powernv/dump: Fix race while processing OPAL dump powerpc/smp: Use GFP_ATOMIC while allocating tmp mask powerpc/smp: Remove unnecessary variable powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash powerpc/opal_elog: Handle multiple writes to ack attribute
2020-10-22powerpc/pseries: Avoid using addr_to_pfn in real modeGanesh Goudar
When an UE or memory error exception is encountered the MCE handler tries to find the pfn using addr_to_pfn() which takes effective address as an argument, later pfn is used to poison the page where memory error occurred, recent rework in this area made addr_to_pfn to run in real mode, which can be fatal as it may try to access memory outside RMO region. Have two helper functions to separate things to be done in real mode and virtual mode without changing any functionality. This also fixes the following error as the use of addr_to_pfn is now moved to virtual mode. Without this change following kernel crash is seen on hitting UE. [ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1] [ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries [ 485.128047] Modules linked in: [ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22 [ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0 [ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0) [ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001 [ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1 [ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000 [ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022 [ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14 [ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000 [ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100 [ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50 [ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a [ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a [ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310 [ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170 [ 485.128133] Call Trace: [ 485.128135] Instruction dump: [ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff [ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040 [ 485.128152] ---[ end trace d34b27e29ae0e340 ]--- Fixes: 9ca766f9891d ("powerpc/64s/pseries: machine check convert to use common event code") Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com