summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include
AgeCommit message (Collapse)Author
2017-05-30powerpc: Use uapi/asm-generic/sockios.hStephen Rothwell
The arch version is identical except for comments and white space. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc: Use the asm-generic versions of some uapi includesStephen Rothwell
These are completely obvious as all they do is include the asm-generic versions. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1Gautham R. Shenoy
On Power9 DD1 due to a hardware bug the Power-Saving Level Status field (PLS) of the PSSCR for a thread waking up from a deep state can under-report if some other thread in the core is in a shallow stop state. The scenario in which this can manifest is as follows: 1) All the threads of the core are in deep stop. 2) One of the threads is woken up. The PLS for this thread will correctly reflect that it is waking up from deep stop. 3) The thread that has woken up now executes a shallow stop. 4) When some other thread in the core is woken, its PLS will reflect the shallow stop state. Thus, the subsequent thread for which the PLS is under-reporting the wakeup state will not restore the hypervisor resources. Hence, on DD1 systems, use the Requested Level (RL) field as a workaround to restore the contents of the hypervisor resources on the wakeup from the stop state. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc/64s: Fix FIXUP_ENDIAN non-maskable interrupt reentrancyNicholas Piggin
FIXUP_ENDIAN uses SRR[01] with MSR_RI=1, which gets corrupted if there is an interleaving system reset or machine check interrupt. Set MSR_RI=0 before setting SRRs. The rfid will restore MSR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructionsNicholas Piggin
Providing "scv" support to userspace requires kernel support, so it must be advertised as independently to the base ISA 3 instruction set. The darn instruction relies on firmware enablement, so it has been decided to split this out from the core ISA 3 feature as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-19powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hashMichael Ellerman
virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on an address. What this means in practice is that it should only return true for addresses in the linear mapping which are backed by a valid PFN. We are failing to properly check that the address is in the linear mapping, because virt_to_pfn() will return a valid looking PFN for more or less any address. That bug is actually caused by __pa(), used in virt_to_pfn(). eg: __pa(0xc000000000010000) = 0x10000 # Good __pa(0xd000000000010000) = 0x10000 # Bad! __pa(0x0000000000010000) = 0x10000 # Bad! This started happening after commit bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give you something bogus back. Until we can verify if that GCC bug is no longer an issue, or come up with another solution, this commit does the minimal fix to make virt_addr_valid() work, by explicitly checking that the address is in the linear mapping region. Fixes: bdbc29c19b26 ("powerpc: Work around gcc miscompilation of __pa() on 64-bit") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Breno Leitao <breno.leitao@gmail.com>
2017-05-15powerpc/modules: If mprofile-kernel is enabled add it to vermagicMichael Ellerman
On powerpc we can build the kernel with two different ABIs for mcount(), which is used by ftrace. Kernels built with one ABI do not know how to load modules built with the other ABI. The new style ABI is called "mprofile-kernel", for want of a better name. Currently if we build a module using the old style ABI, and the kernel with mprofile-kernel, when we load the module we'll oops something like: # insmod autofs4-no-mprofile-kernel.ko ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount ------------[ cut here ]------------ WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0 CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11 ... NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0 LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590 Call Trace: alloc_pages_current+0xc4/0x1d0 (unreliable) ftrace_process_locs+0x4a8/0x590 load_module+0x1c8c/0x28f0 SyS_finit_module+0x110/0x140 system_call+0x38/0xfc ... ftrace failed to modify [<d000000002a31024>] 0xd000000002a31024 actual: 35:65:00:48 We can avoid this by including in the vermagic whether the kernel/module was built with mprofile-kernel. Which results in: # insmod autofs4-pg.ko autofs4: version magic '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions ' should be '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel' insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-12Merge tag 'powerpc-4.12-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "The change to the Linux page table geometry was delayed for more testing with 16G pages, and there's the new CPU features stuff which just needed one more polish before going in. Plus a few changes from Scott which came in a bit late. And then various fixes, mostly minor. Summary highlights: - rework the Linux page table geometry to lower memory usage on 64-bit Book3S (IBM chips) using the Hash MMU. - support for a new device tree binding for discovering CPU features on future firmwares. - Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression on 64e, a fix for a kernel hang on 64e when using a debugger inside a relocated kernel, a qman fix, and misc qe improvements." Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong, Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp" * tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Support new device tree binding for discovering CPU features powerpc: Don't print cpu_spec->cpu_name if it's NULL of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle powerpc/64s: Fix unnecessary machine check handler relocation branch powerpc/mm/book3s/64: Rework page table geometry for lower memory usage powerpc: Fix distclean with Makefile.postlink powerpc/64e: Don't place the stack beyond TASK_SIZE powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery powerpc/8xx: Adding support of IRQ in MPC8xx GPIO soc/fsl/qbman: Disable IRQs for deferred QBMan work soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions soc/fsl/qe: only apply QE_General4 workaround on affected SoCs soc/fsl/qe: round brg_freq to 1kHz granularity soc/fsl/qe: get rid of immrbar_virt_to_phys() net: ethernet: ucc_geth: fix MEM_PART_MURAM mode powerpc/64e: Fix hang when debugging programs with relocated kernel
2017-05-10Merge tag 'kbuild-uapi-v4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild UAPI updates from Masahiro Yamada: "Improvement of headers_install by Nicolas Dichtel. It has been long since the introduction of uapi directories, but the de-coupling of exported headers has not been completed. Headers listed in header-y are exported whether they exist in uapi directories or not. His work fixes this inconsistency. All (and only) headers under uapi directories are now exported. The asm-generic wrappers are still exceptions, but this is a big step forward" * tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: arch/include: remove empty Kbuild files uapi: export all arch specifics directories uapi: export all headers under uapi directories smc_diag.h: fix include from userland btrfs_tree.h: fix include from userland uapi: includes linux/types.h before exporting files Makefile.headersinst: remove destination-y option Makefile.headersinst: cleanup input files x86: stop exporting msr-index.h to userland nios2: put setup.h in uapi h8300: put bitsperlong.h in uapi
2017-05-11uapi: export all headers under uapi directoriesNicolas Dichtel
Regularly, when a new header is created in include/uapi/, the developer forgets to add it in the corresponding Kbuild file. This error is usually detected after the release is out. In fact, all headers under uapi directories should be exported, thus it's useless to have an exhaustive list. After this patch, the following files, which were not exported, are now exported (with make headers_install_all): asm-arc/kvm_para.h asm-arc/ucontext.h asm-blackfin/shmparam.h asm-blackfin/ucontext.h asm-c6x/shmparam.h asm-c6x/ucontext.h asm-cris/kvm_para.h asm-h8300/shmparam.h asm-h8300/ucontext.h asm-hexagon/shmparam.h asm-m32r/kvm_para.h asm-m68k/kvm_para.h asm-m68k/shmparam.h asm-metag/kvm_para.h asm-metag/shmparam.h asm-metag/ucontext.h asm-mips/hwcap.h asm-mips/reg.h asm-mips/ucontext.h asm-nios2/kvm_para.h asm-nios2/ucontext.h asm-openrisc/shmparam.h asm-parisc/kvm_para.h asm-powerpc/perf_regs.h asm-sh/kvm_para.h asm-sh/ucontext.h asm-tile/shmparam.h asm-unicore32/shmparam.h asm-unicore32/ucontext.h asm-x86/hwcap2.h asm-xtensa/kvm_para.h drm/armada_drm.h drm/etnaviv_drm.h drm/vgem_drm.h linux/aspeed-lpc-ctrl.h linux/auto_dev-ioctl.h linux/bcache.h linux/btrfs_tree.h linux/can/vxcan.h linux/cifs/cifs_mount.h linux/coresight-stm.h linux/cryptouser.h linux/fsmap.h linux/genwqe/genwqe_card.h linux/hash_info.h linux/kcm.h linux/kcov.h linux/kfd_ioctl.h linux/lightnvm.h linux/module.h linux/nbd-netlink.h linux/nilfs2_api.h linux/nilfs2_ondisk.h linux/nsfs.h linux/pr.h linux/qrtr.h linux/rpmsg.h linux/sched/types.h linux/sed-opal.h linux/smc.h linux/smc_diag.h linux/stm.h linux/switchtec_ioctl.h linux/vfio_ccw.h linux/wil6210_uapi.h rdma/bnxt_re-abi.h Note that I have removed from this list the files which are generated in every exported directories (like .install or .install.cmd). Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all subdirs with a pure makefile command. For the record, note that exported files for asm directories are a mix of files listed by: - include/uapi/asm-generic/Kbuild.asm; - arch/<arch>/include/uapi/asm/Kbuild; - arch/<arch>/include/asm/Kbuild. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Mark Salter <msalter@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-09powerpc/64s: Support new device tree binding for discovering CPU featuresNicholas Piggin
The ibm,powerpc-cpu-features device tree binding describes CPU features with ASCII names and extensible compatibility, privilege, and enablement metadata that allows improved flexibility and compatibility with new hardware. The interface is described in detail in ibm,powerpc-cpu-features.txt in this patch. Currently this code is not enabled by default, and there are no released firmwares that provide the binding. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09Merge branch 'next' of ↵Michael Ellerman
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression on 64e, a fix for a kernel hang on 64e when using a debugger inside a relocated kernel, a qman fix, and misc qe improvements."
2017-05-09Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09powerpc/mm/book3s/64: Rework page table geometry for lower memory usageMichael Ellerman
Recently in commit f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB") we increased the virtual address space for user processes to 128TB by default, and up to 512TB if user space opts in. This obviously required expanding the range of the Linux page tables. For Book3s 64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries. This meant we could cover the full address range, while still being able to insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD. The downside of that geometry is that it uses a lot of memory for the PGD, and in particular makes the PGD a 4-page allocation, which means it's much more likely to fail under memory pressure. Instead we can make the PMD larger, so that a single PUD entry maps 16G, allowing the 16G hugepages to sit at that level in the tree. We're then able to split the remaining bits between the PUG and PGD. We make the PGD slightly larger as that results in lower memory usage for typical programs. When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14 bytes, which is large but still < PAGE_SIZE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-05-08Merge tag 'pci-v4.12-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...
2017-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
2017-05-08powerpc/fadump: remove dependency with CONFIG_KEXECHari Bathini
Now that crashkernel parameter parsing and vmcoreinfo related code is moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove dependency with CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of definitions of fadump_append_elf_note() & fadump_final_note() functions to reuse similar functions compiled under CONFIG_CRASH_CORE. Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
2017-05-05Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
2017-05-05powerpc/64e: Don't place the stack beyond TASK_SIZEScott Wood
Commit f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") increased the task size on book3s, and introduced a mechanism to dynamically control whether a task uses these larger addresses. While the change to the task size itself was ifdef-protected to only apply on book3s, the change to STACK_TOP_USER64 was not. On book3e, this had the effect of trying to use addresses up to 128TiB for the stack despite a 64TiB task size limit -- which broke 64-bit userspace producing the following errors: Starting init: /sbin/init exists but couldn't execute it (error -14) Starting init: /bin/sh exists but couldn't execute it (error -14) Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. Fixes: f4ea6dcb08ea ("powerpc/mm: Enable mappings above 128TB") Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Scott Wood <oss@buserror.net>
2017-05-02powerpc/8xx: Adding support of IRQ in MPC8xx GPIOChristophe Leroy
This patch allows the use of IRQ to notify the change of GPIO status on MPC8xx CPM IO ports. This then allows to associate IRQs to GPIOs in the Device Tree. Ex: CPM1_PIO_C: gpio-controller@960 { #gpio-cells = <2>; compatible = "fsl,cpm1-pario-bank-c"; reg = <0x960 0x10>; fsl,cpm1-gpio-irq-mask = <0x0fff>; interrupts = <1 2 6 9 10 11 14 15 23 24 26 31>; interrupt-parent = <&CPM_PIC>; gpio-controller; }; The property 'fsl,cpm1-gpio-irq-mask' defines which of the 16 GPIOs have the associated interrupts defined in the 'interrupts' property. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2017-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch updates from Jiri Kosina: - a per-task consistency model is being added for architectures that support reliable stack dumping (extending this, currently rather trivial set, is currently in the works). This extends the nature of the types of patches that can be applied by live patching infrastructure. The code stems from the design proposal made [1] back in November 2014. It's a hybrid of SUSE's kGraft and RH's kpatch, combining advantages of both: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Most of the heavy lifting done by Josh Poimboeuf with help from Miroslav Benes and Petr Mladek [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz - module load time patch optimization from Zhou Chengming - a few assorted small fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing printk newlines livepatch: Cancel transition a safe way for immediate patches livepatch: Reduce the time of finding module symbols livepatch: make klp_mutex proper part of API livepatch: allow removal of a disabled patch livepatch: add /proc/<pid>/patch_state livepatch: change to a per-task consistency model livepatch: store function sizes livepatch: use kstrtobool() in enabled_store() livepatch: move patching functions into patch.c livepatch: remove unnecessary object loaded check livepatch: separate enabled and patched states livepatch/s390: add TIF_PATCH_PENDING thread flag livepatch/s390: reorganize TIF thread flag bits livepatch/powerpc: add TIF_PATCH_PENDING thread flag livepatch/x86: add TIF_PATCH_PENDING thread flag livepatch: create temporary klp_update_patch_state() stub x86/entry: define _TIF_ALLWORK_MASK flags explicitly stacktrace/x86: add function for detecting reliable stack traces
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-05-01Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
2017-05-01Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - unwinder fixes and enhancements - improve ftrace interaction with the unwinder - optimize the code footprint of WARN() and related debugging constructs - ... plus misc updates, cleanups and fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/unwind: Dump all stacks in unwind_dump() x86/unwind: Silence more entry-code related warnings x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder x86/unwind: Remove unused 'sp' parameter in unwind_dump() x86/unwind: Prepend hex mask value with '0x' in unwind_dump() x86/unwind: Properly zero-pad 32-bit values in unwind_dump() x86/unwind: Ensure stack pointer is aligned debug: Avoid setting BUGFLAG_WARNING twice x86/unwind: Silence entry-related warnings x86/unwind: Read stack return address in update_stack_state() x86/unwind: Move common code into update_stack_state() debug: Fix __bug_table[] in arch linker scripts debug: Add _ONCE() logic to report_bug() x86/debug: Define BUG() again for !CONFIG_BUG x86/debug: Implement __WARN() using UD0 x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set x86/ftrace: Clean up ftrace_regs_caller x86/ftrace: Add stack frame pointer to ftrace_caller x86/ftrace: Move the ftrace specific code out of entry_32.S ...
2017-05-01Merge branch 'work.uaccess' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess unification updates from Al Viro: "This is the uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC" * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits) HAVE_ARCH_HARDENED_USERCOPY is unconditional now CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now m32r: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER get rid of padding, switch to RAW_COPY_USER ia64: get rid of copy_in_user() ia64: sanitize __access_ok() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: add extable.h powerpc: get rid of zeroing, switch to RAW_COPY_USER esas2r: don't open-code memdup_user() alpha: fix stack smashing in old_adjtimex(2) don't open-code kernel_setsockopt() mips: switch to RAW_COPY_USER mips: get rid of tail-zeroing in primitives mips: make copy_from_user() zero tail explicitly mips: clean and reorder the forest of macros... mips: consolidate __invoke_... wrappers ...
2017-04-28Merge branch 'pci/resource-mmap' into nextBjorn Helgaas
* pci/resource-mmap: ia64: Use generic pci_mmap_resource_range() ia64: Remove redundant checks for WC in pci_mmap_page_range() ia64: Remove redundant valid_mmap_phys_addr_range() from pci_mmap_page_range() PCI: Add I/O BAR support to generic pci_mmap_resource_range() x86/PCI: Use generic pci_mmap_resource_range() unicore32/PCI: Use generic pci_mmap_resource_range() sh/PCI: Use generic pci_mmap_resource_range() parisc: Use generic pci_mmap_resource_range() mn10300/PCI: Use generic pci_mmap_resource_range() MIPS: PCI: Use generic pci_mmap_resource_range() cris/PCI: Use generic pci_mmap_resource_range() ARM/PCI: Use generic pci_mmap_resource_range() PCI: Add pci_mmap_resource_range() and use it for ARM64 PCI: Add BAR index argument to pci_mmap_page_range() PCI: Use BAR index in sysfs attr->private instead of resource pointer PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h> PCI: Add arch_can_pci_mmap_wc() macro xtensa/PCI: Do not mmap PCI BARs to userspace as write-through PCI: Only allow WC mmap on prefetchable resources PCI: Fix another sanity check bug in /proc/pci mmap PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
2017-04-28powerpc/mm/hash: Fix off-by-one in comment about kernel contexts idsMichael Ellerman
Michal Suchánek noticed a comment in book3s/64/mmu-hash.h about the context ids we use for the kernel was inconsistent with the code and other comments in the same file. It should read 1-4 not 1-5. While we're touching it, update "address" to "addresses" which makes more sense as it's referring to more than one address below. Reported-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc: Add struct smp_ops_t.cause_nmi_ipi operationNicholas Piggin
Have the NMI IPI code use this op when the platform defines it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc: Add NMI IPI infrastructureNicholas Piggin
Add a simple NMI IPI system that handles concurrency and reentrancy. The platform does not have to implement a true non-maskable interrupt, the default is to simply use the debugger break IPI message. This has now been co-opted for a general IPI message, and users (debugger and crash) have been reimplemented on top of the NMI system. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Incorporate incremental fixes from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc/64s: Dedicated system reset interrupt stackNicholas Piggin
The system reset interrupt is used for crash/debug situations, so it is desirable to have as little impact on the normal state of the system as possible. Currently it uses the current kernel stack to process the exception. This stores into the stack which may be involved with the crash. The stack pointer may be corrupted, or it may have overflowed. Avoid or minimise these problems by creating a dedicated NMI stack for the system reset interrupt to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc/64s: Disallow system reset vs system reset reentrancyNicholas Piggin
In preparation for using a dedicated stack for system reset interrupts, prevent a nested system reset from recovering, in order to simplify code that is called in crash/debug path. This allows a system reset interrupt to just use the base stack pointer. Keep an in_nmi nesting counter similarly to the in_mce counter. Consider the interrrupt non-recoverable if it is taken inside another system reset. Interrupt nesting could be allowed similarly to MCE, but system reset is a special case that's not for normal operation, so simplicity wins until there is requirement for nested system reset interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc/64s: Fix system reset vs general interrupt reentrancyNicholas Piggin
The system reset interrupt can occur when MSR_EE=0, and it currently uses the PACA_EXGEN save area. Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0 when the save area is still in use. A system reset interrupt in this window can lead to undetected corruption when the save area gets overwritten. This patch introduces PACA_EXNMI save area for system reset exceptions, which closes this corruption window. It's also helpful to retain the EXGEN state for debugging situations, even if not considering the recoverability aspect. This patch also moves the PACA_EXMC area down to a less frequently used part of the paca with the new save area. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc/64s: Exception macro for stack frame and initial register saveNicholas Piggin
This code is common to a few exceptions, and another user will be added. This causes a trivial change to generated code: - 604: std r9,416(r1) - 608: mfspr r11,314 - 60c: std r11,368(r1) - 610: mfspr r12,315 + 604: mfspr r11,314 + 608: mfspr r12,315 + 60c: std r9,416(r1) + 610: std r11,368(r1) machine_check_powernv_early could also use this, but that requires non trivial changes to generated code, so that's for another patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28powerpc/64s: Add exception macro that does not enable RINicholas Piggin
Subsequent patches will add more non-RI variant exceptions, so create a macro for it rather than open-code it. This does not change generated instructions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge the topic branch we were sharing with kvm-ppc, Paul has also merged it.
2017-04-28Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-nextPaul Mackerras
This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-27powerpc/mm: Fix missing page attributes in page table dumpChristophe Leroy
On some targets, _PAGE_RW is 0 and this is _PAGE_RO which is used. There is also _PAGE_SHARED that is missing. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-26Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', ↵Al Viro
'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess
2017-04-25powerpc/mm: Ensure IRQs are off in switch_mm()David Gibson
powerpc expects IRQs to already be (soft) disabled when switch_mm() is called, as made clear in the commit message of 9c1e105238c4 ("powerpc: Allow perf_counters to access user memory at interrupt time"). Aside from any race conditions that might exist between switch_mm() and an IRQ, there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't followed at some point by an IRQ enable then interrupts will remain disabled until we return to userspace. It is true that when switch_mm() is called from the scheduler IRQs are off, but not when it's called by use_mm(). Looking closer we see that last year in commit f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") this was made more explicit by the addition of switch_mm_irqs_off() which is now called by the scheduler, vs switch_mm() which is used by use_mm(). Arguably it is a bug in use_mm() to call switch_mm() in a different context than it expects, but fixing that will take time. This was discovered recently when vhost started throwing warnings such as: BUG: sleeping function called from invalid context at kernel/mutex.c:578 in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760 no locks held by vhost-10760/10768. irq event stamp: 10 hardirqs last enabled at (9): _raw_spin_unlock_irq+0x40/0x80 hardirqs last disabled at (10): switch_slb+0x2e4/0x490 softirqs last enabled at (0): copy_process+0x5e8/0x1260 softirqs last disabled at (0): (null) Call Trace: show_stack+0x88/0x390 (unreliable) dump_stack+0x30/0x44 __might_sleep+0x1c4/0x2d0 mutex_lock_nested+0x74/0x5c0 cgroup_attach_task_all+0x5c/0x180 vhost_attach_cgroups_work+0x58/0x80 [vhost] vhost_worker+0x24c/0x3d0 [vhost] kthread+0xec/0x100 ret_from_kernel_thread+0x5c/0xd4 Prior to commit 04b96e5528ca ("vhost: lockless enqueuing") (Aug 2016) the vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(), which had the effect of reenabling IRQs. Since that commit removed the locking in vhost_worker() the body of the vhost_worker() loop now runs with interrupts off causing the warnings. This patch addresses the problem by making the powerpc code mirror the x86 code, ie. we disable interrupts in switch_mm(), and optimise the scheduler case by defining switch_mm_irqs_off(). Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Flesh out/rewrite change log, add stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-25Merge branch 'topic/kprobes' into nextMichael Ellerman
Although most of these kprobes patches are powerpc specific, the