summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2019-10-22KVM: SVM: Guard against DEACTIVATE when performing WBINVD/DF_FLUSHTom Lendacky
The SEV firmware DEACTIVATE command disassociates an SEV guest from an ASID, clears the WBINVD indicator on all threads and indicates that the SEV firmware DF_FLUSH command must be issued before the ASID can be re-used. The SEV firmware DF_FLUSH command will return an error if a WBINVD has not been performed on every thread before it has been invoked. A window exists between the WBINVD and the invocation of the DF_FLUSH command where an SEV firmware DEACTIVATE command could be invoked on another thread, clearing the WBINVD indicator. This will cause the subsequent SEV firmware DF_FLUSH command to fail which, in turn, results in the SEV firmware ACTIVATE command failing for the reclaimed ASID. This results in the SEV guest failing to start. Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures that any DEACTIVATE cannot run before a DF_FLUSH has completed. Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Serialize access to the SEV ASID bitmapTom Lendacky
The SEV ASID bitmap currently is not protected against parallel SEV guest startups. This can result in an SEV guest failing to start because another SEV guest could have been assigned the same ASID value. Use a mutex to serialize access to the SEV ASID bitmap. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22Merge tag 'kvm-ppc-fixes-5.4-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM fix for 5.4 - Fix a bug in the XIVE code which can cause a host crash.
2019-10-22Merge tag 'kvmarm-fixes-5.4-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.4, take #2 Special PMU edition: - Fix cycle counter truncation - Fix cycle counter overflow limit on pure 64bit system - Allow chained events to be actually functional - Correct sample period after overflow
2019-10-22kvm: clear kvmclock MSR on resetPaolo Bonzini
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is not enabled. This can be confusing, so fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: fix bugon.cocci warningskbuild test robot
Use BUG_ON instead of a if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()") CC: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Remove specialized handling of unexpected exit-reasonsLiran Alon
Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") introduced specialized handling of specific exit-reasons that should not be raised by CPU because KVM configures VMCS such that they should never be raised. However, since commit 7396d337cfad ("KVM: x86: Return to userspace with internal error on unexpected exit reason"), VMX & SVM exit handlers were modified to generically handle all unexpected exit-reasons by returning to userspace with internal error. Therefore, there is no need for specialized handling of specific unexpected exit-reasons (This specialized handling also introduced inconsistency for these exit-reasons to silently skip guest instruction instead of return to userspace on internal-error). Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22selftests: kvm: fix sync_regs_test with newer gccsVitaly Kuznetsov
Commit 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm") was intended to make test more gcc-proof, however, the result is exactly the opposite: on newer gccs (e.g. 8.2.1) the test breaks with ==== Test Assertion Failure ==== x86_64/sync_regs_test.c:168: run->s.regs.regs.rbx == 0xBAD1DEA + 1 pid=14170 tid=14170 - Invalid argument 1 0x00000000004015b3: main at sync_regs_test.c:166 (discriminator 6) 2 0x00007f413fb66412: ?? ??:0 3 0x000000000040191d: _start at ??:? rbx sync regs value incorrect 0x1. Apparently, compile is still free to play games with registers even when they have variables attached. Re-write guest code with 'asm volatile' by embedding ucall there and making sure rbx is preserved. Fixes: 204c91eff798a ("KVM: selftests: do not blindly clobber registers in guest asm") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supportedVitaly Kuznetsov
vmx_dirty_log_test fails on AMD and this is no surprise as it is VMX specific. Bail early when nested VMX is unsupported. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22selftests: kvm: consolidate VMX support checksVitaly Kuznetsov
vmx_* tests require VMX and three of them implement the same check. Move it to vmx library. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twiceVitaly Kuznetsov
vmx_set_nested_state_test() checks if VMX is supported twice: in the very beginning (and skips the whole test if it's not) and before doing test_vmx_nested_state(). One should be enough. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabledWanpeng Li
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host side polling is disabled. Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22selftests: kvm: synchronize .gitignore to MakefileVitaly Kuznetsov
Because "Untracked files:" are annoying. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUIDJim Mattson
When the RDPID instruction is supported on the host, enumerate it in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22Merge tag 'pinctrl-v5.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here is a bunch of pin control fixes. I was lagging behind on this one, some fixes should have come in earlier, sorry about that. Anyways here it is, pretty straight-forward fixes, the Strago fix stand out as something serious affecting a lot of machines. Summary: - Handle multiple instances of Intel chips without complaining. - Restore the Intel Strago DMI workaround - Make the Armada 37xx handle pins over 32 - Fix the polarity of the LED group on Armada 37xx - Fix an off-by-one bug in the NS2 driver - Fix error path for iproc's platform_get_irq() - Fix error path on the STMFX driver - Fix a typo in the Berlin AS370 driver - Fix up misc errors in the Aspeed 2600 BMC support - Fix a stray SPDX tag" * tag 'pinctrl-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: aspeed-g6: Rename SD3 to EMMC and rework pin groups pinctrl: aspeed-g6: Fix UART13 group pinmux pinctrl: aspeed-g6: Make SIG_DESC_CLEAR() behave intuitively pinctrl: aspeed-g6: Fix I3C3/I3C4 pinmux configuration pinctrl: aspeed-g6: Fix I2C14 SDA description pinctrl: aspeed-g6: Sort pins for sanity dt-bindings: pinctrl: aspeed-g6: Rework SD3 function and groups pinctrl: berlin: as370: fix a typo s/spififib/spdifib pinctrl: armada-37xx: swap polarity on LED group pinctrl: stmfx: fix null pointer on remove pinctrl: iproc: allow for error from platform_get_irq() pinctrl: ns2: Fix off by one bugs in ns2_pinmux_enable() pinctrl: bcm-iproc: Use SPDX header pinctrl: armada-37xx: fix control of pins 32 and up pinctrl: cherryview: restore Strago DMI workaround for all versions pinctrl: intel: Allocate IRQ chip dynamic
2019-10-22KVM: PPC: Book3S HV: Reject mflags=2 (LPCR[AIL]=2) ADDR_TRANS_MODE modeNicholas Piggin
AIL=2 mode has no known users, so is not well tested or supported. Disallow guests from selecting this mode because it may become deprecated in future versions of the architecture. This policy decision is not left to QEMU because KVM support is required for AIL=2 (when injecting interrupts). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: Implement LPCR[AIL]=3 mode for injected interruptsNicholas Piggin
kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which can result in the guest receiving interrupts as if LPCR[AIL]=0 contrary to the ISA. In practice, Linux guests cope with this deviation, but it should be fixed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest deliveryNicholas Piggin
This consolidates the HV interrupt delivery logic into one place. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch opNicholas Piggin
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S: Define and use SRR1_MSR_BITSNicholas Piggin
Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPsGreg Kurz
Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that userspace can tell how many interrupt servers it needs. If a VM needs less than the current default of KVM_MAX_VCPUS (2048), we can allocate less VPs in OPAL. Combined with a core stride (VSMT) that matches the number of guest threads per core, this may substantially increases the number of VMs that can run concurrently with an in-kernel XIVE device. Since the legacy XIVE KVM device is exposed to userspace through the XICS KVM API, a new attribute group is added to it for this purpose. While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES in the XICS documentation. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Make VP block size configurableGreg Kurz
The XIVE VP is an internal structure which allow the XIVE interrupt controller to maintain the interrupt context state of vCPUs non dispatched on HW threads. When a guest is started, the XIVE KVM device allocates a block of XIVE VPs in OPAL, enough to accommodate the highest possible vCPU id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048). With a guest's core stride of 8 and a threading mode of 1 (QEMU's default), a VM must run at least 256 vCPUs to actually need such a range of VPs. A POWER9 system has a limited XIVE VP space : 512k and KVM is currently wasting this HW resource with large VP allocations, especially since a typical VM likely runs with a lot less vCPUs. Make the size of the VP block configurable. Add an nr_servers field to the XIVE structure and a function to set it for this purpose. Split VP allocation out of the device create function. Since the VP block isn't used before the first vCPU connects to the XIVE KVM device, allocation is now performed by kvmppc_xive_connect_vcpu(). This gives the opportunity to set nr_servers in between: kvmppc_xive_create() / kvmppc_xive_native_create() . . kvmppc_xive_set_nr_servers() . . kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu() The connect_vcpu() functions check that the vCPU id is below nr_servers and if it is the first vCPU they allocate the VP block. This is protected against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers() with the xive->lock mutex. Also, the block is allocated once for the device lifetime: nr_servers should stay constant otherwise connect_vcpu() could generate a boggus VP id and likely crash OPAL. It is thus forbidden to update nr_servers once the block is allocated. If the VP allocation fail, return ENOSPC which seems more appropriate to report the depletion of system wide HW resource than ENOMEM or ENXIO. A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence only need 256 VPs instead of 2048. If the stride is set to match the number of threads per core, this goes further down to 32. This will be exposed to userspace by a subsequent patch. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helperGreg Kurz
Reduce code duplication by consolidating the checking of vCPU ids and VP ids to a common helper used by both legacy and native XIVE KVM devices. And explain the magic with a comment. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Show VP id in debugfsGreg Kurz
Print out the VP id of each connected vCPU, this allow to see: - the VP block base in which OPAL encodes information that may be useful when debugging - the packed vCPU id which may differ from the raw vCPU id if the latter is >= KVM_MAX_VCPUS (2048) Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocatedGreg Kurz
If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive set forever since the release method isn't called in this case. Any subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will thus always fail (DoS). This is a problem for QEMU since it destroys and re-creates these devices when the VM is reset: the VM would be restricted to using the much slower emulated XIVE or XICS forever. As an alternative to adding rollback, do not assign kvm->arch.xive before making sure the XIVE VPs are allocated in OPAL. Cc: stable@vger.kernel.org # v5.2 Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: E500: Replace current->mm by kvm->mmLeonardo Bras
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-22KVM: PPC: Reduce calls to get current->mm by storing the value locallyLeonardo Bras
Reduces the number of calls to get_current() in order to get the value of current->mm by doing it once and storing the value, since it is not supposed to change inside the same process). Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-21KVM: PPC: Report single stepping capabilityFabiano Rosas
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-10-20Linux 5.4-rc4Linus Torvalds
2019-10-20Merge tag 'kbuild-fixes-v5.4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild fixes from Masahiro Yamada: - fix a bashism of setlocalversion - do not use the too new --sort option of tar * tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kheaders: substituting --sort in archive creation scripts: setlocalversion: fix a bashism kbuild: update comment about KBUILD_ALLDIRS
2019-10-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Prevent a NULL pointer dereference in the X2APIC code in case of a CPU hotplug failure. - Prevent boot failures on HP superdome machines by invalidating the level2 kernel pagetable entries outside of the kernel area as invalid so BIOS reserved space won't be touched unintentionally. Also ensure that memory holes are rounded up to the next PMD boundary correctly. - Enable X2APIC support on Hyper-V to prevent boot failures. - Set the paravirt name when running on Hyper-V for consistency - Move a function under the appropriate ifdef guard to prevent build warnings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard x86/hyperv: Set pv_info.name to "Hyper-V" x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu x86/hyperv: Make vapic support x2apic mode x86/boot/64: Round memory hole size up to next PMD page x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
2019-10-20Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "A small set of irq chip driver fixes and updates: - Update the SIFIVE PLIC interrupt driver to use the fasteoi handler to address the shortcomings of the existing flow handling which was prone to lose interrupts - Use the proper limit for GIC interrupt line numbers - Add retrigger support for the recently merged Anapurna Labs Fabric interrupt controller to make it complete - Enable the ATMEL AIC5 interrupt controller driver on the new SAM9X60 SoC" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/sifive-plic: Switch to fasteoi flow irqchip/gic-v3: Fix GIC_LINE_NR accessor irqchip/atmel-aic5: Add support for sam9x60 irqchip irqchip/al-fic: Add support for irq retrigger
2019-10-20Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull hrtimer fixlet from Thomas Gleixner: "A single commit annotating the lockcless access to timer->base with READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Annotate lockless access to timer->base
2019-10-20Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stop-machine fix from Thomas Gleixner: "A single fix, amending stop machine with WRITE/READ_ONCE() to address the fallout of KCSAN" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Avoid potential race behaviour
2019-10-20KVM: arm64: pmu: Reset sample period on overflow handlingMarc Zyngier
The PMU emulation code uses the perf event sample period to trigger the overflow detection. This works fine for the *first* overflow handling, but results in a huge number of interrupts on the host, unrelated to the number of interrupts handled in the guest (a x20 factor is pretty common for the cycle counter). On a slow system (such as a SW model), this can result in the guest only making forward progress at a glacial pace. It turns out that the clue is in the name. The sample period is exactly that: a period. And once the an overflow has occured, the following period should be the full width of the associated counter, instead of whatever the guest had initially programed. Reset the sample period to the architected value in the overflow handler, which now results in a number of host interrupts that is much closer to the number of interrupts in the guest. Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel eventMarc Zyngier
The current convention for KVM to request a chained event from the host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED). But as it turns out, this bit gets set *after* we create the kernel event that backs our virtual counter, meaning that we never get a 64bit counter. Moving the setting to an earlier point solves the problem. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systemsMarc Zyngier
Of PMCR_EL0.LC, the ARMv8 ARM says: "In an AArch64 only implementation, this field is RES 1." So be it. Fixes: ab9468340d2bc ("arm64: KVM: Add access handler for PMCR register") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20KVM: arm64: pmu: Fix cycle counter truncationMarc Zyngier
When a counter is disabled, its value is sampled before the event is being disabled, and the value written back in the shadow register. In that process, the value gets truncated to 32bit, which is adequate for any counter but the cycle counter (defined as a 64bit counter). This obviously results in a corrupted counter, and things like "perf record -e cycles" not working at all when run in a guest... A similar, but less critical bug exists in kvm_pmu_get_counter_value. Make the truncation conditional on the counter not being the cycle counter, which results in a minor code reorganisation. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reported-by: Julien Thierry <julien.thierry.kdev@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "I was battling a cold after some recent trips, so quite a bit piled up meanwhile, sorry about that. Highlights: 1) Fix fd leak in various bpf selftests, from Brian Vazquez. 2) Fix crash in xsk when device doesn't support some methods, from Magnus Karlsson. 3) Fix various leaks and use-after-free in rxrpc, from David Howells. 4) Fix several SKB leaks due to confusion of who owns an SKB and who should release it in the llc code. From Eric Biggers. 5) Kill a bunc of KCSAN warnings in TCP, from Eric Dumazet. 6) Jumbo packets don't work after resume on r8169, as the BIOS resets the chip into non-jumbo mode during suspend. From Heiner Kallweit. 7) Corrupt L2 header during MPLS push, from Davide Caratti. 8) Prevent possible infinite loop in tc_ctl_action, from Eric Dumazet. 9) Get register bits right in bcmgenet driver, based upon chip version. From Florian Fainelli. 10) Fix mutex problems in microchip DSA driver, from Marek Vasut. 11) Cure race between route lookup and invalidation in ipv4, from Wei Wang. 12) Fix performance regression due to false sharing in 'net' structure, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (145 commits) net: reorder 'struct net' fields to avoid false sharing net: dsa: fix switch tree list net: ethernet: dwmac-sun8i: show message only when switching to promisc net: aquantia: add an error handling in aq_nic_set_multicast_list net: netem: correct the parent's backlog when corrupted packet was dropped net: netem: fix error path for corrupted GSO frames macb: propagate errors when getting optional clocks xen/netback: fix error path of xenvif_connect_data() net: hns3: fix mis-counting IRQ vector numbers issue net: usb: lan78xx: Connect PHY before registering MAC vsock/virtio: discard packets if credit is not respected vsock/virtio: send a credit update when buffer size is changed mlxsw: spectrum_trap: Push Ethernet header before reporting trap net: ensure correct skb->tstamp in various fragmenters net: bcmgenet: reset 40nm EPHY on energy detect net: bcmgenet: soft reset 40nm EPHYs before MAC init net: phy: bcm7xxx: define soft_reset for 40nm EPHY net: bcmgenet: don't set phydev->link from MAC net: Update address for MediaTek ethernet driver in MAINTAINERS ipv4: fix race condition between route lookup and invalidation ...
2019-10-19net: reorder 'struct net' fields to avoid false sharingEric Dumazet
Intel test robot reported a ~7% regression on TCP_CRR tests that they bisected to the cited commit. Indeed, every time a new TCP socket is created or deleted, the atomic counter net->count is touched (via get_net(net) and put_net(net) calls) So cpus might have to reload a contended cache line in net_hash_mix(net) calls. We need to reorder 'struct net' fields to move @hash_mix in a read mostly cache line. We move in the first cache line fields that can be dirtied often. We probably will have to address in a followup patch the __randomize_layout that was added in linux-4.13, since this might break our placement choices. Fixes: 355b98553789 ("netns: provide pure entropy for net_hash_mix()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: dsa: fix switch tree listVivien Didelot
If there are multiple switch trees on the device, only the last one will be listed, because the arguments of list_add_tail are swapped. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: ethernet: dwmac-sun8i: show message only when switching to promiscMans Rullgard
Printing the info message every time more than the max number of mac addresses are requested generates unnecessary log spam. Showing it only when the hw is not already in promiscous mode is equally informative without being annoying. Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: aquantia: add an error handling in aq_nic_set_multicast_listChenwandun
add an error handling in aq_nic_set_multicast_list, it may not work when hw_multicast_list_set error; and at the same time it will remove gcc Wunused-but-set-variable warning. Signed-off-by: Chenwandun <chenwandun@huawei.com> Reviewed-by: Igor Russkikh <igor.russkikh@aquantia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19Merge branch 'netem-fix-further-issues-with-packet-corruption'David S. Miller
Jakub Kicinski says: ==================== net: netem: fix further issues with packet corruption This set is fixing two more issues with the netem packet corruption. First patch (which was previously posted) avoids NULL pointer dereference if the first frame gets freed due to allocation or checksum failure. v2 improves the clarity of the code a little as requested by Cong. Second patch ensures we don't return SUCCESS if the frame was in fact dropped. Thanks to this commit message for patch 1 no longer needs the "this will still break with a single-frame failure" disclaimer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: netem: correct the parent's backlog when corrupted packet was droppedJakub Kicinski
If packet corruption failed we jump to finish_segs and return NET_XMIT_SUCCESS. Seeing success will make the parent qdisc increment its backlog, that's incorrect - we need to return NET_XMIT_DROP. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: netem: fix error path for corrupted GSO framesJakub Kicinski
To corrupt a GSO frame we first perform segmentation. We then proceed using the first segment instead of the full GSO skb and requeue the rest of the segments as separate packets. If there are any issues with processing the first segment we still want to process the rest, therefore we jump to the finish_segs label. Commit 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") started using the pointer to the first segment in the "rest of segments processing", but as mentioned above the first segment may had already been freed at this point. Backlog corrections for parent qdiscs have to be adjusted. Fixes: 177b8007463c ("net: netem: fix backlog accounting for corrupted GSO frames") Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19macb: propagate errors when getting optional clocksMichael Tretter
The tx_clk, rx_clk, and tsu_clk are optional. Currently the macb driver marks clock as not available if it receives an error when trying to get a clock. This is wrong, because a clock controller might return -EPROBE_DEFER if a clock is not available, but will eventually become available. In these cases, the driver would probe successfully but will never be able to adjust the clocks, because the clocks were not available during probe, but became available later. For example, the clock controller for the ZynqMP is implemented in the PMU firmware and the clocks are only available after the firmware driver has been probed. Use devm_clk_get_optional() in instead of devm_clk_get() to get the optional clock and propagate all errors to the calling function. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19xen/netback: fix error path of xenvif_connect_data()Juergen Gross
xenvif_connect_data() calls module_put() in case of error. This is wrong as there is no related module_get(). Remove the superfluous module_put(). Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif is shut down") Cc: <stable@vger.kernel.org> # 3.12 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19net: hns3: fix mis-counting IRQ vector numbers issueYonglong Liu
Currently, the num_msi_left means the vector numbers of NIC, but if the PF supported RoCE, it contains the vector numbers of NIC and RoCE(Not expected). This may cause interrupts lost in some case, because of the NIC module used the vector resources which belongs to RoCE. This patch adds a new variable num_nic_msi to store the vector numbers of NIC, and adjust the default TQP numbers and rss_size according to the value of num_nic_msi. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Rather a lot of fixes, almost all affecting mm/" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) scripts/gdb: fix debugging modules on s390 kernel/events/uprobes.c: only do FOLL_SPLIT_PMD for uprobe register mm/thp: allow dropping THP from page cache mm/vmscan.c: support removing arbitrary sized pages from mapping mm/thp: fix node page state in split_huge_page_to_list() proc/meminfo: fix output alignment mm/init-mm.c: include <linux/mman.h> for vm_committed_as_batch mm/filemap.c: include <linux/ramfs.h> for generic_file_vm_ops definition mm: include <linux/huge_mm.h> for is_vma_temporary_stack zram: fix race between backing_dev_show and backing_dev_store mm/memcontrol: update lruvec counters in mem_cgroup_move_account ocfs2: fix panic due to ocfs2_wq is null hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() mm: memblock: do not enforce current limit for memblock_phys* family mm: memcg: get number of pages on the LRU list in memcgroup base on lru_zone_size mm/gup: fix a misnamed "write" argument, and a related bug mm/gup_benchmark: add a missing "w" to getopt string ocfs2: fix error handling in ocfs2_setattr() mm: memcg/slab: fix panic in __free_slab() caused by premature memcg pointer release mm/memunmap: don't access uninitialized memmap in memunmap_pages() ...