summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-11-25 18:02:36 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2019-11-25 18:02:36 -0800
commit752272f16dd18f2cac58a583a8673c8e2fb93abb (patch)
tree1a2bae3067e1133c1d1b8e0bbbaea8c34e3321b2
parent3f3c8be973af10875cfa1e7b85a535b6ba76b44f (diff)
parent96710247298df52a4b8150a62a6fe87083093ff3 (diff)
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "ARM: - data abort report and injection - steal time support - GICv4 performance improvements - vgic ITS emulation fixes - simplify FWB handling - enable halt polling counters - make the emulated timer PREEMPT_RT compliant s390: - small fixes and cleanups - selftest improvements - yield improvements PPC: - add capability to tell userspace whether we can single-step the guest - improve the allocation of XIVE virtual processor IDs - rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. - minor cleanups and improvements. x86: - XSAVES support for AMD - more accurate report of nested guest TSC to the nested hypervisor - retpoline optimizations - support for nested 5-level page tables - PMU virtualization optimizations, and improved support for nested PMU virtualization - correct latching of INITs for nested virtualization - IOAPIC optimization - TSX_CTRL virtualization for more TAA happiness - improved allocation and flushing of SEV ASIDs - many bugfixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints KVM: x86: Grab KVM's srcu lock when setting nested state KVM: x86: Open code shared_msr_update() in its only caller KVM: Fix jump label out_free_* in kvm_init() KVM: x86: Remove a spurious export of a static function KVM: x86: create mmu/ subdirectory KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page KVM: x86: remove set but not used variable 'called' KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID KVM: x86: do not modify masked bits of shared MSRs KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT KVM: x86: Unexport kvm_vcpu_reload_apic_access_page() KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization ...
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt6
-rw-r--r--Documentation/virt/kvm/api.txt58
-rw-r--r--Documentation/virt/kvm/arm/pvtime.rst80
-rw-r--r--Documentation/virt/kvm/devices/vcpu.txt14
-rw-r--r--Documentation/virt/kvm/devices/xics.txt14
-rw-r--r--Documentation/virt/kvm/devices/xive.txt8
-rw-r--r--arch/arm/include/asm/kvm_arm.h1
-rw-r--r--arch/arm/include/asm/kvm_emulate.h9
-rw-r--r--arch/arm/include/asm/kvm_host.h33
-rw-r--r--arch/arm/include/uapi/asm/kvm.h3
-rw-r--r--arch/arm/kvm/Makefile2
-rw-r--r--arch/arm/kvm/guest.c14
-rw-r--r--arch/arm/kvm/handle_exit.c2
-rw-r--r--arch/arm/mm/proc-v7-bugs.c13
-rw-r--r--arch/arm64/include/asm/kvm_arm.h3
-rw-r--r--arch/arm64/include/asm/kvm_emulate.h26
-rw-r--r--arch/arm64/include/asm/kvm_host.h37
-rw-r--r--arch/arm64/include/asm/paravirt.h9
-rw-r--r--arch/arm64/include/asm/pvclock-abi.h17
-rw-r--r--arch/arm64/include/uapi/asm/kvm.h5
-rw-r--r--arch/arm64/kernel/cpu_errata.c81
-rw-r--r--arch/arm64/kernel/paravirt.c140
-rw-r--r--arch/arm64/kernel/time.c3
-rw-r--r--arch/arm64/kvm/Kconfig4
-rw-r--r--arch/arm64/kvm/Makefile2
-rw-r--r--arch/arm64/kvm/guest.c23
-rw-r--r--arch/arm64/kvm/handle_exit.c4
-rw-r--r--arch/arm64/kvm/inject_fault.c4
-rw-r--r--arch/powerpc/include/asm/kvm_host.h1
-rw-r--r--arch/powerpc/include/asm/kvm_ppc.h1
-rw-r--r--arch/powerpc/include/asm/reg.h12
-rw-r--r--arch/powerpc/include/uapi/asm/kvm.h3
-rw-r--r--arch/powerpc/kvm/book3s.c27
-rw-r--r--arch/powerpc/kvm/book3s.h3
-rw-r--r--arch/powerpc/kvm/book3s_32_mmu.c6
-rw-r--r--arch/powerpc/kvm/book3s_64_mmu.c15
-rw-r--r--arch/powerpc/kvm/book3s_64_mmu_hv.c26
-rw-r--r--arch/powerpc/kvm/book3s_64_vio.c2
-rw-r--r--arch/powerpc/kvm/book3s_hv.c28
-rw-r--r--arch/powerpc/kvm/book3s_hv_builtin.c82
-rw-r--r--arch/powerpc/kvm/book3s_hv_nested.c2
-rw-r--r--arch/powerpc/kvm/book3s_pr.c40
-rw-r--r--arch/powerpc/kvm/book3s_xive.c128
-rw-r--r--arch/powerpc/kvm/book3s_xive.h5
-rw-r--r--arch/powerpc/kvm/book3s_xive_native.c82
-rw-r--r--arch/powerpc/kvm/e500_mmu_host.c6
-rw-r--r--arch/powerpc/kvm/powerpc.c2
-rw-r--r--arch/s390/include/asm/kvm_host.h1
-rw-r--r--arch/s390/kvm/diag.c22
-rw-r--r--arch/s390/kvm/interrupt.c5
-rw-r--r--arch/s390/kvm/kvm-s390.c19
-rw-r--r--arch/x86/events/intel/core.c11
-rw-r--r--arch/x86/include/asm/kvm_host.h32
-rw-r--r--arch/x86/kernel/kvm.c1
-rw-r--r--arch/x86/kvm/Makefile4
-rw-r--r--arch/x86/kvm/cpuid.c8
-rw-r--r--arch/x86/kvm/emulate.c6
-rw-r--r--arch/x86/kvm/ioapic.c34
-rw-r--r--arch/x86/kvm/kvm_cache_regs.h51
-rw-r--r--arch/x86/kvm/lapic.c111
-rw-r--r--arch/x86/kvm/lapic.h3
-rw-r--r--arch/x86/kvm/mmu/mmu.c (renamed from arch/x86/kvm/mmu.c)2
-rw-r--r--arch/x86/kvm/mmu/page_track.c (renamed from arch/x86/kvm/page_track.c)0
-rw-r--r--arch/x86/kvm/mmu/paging_tmpl.h (renamed from arch/x86/kvm/paging_tmpl.h)0
-rw-r--r--arch/x86/kvm/pmu.c124
-rw-r--r--arch/x86/kvm/pmu.h29
-rw-r--r--arch/x86/kvm/pmu_amd.c24
-rw-r--r--arch/x86/kvm/svm.c140
-rw-r--r--arch/x86/kvm/vmx/nested.c252
-rw-r--r--arch/x86/kvm/vmx/nested.h9
-rw-r--r--arch/x86/kvm/vmx/pmu_intel.c34
-rw-r--r--arch/x86/kvm/vmx/vmx.c339
-rw-r--r--arch/x86/kvm/vmx/vmx.h12
-rw-r--r--arch/x86/kvm/x86.c244
-rw-r--r--arch/x86/kvm/x86.h15
-rw-r--r--drivers/crypto/ccp/psp-dev.c9
-rw-r--r--drivers/irqchip/irq-gic-v4.c7
-rw-r--r--include/Kbuild2
-rw-r--r--include/kvm/arm_hypercalls.h43
-rw-r--r--include/kvm/arm_psci.h2
-rw-r--r--include/kvm/arm_vgic.h8
-rw-r--r--include/linux/arm-smccc.h59
-rw-r--r--include/linux/cpuhotplug.h1
-rw-r--r--include/linux/irqchip/arm-gic-v4.h4
-rw-r--r--include/linux/kvm_host.h41
-rw-r--r--include/linux/kvm_types.h2
-rw-r--r--include/linux/perf_event.h10
-rw-r--r--include/uapi/linux/kvm.h11
-rw-r--r--kernel/events/core.c46
-rw-r--r--tools/testing/selftests/kvm/.gitignore1
-rw-r--r--tools/testing/selftests/kvm/Makefile1
-rw-r--r--tools/testing/selftests/kvm/include/x86_64/processor.h7
-rw-r--r--tools/testing/selftests/kvm/kvm_create_max_vcpus.c7
-rw-r--r--tools/testing/selftests/kvm/lib/x86_64/processor.c72
-rw-r--r--tools/testing/selftests/kvm/s390x/sync_regs_test.c15
-rw-r--r--tools/testing/selftests/kvm/x86_64/xss_msr_test.c76
-rw-r--r--virt/kvm/arm/arch_timer.c8
-rw-r--r--virt/kvm/arm/arm.c49
-rw-r--r--virt/kvm/arm/hypercalls.c71
-rw-r--r--virt/kvm/arm/mmio.c9
-rw-r--r--virt/kvm/arm/psci.c84
-rw-r--r--virt/kvm/arm/pvtime.c131
-rw-r--r--virt/kvm/arm/vgic/vgic-init.c1
-rw-r--r--virt/kvm/arm/vgic/vgic-its.c3
-rw-r--r--virt/kvm/arm/vgic/vgic-v3.c12
-rw-r--r--virt/kvm/arm/vgic/vgic-v4.c59
-rw-r--r--virt/kvm/arm/vgic/vgic.c4
-rw-r--r--virt/kvm/arm/vgic/vgic.h2
-rw-r--r--virt/kvm/coalesced_mmio.c8
-rw-r--r--virt/kvm/kvm_main.c34
110 files changed, 2593 insertions, 924 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dee8f68fe15..9b847f06cb19 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3110,9 +3110,9 @@
[X86,PV_OPS] Disable paravirtualized VMware scheduler
clock and use the default one.
- no-steal-acc [X86,KVM] Disable paravirtualized steal time accounting.
- steal time is computed, but won't influence scheduler
- behaviour
+ no-steal-acc [X86,KVM,ARM64] Disable paravirtualized steal time
+ accounting. steal time is computed, but won't
+ influence scheduler behaviour
nolapic [X86-32,APIC] Do not enable or use the local APIC.
diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 4833904d32a5..49183add44e7 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -1002,12 +1002,18 @@ Specifying exception.has_esr on a system that does not support it will return
-EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
will return -EINVAL.
+It is not possible to read back a pending external abort (injected via
+KVM_SET_VCPU_EVENTS or otherwise) because such an exception is always delivered
+directly to the virtual CPU).
+
+
struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
+ __u8 ext_dabt_pending;
/* Align it to 8 bytes */
- __u8 pad[6];
+ __u8 pad[5];
__u64 serror_esr;
} exception;
__u32 reserved[12];
@@ -1051,9 +1057,23 @@ contain a valid state and shall be written into the VCPU.
ARM/ARM64:
+User space may need to inject several types of events to the guest.
+
Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an Serror that has been made pending.
+If the guest performed an access to I/O memory which could not be handled by
+userspace, for example because of missing instruction syndrome decode
+information or because there is no device mapped at the accessed IPA, then
+userspace can ask the kernel to inject an external abort using the address
+from the exiting fault on the VCPU. It is a programming error to set
+ext_dabt_pending after an exit which was not either KVM_EXIT_MMIO or
+KVM_EXIT_ARM_NISV. This feature is only available if the system supports
+KVM_CAP_ARM_INJECT_EXT_DABT. This is a helper which provides commonality in
+how userspace reports accesses for the above cases to guests, across different
+userspace implementations. Nevertheless, userspace can still emulate all Arm
+exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
+
See KVM_GET_VCPU_EVENTS for the data structure.
@@ -2982,6 +3002,9 @@ can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
indicating the number of supported registers.
+For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether
+the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported.
+
When debug events exit the main run loop with the reason
KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
structure containing architecture specific debug information.
@@ -4468,6 +4491,39 @@ Hyper-V SynIC state change. Notification is used to remap SynIC
event/message pages and to enable/disable SynIC messages/events processing
in userspace.
+ /* KVM_EXIT_ARM_NISV */
+ struct {
+ __u64 esr_iss;
+ __u64 fault_ipa;
+ } arm_nisv;
+
+Used on arm and arm64 systems. If a guest accesses memory not in a memslot,
+KVM will typically return to userspace and ask it to do MMIO emulation on its
+behalf. However, for certain classes of instructions, no instruction decode
+(direction, length of memory access) is provided, and fetching and decoding
+the instruction from the VM is overly complicated to live in the kernel.
+
+Historically, when this situation occurred, KVM would print a warning and kill
+the VM. KVM assumed that if the guest accessed non-memslot memory, it was
+trying to do I/O, which just couldn't be emulated, and the warning message was
+phrased accordingly. However, what happened more often was that a guest bug
+caused access outside the guest memory areas which should lead to a more
+meaningful warning message and an external abort in the guest, if the access
+did not fall within an I/O window.
+
+Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable
+this capability at VM creation. Once this is done, these types of errors will
+instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from
+the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA
+in the fault_ipa field. Userspace can either fix up the access if it's
+actually an I/O access by decoding the instruction from guest memory (if it's
+very brave) and continue executing the guest, or it can decide to suspend,
+dump, or restart the guest.
+
+Note that KVM does not skip the faulting instruction as it does for
+KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
+if it decides to decode and emulate the instruction.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst
new file mode 100644
index 000000000000..2357dd2d8655
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pvtime.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Paravirtualized time support for arm64
+======================================
+
+Arm specification DEN0057/A defines a standard for paravirtualised time
+support for AArch64 guests:
+
+https://developer.arm.com/docs/den0057/a
+
+KVM/arm64 implements the stolen time part of this specification by providing
+some hypervisor service calls to support a paravirtualized guest obtaining a
+view of the amount of time stolen from its execution.
+
+Two new SMCCC compatible hypercalls are defined:
+
+* PV_TIME_FEATURES: 0xC5000020
+* PV_TIME_ST: 0xC5000021
+
+These are only available in the SMC64/HVC64 calling convention as
+paravirtualized time is not available to 32 bit Arm guests. The existence of
+the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
+mechanism before calling it.
+
+PV_TIME_FEATURES
+ ============= ======== ==========
+ Function ID: (uint32) 0xC5000020
+ PV_call_id: (uint32) The function to query for support.
+ Currently only PV_TIME_ST is supported.
+ Return value: (int64) NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
+ PV-time feature is supported by the hypervisor.
+ ============= ======== ==========
+
+PV_TIME_ST
+ ============= ======== ==========
+ Function ID: (uint32) 0xC5000021
+ Return value: (int64) IPA of the stolen time data structure for this
+ VCPU. On failure:
+ NOT_SUPPORTED (-1)
+ ============= ======== ==========
+
+The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
+with inner and outer write back caching attributes, in the inner shareable
+domain. A total of 16 bytes from the IPA returned are guaranteed to be
+meaningfully filled by the hypervisor (see structure below).
+
+PV_TIME_ST returns the structure for the calling VCPU.
+