summaryrefslogtreecommitdiffstats
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2020-03-31KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirectionSean Christopherson
Replace the kvm_x86_ops pointer in common x86 with an instance of the struct to save one pointer dereference when invoking functions. Copy the struct by value to set the ops during kvm_init(). Arbitrarily use kvm_x86_ops.hardware_enable to track whether or not the ops have been initialized, i.e. a vendor KVM module has been loaded. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-7-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31KVM: x86: Move init-only kvm_x86_ops to separate structSean Christopherson
Move the kvm_x86_ops functions that are used only within the scope of kvm_init() into a separate struct, kvm_x86_init_ops. In addition to identifying the init-only functions without restorting to code comments, this also sets the stage for waiting until after ->hardware_setup() to set kvm_x86_ops. Setting kvm_x86_ops after ->hardware_setup() is desirable as many of the hooks are not usable until ->hardware_setup() completes. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200321202603.19355-3-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31Merge tag 'kvmarm-5.7' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for Linux 5.7 - GICv4.1 support - 32bit host removal
2020-03-16KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgdPaolo Bonzini
The set_cr3 callback is not setting the guest CR3, it is setting the root of the guest page tables, either shadow or two-dimensional. To make this clearer as well as to indicate that the MMU calls it via kvm_mmu_load_cr3, rename it to load_mmu_pgd. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: unify callbacks to load paging rootPaolo Bonzini
Similar to what kvm-intel.ko is doing, provide a single callback that merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3. This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops. I'm doing that in this same patch because splitting it adds quite a bit of churn due to the need for forward declarations. For the same reason the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu from init_kvm_softmmu and nested_svm_init_mmu_context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 codeSean Christopherson
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop ->set_supported_cpuid() now that both VMX and SVM implementations are empty. Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf 0x8000000A is is (obviously) vendor specific but can be queried in common code while respecting SVM's wishes by querying kvm_cpu_cap_has(). Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move VMX's host_efer to common x86 codeSean Christopherson
Move host_efer to common x86 code and use it for CPUID's is_efer_nx() to avoid constantly re-reading the MSR. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Configure max page level during hardware setupSean Christopherson
Configure the max page level during hardware setup to avoid a retpoline in the page fault handler. Drop ->get_lpage_level() as the page fault handler was the last user. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common functionSean Christopherson
Combine kvm_enable_tdp() and kvm_disable_tdp() into a single function, kvm_configure_mmu(), in preparation for doing additional configuration during hardware setup. And because having separate helpers is silly. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Directly query Intel PT mode when refreshing PMUsSean Christopherson
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported() as the PMU code was the last remaining user. Opportunistically clean up the wording of a comment that referenced kvm_x86_ops->pt_supported(). No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt supportSean Christopherson
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop ->rdtscp_supported(). Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due other usage in VMX code. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Use KVM cpu caps to track UMIP emulationSean Christopherson
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the bit will effectively be dropped by do_host_cpuid(). This allows checking for UMIP emulation via kvm_cpu_caps instead of a dedicated kvm_x86_ops callback. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap updateSean Christopherson
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate VMCS control to enable XSAVES in non-root, to eliminate the last ugly renmant of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->xsaves_supported(), CPUID adjustment was the only user. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Handle PKU CPUID adjustment in VMX codeSean Christopherson
Move the setting of the PKU CPUID bit into VMX to eliminate an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->pku_supported(), CPUID adjustment was the only user. Note, some AMD CPUs now support PKU, but SVM doesn't yet support exposing it to a guest. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Handle INVPCID CPUID adjustment in VMX codeSean Christopherson
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->invpcid_supported(), CPUID adjustment was the only user. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Drop explicit @func param from ->set_supported_cpuid()Sean Christopherson
Drop the explicit @func param from ->set_supported_cpuid() and instead pull the CPUID function from the relevant entry. This sets the stage for hardening guest CPUID updates in future patches, e.g. allows adding run-time assertions that the CPUID feature being changed is actually a bit in the referenced CPUID entry. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Use supported_xcr0 to detect MPX supportSean Christopherson
Query supported_xcr0 when checking for MPX support instead of invoking ->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was its last user. Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to better align with VMX/VMCS nomenclature. Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from vmx_mpx_supported()) directly to avoid reading supported_xcr0 before it's fully configured. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Test that *all* bits are set. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move kvm_emulate.h into KVM's private directorySean Christopherson
Now that the emulation context is dynamically allocated and not embedded in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public asm directory and into KVM's private x86 directory. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Dynamically allocate per-vCPU emulation contextSean Christopherson
Allocate the emulation context instead of embedding it in struct kvm_vcpu_arch. Dynamic allocation provides several benefits: - Shrinks the size x86 vcpus by ~2.5k bytes, dropping them back below the PAGE_ALLOC_COSTLY_ORDER threshold. - Allows for dropping the include of kvm_emulate.h from asm/kvm_host.h and moving kvm_emulate.h into KVM's private directory. - Allows a reducing KVM's attack surface by shrinking the amount of vCPU data that is exposed to usercopy. - Allows a future patch to disable the emulator entirely, which may or may not be a realistic endeavor. Mark the entire struct as valid for usercopy to maintain existing behavior with respect to hardened usercopy. Future patches can shrink the usercopy range to cover only what is necessary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Explicitly pass an exception struct to check_interceptSean Christopherson
Explicitly pass an exception struct when checking for intercept from the emulator, which eliminates the last reference to arch.emulate_ctxt in vendor specific code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd()Sean Christopherson
Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to note that it will return something other than CR3 when nested EPT is in use. Hopefully the new name will also make it more obvious that L1's nested_cr3 is returned in SVM's nested NPT case. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: Allow L1 to use 5-level page walks for nested EPTSean Christopherson
Add support for 5-level nested EPT, and advertise said support in the EPT capabilities MSR. KVM's MMU can already handle 5-level legacy page tables, there's no reason to force an L1 VMM to use shadow paging if it wants to employ 5-level page tables. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hackSean Christopherson
Drop kvm_mmu_extended_role.cr4_la57 now that mmu_role doesn't mask off level, which already incorporates the guest's CR4.LA57 for a shadow MMU by querying is_la57_mode(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: Properly handle userspace interrupt window requestSean Christopherson
Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has external interrupt exiting enabled. IRQs are never blocked in hardware if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger VM-Exit. The new check percolates up to kvm_vcpu_ready_for_interrupt_injection() and thus vcpu_run(), and so KVM will exit to userspace if userspace has requested an interrupt window (to inject an IRQ into L1). Remove the @external_intr param from vmx_check_nested_events(), which is actually an indicator that userspace wants an interrupt window, e.g. it's named @req_int_win further up the stack. Injecting a VM-Exit into L1 to try and bounce out to L0 userspace is all kinds of broken and is no longer necessary. Remove the hack in nested_vmx_vmexit() that attempted to workaround the breakage in vmx_check_nested_events() by only filling interrupt info if there's an actual interrupt pending. The hack actually made things worse because it caused KVM to _never_ fill interrupt info when the LAPIC resides in userspace (kvm_cpu_has_interrupt() queries interrupt.injected, which is always cleared by prepare_vmcs12() before reaching the hack in nested_vmx_vmexit()). Fixes: 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: LAPIC: Recalculate apic map in batchWanpeng Li
In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated several times, each time it will consume 10+ us observed by ftrace in my non-overcommit environment since the expensive memory allocate/mutex/rcu etc operations. This patch optimizes it by recaluating apic map in batch, I hope this can benefit the serverless scenario which can frequently create/destroy VMs. Before patch: kvm_lapic_reset ~27us After patch: kvm_lapic_reset ~14us Observed by ftrace, improve ~48%. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: enable dirty log gradually in small chunksJay Zhou
It could take kvm->mmu_lock for an extended period of time when enabling dirty log for the first time. The main cost is to clear all the D-bits of last level SPTEs. This situation can benefit from manual dirty log protect as well, which can reduce the mmu_lock time taken. The sequence is like this: 1. Initialize all the bits of the dirty bitmap to 1 when enabling dirty log for the first time 2. Only write protect the huge pages 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level SPTEs gradually in small chunks Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did some tests with a 128G windows VM and counted the time taken of memory_global_dirty_log_start, here is the numbers: VM Size Before After optimization 128G 460ms 10ms Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: SVM: Inhibit APIC virtualization for X2APIC guestOliver Upton
The AVIC does not support guest use of the x2APIC interface. Currently, KVM simply chooses to squash the x2APIC feature in the guest's CPUID If the AVIC is enabled. Doing so prevents KVM from running a guest with greater than 255 vCPUs, as such a guest necessitates the use of the x2APIC interface. Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC feature is set in the guest's CPUID. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Consolidate VM allocation and free for VMX and SVMSean Christopherson
Move the VM allocation and free code to common x86 as the logic is more or less identical across SVM and VMX. Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's (currently) only allocated by VMX VMs. But, since kfree() plays nice when passed a NULL pointer, the superfluous call for SVM is harmless and avoids future churn if SVM gains support for HyperV's direct TLB flush. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Make vm_size a field instead of a function. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Simplify kvm_free_memslot() and all its descendentsSean Christopherson
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move gpa_val and gpa_available into the emulator contextSean Christopherson
Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page faultSean Christopherson
Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's *not* valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-20kvm/emulate: fix a -Werror=cast-function-typeQian Cai
arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:5686:22: error: cast between incompatible function types from 'int (*)(struct x86_emulate_ctxt *)' to 'void (*)(struct fastop *)' [-Werror=cast-function-type] rc = fastop(ctxt, (fastop_t)ctxt->execute); Fix it by using an unnamed union of a (*execute) function pointer and a (*fastop) function pointer. Fixes: 3009afc6e39e ("KVM: x86: Use a typedef for fastop functions") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-19x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERFKim Phillips
Commit aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter") added support for access to the free-running counter via 'perf -e msr/irperf/', but when exercised, it always returns a 0 count: BEFORE: $ perf stat -e instructions,msr/irperf/ true Performance counter stats for 'true': 624,833 instructions 0 msr/irperf/ Simply set its enable bit - HWCR bit 30 - to make it start counting. Enablement is restricted to all machines advertising IRPERF capability, except those susceptible to an erratum that makes the IRPERF return bad values. That erratum occurs in Family 17h models 00-1fh [1], but not in F17h models 20h and above [2]. AFTER (on a family 17h model 31h machine): $ perf stat -e instructions,msr/irperf/ true Performance counter stats for 'true': 621,690 instructions 622,490 msr/irperf/ [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors [2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors The revision guides are available from the bugzilla Link below. [ bp: Massage commit message. ] Fixes: aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: http://lkml.kernel.org/r/20200214201805.13830-1-kim.phillips@amd.com
2020-02-12KVM: nVMX: Fix some comment typos and coding styleMiaohe Lin
Fix some typos in the comments. Also fix coding style. [Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable field in struct kvm_vcpu_arch.] Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-09Merge tag 'x86-urgent-2020-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Ensure that the PIT is set up when the local APIC is disable or configured in legacy mode. This is caused by an ordering issue introduced in the recent changes which skip PIT initialization when the TSC and APIC frequencies are already known. - Handle malformed SRAT tables during early ACPI parsing which caused an infinite loop anda boot hang. - Fix a long standing race in the affinity setting code which affects PCI devices with non-maskable MSI interrupts. The problem is caused by the non-atomic writes of the MSI address (destination APIC id) and data (vector) fields which the device uses to construct the MSI message. The non-atomic writes are mandated by PCI. If both fields change and the device raises an interrupt after writing address and before writing data, then the MSI block constructs a inconsistent message which causes interrupts to be lost and subsequent malfunction of the device. The fix is to redirect the interrupt to the new vector on the current CPU first and then switch it over to the new target CPU. This allows to observe an eventually raised interrupt in the transitional stage (old CPU, new vector) to be observed in the APIC IRR and retriggered on the new target CPU and the new vector. The potential spurious interrupts caused by this are harmless and can in the worst case expose a buggy driver (all handlers have to be able to deal with spurious interrupts as they can and do happen for various reasons). - Add the missing suspend/resume mechanism for the HYPERV hypercall page which prevents resume hibernation on HYPERV guests. This change got lost before the merge window. - Mask the IOAPIC before disabling the local APIC to prevent potentially stale IOAPIC remote IRR bits which cause stale interrupt lines after resume" * tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Mask IOAPIC entries when disabling the local APIC x86/hyperv: Suspend/resume the hypercall page for hibernation x86/apic/msi: Plug non-maskable MSI affinity race x86/boot: Handle malformed SRAT tables during early ACPI parsing x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-06Merge tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: "s390: - fix register corruption - ENOTSUPP/EOPNOTSUPP mixed - reset cleanups/fixes - selftests x86: - Bug fixes and cleanups - AMD support for APIC virtualization even in combination with in-kernel PIT or IOAPIC. MIPS: - Compilation fix. Generic: - Fix refcount overflow for zero page" * tag 'kvm-5.6-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration KVM: x86: Mark CR4.UMIP as reserved based on associated CPUID bit x86: vmxfeatures: rename features for consistency with KVM and manual KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses KVM: x86: Fix perfctr WRMSR for running counters x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs() kvm: mmu: Separate generating and setting mmio ptes kvm: mmu: Replace unsigned with unsigned int for PTE access KVM: nVMX: Remove stale comment from nested_vmx_load_cr3() KVM: MIPS: Fold comparecount_func() into comparecount_wakeup() KVM: MIPS: Fix a build error due to referencing not-yet-defined function x86/kvm: do not setup pv tlb flush when not paravirtualized KVM: fix overflow of zero page refcount with ksm running KVM: x86: Take a u64 when checking for a valid dr7 value KVM: x86: use raw clock values consistently KVM: x86: reorganize pvclock_gtod_data members KVM: nVMX: delete meaningless nested_vmx_run() declaration KVM: SVM: allow AVIC without split irqchip kvm: ioapic: Lazy update IOAPIC EOI ...
2020-02-06Merge tag 'pci-v5.6-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Define to_pci_sysdata() always to fix build breakage when !CONFIG_PCI (Jason A. Donenfeld) - Use PF PASID for VFs to fix VF IOMMU bind failures (Kuppuswamy Sathyanarayanan) * tag 'pci-v5.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/ATS: Use PF PASID for VFs x86/PCI: Define to_pci_sysdata() even when !CONFIG_PCI
2020-02-05x86: vmxfeatures: rename features for consistency with KVM and manualPaolo Bonzini
Three of the feature bits in vmxfeatures.h have names that are different from the Intel SDM. The names have been adjusted recently in KVM but they were using the old name in the tip tree's x86/cpu branch. Adjust for consistency. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.Suravee Suthikulpanit
AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Temporarily deactivate AVIC during ExtINT handlingSuravee Suthikulpanit
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary deactivated and fall back to using legacy interrupt injection via vINTR and interrupt window. Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Deactivate AVIC when launching guest with nested SVM supportSuravee Suthikulpanit
Since AVIC does not currently work w/ nested virtualization, deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM] (i.e. indicate support for SVM, which is needed for nested virtualization). Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for this reason. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: hyperv: Use APICv update request interfaceSuravee Suthikulpanit
Since disabling APICv has to be done for all vcpus on AMD-based system, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove the kvm_vcpu_deactivate_apicv() since no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce x86 ops hook for pre-update APICvSuravee Suthikulpanit
AMD SVM AVIC needs to update APIC backing page mapping before changing APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl function hook to be called prior KVM APICv update request to each vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasonsSuravee Suthikulpanit
Inibit reason bits are used to determine if APICv deactivation is applicable for a particular hardware virtualization architecture. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add support for dynamic APICv activationSuravee Suthikulpanit
Certain runtime conditions require APICv to be temporary deactivated during runtime. The current implementation only support run-time deactivation of APICv when Hyper-V SynIC is enabled, which is not temporary. In addition, for AMD, when APICv is (de)activated at runtime, all vcpus in the VM have to operate in the same mode. Thus the requesting vcpu must notify the others. So, introduce the following: * A new KVM_REQ_APICV_UPDATE request bit * Interfaces to request all vcpus to update APICv status * A new interface to update APICV-related parameters for each vcpu Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: remove get_enable_apicv from kvm_x86_opsPaolo Bonzini
It is unused now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>