path: root/arch/x86/kvm/mmu
Age  Commit message  Author
2020-08-23  treewide: Use fallthrough pseudo-keyword  (Gustavo A. R. Silva)
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also remove fall-through markings that are no longer necessary. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
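As a hedged illustration (a made-up switch statement, not code from this tree), the conversion is mechanical; fallthrough comes from <linux/compiler_attributes.h> and expands to __attribute__((__fallthrough__)) on compilers that support it:

  switch (state) {
  case STATE_START:
          setup();
          fallthrough;    /* was a bare "fall through" comment */
  case STATE_RUN:
          run();
          break;
  default:
          break;
  }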
2020-08-21  KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()  (Will Deacon)
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
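A hedged sketch of the resulting interface (fragments only; prototype placement and the backend's use of the flag are approximate):

  /* arch hook now takes the notifier flags */
  int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
                          unsigned long end, unsigned int flags);

  /* generic notifier code forwards them, roughly: */
  ret = kvm_unmap_hva_range(kvm, range->start, range->end, range->flags);

  /* so a backend can finally tell whether blocking is allowed: */
  bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;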
2020-08-06  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-04  Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux  (Linus Torvalds)
Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement.
   - Clean up non-trivial uses of uninitialized_var()
   - Update documentation and checkpatch for uninitialized_var() removal
   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-07-30  KVM: x86: Specify max TDP level via kvm_configure_mmu()  (Sean Christopherson)
Capture the max TDP level during kvm_configure_mmu() instead of using a kvm_x86_ops hook to do it at every vCPU creation. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30  KVM: x86/mmu: Rename max_page_level to max_huge_page_level  (Sean Christopherson)
Rename max_page_level to explicitly call out that it tracks the max huge page level so as to avoid confusion when a future patch moves the max TDP level, i.e. max root level, into the MMU and kvm_configure_mmu(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR  (Sean Christopherson)
Calculate the desired TDP level on the fly using the max TDP level and MAXPHYADDR instead of doing the same when CPUID is updated. This avoids the hidden dependency on cpuid_maxphyaddr() in vmx_get_tdp_level() and also standardizes the "use 5-level paging iff MAXPHYADDR > 48" behavior across x86. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
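A hedged sketch of the resulting logic (helper name approximate; max_tdp_level stands for the value captured by kvm_configure_mmu()):

  static int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
  {
          /* 5-level TDP is pointless unless the guest MAXPHYADDR exceeds 48. */
          if (max_tdp_level == 5 && cpuid_maxphyaddr(vcpu) <= 48)
                  return 4;

          return max_tdp_level;
  }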
2020-07-30  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc  (Sean Christopherson)
Refactor the shadow NPT role calculation into a separate helper to better differentiate it from the non-nested shadow MMU, e.g. the NPT variant is never direct and derives its root level from the TDP level. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-3-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role  (Sean Christopherson)
Move the initialization of shadow NPT MMU's shadow_root_level into kvm_init_shadow_npt_mmu() and explicitly set the level in the shadow NPT MMU's role to be the TDP level. This ensures the role and MMU levels are synchronized and also initialized before __kvm_mmu_new_pgd(), which consumes the level when attempting a fast PGD switch. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: 9fa72119b24db ("kvm: x86: Introduce kvm_mmu_calc_root_page_role()") Fixes: a506fdd223426 ("KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-16  treewide: Remove uninitialized_var() usage  (Kees Cook)
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script:

  git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
          xargs perl -pi -e \
                  's/\buninitialized_var\(([^\)]+)\)/\1/g;
                   s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
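For reference, a hedged recollection of what the macro did and what a typical conversion looks like (the variable name is made up):

  /* the removed definition was essentially a self-assignment: */
  #define uninitialized_var(x) x = x

  /* before: silences the "maybe uninitialized" warning without initializing anything */
  unsigned long uninitialized_var(flags);   /* expands to: unsigned long flags = flags; */

  /* after: a plain declaration, or a real initializer where one is warranted */
  unsigned long flags;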
2020-07-10  KVM: x86: mmu: Add guest physical address check in translate_gpa()  (Mohammed Gamal)
Intel processors of various generations have supported 36, 39, 46 or 52 bits for physical addresses. Until IceLake introduced MAXPHYADDR==52, running on a machine with a higher MAXPHYADDR than the guest more or less worked, because software that relied on reserved address bits (like KVM) generally used bit 51 as a marker and therefore the page faults were generated anyway. Unfortunately this is no longer true if the host MAXPHYADDR is 52, and this can cause problems when migrating from a MAXPHYADDR<52 machine to one with MAXPHYADDR==52. Typically, the latter are machines that support 5-level page tables, so they can be identified easily from the LA57 CPUID bit. When that happens, the guest might have a physical address with reserved bits set, but the host won't see that and trap it. Hence, we need to check page faults' physical addresses against the guest's maximum physical address and, if it is exceeded, add the PFERR_RSVD_MASK bits to the page fault error code. This patch does this for the MMU's page walks. The next patches will ensure that the correct exception and error code is produced whenever no host-reserved bits are set in page table entries. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-4-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
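A hedged sketch of the check added to the MMU's GPA translation (helper names approximate; upstream wraps the comparison in a small "is this GPA legal" predicate rather than the open-coded shift):

  static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
                             struct x86_exception *exception)
  {
          /* GPA beyond the guest's MAXPHYADDR: report a reserved-bit fault. */
          if (gpa >> cpuid_maxphyaddr(vcpu)) {
                  exception->error_code |= PFERR_RSVD_MASK;
                  return UNMAPPED_GVA;
          }

          return gpa;
  }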
2020-07-10  KVM: x86: mmu: Move translate_gpa() to mmu.c  (Mohammed Gamal)
Also, there is no point in it being inline, since it is always called through function pointers, so remove the inline qualifier. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-3-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10  KVM: x86: drop superfluous mmu_check_root() from fast_pgd_switch()  (Vitaly Kuznetsov)
The mmu_check_root() check in fast_pgd_switch() seems to be superfluous: when the GPA is outside of the visible range, cached_root_available() will fail for non-direct roots (as we can't have a matching one on the list) and we don't seem to care for direct ones. Also, raising #TF immediately when a non-existent GFN is written to CR3 doesn't seem to match architectural behavior. Drop the check. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-10-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10  KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch  (Vitaly Kuznetsov)
An undesired triple fault gets injected to the L1 guest on SVM when L2 is launched with certain CR3 values. The #TF is raised by the mmu_check_root() check in fast_pgd_switch(), and the root cause is that when kvm_set_cr3() is called from nested_prepare_vmcb_save() with NPT enabled, CR3 points to an nGPA, so we can't check it with kvm_is_visible_gfn(). Using the generic kvm_set_cr3() when switching to a nested guest is not a great idea, as we'd have to distinguish between 'real' CR3s and 'nested' CR3s to e.g. not call kvm_mmu_new_pgd() with an nGPA. Following nVMX, implement a nested-specific nested_svm_load_cr3() to do the job. To support the change, nested_svm_load_cr3() needs to be re-ordered with nested_svm_init_mmu_context(). Note: the current implementation is sub-optimal as we always do a TLB flush/MMU sync, but this is still an improvement as we at least stop doing kvm_mmu_reset_context(). Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10  KVM: MMU: stop dereferencing vcpu->arch.mmu to get the context for MMU init  (Paolo Bonzini)
kvm_init_shadow_mmu() was actually the only function that could be called with different vcpu->arch.mmu values. Now that kvm_init_shadow_npt_mmu() is separated from kvm_init_shadow_mmu(), we always know the MMU context we need to use and there is no need to dereference vcpu->arch.mmu pointer. Based on a patch by Vitaly Kuznetsov <vkuznets@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10  KVM: nSVM: split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu()  (Vitaly Kuznetsov)
As a preparatory change for moving kvm_mmu_new_pgd() from nested_prepare_vmcb_save() to nested_svm_init_mmu_context() split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu(). This also makes the code look more like nVMX (kvm_init_shadow_ept_mmu()). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: Move x86's MMU memory cache helpers to common KVM code  (Sean Christopherson)
Move x86's memory cache helpers to common KVM code so that they can be reused by arm64 and MIPS in future patches. Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global  (Sean Christopherson)
Rename the memory helpers that will soon be moved to common code and be made globally available via linux/kvm_host.h. "mmu" alone is not a sufficient namespace for globally available KVM symbols. Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make it clear the function returns the number of free objects, as opposed to freeing existing objects. Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-14-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups  (Sean Christopherson)
Don't bother filling the gfn array cache when the caller is a fully direct MMU, i.e. won't need a gfn array for shadow pages. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-13-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)  (Sean Christopherson)
Set __GFP_ZERO for the shadow page memory cache and drop the explicit clear_page() from kvm_mmu_get_page(). This moves the cost of zeroing a page to the allocation time of the physical page, i.e. when topping up the memory caches, and thus avoids having to zero out an entire page while holding mmu_lock. Cc: Peter Feiner <pfeiner@google.com> Cc: Peter Shier <pshier@google.com> Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-12-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache  (Sean Christopherson)
Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc(). A future patch needs such a flag for the __get_free_page() path, as gfn arrays do not need/want the allocator to zero the memory. Convert the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and inconsistent API in the future. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
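A hedged sketch of the flag and the allocation path that consumes it (struct layout abbreviated, field and helper names approximate):

  struct kvm_mmu_memory_cache {
          int nobjs;
          gfp_t gfp_zero;                 /* __GFP_ZERO or 0 */
          struct kmem_cache *kmem_cache;  /* NULL for whole-page caches */
          void *objects[KVM_NR_MEM_OBJS];
  };

  static void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
                                          gfp_t gfp_flags)
  {
          gfp_flags |= mc->gfp_zero;

          if (mc->kmem_cache)
                  return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
          else
                  return (void *)__get_free_page(gfp_flags);
  }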
2020-07-09  KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays  (Sean Christopherson)
Use separate caches for allocating shadow pages versus gfn arrays. This sets the stage for specifying __GFP_ZERO when allocating shadow pages without incurring extra cost for gfn arrays. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()  (Sean Christopherson)
Clean up the minimums in mmu_topup_memory_caches() to document the driving mechanisms behind the minimums. Now that encountering an empty cache is unlikely to trigger BUG_ON(), it is less dangerous to be more precise when defining the minimums. For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and prefetched rmaps. The extra objects in the current '8 + PREFETCH' minimum came about due to an abundance of paranoia in commit c41ef344de212 ("KVM: MMU: increase per-vcpu rmap cache alloc size"), i.e. it could have increased the minimum to 2 rmaps. Furthermore, the unexpected extra rmap case was killed off entirely by commits f759e2b4c728c ("KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write") and f5a1e9f89504f ("KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr"). For the so called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL. The 2x multiplier is needed because the cache is used for both shadow pages and gfn arrays for indirect MMUs. And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL. Note, KVM now supports 5-level paging, i.e. the old minimums that used a baseline derived from 4-level paging were technically wrong. But, KVM always allocates roots in a separate flow, e.g. it's impossible in the current implementation to actually need 5 new shadow pages in a single flow. Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as the direct usage is likely more intuitive to uninformed readers, and the inflated minimum is unlikely to affect functionality in practice. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
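A hedged sketch of the resulting top-ups, with the reasoning above as comments (cache and helper names approximate for this point in the series):

  /* 1 rmap, 1 parent PTE per level, plus the prefetched rmaps. */
  r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
                             1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
  if (r)
          return r;

  /* Used for shadow pages and, on indirect MMUs, gfn arrays. */
  r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
                             2 * PT64_ROOT_MAX_LEVEL);
  if (r)
          return r;

  /* One page header per level. */
  return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
                                PT64_ROOT_MAX_LEVEL);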
2020-07-09  KVM: x86/mmu: Topup memory caches after walking GVA->GPA  (Sean Christopherson)
Top up the memory caches after walking the GVA->GPA translation during a shadow page fault; there is no need to ensure the caches are full when walking the GVA. As of commit f5a1e9f89504f ("KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no longer adds rmaps via kvm_mmu_pte_write(). This avoids allocating memory in the case that the GVA is unmapped in the guest, and also provides a paper trail of why/when the memory caches need to be filled. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches()  (Sean Christopherson)
Avoid refilling the memory caches and potentially slow reclaim/swap when handling a fast page fault, which does not need to allocate any new objects. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty  (Sean Christopherson)
Attempt to allocate a new object instead of crashing KVM (and likely the kernel) if a memory cache is unexpectedly empty. Use GFP_ATOMIC for the allocation as the caches are used while holding mmu_lock. The immediate BUG_ON() makes the code unnecessarily explosive and led to confusing minimums being used in the past, e.g. allocating 4 objects where 1 would suffice. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
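A hedged sketch of the fallback path (names approximate): hand out a pre-topped-up object when possible, and only as a last resort allocate with GFP_ATOMIC while mmu_lock is held.

  static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
  {
          void *p;

          if (WARN_ON(!mc->nobjs))
                  /* Cache unexpectedly empty: atomic allocation, not BUG_ON(). */
                  p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
          else
                  p = mc->objects[--mc->nobjs];

          BUG_ON(!p);
          return p;
  }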
2020-07-09  KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()  (Sean Christopherson)
Return errors directly from mmu_topup_memory_caches() instead of branching to a label that does the same. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals  (Sean Christopherson)
Use "mc" for local variables to shorten line lengths and provide consistent names, which will be especially helpful when some of the helpers are moved to common KVM code in future patches. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86/mmu: Consolidate "page" variant of memory cache helpers  (Sean Christopherson)
Drop the "page" variants of the topup/free memory cache helpers, using the existence of an associated kmem_cache to select the correct alloc or free routine. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
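The free side follows the same kmem_cache-or-page dispatch; a hedged sketch (loop body paraphrased):

  static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
  {
          while (mc->nobjs) {
                  if (mc->kmem_cache)
                          kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
                  else
                          free_page((unsigned long)mc->objects[--mc->nobjs]);
          }
  }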
2020-07-09  KVM: x86/mmu: Track the associated kmem_cache in the MMU caches  (Sean Christopherson)
Track the kmem_cache used for non-page KVM MMU memory caches instead of passing in the associated kmem_cache when filling the cache. This will allow consolidating code and other cleanups. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09  KVM: x86: take as_id into account when checking PGD  (Vitaly Kuznetsov)
An OVMF-booted guest running on shadow pages crashes on TRIPLE FAULT after enabling paging from SMM. The crash is triggered from mmu_check_root() and is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0 while the vCPU may be in a different context (address space). Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Rename page_header() to to_shadow_page()  (Sean Christopherson)
Rename KVM's accessor for retrieving a 'struct kvm_mmu_page' from the associated host physical address to better convey what the function is doing. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
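The renamed accessor keeps the shape of the old page_header(); a short sketch:

  static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
  {
          struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);

          return (struct kvm_mmu_page *)page_private(page);
  }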
2020-07-08  KVM: x86/mmu: Add sptep_to_sp() helper to wrap shadow page lookup  (Sean Christopherson)
Introduce sptep_to_sp() to reduce the boilerplate code needed to get the shadow page associated with a spte pointer, and to improve readability as it's not immediately obvious that "page_header" is a KVM-specific accessor for retrieving a shadow page. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
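A sketch of the wrapper: it simply converts the spte pointer to its host physical address and reuses to_shadow_page():

  static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
  {
          return to_shadow_page(__pa(sptep));
  }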
2020-07-08  KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only  (Sean Christopherson)
Make 'struct kvm_mmu_page' MMU-only, nothing outside of the MMU should be poking into the gory details of shadow pages. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Add MMU-internal header  (Sean Christopherson)
Add mmu/mmu_internal.h to hold declarations and definitions that need to be shared between various mmu/ files, but should not be used by anything outside of the MMU. Begin populating mmu_internal.h with declarations of the helpers used by page_track.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Move kvm_mmu_available_pages() into mmu.c  (Sean Christopherson)
Move kvm_mmu_available_pages() from mmu.h to mmu.c, it has a single caller and has no business being exposed via mmu.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Move mmu_audit.c and mmutrace.h into the mmu/ sub-directory  (Sean Christopherson)
Move mmu_audit.c and mmutrace.h under mmu/ where they belong. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622202034.15093-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error  (Sean Christopherson)
Propagate any error returned by make_mmu_pages_available() out to userspace instead of resuming the guest if the error occurs while handling a page fault. Now that zapping the oldest MMU pages skips active roots, i.e. fails if and only if there are no zappable pages, there is no chance for a false positive, i.e. no chance of returning a spurious error to userspace. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623193542.7554-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Batch zap MMU pages when shrinking the slab  (Sean Christopherson)
Use the recently introduced kvm_mmu_zap_oldest_mmu_pages() to batch zap MMU pages when shrinking a slab. This fixes a long standing issue where KVM's shrinker implementation is completely ineffective due to zapping only a single page. E.g. without batch zapping, forcing a scan via drop_caches basically has no impact on a VM with ~2k shadow pages. With batch zapping, the number of shadow pages can be reduced to a few hundred pages in one or two runs of drop_caches. Note, if the default batch size (currently 128) is problematic, e.g. zapping 128 pages holds mmu_lock for too long, KVM can bound the batch size by setting @batch in mmu_shrinker. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623193542.7554-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages  (Sean Christopherson)
Collect MMU pages for zapping in a loop when making MMU pages available, and skip over active roots when doing so as zapping an active root can never immediately free up a page. Batching the zapping avoids multiple remote TLB flushes and remedies the issue where the loop would bail early if an active root was encountered. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623193542.7554-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Don't put invalid SPs back on the list of active pages  (Sean Christopherson)
Delete a shadow page from the invalidation list instead of throwing it back on the list of active pages when it's a root shadow page with active users. Invalid active root pages will be explicitly freed by mmu_free_root_page() when the root_count hits zero, i.e. they don't need to be put on the active list to avoid leakage. Use sp->role.invalid to detect that a shadow page has already been zapped, i.e. is not on a list. WARN if an invalid page is encountered when zapping pages, as it should now be impossible. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623193542.7554-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs  (Sean Christopherson)
Skip the unsync checks and the write flooding clearing for fully direct MMUs, which are guaranteed to not have unsync'd or indirect pages (write flooding detection only applies to indirect pages). For TDP, this avoids unnecessary memory reads and writes, and for the write flooding count it also avoids dirtying a cache line (unsync_child_bitmap itself consumes a cache line, i.e. write_flooding_count is guaranteed to be in a different cache line than parent_ptes). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623194027.23135-3-sean.j.christopherson@intel.com> Reviewed-By: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Avoid multiple hash lookups in kvm_get_mmu_page()  (Sean Christopherson)
Refactor for_each_valid_sp() to take the list of shadow pages instead of retrieving it from a gfn to avoid doing the gfn->list hash and lookup multiple times during kvm_get_mmu_page(). Cc: Peter Feiner <pfeiner@google.com> Cc: Jon Cargille <jcargill@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623194027.23135-2-sean.j.christopherson@intel.com> Reviewed-By: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Make .write_log_dirty a nested operation  (Sean Christopherson)
Move .write_log_dirty() into kvm_x86_nested_ops to help differentiate it from the non-nested dirty log hooks, and because it is a nested-only operation. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: x86/mmu: Drop kvm_arch_write_log_dirty() wrapper  (Sean Christopherson)
Drop kvm_arch_write_log_dirty() in favor of invoking .write_log_dirty() directly from FNAME(update_accessed_dirty_bits). "kvm_arch" is usually used for x86 functions that are invoked from generic KVM, and implies that there are external callers, neither of which is true. Remove the check for a non-NULL kvm_x86_ops hook as the call is wrapped in PTTYPE_EPT and is unconditionally set by VMX. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08  KVM: async_pf: change kvm_setup_async_pf()/kvm_arch_setup_async_pf() return type to bool  (Vitaly Kuznetsov)
Unlike normal 'int' functions returning '0' on success, kvm_setup_async_pf()/kvm_arch_setup_async_pf() return '1' when a job to handle the page fault asynchronously was scheduled and '0' otherwise. To avoid the confusion, change the return type to 'bool'. No functional change intended. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200615121334.91300-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
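Hedged before/after of the x86 hook's prototype (argument list approximate):

  /* before: returns 1 if async page-fault work was queued, 0 otherwise */
  int  kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, gfn_t gfn);

  /* after: same information, no longer mistakable for an error code */
  bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, gfn_t gfn);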
2020-07-08  KVM: x86: drop KVM_PV_REASON_PAGE_READY case from kvm_handle_page_fault()  (Vitaly Kuznetsov)
KVM guest code in Linux enables APF only when KVM_FEATURE_ASYNC_PF_INT is supported; this means we will never see KVM_PV_REASON_PAGE_READY when handling a page fault vmexit in KVM. While at it, make sure we only follow the genuine page fault path when the APF reason is zero. If we happen to see something else, it means that the underlying hypervisor is misbehaving. Leave a WARN_ON_ONCE() to catch that. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-30  KVM: x86: bit 8 of non-leaf PDPEs is not reserved  (Paolo Bonzini)
Bit 8 would be the "global" bit, which does not quite make sense for non-leaf page table entries. Intel ignores it; AMD ignores it in PDEs and PDPEs, but reserves it in PML4Es. Probably, earlier versions of the AMD manual documented it as reserved in PDPEs as well, and that behavior made it into KVM as well as kvm-unit-tests; fix it. Cc: stable@vger.kernel.org Reported-by: Nadav Amit <namit@vmware.com> Fixes: a0c0feb57992 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD", 2014-09-03) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22  KVM: nVMX: Plumb L2 GPA through to PML emulation  (Sean Christopherson)
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA. Fixes: c5f983f6e8455 ("nVMX: Implement emulated Page Modification Logging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-22  KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()  (Vitaly Kuznetsov)
translate_gpa() returns a GPA, so assigning it to 'real_gfn' is obviously wrong. There is no real issue because both 'gpa_t' and 'gfn_t' are u64 and we don't use the value in 'real_gfn' as a GFN; we do real_gfn = gpa_to_gfn(real_gfn); instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust your eyes', but let's fix it for good. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200622151435.752560-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
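A hedged sketch of the fixed declaration in FNAME(walk_addr_generic) (surrounding arguments abbreviated and approximate):

  gpa_t real_gpa;         /* was: gfn_t real_gfn */

  real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
                                nested_access, &walker->fault);
  if (real_gpa == UNMAPPED_GVA)
          return 0;

  /* Convert to a GFN only at the point a GFN is actually needed. */
  host_addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gpa_to_gfn(real_gpa),
                                       &walker->pte_writable[walker->level - 1]);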