summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2015-11-05 16:26:26 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2015-11-05 16:26:26 -0800
commit933425fb0010bd02bd459b41e63082756818ffce (patch)
tree1cbc6c2035b9dcff8cb265c9ac562cbee7c6bb82
parenta3e7531535a0c6e5acbaa5436f37933bb471aa95 (diff)
parenta3eaa8649e4c6a6afdafaa04b9114fb230617bb1 (diff)
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.4. s390: A bunch of fixes and optimizations for interrupt and time handling. PPC: Mostly bug fixes. ARM: No big features, but many small fixes and prerequisites including: - a number of fixes for the arch-timer - introducing proper level-triggered semantics for the arch-timers - a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding) - some tracepoint improvements - a tweak for the EL2 panic handlers - some more VGIC cleanups getting rid of redundant state x86: Quite a few changes: - support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). This introduces a new component (in virt/lib/) that connects VFIO and KVM together. The same infrastructure will be used for ARM interrupt forwarding as well. - more Hyper-V features, though the main one Hyper-V synthetic interrupt controller will have to wait for 4.5. These will let KVM expose Hyper-V devices. - nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster - for future hardware that supports NVDIMM, there is support for clflushopt, clwb, pcommit - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor - obligatory smattering of SMM fixes - on the guest side, stable scheduler clock support was rewritten to not require help from the hypervisor" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits) KVM: VMX: Fix commit which broke PML KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() KVM: x86: allow RSM from 64-bit mode KVM: VMX: fix SMEP and SMAP without EPT KVM: x86: move kvm_set_irq_inatomic to legacy device assignment KVM: device assignment: remove pointless #ifdefs KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic KVM: x86: zero apic_arb_prio on reset drivers/hv: share Hyper-V SynIC constants with userspace KVM: x86: handle SMBASE as physical address in RSM KVM: x86: add read_phys to x86_emulate_ops KVM: x86: removing unused variable KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings KVM: arm/arm64: Optimize away redundant LR tracking KVM: s390: use simple switch statement as multiplexer KVM: s390: drop useless newline in debugging data KVM: s390: SCA must not cross page boundaries KVM: arm: Do not indent the arguments of DECLARE_BITMAP ...
-rw-r--r--Documentation/kernel-parameters.txt1
-rw-r--r--Documentation/virtual/kvm/api.txt52
-rw-r--r--Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt187
-rw-r--r--Documentation/virtual/kvm/devices/arm-vgic.txt18
-rw-r--r--Documentation/virtual/kvm/locking.txt12
-rw-r--r--MAINTAINERS7
-rw-r--r--Makefile10
-rw-r--r--arch/arm/include/asm/kvm_arm.h20
-rw-r--r--arch/arm/include/asm/kvm_host.h5
-rw-r--r--arch/arm/kvm/Kconfig2
-rw-r--r--arch/arm/kvm/arm.c76
-rw-r--r--arch/arm/kvm/psci.c10
-rw-r--r--arch/arm/kvm/trace.h10
-rw-r--r--arch/arm64/include/asm/kvm_arm.h16
-rw-r--r--arch/arm64/include/asm/kvm_host.h5
-rw-r--r--arch/arm64/kvm/Kconfig2
-rw-r--r--arch/arm64/kvm/hyp.S8
-rw-r--r--arch/mips/include/asm/kvm_host.h2
-rw-r--r--arch/powerpc/include/asm/disassemble.h5
-rw-r--r--arch/powerpc/include/asm/kvm_host.h2
-rw-r--r--arch/powerpc/include/asm/reg_booke.h6
-rw-r--r--arch/powerpc/kvm/book3s_64_mmu_hv.c3
-rw-r--r--arch/powerpc/kvm/book3s_hv_rm_mmu.c2
-rw-r--r--arch/powerpc/kvm/book3s_hv_rmhandlers.S29
-rw-r--r--arch/powerpc/kvm/e500.c3
-rw-r--r--arch/powerpc/kvm/e500_emulate.c19
-rw-r--r--arch/powerpc/kvm/e500_mmu_host.c4
-rw-r--r--arch/powerpc/kvm/powerpc.c3
-rw-r--r--arch/s390/include/asm/kvm_host.h2
-rw-r--r--arch/s390/kvm/intercept.c42
-rw-r--r--arch/s390/kvm/interrupt.c116
-rw-r--r--arch/s390/kvm/kvm-s390.c58
-rw-r--r--arch/s390/kvm/kvm-s390.h35
-rw-r--r--arch/s390/kvm/priv.c19
-rw-r--r--arch/x86/include/asm/irq_remapping.h10
-rw-r--r--arch/x86/include/asm/kvm_emulate.h10
-rw-r--r--arch/x86/include/asm/kvm_host.h38
-rw-r--r--arch/x86/include/asm/vmx.h3
-rw-r--r--arch/x86/include/uapi/asm/hyperv.h18
-rw-r--r--arch/x86/include/uapi/asm/vmx.h4
-rw-r--r--arch/x86/kernel/kvmclock.c46
-rw-r--r--arch/x86/kvm/Kconfig2
-rw-r--r--arch/x86/kvm/assigned-dev.c62
-rw-r--r--arch/x86/kvm/cpuid.c2
-rw-r--r--arch/x86/kvm/cpuid.h37
-rw-r--r--arch/x86/kvm/emulate.c35
-rw-r--r--arch/x86/kvm/hyperv.c31
-rw-r--r--arch/x86/kvm/i8254.c4
-rw-r--r--arch/x86/kvm/ioapic.c29
-rw-r--r--arch/x86/kvm/ioapic.h15
-rw-r--r--arch/x86/kvm/irq.c40
-rw-r--r--arch/x86/kvm/irq.h27
-rw-r--r--arch/x86/kvm/irq_comm.c129
-rw-r--r--arch/x86/kvm/lapic.c127
-rw-r--r--arch/x86/kvm/lapic.h7
-rw-r--r--arch/x86/kvm/mmu.c91
-rw-r--r--arch/x86/kvm/paging_tmpl.h19
-rw-r--r--arch/x86/kvm/svm.c43
-rw-r--r--arch/x86/kvm/trace.h51
-rw-r--r--arch/x86/kvm/vmx.c750
-rw-r--r--arch/x86/kvm/x86.c256
-rw-r--r--drivers/hv/hyperv_vmbus.h5
-rw-r--r--drivers/iommu/irq_remapping.c12
-rw-r--r--drivers/vfio/Kconfig1
-rw-r--r--drivers/vfio/pci/Kconfig1
-rw-r--r--drivers/vfio/pci/vfio_pci_intrs.c9
-rw-r--r--drivers/vfio/pci/vfio_pci_private.h2
-rw-r--r--include/kvm/arm_arch_timer.h4
-rw-r--r--include/kvm/arm_vgic.h16
-rw-r--r--include/linux/hyperv.h1
-rw-r--r--include/linux/irqbypass.h90
-rw-r--r--include/linux/kvm_host.h42
-rw-r--r--include/linux/kvm_irqfd.h71
-rw-r--r--include/uapi/linux/kvm.h7
-rw-r--r--kernel/sched/cputime.c2
-rw-r--r--virt/Makefile1
-rw-r--r--virt/kvm/Kconfig5
-rw-r--r--virt/kvm/arm/arch_timer.c173
-rw-r--r--virt/kvm/arm/trace.h63
-rw-r--r--virt/kvm/arm/vgic-v2.c6
-rw-r--r--virt/kvm/arm/vgic-v3.c6
-rw-r--r--virt/kvm/arm/vgic.c308
-rw-r--r--virt/kvm/async_pf.c4
-rw-r--r--virt/kvm/eventfd.c190
-rw-r--r--virt/kvm/irqchip.c18
-rw-r--r--virt/kvm/kvm_main.c11
-rw-r--r--virt/lib/Kconfig2
-rw-r--r--virt/lib/Makefile1
-rw-r--r--virt/lib/irqbypass.c257
89 files changed, 2956 insertions, 1029 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 688760f790b1..816bf2fe55f5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1585,6 +1585,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nosid disable Source ID checking
no_x2apic_optout
BIOS x2APIC opt-out request will be ignored
+ nopost disable Interrupt Posting
iomem= Disable strict checking of access to MMIO memory
strict regions from userspace.
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 29ece601008e..092ee9fbaf2b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -401,10 +401,9 @@ Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
-Returns: 0 on success, -1 on error
+Returns: 0 on success, negative on failure.
-Queues a hardware interrupt vector to be injected. This is only
-useful if in-kernel local APIC or equivalent is not used.
+Queues a hardware interrupt vector to be injected.
/* for KVM_INTERRUPT */
struct kvm_interrupt {
@@ -414,7 +413,14 @@ struct kvm_interrupt {
X86:
-Note 'irq' is an interrupt vector, not an interrupt pin or line.
+Returns: 0 on success,
+ -EEXIST if an interrupt is already enqueued
+ -EINVAL the the irq number is invalid
+ -ENXIO if the PIC is in the kernel
+ -EFAULT if the pointer is invalid
+
+Note 'irq' is an interrupt vector, not an interrupt pin or line. This
+ioctl is useful if the in-kernel PIC is not used.
PPC:
@@ -1598,7 +1604,7 @@ provided event instead of triggering an exit.
struct kvm_ioeventfd {
__u64 datamatch;
__u64 addr; /* legal pio/mmio address */
- __u32 len; /* 1, 2, 4, or 8 bytes */
+ __u32 len; /* 0, 1, 2, 4, or 8 bytes */
__s32 fd;
__u32 flags;
__u8 pad[36];
@@ -1621,6 +1627,10 @@ to the registered address is equal to datamatch in struct kvm_ioeventfd.
For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.
+With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
+the kernel will ignore the length of guest write and may get a faster vmexit.
+The speedup may only apply to specific architectures, but the ioeventfd will
+work anyway.
4.60 KVM_DIRTY_TLB
@@ -3309,6 +3319,18 @@ Valid values for 'type' are:
to ignore the request, or to gather VM memory core dump and/or
reset/shutdown of the VM.
+ /* KVM_EXIT_IOAPIC_EOI */
+ struct {
+ __u8 vector;
+ } eoi;
+
+Indicates that the VCPU's in-kernel local APIC received an EOI for a
+level-triggered IOAPIC interrupt. This exit only triggers when the
+IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
+the userspace IOAPIC should process the EOI and retrigger the interrupt if
+it is still asserted. Vector is the LAPIC interrupt vector for which the
+EOI was received.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3627,6 +3649,26 @@ struct {
KVM handlers should exit to userspace with rc = -EREMOTE.
+7.5 KVM_CAP_SPLIT_IRQCHIP
+
+Architectures: x86
+Parameters: args[0] - number of routes reserved for userspace IOAPICs
+Returns: 0 on success, -1 on error
+
+Create a local apic for each processor in the kernel. This can be used
+instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the
+IOAPIC and PIC (and also the PIT, even though this has to be enabled
+separately).
+
+This capability also enables in kernel routing of interrupt requests;
+when KVM_CAP_SPLIT_IRQCHIP only routes of KVM_IRQ_ROUTING_MSI type are
+used in the IRQ routing table. The first args[0] MSI routes are reserved
+for the IOAPIC pins. Whenever the LAPIC receives an EOI for these routes,
+a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace.
+
+Fails if VCPU has already been created, or if the irqchip is already in the
+kernel (i.e. KVM_CREATE_IRQCHIP has already been called).
+
8. Other capabilities.
----------------------
diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 000000000000..38bca2835278
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,187 @@
+KVM/ARM VGIC Forwarded Physical Interrupts
+==========================================
+
+The KVM/ARM code implements software support for the ARM Generic
+Interrupt Controller's (GIC's) hardware support for virtualization by
+allowing software to inject virtual interrupts to a VM, which the guest
+OS sees as regular interrupts. The code is famously known as the VGIC.
+
+Some of these virtual interrupts, however, correspond to physical
+interrupts from real physical devices. One example could be the
+architected timer, which itself supports virtualization, and therefore
+lets a guest OS program the hardware device directly to raise an
+interrupt at some point in time. When such an interrupt is raised, the
+host OS initially handles the interrupt and must somehow signal this
+event as a virtual interrupt to the guest. Another example could be a
+passthrough device, where the physical interrupts are initially handled
+by the host, but the device driver for the device lives in the guest OS
+and KVM must therefore somehow inject a virtual interrupt on behalf of
+the physical one to the guest OS.
+
+These virtual interrupts corresponding to a physical interrupt on the
+host are called forwarded physical interrupts, but are also sometimes
+referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
+
+Forwarded physical interrupts are handled slightly differently compared
+to virtual interrupts generated purely by a software emulated device.
+
+
+The HW bit
+----------
+Virtual interrupts are signalled to the guest by programming the List
+Registers (LRs) on the GIC before running a VCPU. The LR is programmed
+with the virtual IRQ number and the state of the interrupt (Pending,
+Active, or Pending+Active). When the guest ACKs and EOIs a virtual
+interrupt, the LR state moves from Pending to Active, and finally to
+inactive.
+
+The LRs include an extra bit, called the HW bit. When this bit is set,
+KVM must also program an additional field in the LR, the physical IRQ
+number, to link the virtual with the physical IRQ.
+
+When the HW bit is set, KVM must EITHER set the Pending OR the Active
+bit, never both at the same time.
+
+Setting the HW bit causes the hardware to deactivate the physical
+interrupt on the physical distributor when the guest deactivates the
+corresponding virtual interrupt.
+
+
+Forwarded Physical Interrupts Life Cycle
+----------------------------------------
+
+The state of forwarded physical interrupts is managed in the following way:
+
+ - The physical interrupt is acked by the host, and becomes active on
+ the physical distributor (*).
+ - KVM sets the LR.Pending bit, because this is the only way the GICV
+ interface is going to present it to the guest.
+ - LR.Pending will stay set as long as the guest has not acked the interrupt.
+ - LR.Pending transitions to LR.Active on the guest read of the IAR, as
+ expected.
+ - On guest EOI, the *physical distributor* active bit gets cleared,
+ but the LR.Active is left untouched (set).
+ - KVM clears the LR on VM exits when the physical distributor
+ active state has been cleared.
+
+(*): The host handling is slightly more complicated. For some forwarded
+interrupts (shared), KVM directly sets the active state on the physical
+distributor before entering the guest, because the interrupt is never actually
+handled on the host (see details on the timer as an example below). For other
+forwarded interrupts (non-shared) the host does not deactivate the interrupt
+when the host ISR completes, but leaves the interrupt active until the guest
+deactivates it. Leaving the interrupt active is allowed, because Linux
+configures the physical GIC with EOIMode=1, which causes EOI operations to
+perform a priority drop allowing the GIC to receive other interrupts of the
+default priority.
+
+
+Forwarded Edge and Level Triggered PPIs and SPIs
+------------------------------------------------
+Forwarded physical interrupts injected should always be active on the
+physical distributor when injected to a guest.
+
+Level-triggered interrupts will keep the interrupt line to the GIC
+asserted, typically until the guest programs the device to deassert the
+line. This means that the interrupt will remain pending on the physical
+distributor until the guest has reprogrammed the device. Since we
+always run the VM with interrupts enabled on the CPU, a pending
+interrupt will exit the guest as soon as we switch into the guest,
+preventing the guest from ever making progress as the process repeats
+over and over. Therefore, the active state on the physical distributor
+must be set when entering the guest, preventing the GIC from forwarding
+the pending interrupt to the CPU. As soon as the guest deactivates the
+interrupt, the physical line is sampled by the hardware again and the host
+takes a new interrupt if and only if the physical line is still asserted.
+
+Edge-triggered interrupts do not exhibit the same problem with
+preventing guest execution that level-triggered interrupts do. One
+option is to not use HW bit at all, and inject edge-triggered interrupts
+from a physical device as pure virtual interrupts. But that would
+potentially slow down handling of the interrupt in the guest, because a
+physical interrupt occurring in the middle of the guest ISR would
+preempt the guest for the host to handle the interrupt. Additionally,
+if you configure the system to handle interrupts on a separate physical
+core from that running your VCPU, you still have to interrupt the VCPU
+to queue the pending state onto the LR, even though the guest won't use
+this information until the guest ISR completes. Therefore, the HW
+bit should always be set for forwarded edge-triggered interrupts. With
+the HW bit set, the virtual interrupt is injected and additional
+physical interrupts occurring before the guest deactivates the interrupt
+simply mark the state on the physical distributor as Pending+Active. As
+soon as the guest deactivates the interrupt, the host takes another
+interrupt if and only if there was a physical interrupt between injecting
+the forwarded interrupt to the guest and the guest deactivating the
+interrupt.
+
+Consequently, whenever we schedule a VCPU with one or more LRs with the
+HW bit set, the interrupt must also be active on the physical
+distributor.
+
+
+Forwarded LPIs
+--------------
+LPIs, introduced in GICv3, are always edge-triggered and do not have an
+active state. They become pending when a device signal them, and as
+soon as they are acked by the CPU, they are inactive again.
+
+It therefore doesn't make sense, and is not supported, to set the HW bit
+for physical LPIs that are forwarded to a VM as virtual interrupts,
+typically virtual SPIs.
+
+For LPIs, there is no other choice than to preempt the VCPU thread if
+necessary, and queue the pending state onto the LR.
+
+
+Putting It Together: The Architected Timer
+------------------------------------------
+The architected timer is a device that signals interrupts with level
+triggered semantics. The timer hardware is directly accessed by VCPUs
+which program the timer to fire at some point in time. Each VCPU on a
+system programs the timer to fire at different times, and therefore the
+hardware is multiplexed between multiple VCPUs. This is implemented by
+context-switching the timer state along with each VCPU thread.
+
+However, this means that a scenario like the following is entirely
+possible, and in fact, typical:
+
+1. KVM runs the VCPU
+2. The guest programs the time to fire in T+100
+3. The guest is idle and calls WFI (wait-for-interrupts)<