summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/stallwarn.txt2
-rw-r--r--Documentation/cpu-freq/governors.txt4
-rw-r--r--Documentation/devicetree/bindings/input/cros-ec-keyb.txt72
-rw-r--r--Documentation/devicetree/bindings/mfd/as3711.txt73
-rw-r--r--Documentation/devicetree/bindings/mfd/cros-ec.txt56
-rw-r--r--Documentation/devicetree/bindings/mfd/omap-usb-host.txt80
-rw-r--r--Documentation/devicetree/bindings/mfd/omap-usb-tll.txt17
-rw-r--r--Documentation/devicetree/bindings/sound/wm8994.txt58
-rw-r--r--Documentation/kernel-parameters.txt8
-rw-r--r--Documentation/s390/CommonIO12
-rw-r--r--Documentation/timers/NO_HZ.txt273
-rw-r--r--Documentation/virtual/kvm/api.txt146
-rw-r--r--Documentation/virtual/kvm/devices/README1
-rw-r--r--Documentation/virtual/kvm/devices/mpic.txt53
-rw-r--r--Documentation/virtual/kvm/devices/xics.txt66
15 files changed, 906 insertions, 15 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index e38b8df3d727..8e9359de1d28 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -191,7 +191,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
o A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
- result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+ result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
o A bug in the RCU implementation.
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 66f9cc310686..219970ba54b7 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -131,8 +131,8 @@ sampling_rate_min:
The sampling rate is limited by the HW transition latency:
transition_latency * 100
Or by kernel restrictions:
-If CONFIG_NO_HZ is set, the limit is 10ms fixed.
-If CONFIG_NO_HZ is not set or nohz=off boot parameter is used, the
+If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
+If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us (20ms)
HZ=250: min=80000us (80ms)
diff --git a/Documentation/devicetree/bindings/input/cros-ec-keyb.txt b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
new file mode 100644
index 000000000000..0f6355ce39b5
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt
@@ -0,0 +1,72 @@
+ChromeOS EC Keyboard
+
+Google's ChromeOS EC Keyboard is a simple matrix keyboard implemented on
+a separate EC (Embedded Controller) device. It provides a message for reading
+key scans from the EC. These are then converted into keycodes for processing
+by the kernel.
+
+This binding is based on matrix-keymap.txt and extends/modifies it as follows:
+
+Required properties:
+- compatible: "google,cros-ec-keyb"
+
+Optional properties:
+- google,needs-ghost-filter: True to enable a ghost filter for the matrix
+keyboard. This is recommended if the EC does not have its own logic or
+hardware for this.
+
+
+Example:
+
+cros-ec-keyb {
+ compatible = "google,cros-ec-keyb";
+ keypad,num-rows = <8>;
+ keypad,num-columns = <13>;
+ google,needs-ghost-filter;
+ /*
+ * Keymap entries take the form of 0xRRCCKKKK where
+ * RR=Row CC=Column KKKK=Key Code
+ * The values below are for a US keyboard layout and
+ * are taken from the Linux driver. Note that the
+ * 102ND key is not used for US keyboards.
+ */
+ linux,keymap = <
+ /* CAPSLCK F1 B F10 */
+ 0x0001003a 0x0002003b 0x00030030 0x00040044
+ /* N = R_ALT ESC */
+ 0x00060031 0x0008000d 0x000a0064 0x01010001
+ /* F4 G F7 H */
+ 0x0102003e 0x01030022 0x01040041 0x01060023
+ /* ' F9 BKSPACE L_CTRL */
+ 0x01080028 0x01090043 0x010b000e 0x0200001d
+ /* TAB F3 T F6 */
+ 0x0201000f 0x0202003d 0x02030014 0x02040040
+ /* ] Y 102ND [ */
+ 0x0205001b 0x02060015 0x02070056 0x0208001a
+ /* F8 GRAVE F2 5 */
+ 0x02090042 0x03010029 0x0302003c 0x03030006
+ /* F5 6 - \ */
+ 0x0304003f 0x03060007 0x0308000c 0x030b002b
+ /* R_CTRL A D F */
+ 0x04000061 0x0401001e 0x04020020 0x04030021
+ /* S K J ; */
+ 0x0404001f 0x04050025 0x04060024 0x04080027
+ /* L ENTER Z C */
+ 0x04090026 0x040b001c 0x0501002c 0x0502002e
+ /* V X , M */
+ 0x0503002f 0x0504002d 0x05050033 0x05060032
+ /* L_SHIFT / . SPACE */
+ 0x0507002a 0x05080035 0x05090034 0x050B0039
+ /* 1 3 4 2 */
+ 0x06010002 0x06020004 0x06030005 0x06040003
+ /* 8 7 0 9 */
+ 0x06050009 0x06060008 0x0608000b 0x0609000a
+ /* L_ALT DOWN RIGHT Q */
+ 0x060a0038 0x060b006c 0x060c006a 0x07010010
+ /* E R W I */
+ 0x07020012 0x07030013 0x07040011 0x07050017
+ /* U R_SHIFT P O */
+ 0x07060016 0x07070036 0x07080019 0x07090018
+ /* UP LEFT */
+ 0x070b0067 0x070c0069>;
+};
diff --git a/Documentation/devicetree/bindings/mfd/as3711.txt b/Documentation/devicetree/bindings/mfd/as3711.txt
new file mode 100644
index 000000000000..d98cf18c721c
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/as3711.txt
@@ -0,0 +1,73 @@
+AS3711 is an I2C PMIC from Austria MicroSystems with multiple DCDC and LDO power
+supplies, a battery charger and an RTC. So far only bindings for the two stepup
+DCDC converters are defined. Other DCDC and LDO supplies are configured, using
+standard regulator properties, they must belong to a sub-node, called
+"regulators" and be called "sd1" to "sd4" and "ldo1" to "ldo8." Stepup converter
+configuration should be placed in a subnode, called "backlight."
+
+Compulsory properties:
+- compatible : must be "ams,as3711"
+- reg : specifies the I2C address
+
+To use the SU1 converter as a backlight source the following two properties must
+be provided:
+- su1-dev : framebuffer phandle
+- su1-max-uA : maximum current
+
+To use the SU2 converter as a backlight source the following two properties must
+be provided:
+- su2-dev : framebuffer phandle
+- su1-max-uA : maximum current
+
+Additionally one of these properties must be provided to select the type of
+feedback used:
+- su2-feedback-voltage : voltage feedback is used
+- su2-feedback-curr1 : CURR1 input used for current feedback
+- su2-feedback-curr2 : CURR2 input used for current feedback
+- su2-feedback-curr3 : CURR3 input used for current feedback
+- su2-feedback-curr-auto: automatic current feedback selection
+
+and one of these to select the over-voltage protection pin
+- su2-fbprot-lx-sd4 : LX_SD4 is used for over-voltage protection
+- su2-fbprot-gpio2 : GPIO2 is used for over-voltage protection
+- su2-fbprot-gpio3 : GPIO3 is used for over-voltage protection
+- su2-fbprot-gpio4 : GPIO4 is used for over-voltage protection
+
+If "su2-feedback-curr-auto" is selected, one or more of the following properties
+have to be specified:
+- su2-auto-curr1 : use CURR1 input for current feedback
+- su2-auto-curr2 : use CURR2 input for current feedback
+- su2-auto-curr3 : use CURR3 input for current feedback
+
+Example:
+
+as3711@40 {
+ compatible = "ams,as3711";
+ reg = <0x40>;
+
+ regulators {
+ sd4 {
+ regulator-name = "1.215V";
+ regulator-min-microvolt = <1215000>;
+ regulator-max-microvolt = <1235000>;
+ };
+ ldo2 {
+ regulator-name = "2.8V CPU";
+ regulator-min-microvolt = <2800000>;
+ regulator-max-microvolt = <2800000>;
+ regulator-always-on;
+ regulator-boot-on;
+ };
+ };
+
+ backlight {
+ compatible = "ams,as3711-bl";
+ su2-dev = <&lcdc>;
+ su2-max-uA = <36000>;
+ su2-feedback-curr-auto;
+ su2-fbprot-gpio4;
+ su2-auto-curr1;
+ su2-auto-curr2;
+ su2-auto-curr3;
+ };
+};
diff --git a/Documentation/devicetree/bindings/mfd/cros-ec.txt b/Documentation/devicetree/bindings/mfd/cros-ec.txt
new file mode 100644
index 000000000000..e0e59c58a1f9
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/cros-ec.txt
@@ -0,0 +1,56 @@
+ChromeOS Embedded Controller
+
+Google's ChromeOS EC is a Cortex-M device which talks to the AP and
+implements various function such as keyboard and battery charging.
+
+The EC can be connect through various means (I2C, SPI, LPC) and the
+compatible string used depends on the inteface. Each connection method has
+its own driver which connects to the top level interface-agnostic EC driver.
+Other Linux driver (such as cros-ec-keyb for the matrix keyboard) connect to
+the top-level driver.
+
+Required properties (I2C):
+- compatible: "google,cros-ec-i2c"
+- reg: I2C slave address
+
+Required properties (SPI):
+- compatible: "google,cros-ec-spi"
+- reg: SPI chip select
+
+Required properties (LPC):
+- compatible: "google,cros-ec-lpc"
+- reg: List of (IO address, size) pairs defining the interface uses
+
+
+Example for I2C:
+
+i2c@12CA0000 {
+ cros-ec@1e {
+ reg = <0x1e>;
+ compatible = "google,cros-ec-i2c";
+ interrupts = <14 0>;
+ interrupt-parent = <&wakeup_eint>;
+ wakeup-source;
+ };
+
+
+Example for SPI:
+
+spi@131b0000 {
+ ec@0 {
+ compatible = "google,cros-ec-spi";
+ reg = <0x0>;
+ interrupts = <14 0>;
+ interrupt-parent = <&wakeup_eint>;
+ wakeup-source;
+ spi-max-frequency = <5000000>;
+ controller-data {
+ cs-gpio = <&gpf0 3 4 3 0>;
+ samsung,spi-cs;
+ samsung,spi-feedback-delay = <2>;
+ };
+ };
+};
+
+
+Example for LPC is not supplied as it is not yet implemented.
diff --git a/Documentation/devicetree/bindings/mfd/omap-usb-host.txt b/Documentation/devicetree/bindings/mfd/omap-usb-host.txt
new file mode 100644
index 000000000000..b381fa696bf9
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/omap-usb-host.txt
@@ -0,0 +1,80 @@
+OMAP HS USB Host
+
+Required properties:
+
+- compatible: should be "ti,usbhs-host"
+- reg: should contain one register range i.e. start and length
+- ti,hwmods: must contain "usb_host_hs"
+
+Optional properties:
+
+- num-ports: number of USB ports. Usually this is automatically detected
+ from the IP's revision register but can be overridden by specifying
+ this property. A maximum of 3 ports are supported at the moment.
+
+- portN-mode: String specifying the port mode for port N, where N can be
+ from 1 to 3. If the port mode is not specified, that port is treated
+ as unused. When specified, it must be one of the following.
+ "ehci-phy",
+ "ehci-tll",
+ "ehci-hsic",
+ "ohci-phy-6pin-datse0",
+ "ohci-phy-6pin-dpdm",
+ "ohci-phy-3pin-datse0",
+ "ohci-phy-4pin-dpdm",
+ "ohci-tll-6pin-datse0",
+ "ohci-tll-6pin-dpdm",
+ "ohci-tll-3pin-datse0",
+ "ohci-tll-4pin-dpdm",
+ "ohci-tll-2pin-datse0",
+ "ohci-tll-2pin-dpdm",
+
+- single-ulpi-bypass: Must be present if the controller contains a single
+ ULPI bypass control bit. e.g. OMAP3 silicon <= ES2.1
+
+Required properties if child node exists:
+
+- #address-cells: Must be 1
+- #size-cells: Must be 1
+- ranges: must be present
+
+Properties for children:
+
+The OMAP HS USB Host subsystem contains EHCI and OHCI controllers.
+See Documentation/devicetree/bindings/usb/omap-ehci.txt and
+omap3-ohci.txt
+
+Example for OMAP4:
+
+usbhshost: usbhshost@4a064000 {
+ compatible = "ti,usbhs-host";
+ reg = <0x4a064000 0x800>;
+ ti,hwmods = "usb_host_hs";
+ #address-cells = <1>;
+ #size-cells = <1>;
+ ranges;
+
+ usbhsohci: ohci@4a064800 {
+ compatible = "ti,ohci-omap3", "usb-ohci";
+ reg = <0x4a064800 0x400>;
+ interrupt-parent = <&gic>;
+ interrupts = <0 76 0x4>;
+ };
+
+ usbhsehci: ehci@4a064c00 {
+ compatible = "ti,ehci-omap", "usb-ehci";
+ reg = <0x4a064c00 0x400>;
+ interrupt-parent = <&gic>;
+ interrupts = <0 77 0x4>;
+ };
+};
+
+&usbhshost {
+ port1-mode = "ehci-phy";
+ port2-mode = "ehci-tll";
+ port3-mode = "ehci-phy";
+};
+
+&usbhsehci {
+ phys = <&hsusb1_phy 0 &hsusb3_phy>;
+};
diff --git a/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt b/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt
new file mode 100644
index 000000000000..62fe69724e3b
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt
@@ -0,0 +1,17 @@
+OMAP HS USB Host TLL (Transceiver-Less Interface)
+
+Required properties:
+
+- compatible : should be "ti,usbhs-tll"
+- reg : should contain one register range i.e. start and length
+- interrupts : should contain the TLL module's interrupt
+- ti,hwmod : must contain "usb_tll_hs"
+
+Example:
+
+ usbhstll: usbhstll@4a062000 {
+ compatible = "ti,usbhs-tll";
+ reg = <0x4a062000 0x1000>;
+ interrupts = <78>;
+ ti,hwmods = "usb_tll_hs";
+ };
diff --git a/Documentation/devicetree/bindings/sound/wm8994.txt b/Documentation/devicetree/bindings/sound/wm8994.txt
index 7a7eb1e7bda6..f2f3e80934d2 100644
--- a/Documentation/devicetree/bindings/sound/wm8994.txt
+++ b/Documentation/devicetree/bindings/sound/wm8994.txt
@@ -5,14 +5,70 @@ on the board).
Required properties:
- - compatible : "wlf,wm1811", "wlf,wm8994", "wlf,wm8958"
+ - compatible : One of "wlf,wm1811", "wlf,wm8994" or "wlf,wm8958".
- reg : the I2C address of the device for I2C, the chip select
number for SPI.
+ - gpio-controller : Indicates this device is a GPIO controller.
+ - #gpio-cells : Must be 2. The first cell is the pin number and the
+ second cell is used to specify optional parameters (currently unused).
+
+ - AVDD2-supply, DBVDD1-supply, DBVDD2-supply, DBVDD3-supply, CPVDD-supply,
+ SPKVDD1-supply, SPKVDD2-supply : power supplies for the device, as covered
+ in Documentation/devicetree/bindings/regulator/regulator.txt
+
+Optional properties:
+
+ - interrupts : The interrupt line the IRQ signal for the device is
+ connected to. This is optional, if it is not connected then none
+ of the interrupt related properties should be specified.
+ - interrupt-controller : These devices contain interrupt controllers
+ and may provide interrupt services to other devices if they have an
+ interrupt line connected.
+ - interrupt-parent : The parent interrupt controller.
+ - #interrupt-cells: the number of cells to describe an IRQ, this should be 2.
+ The first cell is the IRQ number.
+ The second cell is the flags, encoded as the trigger masks from
+ Documentation/devicetree/bindings/interrupts.txt
+
+ - wlf,gpio-cfg : A list of GPIO configuration register values. If absent,
+ no configuration of these registers is performed. If any value is
+ over 0xffff then the register will be left as default. If present 11
+ values must be supplied.
+
+ - wlf,micbias-cfg : Two MICBIAS register values for WM1811 or
+ WM8958. If absent the register defaults will be used.
+
+ - wlf,ldo1ena : GPIO specifier for control of LDO1ENA input to device.
+ - wlf,ldo2ena : GPIO specifier for control of LDO2ENA input to device.
+
+ - wlf,lineout1-se : If present LINEOUT1 is in single ended mode.
+ - wlf,lineout2-se : If present LINEOUT2 is in single ended mode.
+
+ - wlf,lineout1-feedback : If present LINEOUT1 has common mode feedback
+ connected.
+ - wlf,lineout2-feedback : If present LINEOUT2 has common mode feedback
+ connected.
+
+ - wlf,ldoena-always-driven : If present LDOENA is always driven.
+
Example:
codec: wm8994@1a {
compatible = "wlf,wm8994";
reg = <0x1a>;
+
+ gpio-controller;
+ #gpio-cells = <2>;
+
+ lineout1-se;
+
+ AVDD2-supply = <&regulator>;
+ CPVDD-supply = <&regulator>;
+ DBVDD1-supply = <&regulator>;
+ DBVDD2-supply = <&regulator>;
+ DBVDD3-supply = <&regulator>;
+ SPKVDD1-supply = <&regulator>;
+ SPKVDD2-supply = <&regulator>;
};
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9653cf2f9727..8920f9f5fa9e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1964,6 +1964,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Valid arguments: on, off
Default: on
+ nohz_full= [KNL,BOOT]
+ In kernels built with CONFIG_NO_HZ_FULL=y, set
+ the specified list of CPUs whose tick will be stopped
+ whenever possible. The boot CPU will be forced outside
+ the range to maintain the timekeeping.
+ The CPUs in this range must also be included in the
+ rcu_nocbs= set.
+
noiotrap [SH] Disables trapped I/O port accesses.
noirqdebug [X86-32] Disables the code which attempts to detect and
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO
index d378cba66456..6e0f63f343b4 100644
--- a/Documentation/s390/CommonIO
+++ b/Documentation/s390/CommonIO
@@ -8,9 +8,9 @@ Command line parameters
Enable logging of debug information in case of ccw device timeouts.
-* cio_ignore = {all} |
- {<device> | <range of devices>} |
- {!<device> | !<range of devices>}
+* cio_ignore = device[,device[,..]]
+
+ device := {all | [!]ipldev | [!]condev | [!]<devno> | [!]<devno>-<devno>}
The given devices will be ignored by the common I/O-layer; no detection
and device sensing will be done on any of those devices. The subchannel to
@@ -24,8 +24,10 @@ Command line parameters
device numbers (0xabcd or abcd, for 2.4 backward compatibility). If you
give a device number 0xabcd, it will be interpreted as 0.0.abcd.
- You can use the 'all' keyword to ignore all devices.
- The '!' operator will cause the I/O-layer to _not_ ignore a device.
+ You can use the 'all' keyword to ignore all devices. The 'ipldev' and 'condev'
+ keywords can be used to refer to the CCW based boot device and CCW console
+ device respectively (these are probably useful only when combined with the '!'
+ operator). The '!' operator will cause the I/O-layer to _not_ ignore a device.
The command line is parsed from left to right.
For example,
diff --git a/Documentation/timers/NO_HZ.txt b/Documentation/timers/NO_HZ.txt
new file mode 100644
index 000000000000..5b5322024067
--- /dev/null
+++ b/Documentation/timers/NO_HZ.txt
@@ -0,0 +1,273 @@
+ NO_HZ: Reducing Scheduling-Clock Ticks
+
+
+This document describes Kconfig options and boot parameters that can
+reduce the number of scheduling-clock interrupts, thereby improving energy
+efficiency and reducing OS jitter. Reducing OS jitter is important for
+some types of computationally intensive high-performance computing (HPC)
+applications and for real-time applications.
+
+There are two main contexts in which the number of scheduling-clock
+interrupts can be reduced compared to the old-school approach of sending
+a scheduling-clock interrupt to all CPUs every jiffy whether they need
+it or not (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n for older kernels):
+
+1. Idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y for older kernels).
+
+2. CPUs having only one runnable task (CONFIG_NO_HZ_FULL=y).
+
+These two cases are described in the following two sections, followed
+by a third section on RCU-specific considerations and a fourth and final
+section listing known issues.
+
+
+IDLE CPUs
+
+If a CPU is idle, there is little point in sending it a scheduling-clock
+interrupt. After all, the primary purpose of a scheduling-clock interrupt
+is to force a busy CPU to shift its attention among multiple duties,
+and an idle CPU has no duties to shift its attention among.
+
+The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
+scheduling-clock interrupts to idle CPUs, which is critically important
+both to battery-powered devices and to highly virtualized mainframes.
+A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
+drain its battery very quickly, easily 2-3 times as fast as would the
+same device running a CONFIG_NO_HZ_IDLE=y kernel. A mainframe running
+1,500 OS instances might find that half of its CPU time was consumed by
+unnecessary scheduling-clock interrupts. In these situations, there
+is strong motivation to avoid sending scheduling-clock interrupts to
+idle CPUs. That said, dyntick-idle mode is not free:
+
+1. It increases the number of instructions executed on the path
+ to and from the idle loop.
+
+2. On many architectures, dyntick-idle mode also increases the
+ number of expensive clock-reprogramming operations.
+
+Therefore, systems with aggressive real-time response constraints often
+run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
+in order to avoid degrading from-idle transition latencies.
+
+An idle CPU that is not receiving scheduling-clock interrupts is said to
+be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
+tickless". The remainder of this document will use "dyntick-idle mode".
+
+There is also a boot parameter "nohz=" that can be used to disable
+dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
+By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
+dyntick-idle mode.
+
+
+CPUs WITH ONLY ONE RUNNABLE TASK
+
+If a CPU has only one runnable task, there is little point in sending it
+a scheduling-clock interrupt because there is no other task to switch to.
+
+The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
+sending scheduling-clock interrupts to CPUs with a single runnable task,
+and such CPUs are said to be "adaptive-ticks CPUs". This is important
+for applications with aggressive real-time response constraints because
+it allows them to improve their worst-case response times by the maximum
+duration of a scheduling-clock interrupt. It is also important for
+computationally intensive short-iteration workloads: If any CPU is
+delayed during a given iteration, all the other CPUs will be forced to
+wait idle while the delayed CPU finishes. Thus, the delay is multiplied
+by one less than the number of CPUs. In these situations, there is
+again strong motivation to avoid sending scheduling-clock interrupts.
+
+By default, no CPU will be an adaptive-ticks CPU. The "nohz_full="
+boot parameter specifies the adaptive-ticks CPUs. For example,
+"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
+CPUs. Note that you are prohibited from marking all of the CPUs as
+adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
+online to handle timekeeping tasks in order to ensure that system calls
+like gettimeofday() returns accurate values on adaptive-tick CPUs.
+(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no
+running user processes to observe slight drifts in clock rate.)
+Therefore, the boot CPU is prohibited from entering adaptive-ticks
+mode. Specifying a "nohz_full=" mask that includes the boot CPU will
+result in a boot-time error message, and the boot CPU will be removed
+from the mask.
+
+Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies
+that all CPUs other than the boot CPU are adaptive-ticks CPUs. This
+Kconfig parameter will be overridden by the "nohz_full=" boot parameter,
+so that if both the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter and
+the "nohz_full=1" boot parameter is specified, the boot parameter will
+prevail so that only CPU 1 will be an adaptive-ticks CPU.
+
+Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
+This is covered in the "RCU IMPLICATIONS" section below.
+
+Normally, a CPU remains in adaptive-ticks mode as long as possible.
+In particular, transitioning to kernel mode does not automatically change
+the mode. Instead, the CPU will exit adaptive-ticks mode only if needed,
+for example, if that CPU enqueues an RCU callback.
+
+Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
+not come for free:
+
+1. CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
+ adaptive ticks without also running dyntick idle. This dependency
+ extends down into the implementation, so that all of the costs
+ of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.
+
+2. The user/kernel transitions are slightly more expensive due
+ to the need to inform kernel subsystems (such as RCU) about
+ the change in mode.
+
+3. POSIX CPU timers on adaptive-tick CPUs may miss their deadlines
+ (perhaps indefinitely) because they currently rely on
+ scheduling-tick interrupts. This will likely be fixed in
+ one of two ways: (1) Prevent CPUs with POSIX CPU timers from
+ entering adaptive-tick mode, or (2) Use hrtimers or other
+ adaptive-ticks-immune mechanism to cause the POSIX CPU timer to
+ fire properly.
+
+4. If there are more perf events pending than the hardware can
+ accommodate, they are normally round-robined so as to collect
+ all of them over time. Adaptive-tick mode may prevent this
+ round-robining from happening. This will likely be fixed by
+ preventing CPUs with large numbers of perf events pending from
+ entering adaptive-tick mode.
+
+5. Scheduler statistics for adaptive-tick CPUs may be computed
+ slightly differently than those for non-adaptive-tick CPUs.
+ This might in turn perturb load-balancing of real-time tasks.
+
+6. The LB_BIAS scheduler feature is disabled by adaptive ticks.
+
+Although improvements are expected over time, adaptive ticks is quite
+useful for many types of real-time and compute-intensive applications.
+However, the drawbacks listed above mean that adaptive ticks should not
+(yet) be enabled by default.
+
+
+RCU IMPLICATIONS
+
+There are situations in which idle CPUs cannot be permitted to
+enter either dyntick-idle mode or adaptive-tick mode, the most
+common being when that CPU has RCU callbacks pending.
+
+The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
+to enter dyntick-idle mode or adaptive-tick mode anyway. In this case,
+a timer will awaken these CPUs every four jiffies in order to ensure
+that the RCU callbacks are processed in a timely fashion.
+
+Another approach is to offload RCU callback processing to "rcuo" kthreads
+using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to
+offload may be selected via several methods:
+
+1. One of three mutually exclusive Kconfig options specify a
+ build-time default for the CPUs to offload:
+
+ a. The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in
+ no CPUs being offloaded.
+
+ b. The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes
+ CPU 0 to be offloaded.
+
+ c. The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all
+ CPUs to be offloaded. Note that the callbacks will be
+ offloaded to "rcuo" kthreads, and that those kthreads
+ will in fact run on some CPU. However, this approach
+ gives fine-grained control on exactly which CPUs the
+ callbacks run on, along with their scheduling priority
+ (including the default of SCHED_OTHER), and it further
+ allows this control to be varied dynamically at runtime.
+
+2. The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
+ list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1,
+ 3, 4, and 5. The specified CPUs will be offloaded in addition to
+ any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
+ CONFIG_RCU_NOCB_CPU_ALL=y. This means that the "rcu_nocbs=" boot
+ parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.
+
+The offloaded CPUs will never queue RCU callbacks, and therefore RCU
+never prevents offloaded CPUs from entering either dyntick-idle mode
+or adaptive-tick mode. That said, note that it is up to userspace to
+pin the "rcuo" kthreads to specific CPUs if desired. Otherwise, the
+scheduler will decide where to run them, which might or might not be
+where you want them to run.
+
+
+KNOWN ISSUES
+
+o Dyntick-idle slows transitions to and from idle slightly.
+ In practice, this has not been a problem except for the most
+ aggressive real-time workloads, which have the option of disabling
+ dyntick-idle mode, an option that most of them take. However,
+ some workloads will no doubt want to use adaptive ticks to
+ eliminate scheduling-clock interrupt latencies. Here are some
+ options for these workloads:
+
+ a. Use PMQOS from userspace to inform the kernel of your
+ latency requirements (preferred).
+
+ b. On x86 systems, use the "idle=mwait" boot parameter.
+
+ c. On x86 systems, use the "intel_idle.max_cstate=" to limit
+ ` the maximum C-state depth.
+
+ d. On x86 systems, use the "idle=poll" boot parameter.
+ However, please note that use of this parameter can cause
+ your CPU to overheat, which may cause thermal throttling
+ to degrade your latencies -- and that this degradation can
+ be even worse than that of dyntick-idle. Furthermore,
+ this parameter effectively disables Turbo Mode on Intel
+ CPUs, which can significantly reduce maximum performance.
+
+o Adaptive-ticks slows user/kernel transitions slightly.
+ This is not expected to be a problem for computationally intensive
+ workloads, which have few such transitions. Careful benchmarking
+ will be required to determine whether or not other workloads
+ are significantly affected by this effect.
+
+o Adaptive-ticks does not do anything unless there is only one
+ runnable task for a given CPU, even though there are a number
+ of other situations where the scheduling-clock tick is not
+ needed. To give but one example, consider a CPU that has one
+ runnable high-priority SCHED_FIFO task and an arbitrary number
+ of low-priority SCHED_OTHER tasks. In this case, the CPU is
+ required to run the SCHED_FIFO task until it either blocks or
+ some other higher-priority task awakens on (or is assigned to)
+ this CPU, so there is no point in sending a scheduling-clock
+ interrupt to this CPU. However, the current implementation
+ nevertheless sends scheduling-clock interrupts to CPUs having a
+ single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
+ tasks, even though these interrupts are unnecessary.
+
+ Better handling of these sorts of situations is future work.
+
+o A reboot is required to reconfigure both adaptive idle and RCU
+ callback offloading. Runtime reconfiguration could be provided
+ if needed, however, due to the complexity of reconfiguring RCU at
+ runtime, there would need to be an earthshakingly good reason.
+ Especially given that you have the straightforward option of
+ simply offloading RCU callbacks from all CPUs and pinning them
+ where you want them whenever you want them pinned.
+
+o Additional configuration is required to deal with other sources
+ of OS jitter, including interrupts and system-utility tasks
+ and processes. This configuration normally involves binding
+ interrupts and tasks to particular CPUs.
+
+o Some sources of OS jitter can currently be eliminated only by
+ constraining the workload. For example, the only way to eliminate
+ OS jitter due to global TLB shootdowns is to avoid the unmapping
+ operations (such as kernel module unload operations) that
+ result in these shootdowns. For another example, page faults
+ and TLB misses can be reduced (and in some cases eliminated) by
+ using huge pages and by constraining the amount of memory used
+ by the application. Pre-faulting the working set can also be
+ helpful, especially when combined with the mlock() and mlockall()
+ system calls.
+
+o Unless all CPUs are idle, at least one CPU must keep the
+ scheduling-clock interrupt going in order to support accurate
+ timekeeping.
+
+o If there are adaptive-ticks CPUs, there will be at least one
+ CPU keeping the scheduling-clock interrupt going, even if all
+ CPUs are otherwise idle.
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 119358dfb742..5f91eda91647 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1486,15 +1486,23 @@ struct kvm_ioeventfd {
__u8 pad[36];
};
+For the special case of virtio-ccw devices on s390, the ioevent is matched
+to a subchannel/virtqueue tuple instead.
+
The following flags are defined:
#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign)
+#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
+ (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.
+For virtio-ccw devices, addr contains the subchannel id and datamatch the
+virtqueue index.
+
4.60 KVM_DIRTY_TLB
@@ -1780,27 +1788,48 @@ registers, find a list below:
PPC | KVM_REG_PPC_VPA_DTL | 128
PPC | KVM_REG_PPC_EPCR | 32
PPC | KVM_REG_PPC_EPR | 32
+ PPC | KVM_REG_PPC_TCR | 32
+ PPC | KVM_REG_PPC_TSR | 32
+ PPC | KVM_REG_PPC_OR_TSR | 32
+ PPC | KVM_REG_PPC_CLEAR_TSR | 32
+ PPC | KVM_REG_PPC_MAS0 | 32
+ PPC | KVM_REG_PPC_MAS1 | 32
+ PPC | KVM_REG_PPC_MAS2 | 64
+ PPC | KVM_REG_PPC_MAS7_3 | 64
+ PPC | KVM_REG_PPC_MAS4 | 32
+ PPC | KVM_REG_PPC_MAS6 | 32
+ PPC | KVM_REG_PPC_MMUCFG | 32
+ PPC | KVM_REG_PPC_TLB0CFG | 32
+ PPC | KVM_REG_PPC_TLB1CFG | 32
+ PPC | KVM_REG_PPC_TLB2CFG | 32
+ PPC | KVM_REG_PPC_TLB3CFG | 32
+ PPC | KVM_REG_PPC_TLB0PS | 32
+ PPC | KVM_REG_PPC_TLB1PS | 32
+ PPC | KVM_REG_PPC_TLB2PS | 32
+ PPC | KVM_REG_PPC_TLB3PS | 32
+ PPC | KVM_REG_PPC_EPTCFG | 32
+ PPC | KVM_REG_PPC_ICP_STATE | 64
ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:
ARM core registers have the following id bit patterns:
- 0x4002 0000 0010 <index into the kvm_regs struct:16>
+ 0x4020 0000 0010 <index into the kvm_regs struct:16>
ARM 32-bit CP15 registers have the following id bit patterns:
- 0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
+ 0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
ARM 64-bit CP15 registers have the following id bit patterns:
- 0x4003 0000 000F <zero: