diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 2 | ||||
-rw-r--r-- | Documentation/cpu-freq/governors.txt | 4 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/input/cros-ec-keyb.txt | 72 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mfd/as3711.txt | 73 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mfd/cros-ec.txt | 56 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mfd/omap-usb-host.txt | 80 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mfd/omap-usb-tll.txt | 17 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/sound/wm8994.txt | 58 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 8 | ||||
-rw-r--r-- | Documentation/s390/CommonIO | 12 | ||||
-rw-r--r-- | Documentation/timers/NO_HZ.txt | 273 | ||||
-rw-r--r-- | Documentation/virtual/kvm/api.txt | 146 | ||||
-rw-r--r-- | Documentation/virtual/kvm/devices/README | 1 | ||||
-rw-r--r-- | Documentation/virtual/kvm/devices/mpic.txt | 53 | ||||
-rw-r--r-- | Documentation/virtual/kvm/devices/xics.txt | 66 |
15 files changed, 906 insertions, 15 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index e38b8df3d727..8e9359de1d28 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -191,7 +191,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that o A hardware or software issue shuts off the scheduler-clock interrupt on a CPU that is not in dyntick-idle mode. This problem really has happened, and seems to be most likely to - result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels. + result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels. o A bug in the RCU implementation. diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index 66f9cc310686..219970ba54b7 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -131,8 +131,8 @@ sampling_rate_min: The sampling rate is limited by the HW transition latency: transition_latency * 100 Or by kernel restrictions: -If CONFIG_NO_HZ is set, the limit is 10ms fixed. -If CONFIG_NO_HZ is not set or nohz=off boot parameter is used, the +If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed. +If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the limits depend on the CONFIG_HZ option: HZ=1000: min=20000us (20ms) HZ=250: min=80000us (80ms) diff --git a/Documentation/devicetree/bindings/input/cros-ec-keyb.txt b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt new file mode 100644 index 000000000000..0f6355ce39b5 --- /dev/null +++ b/Documentation/devicetree/bindings/input/cros-ec-keyb.txt @@ -0,0 +1,72 @@ +ChromeOS EC Keyboard + +Google's ChromeOS EC Keyboard is a simple matrix keyboard implemented on +a separate EC (Embedded Controller) device. It provides a message for reading +key scans from the EC. These are then converted into keycodes for processing +by the kernel. + +This binding is based on matrix-keymap.txt and extends/modifies it as follows: + +Required properties: +- compatible: "google,cros-ec-keyb" + +Optional properties: +- google,needs-ghost-filter: True to enable a ghost filter for the matrix +keyboard. This is recommended if the EC does not have its own logic or +hardware for this. + + +Example: + +cros-ec-keyb { + compatible = "google,cros-ec-keyb"; + keypad,num-rows = <8>; + keypad,num-columns = <13>; + google,needs-ghost-filter; + /* + * Keymap entries take the form of 0xRRCCKKKK where + * RR=Row CC=Column KKKK=Key Code + * The values below are for a US keyboard layout and + * are taken from the Linux driver. Note that the + * 102ND key is not used for US keyboards. + */ + linux,keymap = < + /* CAPSLCK F1 B F10 */ + 0x0001003a 0x0002003b 0x00030030 0x00040044 + /* N = R_ALT ESC */ + 0x00060031 0x0008000d 0x000a0064 0x01010001 + /* F4 G F7 H */ + 0x0102003e 0x01030022 0x01040041 0x01060023 + /* ' F9 BKSPACE L_CTRL */ + 0x01080028 0x01090043 0x010b000e 0x0200001d + /* TAB F3 T F6 */ + 0x0201000f 0x0202003d 0x02030014 0x02040040 + /* ] Y 102ND [ */ + 0x0205001b 0x02060015 0x02070056 0x0208001a + /* F8 GRAVE F2 5 */ + 0x02090042 0x03010029 0x0302003c 0x03030006 + /* F5 6 - \ */ + 0x0304003f 0x03060007 0x0308000c 0x030b002b + /* R_CTRL A D F */ + 0x04000061 0x0401001e 0x04020020 0x04030021 + /* S K J ; */ + 0x0404001f 0x04050025 0x04060024 0x04080027 + /* L ENTER Z C */ + 0x04090026 0x040b001c 0x0501002c 0x0502002e + /* V X , M */ + 0x0503002f 0x0504002d 0x05050033 0x05060032 + /* L_SHIFT / . SPACE */ + 0x0507002a 0x05080035 0x05090034 0x050B0039 + /* 1 3 4 2 */ + 0x06010002 0x06020004 0x06030005 0x06040003 + /* 8 7 0 9 */ + 0x06050009 0x06060008 0x0608000b 0x0609000a + /* L_ALT DOWN RIGHT Q */ + 0x060a0038 0x060b006c 0x060c006a 0x07010010 + /* E R W I */ + 0x07020012 0x07030013 0x07040011 0x07050017 + /* U R_SHIFT P O */ + 0x07060016 0x07070036 0x07080019 0x07090018 + /* UP LEFT */ + 0x070b0067 0x070c0069>; +}; diff --git a/Documentation/devicetree/bindings/mfd/as3711.txt b/Documentation/devicetree/bindings/mfd/as3711.txt new file mode 100644 index 000000000000..d98cf18c721c --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/as3711.txt @@ -0,0 +1,73 @@ +AS3711 is an I2C PMIC from Austria MicroSystems with multiple DCDC and LDO power +supplies, a battery charger and an RTC. So far only bindings for the two stepup +DCDC converters are defined. Other DCDC and LDO supplies are configured, using +standard regulator properties, they must belong to a sub-node, called +"regulators" and be called "sd1" to "sd4" and "ldo1" to "ldo8." Stepup converter +configuration should be placed in a subnode, called "backlight." + +Compulsory properties: +- compatible : must be "ams,as3711" +- reg : specifies the I2C address + +To use the SU1 converter as a backlight source the following two properties must +be provided: +- su1-dev : framebuffer phandle +- su1-max-uA : maximum current + +To use the SU2 converter as a backlight source the following two properties must +be provided: +- su2-dev : framebuffer phandle +- su1-max-uA : maximum current + +Additionally one of these properties must be provided to select the type of +feedback used: +- su2-feedback-voltage : voltage feedback is used +- su2-feedback-curr1 : CURR1 input used for current feedback +- su2-feedback-curr2 : CURR2 input used for current feedback +- su2-feedback-curr3 : CURR3 input used for current feedback +- su2-feedback-curr-auto: automatic current feedback selection + +and one of these to select the over-voltage protection pin +- su2-fbprot-lx-sd4 : LX_SD4 is used for over-voltage protection +- su2-fbprot-gpio2 : GPIO2 is used for over-voltage protection +- su2-fbprot-gpio3 : GPIO3 is used for over-voltage protection +- su2-fbprot-gpio4 : GPIO4 is used for over-voltage protection + +If "su2-feedback-curr-auto" is selected, one or more of the following properties +have to be specified: +- su2-auto-curr1 : use CURR1 input for current feedback +- su2-auto-curr2 : use CURR2 input for current feedback +- su2-auto-curr3 : use CURR3 input for current feedback + +Example: + +as3711@40 { + compatible = "ams,as3711"; + reg = <0x40>; + + regulators { + sd4 { + regulator-name = "1.215V"; + regulator-min-microvolt = <1215000>; + regulator-max-microvolt = <1235000>; + }; + ldo2 { + regulator-name = "2.8V CPU"; + regulator-min-microvolt = <2800000>; + regulator-max-microvolt = <2800000>; + regulator-always-on; + regulator-boot-on; + }; + }; + + backlight { + compatible = "ams,as3711-bl"; + su2-dev = <&lcdc>; + su2-max-uA = <36000>; + su2-feedback-curr-auto; + su2-fbprot-gpio4; + su2-auto-curr1; + su2-auto-curr2; + su2-auto-curr3; + }; +}; diff --git a/Documentation/devicetree/bindings/mfd/cros-ec.txt b/Documentation/devicetree/bindings/mfd/cros-ec.txt new file mode 100644 index 000000000000..e0e59c58a1f9 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/cros-ec.txt @@ -0,0 +1,56 @@ +ChromeOS Embedded Controller + +Google's ChromeOS EC is a Cortex-M device which talks to the AP and +implements various function such as keyboard and battery charging. + +The EC can be connect through various means (I2C, SPI, LPC) and the +compatible string used depends on the inteface. Each connection method has +its own driver which connects to the top level interface-agnostic EC driver. +Other Linux driver (such as cros-ec-keyb for the matrix keyboard) connect to +the top-level driver. + +Required properties (I2C): +- compatible: "google,cros-ec-i2c" +- reg: I2C slave address + +Required properties (SPI): +- compatible: "google,cros-ec-spi" +- reg: SPI chip select + +Required properties (LPC): +- compatible: "google,cros-ec-lpc" +- reg: List of (IO address, size) pairs defining the interface uses + + +Example for I2C: + +i2c@12CA0000 { + cros-ec@1e { + reg = <0x1e>; + compatible = "google,cros-ec-i2c"; + interrupts = <14 0>; + interrupt-parent = <&wakeup_eint>; + wakeup-source; + }; + + +Example for SPI: + +spi@131b0000 { + ec@0 { + compatible = "google,cros-ec-spi"; + reg = <0x0>; + interrupts = <14 0>; + interrupt-parent = <&wakeup_eint>; + wakeup-source; + spi-max-frequency = <5000000>; + controller-data { + cs-gpio = <&gpf0 3 4 3 0>; + samsung,spi-cs; + samsung,spi-feedback-delay = <2>; + }; + }; +}; + + +Example for LPC is not supplied as it is not yet implemented. diff --git a/Documentation/devicetree/bindings/mfd/omap-usb-host.txt b/Documentation/devicetree/bindings/mfd/omap-usb-host.txt new file mode 100644 index 000000000000..b381fa696bf9 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/omap-usb-host.txt @@ -0,0 +1,80 @@ +OMAP HS USB Host + +Required properties: + +- compatible: should be "ti,usbhs-host" +- reg: should contain one register range i.e. start and length +- ti,hwmods: must contain "usb_host_hs" + +Optional properties: + +- num-ports: number of USB ports. Usually this is automatically detected + from the IP's revision register but can be overridden by specifying + this property. A maximum of 3 ports are supported at the moment. + +- portN-mode: String specifying the port mode for port N, where N can be + from 1 to 3. If the port mode is not specified, that port is treated + as unused. When specified, it must be one of the following. + "ehci-phy", + "ehci-tll", + "ehci-hsic", + "ohci-phy-6pin-datse0", + "ohci-phy-6pin-dpdm", + "ohci-phy-3pin-datse0", + "ohci-phy-4pin-dpdm", + "ohci-tll-6pin-datse0", + "ohci-tll-6pin-dpdm", + "ohci-tll-3pin-datse0", + "ohci-tll-4pin-dpdm", + "ohci-tll-2pin-datse0", + "ohci-tll-2pin-dpdm", + +- single-ulpi-bypass: Must be present if the controller contains a single + ULPI bypass control bit. e.g. OMAP3 silicon <= ES2.1 + +Required properties if child node exists: + +- #address-cells: Must be 1 +- #size-cells: Must be 1 +- ranges: must be present + +Properties for children: + +The OMAP HS USB Host subsystem contains EHCI and OHCI controllers. +See Documentation/devicetree/bindings/usb/omap-ehci.txt and +omap3-ohci.txt + +Example for OMAP4: + +usbhshost: usbhshost@4a064000 { + compatible = "ti,usbhs-host"; + reg = <0x4a064000 0x800>; + ti,hwmods = "usb_host_hs"; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + usbhsohci: ohci@4a064800 { + compatible = "ti,ohci-omap3", "usb-ohci"; + reg = <0x4a064800 0x400>; + interrupt-parent = <&gic>; + interrupts = <0 76 0x4>; + }; + + usbhsehci: ehci@4a064c00 { + compatible = "ti,ehci-omap", "usb-ehci"; + reg = <0x4a064c00 0x400>; + interrupt-parent = <&gic>; + interrupts = <0 77 0x4>; + }; +}; + +&usbhshost { + port1-mode = "ehci-phy"; + port2-mode = "ehci-tll"; + port3-mode = "ehci-phy"; +}; + +&usbhsehci { + phys = <&hsusb1_phy 0 &hsusb3_phy>; +}; diff --git a/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt b/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt new file mode 100644 index 000000000000..62fe69724e3b --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/omap-usb-tll.txt @@ -0,0 +1,17 @@ +OMAP HS USB Host TLL (Transceiver-Less Interface) + +Required properties: + +- compatible : should be "ti,usbhs-tll" +- reg : should contain one register range i.e. start and length +- interrupts : should contain the TLL module's interrupt +- ti,hwmod : must contain "usb_tll_hs" + +Example: + + usbhstll: usbhstll@4a062000 { + compatible = "ti,usbhs-tll"; + reg = <0x4a062000 0x1000>; + interrupts = <78>; + ti,hwmods = "usb_tll_hs"; + }; diff --git a/Documentation/devicetree/bindings/sound/wm8994.txt b/Documentation/devicetree/bindings/sound/wm8994.txt index 7a7eb1e7bda6..f2f3e80934d2 100644 --- a/Documentation/devicetree/bindings/sound/wm8994.txt +++ b/Documentation/devicetree/bindings/sound/wm8994.txt @@ -5,14 +5,70 @@ on the board). Required properties: - - compatible : "wlf,wm1811", "wlf,wm8994", "wlf,wm8958" + - compatible : One of "wlf,wm1811", "wlf,wm8994" or "wlf,wm8958". - reg : the I2C address of the device for I2C, the chip select number for SPI. + - gpio-controller : Indicates this device is a GPIO controller. + - #gpio-cells : Must be 2. The first cell is the pin number and the + second cell is used to specify optional parameters (currently unused). + + - AVDD2-supply, DBVDD1-supply, DBVDD2-supply, DBVDD3-supply, CPVDD-supply, + SPKVDD1-supply, SPKVDD2-supply : power supplies for the device, as covered + in Documentation/devicetree/bindings/regulator/regulator.txt + +Optional properties: + + - interrupts : The interrupt line the IRQ signal for the device is + connected to. This is optional, if it is not connected then none + of the interrupt related properties should be specified. + - interrupt-controller : These devices contain interrupt controllers + and may provide interrupt services to other devices if they have an + interrupt line connected. + - interrupt-parent : The parent interrupt controller. + - #interrupt-cells: the number of cells to describe an IRQ, this should be 2. + The first cell is the IRQ number. + The second cell is the flags, encoded as the trigger masks from + Documentation/devicetree/bindings/interrupts.txt + + - wlf,gpio-cfg : A list of GPIO configuration register values. If absent, + no configuration of these registers is performed. If any value is + over 0xffff then the register will be left as default. If present 11 + values must be supplied. + + - wlf,micbias-cfg : Two MICBIAS register values for WM1811 or + WM8958. If absent the register defaults will be used. + + - wlf,ldo1ena : GPIO specifier for control of LDO1ENA input to device. + - wlf,ldo2ena : GPIO specifier for control of LDO2ENA input to device. + + - wlf,lineout1-se : If present LINEOUT1 is in single ended mode. + - wlf,lineout2-se : If present LINEOUT2 is in single ended mode. + + - wlf,lineout1-feedback : If present LINEOUT1 has common mode feedback + connected. + - wlf,lineout2-feedback : If present LINEOUT2 has common mode feedback + connected. + + - wlf,ldoena-always-driven : If present LDOENA is always driven. + Example: codec: wm8994@1a { compatible = "wlf,wm8994"; reg = <0x1a>; + + gpio-controller; + #gpio-cells = <2>; + + lineout1-se; + + AVDD2-supply = <®ulator>; + CPVDD-supply = <®ulator>; + DBVDD1-supply = <®ulator>; + DBVDD2-supply = <®ulator>; + DBVDD3-supply = <®ulator>; + SPKVDD1-supply = <®ulator>; + SPKVDD2-supply = <®ulator>; }; diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9653cf2f9727..8920f9f5fa9e 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1964,6 +1964,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Valid arguments: on, off Default: on + nohz_full= [KNL,BOOT] + In kernels built with CONFIG_NO_HZ_FULL=y, set + the specified list of CPUs whose tick will be stopped + whenever possible. The boot CPU will be forced outside + the range to maintain the timekeeping. + The CPUs in this range must also be included in the + rcu_nocbs= set. + noiotrap [SH] Disables trapped I/O port accesses. noirqdebug [X86-32] Disables the code which attempts to detect and diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO index d378cba66456..6e0f63f343b4 100644 --- a/Documentation/s390/CommonIO +++ b/Documentation/s390/CommonIO @@ -8,9 +8,9 @@ Command line parameters Enable logging of debug information in case of ccw device timeouts. -* cio_ignore = {all} | - {<device> | <range of devices>} | - {!<device> | !<range of devices>} +* cio_ignore = device[,device[,..]] + + device := {all | [!]ipldev | [!]condev | [!]<devno> | [!]<devno>-<devno>} The given devices will be ignored by the common I/O-layer; no detection and device sensing will be done on any of those devices. The subchannel to @@ -24,8 +24,10 @@ Command line parameters device numbers (0xabcd or abcd, for 2.4 backward compatibility). If you give a device number 0xabcd, it will be interpreted as 0.0.abcd. - You can use the 'all' keyword to ignore all devices. - The '!' operator will cause the I/O-layer to _not_ ignore a device. + You can use the 'all' keyword to ignore all devices. The 'ipldev' and 'condev' + keywords can be used to refer to the CCW based boot device and CCW console + device respectively (these are probably useful only when combined with the '!' + operator). The '!' operator will cause the I/O-layer to _not_ ignore a device. The command line is parsed from left to right. For example, diff --git a/Documentation/timers/NO_HZ.txt b/Documentation/timers/NO_HZ.txt new file mode 100644 index 000000000000..5b5322024067 --- /dev/null +++ b/Documentation/timers/NO_HZ.txt @@ -0,0 +1,273 @@ + NO_HZ: Reducing Scheduling-Clock Ticks + + +This document describes Kconfig options and boot parameters that can +reduce the number of scheduling-clock interrupts, thereby improving energy +efficiency and reducing OS jitter. Reducing OS jitter is important for +some types of computationally intensive high-performance computing (HPC) +applications and for real-time applications. + +There are two main contexts in which the number of scheduling-clock +interrupts can be reduced compared to the old-school approach of sending +a scheduling-clock interrupt to all CPUs every jiffy whether they need +it or not (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n for older kernels): + +1. Idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y for older kernels). + +2. CPUs having only one runnable task (CONFIG_NO_HZ_FULL=y). + +These two cases are described in the following two sections, followed +by a third section on RCU-specific considerations and a fourth and final +section listing known issues. + + +IDLE CPUs + +If a CPU is idle, there is little point in sending it a scheduling-clock +interrupt. After all, the primary purpose of a scheduling-clock interrupt +is to force a busy CPU to shift its attention among multiple duties, +and an idle CPU has no duties to shift its attention among. + +The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending +scheduling-clock interrupts to idle CPUs, which is critically important +both to battery-powered devices and to highly virtualized mainframes. +A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would +drain its battery very quickly, easily 2-3 times as fast as would the +same device running a CONFIG_NO_HZ_IDLE=y kernel. A mainframe running +1,500 OS instances might find that half of its CPU time was consumed by +unnecessary scheduling-clock interrupts. In these situations, there +is strong motivation to avoid sending scheduling-clock interrupts to +idle CPUs. That said, dyntick-idle mode is not free: + +1. It increases the number of instructions executed on the path + to and from the idle loop. + +2. On many architectures, dyntick-idle mode also increases the + number of expensive clock-reprogramming operations. + +Therefore, systems with aggressive real-time response constraints often +run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels) +in order to avoid degrading from-idle transition latencies. + +An idle CPU that is not receiving scheduling-clock interrupts is said to +be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running +tickless". The remainder of this document will use "dyntick-idle mode". + +There is also a boot parameter "nohz=" that can be used to disable +dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off". +By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling +dyntick-idle mode. + + +CPUs WITH ONLY ONE RUNNABLE TASK + +If a CPU has only one runnable task, there is little point in sending it +a scheduling-clock interrupt because there is no other task to switch to. + +The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid +sending scheduling-clock interrupts to CPUs with a single runnable task, +and such CPUs are said to be "adaptive-ticks CPUs". This is important +for applications with aggressive real-time response constraints because +it allows them to improve their worst-case response times by the maximum +duration of a scheduling-clock interrupt. It is also important for +computationally intensive short-iteration workloads: If any CPU is +delayed during a given iteration, all the other CPUs will be forced to +wait idle while the delayed CPU finishes. Thus, the delay is multiplied +by one less than the number of CPUs. In these situations, there is +again strong motivation to avoid sending scheduling-clock interrupts. + +By default, no CPU will be an adaptive-ticks CPU. The "nohz_full=" +boot parameter specifies the adaptive-ticks CPUs. For example, +"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks +CPUs. Note that you are prohibited from marking all of the CPUs as +adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain +online to handle timekeeping tasks in order to ensure that system calls +like gettimeofday() returns accurate values on adaptive-tick CPUs. +(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no +running user processes to observe slight drifts in clock rate.) +Therefore, the boot CPU is prohibited from entering adaptive-ticks +mode. Specifying a "nohz_full=" mask that includes the boot CPU will +result in a boot-time error message, and the boot CPU will be removed +from the mask. + +Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies +that all CPUs other than the boot CPU are adaptive-ticks CPUs. This +Kconfig parameter will be overridden by the "nohz_full=" boot parameter, +so that if both the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter and +the "nohz_full=1" boot parameter is specified, the boot parameter will +prevail so that only CPU 1 will be an adaptive-ticks CPU. + +Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded. +This is covered in the "RCU IMPLICATIONS" section below. + +Normally, a CPU remains in adaptive-ticks mode as long as possible. +In particular, transitioning to kernel mode does not automatically change +the mode. Instead, the CPU will exit adaptive-ticks mode only if needed, +for example, if that CPU enqueues an RCU callback. + +Just as with dyntick-idle mode, the benefits of adaptive-tick mode do +not come for free: + +1. CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run + adaptive ticks without also running dyntick idle. This dependency + extends down into the implementation, so that all of the costs + of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL. + +2. The user/kernel transitions are slightly more expensive due + to the need to inform kernel subsystems (such as RCU) about + the change in mode. + +3. POSIX CPU timers on adaptive-tick CPUs may miss their deadlines + (perhaps indefinitely) because they currently rely on + scheduling-tick interrupts. This will likely be fixed in + one of two ways: (1) Prevent CPUs with POSIX CPU timers from + entering adaptive-tick mode, or (2) Use hrtimers or other + adaptive-ticks-immune mechanism to cause the POSIX CPU timer to + fire properly. + +4. If there are more perf events pending than the hardware can + accommodate, they are normally round-robined so as to collect + all of them over time. Adaptive-tick mode may prevent this + round-robining from happening. This will likely be fixed by + preventing CPUs with large numbers of perf events pending from + entering adaptive-tick mode. + +5. Scheduler statistics for adaptive-tick CPUs may be computed + slightly differently than those for non-adaptive-tick CPUs. + This might in turn perturb load-balancing of real-time tasks. + +6. The LB_BIAS scheduler feature is disabled by adaptive ticks. + +Although improvements are expected over time, adaptive ticks is quite +useful for many types of real-time and compute-intensive applications. +However, the drawbacks listed above mean that adaptive ticks should not +(yet) be enabled by default. + + +RCU IMPLICATIONS + +There are situations in which idle CPUs cannot be permitted to +enter either dyntick-idle mode or adaptive-tick mode, the most +common being when that CPU has RCU callbacks pending. + +The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs +to enter dyntick-idle mode or adaptive-tick mode anyway. In this case, +a timer will awaken these CPUs every four jiffies in order to ensure +that the RCU callbacks are processed in a timely fashion. + +Another approach is to offload RCU callback processing to "rcuo" kthreads +using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to +offload may be selected via several methods: + +1. One of three mutually exclusive Kconfig options specify a + build-time default for the CPUs to offload: + + a. The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in + no CPUs being offloaded. + + b. The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes + CPU 0 to be offloaded. + + c. The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all + CPUs to be offloaded. Note that the callbacks will be + offloaded to "rcuo" kthreads, and that those kthreads + will in fact run on some CPU. However, this approach + gives fine-grained control on exactly which CPUs the + callbacks run on, along with their scheduling priority + (including the default of SCHED_OTHER), and it further + allows this control to be varied dynamically at runtime. + +2. The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated + list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1, + 3, 4, and 5. The specified CPUs will be offloaded in addition to + any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or + CONFIG_RCU_NOCB_CPU_ALL=y. This means that the "rcu_nocbs=" boot + parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y. + +The offloaded CPUs will never queue RCU callbacks, and therefore RCU +never prevents offloaded CPUs from entering either dyntick-idle mode +or adaptive-tick mode. That said, note that it is up to userspace to +pin the "rcuo" kthreads to specific CPUs if desired. Otherwise, the +scheduler will decide where to run them, which might or might not be +where you want them to run. + + +KNOWN ISSUES + +o Dyntick-idle slows transitions to and from idle slightly. + In practice, this has not been a problem except for the most + aggressive real-time workloads, which have the option of disabling + dyntick-idle mode, an option that most of them take. However, + some workloads will no doubt want to use adaptive ticks to + eliminate scheduling-clock interrupt latencies. Here are some + options for these workloads: + + a. Use PMQOS from userspace to inform the kernel of your + latency requirements (preferred). + + b. On x86 systems, use the "idle=mwait" boot parameter. + + c. On x86 systems, use the "intel_idle.max_cstate=" to limit + ` the maximum C-state depth. + + d. On x86 systems, use the "idle=poll" boot parameter. + However, please note that use of this parameter can cause + your CPU to overheat, which may cause thermal throttling + to degrade your latencies -- and that this degradation can + be even worse than that of dyntick-idle. Furthermore, + this parameter effectively disables Turbo Mode on Intel + CPUs, which can significantly reduce maximum performance. + +o Adaptive-ticks slows user/kernel transitions slightly. + This is not expected to be a problem for computationally intensive + workloads, which have few such transitions. Careful benchmarking + will be required to determine whether or not other workloads + are significantly affected by this effect. + +o Adaptive-ticks does not do anything unless there is only one + runnable task for a given CPU, even though there are a number + of other situations where the scheduling-clock tick is not + needed. To give but one example, consider a CPU that has one + runnable high-priority SCHED_FIFO task and an arbitrary number + of low-priority SCHED_OTHER tasks. In this case, the CPU is + required to run the SCHED_FIFO task until it either blocks or + some other higher-priority task awakens on (or is assigned to) + this CPU, so there is no point in sending a scheduling-clock + interrupt to this CPU. However, the current implementation + nevertheless sends scheduling-clock interrupts to CPUs having a + single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER + tasks, even though these interrupts are unnecessary. + + Better handling of these sorts of situations is future work. + +o A reboot is required to reconfigure both adaptive idle and RCU + callback offloading. Runtime reconfiguration could be provided + if needed, however, due to the complexity of reconfiguring RCU at + runtime, there would need to be an earthshakingly good reason. + Especially given that you have the straightforward option of + simply offloading RCU callbacks from all CPUs and pinning them + where you want them whenever you want them pinned. + +o Additional configuration is required to deal with other sources + of OS jitter, including interrupts and system-utility tasks + and processes. This configuration normally involves binding + interrupts and tasks to particular CPUs. + +o Some sources of OS jitter can currently be eliminated only by + constraining the workload. For example, the only way to eliminate + OS jitter due to global TLB shootdowns is to avoid the unmapping + operations (such as kernel module unload operations) that + result in these shootdowns. For another example, page faults + and TLB misses can be reduced (and in some cases eliminated) by + using huge pages and by constraining the amount of memory used + by the application. Pre-faulting the working set can also be + helpful, especially when combined with the mlock() and mlockall() + system calls. + +o Unless all CPUs are idle, at least one CPU must keep the + scheduling-clock interrupt going in order to support accurate + timekeeping. + +o If there are adaptive-ticks CPUs, there will be at least one + CPU keeping the scheduling-clock interrupt going, even if all + CPUs are otherwise idle. diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 119358dfb742..5f91eda91647 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1486,15 +1486,23 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +For the special case of virtio-ccw devices on s390, the ioevent is matched +to a subchannel/virtqueue tuple instead. + The following flags are defined: #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) +#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ + (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify) If datamatch flag is set, the event will be signaled only if the written value to the registered address is equal to datamatch in struct kvm_ioeventfd. +For virtio-ccw devices, addr contains the subchannel id and datamatch the +virtqueue index. + 4.60 KVM_DIRTY_TLB @@ -1780,27 +1788,48 @@ registers, find a list below: PPC | KVM_REG_PPC_VPA_DTL | 128 PPC | KVM_REG_PPC_EPCR | 32 PPC | KVM_REG_PPC_EPR | 32 + PPC | KVM_REG_PPC_TCR | 32 + PPC | KVM_REG_PPC_TSR | 32 + PPC | KVM_REG_PPC_OR_TSR | 32 + PPC | KVM_REG_PPC_CLEAR_TSR | 32 + PPC | KVM_REG_PPC_MAS0 | 32 + PPC | KVM_REG_PPC_MAS1 | 32 + PPC | KVM_REG_PPC_MAS2 | 64 + PPC | KVM_REG_PPC_MAS7_3 | 64 + PPC | KVM_REG_PPC_MAS4 | 32 + PPC | KVM_REG_PPC_MAS6 | 32 + PPC | KVM_REG_PPC_MMUCFG | 32 + PPC | KVM_REG_PPC_TLB0CFG | 32 + PPC | KVM_REG_PPC_TLB1CFG | 32 + PPC | KVM_REG_PPC_TLB2CFG | 32 + PPC | KVM_REG_PPC_TLB3CFG | 32 + PPC | KVM_REG_PPC_TLB0PS | 32 + PPC | KVM_REG_PPC_TLB1PS | 32 + PPC | KVM_REG_PPC_TLB2PS | 32 + PPC | KVM_REG_PPC_TLB3PS | 32 + PPC | KVM_REG_PPC_EPTCFG | 32 + PPC | KVM_REG_PPC_ICP_STATE | 64 ARM registers are mapped using the lower 32 bits. The upper 16 of that is the register group type, or coprocessor number: ARM core registers have the following id bit patterns: - 0x4002 0000 0010 <index into the kvm_regs struct:16> + 0x4020 0000 0010 <index into the kvm_regs struct:16> ARM 32-bit CP15 registers have the following id bit patterns: - 0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> + 0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> ARM 64-bit CP15 registers have the following id bit patterns: - 0x4003 0000 000F <zero: |