summaryrefslogtreecommitdiffstats
path: root/Documentation/driver-api
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-09-17 16:22:26 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2019-09-17 16:22:26 -0700
commit7c672abc120a55f678e5571ae2ee93f06ca4d7f9 (patch)
tree7beebc09f9626ca8d5f7df4dded0a553de479323 /Documentation/driver-api
parent1902314157b19754e0ff25b44527654847cfd127 (diff)
parentfe013f8bc160d79c6e33bb66d9bb0cd24949274c (diff)
Merge tag 'docs-5.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet: "It's a somewhat calmer cycle for docs this time, as the churn of the mass RST conversion is happily mostly behind us. - A new document on reproducible builds. - We finally got around to zapping the documentation for hardware support that was removed in 2004; one doesn't want to rush these things. - The usual assortment of fixes, typo corrections, etc" * tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits) Documentation: kbuild: Add document about reproducible builds docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi] Documentation: Add "earlycon=sbi" to the admin guide doc:lock: remove reference to clever use of read-write lock devices.txt: improve entry for comedi (char major 98) docs: mtd: Update spi nor reference driver doc: arm64: fix grammar dtb placed in no attributes region Documentation: sysrq: don't recommend 'S' 'U' before 'B' mailmap: Update email address for Quentin Perret docs: ftrace: clarify when tracing is disabled by the trace file docs: process: fix broken link Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command Documentation/arm/sa1100: Remove some obsolete documentation docs/zh_CN: update Chinese howto.rst for latexdocs making Documentation: virt: Fix broken reference to virt tree's index docs: Fix typo on pull requests guide kernel-doc: Allow anonymous enum Documentation: sphinx: Don't parse socket() as identifier reference Documentation: sphinx: Add missing comma to list of strings ...
Diffstat (limited to 'Documentation/driver-api')
-rw-r--r--Documentation/driver-api/dmaengine/index.rst2
-rw-r--r--Documentation/driver-api/index.rst2
-rw-r--r--Documentation/driver-api/ipmb.rst2
-rw-r--r--Documentation/driver-api/mtd/spi-nor.rst2
-rw-r--r--Documentation/driver-api/soundwire/index.rst2
-rw-r--r--Documentation/driver-api/thermal/cpu-cooling-api.rst107
-rw-r--r--Documentation/driver-api/thermal/exynos_thermal.rst90
-rw-r--r--Documentation/driver-api/thermal/exynos_thermal_emulation.rst61
-rw-r--r--Documentation/driver-api/thermal/index.rst18
-rw-r--r--Documentation/driver-api/thermal/intel_powerclamp.rst320
-rw-r--r--Documentation/driver-api/thermal/nouveau_thermal.rst96
-rw-r--r--Documentation/driver-api/thermal/power_allocator.rst271
-rw-r--r--Documentation/driver-api/thermal/sysfs-api.rst798
-rw-r--r--Documentation/driver-api/thermal/x86_pkg_temperature_thermal.rst55
14 files changed, 1822 insertions, 4 deletions
diff --git a/Documentation/driver-api/dmaengine/index.rst b/Documentation/driver-api/dmaengine/index.rst
index 3026fa975937..b9df904d0a79 100644
--- a/Documentation/driver-api/dmaengine/index.rst
+++ b/Documentation/driver-api/dmaengine/index.rst
@@ -47,7 +47,7 @@ This book adds some notes about PXA DMA
pxa_dma
-.. only:: subproject
+.. only:: subproject and html
Indices
=======
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index d12a80f386a6..38e638abe3eb 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -65,6 +65,7 @@ available subsections can be seen below.
dmaengine/index
slimbus
soundwire/index
+ thermal/index
fpga/index
acpi/index
backlight/lp855x-driver.rst
@@ -75,6 +76,7 @@ available subsections can be seen below.
dell_rbu
edid
eisa
+ ipmb
isa
isapnp
generic-counter
diff --git a/Documentation/driver-api/ipmb.rst b/Documentation/driver-api/ipmb.rst
index 7e2265144157..3ec3baed84c4 100644
--- a/Documentation/driver-api/ipmb.rst
+++ b/Documentation/driver-api/ipmb.rst
@@ -83,7 +83,7 @@ Instantiate the device
----------------------
After loading the driver, you can instantiate the device as
-described in 'Documentation/i2c/instantiating-devices'.
+described in 'Documentation/i2c/instantiating-devices.rst'.
If you have multiple BMCs, each connected to your Satellite MC via
a different I2C bus, you can instantiate a device for each of
those BMCs.
diff --git a/Documentation/driver-api/mtd/spi-nor.rst b/Documentation/driver-api/mtd/spi-nor.rst
index f5333e3bf486..1f0437676762 100644
--- a/Documentation/driver-api/mtd/spi-nor.rst
+++ b/Documentation/driver-api/mtd/spi-nor.rst
@@ -59,7 +59,7 @@ Part III - How can drivers use the framework?
The main API is spi_nor_scan(). Before you call the hook, a driver should
initialize the necessary fields for spi_nor{}. Please see
-drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
+drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to spi-fsl-qspi.c
when you want to write a new driver for a SPI NOR controller.
Another API is spi_nor_restore(), this is used to restore the status of SPI
flash chip such as addressing mode. Call it whenever detach the driver from
diff --git a/Documentation/driver-api/soundwire/index.rst b/Documentation/driver-api/soundwire/index.rst
index 6db026028f27..234911a0db99 100644
--- a/Documentation/driver-api/soundwire/index.rst
+++ b/Documentation/driver-api/soundwire/index.rst
@@ -10,7 +10,7 @@ SoundWire Documentation
error_handling
locking
-.. only:: subproject
+.. only:: subproject and html
Indices
=======
diff --git a/Documentation/driver-api/thermal/cpu-cooling-api.rst b/Documentation/driver-api/thermal/cpu-cooling-api.rst
new file mode 100644
index 000000000000..645d914c45a6
--- /dev/null
+++ b/Documentation/driver-api/thermal/cpu-cooling-api.rst
@@ -0,0 +1,107 @@
+=======================
+CPU cooling APIs How To
+=======================
+
+Written by Amit Daniel Kachhap <amit.kachhap@linaro.org>
+
+Updated: 6 Jan 2015
+
+Copyright (c) 2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
+
+0. Introduction
+===============
+
+The generic cpu cooling(freq clipping) provides registration/unregistration APIs
+to the caller. The binding of the cooling devices to the trip point is left for
+the user. The registration APIs returns the cooling device pointer.
+
+1. cpu cooling APIs
+===================
+
+1.1 cpufreq registration/unregistration APIs
+--------------------------------------------
+
+ ::
+
+ struct thermal_cooling_device
+ *cpufreq_cooling_register(struct cpumask *clip_cpus)
+
+ This interface function registers the cpufreq cooling device with the name
+ "thermal-cpufreq-%x". This api can support multiple instances of cpufreq
+ cooling devices.
+
+ clip_cpus:
+ cpumask of cpus where the frequency constraints will happen.
+
+ ::
+
+ struct thermal_cooling_device
+ *of_cpufreq_cooling_register(struct cpufreq_policy *policy)
+
+ This interface function registers the cpufreq cooling device with
+ the name "thermal-cpufreq-%x" linking it with a device tree node, in
+ order to bind it via the thermal DT code. This api can support multiple
+ instances of cpufreq cooling devices.
+
+ policy:
+ CPUFreq policy.
+
+
+ ::
+
+ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
+
+ This interface function unregisters the "thermal-cpufreq-%x" cooling device.
+
+ cdev: Cooling device pointer which has to be unregistered.
+
+2. Power models
+===============
+
+The power API registration functions provide a simple power model for
+CPUs. The current power is calculated as dynamic power (static power isn't
+supported currently). This power model requires that the operating-points of
+the CPUs are registered using the kernel's opp library and the
+`cpufreq_frequency_table` is assigned to the `struct device` of the
+cpu. If you are using CONFIG_CPUFREQ_DT then the
+`cpufreq_frequency_table` should already be assigned to the cpu
+device.
+
+The dynamic power consumption of a processor depends on many factors.
+For a given processor implementation the primary factors are:
+
+- The time the processor spends running, consuming dynamic power, as
+ compared to the time in idle states where dynamic consumption is
+ negligible. Herein we refer to this as 'utilisation'.
+- The voltage and frequency levels as a result of DVFS. The DVFS
+ level is a dominant factor governing power consumption.
+- In running time the 'execution' behaviour (instruction types, memory
+ access patterns and so forth) causes, in most cases, a second order
+ variation. In pathological cases this variation can be significant,
+ but typically it is of a much lesser impact than the factors above.
+
+A high level dynamic power consumption model may then be represented as::
+
+ Pdyn = f(run) * Voltage^2 * Frequency * Utilisation
+
+f(run) here represents the described execution behaviour and its
+result has a units of Watts/Hz/Volt^2 (this often expressed in
+mW/MHz/uVolt^2)
+
+The detailed behaviour for f(run) could be modelled on-line. However,
+in practice, such an on-line model has dependencies on a number of
+implementation specific processor support and characterisation
+factors. Therefore, in initial implementation that contribution is
+represented as a constant coefficient. This is a simplification
+consistent with the relative contribution to overall power variation.
+
+In this simplified representation our model becomes::
+
+ Pdyn = Capacitance * Voltage^2 * Frequency * Utilisation
+
+Where `capacitance` is a constant that represents an indicative
+running time dynamic power coefficient in fundamental units of
+mW/MHz/uVolt^2. Typical values for mobile CPUs might lie in range
+from 100 to 500. For reference, the approximate values for the SoC in
+ARM's Juno Development Platform are 530 for the Cortex-A57 cluster and
+140 for the Cortex-A53 cluster.
diff --git a/Documentation/driver-api/thermal/exynos_thermal.rst b/Documentation/driver-api/thermal/exynos_thermal.rst
new file mode 100644
index 000000000000..5bd556566c70
--- /dev/null
+++ b/Documentation/driver-api/thermal/exynos_thermal.rst
@@ -0,0 +1,90 @@
+========================
+Kernel driver exynos_tmu
+========================
+
+Supported chips:
+
+* ARM SAMSUNG EXYNOS4, EXYNOS5 series of SoC
+
+ Datasheet: Not publicly available
+
+Authors: Donggeun Kim <dg77.kim@samsung.com>
+Authors: Amit Daniel <amit.daniel@samsung.com>
+
+TMU controller Description:
+---------------------------
+
+This driver allows to read temperature inside SAMSUNG EXYNOS4/5 series of SoC.
+
+The chip only exposes the measured 8-bit temperature code value
+through a register.
+Temperature can be taken from the temperature code.
+There are three equations converting from temperature to temperature code.
+
+The three equations are:
+ 1. Two point trimming::
+
+ Tc = (T - 25) * (TI2 - TI1) / (85 - 25) + TI1
+
+ 2. One point trimming::
+
+ Tc = T + TI1 - 25
+
+ 3. No trimming::
+
+ Tc = T + 50
+
+ Tc:
+ Temperature code, T: Temperature,
+ TI1:
+ Trimming info for 25 degree Celsius (stored at TRIMINFO register)
+ Temperature code measured at 25 degree Celsius which is unchanged
+ TI2:
+ Trimming info for 85 degree Celsius (stored at TRIMINFO register)
+ Temperature code measured at 85 degree Celsius which is unchanged
+
+TMU(Thermal Management Unit) in EXYNOS4/5 generates interrupt
+when temperature exceeds pre-defined levels.
+The maximum number of configurable threshold is five.
+The threshold levels are defined as follows::
+
+ Level_0: current temperature > trigger_level_0 + threshold
+ Level_1: current temperature > trigger_level_1 + threshold
+ Level_2: current temperature > trigger_level_2 + threshold
+ Level_3: current temperature > trigger_level_3 + threshold
+
+The threshold and each trigger_level are set
+through the corresponding registers.
+
+When an interrupt occurs, this driver notify kernel thermal framework
+with the function exynos_report_trigger.
+Although an interrupt condition for level_0 can be set,
+it can be used to synchronize the cooling action.
+
+TMU driver description:
+-----------------------
+
+The exynos thermal driver is structured as::
+
+ Kernel Core thermal framework
+ (thermal_core.c, step_wise.c, cpu_cooling.c)
+ ^
+ |
+ |
+ TMU configuration data -----> TMU Driver <----> Exynos Core thermal wrapper
+ (exynos_tmu_data.c) (exynos_tmu.c) (exynos_thermal_common.c)
+ (exynos_tmu_data.h) (exynos_tmu.h) (exynos_thermal_common.h)
+
+a) TMU configuration data:
+ This consist of TMU register offsets/bitfields
+ described through structure exynos_tmu_registers. Also several
+ other platform data (struct exynos_tmu_platform_data) members
+ are used to configure the TMU.
+b) TMU driver:
+ This component initialises the TMU controller and sets different
+ thresholds. It invokes core thermal implementation with the call
+ exynos_report_trigger.
+c) Exynos Core thermal wrapper:
+ This provides 3 wrapper function to use the
+ Kernel core thermal framework. They are exynos_unregister_thermal,
+ exynos_register_thermal and exynos_report_trigger.
diff --git a/Documentation/driver-api/thermal/exynos_thermal_emulation.rst b/Documentation/driver-api/thermal/exynos_thermal_emulation.rst
new file mode 100644
index 000000000000..c21d10838bc5
--- /dev/null
+++ b/Documentation/driver-api/thermal/exynos_thermal_emulation.rst
@@ -0,0 +1,61 @@
+=====================
+Exynos Emulation Mode
+=====================
+
+Copyright (C) 2012 Samsung Electronics
+
+Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
+
+Description
+-----------
+
+Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
+management unit. Thermal emulation mode supports software debug for
+TMU's operation. User can set temperature manually with software code
+and TMU will read current temperature from user value not from sensor's
+value.
+
+Enabling CONFIG_THERMAL_EMULATION option will make this support
+available. When it's enabled, sysfs node will be created as
+/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp.
+
+The sysfs node, 'emul_node', will contain value 0 for the initial state.
+When you input any temperature you want to update to sysfs node, it
+automatically enable emulation mode and current temperature will be
+changed into it.
+
+(Exynos also supports user changeable delay time which would be used to
+delay of changing temperature. However, this node only uses same delay
+of real sensing time, 938us.)
+
+Exynos emulation mode requires synchronous of value changing and
+enabling. It means when you want to update the any value of delay or
+next temperature, then you have to enable emulation mode at the same
+time. (Or you have to keep the mode enabling.) If you don't, it fails to
+change the value to updated one and just use last succeessful value
+repeatedly. That's why this node gives users the right to change
+termerpature only. Just one interface makes it more simply to use.
+
+Disabling emulation mode only requires writing value 0 to sysfs node.
+
+::
+
+
+ TEMP 120 |
+ |
+ 100 |
+ |
+ 80 |
+ | +-----------
+ 60 | | |
+ | +-------------| |
+ 40 | | | |
+ | | | |
+ 20 | | | +----------
+ | | | | |
+ 0 |______________|_____________|__________|__________|_________
+ A A A A TIME
+ |<----->| |<----->| |<----->| |
+ | 938us | | | | | |
+ emulation : 0 50 | 70 | 20 | 0
+ current temp: sensor 50 70 20 sensor
diff --git a/Documentation/driver-api/thermal/index.rst b/Documentation/driver-api/thermal/index.rst
new file mode 100644
index 000000000000..5ba61d19c6ae
--- /dev/null
+++ b/Documentation/driver-api/thermal/index.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+Thermal
+=======
+
+.. toctree::
+ :maxdepth: 1
+
+ cpu-cooling-api
+ sysfs-api
+ power_allocator
+
+ exynos_thermal
+ exynos_thermal_emulation
+ intel_powerclamp
+ nouveau_thermal
+ x86_pkg_temperature_thermal
diff --git a/Documentation/driver-api/thermal/intel_powerclamp.rst b/Documentation/driver-api/thermal/intel_powerclamp.rst
new file mode 100644
index 000000000000..3f6dfb0b3ea6
--- /dev/null
+++ b/Documentation/driver-api/thermal/intel_powerclamp.rst
@@ -0,0 +1,320 @@
+=======================
+Intel Powerclamp Driver
+=======================
+
+By:
+ - Arjan van de Ven <arjan@linux.intel.com>
+ - Jacob Pan <jacob.jun.pan@linux.intel.com>
+
+.. Contents:
+
+ (*) Introduction
+ - Goals and Objectives
+
+ (*) Theory of Operation
+ - Idle Injection
+ - Calibration
+
+ (*) Performance Analysis
+ - Effectiveness and Limitations
+ - Power vs Performance
+ - Scalability
+ - Calibration
+ - Comparison with Alternative Techniques
+
+ (*) Usage and Interfaces
+ - Generic Thermal Layer (sysfs)
+ - Kernel APIs (TBD)
+
+INTRODUCTION
+============
+
+Consider the situation where a system’s power consumption must be
+reduced at runtime, due to power budget, thermal constraint, or noise
+level, and where active cooling is not preferred. Software managed
+passive power reduction must be performed to prevent the hardware
+actions that are designed for catastrophic scenarios.
+
+Currently, P-states, T-states (clock modulation), and CPU offlining
+are used for CPU throttling.
+
+On Intel CPUs, C-states provide effective power reduction, but so far
+they’re only used opportunistically, based on workload. With the
+development of intel_powerclamp driver, the method of synchronizing
+idle injection across all online CPU threads was introduced. The goal
+is to achieve forced and controllable C-state residency.
+
+Test/Analysis has been made in the areas of power, performance,
+scalability, and user experience. In many cases, clear advantage is
+shown over taking the CPU offline or modulating the CPU clock.
+
+
+THEORY OF OPERATION
+===================
+
+Idle Injection
+--------------
+
+On modern Intel processors (Nehalem or later), package level C-state
+residency is available in MSRs, thus also available to the kernel.
+
+These MSRs are::
+
+ #define MSR_PKG_C2_RESIDENCY 0x60D
+ #define MSR_PKG_C3_RESIDENCY 0x3F8
+ #define MSR_PKG_C6_RESIDENCY 0x3F9
+ #define MSR_PKG_C7_RESIDENCY 0x3FA
+
+If the kernel can also inject idle time to the system, then a
+closed-loop control system can be established that manages package
+level C-state. The intel_powerclamp driver is conceived as such a
+control system, where the target set point is a user-selected idle
+ratio (based on power reduction), and the error is the difference
+between the actual package level C-state residency ratio and the target idle
+ratio.
+
+Injection is controlled by high priority kernel threads, spawned for
+each online CPU.
+
+These kernel threads, with SCHED_FIFO class, are created to perform
+clamping actions of controlled duty ratio and duration. Each per-CPU
+thread synchronizes its idle time and duration, based on the rounding
+of jiffies, so accumulated errors can be prevented to avoid a jittery
+effect. Threads are also bound to the CPU such that they cannot be
+migrated, unless the CPU is taken offline. In this case, threads
+belong to the offlined CPUs will be terminated immediately.
+
+Running as SCHED_FIFO and relatively high priority, also allows such
+scheme to work for both preemptable and non-preemptable kernels.
+Alignment of idle time around jiffies ensures scalability for HZ
+values. This effect can be better visualized using a Perf timechart.
+The following diagram shows the behavior of kernel thread
+kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
+for a given "duration", then relinquishes the CPU to other tasks,
+until the next time interval.
+
+The NOHZ schedule tick is disabled during idle time, but interrupts
+are not masked. Tests show that the extra wakeups from scheduler tick
+have a dramatic impact on the effectiveness of the powerclamp driver
+on large scale systems (Westmere system with 80 processors).
+
+::
+
+ CPU0
+ ____________ ____________
+ kidle_inject/0 | sleep | mwait | sleep |
+ _________| |________| |_______
+ duration
+ CPU1
+ ____________ ____________
+ kidle_inject/1 | sleep | mwait | sleep |
+ _________| |________| |_______
+ ^
+ |
+ |
+ roundup(jiffies, interval)
+
+Only one CPU is allowed to collect statistics and update global
+control parameters. This CPU is referred to as the controlling CPU in
+this document. The controlling CPU is elected at runtime, with a
+policy that favors BSP, taking into account the possibility of a CPU
+hot-plug.
+
+In terms of dynamics of the idle control system, package level idle
+time is considered largely as a non-causal system where its behavior
+cannot be based on the past or current input. Therefore, the
+intel_powerclamp driver attempts to enforce the desired idle time
+instantly as given input (target idle ratio). After injection,
+powerclamp monitors the actual idle for a given time window and adjust
+the next injection accordingly to avoid over/under correction.
+
+When used in a causal control system, such as a temperature control,
+it is up to the user of this driver to implement algorithms where
+past samples and outputs are included in the feedback. For example, a
+PID-based thermal controller can use the powerclamp driver to
+maintain a desired target temperature, based on integral and
+derivative gains of the past samples.
+
+
+
+Calibration
+-----------
+During scalability testing, it is observed that synchronized actions
+among CPUs become challenging as the number of cores grows. This is
+also true for the ability of a system to enter package level C-states.
+
+To make sure the intel_powerclamp driver scales well, online
+calibration is implemented. The goals for doing such a calibration
+are:
+
+a) determine the effective range of idle injection ratio
+b) determine the amount of compensation needed at each target ratio
+
+Compensation to each target ratio consists of two parts:
+
+ a) steady state error compensation
+ This is to offset the error occurring when the system can
+ enter idle without extra wakeups (such as external interrupts).
+
+ b) dynamic error compensation
+ When an excessive amount of wakeups occurs during idle, an
+ additional idle ratio can be added to quiet interrupts, by
+ slowing down CPU activities.
+
+A debugfs file is provided for the user to examine compensation
+progress and results, such as on a Westmere system::
+
+ [jacob@nex01 ~]$ cat
+ /sys/kernel/debug/intel_powerclamp/powerclamp_calib
+ controlling cpu: 0
+ pct confidence steady dynamic (compensation)
+ 0 0 0 0
+ 1 1 0 0
+ 2 1 1 0
+ 3 3 1 0
+ 4 3 1 0
+ 5 3 1 0
+ 6 3 1 0
+ 7 3 1 0
+ 8 3 1 0
+ ...
+ 30 3 2 0
+ 31 3 2 0
+ 32 3 1 0
+ 33 3 2 0
+ 34 3 1 0
+ 35 3 2 0
+ 36 3 1 0
+ 37 3 2 0
+ 38 3 1 0
+ 39 3 2 0
+ 40 3 3 0
+ 41 3 1 0
+ 42 3 2 0
+ 43 3 1 0
+ 44 3 1 0
+ 45 3 2 0
+ 46 3 3 0
+ 47 3 0 0
+ 48 3 2 0
+ 49 3 3 0
+
+Calibration occurs during runtime. No offline method is available.
+Steady state compensation is used only when confidence levels of all
+adjacent ratios have reached satisfactory level. A confidence level
+is accumulated based on clean data collected at runtime. Data
+collected during a period without extra interrupts is considered
+clean.
+
+To compensate for excessive amounts of wakeup during idle, additional
+idle time is injected when such a condition is detected. Currently,
+we have a simple algorithm to double the injection ratio. A possible
+enhancement might be to throttle the offending IRQ, such as delaying
+EOI for level triggered interrupts. But it is a challenge to be
+non-intrusive to the scheduler or the IRQ core code.
+
+
+CPU Online/Offline
+------------------
+Per-CPU kernel threads are started/stopped upon receiving
+notifications of CPU hotplug activities. The intel_powerclamp driver
+keeps track of clamping kernel threads, even after they are migrated
+to other CPUs, after a CPU offline event.
+
+
+Performance Analysis
+====================
+This section describes the general performance data collected on
+multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
+
+Effectiveness and Limitations
+-----------------------------
+The maximum range that idle injection is allowed is capped at 50
+percent. As mentioned earlier, since interrupts are allowed during
+forced idle time, excessive interrupts could result in less
+effectiveness. The extreme case would be doing a ping -f to generated
+flooded network interrupts without much CPU acknowledgement. In this
+case, little can be done from the idle injection threads. In most
+normal cases, such as scp a large file, applications can be throttled
+by the powerclamp driver, since slowing down the CPU also slows down
+network protocol processing, which in turn reduces interrupts.
+
+When control parameters change at runtime by the controlling CPU, it
+may take an additional period for the rest of the CPUs to catch up
+with the changes. During this time, idle injection is out of sync,
+thus not able to enter package C- states at the expected ratio. But
+this effect is minor, in that in most cases change to the target
+ratio is updated much less frequently than the idle injection
+frequency.
+
+Scalability
+-----------
+Tests also show a minor, but measurable, difference between the 4P/8P
+Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
+More compensation is needed on Westmere for the same amount of
+target idle ratio. The compensation also increases as the idle ratio
+gets larger. The above reason constitutes the need for the
+calibration code.
+
+On the IVB 8P system, compared to an offline CPU, powerclamp can
+achieve up to 40% better performance per watt. (measured by a spin
+counter summed over per CPU counting threads spawned for all running
+CPUs).
+
+Usage and Interfaces
+====================
+The powerclamp driver is registered to the generic thermal layer as a
+cooling device. Currently, it’s not bound to any thermal zones::
+
+ jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
+ cur_state:0
+ max_state:50
+ type:intel_powerclamp
+
+cur_state allows user to set the desired idle percentage. Writing 0 to
+cur_state will stop idle injection. Writing a value between 1 and
+max_state will start the idle injection. Reading cur_state returns the
+actual and current idle percentage. This may not be the same value
+set by the user in that current idle percentage depends on workload
+and includes natural idle. When idle injection is disabled, reading
+cur_state returns value -1 instead of 0 which is to avoid confusing
+100% busy state with the disabled state.
+
+Example usage:
+- To inject 25% idle time::
+
+ $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state
+
+If the system is not busy and has more than 25% idle time already,
+then the powerclamp driver will not start idle injection. Using Top
+will not show idle injection kernel threads.
+
+If the system is busy (spin test below) and has less than 25% natural
+idle time, powerclamp kernel threads will do idle injection. Forced
+idle time is accounted as normal idle in that common code path is
+taken as the idle task.
+
+In this example, 24.1% idle is shown. This helps the system admin or
+user determine the cause of slowdown, when a powerclamp driver is in action::
+
+
+ Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie
+ Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
+ Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers
+ Swap: 4087804k total, 0k used, 4087804k free, 945336k cached
+
+ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
+ 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin
+ 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0
+ 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3
+ 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1
+ 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2
+ 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox
+ 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg
+ 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz
+
+Tests have shown that by using the powerclamp driver as a cooling
+device, a PID based userspace thermal controller can manage to
+control CPU temperature effectively, when no other thermal influence
+is added. For example, a UltraBook user can compile the kernel under
+certain temperature (below most active trip points).
diff --git a/Documentation/driver-api/thermal/nouveau_thermal.rst b/Documentation/driver-api/thermal/nouveau_thermal.rst
new file mode 100644
index 000000000000..37255fd6735d
--- /dev/null
+++ b/Documentation/driver-api/thermal/nouveau_thermal.rst
@@ -0,0 +1,96 @@
+=====================
+Kernel driver nouveau
+=====================
+
+Supported chips:
+
+* NV43+
+
+Authors: Martin Peres (mupuf) <martin.peres@free.fr>
+
+Description
+-----------
+
+This driver allows to read the GPU core temperature, drive the GPU fan and
+set temperature alarms.
+
+Currently, due to the absence of in-kernel API to access HWMON drivers, Nouveau
+cannot access any of the i2c external monitoring chips it may find. If you
+have one of those, temperature and/or fan management through Nouveau's HWMON
+interface is likely not to work. This document may then not cover your situation
+entirely.
+
+Temperature management
+----------------------
+
+Temperature is exposed under as a read-only HWMON attribute temp1_input.
+
+In order to protect the GPU from overheating, Nouveau supports 4 configurable
+temperature thresholds:
+
+ * Fan_boost:
+ Fan speed is set to 100% when reaching this temperature;
+ * Downclock:
+ The GPU will be downclocked to reduce its power dissipation;
+ * Critical:
+ The GPU is put on hold to further lower power dissipation;
+ * Shutdown:
+ Shut the computer down to protect your GPU.
+
+WARNING:
+ Some of these thresholds may not be used by Nouveau depending
+ on your chipset.
+
+The default value for these thresholds comes from the GPU's vbios. These
+thresholds can be configured thanks to the following HWMON attributes:
+
+ * Fan_boost: temp1_auto_point1_temp and temp1_auto_point1_temp_hyst;
+ * Downclock: temp1_max and temp1_max_hyst;
+ * Critical: temp1_crit and temp1_crit_hyst;
+ * Shutdown: temp1_emergency and temp1_emergency_hyst.
+
+NOTE: Remember that the values are stored as milli degrees Celsius. Don't forget
+to multiply!
+
+Fan management
+--------------
+
+Not all cards have a drivable fan. If you do, then the following HWMON
+attributes should be available:
+
+ * pwm1_enable:
+ Current fan management mode (NONE, MANUAL or AUTO);
+ * pwm1:
+ Current PWM value (power percentage);
+ * pwm1_min:
+ The minimum PWM speed allowed;
+ * pwm1_max:
+ The maximum PWM speed allowed (bypassed when hitting Fan_boost);
+
+You may also have the following attribute:
+
+ * fan1_input:
+ Speed in RPM of your fan.
+
+Your fan can be driven in different modes:
+
+ * 0: The fan is left untouched;
+ * 1: The fan can be driven in manual (use pwm1 to change the speed);
+ * 2; The fan is driven automatically depending on the temperature.
+
+NOTE:
+ Be sure to use the manual mode if you want to drive the fan speed manually
+
+NOTE2:
+ When operating in manual mode outside the vbios-defined
+ [PWM_min, PWM_max] range, the reported fan speed (RPM) may not be accurate
+ depending on your hardware.
+
+Bug reports
+-----------
+
+Thermal management on Nouveau is new and may not work on all cards. If you have
+inquiries, please ping mupuf on IRC (#nouveau, freenode).
+
+Bug reports should be filled on Freedesktop's bug tracker. Please follow
+http://nouveau.freedesktop.org/wiki/Bugs
diff --git a/Documentation/driver-api/thermal/power_allocator.rst b/Documentation/driver-api/thermal/power_allocator.rst
new file mode 100644
index 000000000000..67b6a3297238
--- /dev/null
+++ b/Documentation/driver-api/thermal/power_allocator.rst
@@ -0,0 +1,271 @@
+=================================
+Power allocator governor tunables
+=================================
+
+Trip points
+-----------
+
+The governor works optimally with the following two passive trip points:
+
+1. "switch on" trip point: temperature above which the governor
+ control loop starts operating. This is the first passive trip
+ point of the thermal zone.
+
+2. "desired temperature" trip point: it should be higher than the
+ "switch on" trip point. This the target temperature the governor
+ is controlling for. This is the last passive trip point of the
+ thermal zone.
+
+PID Controller
+--------------
+
+The power allocator governor implements a
+Proportional-Integral-Derivative controller (PID controller) with
+temperature as the control input and power as the controlled output:
+
+ P_max = k_p * e + k_i * err_integral + k_d * diff_err + sustainable_power
+
+where
+ - e = desired_temperature - current_temperature
+ - err_integral is the sum of previous