summaryrefslogtreecommitdiffstats
path: root/drivers/cpuidle
AgeCommit message (Collapse)Author
2016-05-18cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()Daniel Lezcano
Commit 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) rightfully fixed a regression by letting the coupled idle state framework to handle local interrupt enabling when the CPU is exiting an idle state. The current code checks if the idle state is coupled and, if so, it will let the coupled code to enable interrupts. This way, it can decrement the ready-count before handling the interrupt. This mechanism prevents the other CPUs from waiting for a CPU which is handling interrupts. But the check is done against the state index returned by the back end driver's ->enter functions which could be different from the initial index passed as parameter to the cpuidle_enter_state() function. entered_state = target_state->enter(dev, drv, index); [ ... ] if (!cpuidle_state_is_coupled(drv, entered_state)) local_irq_enable(); [ ... ] If the 'index' is referring to a coupled idle state but the 'entered_state' is *not* coupled, then the interrupts are enabled again. All CPUs blocked on the sync barrier may busy loop longer if the CPU has interrupts to handle before decrementing the ready-count. That's consuming more energy than saving. Fixes: 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-05-06Merge back new cpuidle material for v4.7.Rafael J. Wysocki
2016-04-28ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return valueJames Morse
arm_cpuidle_suspend() may return -EOPNOTSUPP, or any value returned by the cpu_ops/cpuidle_ops suspend call. arm_enter_idle_state() doesn't update 'ret' with this value, meaning we always signal success to cpuidle_enter_state(), causing it to update the usage counters as if we succeeded. Fixes: 191de17aa3c1 ("ARM64: cpuidle: Replace cpu_suspend by the common ARM/ARM64 function") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 4.1+ <stable@vger.kernel.org> # 4.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-26cpuidle: Replace ktime_get() with local_clock()Daniel Lezcano
The ktime_get() can have a non negligeable overhead, use local_clock() instead. In order to test the difference between ktime_get() and local_clock(), a quick hack has been added to trigger, via debugfs, 10000 times a call to ktime_get() and local_clock() and measure the elapsed time. Then the average value, the min and max is computed for each call. From userspace, the test above was called 100 times every 2 seconds. So, ktime_get() and local_clock() have been called 1000000 times in total. The results are: ktime_get(): ============ * average: 101 ns (stddev: 27.4) * maximum: 38313 ns * minimum: 65 ns local_clock(): ============== * average: 60 ns (stddev: 9.8) * maximum: 13487 ns * minimum: 46 ns The local_clock() is faster and more stable. Even if it is a drop in the ocean, changing the ktime_get() by the local_clock() allows to save 80ns at idle time (entry + exit). And in some circumstances, especially when there are several CPUs racing for the clock access, we save tens of microseconds. The idle duration resulting from a diff is converted from nanosec to microsec. This could be done with integer division (div 1000) - which is an expensive operation or by 10 bits shifting (div 1024) - which is fast but unprecise. The following table gives some results at the limits. ------------------------------------------ | nsec | div(1000) | div(1024) | ------------------------------------------ | 1e3 | 1 usec | 976 nsec | ------------------------------------------ | 1e6 | 1000 usec | 976 usec | ------------------------------------------ | 1e9 | 1000000 usec | 976562 usec | ------------------------------------------ There is a linear deviation of 2.34%. This loss of precision is acceptable in the context of the resulting diff which is used for statistics. These ones are processed to guess estimate an approximation of the duration of the next idle period which ends up into an idle state selection. The selection criteria takes into account the next duration based on large intervals, represented by the idle state's target residency. The 2^10 division is enough because the approximation regarding the 1e3 division is lost in all the approximations done for the next idle duration computation. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-09cpuidle: Indicate when a device has been unregisteredDave Gerlach
Currently the 'registered' member of the cpuidle_device struct is set to 1 during cpuidle_register_device. In this same function there are checks to see if the device is already registered to prevent duplicate calls to register the device, but this value is never set to 0 even on unregister of the device. Because of this, any attempt to call cpuidle_register_device after a call to cpuidle_unregister_device will fail which shouldn't be the case. To prevent this, set registered to 0 when the device is unregistered. Fixes: c878a52d3c7c (cpuidle: Check if device is already registered) Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-21cpuidle: menu: Fall back to polling if next timer event is nearRafael J. Wysocki
Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling) changed the behavior of the fallback state selection part of menu_select() so it looks at interactivity_req instead of data->next_timer_us when it makes its decision. That effectively caused polling to be used more often as fallback idle which led to significant increases of energy consumption in some cases. Commit e132b9b3bc7f (cpuidle: menu: use high confidence factors only when considering polling) changed that logic again to be more predictable, but that didn't help with the increased energy consumption problem. For this reason, go back to making decisions on which state to fall back to based on data->next_timer_us which is the time we know for sure something will happen rather than a prediction (which may be inaccurate and turns out to be so often enough to be problematic). However, take the target residency of the first proper idle state (C1) into account, so that state is not used as the fallback one if its target residency is greater than data->next_timer_us. Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
2016-03-17cpuidle: menu: use high confidence factors only when considering pollingRik van Riel
The menu governor uses five different factors to pick the idle state: - the user configured latency_req - the time until the next timer (next_timer_us) - the typical sleep interval, as measured recently - an estimate of sleep time by dividing next_timer_us by an observed factor - a load corrected version of the above, divided again by load Only the first three items are known with enough confidence that we can use them to consider polling, instead of an actual CPU idle state, because the cost of being wrong about polling can be excessive power use. The latter two are used in the menu governor's main selection loop, and can result in choosing a shallower idle state when the system is expected to be busy again soon. This pushes a busy system in the "performance" direction of the performance<>power tradeoff, when choosing between idle states, but stays more strictly on the "power" state when deciding between polling and C1. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-17cpuidle: menu: help gcc generate slightly better codeRasmus Villemoes
We know that the avg variable actually ends up holding a 32 bit quantity, since it's an average of such numbers. It is only a u64 because it is temporarily used to hold the sum. Making it an actual u32 allows gcc to generate slightly better code, e.g. when computing the square, it can do a 32x32->64 multiply. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-02-17cpuidle: menu: avoid expensive square root computationRasmus Villemoes
Computing the integer square root is a rather expensive operation, at least compared to doing a 64x64 -> 64 multiply (avg*avg) and, on 64 bit platforms, doing an extra comparison to a constant (variance <= U64_MAX/36). On 64 bit platforms, this does mean that we add a restriction on the range of the variance where we end up using the estimate (since previously the stddev <= ULONG_MAX was a tautology), but on the other hand, we extend the range quite substantially on 32 bit platforms - in both cases, we now allow standard deviations up to 715 seconds, which is for example guaranteed if all observations are less than 1430 seconds. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-29Merge branches 'pm-cpuidle', 'pm-cpufreq', 'pm-domains' and 'pm-sleep'Rafael J. Wysocki
* pm-cpuidle: cpuidle: coupled: remove unused define cpuidle_coupled_lock cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freeze * pm-cpufreq: cpufreq: cpufreq-dt: avoid uninitialized variable warnings: cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype cpufreq: Use list_is_last() to check last entry of the policy list cpufreq: Fix NULL reference crash while accessing policy->governor_data * pm-domains: PM / Domains: Fix typo in comment PM / Domains: Fix potential deadlock while adding/removing subdomains PM / domains: fix lockdep issue for all subdomains * pm-sleep: PM: APM_EMULATION does not depend on PM
2016-01-27cpuidle: coupled: remove unused define cpuidle_coupled_lockAnders Roxell
This was found with the -RT patch enabled, but the fix should apply to non-RT also. Used multi_v7_defconfig+PREEMPT_RT_FULL=y and this caused a compilation warning without this fix: ../drivers/cpuidle/coupled.c:122:21: warning: 'cpuidle_coupled_lock' defined but not used [-Wunused-variable] Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-22cpuidle: fix fallback mechanism for suspend to idle in absence of enter_freezeSudeep Holla
Commit 51164251f5c3 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()" made find_deepest_state() return non-negative value and check all the states with index > 0. Also as a result, find_deepest_state() returns 0 even when enter_freeze callbacks are not implemented and enter_freeze_proper() is called which ends up crashing the kernel. This patch updates the check for index > 0 in cpuidle_enter_freeze and cpuidle_idle_call(when idle_should_freeze is true) to restore the suspend-to-idle functionality in absence of enter_freeze callback. Fixes: 51164251f5c3 "sched / idle: Drop default_idle_call() fallback from call_cpuidle()" Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-20Merge tag 'pm+acpi-4.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management and ACPI updates from Rafael Wysocki: "This includes fixes on top of the previous batch of PM+ACPI updates and some new material as well. From the new material perspective the most significant are the driver core changes that should allow USB devices to stay suspended over system suspend/resume cycles if they have been runtime-suspended already beforehand. Apart from that, ACPICA is updated to upstream revision 20160108 (cosmetic mostly, but including one fixup on top of the previous ACPICA update) and there are some devfreq updates the didn't make it before (due to timing). A few recent regressions are fixed, most importantly in the cpuidle menu governor and in the ACPI backlight driver and some x86 platform drivers depending on it. Some more bugs are fixed and cleanups are made on top of that. Specifics: - Modify the driver core and the USB subsystem to allow USB devices to stay suspended over system suspend/resume cycles if they have been runtime-suspended already beforehand and fix some bugs on top of these changes (Tomeu Vizoso, Rafael Wysocki). - Update ACPICA to upstream revision 20160108, including updates of the ACPICA's copyright notices, a code fixup resulting from a regression fix that was necessary in the upstream code only (the regression fixed by it has never been present in Linux) and a compiler warning fix (Bob Moore, Lv Zheng). - Fix a recent regression in the cpuidle menu governor that broke it on practically all architectures other than x86 and make a couple of optimizations on top of that fix (Rafael Wysocki). - Clean up the selection of cpuidle governors depending on whether or not the kernel is configured for tickless systems (Jean Delvare). - Revert a recent commit that introduced a regression in the ACPI backlight driver, address the problem it attempted to fix in a different way and revert one more cosmetic change depending on the problematic commit (Hans de Goede). - Add two more ACPI backlight quirks (Hans de Goede). - Fix a few minor problems in the core devfreq code, clean it up a bit and update the MAINTAINERS information related to it (Chanwoo Choi, MyungJoo Ham). - Improve an error message in the ACPI fan driver (Andy Lutomirski). - Fix a recent build regression in the cpupower tool (Shreyas Prabhu)" * tag 'pm+acpi-4.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) cpuidle: menu: Avoid pointless checks in menu_select() sched / idle: Drop default_idle_call() fallback from call_cpuidle() cpupower: Fix build error in cpufreq-info cpuidle: Don't enable all governors by default cpuidle: Default to ladder governor on ticking systems time: nohz: Expose tick_nohz_enabled ACPICA: Update version to 20160108 ACPICA: Silence a -Wbad-function-cast warning when acpi_uintptr_t is 'uintptr_t' ACPICA: Additional 2016 copyright changes ACPICA: Reduce regression fix divergence from upstream ACPICA ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830 ACPI / video: Revert "thinkpad_acpi: Use acpi_video_handles_brightness_key_presses()" ACPI / video: Document acpi_video_handles_brightness_key_presses() a bit ACPI / video: Fix using an uninitialized mutex / list_head in acpi_video_handles_brightness_key_presses() ACPI / video: Revert "ACPI / video: driver must be registered before checking for keypresses" ACPI / fan: Improve acpi_device_update_power error message ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700 cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0 MAINTAINERS: Add devfreq-event entry MAINTAINERS: Add missing git repository and directory for devfreq ...
2016-01-19cpuidle: menu: Avoid pointless checks in menu_select()Rafael J. Wysocki
If menu_select() cannot find a suitable state to return, it will return the state index stored in data->last_state_idx. This means that it is pointless to look at the states whose indices are less than or equal to data->last_state_idx in the main loop, so don't do that. Given that those checks are done on every idle state selection, this change can save quite a bit of completely unnecessary overhead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com>
2016-01-19sched / idle: Drop default_idle_call() fallback from call_cpuidle()Rafael J. Wysocki
After commit 9c4b2867ed7c (cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0) it is clear that menu_select() cannot return negative values. Moreover, ladder_select_state() will never return a negative value too, so make find_deepest_state() return non-negative values too and drop the default_idle_call() fallback from call_cpuidle(). This eliminates one branch from the idle loop and makes the governors and find_deepest_state() handle the case when all states have been disabled from sysfs consistently. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Sudeep Holla <sudeep.holla@arm.com>
2016-01-15cpuidle: Don't enable all governors by defaultJean Delvare
Since commit d6f346f2d2 (cpuidle: improve governor Kconfig options) the best cpuidle governor is selected automatically. There is little point in additionally selecting the other one as it will not be used, so don't select both governors by default. Being able to select more than one governor is still good for developers booting with cpuidle_sysfs_switch, though. This fixes the second half of kernel BZ #65531. Link: https://bugzilla.kernel.org/show_bug.cgi?id=65531 Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-15cpuidle: Default to ladder governor on ticking systemsJean Delvare
The menu governor is currently the default on all systems. However the documentation claims that the ladder governor is preferred on ticking systems. So bump the rating of the ladder governor when NO_HZ is disabled, or when booting with nohz=off. This fixes the first half of kernel BZ #65531. Link: https://bugzilla.kernel.org/show_bug.cgi?id=65531 Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-15Merge tag 'powerpc-4.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Core: - Ground work for the new Power9 MMU from Aneesh Kumar K.V - Optimise FP/VMX/VSX context switching from Anton Blanchard Misc: - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan - Allow wrapper to work on non-english system from Laurent Vivier - Add rN aliases to the pt_regs_offset table from Rashmica Gupta - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt - Include KVM guest test in all interrupt vectors from Paul Mackerras - Fix DSCR inheritance over fork() from Anton Blanchard - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng - Print MSR TM bits in oops messages from Michael Neuling - Add TM signal return & invalid stack selftests from Michael Neuling - Limit EPOW reset event warnings from Vipin K Parashar - Remove the Cell QPACE code from Rashmica Gupta - Append linux_banner to exception information in xmon from Rashmica Gupta - Add selftest to check if VSRs are corrupted from Rashmica Gupta - Remove broken GregorianDay() from Daniel Axtens - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman - Add selftest script to test HMI functionality from Daniel Axtens - Remove obsolete OPAL v2 support from Stewart Smith - Make enter_rtas() private from Michael Ellerman - PPR exception cleanups from Michael Ellerman - Add page soft dirty tracking from Laurent Dufour - Add support for Nvlink NPUs from Alistair Popple - Add support for kexec on 476fpe from Alistair Popple - Enable kernel CPU dlpar from sysfs from Nathan Fontenot - Copy only required pieces of the mm_context_t to the paca from Michael Neuling - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt - Add HWCAP bits for Power9 from Michael Ellerman - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand cxl: - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain - cxl: use correct operator when writing pcie config space values from Andrew Donnellan - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain - cxl: fix build for GCC 4.6.x from Brian Norris - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan Freescale: - Freescale updates from Scott: Highlights include moving QE code out of arch/powerpc (to be shared with arm), device tree updates, and minor fixes" * tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits) powerpc/module: Handle R_PPC64_ENTRY relocations scripts/recordmcount.pl: support data in text section on powerpc powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff powerpc/mm: Fix _PAGE_PTE breaking swapoff cxl: Enable PCI device ID for future IBM CXL adapter cxl: use -Werror only with CONFIG_PPC_WERROR cxl: fix build for GCC 4.6.x powerpc: Add HWCAP bits for Power9 powerpc/powernv: Reserve PE#0 on NPU powerpc/powernv: Change NPU PE# assignment powerpc/powernv: Fix update of NVLink DMA mask powerpc/powernv: Remove misleading comment in pci.c powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing powerpc: Fix build break due to paca mm_context_t changes cxl: Fix DSI misses when the context owning task exits MAINTAINERS: Update Scott Wood's e-mail address powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery() powerpc: Fix style of self-test config prompts powerpc/powernv: Only delay opal_rtc_read() retry when necessary ...
2016-01-14cpuidle: menu: Fix menu_select() for CPUIDLE_DRIVER_STATE_START == 0Rafael J. Wysocki
Commit a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling) exposed a bug in menu_select() causing it to return -1 on systems with CPUIDLE_DRIVER_STATE_START equal to zero, although it should have returned 0. As a result, idle states are not entered by CPUs on those systems. Namely, on the systems in question data->last_state_idx is initially equal to -1 and the above commit modified the condition that would have caused it to be changed to 0 to be less likely to trigger which exposed the problem. However, setting data->last_state_idx initially to -1 doesn't make sense at all and on the affected systems it should always be set to CPUIDLE_DRIVER_STATE_START (ie. 0) unconditionally, so make that happen. Fixes: a9ceb78bc75c (cpuidle,menu: use interactivity_req to disable polling) Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-17powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPALStewart Smith
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels. So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-15drivers/cpuidle: make cpuidle-exynos.c explicitly non-modularPaul Gortmaker
The Kconfig currently controlling compilation of this code is: cpuidle/Kconfig.arm:config ARM_EXYNOS_CPUIDLE cpuidle/Kconfig.arm: bool "Cpu Idle Driver for the Exynos processors" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modularity so that when reading the driver there is no doubt it is builtin-only. Since module_platform_driver() uses the same init level priority as builtin_platform_driver() the init ordering remains unchanged with this commit. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-15drivers/cpuidle: make cpuidle-ux500.c explicitly non-modularPaul Gortmaker
The Kconfig currently controlling compilation of this code is: cpuidle/Kconfig.arm:config ARM_U8500_CPUIDLE cpuidle/Kconfig.arm: bool "Cpu Idle Driver for the ST-E u8500 processors" ...meaning that it currently is not being built as a module by anyone. Lets remove the couple traces of modularity so that when reading the driver there is no doubt it is builtin-only. Since module_platform_driver() uses the same init level priority as builtin_platform_driver() the init ordering remains unchanged with this commit. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-15drivers/cpuidle: make cpuidle-clps711x.c explicitly non-modularPaul Gortmaker
The Kconfig currently controlling compilation of this code is: drivers/cpuidle/Kconfig.arm:config ARM_CLPS711X_CPUIDLE drivers/cpuidle/Kconfig.arm: bool "CPU Idle Driver for CLPS711X processors" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_platform_driver() uses the same init level priority as builtin_platform_driver() the init ordering remains unchanged with this commit. Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code. We also delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17cpuidle,menu: smooth out measured_us calculationRik van Riel
The cpuidle state tables contain the maximum exit latency for each cpuidle state. On x86, that is the exit latency for when the entire package goes into that same idle state. However, a lot of the time we only go into the core idle state, not the package idle state. This means we see a much smaller exit latency. We have no way to detect whether we went into the core or package idle state while idle, and that is ok. However, the current menu_update logic does have the potential to trip up the repeating pattern detection in get_typical_interval. If the system is experiencing an exit latency near the idle state's exit latency, some of the samples will have exit_us subtracted, while others will not. This turns a repeating pattern into mush, potentially breaking get_typical_interval. Furthermore, for smaller sleep intervals, we know the chance that all the cores in the package went to the same idle state are fairly small. Dividing the measured_us by two, instead of subtracting the full exit latency when hitting a small measured_us, will reduce the error. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17cpuidle,menu: use interactivity_req to disable pollingRik van Riel
The menu governor carefully figures out how much time we typically sleep for an estimated sleep interval, or whether there is a repeating pattern going on, and corrects that estimate for the CPU load. Then it proceeds to ignore that information when determining whether or not to consider polling. This is not a big deal on most x86 CPUs, which have very low C1 latencies, and the patch should not have any effect on those CPUs. However, certain CPUs (eg. Atom) have much higher C1 latencies, and it would be good to not waste performance and power on those CPUs if we are expecting a very low wakeup latency. Disable polling based on the estimated interactivity requirement, not on the time to the next timer interrupt. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-17cpuidle,x86: increase forced cut-off for polling to 20usRik van Riel
The cpuidle menu governor has a forced cut-off for polling at 5us, in order to deal with firmware that gives the OS bad information on cpuidle states, leading to the system spending way too much time in polling. However, at least one x86 CPU family (Atom) has chips that have a 20us break-even point for C1. Forcing the polling cut-off to less than that wastes performance and power. Increase the polling cut-off to 20us. Systems with a lower C1 latency will be found in the states table by the menu governor, which will pick those states as appropriate. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-23cpuidle: mvebu: disable the bind/unbind attributes and use ↵Russell King
builtin_platform_driver As the driver doesn't support unbinding, nor does it support arbitary binding of devices, disable the bind/unbind attributes for this driver. Also, as the driver has no remove function, it can never be modular, so use builtin_platform_driver() to avoid the module exit boilerplate. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2015-10-23cpuidle: mvebu: clean up multiple platform driversRussell King
There's no need to use multiple platform drivers, especially when we want to do something different in the probe, but we still use a common probe function. We can use the platform ID system to only register one platform driver, but have it match several devices, and give us the CPU idle driver via the ID's driver_data. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2015-09-11Merge tag 'pm+acpi-4.3-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management and ACPI updates from Rafael Wysocki: "These are mostly fixes and cleanups on top of the previous PM+ACPI pull request (cpufreq core and drivers, cpuidle, generic power domains framework). Some of them didn't make to that pull request and some fix issues introduced by it. The only really new thing is the support for suspend frequency in the cpufreq-dt driver, but it is needed to fix an issue with Exynos platforms. Specifics: - build fix for the new Mediatek MT8173 cpufreq driver (Guenter Roeck). - generic power domains framework fixes (power on error code path, subdomain removal) and cleanup of a deprecated API user (Geert Uytterhoeven, Jon Hunter, Ulf Hansson). - cpufreq-dt driver fixes including two fixes for bugs related to the new Operating Performance Points Device Tree bindings introduced recently (Viresh Kumar). - suspend frequency support for the cpufreq-dt driver (Bartlomiej Zolnierkiewicz, Viresh Kumar). - cpufreq core cleanups (Viresh Kumar). - intel_pstate driver fixes (Chen Yu, Kristen Carlson Accardi). - additional sanity check in the cpuidle core (Xunlei Pang). - fix for a comment related to CPU power management (Lina Iyer)" * tag 'pm+acpi-4.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_pstate: fix PCT_TO_HWP macro intel_pstate: Fix user input of min/max to legal policy region PM / OPP: Return suspend_opp only if it is enabled cpufreq-dt: add suspend frequency support cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency PM / OPP: add dev_pm_opp_get_suspend_opp() helper staging: board: Migrate away from __pm_genpd_name_add_device() cpufreq: Use __func__ to print function's name cpufreq: staticize cpufreq_cpu_get_raw() PM / Domains: Ensure subdomain is not in use before removing cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL cpuidle/coupled: Add sanity check for safe_state_index PM / Domains: Try power off masters in error path of __pm_genpd_poweron() cpufreq: dt: Tolerance applies on both sides of target voltage cpufreq: dt: Print error on failing to mark OPPs as shared cpufreq: dt: Check OPP count before marking them shared kernel/cpu_pm: fix cpu_cluster_pm_exit comment
2015-09-11Merge branches 'pm-cpu', 'pm-cpuidle' and 'pm-domains'Rafael J. Wysocki
* pm-cpu: kernel/cpu_pm: fix cpu_cluster_pm_exit comment * pm-cpuidle: cpuidle/coupled: Add sanity check for safe_state_index * pm-domains: staging: board: Migrate away from __pm_genpd_name_add_device() PM / Domains: Ensure subdomain is not in use before removing PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
2015-09-03Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM development updates from Russell King: "Included in this update: - moving PSCI code from ARM64/ARM to drivers/ - removal of some architecture internals from global kernel view - addition of software based "privileged no access" support using the old domains register to turn off the ability for kernel loads/stores to access userspace. Only the proper accessors will be usable. - addition of early fixup support for early console - re-addition (and reimplementation) of OMAP special interconnect barrier - removal of finish_arch_switch() - only expose cpuX/online in sysfs if hotpluggable - a number of code cleanups" * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (41 commits) ARM: software-based priviledged-no-access support ARM: entry: provide uaccess assembly macro hooks ARM: entry: get rid of multiple macro definitions ARM: 8421/1: smp: Collapse arch_cpu_idle_dead() into cpu_die() ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore() ARM: mm: improve do_ldrd_abort macro ARM: entry: ensure that IRQs are enabled when calling syscall_trace_exit() ARM: entry: efficiency cleanups ARM: entry: get rid of asm_trace_hardirqs_on_cond ARM: uaccess: simplify user access assembly ARM: domains: remove DOMAIN_TABLE ARM: domains: keep vectors in separate domain ARM: domains: get rid of manager mode for user domain ARM: domains: move initial domain setting value to asm/domains.h ARM: domains: provide domain_mask() ARM: domains: switch to keeping domain value in register ARM: 8419/1: dma-mapping: harmonize definition of DMA_ERROR_CODE ARM: 8417/1: refactor bitops functions with BIT_MASK() and BIT_WORD() ARM: 8416/1: Feroceon: use of_iomap() to map register base ARM: 8415/1: early fixmap support for earlycon ...
2015-09-03cpuidle/coupled: Add sanity check for safe_state_indexXunlei Pang
Since we are using cpuidle_driver::safe_state_index directly as the target state index, it is better to add the sanity check at the point of registering the driver. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-01Merge tag 'pm+acpi-4.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "From the number of commits perspective, the biggest items are ACPICA and cpufreq changes with the latter taking the lead (over 50 commits). On the cpufreq front, there are many cleanups and minor fixes in the core and governors, driver updates etc. We also have a new cpufreq driver for Mediatek MT8173 chips. ACPICA mostly updates its debug infrastructure and adds a number of fixes and cleanups for a good measure. The Operating Performance Points (OPP) framework is updated with new DT bindings and support for them among other things. We have a few updates of the generic power domains framework and a reorganization of the ACPI device enumeration code and bus type operations. And a lot of fixes and cleanups all over. Included is one branch from the MFD tree as it contains some PM-related driver core and ACPI PM changes a few other commits are based on. Specifics: - ACPICA update to upstream revision 20150818 including method tracing extensions to allow more in-depth AML debugging in the kernel and a number of assorted fixes and cleanups (Bob Moore, Lv Zheng, Markus Elfring). - ACPI sysfs code updates and a documentation update related to AML method tracing (Lv Zheng). - ACPI EC driver fix related to serialized evaluations of _Qxx methods and ACPI tools updates allowing the EC userspace tool to be built from the kernel source (Lv Zheng). - ACPI processor driver updates preparing it for future introduction of CPPC support and ACPI PCC mailbox driver updates (Ashwin Chaugule). - ACPI interrupts enumeration fix for a regression related to the handling of IRQ attribute conflicts between MADT and the ACPI namespace (Jiang Liu). - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar). - ACPI device registration code reorganization to separate the sysfs-related code and bus type operations from the rest (Rafael J Wysocki). - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause, Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss). - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan Xinhui, Rafael J Wysocki). - cpufreq core cleanups on top of the previous changes allowing it to preseve its sysfs directories over system suspend/resume (Viresh Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior). - cpufreq fixes and cleanups related to governors (Viresh Kumar). - cpufreq updates (core and the cpufreq-dt driver) related to the turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz). - New DT bindings for Operating Performance Points (OPP), support for them in the OPP framework and in the cpufreq-dt driver plus related OPP framework fixes and cleanups (Viresh Kumar). - cpufreq powernv driver updates (Shilpasri G Bhat). - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen). - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean). - intel_pstate driver updates including Skylake-S support, support for enabling HW P-states per CPU and an additional vendor bypass list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao). - cpuidle core fixes related to the handling of coupled idle states (Xunlei Pang). - intel_idle driver updates including Skylake Client support and support for freeze-mode-specific idle states (Len Brown). - Driver core updates related to power management (Andy Shevchenko, Rafael J Wysocki). - Generic power domains framework fixes and cleanups (Jon Hunter, Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson). - Device PM QoS framework update to allow the latency tolerance setting to be exposed to user space via sysfs (Mika Westerberg). - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas). - System sleep support updates (Alan Stern, Len Brown, SungEun Kim). - rockchip-io AVS support updates (Heiko Stuebner). - PM core clocks support fixup (Colin Ian King). - Power capping RAPL driver update including support for Skylake H/S and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi). - Generic device properties framework fixes related to the handling of static (driver-provided) property sets (Andy Shevchenko). - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat, Shreyas B Prabhu)" * tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits) cpufreq: speedstep-lib: Use monotonic clock cpufreq: powernv: Increase the verbosity of OCC console messages cpufreq: sfi: use kmemdup rather than duplicating its implementation cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor() cpufreq: rename cpufreq_real_policy as cpufreq_user_policy cpufreq: remove redundant 'policy' field from user_policy cpufreq: remove redundant 'governor' field from user_policy cpufreq: update user_policy.* on success cpufreq: use memcpy() to copy policy cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event cpufreq: mediatek: Add MT8173 cpufreq driver dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings PM / Domains: Fix typo in description of genpd_dev_pm_detach() PM / Domains: Remove unusable governor dummies PM / Domains: Make pm_genpd_init() available to modules PM / domains: Align column headers and data in pm_genpd_summary output powercap / RAPL: disable the 2nd power limit properly tools: cpupower: Fix error when running cpupower monitor PM / OPP: Drop unlikely before IS_ERR(_OR_NULL) PM / OPP: Fix static checker warning (broken 64bit big endian systems) ...
2015-08-31Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...
2015-08-28cpuidle/coupled: Remove redundant 'dev' argument of cpuidle_state_is_coupled()Xunlei Pang
For cpuidle_state_is_coupled(), 'dev' is not used, so remove it. Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-08-28cpuidle/coupled: Remove cpuidle_device::safe_state_indexXunlei Pang
cpuidle_device::safe_state_index need to be initialized before use, it should be the same as cpuidle_driver::safe_state_index. We tackled this issue by removing the safe_state_index from the cpuidle_device structure and use the one in the cpuidle_driver structure instead. Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-08-03ARM: migrate to common PSCI client codeMark Rutland
Now that the common PSCI client code has been factored out to drivers/firmware, and made safe for 32-bit use, move the 32-bit ARM code over to it. This results in a moderate reduction of duplicated lines, and will prevent further duplication as the PSCI client code is updated for PSCI 1.0 and beyond. The two legacy platform users of the PSCI invocation code are updated to account for interface changes. In both cases the power state parameter (which is constant) is now generated using macros, so that the pack/unpack logic can be killed in preparation for PSCI 1.0 power state changes. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-21sched/idle: Move latency tracing stop/start calls deeper inside the idle loopLucas Stach
Make sure to stop tracing only once we are past a point where all latency tracing events have been processed (irqs are not enabled again). This has the slight advantage of capturing more latency related events in the idle path, but most importantly it makes sure that latency tracing doesn't get re-enabled inadvertently when new events are coming in. This makes the irqsoff latency tracer useful again, as we stop capturing CPU sleep time as IRQ latency. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel@pengutronix.de Cc: patchwork-lst@pengutronix.de Link: http://lkml.kernel.org/r/1437410090-3747-1-git-send-email-l.stach@pengutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-09suspend-to-idle: Prevent RCU from complaining about tick_freeze()Rafael J. Wysocki
Put tick_freeze() under RCU_NONIDLE() to prevent RCU from complaining about suspicious RCU usage in idle by trace_suspend_resume() called from there. While at it, fix a comment related to another usage of RCU_NONIDLE() in enter_freeze_proper(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-02Merge tag 'module-builtin_driver-v4.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull module_platform_driver replacement from Paul Gortmaker: "Replace module_platform_driver with builtin_platform driver in non modules. We see an increasing number of non-modular drivers using modular_driver() type register functions. There are several downsides to letting this continue unchecked: - The code can appear modular to a reader of the code, and they won't know if the code really is modular without checking the Makefile and Kconfig to see if compilation is governed by a bool or tristate. - Coders of drivers may be tempted to code up an __exit function that is never used, just in order to satisfy the required three args of