From ffb810563c0c049872a504978e06c8892104fb6c Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sun, 10 Apr 2016 05:59:10 +0200 Subject: intel_pstate: Avoid getting stuck in high P-states when idle MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jörg Otte reports that commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) caused the CPUs in his Haswell-based system to stay in the very high frequency region even if the system is completely idle. That turns out to be an existing problem in the intel_pstate driver's P-state selection algorithm for Core processors. Namely, all decisions made by that algorithm are based on the average frequency of the CPU between sampling events and on the P-state requested on the last invocation, so it may get stuck at a very hight frequency even if the utilization of the CPU is very low (in fact, it may get stuck in a inadequate P-state regardless of the CPU utilization). The only way to kick it out of that limbo is a sufficiently long idle period (3 times longer than the prescribed sampling interval), but if that doesn't happen often enough (eg. due to a timing change like after the above commit), the P-state of the CPU may be inadequate pretty much all the time. To address the most egregious manifestations of that issue, reset the core_busy value used to determine the next P-state to request if the utilization of the CPU, determined with the help of the MPERF feedback register and the TSC, is below 1%. Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771 Reported-and-tested-by: Jörg Otte Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 8b5a415ee14a..30fe323c4551 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1130,6 +1130,10 @@ static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu) sample_ratio = div_fp(int_tofp(pid_params.sample_rate_ns), int_tofp(duration_ns)); core_busy = mul_fp(core_busy, sample_ratio); + } else { + sample_ratio = div_fp(100 * cpu->sample.mperf, cpu->sample.tsc); + if (sample_ratio < int_tofp(1)) + core_busy = 0; } cpu->sample.busy_scaled = core_busy; -- cgit v1.2.3 From c9d9c929e6741118776eb0f385339d3c2b84d5f8 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Sun, 10 Apr 2016 05:59:33 +0200 Subject: cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set Since governor operations are generally skipped if cpufreq_suspended is set, cpufreq_start_governor() should do nothing in that case. That function is called in the cpufreq_online() path, and may also be called from cpufreq_offline() in some cases, which are invoked by the nonboot CPUs disabing/enabling code during system suspend to RAM and resume. That happens when all devices have been suspended, so if the cpufreq driver relies on things like I2C to get the current frequency, it may not be ready to do that then. To prevent problems from happening for this reason, make cpufreq_update_current_freq(), which is the only function invoked by cpufreq_start_governor() that doesn't check cpufreq_suspended already, return 0 upfront if cpufreq_suspended is set. Fixes: 3bbf8fe3ae08 (cpufreq: Always update current frequency before startig governor) Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar --- drivers/cpufreq/cpufreq.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b87596b591b3..e93405f0eac4 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1491,6 +1491,9 @@ static unsigned int cpufreq_update_current_freq(struct cpufreq_policy *policy) { unsigned int new_freq; + if (cpufreq_suspended) + return 0; + new_freq = cpufreq_driver->get(policy->cpu); if (!new_freq) return 0; -- cgit v1.2.3 From 94862a62dfe3ba1c7601115a2dc80721c5b256f0 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 21 Apr 2016 20:57:47 +0200 Subject: Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" Revert commit 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) that introduced a regression by causing the ondemand cpufreq governor to misbehave for CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at one point and stays there indefinitely). The revert takes subsequent modifications of the code in question into account. Fixes: 0df35026c6a5 (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261 Reported-and-tested-by: Timo Valtoaho Cc: 4.5+ # 4.5+ Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar --- drivers/cpufreq/cpufreq_governor.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 10a5cfeae8c5..5f1147fa9239 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -193,12 +193,8 @@ unsigned int dbs_update(struct cpufreq_policy *policy) wall_time = cur_wall_time - j_cdbs->prev_cpu_wall; j_cdbs->prev_cpu_wall = cur_wall_time; - if (cur_idle_time <= j_cdbs->prev_cpu_idle) { - idle_time = 0; - } else { - idle_time = cur_idle_time - j_cdbs->prev_cpu_idle; - j_cdbs->prev_cpu_idle = cur_idle_time; - } + idle_time = cur_idle_time - j_cdbs->prev_cpu_idle; + j_cdbs->prev_cpu_idle = cur_idle_time; if (ignore_nice) { u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE]; -- cgit v1.2.3 From 1becf03545a0859ceaaf9e8c2d9861882a71cb01 Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Fri, 22 Apr 2016 19:53:59 -0700 Subject: cpufreq: intel_pstate: Fix processing for turbo activation ratio When the config TDP level is not nominal (level = 0), the MSR values for reading level 1 and level 2 ratios contain power in low 14 bits and actual ratio bits are at bits [23:16]. The current processing for level 1 and level 2 is wrong as there is no shift done to get actual ratio. Fixes: 6a35fc2d6c22 (cpufreq: intel_pstate: get P1 from TAR when available) Signed-off-by: Srinivas Pandruvada Cc: 4.4+ # 4.4+ Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'drivers/cpufreq') diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 30fe323c4551..f502d5b90c25 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -813,6 +813,11 @@ static int core_get_max_pstate(void) if (err) goto skip_tar; + /* For level 1 and 2, bits[23:16] contain the ratio */ + if (tdp_ctrl) + tdp_ratio >>= 16; + + tdp_ratio &= 0xff; /* ratios are only 8 bits long */ if (tdp_ratio - 1 == tar) { max_pstate = tar; pr_debug("max_pstate=TAC %x\n", max_pstate); -- cgit v1.2.3