From 48d07c04b4cc1dc1221965312f58fd84926212fe Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior
Date: Wed, 20 Mar 2019 22:13:33 +0100
Subject: rcu: Enable elimination of Tree-RCU softirq processing

Some workloads need to change kthread priority for RCU core processing
without affecting other softirq work.  This commit therefore introduces
the rcutree.use_softirq kernel boot parameter, which moves the RCU core
work from softirq to a per-CPU SCHED_OTHER kthread named rcuc.  Use of
the SCHED_OTHER approach avoids the scalability problems that appeared
with the earlier attempt to move RCU core processing from softirq to
kthreads.  That said, kernels built with RCU_BOOST=y will run the rcuc
kthreads at the RCU-boosting priority.

Note that rcutree.use_softirq=0 must be specified to move RCU core
processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.

Reported-by: Thomas Gleixner
Tested-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
[ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
  from RCU core processing, in contrast to softirq->rcuc transition
  in old mainline RCU priority boosting. ]
[ paulmck: Avoid wakeups when scheduler might have invoked
  rcu_read_unlock() while holding rq or pi locks, also possibly fixing
  a pre-existing latent bug involving raise_softirq()-induced wakeups. ]
Signed-off-by: Paul E. McKenney
---
 Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..b96fd15c7316 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3752,6 +3752,12 @@
 			the propagation of recent CPU-hotplug changes up
 			the rcu_node combining tree.
 
+	rcutree.use_softirq=	[KNL]
+			If set to zero, move all RCU_SOFTIRQ processing to
+			per-CPU rcuc kthreads.  Defaults to a non-zero
+			value, meaning that RCU_SOFTIRQ is used by default.
+			Specify rcutree.use_softirq=0 to use rcuc kthreads.
+
 	rcutree.rcu_fanout_exact= [KNL]
 			Disable autobalancing of the rcu_node combining
 			tree.  This is used by rcutorture, and might
--
cgit v1.2.3
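For orientation, the excerpt above is limited to Documentation/, so the C
side of this commit is not shown.  The dispatch that rcutree.use_softirq
selects looks roughly like the following hedged sketch, modeled on
kernel/rcu/tree.c as of this commit; the kthread-wakeup helper is
simplified here, and rcu_data is the kernel's per-CPU RCU state.

  #include <linux/interrupt.h>
  #include <linux/moduleparam.h>
  #include <linux/sched.h>

  static bool use_softirq = true;	/* rcutree.use_softirq= boot parameter */
  module_param(use_softirq, bool, 0444);

  /* Simplified: the real helper also sets rcu_cpu_has_work and wakes
   * the kthread via rcu_wake_cond() rather than wake_up_process(). */
  static void invoke_rcu_core_kthread(void)
  {
  	wake_up_process(__this_cpu_read(rcu_data.rcu_cpu_kthread_task));
  }

  static void invoke_rcu_core(void)
  {
  	if (!cpu_online(smp_processor_id()))
  		return;
  	if (use_softirq)
  		raise_softirq(RCU_SOFTIRQ);	/* default path */
  	else
  		invoke_rcu_core_kthread();	/* wake this CPU's rcuc */
  }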
From de1dbcee433ccff3e0a37698c65b40542c9d4cf1 Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)"
Date: Fri, 29 Mar 2019 10:05:55 -0400
Subject: doc/rcuref: Document real world examples in kernel

Document similar real world examples in the kernel corresponding to the
second and third code snippets.  Also correct an issue in
release_referenced() in the code snippet example.

Cc: oleg@redhat.com
Cc: jannh@google.com
Signed-off-by: Joel Fernandes (Google)
[ paulmck: Do a bit of wordsmithing. ]
Signed-off-by: Paul E. McKenney
---
 Documentation/RCU/rcuref.txt | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
index 613033ff2b9b..5e6429d66c24 100644
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.txt
@@ -12,6 +12,7 @@ please read on.
 Reference counting on elements of lists which are protected by traditional
 reader/writer spinlocks or semaphores are straightforward:
 
+CODE LISTING A:
 1.				2.
 add()				search_and_reference()
 {				{
@@ -28,7 +29,8 @@ add()				search_and_reference()
 release_referenced()		delete()
 {				{
     ...				    write_lock(&list_lock);
-    atomic_dec(&el->rc, relfunc)    ...
+    if(atomic_dec_and_test(&el->rc))    ...
+        kfree(el);
     ...				    remove_element
 }				    write_unlock(&list_lock);
 				    ...
@@ -44,6 +46,7 @@ search_and_reference() could potentially hold reference to an element which
 has already been deleted from the list/array.  Use atomic_inc_not_zero()
 in this scenario as follows:
 
+CODE LISTING B:
 1.					2.
 add()					search_and_reference()
 {					{
@@ -79,6 +82,7 @@ search_and_reference() code path.  In such cases, the
 atomic_dec_and_test() may be moved from delete() to el_free()
 as follows:
 
+CODE LISTING C:
 1.					2.
 add()					search_and_reference()
 {					{
@@ -114,6 +118,17 @@ element can therefore safely be freed.  This in turn guarantees that if
 any reader finds the element, that reader may safely acquire a reference
 without checking the value of the reference counter.
 
+A clear advantage of the RCU-based pattern in listing C over the one
+in listing B is that any call to search_and_reference() that locates
+a given object will succeed in obtaining a reference to that object,
+even given a concurrent invocation of delete() for that same object.
+Similarly, a clear advantage of both listings B and C over listing A is
+that a call to delete() is not delayed even if there are an arbitrarily
+large number of calls to search_and_reference() searching for the same
+object that delete() was invoked on.  Instead, all that is delayed is
+the eventual invocation of kfree(), which is usually not a problem on
+modern computer systems, even the small ones.
+
 In cases where delete() can sleep, synchronize_rcu() can be called from
 delete(), so that el_free() can be subsumed into delete as follows:
 
@@ -130,3 +145,7 @@ delete()
 	kfree(el);
 	...
 }
+
+As additional examples in the kernel, the pattern in listing C is used by
+reference counting of struct pid, while the pattern in listing B is used by
+struct posix_acl.
--
cgit v1.2.3
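To see listing B as ordinary kernel-style C rather than two-column
pseudocode, here is a hedged sketch.  The names struct obj,
search_for_element(), and remove_element() are hypothetical stand-ins
carried over from the listings; this is not code from the struct
posix_acl user mentioned in the patch.

  #include <linux/atomic.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>
  #include <linux/spinlock.h>

  struct obj {
  	atomic_t rc;
  	int key;
  };

  static DEFINE_SPINLOCK(list_lock);

  static struct obj *search_for_element(int key);	/* hypothetical RCU-safe lookup */
  static void remove_element(struct obj *el);		/* hypothetical unlink */

  /* Listing B's search_and_reference(). */
  static struct obj *obj_get(int key)
  {
  	struct obj *el;

  	rcu_read_lock();
  	el = search_for_element(key);
  	if (el && !atomic_inc_not_zero(&el->rc))
  		el = NULL;		/* raced with delete(): rc already hit zero */
  	rcu_read_unlock();
  	return el;
  }

  /* Listing B's release_referenced(). */
  static void obj_put(struct obj *el)
  {
  	if (atomic_dec_and_test(&el->rc))
  		kfree(el);
  }

  /* Listing B's delete(): drop the list's own reference last. */
  static void obj_delete(struct obj *el)
  {
  	spin_lock(&list_lock);
  	remove_element(el);
  	spin_unlock(&list_lock);
  	synchronize_rcu();		/* wait for pre-existing readers */
  	if (atomic_dec_and_test(&el->rc))
  		kfree(el);
  }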
From 588759a39145e18dd808341861d45d44383778b5 Mon Sep 17 00:00:00 2001
From: Zhenzhong Duan
Date: Sun, 14 Apr 2019 11:11:03 +0800
Subject: doc: Fixup definition of rcupdate.rcu_task_stall_timeout

A positive value of rcupdate.rcu_task_stall_timeout is an interval in
seconds rather than jiffies.

Signed-off-by: Zhenzhong Duan
Signed-off-by: Paul E. McKenney
---
 Documentation/RCU/stallwarn.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1ab70c37921f..13e88fc00f01 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -153,7 +153,7 @@ rcupdate.rcu_task_stall_timeout
 	This boot/sysfs parameter controls the RCU-tasks stall warning
 	interval.  A value of zero or less suppresses RCU-tasks stall
 	warnings.  A positive value sets the stall-warning interval
-	in jiffies.  An RCU-tasks stall warning starts with the line:
+	in seconds.  An RCU-tasks stall warning starts with the line:
 
 		INFO: rcu_tasks detected stalls on tasks:
--
cgit v1.2.3

From 714b6904e23e1c37f262a4cd02b34d0f1863e227 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney"
Date: Thu, 11 Apr 2019 07:12:18 -0700
Subject: doc: Remove ".vnet" from paulmck email addresses

Signed-off-by: Paul E. McKenney
---
 Documentation/core-api/circular-buffers.rst          | 2 +-
 Documentation/memory-barriers.txt                    | 2 +-
 Documentation/translations/ko_KR/memory-barriers.txt | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/core-api/circular-buffers.rst b/Documentation/core-api/circular-buffers.rst
index 53e51caa3347..50966f66e398 100644
--- a/Documentation/core-api/circular-buffers.rst
+++ b/Documentation/core-api/circular-buffers.rst
@@ -3,7 +3,7 @@ Circular Buffers
 ================
 
 :Author: David Howells <dhowells@redhat.com>
-:Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+:Author: Paul E. McKenney <paulmck@linux.ibm.com>
 
 
 Linux provides a number of features that can be used to implement circular
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f70ebcdfe592..e4e07c8ab89e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -3,7 +3,7 @@
 ============================
 
 By: David Howells <dhowells@redhat.com>
-    Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+    Paul E. McKenney <paulmck@linux.ibm.com>
     Will Deacon <will.deacon@arm.com>
     Peter Zijlstra <peterz@infradead.org>
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index db0b9d8619f1..5f3c74dcad43 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -24,7 +24,7 @@ Documentation/memory-barriers.txt
 =========================
 
 저자: David Howells <dhowells@redhat.com>
-      Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+      Paul E. McKenney <paulmck@linux.ibm.com>
       Will Deacon <will.deacon@arm.com>
       Peter Zijlstra <peterz@infradead.org>
--
cgit v1.2.3

From 9129b017b54dab09eb69b7269026243156e5188e Mon Sep 17 00:00:00 2001
From: Andrea Parri
Date: Mon, 27 May 2019 10:49:57 +0200
Subject: rcu: Don't return a value from rcu_assign_pointer()

Quoting Paul [1]:

  "Given that a quick (and perhaps error-prone) search of the uses of
   rcu_assign_pointer() in v5.1 didn't find a single use of the return
   value, let's please instead change the documentation and
   implementation to eliminate the return value."

[1] https://lkml.kernel.org/r/20190523135013.GL28207@linux.ibm.com

Signed-off-by: Andrea Parri
Cc: "Paul E. McKenney"
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Cc: Lai Jiangshan
Cc: Joel Fernandes
Cc: rcu@vger.kernel.org
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: Mark Rutland
Cc: Matthew Wilcox
Cc: Sasha Levin
Signed-off-by: Paul E. McKenney
---
 Documentation/RCU/whatisRCU.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 981651a8b65d..7e1a8721637a 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -212,7 +212,7 @@ synchronize_rcu()
 
 rcu_assign_pointer()
 
-	typeof(p) rcu_assign_pointer(p, typeof(p) v);
+	void rcu_assign_pointer(p, typeof(p) v);
 
 	Yes, rcu_assign_pointer() -is- implemented as a macro, though it
 	would be cool to be able to declare a function in this manner.
@@ -220,9 +220,9 @@ rcu_assign_pointer()
 
 	The updater uses this function to assign a new value to an
 	RCU-protected pointer, in order to safely communicate the change
-	in value from the updater to the reader.  This function returns
-	the new value, and also executes any memory-barrier instructions
-	required for a given CPU architecture.
+	in value from the updater to the reader.  This macro does not
+	evaluate to an rvalue, but it does execute any memory-barrier
+	instructions required for a given CPU architecture.
 
 	Perhaps just as important, it serves to document (1) which
 	pointers are protected by RCU and (2) the point at which a
--
cgit v1.2.3
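The matching implementation change lives in include/linux/rcupdate.h,
outside this Documentation-only diff: wrapping the macro body in
do { } while (0) makes it a statement.  A hedged, simplified sketch of
the resulting shape, with sparse checking and the constant-NULL fast
path elided, followed by the usage it permits and forbids (gbl_foo and
struct foo are the examples whatisRCU.txt itself uses):

  #define rcu_assign_pointer(p, v)		\
  do {						\
  	/* Release store: memory barrier, then store. */ \
  	smp_store_release(&(p), (v));		\
  } while (0)

  /* Update-side usage: */
  struct foo *q = kmalloc(sizeof(*q), GFP_KERNEL);
  q->a = 1;
  rcu_assign_pointer(gbl_foo, q);	/* used as a statement: fine */

  /* r = rcu_assign_pointer(gbl_foo, q);   <- no longer compiles */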
From 2966f8d440c3d2b6ef6c44093a9f9c42701a077a Mon Sep 17 00:00:00 2001
From: Alan Stern
Date: Fri, 3 May 2019 13:13:44 -0400
Subject: Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()

The description of smp_mb__before_atomic() and smp_mb__after_atomic()
in Documentation/atomic_t.txt is slightly terse and misleading.  It
does not clearly state which other instructions are ordered by these
barriers.

This improves the text to make the actual ordering implications clear,
and also to explain how these barriers differ from a RELEASE or
ACQUIRE ordering.

Signed-off-by: Alan Stern
Cc: Jonathan Corbet
Cc: Peter Zijlstra
Acked-by: Andrea Parri
Signed-off-by: Paul E. McKenney
---
 Documentation/atomic_t.txt | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index dca3fb0554db..b3afe69d03a1 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -187,8 +187,14 @@ The barriers:
 
   smp_mb__{before,after}_atomic()
 
-only apply to the RMW ops and can be used to augment/upgrade the ordering
-inherent to the used atomic op. These barriers provide a full smp_mb().
+only apply to the RMW atomic ops and can be used to augment/upgrade the
+ordering inherent to the op. These barriers act almost like a full smp_mb():
+smp_mb__before_atomic() orders all earlier accesses against the RMW op
+itself and all accesses following it, and smp_mb__after_atomic() orders all
+later accesses against the RMW op and all accesses preceding it. However,
+accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
+ordered, so it is advisable to place the barrier right next to the RMW atomic
+op whenever possible.
 
 These helper barriers exist because architectures have varying implicit
 ordering on their SMP atomic primitives. For example our TSO architectures
@@ -212,7 +218,9 @@ Further, while something like:
   atomic_dec(&X);
 
 is a 'typical' RELEASE pattern, the barrier is strictly stronger than
-a RELEASE. Similarly for something like:
+a RELEASE because it orders preceding instructions against both the read
+and write parts of the atomic_dec(), and against all following instructions
+as well. Similarly, something like:
 
   atomic_inc(&X);
   smp_mb__after_atomic();
@@ -244,7 +252,8 @@ strictly stronger than ACQUIRE. As illustrated:
 
 This should not happen; but a hypothetical atomic_inc_acquire() --
 (void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
-since then:
+because it would not order the W part of the RMW against the following
+WRITE_ONCE.  Thus:
 
   P1			P2
--
cgit v1.2.3
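A hedged sketch pulling together the two patterns the hunks above
discuss; data and refs are hypothetical variables, and the comments
restate the ordering rules from the new text rather than adding to them:

  #include <linux/atomic.h>

  static atomic_t refs = ATOMIC_INIT(1);
  static int data;

  void release_like(void)
  {
  	WRITE_ONCE(data, 42);		/* earlier access */
  	smp_mb__before_atomic();	/* orders it against the RMW and all later accesses */
  	atomic_dec(&refs);		/* unordered on its own; now stronger than RELEASE */
  }

  void acquire_like(void)
  {
  	atomic_inc(&refs);		/* unordered on its own */
  	smp_mb__after_atomic();		/* orders later accesses against the RMW and earlier ones */
  	(void)READ_ONCE(data);		/* cannot be reordered before the inc */
  }

Note that both barriers sit immediately next to their RMW op, as the new
text advises: accesses placed between the barrier and the op would not
be ordered.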