From 6b05dfacd761c6ace11def4b3b42fc6a7583fec3 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 21 Apr 2020 19:04:02 +0200 Subject: docs: RCU: Convert checklist.txt to ReST - Add a SPDX header; - Adjust document title; - Some whitespace fixes and new line breaks; - Use the right list markups; - Add it to RCU/index.rst. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Paul E. McKenney --- Documentation/RCU/checklist.rst | 465 ++++++++++++++++++++++++++++++++++++++++ Documentation/RCU/checklist.txt | 458 --------------------------------------- Documentation/RCU/index.rst | 3 + 3 files changed, 468 insertions(+), 458 deletions(-) create mode 100644 Documentation/RCU/checklist.rst delete mode 100644 Documentation/RCU/checklist.txt (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst new file mode 100644 index 000000000000..2efed9926c3f --- /dev/null +++ b/Documentation/RCU/checklist.rst @@ -0,0 +1,465 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================ +Review Checklist for RCU Patches +================================ + + +This document contains a checklist for producing and reviewing patches +that make use of RCU. Violating any of the rules listed below will +result in the same sorts of problems that leaving out a locking primitive +would cause. This list is based on experiences reviewing such patches +over a rather long period of time, but improvements are always welcome! + +0. Is RCU being applied to a read-mostly situation? If the data + structure is updated more than about 10% of the time, then you + should strongly consider some other approach, unless detailed + performance measurements show that RCU is nonetheless the right + tool for the job. Yes, RCU does reduce read-side overhead by + increasing write-side overhead, which is exactly why normal uses + of RCU will do much more reading than updating. + + Another exception is where performance is not an issue, and RCU + provides a simpler implementation. An example of this situation + is the dynamic NMI code in the Linux 2.6 kernel, at least on + architectures where NMIs are rare. + + Yet another exception is where the low real-time latency of RCU's + read-side primitives is critically important. + + One final exception is where RCU readers are used to prevent + the ABA problem (https://en.wikipedia.org/wiki/ABA_problem) + for lockless updates. This does result in the mildly + counter-intuitive situation where rcu_read_lock() and + rcu_read_unlock() are used to protect updates, however, this + approach provides the same potential simplifications that garbage + collectors do. + +1. Does the update code have proper mutual exclusion? + + RCU does allow -readers- to run (almost) naked, but -writers- must + still use some sort of mutual exclusion, such as: + + a. locking, + b. atomic operations, or + c. restricting updates to a single task. + + If you choose #b, be prepared to describe how you have handled + memory barriers on weakly ordered machines (pretty much all of + them -- even x86 allows later loads to be reordered to precede + earlier stores), and be prepared to explain why this added + complexity is worthwhile. If you choose #c, be prepared to + explain how this single task does not become a major bottleneck on + big multiprocessor machines (for example, if the task is updating + information relating to itself that other tasks can read, there + by definition can be no bottleneck). 
Note that the definition + of "large" has changed significantly: Eight CPUs was "large" + in the year 2000, but a hundred CPUs was unremarkable in 2017. + +2. Do the RCU read-side critical sections make proper use of + rcu_read_lock() and friends? These primitives are needed + to prevent grace periods from ending prematurely, which + could result in data being unceremoniously freed out from + under your read-side code, which can greatly increase the + actuarial risk of your kernel. + + As a rough rule of thumb, any dereference of an RCU-protected + pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(), + rcu_read_lock_sched(), or by the appropriate update-side lock. + Disabling of preemption can serve as rcu_read_lock_sched(), but + is less readable and prevents lockdep from detecting locking issues. + + Letting RCU-protected pointers "leak" out of an RCU read-side + critical section is every bit as bad as letting them leak out + from under a lock. Unless, of course, you have arranged some + other means of protection, such as a lock or a reference count + -before- letting them out of the RCU read-side critical section. + +3. Does the update code tolerate concurrent accesses? + + The whole point of RCU is to permit readers to run without + any locks or atomic operations. This means that readers will + be running while updates are in progress. There are a number + of ways to handle this concurrency, depending on the situation: + + a. Use the RCU variants of the list and hlist update + primitives to add, remove, and replace elements on + an RCU-protected list. Alternatively, use the other + RCU-protected data structures that have been added to + the Linux kernel. + + This is almost always the best approach. + + b. Proceed as in (a) above, but also maintain per-element + locks (that are acquired by both readers and writers) + that guard per-element state. Of course, fields that + the readers refrain from accessing can be guarded by + some other lock acquired only by updaters, if desired. + + This works quite well, also. + + c. Make updates appear atomic to readers. For example, + pointer updates to properly aligned fields will + appear atomic, as will individual atomic primitives. + Sequences of operations performed under a lock will -not- + appear to be atomic to RCU readers, nor will sequences + of multiple atomic primitives. + + This can work, but is starting to get a bit tricky. + + d. Carefully order the updates and the reads so that + readers see valid data at all phases of the update. + This is often more difficult than it sounds, especially + given modern CPUs' tendency to reorder memory references. + One must usually liberally sprinkle memory barriers + (smp_wmb(), smp_rmb(), smp_mb()) through the code, + making it difficult to understand and to test. + + It is usually better to group the changing data into + a separate structure, so that the change may be made + to appear atomic by updating a pointer to reference + a new structure containing updated values. + +4. Weakly ordered CPUs pose special challenges. Almost all CPUs + are weakly ordered -- even x86 CPUs allow later loads to be + reordered to precede earlier stores. RCU code must take all of + the following measures to prevent memory-corruption problems: + + a. Readers must maintain proper ordering of their memory + accesses. The rcu_dereference() primitive ensures that + the CPU picks up the pointer before it picks up the data + that the pointer points to. This really is necessary + on Alpha CPUs.
If you don't believe me, see: + + http://www.openvms.compaq.com/wizard/wiz_2637.html + + The rcu_dereference() primitive is also an excellent + documentation aid, letting the person reading the + code know exactly which pointers are protected by RCU. + Please note that compilers can also reorder code, and + they are becoming increasingly aggressive about doing + just that. The rcu_dereference() primitive therefore also + prevents destructive compiler optimizations. However, + with a bit of devious creativity, it is possible to + mishandle the return value from rcu_dereference(). + Please see rcu_dereference.txt in this directory for + more information. + + The rcu_dereference() primitive is used by the + various "_rcu()" list-traversal primitives, such + as the list_for_each_entry_rcu(). Note that it is + perfectly legal (if redundant) for update-side code to + use rcu_dereference() and the "_rcu()" list-traversal + primitives. This is particularly useful in code that + is common to readers and updaters. However, lockdep + will complain if you access rcu_dereference() outside + of an RCU read-side critical section. See lockdep.txt + to learn what to do about this. + + Of course, neither rcu_dereference() nor the "_rcu()" + list-traversal primitives can substitute for a good + concurrency design coordinating among multiple updaters. + + b. If the list macros are being used, the list_add_tail_rcu() + and list_add_rcu() primitives must be used in order + to prevent weakly ordered machines from misordering + structure initialization and pointer planting. + Similarly, if the hlist macros are being used, the + hlist_add_head_rcu() primitive is required. + + c. If the list macros are being used, the list_del_rcu() + primitive must be used to keep list_del()'s pointer + poisoning from inflicting toxic effects on concurrent + readers. Similarly, if the hlist macros are being used, + the hlist_del_rcu() primitive is required. + + The list_replace_rcu() and hlist_replace_rcu() primitives + may be used to replace an old structure with a new one + in their respective types of RCU-protected lists. + + d. Rules similar to (4b) and (4c) apply to the "hlist_nulls" + type of RCU-protected linked lists. + + e. Updates must ensure that initialization of a given + structure happens before pointers to that structure are + publicized. Use the rcu_assign_pointer() primitive + when publicizing a pointer to a structure that can + be traversed by an RCU read-side critical section. + +5. If call_rcu() or call_srcu() is used, the callback function will + be called from softirq context. In particular, it cannot block. + +6. Since synchronize_rcu() can block, it cannot be called + from any sort of irq context. The same rule applies + for synchronize_srcu(), synchronize_rcu_expedited(), and + synchronize_srcu_expedited(). + + The expedited forms of these primitives have the same semantics + as the non-expedited forms, but expediting is both expensive and + (with the exception of synchronize_srcu_expedited()) unfriendly + to real-time workloads. Use of the expedited primitives should + be restricted to rare configuration-change operations that would + not normally be undertaken while a real-time workload is running. + However, real-time workloads can use rcupdate.rcu_normal kernel + boot parameter to completely disable expedited grace periods, + though this might have performance implications. 
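To make items 5 and 6 above concrete, here is a minimal, purely illustrative sketch; the struct foo, foo_list, and foo_lock names and layout are hypothetical rather than taken from any real subsystem. The first removal path defers the free with call_rcu() and therefore works where blocking is forbidden, while the second blocks in synchronize_rcu() and so is limited to process context::

    struct foo {
            struct list_head list;
            struct rcu_head rcu;
            /* ... payload ... */
    };

    static LIST_HEAD(foo_list);
    static DEFINE_SPINLOCK(foo_lock);

    /* Item 5: runs from softirq context, so it must not block. */
    static void foo_free_rcu(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct foo, rcu));
    }

    /* Usable even where blocking is forbidden: defer the free. */
    static void foo_del_deferred(struct foo *fp)
    {
            spin_lock(&foo_lock);
            list_del_rcu(&fp->list);
            spin_unlock(&foo_lock);
            call_rcu(&fp->rcu, foo_free_rcu);
    }

    /* Item 6: synchronize_rcu() may block, so process context only. */
    static void foo_del_synchronous(struct foo *fp)
    {
            spin_lock(&foo_lock);
            list_del_rcu(&fp->list);
            spin_unlock(&foo_lock);
            synchronize_rcu();      /* wait for pre-existing readers */
            kfree(fp);
    }

Either way the element is unlinked before the grace-period machinery is invoked, so no new reader can find it.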
+ + In particular, if you find yourself invoking one of the expedited + primitives repeatedly in a loop, please do everyone a favor: + Restructure your code so that it batches the updates, allowing + a single non-expedited primitive to cover the entire batch. + This will very likely be faster than the loop containing the + expedited primitive, and will be much much easier on the rest + of the system, especially to real-time workloads running on + the rest of the system. + +7. As of v4.20, a given kernel implements only one RCU flavor, + which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y. + If the updater uses call_rcu() or synchronize_rcu(), + then the corresponding readers may use rcu_read_lock() and + rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(), + or any pair of primitives that disables and re-enables preemption, + for example, rcu_read_lock_sched() and rcu_read_unlock_sched(). + If the updater uses synchronize_srcu() or call_srcu(), + then the corresponding readers must use srcu_read_lock() and + srcu_read_unlock(), and with the same srcu_struct. The rules for + the expedited primitives are the same as for their non-expedited + counterparts. Mixing things up will result in confusion and + broken kernels, and has even resulted in an exploitable security + issue. + + One exception to this rule: rcu_read_lock() and rcu_read_unlock() + may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() + in cases where local bottom halves are already known to be + disabled, for example, in irq or softirq context. Commenting + such cases is a must, of course! And the jury is still out on + whether the increased speed is worth it. + +8. Although synchronize_rcu() is slower than is call_rcu(), it + usually results in simpler code. So, unless update performance is + critically important, the updaters cannot block, or the latency of + synchronize_rcu() is visible from userspace, synchronize_rcu() + should be used in preference to call_rcu(). Furthermore, + kfree_rcu() usually results in even simpler code than does + synchronize_rcu() without synchronize_rcu()'s multi-millisecond + latency. So please take advantage of kfree_rcu()'s "fire and + forget" memory-freeing capabilities where it applies. + + An especially important property of the synchronize_rcu() + primitive is that it automatically self-limits: if grace periods + are delayed for whatever reason, then the synchronize_rcu() + primitive will correspondingly delay updates. In contrast, + code using call_rcu() should explicitly limit update rate in + cases where grace periods are delayed, as failing to do so can + result in excessive realtime latencies or even OOM conditions. + + Ways of gaining this self-limiting property when using call_rcu() + include: + + a. Keeping a count of the number of data-structure elements + used by the RCU-protected data structure, including + those waiting for a grace period to elapse. Enforce a + limit on this number, stalling updates as needed to allow + previously deferred frees to complete. Alternatively, + limit only the number awaiting deferred free rather than + the total number of elements. + + One way to stall the updates is to acquire the update-side + mutex. (Don't try this with a spinlock -- other CPUs + spinning on the lock could prevent the grace period + from ever ending.)
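A minimal sketch of this counting-and-stalling approach might look as follows; the foo_pending counter, the FOO_PENDING_MAX limit, and the struct foo layout (a list_head plus an rcu_head, as in the earlier sketch) are hypothetical, chosen only to illustrate the shape of the code::

    #define FOO_PENDING_MAX 1000            /* arbitrary illustrative limit */

    static atomic_t foo_pending;            /* elements awaiting deferred free */
    static DEFINE_MUTEX(foo_update_mutex);  /* update-side mutex */

    static void foo_reclaim(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct foo, rcu));
            atomic_dec(&foo_pending);
    }

    static void foo_remove(struct foo *fp)
    {
            mutex_lock(&foo_update_mutex);
            list_del_rcu(&fp->list);
            /*
             * If too many elements are already waiting, stall this and
             * all subsequent updates (which serialize on the mutex) for
             * one grace period so that the backlog can drain.
             */
            if (atomic_inc_return(&foo_pending) > FOO_PENDING_MAX)
                    synchronize_rcu();
            call_rcu(&fp->rcu, foo_reclaim);
            mutex_unlock(&foo_update_mutex);
    }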
Another way to stall the updates + is for the updates to use a wrapper function around + the memory allocator, so that this wrapper function + simulates OOM when there is too much memory awaiting an + RCU grace period. There are of course many other + variations on this theme. + + b. Limiting update rate. For example, if updates occur only + once per hour, then no explicit rate limiting is + required, unless your system is already badly broken. + Older versions of the dcache subsystem take this approach, + guarding updates with a global lock, limiting their rate. + + c. Trusted update -- if updates can only be done manually by + superuser or some other trusted user, then it might not + be necessary to automatically limit them. The theory + here is that superuser already has lots of ways to crash + the machine. + + d. Periodically invoke synchronize_rcu(), permitting a limited + number of updates per grace period. + + The same cautions apply to call_srcu() and kfree_rcu(). + + Note that although these primitives do take action to avoid memory + exhaustion when any given CPU has too many callbacks, a determined + user could still exhaust memory. This is especially the case + if a system with a large number of CPUs has been configured to + offload all of its RCU callbacks onto a single CPU, or if the + system has relatively little free memory. + +9. All RCU list-traversal primitives, which include + rcu_dereference(), list_for_each_entry_rcu(), and + list_for_each_safe_rcu(), must be either within an RCU read-side + critical section or must be protected by appropriate update-side + locks. RCU read-side critical sections are delimited by + rcu_read_lock() and rcu_read_unlock(), or by similar primitives + such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which + case the matching rcu_dereference() primitive must be used in + order to keep lockdep happy, in this case, rcu_dereference_bh(). + + The reason that it is permissible to use RCU list-traversal + primitives when the update-side lock is held is that doing so + can be quite helpful in reducing code bloat when common code is + shared between readers and updaters. Additional primitives + are provided for this case, as discussed in lockdep.txt. + +10. Conversely, if you are in an RCU read-side critical section, + and you don't hold the appropriate update-side lock, you -must- + use the "_rcu()" variants of the list macros. Failing to do so + will break Alpha, cause aggressive compilers to generate bad code, + and confuse people trying to read your code. + +11. Any lock acquired by an RCU callback must be acquired elsewhere + with softirq disabled, e.g., via spin_lock_irqsave(), + spin_lock_bh(), etc. Failing to disable softirq on a given + acquisition of that lock will result in deadlock as soon as + the RCU softirq handler happens to run your RCU callback while + interrupting that acquisition's critical section. + +12. RCU callbacks can be and are executed in parallel. In many cases, + the callback code simply wrappers around kfree(), so that this + is not an issue (or, more accurately, to the extent that it is + an issue, the memory-allocator locking handles it). However, + if the callbacks do manipulate a shared data structure, they + must use whatever locking or other synchronization is required + to safely access and/or modify that data structure. + + Do not assume that RCU callbacks will be executed on the same + CPU that executed the corresponding call_rcu() or call_srcu(). 
+ For example, if a given CPU goes offline while having an RCU + callback pending, then that RCU callback will execute on some + surviving CPU. (If this was not the case, a self-spawning RCU + callback would prevent the victim CPU from ever going offline.) + Furthermore, CPUs designated by rcu_nocbs= might well -always- + have their RCU callbacks executed on some other CPUs, in fact, + for some real-time workloads, this is the whole point of using + the rcu_nocbs= kernel boot parameter. + +13. Unlike other forms of RCU, it -is- permissible to block in an + SRCU read-side critical section (demarked by srcu_read_lock() + and srcu_read_unlock()), hence the "SRCU": "sleepable RCU". + Please note that if you don't need to sleep in read-side critical + sections, you should be using RCU rather than SRCU, because RCU + is almost always faster and easier to use than is SRCU. + + Also unlike other forms of RCU, explicit initialization and + cleanup is required either at build time via DEFINE_SRCU() + or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct() + and cleanup_srcu_struct(). These last two are passed a + "struct srcu_struct" that defines the scope of a given + SRCU domain. Once initialized, the srcu_struct is passed + to srcu_read_lock(), srcu_read_unlock() synchronize_srcu(), + synchronize_srcu_expedited(), and call_srcu(). A given + synchronize_srcu() waits only for SRCU read-side critical + sections governed by srcu_read_lock() and srcu_read_unlock() + calls that have been passed the same srcu_struct. This property + is what makes sleeping read-side critical sections tolerable -- + a given subsystem delays only its own updates, not those of other + subsystems using SRCU. Therefore, SRCU is less prone to OOM the + system than RCU would be if RCU's read-side critical sections + were permitted to sleep. + + The ability to sleep in read-side critical sections does not + come for free. First, corresponding srcu_read_lock() and + srcu_read_unlock() calls must be passed the same srcu_struct. + Second, grace-period-detection overhead is amortized only + over those updates sharing a given srcu_struct, rather than + being globally amortized as they are for other forms of RCU. + Therefore, SRCU should be used in preference to rw_semaphore + only in extremely read-intensive situations, or in situations + requiring SRCU's read-side deadlock immunity or low read-side + realtime latency. You should also consider percpu_rw_semaphore + when you need lightweight readers. + + SRCU's expedited primitive (synchronize_srcu_expedited()) + never sends IPIs to other CPUs, so it is easier on + real-time workloads than is synchronize_rcu_expedited(). + + Note that rcu_assign_pointer() relates to SRCU just as it does to + other forms of RCU, but instead of rcu_dereference() you should + use srcu_dereference() in order to avoid lockdep splats. + +14. The whole point of call_rcu(), synchronize_rcu(), and friends + is to wait until all pre-existing readers have finished before + carrying out some otherwise-destructive operation. It is + therefore critically important to -first- remove any path + that readers can follow that could be affected by the + destructive operation, and -only- -then- invoke call_rcu(), + synchronize_rcu(), or friends. + + Because these primitives only wait for pre-existing readers, it + is the caller's responsibility to guarantee that any subsequent + readers will execute safely. + +15. The various RCU read-side primitives do -not- necessarily contain + memory barriers. 
You should therefore plan for the CPU + and the compiler to freely reorder code into and out of RCU + read-side critical sections. It is the responsibility of the + RCU update-side primitives to deal with this. + + For SRCU readers, you can use smp_mb__after_srcu_read_unlock() + immediately after an srcu_read_unlock() to get a full barrier. + +16. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the + __rcu sparse checks to validate your RCU code. These can help + find problems as follows: + + CONFIG_PROVE_LOCKING: + check that accesses to RCU-protected data + structures are carried out under the proper RCU + read-side critical section, while holding the right + combination of locks, or whatever other conditions + are appropriate. + + CONFIG_DEBUG_OBJECTS_RCU_HEAD: + check that you don't pass the + same object to call_rcu() (or friends) before an RCU + grace period has elapsed since the last time that you + passed that same object to call_rcu() (or friends). + + __rcu sparse checks: + tag the pointer to the RCU-protected data + structure with __rcu, and sparse will warn you if you + access that pointer without the services of one of the + variants of rcu_dereference(). + + These debugging aids can help you find problems that are + otherwise extremely difficult to spot. + +17. If you register a callback using call_rcu() or call_srcu(), and + pass in a function defined within a loadable module, then it is + necessary to wait for all pending callbacks to be invoked after + the last invocation and before unloading that module. Note that + it is absolutely -not- sufficient to wait for a grace period! + The current (say) synchronize_rcu() implementation is -not- + guaranteed to wait for callbacks registered on other CPUs. + Or even on the current CPU if that CPU recently went offline + and came back online. + + You instead need to use one of the barrier functions: + + - call_rcu() -> rcu_barrier() + - call_srcu() -> srcu_barrier() + + However, these barrier functions are absolutely -not- guaranteed + to wait for a grace period. In fact, if there are no call_rcu() + callbacks waiting anywhere in the system, rcu_barrier() is within + its rights to return immediately. + + So if you need to wait for both an RCU grace period and for + all pre-existing call_rcu() callbacks, you will need to execute + both rcu_barrier() and synchronize_rcu(), if necessary, using + something like workqueues to execute them concurrently. + + See rcubarrier.txt for more information. diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt deleted file mode 100644 index e98ff261a438..000000000000 --- a/Documentation/RCU/checklist.txt +++ /dev/null @@ -1,458 +0,0 @@ -Review Checklist for RCU Patches - - -This document contains a checklist for producing and reviewing patches -that make use of RCU. Violating any of the rules listed below will -result in the same sorts of problems that leaving out a locking primitive -would cause. This list is based on experiences reviewing such patches -over a rather long period of time, but improvements are always welcome! - -0. Is RCU being applied to a read-mostly situation? If the data - structure is updated more than about 10% of the time, then you - should strongly consider some other approach, unless detailed - performance measurements show that RCU is nonetheless the right - tool for the job.
Yes, RCU does reduce read-side overhead by - increasing write-side overhead, which is exactly why normal uses - of RCU will do much more reading than updating. - - Another exception is where performance is not an issue, and RCU - provides a simpler implementation. An example of this situation - is the dynamic NMI code in the Linux 2.6 kernel, at least on - architectures where NMIs are rare. - - Yet another exception is where the low real-time latency of RCU's - read-side primitives is critically important. - - One final exception is where RCU readers are used to prevent - the ABA problem (https://en.wikipedia.org/wiki/ABA_problem) - for lockless updates. This does result in the mildly - counter-intuitive situation where rcu_read_lock() and - rcu_read_unlock() are used to protect updates, however, this - approach provides the same potential simplifications that garbage - collectors do. - -1. Does the update code have proper mutual exclusion? - - RCU does allow -readers- to run (almost) naked, but -writers- must - still use some sort of mutual exclusion, such as: - - a. locking, - b. atomic operations, or - c. restricting updates to a single task. - - If you choose #b, be prepared to describe how you have handled - memory barriers on weakly ordered machines (pretty much all of - them -- even x86 allows later loads to be reordered to precede - earlier stores), and be prepared to explain why this added - complexity is worthwhile. If you choose #c, be prepared to - explain how this single task does not become a major bottleneck on - big multiprocessor machines (for example, if the task is updating - information relating to itself that other tasks can read, there - by definition can be no bottleneck). Note that the definition - of "large" has changed significantly: Eight CPUs was "large" - in the year 2000, but a hundred CPUs was unremarkable in 2017. - -2. Do the RCU read-side critical sections make proper use of - rcu_read_lock() and friends? These primitives are needed - to prevent grace periods from ending prematurely, which - could result in data being unceremoniously freed out from - under your read-side code, which can greatly increase the - actuarial risk of your kernel. - - As a rough rule of thumb, any dereference of an RCU-protected - pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(), - rcu_read_lock_sched(), or by the appropriate update-side lock. - Disabling of preemption can serve as rcu_read_lock_sched(), but - is less readable and prevents lockdep from detecting locking issues. - - Letting RCU-protected pointers "leak" out of an RCU read-side - critical section is every bid as bad as letting them leak out - from under a lock. Unless, of course, you have arranged some - other means of protection, such as a lock or a reference count - -before- letting them out of the RCU read-side critical section. - -3. Does the update code tolerate concurrent accesses? - - The whole point of RCU is to permit readers to run without - any locks or atomic operations. This means that readers will - be running while updates are in progress. There are a number - of ways to handle this concurrency, depending on the situation: - - a. Use the RCU variants of the list and hlist update - primitives to add, remove, and replace elements on - an RCU-protected list. Alternatively, use the other - RCU-protected data structures that have been added to - the Linux kernel. - - This is almost always the best approach. - - b. 
Proceed as in (a) above, but also maintain per-element - locks (that are acquired by both readers and writers) - that guard per-element state. Of course, fields that - the readers refrain from accessing can be guarded by - some other lock acquired only by updaters, if desired. - - This works quite well, also. - - c. Make updates appear atomic to readers. For example, - pointer updates to properly aligned fields will - appear atomic, as will individual atomic primitives. - Sequences of operations performed under a lock will -not- - appear to be atomic to RCU readers, nor will sequences - of multiple atomic primitives. - - This can work, but is starting to get a bit tricky. - - d. Carefully order the updates and the reads so that - readers see valid data at all phases of the update. - This is often more difficult than it sounds, especially - given modern CPUs' tendency to reorder memory references. - One must usually liberally sprinkle memory barriers - (smp_wmb(), smp_rmb(), smp_mb()) through the code, - making it difficult to understand and to test. - - It is usually better to group the changing data into - a separate structure, so that the change may be made - to appear atomic by updating a pointer to reference - a new structure containing updated values. - -4. Weakly ordered CPUs pose special challenges. Almost all CPUs - are weakly ordered -- even x86 CPUs allow later loads to be - reordered to precede earlier stores. RCU code must take all of - the following measures to prevent memory-corruption problems: - - a. Readers must maintain proper ordering of their memory - accesses. The rcu_dereference() primitive ensures that - the CPU picks up the pointer before it picks up the data - that the pointer points to. This really is necessary - on Alpha CPUs. If you don't believe me, see: - - http://www.openvms.compaq.com/wizard/wiz_2637.html - - The rcu_dereference() primitive is also an excellent - documentation aid, letting the person reading the - code know exactly which pointers are protected by RCU. - Please note that compilers can also reorder code, and - they are becoming increasingly aggressive about doing - just that. The rcu_dereference() primitive therefore also - prevents destructive compiler optimizations. However, - with a bit of devious creativity, it is possible to - mishandle the return value from rcu_dereference(). - Please see rcu_dereference.txt in this directory for - more information. - - The rcu_dereference() primitive is used by the - various "_rcu()" list-traversal primitives, such - as the list_for_each_entry_rcu(). Note that it is - perfectly legal (if redundant) for update-side code to - use rcu_dereference() and the "_rcu()" list-traversal - primitives. This is particularly useful in code that - is common to readers and updaters. However, lockdep - will complain if you access rcu_dereference() outside - of an RCU read-side critical section. See lockdep.txt - to learn what to do about this. - - Of course, neither rcu_dereference() nor the "_rcu()" - list-traversal primitives can substitute for a good - concurrency design coordinating among multiple updaters. - - b. If the list macros are being used, the list_add_tail_rcu() - and list_add_rcu() primitives must be used in order - to prevent weakly ordered machines from misordering - structure initialization and pointer planting. - Similarly, if the hlist macros are being used, the - hlist_add_head_rcu() primitive is required. - - c. 
If the list macros are being used, the list_del_rcu() - primitive must be used to keep list_del()'s pointer - poisoning from inflicting toxic effects on concurrent - readers. Similarly, if the hlist macros are being used, - the hlist_del_rcu() primitive is required. - - The list_replace_rcu() and hlist_replace_rcu() primitives - may be used to replace an old structure with a new one - in their respective types of RCU-protected lists. - - d. Rules similar to (4b) and (4c) apply to the "hlist_nulls" - type of RCU-protected linked lists. - - e. Updates must ensure that initialization of a given - structure happens before pointers to that structure are - publicized. Use the rcu_assign_pointer() primitive - when publicizing a pointer to a structure that can - be traversed by an RCU read-side critical section. - -5. If call_rcu() or call_srcu() is used, the callback function will - be called from softirq context. In particular, it cannot block. - -6. Since synchronize_rcu() can block, it cannot be called - from any sort of irq context. The same rule applies - for synchronize_srcu(), synchronize_rcu_expedited(), and - synchronize_srcu_expedited(). - - The expedited forms of these primitives have the same semantics - as the non-expedited forms, but expediting is both expensive and - (with the exception of synchronize_srcu_expedited()) unfriendly - to real-time workloads. Use of the expedited primitives should - be restricted to rare configuration-change operations that would - not normally be undertaken while a real-time workload is running. - However, real-time workloads can use rcupdate.rcu_normal kernel - boot parameter to completely disable expedited grace periods, - though this might have performance implications. - - In particular, if you find yourself invoking one of the expedited - primitives repeatedly in a loop, please do everyone a favor: - Restructure your code so that it batches the updates, allowing - a single non-expedited primitive to cover the entire batch. - This will very likely be faster than the loop containing the - expedited primitive, and will be much much easier on the rest - of the system, especially to real-time workloads running on - the rest of the system. - -7. As of v4.20, a given kernel implements only one RCU flavor, - which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y. - If the updater uses call_rcu() or synchronize_rcu(), - then the corresponding readers my use rcu_read_lock() and - rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(), - or any pair of primitives that disables and re-enables preemption, - for example, rcu_read_lock_sched() and rcu_read_unlock_sched(). - If the updater uses synchronize_srcu() or call_srcu(), - then the corresponding readers must use srcu_read_lock() and - srcu_read_unlock(), and with the same srcu_struct. The rules for - the expedited primitives are the same as for their non-expedited - counterparts. Mixing things up will result in confusion and - broken kernels, and has even resulted in an exploitable security - issue. - - One exception to this rule: rcu_read_lock() and rcu_read_unlock() - may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() - in cases where local bottom halves are already known to be - disabled, for example, in irq or softirq context. Commenting - such cases is a must, of course! And the jury is still out on - whether the increased speed is worth it. - -8. Although synchronize_rcu() is slower than is call_rcu(), it - usually results in simpler code. 
So, unless update performance is - critically important, the updaters cannot block, or the latency of - synchronize_rcu() is visible from userspace, synchronize_rcu() - should be used in preference to call_rcu(). Furthermore, - kfree_rcu() usually results in even simpler code than does - synchronize_rcu() without synchronize_rcu()'s multi-millisecond - latency. So please take advantage of kfree_rcu()'s "fire and - forget" memory-freeing capabilities where it applies. - - An especially important property of the synchronize_rcu() - primitive is that it automatically self-limits: if grace periods - are delayed for whatever reason, then the synchronize_rcu() - primitive will correspondingly delay updates. In contrast, - code using call_rcu() should explicitly limit update rate in - cases where grace periods are delayed, as failing to do so can - result in excessive realtime latencies or even OOM conditions. - - Ways of gaining this self-limiting property when using call_rcu() - include: - - a. Keeping a count of the number of data-structure elements - used by the RCU-protected data structure, including - those waiting for a grace period to elapse. Enforce a - limit on this number, stalling updates as needed to allow - previously deferred frees to complete. Alternatively, - limit only the number awaiting deferred free rather than - the total number of elements. - - One way to stall the updates is to acquire the update-side - mutex. (Don't try this with a spinlock -- other CPUs - spinning on the lock could prevent the grace period - from ever ending.) Another way to stall the updates - is for the updates to use a wrapper function around - the memory allocator, so that this wrapper function - simulates OOM when there is too much memory awaiting an - RCU grace period. There are of course many other - variations on this theme. - - b. Limiting update rate. For example, if updates occur only - once per hour, then no explicit rate limiting is - required, unless your system is already badly broken. - Older versions of the dcache subsystem take this approach, - guarding updates with a global lock, limiting their rate. - - c. Trusted update -- if updates can only be done manually by - superuser or some other trusted user, then it might not - be necessary to automatically limit them. The theory - here is that superuser already has lots of ways to crash - the machine. - - d. Periodically invoke synchronize_rcu(), permitting a limited - number of updates per grace period. - - The same cautions apply to call_srcu() and kfree_rcu(). - - Note that although these primitives do take action to avoid memory - exhaustion when any given CPU has too many callbacks, a determined - user could still exhaust memory. This is especially the case - if a system with a large number of CPUs has been configured to - offload all of its RCU callbacks onto a single CPU, or if the - system has relatively little free memory. - -9. All RCU list-traversal primitives, which include - rcu_dereference(), list_for_each_entry_rcu(), and - list_for_each_safe_rcu(), must be either within an RCU read-side - critical section or must be protected by appropriate update-side - locks. RCU read-side critical sections are delimited by - rcu_read_lock() and rcu_read_unlock(), or by similar primitives - such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which - case the matching rcu_dereference() primitive must be used in - order to keep lockdep happy, in this case, rcu_dereference_bh(). 
- - The reason that it is permissible to use RCU list-traversal - primitives when the update-side lock is held is that doing so - can be quite helpful in reducing code bloat when common code is - shared between readers and updaters. Additional primitives - are provided for this case, as discussed in lockdep.txt. - -10. Conversely, if you are in an RCU read-side critical section, - and you don't hold the appropriate update-side lock, you -must- - use the "_rcu()" variants of the list macros. Failing to do so - will break Alpha, cause aggressive compilers to generate bad code, - and confuse people trying to read your code. - -11. Any lock acquired by an RCU callback must be acquired elsewhere - with softirq disabled, e.g., via spin_lock_irqsave(), - spin_lock_bh(), etc. Failing to disable softirq on a given - acquisition of that lock will result in deadlock as soon as - the RCU softirq handler happens to run your RCU callback while - interrupting that acquisition's critical section. - -12. RCU callbacks can be and are executed in parallel. In many cases, - the callback code simply wrappers around kfree(), so that this - is not an issue (or, more accurately, to the extent that it is - an issue, the memory-allocator locking handles it). However, - if the callbacks do manipulate a shared data structure, they - must use whatever locking or other synchronization is required - to safely access and/or modify that data structure. - - Do not assume that RCU callbacks will be executed on the same - CPU that executed the corresponding call_rcu() or call_srcu(). - For example, if a given CPU goes offline while having an RCU - callback pending, then that RCU callback will execute on some - surviving CPU. (If this was not the case, a self-spawning RCU - callback would prevent the victim CPU from ever going offline.) - Furthermore, CPUs designated by rcu_nocbs= might well -always- - have their RCU callbacks executed on some other CPUs, in fact, - for some real-time workloads, this is the whole point of using - the rcu_nocbs= kernel boot parameter. - -13. Unlike other forms of RCU, it -is- permissible to block in an - SRCU read-side critical section (demarked by srcu_read_lock() - and srcu_read_unlock()), hence the "SRCU": "sleepable RCU". - Please note that if you don't need to sleep in read-side critical - sections, you should be using RCU rather than SRCU, because RCU - is almost always faster and easier to use than is SRCU. - - Also unlike other forms of RCU, explicit initialization and - cleanup is required either at build time via DEFINE_SRCU() - or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct() - and cleanup_srcu_struct(). These last two are passed a - "struct srcu_struct" that defines the scope of a given - SRCU domain. Once initialized, the srcu_struct is passed - to srcu_read_lock(), srcu_read_unlock() synchronize_srcu(), - synchronize_srcu_expedited(), and call_srcu(). A given - synchronize_srcu() waits only for SRCU read-side critical - sections governed by srcu_read_lock() and srcu_read_unlock() - calls that have been passed the same srcu_struct. This property - is what makes sleeping read-side critical sections tolerable -- - a given subsystem delays only its own updates, not those of other - subsystems using SRCU. Therefore, SRCU is less prone to OOM the - system than RCU would be if RCU's read-side critical sections - were permitted to sleep. - - The ability to sleep in read-side critical sections does not - come for free. 
First, corresponding srcu_read_lock() and - srcu_read_unlock() calls must be passed the same srcu_struct. - Second, grace-period-detection overhead is amortized only - over those updates sharing a given srcu_struct, rather than - being globally amortized as they are for other forms of RCU. - Therefore, SRCU should be used in preference to rw_semaphore - only in extremely read-intensive situations, or in situations - requiring SRCU's read-side deadlock immunity or low read-side - realtime latency. You should also consider percpu_rw_semaphore - when you need lightweight readers. - - SRCU's expedited primitive (synchronize_srcu_expedited()) - never sends IPIs to other CPUs, so it is easier on - real-time workloads than is synchronize_rcu_expedited(). - - Note that rcu_assign_pointer() relates to SRCU just as it does to - other forms of RCU, but instead of rcu_dereference() you should - use srcu_dereference() in order to avoid lockdep splats. - -14. The whole point of call_rcu(), synchronize_rcu(), and friends - is to wait until all pre-existing readers have finished before - carrying out some otherwise-destructive operation. It is - therefore critically important to -first- remove any path - that readers can follow that could be affected by the - destructive operation, and -only- -then- invoke call_rcu(), - synchronize_rcu(), or friends. - - Because these primitives only wait for pre-existing readers, it - is the caller's responsibility to guarantee that any subsequent - readers will execute safely. - -15. The various RCU read-side primitives do -not- necessarily contain - memory barriers. You should therefore plan for the CPU - and the compiler to freely reorder code into and out of RCU - read-side critical sections. It is the responsibility of the - RCU update-side primitives to deal with this. - - For SRCU readers, you can use smp_mb__after_srcu_read_unlock() - immediately after an srcu_read_unlock() to get a full barrier. - -16. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the - __rcu sparse checks to validate your RCU code. These can help - find problems as follows: - - CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data - structures are carried out under the proper RCU - read-side critical section, while holding the right - combination of locks, or whatever other conditions - are appropriate. - - CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the - same object to call_rcu() (or friends) before an RCU - grace period has elapsed since the last time that you - passed that same object to call_rcu() (or friends). - - __rcu sparse checks: tag the pointer to the RCU-protected data - structure with __rcu, and sparse will warn you if you - access that pointer without the services of one of the - variants of rcu_dereference(). - - These debugging aids can help you find problems that are - otherwise extremely difficult to spot. - -17. If you register a callback using call_rcu() or call_srcu(), and - pass in a function defined within a loadable module, then it in - necessary to wait for all pending callbacks to be invoked after - the last invocation and before unloading that module. Note that - it is absolutely -not- sufficient to wait for a grace period! - The current (say) synchronize_rcu() implementation is -not- - guaranteed to wait for callbacks registered on other CPUs. - Or even on the current CPU if that CPU recently went offline - and came back online. 
- - You instead need to use one of the barrier functions: - - o call_rcu() -> rcu_barrier() - o call_srcu() -> srcu_barrier() - - However, these barrier functions are absolutely -not- guaranteed - to wait for a grace period. In fact, if there are no call_rcu() - callbacks waiting anywhere in the system, rcu_barrier() is within - its rights to return immediately. - - So if you need to wait for both an RCU grace period and for - all pre-existing call_rcu() callbacks, you will need to execute - both rcu_barrier() and synchronize_rcu(), if necessary, using - something like workqueues to to execute them concurrently. - - See rcubarrier.txt for more information. diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst index 81a0a1e5f767..c1ba4d130bb0 100644 --- a/Documentation/RCU/index.rst +++ b/Documentation/RCU/index.rst @@ -1,3 +1,5 @@ +.. SPDX-License-Identifier: GPL-2.0 + .. _rcu_concepts: ============ @@ -8,6 +10,7 @@ RCU concepts :maxdepth: 3 arrayRCU + checklist rcubarrier rcu_dereference whatisRCU -- cgit v1.2.3 From a3b0a79f8903f955250505f99d1e37b6c7d7b060 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 21 Apr 2020 19:04:03 +0200 Subject: docs: RCU: Convert lockdep-splat.txt to ReST - Add a SPDX header; - Add a document title; - Some whitespace fixes and new line breaks; - Mark literal blocks as such; - Add it to RCU/index.rst. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Paul E. McKenney --- Documentation/RCU/index.rst | 1 + Documentation/RCU/lockdep-splat.rst | 115 ++++++++++++++++++++++++++++++++++++ Documentation/RCU/lockdep-splat.txt | 110 ---------------------------------- 3 files changed, 116 insertions(+), 110 deletions(-) create mode 100644 Documentation/RCU/lockdep-splat.rst delete mode 100644 Documentation/RCU/lockdep-splat.txt (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst index c1ba4d130bb0..430a37132b2c 100644 --- a/Documentation/RCU/index.rst +++ b/Documentation/RCU/index.rst @@ -11,6 +11,7 @@ RCU concepts arrayRCU checklist + lockdep-splat rcubarrier rcu_dereference whatisRCU diff --git a/Documentation/RCU/lockdep-splat.rst b/Documentation/RCU/lockdep-splat.rst new file mode 100644 index 000000000000..2a5c79db57dc --- /dev/null +++ b/Documentation/RCU/lockdep-splat.rst @@ -0,0 +1,115 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +Lockdep-RCU Splat +================= + +Lockdep-RCU was added to the Linux kernel in early 2010 +(http://lwn.net/Articles/371986/). This facility checks for some common +misuses of the RCU API, most notably using one of the rcu_dereference() +family to access an RCU-protected pointer without the proper protection. +When such misuse is detected, a lockdep-RCU splat is emitted. + +The usual cause of a lockdep-RCU splat is someone accessing an +RCU-protected data structure without either (1) being in the right kind of +RCU read-side critical section or (2) holding the right update-side lock. +This problem can therefore be serious: it might result in random memory +overwriting or worse. There can of course be false positives, this +being the real world and all that. + +So let's look at an example RCU lockdep splat from 3.0-rc5, one that +has long since been fixed:: + + ============================= + WARNING: suspicious RCU usage + ----------------------------- + block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
+ +other info that might help us debug this:: + + rcu_scheduler_active = 1, debug_locks = 0 + 3 locks held by scsi_scan_6/1552: + #0: (&shost->scan_mutex){+.+.}, at: [] + scsi_scan_host_selected+0x5a/0x150 + #1: (&eq->sysfs_lock){+.+.}, at: [] + elevator_exit+0x22/0x60 + #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [] + cfq_exit_queue+0x43/0x190 + + stack backtrace: + Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17 + Call Trace: + [] lockdep_rcu_dereference+0xbb/0xc0 + [] __cfq_exit_single_io_context+0xe9/0x120 + [] cfq_exit_queue+0x7c/0x190 + [] elevator_exit+0x36/0x60 + [] blk_cleanup_queue+0x4a/0x60 + [] scsi_free_queue+0x9/0x10 + [] __scsi_remove_device+0x84/0xd0 + [] scsi_probe_and_add_lun+0x353/0xb10 + [] ? error_exit+0x29/0xb0 + [] ? _raw_spin_unlock_irqrestore+0x3d/0x80 + [] __scsi_scan_target+0x112/0x680 + [] ? trace_hardirqs_off_thunk+0x3a/0x3c + [] ? error_exit+0x29/0xb0 + [] ? kobject_del+0x40/0x40 + [] scsi_scan_channel+0x86/0xb0 + [] scsi_scan_host_selected+0x140/0x150 + [] do_scsi_scan_host+0x89/0x90 + [] do_scan_async+0x20/0x160 + [] ? do_scsi_scan_host+0x90/0x90 + [] kthread+0xa6/0xb0 + [] kernel_thread_helper+0x4/0x10 + [] ? finish_task_switch+0x80/0x110 + [] ? retint_restore_args+0xe/0xe + [] ? __kthread_init_worker+0x70/0x70 + [] ? gs_change+0xb/0xb + +Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:: + + if (rcu_dereference(ioc->ioc_data) == cic) { + +This form says that it must be in a plain vanilla RCU read-side critical +section, but the "other info" list above shows that this is not the +case. Instead, we hold three locks, one of which might be RCU related. +And maybe that lock really does protect this reference. If so, the fix +is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to +take the struct request_queue "q" from cfq_exit_queue() as an argument, +which would permit us to invoke rcu_dereference_protected as follows:: + + if (rcu_dereference_protected(ioc->ioc_data, + lockdep_is_held(&q->queue_lock)) == cic) { + +With this change, there would be no lockdep-RCU splat emitted if this +code was invoked either from within an RCU read-side critical section +or with the ->queue_lock held. In particular, this would have suppressed +the above lockdep-RCU splat because ->queue_lock is held (see #2 in the +list above). + +On the other hand, perhaps we really do need an RCU read-side critical +section. In this case, the critical section must span the use of the +return value from rcu_dereference(), or at least until there is some +reference count incremented or some such. One way to handle this is to +add rcu_read_lock() and rcu_read_unlock() as follows:: + + rcu_read_lock(); + if (rcu_dereference(ioc->ioc_data) == cic) { + spin_lock(&ioc->lock); + rcu_assign_pointer(ioc->ioc_data, NULL); + spin_unlock(&ioc->lock); + } + rcu_read_unlock(); + +With this change, the rcu_dereference() is always within an RCU +read-side critical section, which again would have suppressed the +above lockdep-RCU splat. + +But in this particular case, we don't actually dereference the pointer +returned from rcu_dereference(). Instead, that pointer is just compared +to the cic pointer, which means that the rcu_dereference() can be replaced +by rcu_access_pointer() as follows:: + + if (rcu_access_pointer(ioc->ioc_data) == cic) { + +Because it is legal to invoke rcu_access_pointer() without protection, +this change would also suppress the above lockdep-RCU splat. 
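More generally, the distinction can be pictured with a small hypothetical example; the struct foo and gp pointer below are invented for illustration and are not taken from the cfq code above. Use rcu_access_pointer() when only the pointer value itself is examined, and rcu_dereference() inside an RCU read-side critical section when the pointed-to data is actually accessed::

    struct foo {
            int a;
    };
    static struct foo __rcu *gp;

    /* Only the pointer value is tested, so no RCU protection is needed. */
    static bool foo_is_published(void)
    {
            return rcu_access_pointer(gp) != NULL;
    }

    /* The structure is dereferenced, so a read-side critical section is required. */
    static int foo_read_a(void)
    {
            struct foo *p;
            int a = -1;

            rcu_read_lock();
            p = rcu_dereference(gp);
            if (p)
                    a = p->a;
            rcu_read_unlock();
            return a;
    }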
diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt deleted file mode 100644 index b8096316fd11..000000000000 --- a/Documentation/RCU/lockdep-splat.txt +++ /dev/null @@ -1,110 +0,0 @@ -Lockdep-RCU was added to the Linux kernel in early 2010 -(http://lwn.net/Articles/371986/). This facility checks for some common -misuses of the RCU API, most notably using one of the rcu_dereference() -family to access an RCU-protected pointer without the proper protection. -When such misuse is detected, an lockdep-RCU splat is emitted. - -The usual cause of a lockdep-RCU slat is someone accessing an -RCU-protected data structure without either (1) being in the right kind of -RCU read-side critical section or (2) holding the right update-side lock. -This problem can therefore be serious: it might result in random memory -overwriting or worse. There can of course be false positives, this -being the real world and all that. - -So let's look at an example RCU lockdep splat from 3.0-rc5, one that -has long since been fixed: - -============================= -WARNING: suspicious RCU usage ------------------------------ -block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage! - -other info that might help us debug this: - - -rcu_scheduler_active = 1, debug_locks = 0 -3 locks held by scsi_scan_6/1552: - #0: (&shost->scan_mutex){+.+.}, at: [] -scsi_scan_host_selected+0x5a/0x150 - #1: (&eq->sysfs_lock){+.+.}, at: [] -elevator_exit+0x22/0x60 - #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [] -cfq_exit_queue+0x43/0x190 - -stack backtrace: -Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17 -Call Trace: - [] lockdep_rcu_dereference+0xbb/0xc0 - [] __cfq_exit_single_io_context+0xe9/0x120 - [] cfq_exit_queue+0x7c/0x190 - [] elevator_exit+0x36/0x60 - [] blk_cleanup_queue+0x4a/0x60 - [] scsi_free_queue+0x9/0x10 - [] __scsi_remove_device+0x84/0xd0 - [] scsi_probe_and_add_lun+0x353/0xb10 - [] ? error_exit+0x29/0xb0 - [] ? _raw_spin_unlock_irqrestore+0x3d/0x80 - [] __scsi_scan_target+0x112/0x680 - [] ? trace_hardirqs_off_thunk+0x3a/0x3c - [] ? error_exit+0x29/0xb0 - [] ? kobject_del+0x40/0x40 - [] scsi_scan_channel+0x86/0xb0 - [] scsi_scan_host_selected+0x140/0x150 - [] do_scsi_scan_host+0x89/0x90 - [] do_scan_async+0x20/0x160 - [] ? do_scsi_scan_host+0x90/0x90 - [] kthread+0xa6/0xb0 - [] kernel_thread_helper+0x4/0x10 - [] ? finish_task_switch+0x80/0x110 - [] ? retint_restore_args+0xe/0xe - [] ? __kthread_init_worker+0x70/0x70 - [] ? gs_change+0xb/0xb - -Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows: - - if (rcu_dereference(ioc->ioc_data) == cic) { - -This form says that it must be in a plain vanilla RCU read-side critical -section, but the "other info" list above shows that this is not the -case. Instead, we hold three locks, one of which might be RCU related. -And maybe that lock really does protect this reference. If so, the fix -is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to -take the struct request_queue "q" from cfq_exit_queue() as an argument, -which would permit us to invoke rcu_dereference_protected as follows: - - if (rcu_dereference_protected(ioc->ioc_data, - lockdep_is_held(&q->queue_lock)) == cic) { - -With this change, there would be no lockdep-RCU splat emitted if this -code was invoked either from within an RCU read-side critical section -or with the ->queue_lock held. In particular, this would have suppressed -the above lockdep-RCU splat because ->queue_lock is held (see #2 in the -list above). 
- -On the other hand, perhaps we really do need an RCU read-side critical -section. In this case, the critical section must span the use of the -return value from rcu_dereference(), or at least until there is some -reference count incremented or some such. One way to handle this is to -add rcu_read_lock() and rcu_read_unlock() as follows: - - rcu_read_lock(); - if (rcu_dereference(ioc->ioc_data) == cic) { - spin_lock(&ioc->lock); - rcu_assign_pointer(ioc->ioc_data, NULL); - spin_unlock(&ioc->lock); - } - rcu_read_unlock(); - -With this change, the rcu_dereference() is always within an RCU -read-side critical section, which again would have suppressed the -above lockdep-RCU splat. - -But in this particular case, we don't actually dereference the pointer -returned from rcu_dereference(). Instead, that pointer is just compared -to the cic pointer, which means that the rcu_dereference() can be replaced -by rcu_access_pointer() as follows: - - if (rcu_access_pointer(ioc->ioc_data) == cic) { - -Because it is legal to invoke rcu_access_pointer() without protection, -this change would also suppress the above lockdep-RCU splat. -- cgit v1.2.3 From 058cc23bcad08aca62987cc795fe406ac39146d0 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 21 Apr 2020 19:04:04 +0200 Subject: docs: RCU: Convert lockdep.txt to ReST - Add a SPDX header; - Adjust document title; - Mark literal blocks as such; - Add it to RCU/index.rst. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Paul E. McKenney --- Documentation/RCU/index.rst | 1 + Documentation/RCU/lockdep.rst | 116 ++++++++++++++++++++++++++++++++++++++++++ Documentation/RCU/lockdep.txt | 112 ---------------------------------------- 3 files changed, 117 insertions(+), 112 deletions(-) create mode 100644 Documentation/RCU/lockdep.rst delete mode 100644 Documentation/RCU/lockdep.txt (limited to 'Documentation/RCU') diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst index 430a37132b2c..fa7a2a8949b7 100644 --- a/Documentation/RCU/index.rst +++ b/Documentation/RCU/index.rst @@ -11,6 +11,7 @@ RCU concepts arrayRCU checklist + lockdep lockdep-splat rcubarrier rcu_dereference diff --git a/Documentation/RCU/lockdep.rst b/Documentation/RCU/lockdep.rst new file mode 100644 index 000000000000..f1fc8ae3846a --- /dev/null +++ b/Documentation/RCU/lockdep.rst @@ -0,0 +1,116 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +RCU and lockdep checking +======================== + +All flavors of RCU have lockdep checking available, so that lockdep is +aware of when each task enters and leaves any flavor of RCU read-side +critical section. Each flavor of RCU is tracked separately (but note +that this is not the case in 2.6.32 and earlier). This allows lockdep's +tracking to include RCU state, which can sometimes help when debugging +deadlocks and the like. + +In addition, RCU provides the following primitives that check lockdep's +state:: + + rcu_read_lock_held() for normal RCU. + rcu_read_lock_bh_held() for RCU-bh. + rcu_read_lock_sched_held() for RCU-sched. + srcu_read_lock_held() for SRCU. + +These functions are conservative, and will therefore return 1 if they +aren't certain (for example, if CONFIG_DEBUG_LOCK_ALLOC is not set). +This prevents things like WARN_ON(!rcu_read_lock_held()) from giving false +positives when lockdep is disabled. + +In addition, a separate kernel config parameter CONFIG_PROVE_RCU enables +checking of rcu_dereference() primitives: + + rcu_dereference(p): + Check for RCU read-side critical section. 
+ rcu_dereference_bh(p): + Check for RCU-bh read-side critical section. + rcu_dereference_sched(p): + Check for RCU-sched read-side critical section. + srcu_dereference(p, sp): + Check for SRCU read-side critical section. + rcu_dereference_check(p, c): + Use explicit check expression "c" along with + rcu_read_lock_held(). This is useful in code that is + invoked by both RCU readers and updaters. + rcu_dereference_bh_check(p, c): + Use explicit check expression "c" along with + rcu_read_lock_bh_held(). This is useful in code that + is invoked by both RCU-bh readers and updaters. + rcu_dereference_sched_check(p, c): + Use explicit check expression "c" along with + rcu_read_lock_sched_held(). This is useful in code that + is invoked by both RCU-sched readers and updaters. + srcu_dereference_check(p, c): + Use explicit check expression "c" along with + srcu_read_lock_held(). This is useful in code that + is invoked by both SRCU readers and updaters. + rcu_dereference_raw(p): + Don't check. (Use sparingly, if at all.) + rcu_dereference_protected(p, c): + Use explicit check expression "c", and omit all barriers + and compiler constraints. This is useful when the data + structure cannot change, for example, in code that is + invoked only by updaters. + rcu_access_pointer(p): + Return the value of the pointer and omit all barriers, + but retain the compiler constraints that prevent duplicating + or coalescing. This is useful when testing the + value of the pointer itself, for example, against NULL. + +The rcu_dereference_check() check expression can be any boolean +expression, but would normally include a lockdep expression. However, +any boolean expression can be used. For a moderately ornate example, +consider the following:: + + file = rcu_dereference_check(fdt->fd[fd], + lockdep_is_held(&files->file_lock) || + atomic_read(&files->count) == 1); + +This expression picks up the pointer "fdt->fd[fd]" in an RCU-safe manner, +and, if CONFIG_PROVE_RCU is configured, verifies that this expression +is used in: + +1. An RCU read-side critical section (implicit), or +2. with files->file_lock held, or +3. on an unshared files_struct. + +In case (1), the pointer is picked up in an RCU-safe manner for vanilla +RCU read-side critical sections, in case (2) the ->file_lock prevents +any change from taking place, and finally, in case (3) the current task +is the only task accessing the file_struct, again preventing any change +from taking place. If the above statement was invoked only from updater +code, it could instead be written as follows:: + + file = rcu_dereference_protected(fdt->fd[fd], + lockdep_is_held(&files->file_lock) || + atomic_read(&files->count) == 1); + +This would verify cases #2 and #3 above, and furthermore lockdep would +complain if this was used in an RCU read-side critical section unless one +of these two cases held. Because rcu_dereference_protected() omits all +barriers and compiler constraints, it generates better code than do the +other flavors of rcu_dereference(). On the other hand, it is illegal +to use rcu_dereference_protected() if either the RCU-protected pointer +or the RCU-protected data that it points to can change concurrently. + +Like rcu_dereference(), when lockdep is enabled, RCU list and hlist +traversal primitives check for being called from within an RCU read-side +critical section. However, a lockdep expression can be passed to them +as an additional optional argument.
With this lockdep expression, these +traversal primitives will complain only if the lockdep expression is +false and they are called from outside any RCU read-side critical section. + +For example, the workqueue for_each_pwq() macro is intended to be used +either within an RCU read-side critical section or with wq->mutex held. +It is thus implemented as follows:: + + #define for_each_pwq(pwq, wq) + list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node, + lock_is_held(&(wq->mutex).dep_map)) diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt deleted file mode 100644 index 89db949eeca0..000000000000 --- a/Documentation/RCU/lockdep.txt +++ /d