diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2017-09-04 11:52:29 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-09-04 11:52:29 -0700 |
commit | 5f82e71a001d14824a7728ad9e49f6aea420f161 (patch) | |
tree | bf5dfa7cf0840ec834899ae925913973bd1e65d1 /Documentation | |
parent | 6c51e67b64d169419fb13318035bb442f9176612 (diff) | |
parent | edc2988c548db05e33b921fed15821010bc74895 (diff) |
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/atomic_bitops.txt | 66 | ||||
-rw-r--r-- | Documentation/atomic_t.txt | 242 | ||||
-rw-r--r-- | Documentation/locking/crossrelease.txt | 874 | ||||
-rw-r--r-- | Documentation/memory-barriers.txt | 101 | ||||
-rw-r--r-- | Documentation/static-keys.txt | 20 | ||||
-rw-r--r-- | Documentation/translations/ko_KR/memory-barriers.txt | 5 |
6 files changed, 1211 insertions, 97 deletions
diff --git a/Documentation/atomic_bitops.txt b/Documentation/atomic_bitops.txt new file mode 100644 index 000000000000..5550bfdcce5f --- /dev/null +++ b/Documentation/atomic_bitops.txt @@ -0,0 +1,66 @@ + +On atomic bitops. + + +While our bitmap_{}() functions are non-atomic, we have a number of operations +operating on single bits in a bitmap that are atomic. + + +API +--- + +The single bit operations are: + +Non-RMW ops: + + test_bit() + +RMW atomic operations without return value: + + {set,clear,change}_bit() + clear_bit_unlock() + +RMW atomic operations with return value: + + test_and_{set,clear,change}_bit() + test_and_set_bit_lock() + +Barriers: + + smp_mb__{before,after}_atomic() + + +All RMW atomic operations have a '__' prefixed variant which is non-atomic. + + +SEMANTICS +--------- + +Non-atomic ops: + +In particular __clear_bit_unlock() suffers the same issue as atomic_set(), +which is why the generic version maps to clear_bit_unlock(), see atomic_t.txt. + + +RMW ops: + +The test_and_{}_bit() operations return the original value of the bit. + + +ORDERING +-------- + +Like with atomic_t, the rule of thumb is: + + - non-RMW operations are unordered; + + - RMW operations that have no return value are unordered; + + - RMW operations that have a return value are fully ordered. + +Except for test_and_set_bit_lock() which has ACQUIRE semantics and +clear_bit_unlock() which has RELEASE semantics. + +Since a platform only has a single means of achieving atomic operations +the same barriers as for atomic_t are used, see atomic_t.txt. + diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt new file mode 100644 index 000000000000..913396ac5824 --- /dev/null +++ b/Documentation/atomic_t.txt @@ -0,0 +1,242 @@ + +On atomic types (atomic_t atomic64_t and atomic_long_t). + +The atomic type provides an interface to the architecture's means of atomic +RMW operations between CPUs (atomic operations on MMIO are not supported and +can lead to fatal traps on some platforms). + +API +--- + +The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for +brevity): + +Non-RMW ops: + + atomic_read(), atomic_set() + atomic_read_acquire(), atomic_set_release() + + +RMW atomic operations: + +Arithmetic: + + atomic_{add,sub,inc,dec}() + atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}() + atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}() + + +Bitwise: + + atomic_{and,or,xor,andnot}() + atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}() + + +Swap: + + atomic_xchg{,_relaxed,_acquire,_release}() + atomic_cmpxchg{,_relaxed,_acquire,_release}() + atomic_try_cmpxchg{,_relaxed,_acquire,_release}() + + +Reference count (but please see refcount_t): + + atomic_add_unless(), atomic_inc_not_zero() + atomic_sub_and_test(), atomic_dec_and_test() + + +Misc: + + atomic_inc_and_test(), atomic_add_negative() + atomic_dec_unless_positive(), atomic_inc_unless_negative() + + +Barriers: + + smp_mb__{before,after}_atomic() + + + +SEMANTICS +--------- + +Non-RMW ops: + +The non-RMW ops are (typically) regular LOADs and STOREs and are canonically +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and +smp_store_release() respectively. + +The one detail to this is that atomic_set{}() should be observable to the RMW +ops. That is: + + C atomic-set + + { + atomic_set(v, 1); + } + + P1(atomic_t *v) + { + atomic_add_unless(v, 1, 0); + } + + P2(atomic_t *v) + { + atomic_set(v, 0); + } + + exists + (v=2) + +In this case we would expect the atomic_set() from CPU1 to either happen +before the atomic_add_unless(), in which case that latter one would no-op, or +_after_ in which case we'd overwrite its result. In no case is "2" a valid +outcome. + +This is typically true on 'normal' platforms, where a regular competing STORE +will invalidate a LL/SC or fail a CMPXCHG. + +The obvious case where this is not so is when we need to implement atomic ops +with a lock: + + CPU0 CPU1 + + atomic_add_unless(v, 1, 0); + lock(); + ret = READ_ONCE(v->counter); // == 1 + atomic_set(v, 0); + if (ret != u) WRITE_ONCE(v->counter, 0); + WRITE_ONCE(v->counter, ret + 1); + unlock(); + +the typical solution is to then implement atomic_set{}() with atomic_xchg(). + + +RMW ops: + +These come in various forms: + + - plain operations without return value: atomic_{}() + + - operations which return the modified value: atomic_{}_return() + + these are limited to the arithmetic operations because those are + reversible. Bitops are irreversible and therefore the modified value + is of dubious utility. + + - operations which return the original value: atomic_fetch_{}() + + - swap operations: xchg(), cmpxchg() and try_cmpxchg() + + - misc; the special purpose operations that are commonly used and would, + given the interface, normally be implemented using (try_)cmpxchg loops but + are time critical and can, (typically) on LL/SC architectures, be more + efficiently implemented. + +All these operations are SMP atomic; that is, the operations (for a single +atomic variable) can be fully ordered and no intermediate state is lost or +visible. + + +ORDERING (go read memory-barriers.txt first) +-------- + +The rule of thumb: + + - non-RMW operations are unordered; + + - RMW operations that have no return value are unordered; + + - RMW operations that have a return value are fully ordered; + + - RMW operations that are conditional are unordered on FAILURE, + otherwise the above rules apply. + +Except of course when an operation has an explicit ordering like: + + {}_relaxed: unordered + {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE + {}_release: the W of the RMW (or atomic_set) is a RELEASE + +Where 'unordered' is against other memory locations. Address dependencies are +not defeated. + +Fully ordered primitives are ordered against everything prior and everything +subsequent. Therefore a fully ordered primitive is like having an smp_mb() +before and an smp_mb() after the primitive. + + +The barriers: + + smp_mb__{before,after}_atomic() + +only apply to the RMW ops and can be used to augment/upgrade the ordering +inherent to the used atomic op. These barriers provide a full smp_mb(). + +These helper barriers exist because architectures have varying implicit +ordering on their SMP atomic primitives. For example our TSO architectures +provide full ordered atomics and these barriers are no-ops. + +Thus: + + atomic_fetch_add(); + +is equivalent to: + + smp_mb__before_atomic(); + atomic_fetch_add_relaxed(); + smp_mb__after_atomic(); + +However the atomic_fetch_add() might be implemented more efficiently. + +Further, while something like: + + smp_mb__before_atomic(); + atomic_dec(&X); + +is a 'typical' RELEASE pattern, the barrier is strictly stronger than +a RELEASE. Similarly for something like: + + atomic_inc(&X); + smp_mb__after_atomic(); + +is an ACQUIRE pattern (though very much not typical), but again the barrier is +strictly stronger than ACQUIRE. As illustrated: + + C strong-acquire + + { + } + + P1(int *x, atomic_t *y) + { + r0 = READ_ONCE(*x); + smp_rmb(); + r1 = atomic_read(y); + } + + P2(int *x, atomic_t *y) + { + atomic_inc(y); + smp_mb__after_atomic(); + WRITE_ONCE(*x, 1); + } + + exists + (r0=1 /\ r1=0) + +This should not happen; but a hypothetical atomic_inc_acquire() -- +(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome, +since then: + + P1 P2 + + t = LL.acq *y (0) + t++; + *x = 1; + r0 = *x (1) + RMB + r1 = *y (0) + SC *y, t; + +is allowed. diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt new file mode 100644 index 000000000000..bdf1423d5f99 --- /dev/null +++ b/Documentation/locking/crossrelease.txt @@ -0,0 +1,874 @@ +Crossrelease +============ + +Started by Byungchul Park <byungchul.park@lge.com> + +Contents: + + (*) Background + + - What causes deadlock + - How lockdep works + + (*) Limitation + + - Limit lockdep + - Pros from the limitation + - Cons from the limitation + - Relax the limitation + + (*) Crossrelease + + - Introduce crossrelease + - Introduce commit + + (*) Implementation + + - Data structures + - How crossrelease works + + (*) Optimizations + + - Avoid duplication + - Lockless for hot paths + + (*) APPENDIX A: What lockdep does to work aggresively + + (*) APPENDIX B: How to avoid adding false dependencies + + +========== +Background +========== + +What causes deadlock +-------------------- + +A deadlock occurs when a context is waiting for an event to happen, +which is impossible because another (or the) context who can trigger the +event is also waiting for another (or the) event to happen, which is +also impossible due to the same reason. + +For example: + + A context going to trigger event C is waiting for event A to happen. + A context going to trigger event A is waiting for event B to happen. + A context going to trigger event B is waiting for event C to happen. + +A deadlock occurs when these three wait operations run at the same time, +because event C cannot be triggered if event A does not happen, which in +turn cannot be triggered if event B does not happen, which in turn +cannot be triggered if event C does not happen. After all, no event can +be triggered since any of them never meets its condition to wake up. + +A dependency might exist between two waiters and a deadlock might happen +due to an incorrect releationship between dependencies. Thus, we must +define what a dependency is first. A dependency exists between them if: + + 1. There are two waiters waiting for each event at a given time. + 2. The only way to wake up each waiter is to trigger its event. + 3. Whether one can be woken up depends on whether the other can. + +Each wait in the example creates its dependency like: + + Event C depends on event A. + Event A depends on event B. + Event B depends on event C. + + NOTE: Precisely speaking, a dependency is one between whether a + waiter for an event can be woken up and whether another waiter for + another event can be woken up. However from now on, we will describe + a dependency as if it's one between an event and another event for + simplicity. + +And they form circular dependencies like: + + -> C -> A -> B - + / \ + \ / + ---------------- + + where 'A -> B' means that event A depends on event B. + +Such circular dependencies lead to a deadlock since no waiter can meet +its condition to wake up as described. + +CONCLUSION + +Circular dependencies cause a deadlock. + + +How lockdep works +----------------- + +Lockdep tries to detect a deadlock by checking dependencies created by +lock operations, acquire and release. Waiting for a lock corresponds to +waiting for an event, and releasing a lock corresponds to triggering an +event in the previous section. + +In short, lockdep does: + + 1. Detect a new dependency. + 2. Add the dependency into a global graph. + 3. Check if that makes dependencies circular. + 4. Report a deadlock or its possibility if so. + +For example, consider a graph built by lockdep that looks like: + + A -> B - + \ + -> E + / + C -> D - + + where A, B,..., E are different lock classes. + +Lockdep will add a dependency into the graph on detection of a new +dependency. For example, it will add a dependency 'E -> C' when a new +dependency between lock E and lock C is detected. Then the graph will be: + + A -> B - + \ + -> E - + / \ + -> C -> D - \ + / / + \ / + ------------------ + + where A, B,..., E are different lock classes. + +This graph contains a subgraph which demonstrates circular dependencies: + + -> E - + / \ + -> C -> D - \ + / / + \ / + ------------------ + + where C, D and E are different lock classes. + +This is the condition under which a deadlock might occur. Lockdep +reports it on detection after adding a new dependency. This is the way +how lockdep works. + +CONCLUSION + +Lockdep detects a deadlock or its possibility by checking if circular +dependencies were created after adding each new dependency. + + +========== +Limitation +========== + +Limit lockdep +------------- + +Limiting lockdep to work on only typical locks e.g. spin locks and +mutexes, which are released within the acquire context, the +implementation becomes simple but its capacity for detection becomes +limited. Let's check pros and cons in next section. + + +Pros from the limitation +------------------------ + +Given the limitation, when acquiring a lock, locks in a held_locks +cannot be released if the context cannot acquire it so has to wait to +acquire it, which means all waiters for the locks in the held_locks are +stuck. It's an exact case to create dependencies between each lock in +the held_locks and the lock to acquire. + +For example: + + CONTEXT X + --------- + acquire A + acquire B /* Add a dependency 'A -> B' */ + release B + release A + + where A and B are different lock classes. + +When acquiring lock A, the held_locks of CONTEXT X is empty thus no +dependency is added. But when acquiring lock B, lockdep detects and adds +a new dependency 'A -> B' between lock A in the held_locks and lock B. +They can be simply added whenever acquiring each lock. + +And data required by lockdep exists in a local structure, held_locks +embedded in task_struct. Forcing to access the data within the context, +lockdep can avoid racy problems without explicit locks while handling +the local data. + +Lastly, lockdep only needs to keep locks currently being held, to build +a dependency graph. However, relaxing the limitation, it needs to keep +even locks already released, because a decision whether they created +dependencies might be long-deferred. + +To sum up, we can expect several advantages from the limitation: + + 1. Lockdep can easily identify a dependency when acquiring a lock. + 2. Races are avoidable while accessing local locks in a held_locks. + 3. Lockdep only needs to keep locks currently being held. + +CONCLUSION + +Given the limitation, the implementation becomes simple and efficient. + + +Cons from the limitation +------------------------ + +Given the limitation, lockdep is applicable only to typical locks. For +example, page locks for page access or completions for synchronization +cannot work with lockdep. + +Can we detect deadlocks below, under the limitation? + +Example 1: + + CONTEXT X CONTEXT Y CONTEXT Z + --------- --------- ---------- + mutex_lock A + lock_page B + lock_page B + mutex_lock A /* DEADLOCK */ + unlock_page B held by X + unlock_page B + mutex_unlock A + mutex_unlock A + + where A and B are different lock classes. + +No, we cannot. + +Example 2: + + CONTEXT X CONTEXT Y + --------- --------- + mutex_lock A + mutex_lock A + wait_for_complete B /* DEADLOCK */ + complete B + mutex_unlock A + mutex_unlock A + + where A is a lock class and B is a completion variable. + +No, we cannot. + +CONCLUSION + +Given the limitation, lockdep cannot detect a deadlock or its +possibility caused by page locks or completions. + + +Relax the limitation +-------------------- + +Under the limitation, things to create dependencies are limited to +typical locks. However, synchronization primitives like page locks and +completions, which are allowed to be released in any context, also +create dependencies and can cause a deadlock. So lockdep should track +these locks to do a better job. We have to relax the limitation for +these locks to work with lockdep. + +Detecting dependencies is very important for lockdep to work because +adding a dependency means adding an opportunity to check whether it +causes a deadlock. The more lockdep adds dependencies, the more it +thoroughly works. Thus Lockdep has to do its best to detect and add as +many true dependencies into a graph as possible. + +For example, considering only typical locks, lockdep builds a graph like: + + A -> B - + \ + -> E + / + C -> D - + + where A, B,..., E are different lock classes. + +On the other hand, under the relaxation, additional dependencies might +be created and added. Assuming additional 'FX -> C' and 'E -> GX' are +added thanks to the relaxation, the graph will be: + + A -> B - + \ + -> E -> GX + / + FX -> C -> D - + + where A, B,..., E, FX and GX are different lock classes, and a suffix + 'X' is added on non-typical locks. + +The latter graph gives us more chances to check circular dependencies +than the former. However, it might suffer performance degradation since +relaxing the limitation, with which design and implementation of lockdep +can be efficient, might introduce inefficiency inevitably. So lockdep +should provide two options, strong detection and efficient detection. + +Choosing efficient detection: + + Lockdep works with only locks restricted to be released within the + acquire context. However, lockdep works efficiently. + +Choosing strong detection: + + Lockdep works with all synchronization primitives. However, lockdep + suffers performance degradation. + +CONCLUSION + +Relaxing the limitation, lockdep can add additional dependencies giving +additional opportunities to check circular dependencies. + + +============ +Crossrelease +============ + +Introduce crossrelease +---------------------- + +In order to allow lockdep to handle additional dependencies by what +might be released in any context, namely 'crosslock', we have to be able +to identify those created by crosslocks. The proposed 'crossrelease' +feature provoides a way to do that. + +Crossrelease feature has to do: + + 1. Identify dependencies created by crosslocks. + 2. Add the dependencies into a dependency graph. + +That's all. Once a meaningful dependency is added into graph, then +lockdep would work with the graph as it did. The most important thing +crossrelease feature has to do is to correctly identify and add true +dependencies into the global graph. + +A dependency e.g. 'A -> B' can be identified only in the A's release +context because a decision required to identify the dependency can be +made only in the release context. That is to decide whether A can be +released so that a waiter for A can be woken up. It cannot be made in +other than the A's release context. + +It's no matter for typical locks because each acquire context is same as +its release context, thus lockdep can decide whether a lock can be +released in the acquire context. However for crosslocks, lockdep cannot +make the decision in the acquire context but has to wait until the +release context is identified. + +Therefore, deadlocks by crosslocks cannot be detected just when it +happens, because those cannot be identified until the crosslocks are +released. However, deadlock possibilities can be detected and it's very +worth. See 'APPENDIX A' section to check why. + +CONCLUSION + +Using crossrelease feature, lockdep can work with what might be released +in any context, namely crosslock. + + +Introduce commit +---------------- + +Since crossrelease defers the work adding true dependencies of +crosslocks until they are actually released, crossrelease has to queue +all acquisitions which might create dependencies with the crosslocks. +Then it identifies dependencies using the queued data in batches at a +proper time. We call it 'commit'. + +There are four types of dependencies: + +1. TT type: 'typical lock A -> typical lock B' + + Just when acquiring B, lockdep can see it's in the A's release + context. So the dependency between A and B can be identified + immediately. Commit is unnecessary. + +2. TC type: 'typical lock A -> crosslock BX' + + Just when acquiring BX, lockdep can see it's in the A's release + context. So the dependency between A and BX can be identified + immediately. Commit is unnecessary, too. + +3. CT type: 'crosslock AX -> typical lock B' + + When acquiring B, lockdep cannot identify the dependency because + there's no way to know if it's in the AX's release context. It has + to wait until the decision can be made. Commit is necessary. + +4. CC type: 'crosslock AX -> crosslock BX' + + When acquiring BX, lockdep cannot identify the dependency because + there's no way to know if it's in the AX's release context. It has + to wait until the decision can be made. Commit is necessary. + But, handling CC type is not implemented yet. It's a future work. + +Lockdep can work without commit for typical locks, but commit step is +necessary once crosslocks are involved. Introducing commit, lockdep +performs three steps. What lockdep does in each step is: + +1. Acquisition: For typical locks, lockdep does what it originally did + and queues the lock so that CT type dependencies can be checked using + it at the commit step. For crosslocks, it saves data which will be + used at the commit step and increases a reference count for it. + +2. Commit: No action is reauired for typical locks. For crosslocks, + lockdep adds CT type dependencies using the data saved at the + acquisition step. + +3. Release: No changes are required for typical locks. When a crosslock + is released, it decreases a reference count for it. + +CONCLUSION + +Crossrelease introduces commit step to handle dependencies of crosslocks +in batches at a proper time. + + +============== +Implementation +============== + +Data structures +--------------- + +Crossrelease introduces two main data structures. + +1. hist_lock + + This is an array embedded in task_struct, for keeping lock history so + that dependencies can be added using them at the commit step. Since + it's local data, it can be accessed locklessly in the owner context. + The array is filled at the acquisition step and consumed at the + commit step. And it's managed in circular manner. + +2. cross_lock + + One per lockdep_map exists. This is for keeping data of crosslocks + and used at the commit step. + + +How crossrelease works +---------------------- + +It's the key of how crossrelease works, to defer necessary works to an +appropriate point in time and perform in at once at the commit step. +Let's take a look with examples step by step, starting from how lockdep +works without crossrelease for typical locks. + + acquire A /* Push A onto held_locks */ + acquire B /* Push B onto held_locks and add 'A -> B' */ + acquire C /* Push C onto held_locks and add 'B -> C' */ + release C /* Pop C from held_locks */ + release B /* Pop B from held_locks */ + release A /* Pop A from held_locks */ + + where A, B and C are different lock classes. + + NOTE: This document assumes that readers already understand how + lockdep works without crossrelease thus omits details. But there's + one thing to note. Lockdep pretends to pop a lock from held_locks + when releasing it. But it's subtly different from the original pop + operation because lockdep allows other than the top to be poped. + +In this case, lockdep adds 'the top of held_locks -> the lock to acquire' +dependency every time acquiring a lock. + +After adding 'A -> B', a dependency graph will be: + + A -> B + + where A and B are different lock classes. + +And after adding 'B -> C', the graph will be: + + A -> B -> C + + where A, B and C are different lock classes. + +Let's performs commit step even for typical locks to add dependencies. +Of course, commit step is not necessary for them, however, it would work +well because this is a more general way. + + acquire A + /* + * Queue A into hist_locks + * + * In hist_locks: A + * In graph: Empty + */ + + acquire B + /* + * Queue B into hist_locks + * + * In hist_locks: A, B + * In graph: Empty + */ + + acquire C + /* + * Queue C into hist_locks + * + * In hist_locks: A, B, C + * In graph: Empty + */ + + commit C + /* + * Add 'C -> ?' + * Answer the following to decide '?' + * What has been queued since acquire C: Nothing + * + * In hist_locks: A, B, C + * In graph: Empty + */ + + release C + + commit B + /* + * Add 'B -> ?' + * Answer the following to decide '?' + * What has been queued since acquire B: C + * + * In hist_locks: A, B, C + * In graph: 'B -> C' + */ + + release B + + commit A + /* + * Add 'A -> ?' + * Answer the following to decide '?' + * What has been queued since acquire A: B, C + * + * In hist_locks: A, B, C + * In graph: 'B -> C', 'A -> B', 'A -> C' + */ + + release A + + where A, B and C are different lock classes. + +In this case, dependencies are added at the commit step as described. + +After commits for A, B and C, the graph will be: + + A -> B -> C + + where A, B and C are different lock classes. + + NOTE: A dependency 'A -> C' is optimized out. + +We can see the former graph built without commit step is same as the +latter graph built using commit steps. Of course the former way leads to +earlier finish for building the graph, which means we can detect a +deadlock or its possibility sooner. So the former way would be prefered +when possible. But we cannot avoid using the latter way for crosslocks. + +Let's look at how commit steps work for crosslocks. In this case, the +commit step is performed only on crosslock AX as real. And it assumes +that the AX release context is different from the AX acquire context. + + BX RELEASE CONTEXT BX ACQUIRE CONTEXT + ------------------ ------------------ + acquire A + /* + * Push A onto held_locks + * Queue A into hist_locks + * + * In held_locks: A + * In hist_locks: A + * In graph: Empty + */ + + acquire BX + /* + * Add 'the top of held_locks -> BX' + * + * In held_locks: A + * In hist_locks: A + * In graph: 'A -> BX' + */ + + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + It must be guaranteed that the following operations are seen after + acquiring BX globally. It can be done by things like barrier. + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + acquire C + /* + * Push C onto held_locks + * Queue C into hist_locks + * + * In held_locks: C + * In hist_locks: C + * In graph: 'A -> BX' + */ + + release C + /* + * Pop C from held_locks + * + * In held_locks: Empty + * In hist_locks: C + * In graph: 'A -> BX' + */ + acquire D + /* + * Push D onto held_locks + * Queue D into hist_locks + * Add 'the top of held_locks -> D' + * + * In held_locks: A, D + * In hist_locks: A, D + * In graph: 'A -> BX', 'A -> D' + */ + acquire E + /* + * Push E onto held_locks + * Queue E into hist_locks + * + * In held_locks: E + * In hist_locks: C, E + * In graph: 'A -> BX', 'A -> D' + */ + + release E + /* + * Pop E from held_locks + * + * In held_locks: Empty + * In hist_locks: D, E + * In graph: 'A -> BX', 'A -> D' + */ + release D + /* + * Pop D from held_locks + * + * In held_locks: A + * In hist_locks: A, D + * In graph: 'A -> BX', 'A -> D' + */ + commit BX + /* + * Add 'BX -> ?' + * What has been queued since acquire BX: C, E + * + * In held_locks: Empty + * In hist_locks: D, E + * In graph: 'A -> BX', 'A -> D', + * 'BX -> C', 'BX -> E' + */ + + release BX + /* + * In held_locks: Empty + * In hist_locks: D, E + * In graph: 'A -> BX', 'A -> D', + * 'BX -> C', 'BX -> E' + */ + release A + /* + * Pop A from held_locks + * + * In held_locks: Empty + * In hist_locks: A, D + * In graph: 'A -> BX', 'A -> D', + * 'BX -> C', 'BX -> E' + */ + + where A, BX, C,..., E are different lock classes, and a suffix 'X' is + added on crosslocks. + +Crossrelease considers all acquisitions after acqiuring BX are +candidates which might create dependencies with BX. True dependencies +will be determined when identifying the release context of BX. Meanwhile, +all typical locks are queued so that they can be used at the commit step. +And then two dependencies 'BX -> C' and 'BX -> E' are added at the +commit step when identifying the release context. + +The final graph will be, with crossrelease: + + -> C + / + -> BX - + / \ + A - -> E + \ + -> D + + where A, BX, C,..., E are different lock classes, and a suffix 'X' is + added on crosslocks. + +However, the final graph will be, without crossrelease: + + A -> D + + where A and D are different lock classes. + +The former graph has three more dependencies, 'A -> BX', 'BX -> C' and +'BX -> E' giving additional opportunities to check if they cause +deadlocks. This way lockdep can detect a deadlock or its possibility +caused by crosslocks. + +CONCLUSION + +We checked how crossrelease works with several examples. + + +============= +Optimizations +============= + +Avoid duplication +----------------- + +Crossrelease feature uses a cache like what lockdep already uses for +dependency chains, but this time it's for caching CT type dependencies. +Once that dependency is cached, the same will never be added again. + + +Lockless for hot paths +---------------------- + +To keep all locks for later use at the commit step, crossrelease adopts +a local array embedded in task_struct, which makes access to the data +lockless by forcing it to happen only within the owner context. It's +like how lockdep handles held_locks. Lockless implmentation is important +since typical locks are very frequently acquired and released. + + +================================================= +APPENDIX A: What lockdep does to work aggresively |