summaryrefslogtreecommitdiffstats
path: root/tools
diff options
context:
space:
mode:
Diffstat (limited to 'tools')
-rw-r--r--tools/memory-model/Documentation/cheatsheet.txt30
-rw-r--r--tools/memory-model/Documentation/explanation.txt1840
-rw-r--r--tools/memory-model/Documentation/recipes.txt570
-rw-r--r--tools/memory-model/Documentation/references.txt107
-rw-r--r--tools/memory-model/MAINTAINERS15
-rw-r--r--tools/memory-model/README220
-rw-r--r--tools/memory-model/linux-kernel.bell53
-rw-r--r--tools/memory-model/linux-kernel.cat124
-rw-r--r--tools/memory-model/linux-kernel.cfg21
-rw-r--r--tools/memory-model/linux-kernel.def108
-rw-r--r--tools/memory-model/litmus-tests/CoRR+poonceonce+Once.litmus19
-rw-r--r--tools/memory-model/litmus-tests/CoRW+poonceonce+Once.litmus18
-rw-r--r--tools/memory-model/litmus-tests/CoWR+poonceonce+Once.litmus18
-rw-r--r--tools/memory-model/litmus-tests/CoWW+poonceonce.litmus11
-rw-r--r--tools/memory-model/litmus-tests/IRIW+mbonceonces+OnceOnce.litmus35
-rw-r--r--tools/memory-model/litmus-tests/IRIW+poonceonces+OnceOnce.litmus33
-rw-r--r--tools/memory-model/litmus-tests/ISA2+poonceonces.litmus28
-rw-r--r--tools/memory-model/litmus-tests/ISA2+pooncerelease+poacquirerelease+poacquireonce.litmus28
-rw-r--r--tools/memory-model/litmus-tests/LB+ctrlonceonce+mbonceonce.litmus23
-rw-r--r--tools/memory-model/litmus-tests/LB+poacquireonce+pooncerelease.litmus21
-rw-r--r--tools/memory-model/litmus-tests/LB+poonceonces.litmus21
-rw-r--r--tools/memory-model/litmus-tests/MP+onceassign+derefonce.litmus25
-rw-r--r--tools/memory-model/litmus-tests/MP+polocks.litmus24
-rw-r--r--tools/memory-model/litmus-tests/MP+poonceonces.litmus20
-rw-r--r--tools/memory-model/litmus-tests/MP+pooncerelease+poacquireonce.litmus20
-rw-r--r--tools/memory-model/litmus-tests/MP+porevlocks.litmus24
-rw-r--r--tools/memory-model/litmus-tests/MP+wmbonceonce+rmbonceonce.litmus22
-rw-r--r--tools/memory-model/litmus-tests/R+mbonceonces.litmus21
-rw-r--r--tools/memory-model/litmus-tests/R+poonceonces.litmus19
-rw-r--r--tools/memory-model/litmus-tests/README125
-rw-r--r--tools/memory-model/litmus-tests/S+poonceonces.litmus19
-rw-r--r--tools/memory-model/litmus-tests/S+wmbonceonce+poacquireonce.litmus20
-rw-r--r--tools/memory-model/litmus-tests/SB+mbonceonces.litmus23
-rw-r--r--tools/memory-model/litmus-tests/SB+poonceonces.litmus21
-rw-r--r--tools/memory-model/litmus-tests/WRC+poonceonces+Once.litmus27
-rw-r--r--tools/memory-model/litmus-tests/WRC+pooncerelease+rmbonceonce+Once.litmus28
-rw-r--r--tools/memory-model/litmus-tests/Z6.0+pooncelock+poonceLock+pombonce.litmus33
-rw-r--r--tools/memory-model/litmus-tests/Z6.0+pooncelock+pooncelock+pombonce.litmus32
-rw-r--r--tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus28
-rw-r--r--tools/memory-model/lock.cat99
40 files changed, 3973 insertions, 0 deletions
diff --git a/tools/memory-model/Documentation/cheatsheet.txt b/tools/memory-model/Documentation/cheatsheet.txt
new file mode 100644
index 000000000000..1917712bce99
--- /dev/null
+++ b/tools/memory-model/Documentation/cheatsheet.txt
@@ -0,0 +1,30 @@
+ Prior Operation Subsequent Operation
+ --------------- ---------------------------
+ C Self R W RWM Self R W DR DW RMW SV
+ __ ---- - - --- ---- - - -- -- --- --
+
+Store, e.g., WRITE_ONCE() Y Y
+Load, e.g., READ_ONCE() Y Y Y
+Unsuccessful RMW operation Y Y Y
+smp_read_barrier_depends() Y Y Y
+*_dereference() Y Y Y Y
+Successful *_acquire() R Y Y Y Y Y Y
+Successful *_release() C Y Y Y W Y
+smp_rmb() Y R Y Y R
+smp_wmb() Y W Y Y W
+smp_mb() & synchronize_rcu() CP Y Y Y Y Y Y Y Y
+Successful full non-void RMW CP Y Y Y Y Y Y Y Y Y Y Y
+smp_mb__before_atomic() CP Y Y Y a a a a Y
+smp_mb__after_atomic() CP a a Y Y Y Y Y
+
+
+Key: C: Ordering is cumulative
+ P: Ordering propagates
+ R: Read, for example, READ_ONCE(), or read portion of RMW
+ W: Write, for example, WRITE_ONCE(), or write portion of RMW
+ Y: Provides ordering
+ a: Provides ordering given intervening RMW atomic operation
+ DR: Dependent read (address dependency)
+ DW: Dependent write (address, data, or control dependency)
+ RMW: Atomic read-modify-write operation
+ SV Same-variable access
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
new file mode 100644
index 000000000000..867e0ea69b6d
--- /dev/null
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -0,0 +1,1840 @@
+Explanation of the Linux-Kernel Memory Model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:Author: Alan Stern <stern@rowland.harvard.edu>
+:Created: October 2017
+
+.. Contents
+
+ 1. INTRODUCTION
+ 2. BACKGROUND
+ 3. A SIMPLE EXAMPLE
+ 4. A SELECTION OF MEMORY MODELS
+ 5. ORDERING AND CYCLES
+ 6. EVENTS
+ 7. THE PROGRAM ORDER RELATION: po AND po-loc
+ 8. A WARNING
+ 9. DEPENDENCY RELATIONS: data, addr, and ctrl
+ 10. THE READS-FROM RELATION: rf, rfi, and rfe
+ 11. CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
+ 12. THE FROM-READS RELATION: fr, fri, and fre
+ 13. AN OPERATIONAL MODEL
+ 14. PROPAGATION ORDER RELATION: cumul-fence
+ 15. DERIVATION OF THE LKMM FROM THE OPERATIONAL MODEL
+ 16. SEQUENTIAL CONSISTENCY PER VARIABLE
+ 17. ATOMIC UPDATES: rmw
+ 18. THE PRESERVED PROGRAM ORDER RELATION: ppo
+ 19. AND THEN THERE WAS ALPHA
+ 20. THE HAPPENS-BEFORE RELATION: hb
+ 21. THE PROPAGATES-BEFORE RELATION: pb
+ 22. RCU RELATIONS: link, gp-link, rscs-link, and rcu-path
+ 23. ODDS AND ENDS
+
+
+
+INTRODUCTION
+------------
+
+The Linux-kernel memory model (LKMM) is rather complex and obscure.
+This is particularly evident if you read through the linux-kernel.bell
+and linux-kernel.cat files that make up the formal version of the
+memory model; they are extremely terse and their meanings are far from
+clear.
+
+This document describes the ideas underlying the LKMM. It is meant
+for people who want to understand how the memory model was designed.
+It does not go into the details of the code in the .bell and .cat
+files; rather, it explains in English what the code expresses
+symbolically.
+
+Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
+toward beginners; they explain what memory models are and the basic
+notions shared by all such models. People already familiar with these
+concepts can skim or skip over them. Sections 6 (EVENTS) through 12
+(THE FROM_READS RELATION) describe the fundamental relations used in
+many memory models. Starting in Section 13 (AN OPERATIONAL MODEL),
+the workings of the LKMM itself are covered.
+
+Warning: The code examples in this document are not written in the
+proper format for litmus tests. They don't include a header line, the
+initializations are not enclosed in braces, the global variables are
+not passed by pointers, and they don't have an "exists" clause at the
+end. Converting them to the right format is left as an exercise for
+the reader.
+
+
+BACKGROUND
+----------
+
+A memory consistency model (or just memory model, for short) is
+something which predicts, given a piece of computer code running on a
+particular kind of system, what values may be obtained by the code's
+load instructions. The LKMM makes these predictions for code running
+as part of the Linux kernel.
+
+In practice, people tend to use memory models the other way around.
+That is, given a piece of code and a collection of values specified
+for the loads, the model will predict whether it is possible for the
+code to run in such a way that the loads will indeed obtain the
+specified values. Of course, this is just another way of expressing
+the same idea.
+
+For code running on a uniprocessor system, the predictions are easy:
+Each load instruction must obtain the value written by the most recent
+store instruction accessing the same location (we ignore complicating
+factors such as DMA and mixed-size accesses.) But on multiprocessor
+systems, with multiple CPUs making concurrent accesses to shared
+memory locations, things aren't so simple.
+
+Different architectures have differing memory models, and the Linux
+kernel supports a variety of architectures. The LKMM has to be fairly
+permissive, in the sense that any behavior allowed by one of these
+architectures also has to be allowed by the LKMM.
+
+
+A SIMPLE EXAMPLE
+----------------
+
+Here is a simple example to illustrate the basic concepts. Consider
+some code running as part of a device driver for an input device. The
+driver might contain an interrupt handler which collects data from the
+device, stores it in a buffer, and sets a flag to indicate the buffer
+is full. Running concurrently on a different CPU might be a part of
+the driver code being executed by a process in the midst of a read(2)
+system call. This code tests the flag to see whether the buffer is
+ready, and if it is, copies the data back to userspace. The buffer
+and the flag are memory locations shared between the two CPUs.
+
+We can abstract out the important pieces of the driver code as follows
+(the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
+assignment statements is discussed later):
+
+ int buf = 0, flag = 0;
+
+ P0()
+ {
+ WRITE_ONCE(buf, 1);
+ WRITE_ONCE(flag, 1);
+ }
+
+ P1()
+ {
+ int r1;
+ int r2 = 0;
+
+ r1 = READ_ONCE(flag);
+ if (r1)
+ r2 = READ_ONCE(buf);
+ }
+
+Here the P0() function represents the interrupt handler running on one
+CPU and P1() represents the read() routine running on another. The
+value 1 stored in buf represents input data collected from the device.
+Thus, P0 stores the data in buf and then sets flag. Meanwhile, P1
+reads flag into the private variable r1, and if it is set, reads the
+data from buf into a second private variable r2 for copying to
+userspace. (Presumably if flag is not set then the driver will wait a
+while and try again.)
+
+This pattern of memory accesses, where one CPU stores values to two
+shared memory locations and another CPU loads from those locations in
+the opposite order, is widely known as the "Message Passing" or MP
+pattern. It is typical of memory access patterns in the kernel.
+
+Please note that this example code is a simplified abstraction. Real
+buffers are usually larger than a single integer, real device drivers
+usually use sleep and wakeup mechanisms rather than polling for I/O
+completion, and real code generally doesn't bother to copy values into
+private variables before using them. All that is beside the point;
+the idea here is simply to illustrate the overall pattern of memory
+accesses by the CPUs.
+
+A memory model will predict what values P1 might obtain for its loads
+from flag and buf, or equivalently, what values r1 and r2 might end up
+with after the code has finished running.
+
+Some predictions are trivial. For instance, no sane memory model would
+predict that r1 = 42 or r2 = -7, because neither of those values ever
+gets stored in flag or buf.
+
+Some nontrivial predictions are nonetheless quite simple. For
+instance, P1 might run entirely before P0 begins, in which case r1 and
+r2 will both be 0 at the end. Or P0 might run entirely before P1
+begins, in which case r1 and r2 will both be 1.
+
+The interesting predictions concern what might happen when the two
+routines run concurrently. One possibility is that P1 runs after P0's
+store to buf but before the store to flag. In this case, r1 and r2
+will again both be 0. (If P1 had been designed to read buf
+unconditionally then we would instead have r1 = 0 and r2 = 1.)
+
+However, the most interesting possibility is where r1 = 1 and r2 = 0.
+If this were to occur it would mean the driver contains a bug, because
+incorrect data would get sent to the user: 0 instead of 1. As it
+happens, the LKMM does predict this outcome can occur, and the example
+driver code shown above is indeed buggy.
+
+
+A SELECTION OF MEMORY MODELS
+----------------------------
+
+The first widely cited memory model, and the simplest to understand,
+is Sequential Consistency. According to this model, systems behave as
+if each CPU executed its instructions in order but with unspecified
+timing. In other words, the instructions from the various CPUs get
+interleaved in a nondeterministic way, always according to some single
+global order that agrees with the order of the instructions in the
+program source for each CPU. The model says that the value obtained
+by each load is simply the value written by the most recently executed
+store to the same memory location, from any CPU.
+
+For the MP example code shown above, Sequential Consistency predicts
+that the undesired result r1 = 1, r2 = 0 cannot occur. The reasoning
+goes like this:
+
+ Since r1 = 1, P0 must store 1 to flag before P1 loads 1 from
+ it, as loads can obtain values only from earlier stores.
+
+ P1 loads from flag before loading from buf, since CPUs execute
+ their instructions in order.
+
+ P1 must load 0 from buf before P0 stores 1 to it; otherwise r2
+ would be 1 since a load obtains its value from the most recent
+ store to the same address.
+
+ P0 stores 1 to buf before storing 1 to flag, since it executes
+ its instructions in order.
+
+ Since an instruction (in this case, P1's store to flag) cannot
+ execute before itself, the specified outcome is impossible.
+
+However, real computer hardware almost never follows the Sequential
+Consistency memory model; doing so would rule out too many valuable
+performance optimizations. On ARM and PowerPC architectures, for
+instance, the MP example code really does sometimes yield r1 = 1 and
+r2 = 0.
+
+x86 and SPARC follow yet a different memory model: TSO (Total Store
+Ordering). This model predicts that the undesired outcome for the MP
+pattern cannot occur, but in other respects it differs from Sequential
+Consistency. One example is the Store Buffer (SB) pattern, in which
+each CPU stores to its own shared location and then loads from the
+other CPU's location:
+
+ int x = 0, y = 0;
+
+ P0()
+ {
+ int r0;
+
+ WRITE_ONCE(x, 1);
+ r0 = READ_ONCE(y);
+ }
+
+ P1()
+ {
+ int r1;
+
+ WRITE_ONCE(y, 1);
+ r1 = READ_ONCE(x);
+ }
+
+Sequential Consistency predicts that the outcome r0 = 0, r1 = 0 is
+impossible. (Exercise: Figure out the reasoning.) But TSO allows
+this outcome to occur, and in fact it does sometimes occur on x86 and
+SPARC systems.
+
+The LKMM was inspired by the memory models followed by PowerPC, ARM,
+x86, Alpha, and other architectures. However, it is different in
+detail from each of them.
+
+
+ORDERING AND CYCLES
+-------------------
+
+Memory models are all about ordering. Often this is temporal ordering
+(i.e., the order in which certain events occur) but it doesn't have to
+be; consider for example the order of instructions in a program's
+source code. We saw above that Sequential Consistency makes an
+important assumption that CPUs execute instructions in the same order
+as those instructions occur in the code, and there are many other
+instances of ordering playing central roles in memory models.
+
+The counterpart to ordering is a cycle. Ordering rules out cycles:
+It's not possible to have X ordered before Y, Y ordered before Z, and
+Z ordered before X, because this would mean that X is ordered before
+itself. The analysis of the MP example under Sequential Consistency
+involved just such an impossible cycle:
+
+ W: P0 stores 1 to flag executes before
+ X: P1 loads 1 from flag executes before
+ Y: P1 loads 0 from buf executes before
+ Z: P0 stores 1 to buf executes before
+ W: P0 stores 1 to flag.
+
+In short, if a memory model requires certain accesses to be ordered,
+and a certain outcome for the loads in a piece of code can happen only
+if those accesses would form a cycle, then the memory model predicts
+that outcome cannot occur.
+
+The LKMM is defined largely in terms of cycles, as we will see.
+
+
+EVENTS
+------
+
+The LKMM does not work directly with the C statements that make up
+kernel source code. Instead it considers the effects of those
+statements in a more abstract form, namely, events. The model
+includes three types of events:
+
+ Read events correspond to loads from shared memory, such as
+ calls to READ_ONCE(), smp_load_acquire(), or
+ rcu_dereference().
+
+ Write events correspond to stores to shared memory, such as
+ calls to WRITE_ONCE(), smp_store_release(), or atomic_set().
+
+ Fence events correspond to memory barriers (also known as
+ fences), such as calls to smp_rmb() or rcu_read_lock().
+
+These categories are not exclusive; a read or write event can also be
+a fence. This happens with functions like smp_load_acquire() or
+spin_lock(). However, no single event can be both a read and a write.
+Atomic read-modify-write accesses, such as atomic_inc() or xchg(),
+correspond to a pair of events: a read followed by a write. (The
+write event is omitted for executions where it doesn't occur, such as
+a cmpxchg() where the comparison fails.)
+
+Other parts of the code, those which do not involve interaction with
+shared memory, do not give rise to events. Thus, arithmetic and
+logical computations, control-flow instructions, or accesses to
+private memory or CPU registers are not of central interest to the
+memory model. They only affect the model's predictions indirectly.
+For example, an arithmetic computation might determine the value that
+gets stored to a shared memory location (or in the case of an array
+index, the address where the value gets stored), but the memory model
+is concerned only with the store itself -- its value and its address
+-- not the computation leading up to it.
+
+Events in the LKMM can be linked by various relations, which we will
+describe in the following sections. The memory model requires certain
+of these relations to be orderings, that is, it requires them not to
+have any cycles.
+
+
+THE PROGRAM ORDER RELATION: po AND po-loc
+-----------------------------------------
+
+The most important relation between events is program order (po). You
+can think of it as the order in which statements occur in the source
+code after branches are taken into account and loops have been
+unrolled. A better description might be the order in which
+instructions are presented to a CPU's execution unit. Thus, we say
+that X is po-before Y (written as "X ->po Y" in formulas) if X occurs
+before Y in the instruction stream.
+
+This is inherently a single-CPU relation; two instructions executing
+on different CPUs are never linked by po. Also, it is by definition
+an ordering so it cannot have any cycles.
+
+po-loc is a sub-relation of po. It links two memory accesses when the
+first comes before the second in program order and they access the
+same memory location (the "-loc" suffix).
+
+Although this may seem straightforward, there is one subtle aspect to
+program order we need to explain. The LKMM was inspired by low-level
+architectural memory models which describe the behavior of machine
+code, and it retains their outlook to a considerable extent. The
+read, write, and fence events used by the model are close in spirit to
+individual machine instructions. Nevertheless, the LKMM describes
+kernel code written in C, and the mapping from C to machine code can
+be extremely complex.
+
+Optimizing compilers have great freedom in the way they translate
+source code to object code. They are allowed to apply transformations
+that add memory accesses, eliminate accesses, combine them, split them
+into pieces, or move them around. Faced with all these possibilities,
+the LKMM basically gives up. It insists that the code it analyzes
+must contain no ordinary accesses to shared memory; all accesses must
+be performed using READ_ONCE(), WRITE_ONCE(), or one of the other
+atomic or synchronization primitives. These primitives prevent a
+large number of compiler optimizations. In particular, it is
+guaranteed that the compiler will not remove such accesses from the
+generated code (unless it can prove the accesses will never be
+executed), it will not change the order in which they occur in the
+code (within limits imposed by the C standard), and it will not
+introduce extraneous accesses.
+
+This explains why the MP and SB examples above used READ_ONCE() and
+WRITE_ONCE() rather than ordinary memory accesses. Thanks to this
+usage, we can be certain that in the MP example, P0's write event to
+buf really is po-before its write event to flag, and similarly for the
+other shared memory accesses in the examples.
+
+Private variables are not subject to this restriction. Since they are
+not shared between CPUs, they can be accessed normally without
+READ_ONCE() or WRITE_ONCE(), and there will be no ill effects. In
+fact, they need not even be stored in normal memory at all -- in
+principle a private variable could be stored in a CPU register (hence
+the convention that these variables have names starting with the
+letter 'r').
+
+
+A WARNING
+---------
+
+The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
+not perfect; and under some circumstances it is possible for the
+compiler to undermine the memory model. Here is an example. Suppose
+both branches of an "if" statement store the same value to the same
+location:
+
+ r1 = READ_ONCE(x);
+ if (r1) {
+ WRITE_ONCE(y, 2);
+ ... /* do something */
+ } else {
+ WRITE_ONCE(y, 2);
+ ... /* do something else */
+ }
+
+For this code, the LKMM predicts that the load from x will always be
+executed before either of the stores to y. However, a compiler could
+lift the stores out of the conditional, transforming the code into
+something resembling:
+
+ r1 = READ_ONCE(x);
+ WRITE_ONCE(y, 2);
+ if (r1) {
+ ... /* do something */
+ } else {
+ ... /* do something else */
+ }
+
+Given this version of the code, the LKMM would predict that the load
+from x could be executed after the store to y. Thus, the memory
+model's original prediction could be invalidated by the compiler.
+
+Another issue arises from the fact that in C, arguments to many
+operators and function calls can be evaluated in any order. For
+example:
+
+ r1 = f(5) + g(6);
+
+The object code might call f(5) either before or after g(6); the
+memory model cannot assume there is a fixed program order relation
+between them. (In fact, if the functions are inlined then the
+compiler might even interleave their object code.)
+
+
+DEPENDENCY RELATIONS: data, addr, and ctrl
+------------------------------------------
+
+We say that two events are linked by a dependency relation when the
+execution of the second event depends in some way on a value obtained
+from memory by the first. The first event must be a read, and the
+value it obtains must somehow affect what the second event does.
+There are three kinds of dependencies: data, address (addr), and
+control (ctrl).
+
+A read and a write event are linked by a data dependency if the value
+obtained by the read affects the value stored by the write. As a very
+simple example:
+
+ int x, y;
+
+ r1 = READ_ONCE(x);
+ WRITE_ONCE(y, r1 + 5);
+
+The value stored by the WRITE_ONCE obviously depends on the value
+loaded by the READ_ONCE. Such dependencies can wind through
+arbitrarily complicated computations, and a write can depend on the
+values of multiple reads.
+
+A read event and another memory access event are linked by an address
+dependency if the value obtained by the read affects the location
+accessed by the other event. The second event can be either a read or
+a write. Here's another simple example:
+
+ int a[20];
+ int i;
+
+ r1 = READ_ONCE(i);
+ r2 = READ_ONCE(a[r1]);
+
+Here the location accessed by the second READ_ONCE() depends on the
+index value loaded by the first. Pointer indirection also gives rise
+to address dependencies, since the address of a location accessed
+through a pointer will depend on the value read earlier from that
+pointer.
+
+Finally, a read event and another memory access event are linked by a
+control dependency if the value obtained by the read affects whether
+the second event is executed at all. Simple example:
+
+ int x, y;
+
+ r1 = READ_ONCE(x);
+ if (r1)
+ WRITE_ONCE(y, 1984);
+
+Execution of the WRITE_ONCE() is controlled by a conditional expression
+which depends on the value obtained by the READ_ONCE(); hence there is
+a control dependency from the load to the store.
+
+It should be pretty obvious that events can only depend on reads that
+come earlier in program order. Symbolically, if we have R ->data X,
+R ->addr X, or R ->ctrl X (where R is a read event), then we must also
+have R ->po X. It wouldn't make sense for a computation to depend
+somehow on a value that doesn't get loaded from shared memory until
+later in the code!
+
+
+THE READS-FROM RELATION: rf, rfi, and rfe
+-----------------------------------------
+
+The reads-from relation (rf) links a write event to a read event when
+the value loaded by the read is the value that was stored by the
+write. In colloquial terms, the load "reads from" the store. We
+write W ->rf R to indicate that the load R reads from the store W. We
+further distinguish the cases where the load and the store occur on
+the same CPU (internal reads-from, or rfi) and where they occur on
+different CPUs (external reads-from, or rfe).
+
+For our purposes, a memory location's initial value is treated as
+though it had been written there by an imaginary initial store that
+executes on a separate CPU before the program runs.
+
+Usage of the rf relation implicitly assumes that loads will always
+read from a single store. It doesn't apply properly in the presence
+of load-tearing, where a load obtains some of its bits from one store
+and some of them from another store. Fortunately, use of READ_ONCE()
+and WRITE_ONCE() will prevent load-tearing; it's not possible to have:
+
+ int x = 0;
+
+ P0()
+ {
+ WRITE_ONCE(x, 0x1234);
+ }
+
+ P1()
+ {
+ int r1;
+
+ r1 = READ_ONCE(x);
+ }
+
+and end up with r1 = 0x1200 (partly from x's initial value and partly
+from the value stored by P0).
+
+On the other hand, load-tearing is unavoidable when mixed-size
+accesses are used. Consider this example:
+
+ union {
+ u32 w;
+ u16 h[2];
+ } x;
+
+ P0()
+ {
+ WRITE_ONCE(x.h[0], 0x1234);
+ WRITE_ONCE(x.h[1], 0x5678);
+ }
+
+ P1()
+ {
+ int r1;
+
+ r1 = READ_ONCE(x.w);
+ }
+
+If r1 = 0x56781234 (little-endian!) at the end, then P1 must have read
+from both of P0's stores. It is possible to handle mixed-size and
+unaligned accesses in a memory model, but the LKMM currently does not
+attempt to do so. It requires all accesses to be properly aligned and
+of the location's actual size.
+
+
+CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
+------------------------------------------------------------------
+
+Cache coherence is a general principle requiring that in a
+multi-processor system, the CPUs must share a consistent view of the
+memory contents. Specifically, it requires that for each location in
+shared memory, the stores to that location must form a single global
+ordering which all the CPUs agree on (the coherence order), and this
+ordering must be consistent with the program order for accesses to
+that location.
+
+To put it another way, for any variable x, the coherence order (co) of
+the stores to x is simply the order in which the stores overwrite one
+another. The imaginary store which establishes x's initial value
+comes first in the coherence order; the store which directly
+overwrites the initial value comes second; the store which overwrites
+that value comes third, and so on.
+
+You can think of the coherence order as being the order in which the
+stores reach x's location in memory (or if you prefer a more
+hardware-centric view, the order in which the stores get written to
+x's cache line). We write W ->co W' if W comes before W' in the
+coherence order, that is, if the value stored by W gets overwritten,
+directly or indirectly, by the value stored by W'.
+
+Coherence order is required to be consistent with program order. This
+requirement takes the form of four coherency rules:
+
+ Write-write coherence: If W ->po-loc W' (i.e., W comes before
+ W' in program order and they access the same location), where W
+ and W' are two stores, then W ->co W'.
+
+ Write-read coherence: If W ->po-loc R, where W is a store and R
+ is a load, then R must read from W or from some other store
+ which comes after W in the coherence order.
+
+ Read-write coherence: If R ->po-loc W, where R is a load and W
+ is a store, then the store which R reads from must come before
+ W in the coherence order.
+
+ Read-read coherence: If R ->po-loc R', where R and R' are two
+ loads, then either they read from the same store or else the
+ store read by R comes before the store read by R' in the
+ coherence order.
+
+This is sometimes referred to as sequential consistency per variable,
+because it means that the accesses to any single memory location obey
+the rules of the Sequential Consistency memory model. (According to
+Wikipedia, sequential consistency per variable and cache coherence
+mean the same thing except that cache coherence includes an extra
+requirement that every store eventually becomes visible to every CPU.)
+
+Any reasonable memory model will include cache coherence. Indeed, our
+expectation of cache coherence is so deeply ingrained that violations
+of its requirements look more like hardware bugs than programming
+errors:
+
+ int x;
+
+ P0()
+ {
+ WRITE_ONCE(x, 17);
+ WRITE_ONCE(x, 23);
+ }
+
+If the final value stored in x after this code ran was 17, you would
+think your computer was broken. It would be a violation of the