diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-01 15:45:27 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-06-01 15:45:27 -0700 |
commit | b23c4771ff62de8ca9b5e4a2d64491b2fb6f8f69 (patch) | |
tree | 3ff6b2bdfec161fbc383bba06bab6329e81b02f7 /Documentation/core-api | |
parent | c2b0fc847f3122e5a4176c3772626a7a8facced0 (diff) | |
parent | e35b5a4c494a75a683ddf4901a43e0a128d5bfe3 (diff) |
Merge tag 'docs-5.8' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A fair amount of stuff this time around, dominated by yet another
massive set from Mauro toward the completion of the RST conversion. I
*really* hope we are getting close to the end of this. Meanwhile,
those patches reach pretty far afield to update document references
around the tree; there should be no actual code changes there. There
will be, alas, more of the usual trivial merge conflicts.
Beyond that we have more translations, improvements to the sphinx
scripting, a number of additions to the sysctl documentation, and lots
of fixes"
* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
Documentation: fixes to the maintainer-entry-profile template
zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
tracing: Fix events.rst section numbering
docs: acpi: fix old http link and improve document format
docs: filesystems: add info about efivars content
Documentation: LSM: Correct the basic LSM description
mailmap: change email for Ricardo Ribalda
docs: sysctl/kernel: document unaligned controls
Documentation: admin-guide: update bug-hunting.rst
docs: sysctl/kernel: document ngroups_max
nvdimm: fixes to maintainter-entry-profile
Documentation/features: Correct RISC-V kprobes support entry
Documentation/features: Refresh the arch support status files
Revert "docs: sysctl/kernel: document ngroups_max"
docs: move locking-specific documents to locking/
docs: move digsig docs to the security book
docs: move the kref doc into the core-api book
docs: add IRQ documentation at the core-api book
docs: debugging-via-ohci1394.txt: add it to the core-api book
docs: fix references for ipmi.rst file
...
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/debugging-via-ohci1394.rst | 185 | ||||
-rw-r--r-- | Documentation/core-api/dma-api-howto.rst | 929 | ||||
-rw-r--r-- | Documentation/core-api/dma-api.rst | 745 | ||||
-rw-r--r-- | Documentation/core-api/dma-attributes.rst | 140 | ||||
-rw-r--r-- | Documentation/core-api/dma-isa-lpc.rst | 152 | ||||
-rw-r--r-- | Documentation/core-api/index.rst | 9 | ||||
-rw-r--r-- | Documentation/core-api/irq/concepts.rst | 24 | ||||
-rw-r--r-- | Documentation/core-api/irq/index.rst | 11 | ||||
-rw-r--r-- | Documentation/core-api/irq/irq-affinity.rst | 70 | ||||
-rw-r--r-- | Documentation/core-api/irq/irq-domain.rst | 270 | ||||
-rw-r--r-- | Documentation/core-api/irq/irqflags-tracing.rst | 52 | ||||
-rw-r--r-- | Documentation/core-api/kobject.rst | 28 | ||||
-rw-r--r-- | Documentation/core-api/kref.rst | 323 | ||||
-rw-r--r-- | Documentation/core-api/printk-basics.rst | 115 | ||||
-rw-r--r-- | Documentation/core-api/printk-formats.rst | 2 | ||||
-rw-r--r-- | Documentation/core-api/rbtree.rst | 429 |
16 files changed, 3471 insertions, 13 deletions
diff --git a/Documentation/core-api/debugging-via-ohci1394.rst b/Documentation/core-api/debugging-via-ohci1394.rst new file mode 100644 index 000000000000..981ad4f89fd3 --- /dev/null +++ b/Documentation/core-api/debugging-via-ohci1394.rst @@ -0,0 +1,185 @@ +=========================================================================== +Using physical DMA provided by OHCI-1394 FireWire controllers for debugging +=========================================================================== + +Introduction +------------ + +Basically all FireWire controllers which are in use today are compliant +to the OHCI-1394 specification which defines the controller to be a PCI +bus master which uses DMA to offload data transfers from the CPU and has +a "Physical Response Unit" which executes specific requests by employing +PCI-Bus master DMA after applying filters defined by the OHCI-1394 driver. + +Once properly configured, remote machines can send these requests to +ask the OHCI-1394 controller to perform read and write requests on +physical system memory and, for read requests, send the result of +the physical memory read back to the requester. + +With that, it is possible to debug issues by reading interesting memory +locations such as buffers like the printk buffer or the process table. + +Retrieving a full system memory dump is also possible over the FireWire, +using data transfer rates in the order of 10MB/s or more. + +With most FireWire controllers, memory access is limited to the low 4 GB +of physical address space. This can be a problem on IA64 machines where +memory is located mostly above that limit, but it is rarely a problem on +more common hardware such as x86, x86-64 and PowerPC. + +At least LSI FW643e and FW643e2 controllers are known to support access to +physical addresses above 4 GB, but this feature is currently not enabled by +Linux. + +Together with a early initialization of the OHCI-1394 controller for debugging, +this facility proved most useful for examining long debugs logs in the printk +buffer on to debug early boot problems in areas like ACPI where the system +fails to boot and other means for debugging (serial port) are either not +available (notebooks) or too slow for extensive debug information (like ACPI). + +Drivers +------- + +The firewire-ohci driver in drivers/firewire uses filtered physical +DMA by default, which is more secure but not suitable for remote debugging. +Pass the remote_dma=1 parameter to the driver to get unfiltered physical DMA. + +Because the firewire-ohci driver depends on the PCI enumeration to be +completed, an initialization routine which runs pretty early has been +implemented for x86. This routine runs long before console_init() can be +called, i.e. before the printk buffer appears on the console. + +To activate it, enable CONFIG_PROVIDE_OHCI1394_DMA_INIT (Kernel hacking menu: +Remote debugging over FireWire early on boot) and pass the parameter +"ohci1394_dma=early" to the recompiled kernel on boot. + +Tools +----- + +firescope - Originally developed by Benjamin Herrenschmidt, Andi Kleen ported +it from PowerPC to x86 and x86_64 and added functionality, firescope can now +be used to view the printk buffer of a remote machine, even with live update. + +Bernhard Kaindl enhanced firescope to support accessing 64-bit machines +from 32-bit firescope and vice versa: +- http://v3.sk/~lkundrak/firescope/ + +and he implemented fast system dump (alpha version - read README.txt): +- http://halobates.de/firewire/firedump-0.1.tar.bz2 + +There is also a gdb proxy for firewire which allows to use gdb to access +data which can be referenced from symbols found by gdb in vmlinux: +- http://halobates.de/firewire/fireproxy-0.33.tar.bz2 + +The latest version of this gdb proxy (fireproxy-0.34) can communicate (not +yet stable) with kgdb over an memory-based communication module (kgdbom). + +Getting Started +--------------- + +The OHCI-1394 specification regulates that the OHCI-1394 controller must +disable all physical DMA on each bus reset. + +This means that if you want to debug an issue in a system state where +interrupts are disabled and where no polling of the OHCI-1394 controller +for bus resets takes place, you have to establish any FireWire cable +connections and fully initialize all FireWire hardware __before__ the +system enters such state. + +Step-by-step instructions for using firescope with early OHCI initialization: + +1) Verify that your hardware is supported: + + Load the firewire-ohci module and check your kernel logs. + You should see a line similar to:: + + firewire_ohci 0000:15:00.1: added OHCI v1.0 device as card 2, 4 IR + 4 IT + ... contexts, quirks 0x11 + + when loading the driver. If you have no supported controller, many PCI, + CardBus and even some Express cards which are fully compliant to OHCI-1394 + specification are available. If it requires no driver for Windows operating + systems, it most likely is. Only specialized shops have cards which are not + compliant, they are based on TI PCILynx chips and require drivers for Windows + operating systems. + + The mentioned kernel log message contains the string "physUB" if the + controller implements a writable Physical Upper Bound register. This is + required for physical DMA above 4 GB (but not utilized by Linux yet). + +2) Establish a working FireWire cable connection: + + Any FireWire cable, as long at it provides electrically and mechanically + stable connection and has matching connectors (there are small 4-pin and + large 6-pin FireWire ports) will do. + + If an driver is running on both machines you should see a line like:: + + firewire_core 0000:15:00.1: created device fw1: GUID 00061b0020105917, S400 + + on both machines in the kernel log when the cable is plugged in + and connects the two machines. + +3) Test physical DMA using firescope: + + On the debug host, make sure that /dev/fw* is accessible, + then start firescope:: + + $ firescope + Port 0 (/dev/fw1) opened, 2 nodes detected + + FireScope + --------- + Target : <unspecified> + Gen : 1 + [Ctrl-T] choose target + [Ctrl-H] this menu + [Ctrl-Q] quit + + ------> Press Ctrl-T now, the output should be similar to: + + 2 nodes available, local node is: 0 + 0: ffc0, uuid: 00000000 00000000 [LOCAL] + 1: ffc1, uuid: 00279000 ba4bb801 + + Besides the [LOCAL] node, it must show another node without error message. + +4) Prepare for debugging with early OHCI-1394 initialization: + + 4.1) Kernel compilation and installation on debug target + + Compile the kernel to be debugged with CONFIG_PROVIDE_OHCI1394_DMA_INIT + (Kernel hacking: Provide code for enabling DMA over FireWire early on boot) + enabled and install it on the machine to be debugged (debug target). + + 4.2) Transfer the System.map of the debugged kernel to the debug host + + Copy the System.map of the kernel be debugged to the debug host (the host + which is connected to the debugged machine over the FireWire cable). + +5) Retrieving the printk buffer contents: + + With the FireWire cable connected, the OHCI-1394 driver on the debugging + host loaded, reboot the debugged machine, booting the kernel which has + CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early. + + Then, on the debugging host, run firescope, for example by using -A:: + + firescope -A System.map-of-debug-target-kernel + + Note: -A automatically attaches to the first non-local node. It only works + reliably if only connected two machines are connected using FireWire. + + After having attached to the debug target, press Ctrl-D to view the + complete printk buffer or Ctrl-U to enter auto update mode and get an + updated live view of recent kernel messages logged on the debug target. + + Call "firescope -h" to get more information on firescope's options. + +Notes +----- + +Documentation and specifications: http://halobates.de/firewire/ + +FireWire is a trademark of Apple Inc. - for more information please refer to: +https://en.wikipedia.org/wiki/FireWire diff --git a/Documentation/core-api/dma-api-howto.rst b/Documentation/core-api/dma-api-howto.rst new file mode 100644 index 000000000000..358d495456d1 --- /dev/null +++ b/Documentation/core-api/dma-api-howto.rst @@ -0,0 +1,929 @@ +========================= +Dynamic DMA mapping Guide +========================= + +:Author: David S. Miller <davem@redhat.com> +:Author: Richard Henderson <rth@cygnus.com> +:Author: Jakub Jelinek <jakub@redhat.com> + +This is a guide to device driver writers on how to use the DMA API +with example pseudo-code. For a concise description of the API, see +DMA-API.txt. + +CPU and DMA addresses +===================== + +There are several kinds of addresses involved in the DMA API, and it's +important to understand the differences. + +The kernel normally uses virtual addresses. Any address returned by +kmalloc(), vmalloc(), and similar interfaces is a virtual address and can +be stored in a ``void *``. + +The virtual memory system (TLB, page tables, etc.) translates virtual +addresses to CPU physical addresses, which are stored as "phys_addr_t" or +"resource_size_t". The kernel manages device resources like registers as +physical addresses. These are the addresses in /proc/iomem. The physical +address is not directly useful to a driver; it must use ioremap() to map +the space and produce a virtual address. + +I/O devices use a third kind of address: a "bus address". If a device has +registers at an MMIO address, or if it performs DMA to read or write system +memory, the addresses used by the device are bus addresses. In some +systems, bus addresses are identical to CPU physical addresses, but in +general they are not. IOMMUs and host bridges can produce arbitrary +mappings between physical and bus addresses. + +From a device's point of view, DMA uses the bus address space, but it may +be restricted to a subset of that space. For example, even if a system +supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU +so devices only need to use 32-bit DMA addresses. + +Here's a picture and some examples:: + + CPU CPU Bus + Virtual Physical Address + Address Address Space + Space Space + + +-------+ +------+ +------+ + | | |MMIO | Offset | | + | | Virtual |Space | applied | | + C +-------+ --------> B +------+ ----------> +------+ A + | | mapping | | by host | | + +-----+ | | | | bridge | | +--------+ + | | | | +------+ | | | | + | CPU | | | | RAM | | | | Device | + | | | | | | | | | | + +-----+ +-------+ +------+ +------+ +--------+ + | | Virtual |Buffer| Mapping | | + X +-------+ --------> Y +------+ <---------- +------+ Z + | | mapping | RAM | by IOMMU + | | | | + | | | | + +-------+ +------+ + +During the enumeration process, the kernel learns about I/O devices and +their MMIO space and the host bridges that connect them to the system. For +example, if a PCI device has a BAR, the kernel reads the bus address (A) +from the BAR and converts it to a CPU physical address (B). The address B +is stored in a struct resource and usually exposed via /proc/iomem. When a +driver claims a device, it typically uses ioremap() to map physical address +B at a virtual address (C). It can then use, e.g., ioread32(C), to access +the device registers at bus address A. + +If the device supports DMA, the driver sets up a buffer using kmalloc() or +a similar interface, which returns a virtual address (X). The virtual +memory system maps X to a physical address (Y) in system RAM. The driver +can use virtual address X to access the buffer, but the device itself +cannot because DMA doesn't go through the CPU virtual memory system. + +In some simple systems, the device can do DMA directly to physical address +Y. But in many others, there is IOMMU hardware that translates DMA +addresses to physical addresses, e.g., it translates Z to Y. This is part +of the reason for the DMA API: the driver can give a virtual address X to +an interface like dma_map_single(), which sets up any required IOMMU +mapping and returns the DMA address Z. The driver then tells the device to +do DMA to Z, and the IOMMU maps it to the buffer at address Y in system +RAM. + +So that Linux can use the dynamic DMA mapping, it needs some help from the +drivers, namely it has to take into account that DMA addresses should be +mapped only for the time they are actually used and unmapped after the DMA +transfer. + +The following API will work of course even on platforms where no such +hardware exists. + +Note that the DMA API works with any bus independent of the underlying +microprocessor architecture. You should use the DMA API rather than the +bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the +pci_map_*() interfaces. + +First of all, you should make sure:: + + #include <linux/dma-mapping.h> + +is in your driver, which provides the definition of dma_addr_t. This type +can hold any valid DMA address for the platform and should be used +everywhere you hold a DMA address returned from the DMA mapping functions. + +What memory is DMA'able? +======================== + +The first piece of information you must know is what kernel memory can +be used with the DMA mapping facilities. There has been an unwritten +set of rules regarding this, and this text is an attempt to finally +write them down. + +If you acquired your memory via the page allocator +(i.e. __get_free_page*()) or the generic memory allocators +(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from +that memory using the addresses returned from those routines. + +This means specifically that you may _not_ use the memory/addresses +returned from vmalloc() for DMA. It is possible to DMA to the +_underlying_ memory mapped into a vmalloc() area, but this requires +walking page tables to get the physical addresses, and then +translating each of those pages back to a kernel address using +something like __va(). [ EDIT: Update this when we integrate +Gerd Knorr's generic code which does this. ] + +This rule also means that you may use neither kernel image addresses +(items in data/text/bss segments), nor module image addresses, nor +stack addresses for DMA. These could all be mapped somewhere entirely +different than the rest of physical memory. Even if those classes of +memory could physically work with DMA, you'd need to ensure the I/O +buffers were cacheline-aligned. Without that, you'd see cacheline +sharing problems (data corruption) on CPUs with DMA-incoherent caches. +(The CPU could write to one word, DMA would write to a different one +in the same cache line, and one of them could be overwritten.) + +Also, this means that you cannot take the return of a kmap() +call and DMA to/from that. This is similar to vmalloc(). + +What about block I/O and networking buffers? The block I/O and +networking subsystems make sure that the buffers they use are valid +for you to DMA from/to. + +DMA addressing capabilities +=========================== + +By default, the kernel assumes that your device can address 32-bits of DMA +addressing. For a 64-bit capable device, this needs to be increased, and for +a device with limitations, it needs to be decreased. + +Special note about PCI: PCI-X specification requires PCI-X devices to support +64-bit addressing (DAC) for all transactions. And at least one platform (SGI +SN2) requires 64-bit consistent allocations to operate correctly when the IO +bus is in PCI-X mode. + +For correct operation, you must set the DMA mask to inform the kernel about +your devices DMA addressing capabilities. + +This is performed via a call to dma_set_mask_and_coherent():: + + int dma_set_mask_and_coherent(struct device *dev, u64 mask); + +which will set the mask for both streaming and coherent APIs together. If you +have some special requirements, then the following two separate calls can be +used instead: + + The setup for streaming mappings is performed via a call to + dma_set_mask():: + + int dma_set_mask(struct device *dev, u64 mask); + + The setup for consistent allocations is performed via a call + to dma_set_coherent_mask():: + + int dma_set_coherent_mask(struct device *dev, u64 mask); + +Here, dev is a pointer to the device struct of your device, and mask is a bit +mask describing which bits of an address your device supports. Often the +device struct of your device is embedded in the bus-specific device struct of +your device. For example, &pdev->dev is a pointer to the device struct of a +PCI device (pdev is a pointer to the PCI device struct of your device). + +These calls usually return zero to indicated your device can perform DMA +properly on the machine given the address mask you provided, but they might +return an error if the mask is too small to be supportable on the given +system. If it returns non-zero, your device cannot perform DMA properly on +this platform, and attempting to do so will result in undefined behavior. +You must not use DMA on this device unless the dma_set_mask family of +functions has returned success. + +This means that in the failure case, you have two options: + +1) Use some non-DMA mode for data transfer, if possible. +2) Ignore this device and do not initialize it. + +It is recommended that your driver print a kernel KERN_WARNING message when +setting the DMA mask fails. In this manner, if a user of your driver reports +that performance is bad or that the device is not even detected, you can ask +them for the kernel messages to find out exactly why. + +The standard 64-bit addressing device would do something like this:: + + if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) { + dev_warn(dev, "mydev: No suitable DMA available\n"); + goto ignore_this_device; + } + +If the device only supports 32-bit addressing for descriptors in the +coherent allocations, but supports full 64-bits for streaming mappings +it would look like this:: + + if (dma_set_mask(dev, DMA_BIT_MASK(64))) { + dev_warn(dev, "mydev: No suitable DMA available\n"); + goto ignore_this_device; + } + +The coherent mask will always be able to set the same or a smaller mask as +the streaming mask. However for the rare case that a device driver only +uses consistent allocations, one would have to check the return value from +dma_set_coherent_mask(). + +Finally, if your device can only drive the low 24-bits of +address you might do something like:: + + if (dma_set_mask(dev, DMA_BIT_MASK(24))) { + dev_warn(dev, "mydev: 24-bit DMA addressing not available\n"); + goto ignore_this_device; + } + +When dma_set_mask() or dma_set_mask_and_coherent() is successful, and +returns zero, the kernel saves away this mask you have provided. The +kernel will use this information later when you make DMA mappings. + +There is a case which we are aware of at this time, which is worth +mentioning in this documentation. If your device supports multiple +functions (for example a sound card provides playback and record +functions) and the various different functions have _different_ +DMA addressing limitations, you may wish to probe each mask and +only provide the functionality which the machine can handle. It +is important that the last call to dma_set_mask() be for the +most specific mask. + +Here is pseudo-code showing how this might be done:: + + #define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32) + #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) + + struct my_sound_card *card; + struct device *dev; + + ... + if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) { + card->playback_enabled = 1; + } else { + card->playback_enabled = 0; + dev_warn(dev, "%s: Playback disabled due to DMA limitations\n", + card->name); + } + if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) { + card->record_enabled = 1; + } else { + card->record_enabled = 0; + dev_warn(dev, "%s: Record disabled due to DMA limitations\n", + card->name); + } + +A sound card was used as an example here because this genre of PCI +devices seems to be littered with ISA chips given a PCI front end, +and thus retaining the 16MB DMA addressing limitations of ISA. + +Types of DMA mappings +===================== + +There are two types of DMA mappings: + +- Consistent DMA mappings which are usually mapped at driver + initialization, unmapped at the end and for which the hardware should + guarantee that the device and the CPU can access the data + in parallel and will see updates made by each other without any + explicit software flushing. + + Think of "consistent" as "synchronous" or "coherent". + + The current default is to return consistent memory in the low 32 + bits of the DMA space. However, for future compatibility you should + set the consistent mask even if this default is fine for your + driver. + + Good examples of what to use consistent mappings for are: + + - Network card DMA ring descriptors. + - SCSI adapter mailbox command data structures. + - Device firmware microcode executed out of + main memory. + + The invariant these examples all require is that any CPU store + to memory is immediately visible to the device, and vice + versa. Consistent mappings guarantee this. + + .. important:: + + Consistent DMA memory does not preclude the usage of + proper memory barriers. The CPU may reorder stores to + consistent memory just as it may normal memory. Example: + if it is important for the device to see the first word + of a descriptor updated before the second, you must do + something like:: + + desc->word0 = address; + wmb(); + desc->word1 = DESC_VALID; + + in order to get correct behavior on all platforms. + + Also, on some platforms your driver may need to flush CPU write + buffers in much the same way as it needs to flush write buffers + found in PCI bridges (such as by reading a register's value + after writing it). + +- Streaming DMA mappings which are usually mapped for one DMA + transfer, unmapped right after it (unless you use dma_sync_* below) + and for which hardware can optimize for sequential accesses. + + Think of "streaming" as "asynchronous" or "outside the coherency + domain". + + Good examples of what to use streaming mappings for are: + + - Networking buffers transmitted/received by a device. + - Filesystem buffers written/read by a SCSI device. + + The interfaces for using this type of mapping were designed in + such a way that an implementation can make whatever performance + optimizations the hardware allows. To this end, when using + such mappings you must be explicit about what you want to happen. + +Neither type of DMA mapping has alignment restrictions that come from +the underlying bus, although some devices may have such restrictions. +Also, systems with caches that aren't DMA-coherent will work better +when the underlying buffers don't share cache lines with other data. + + +Using Consistent DMA mappings +============================= + +To allocate and map large (PAGE_SIZE or so) consistent DMA regions, +you should do:: + + dma_addr_t dma_handle; + + cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp); + +where device is a ``struct device *``. This may be called in interrupt +context with the GFP_ATOMIC flag. + +Size is the length of the region you want to allocate, in bytes. + +This routine will allocate RAM for that region, so it acts similarly to +__get_free_pages() (but takes size instead of a page order). If your +driver needs regions sized smaller than a page, you may prefer using +the dma_pool interface, described below. + +The consistent DMA mapping interfaces, will by default return a DMA address +which is 32-bit addressable. Even if the device indicates (via the DMA mask) +that it may address the upper 32-bits, consistent allocation will only +return > 32-bit addresses for DMA if the consistent DMA mask has been +explicitly changed via dma_set_coherent_mask(). This is true of the +dma_pool interface as well. + +dma_alloc_coherent() returns two values: the virtual address which you +can use to access it from the CPU and dma_handle which you pass to the +card. + +The CPU virtual address and the DMA address are both +guaranteed to be aligned to the smallest PAGE_SIZE order which +is greater than or equal to the requested size. This invariant +exists (for example) to guarantee that if you allocate a chunk +which is smaller than or equal to 64 kilobytes, the extent of the +buffer you receive will not cross a 64K boundary. + +To unmap and free such a DMA region, you call:: + + dma_free_coherent(dev, size, cpu_addr, dma_handle); + +where dev, size are the same as in the above call and cpu_addr and +dma_handle are the values dma_alloc_coherent() returned to you. +This function may not be called in interrupt context. + +If your driver needs lots of smaller memory regions, you can write +custom code to subdivide pages returned by dma_alloc_coherent(), +or you can use the dma_pool API to do that. A dma_pool is like +a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages(). +Also, it understands common hardware constraints for alignment, +like queue heads needing to be aligned on N byte boundaries. + +Create a dma_pool like this:: + + struct dma_pool *pool; + + pool = dma_pool_create(name, dev, size, align, boundary); + +The "name" is for diagnostics (like a kmem_cache name); dev and size +are as above. The device's hardware alignment requirement for this +type of data is "align" (which is expressed in bytes, and must be a +power of two). If your device has no boundary crossing restrictions, +pass 0 for boundary; passing 4096 says memory allocated from this pool +must not cross 4KByte boundaries (but at that time it may be better to +use dma_alloc_coherent() directly instead). + +Allocate memory from a DMA pool like this:: + + cpu_addr = dma_pool_alloc(pool, flags, &dma_handle); + +flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor +holding SMP locks), GFP_ATOMIC otherwise. Like dma_alloc_coherent(), +this returns two values, cpu_addr and dma_handle. + +Free memory that was allocated from a dma_pool like this:: + + dma_pool_free(pool, cpu_addr, dma_handle); + +where pool is what you passed to dma_pool_alloc(), and cpu_addr and +dma_handle are the values dma_pool_alloc() returned. This function +may be called in interrupt context. + +Destroy a dma_pool by calling:: + + dma_pool_destroy(pool); + +Make sure you've called dma_pool_free() for all memory allocated +from a pool before you destroy the pool. This function may not +be called in interrupt context. + +DMA Direction +============= + +The interfaces described in subsequent portions of this document +take a DMA direction argument, which is an integer and takes on +one of the following values:: + + DMA_BIDIRECTIONAL + DMA_TO_DEVICE + DMA_FROM_DEVICE + DMA_NONE + +You should provide the exact DMA direction if you know it. + +DMA_TO_DEVICE means "from main memory to the device" +DMA_FROM_DEVICE means "from the device to main memory" +It is the direction in which the data moves during the DMA +transfer. + +You are _strongly_ encouraged to specify this as precisely +as you possibly can. + +If you absolutely cannot know the direction of the DMA transfer, +specify DMA_BIDIRECTIONAL. It means that the DMA can go in +either direction. The platform guarantees that you may legally +specify this, and that it will work, but this may be at the +cost of performance for example. + +The value DMA_NONE is to be used for debugging. One can +hold this in a data structure before you come to know the +precise direction, and this will help catch cases where your +direction tracking logic has failed to set things up properly. + +Another advantage of specifying this value precisely (outside of +potential platform-specific optimizations of such) is for debugging. +Some platforms actually have a write permission boolean which DMA +mappings can be marked with, much like page protections in the user +program address space. Such platforms can and do report errors in the +kernel logs when the DMA controller hardware detects violation of the +permission setting. + +Only streaming mappings specify a direction, consistent mappings +implicitly have a direction attribute setting of +DMA_BIDIRECTIONAL. + +The SCSI subsystem tells you the direction to use in the +'sc_data_direction' member of the SCSI command your driver is +working on. + +For Networking drivers, it's a rather simple affair. For transmit +packets, map/unmap them with the DMA_TO_DEVICE direction +specifier. For receive packets, just the opposite, map/unmap them +with the DMA_FROM_DEVICE direction specifier. + +Using Streaming DMA mappings +============================ + +The streaming DMA mapping routines can be called from interrupt +context. There are two versions of each map/unmap, one which will +map/unmap a single memory region, and one which will map/unmap a +scatterlist. + +To map a single region, you do:: + + struct device *dev = &my_dev->dev; + dma_addr_t dma_handle; + void *addr = buffer->ptr; + size_t size = buffer->len; + + dma_handle = dma_map_single(dev, addr, size, direction); + if (dma_mapping_error(dev, dma_handle)) { + /* + * reduce current DMA mapping usage, + * delay and try again later or + * reset driver. + */ + goto map_error_handling; + } + +and to unmap it:: + + dma_unmap_single(dev, dma_handle, size, direction); + +You should call dma_mapping_error() as dma_map_single() could fail and return +error. Doing so will ensure that the mapping code will work correctly on all +DMA implementations without any dependency on the specifics of the underlying +implementation. Using the returned address without checking for errors could +result in failures ranging from panics to silent data corruption. The same +applies to dma_map_page() as well. + +You should call dma_unmap_single() when the DMA activity is finished, e.g., +from the interrupt which told you that the DMA transfer is done. + +Using CPU pointers like this for single mappings has a disadvantage: +you cannot reference HIGHMEM memory in this way. Thus, there is a +map/unmap interface pair akin to dma_{map,unmap}_single(). These +interfaces deal with page/offset pairs instead of CPU pointers. +Specifically:: + + struct device *dev = &my_dev->dev; + dma_addr_t dma_handle; + struct page *page = buffer->page; + unsigned long offset = buffer->offset; + size_t size = buffer->len; + + dma_handle = dma_map_page(dev, page, offset, size, direction); + if (dma_mapping_error(dev, dma_handle)) { + /* + * reduce current DMA mapping usage, + * delay and try again later or + * reset driver. + */ + goto map_error_handling; + } + + ... + + dma_unmap_page(dev, dma_handle, size, direction); + +Here, "offset" means byte offset within the given page. + +You should call dma_mapping_error() as dma_map_page() could fail and return +error as outlined under the dma_map_single() discussion. + +You should call dma_unmap_page() when the DMA activity is finished, e.g., +from the interrupt which told you that the DMA transfer is done. + +With scatterlists, you map a region gathered from several regions by:: + + int i, count = dma_map_sg(dev, sglist, nents, direction); + struct scatterlist *sg; + + for_each_sg(sglist, sg, count, i) { + hw_address[i] = sg_dma_address(sg); + hw_len[i] = sg_dma_len(sg); + } + +where nents is the number of entries in the sglist. + +The implementation is free to merge several consecutive sglist entries +into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any +consecutive sglist entries can be merged into one provided the first one +ends and the second one starts on a page boundary - in fact this is a huge +advantage for cards which either cannot do scatter-gather or have very +limited number of scatter-gather entries) and returns the actual number +of sg entries it mapped them to. On failure 0 is returned. + +Then you should loop count times (note: this can be less than nents times) +and use sg_dma_address() and sg_dma_len() macros where you previously +accessed sg->address and sg->length as shown above. + |