                        Dynamic DMA mapping
                        ===================

                 David S. Miller <davem@redhat.com>
                 Richard Henderson <rth@cygnus.com>
                  Jakub Jelinek <jakub@redhat.com>

Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses.  This is similar to
how page tables and/or a TLB translates virtual addresses to physical
addresses on a CPU.  This is needed so that e.g. PCI devices can
access with a Single Address Cycle (32bit DMA address) any page in the
64bit physical address space.  Previously in Linux those 64bit
platforms had to set artificial limits on the maximum RAM size in the
system, so that the virt_to_bus() static scheme works (the DMA address
translation tables were simply filled on bootup to map each bus
address to the physical page __pa(bus_to_virt())).

So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers, namely it has to take into account that DMA addresses should be
mapped only for the time they are actually used and unmapped after the DMA
transfer.

The following API will of course also work on platforms where no such
hardware exists; see e.g. include/asm-i386/pci.h for how it is implemented
on top of the virt_to_bus interface.

First of all, you should make sure

#include <linux/pci.h>

is in your driver.  This file provides the definition of the dma_addr_t
type (which can hold any valid DMA address for the platform); use this
type everywhere you hold a DMA (bus) address returned from the DMA
mapping functions.

                         What memory is DMA'able?

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.
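
For example, a minimal sketch contrasting an allocation that is
DMA'able with one that is not (the sizes and flags here are merely
illustrative):

        /* DMA'able: kmalloc returns a kernel linear-mapped address. */
        char *buffer = kmalloc(4096, GFP_KERNEL);

        /* NOT directly DMA'able: vmalloc memory is only virtually
         * contiguous, as explained below.
         */
        char *vbuffer = vmalloc(4096);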

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may not use kernel image addresses
(i.e. items in the kernel's data/text/bss segment, or your driver's)
nor may you use kernel stack addresses for DMA.  Both of these items
might be mapped somewhere entirely different from the rest of physical
memory.

Also, this means that you cannot take the return of a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.

                        DMA addressing limitations

Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers?  If so, you need to inform the
PCI layer of this fact.

By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
to be increased.  And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased.

For correct operation, you must interrogate the PCI layer in your
device probe routine to see if the PCI controller on the machine can
properly support the DMA addressing limitation your device has.  It is
good style to do this even if your device holds the default setting,
because this shows that you did think about these issues with respect
to your device.

The query is performed via a call to pci_set_dma_mask():

        int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);

Here, pdev is a pointer to the PCI device struct of your device, and
device_mask is a bit mask describing which bits of a PCI address your
device supports.  It returns zero if your card can perform DMA
properly on the machine given the address mask you provided.

If it returns non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior.  You must either use a different mask, or not use DMA.

This means that in the failure case, you have three options:

1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.

It is recommended that your driver print a kernel KERN_WARNING message
when you end up performing either #2 or #3.  In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.

The standard 32-bit addressing PCI device would do something like
this:

        if (pci_set_dma_mask(pdev, 0xffffffff)) {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

Another common scenario is a 64-bit capable device.  The approach
here is to try for 64-bit DAC addressing, but back down to a
32-bit mask should that fail.  The PCI platform code may fail the
64-bit mask not because the platform is incapable of 64-bit
addressing.  Rather, it may fail in this case simply because
32-bit SAC addressing is done more efficiently than DAC addressing.
Sparc64 is one platform which behaves in this way.

Here is how you would handle a 64-bit capable device which can drive
all 64-bits during a DAC cycle:

        int using_dac;

        if (!pci_set_dma_mask(pdev, 0xffffffffffffffff)) {
                using_dac = 1;
        } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {
                using_dac = 0;
        } else {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

If your 64-bit device is going to be an enormous consumer of DMA
mappings, this can be problematic since the DMA mappings are a
finite resource on many platforms.  Please see the "DAC Addressing
for Address Space Hungry Devices" section near the end of this
document for how to handle this case.

Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like:

        if (pci_set_dma_mask(pdev, 0x00ffffff)) {
                printk(KERN_WARNING
                       "mydev: 24-bit DMA addressing not available.\n");
                goto ignore_this_device;
        }

When pci_set_dma_mask() is successful, and returns zero, the PCI layer
saves away this mask you have provided.  The PCI layer will use this
information later when you make DMA mappings.

There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to pci_set_dma_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done (note that
pci_set_dma_mask() returns zero on success, so the tests must be
negated, and the proper log level is KERN_WARNING):

        #define PLAYBACK_ADDRESS_BITS   0xffffffff
        #define RECORD_ADDRESS_BITS     0x00ffffff

        struct my_sound_card *card;
        struct pci_dev *pdev;

        ...
        if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
                       card->name);
        }
        if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
                       card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

                        Types of DMA mappings

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  Consistent DMA mappings are always SAC addressable.  That is
  to say, consistent DMA addresses given to the driver will always
  be in the low 32-bits of the PCI bus space.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any CPU store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The CPU may reorder stores to
             consistent memory just as it may normal memory.  Example:
             if it is important for the device to see the first word
             of a descriptor updated before the second, you must do
             something like:

                desc->word0 = address;
                wmb();
                desc->word1 = DESC_VALID;

             in order to get correct behavior on all platforms.

- Streaming DMA mappings which are usually mapped for one DMA transfer,
  unmapped right after it (unless you use pci_dma_sync below) and for which
  hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come
from PCI, although some devices may have such restrictions.

                 Using Consistent DMA mappings

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:

        dma_addr_t dma_handle;

        cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);

where dev is a struct pci_dev *.  You should pass NULL for PCI-like buses
where devices don't have a struct pci_dev (like ISA, EISA).  This may be
called in interrupt context.

This argument is needed because the DMA translations may be bus
specific (and often are private to the bus which the device is attached
to).

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the pci_pool interface, described below.

The consistent DMA mapping interfaces, for non-NULL dev, will always
return a DMA address which is SAC (Single Address Cycle) addressable.
Even if the device indicates (via PCI dma mask) that it may address
the upper 32-bits and thus perform DAC cycles, consistent allocation
will still only return 32-bit PCI addresses for DMA.  This is true
of the pci_pool interface as well.

In fact, as mentioned above, all consistent memory provided by the
kernel DMA APIs is always SAC addressable.

pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

To unmap and free such a DMA region, you call:

        pci_free_consistent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values pci_alloc_consistent returned to you.
This function may not be called in interrupt context.

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by pci_alloc_consistent,
or you can use the pci_pool API to do that.  A pci_pool is like
a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a pci_pool like this:

        struct pci_pool *pool;

        pool = pci_pool_create(name, dev, size, align, alloc, flags);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  The flags are SLAB_ flags as you'd pass to
kmem_cache_create.  Not all flags are understood, but SLAB_POISON may
help you find driver bugs.  If you call this in a non-sleeping
context (e.g. in_interrupt is true or while holding SMP locks), pass
SLAB_ATOMIC.  If your device has no boundary crossing restrictions,
pass 0 for alloc; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but in that case it may be better to
use pci_alloc_consistent directly instead).

Allocate memory from a pci pool like this:

        cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);

flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), SLAB_ATOMIC otherwise.  Like pci_alloc_consistent,
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a pci_pool like this:

        pci_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to pci_pool_alloc, and cpu_addr and
dma_handle are the values pci_pool_alloc returned.  This function
may be called in interrupt context.

Destroy a pci_pool by calling:

        pci_pool_destroy(pool);

Make sure you've called pci_pool_free for all memory allocated
from a pool before you destroy the pool.  This function may not
be called in interrupt context.

                        DMA Direction

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:

 PCI_DMA_BIDIRECTIONAL
 PCI_DMA_TODEVICE
 PCI_DMA_FROMDEVICE
 PCI_DMA_NONE

You should provide the exact DMA direction if you know it.

PCI_DMA_TODEVICE means "from main memory to the PCI device", while
PCI_DMA_FROMDEVICE means "from the PCI device to main memory".
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance, for example.

The value PCI_DMA_NONE is to be used for debugging.  You can
hold this in a data structure before you come to know the
precise direction, and it will help catch cases where your
direction tracking logic has failed to set things up properly.
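
For example, a minimal sketch of this debugging technique (the
structure and field names here are hypothetical; pci_map_single is
described below): initialize a direction field to PCI_DMA_NONE and
verify it was set before mapping:

        struct my_buffer {
                void *ptr;
                size_t len;
                int dma_dir;    /* initialized to PCI_DMA_NONE */
        };

        ...
        /* The direction must have been decided by this point. */
        if (buf->dma_dir == PCI_DMA_NONE)
                BUG();

        dma_handle = pci_map_single(pdev, buf->ptr, buf->len,
                                    buf->dma_dir);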

Another advantage of specifying this value precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel logs when the PCI controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction; consistent mappings
implicitly have a direction attribute setting of
PCI_DMA_BIDIRECTIONAL.

The SCSI subsystem provides mechanisms for you to easily obtain
the direction to use, in the SCSI command:

        scsi_to_pci_dma_dir(SCSI_DIRECTION)

Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
member of the SCSI command your driver is working on.  The
interface mentioned above returns a value suitable for passing
into the streaming DMA mapping interfaces below.
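
For instance, a minimal sketch (cmd, buffer, and len are assumed to
come from your driver; pci_map_single is described in the next
section):

        int dma_dir = scsi_to_pci_dma_dir(cmd->sc_data_direction);

        dma_handle = pci_map_single(pdev, buffer, len, dma_dir);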

For networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the PCI_DMA_TODEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the PCI_DMA_FROMDEVICE direction specifier.
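
A minimal sketch of the transmit side (the surrounding skb handling
is omitted, and the mapping bookkeeping is illustrative):

        /* Map the packet data for transmission... */
        mapping = pci_map_single(pdev, skb->data, skb->len,
                                 PCI_DMA_TODEVICE);

        /* ... and unmap it once the hardware has sent it. */
        pci_unmap_single(pdev, mapping, skb->len, PCI_DMA_TODEVICE);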

                  Using Streaming DMA mappings

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;

        dma_handle = pci_map_single(pdev, addr, size, direction);

and to unmap it:

        pci_unmap_single(pdev, dma_handle, size, direction);

You should call pci_unmap_single when the DMA activity is finished, e.g.
from the interrupt which told you that the DMA transfer is done.

Using cpu pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to pci_{map,unmap}_single.  These
interfaces deal with page/offset pairs instead of cpu pointers.
Specifically:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;

        dma_handle = pci_map_page(pdev, page, offset, size, direction);

        ...

        pci_unmap_page(pdev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

With scatterlists, you map a region gathered from several regions by:

        int i, count = pci_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;

        for (i = 0, sg = sglist; i < count; i++, sg++) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have a very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to.

Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.

To unmap a scatterlist, just call:

        pci_unmap_sg(dev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
              the _same_ one you passed into the pci_map_sg call,
              it should _NOT_ be the 'count' value _returned_ from the
              pci_map_sg call.
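
A minimal sketch of correct pairing (NUM_BUFFERS is hypothetical and
error handling is omitted):

        int nents = NUM_BUFFERS;        /* entries you filled in */
        int count = pci_map_sg(dev, sglist, nents, direction);

        /* ... program the device using 'count' entries ... */

        /* Unmap with the original 'nents', NOT with 'count'. */
        pci_unmap_sg(dev, sglist, nents, direction);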

Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
counterpart, because the bus address space is a shared resource (although
in some ports the mapping is per bus, so fewer devices contend for the
same bus address space) and you could render the machine unusable by eating
all bus addresses.

If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, just map it with
pci_map_{single,sg}, and after each DMA transfer call either:

        pci_dma_sync_single(dev, dma_handle, size, direction);

or:

        pci_dma_sync_sg(dev, sglist, nents, direction);

as appropriate.

After the last DMA transfer call one of the DMA unmap routines
pci_unmap_{single,sg}.  If you don't touch the data from the first pci_map_*
call till pci_unmap_*, then you don't have to call the pci_dma_sync_*
routines at all.

Here is pseudo code which shows a situation in which you would need
to use the pci_dma_sync_*() interfaces.

        my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;

                mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);

                cp->rx_buf = buffer;
                cp->rx_len = len;
                cp->rx_dma = mapping;

                give_rx_buf_to_card(cp);
        }

        ...

        my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
        {
                struct my_card *cp = devid;

                ...
                if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
                        struct my_card_header *hp;

                        /* Examine the header to see if we wish
                         * to accept the data.  But synchronize
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
                        pci_dma_sync_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                            PCI_DMA_FROMDEVICE);

                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
                                pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                                 PCI_DMA_FROMDEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* Just give the buffer back to the card. */
                                give_rx_buf_to_card(cp);
                        }
                }
        }

Drivers converted fully to this interface should not use virt_to_bus any
longer, nor should they use bus_to_virt.  Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt in the
dynamic DMA mapping scheme - you always have to store the DMA addresses
returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
calls (pci_map_sg stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.

All PCI drivers should be using these interfaces with no exceptions.
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

                64-bit DMA and DAC cycle support

Do you understand all of the text above?  Great, then you already
know how to use 64-bit DMA addressing under Linux.  Simply make
the appropriate pci_set_dma_mask() calls based upon your card's
capabilities, then use the mapping APIs above.

It is that simple.

Well, not for some odd devices.  See the next section for information
about that.

        DAC Addressing for Address Space Hungry Devices

There exists a class of devices which do not mesh well with the PCI
DMA mapping API.  By definition these "mappings" are a finite
resource.  The number of total available mappings per bus is platform
specific, but there will always be a reasonable amount.

What is "reasonable"?  Reasonable means that networking and block I/O
devices need not worry about using too many mappings.

As an example of a problematic device, consider compute cluster cards.
They can potentially need to access gigabytes of memory at once via
DMA.  Dynamic mappings are unsuitable for this kind of access pattern.

To this end we've provided a small API by which a device driver
may use DAC cycles to directly address all of physical memory.
Not all platforms support this, but most do.  It is easy to determine
whether the platform will work properly at probe time.

First, understand that there may be a SEVERE performance penalty for
using these interfaces on some platforms.  Therefore, you MUST only
use these interfaces if it is absolutely required.  99% of devices can
use the normal APIs without any problems.

Note that for streaming type mappings you must either use these
interfaces, or the dynamic mapping interfaces above.  You may not mix
usage of both for the same device.  Such an act is illegal and is
guaranteed to put a banana in your tailpipe.

However, consistent mappings may in fact be used in conjunction with
these interfaces.  Remember that, as defined, consistent mappings are
always going to be SAC addressable.

The first thing your driver needs to do is query the PCI platform
layer with your device's DAC addressing capabilities:

        int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);

This routine behaves identically to pci_set_dma_mask.  You may not
use the following interfaces if this routine fails.
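
A minimal probe-time sketch, mirroring the earlier pci_set_dma_mask
examples (the fallback label is illustrative):

        if (pci_dac_set_dma_mask(pdev, 0xffffffffffffffff)) {
                printk(KERN_WARNING
                       "mydev: 64-bit DAC addressing not available.\n");
                goto use_normal_dma_api;
        }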

Next, DMA addresses which come from this API are kept track of with
the dma64_addr_t type.  It is guaranteed to be big enough to hold any
DAC address the platform layer will give to you from the following
routines.  If you have consistent mappings as well, you still
use plain dma_addr_t to keep track of those.

All mappings obtained here will be direct.  The mappings are not
translated, and this is the purpose of this dialect of the DMA API.

All routines work with page/offset pairs.  This is the _ONLY_ way to
portably refer to any piece of memory.  If you have a cpu pointer
(which may be validly DMA'd too) you may easily obtain the page
and offset using something like this:

        struct page *page = virt_to_page(ptr);
        unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK);

Here are the interfaces:

        dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
                                         struct page *page,
                                         unsigned long offset,
                                         int direction);

The DAC address for the tuple PAGE/OFFSET is returned.  The direction
argument is the same as for pci_{map,unmap}_single().  The same rules
for cpu/device access apply here as for the streaming mapping
interfaces.  To reiterate:

        The cpu may touch the buffer before pci_dac_page_to_dma.
        The device may touch the buffer after pci_dac_page_to_dma
        is made, but the cpu may NOT.
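
Putting the pieces together, a minimal sketch (ptr and the choice of
direction are assumed to come from your driver):

        struct page *page = virt_to_page(ptr);
        unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK);
        dma64_addr_t dma_addr;

        /* Fill the buffer from the cpu BEFORE this call. */
        dma_addr = pci_dac_page_to_dma(pdev, page, offset,
                                       PCI_DMA_TODEVICE);

        /* Hand dma_addr to the device; do not touch the buffer
         * from the cpu until the transfer is complete and has
         * been synchronized (see below).
         */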

When the DMA transfer is complete, invoke:

        void pci_dac_dma_sync_single(struct pci_dev *pdev,
                                     dma64_addr_t dma_addr,
                                     size_t len, int direction);

This must be done before the CPU looks at the buffer again.
This interface behaves identically to pci_dma_sync_{single,sg}().

If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t,
the following interfaces are provided:

        struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
                                         dma64_addr_t dma_addr);
        unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
                                            dma64_addr_t dma_addr);

This is possible with the DAC interfaces purely because they are
not translated in any way.

                Optimizing Unmap State Space Consumption

On many platforms, pci_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:

        struct ring_state {
                struct sk_buff *skb;
                dma_addr_t mapping;
                __u32 len;
        };

   after:

        struct ring_state {
                struct sk_buff *skb;
                DECLARE_PCI_UNMAP_ADDR(mapping)
                DECLARE_PCI_UNMAP_LEN(len)
        };

   NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
         macro.

2) Use pci_unmap_{addr,len}_set to set these values.
   Example, before:

        ringp->mapping = FOO;
        ringp->len = BAR;

   after:

        pci_unmap_addr_set(ringp, mapping, FOO);
        pci_unmap_len_set(ringp, len, BAR);

3) Use pci_unmap_{addr,len} to access these values.
   Example, before:

        pci_unmap_single(pdev, ringp->mapping, ringp->len,
                         PCI_DMA_FROMDEVICE);

   after:

        pci_unmap_single(pdev,
                         pci_unmap_addr(ringp, mapping),
                         pci_unmap_len(ringp, len),
                         PCI_DMA_FROMDEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.

                        Platform Issues

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   Struct scatterlist must contain, at a minimum, the following
   members:

        char *address;
        struct page *page;
        unsigned int offset;
        unsigned int length;

   The "address" member will disappear in 2.5.x.

   This means that your pci_{map,unmap}_sg() and all other
   interfaces dealing with scatterlists must be able to cope
   properly with "page" being non-NULL.

   A scatterlist is in one of two states.  The base address is
   either specified by "address" or by a "page+offset" pair.
   If "address" is NULL, then "page+offset" is being used.
   If "page" is NULL, then "address" is being used.

   In 2.5.x, all scatterlists will use "page+offset".  But during
   2.4.x we still have to support the old method, as sketched below.
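   A minimal sketch of how an implementation might distinguish the
   two states (purely illustrative; a real port must also handle
   highmem pages, for which page_address() is not usable):

        unsigned long kaddr;

        if (sg->address != NULL) {
                /* Old-style: kernel virtual address given directly. */
                kaddr = (unsigned long) sg->address;
        } else {
                /* New-style: derive it from the page and offset. */
                kaddr = (unsigned long) page_address(sg->page)
                        + sg->offset;
        }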

2) More to come...

                           Closing

This document, and the API itself, would not be in its current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people:

        Russell King <rmk@arm.linux.org.uk>
        Leo Dagum <dagum@barrel.engr.sgi.com>
        Ralf Baechle <ralf@oss.sgi.com>
        Grant Grundler <grundler@cup.hp.com>
        Jay Estabrook <Jay.Estabrook@compaq.com>
        Thomas Sailer <sailer@ife.ee.ethz.ch>
        Andrea Arcangeli <andrea@suse.de>
        Jens Axboe <axboe@suse.de>
        David Mosberger-Tang <davidm@hpl.hp.com>
