Emulating DMA Engines on GPUs for Performance and Portability

Supercomputing 2011 Paper


CudaDMA is a library of DMA objects that support efficient movement of data between off-chip global memory and on-chip shared memory in CUDA kernels. CudaDMA objects support many different data transfer patterns, including sequential, strided, and indirect patterns. CudaDMA objects provide both productivity and performance improvements in CUDA code.

By handling the data-movement challenges on GPUs, CudaDMA makes it easier both to write CUDA code and achieve high performance.
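To make the three transfer patterns named above concrete, here is a minimal host-side C++ sketch of the addressing each one implies. It is not the CudaDMA API and does not run on a GPU: the function names (`copy_sequential`, `copy_strided`, `copy_indirect`), their parameters, and the choice to pack the destination densely are all assumptions made for illustration, with a `std::vector` standing in for both global and shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers illustrating access patterns only; not CudaDMA calls.

// Sequential: one contiguous run of `count` elements starting at `offset`.
std::vector<float> copy_sequential(const std::vector<float>& global,
                                   std::size_t offset, std::size_t count) {
    assert(offset + count <= global.size());
    std::vector<float> shared(count);
    for (std::size_t i = 0; i < count; ++i) {
        shared[i] = global[offset + i];
    }
    return shared;
}

// Strided: `num_blocks` runs of `block_len` contiguous elements whose starts
// are `stride` elements apart in the source. The destination is packed
// densely here (an assumption of this sketch).
std::vector<float> copy_strided(const std::vector<float>& global,
                                std::size_t offset, std::size_t block_len,
                                std::size_t stride, std::size_t num_blocks) {
    assert(num_blocks == 0 ||
           offset + (num_blocks - 1) * stride + block_len <= global.size());
    std::vector<float> shared(num_blocks * block_len);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        for (std::size_t i = 0; i < block_len; ++i) {
            shared[b * block_len + i] = global[offset + b * stride + i];
        }
    }
    return shared;
}

// Indirect: gather elements at arbitrary source positions given by `indices`.
std::vector<float> copy_indirect(const std::vector<float>& global,
                                 const std::vector<std::size_t>& indices) {
    std::vector<float> shared(indices.size());
    for (std::size_t k = 0; k < indices.size(); ++k) {
        assert(indices[k] < global.size());
        shared[k] = global[indices[k]];
    }
    return shared;
}
```

In a real kernel these loops would be divided among threads, and the index arithmetic, bounds handling, and synchronization involved are the kind of data-movement detail the library is described as handling.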

Important Note

CudaDMA will only be supported through the Kepler architecture. NVIDIA is considering adopting CudaDMA as a supported CUDA library if there is sufficient interest. Please let us know if you're interested in continuing to use CudaDMA in the future.


The CudaDMA API currently supports two implementations: the original version that was written for Fermi, and a new version that has additional features for targeting Kepler but is also backwards compatible with Fermi.

CudaDMA Version 1.0 (Fermi) is contained in include/cudaDMA.h in the GitHub repository.

CudaDMA Version 2.0 (Kepler+Fermi) is contained in include/cudaDMAv2.h in the GitHub repository.



Mike Bauer
Henry Cook
Brucek Khailany