Learn about CUDA
(2010-11-28 10:58:28)
Tags: cuda, gpu, it
Category: Academic Research -- "Wasting Away for Its Sake"
Definition: CUDA is an acronym for Compute Unified Device
Architecture.
Advantages over traditional GPGPU (general-purpose GPU computing
through graphics APIs):
Scattered reads – code can read from arbitrary addresses in
memory.
Shared memory – CUDA exposes a fast shared memory region (up
to 48KB in size) that can be shared among the threads of a
block.
Faster downloads and readbacks to and from the GPU.
Full support for integer and bitwise operations, including
integer texture lookups.
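
As a rough illustration of the shared memory point above, here is a minimal sketch of a per-block sum that stages its data in `__shared__` memory. The names `blockSum`, `sumOnGpu` and `kThreadsPerBlock` are made up for this example, the block size of 256 is an arbitrary power of two, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; the reduction loop below assumes a power of two.
constexpr int kThreadsPerBlock = 256;

// Each block sums up to kThreadsPerBlock integers with a tree reduction
// carried out entirely in shared memory.
__global__ void blockSum(const int *in, int *blockSums, int n)
{
    __shared__ int tile[kThreadsPerBlock];  // visible to every thread of this block

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0;  // pad the last block with zeros
    __syncthreads();                              // wait until the tile is filled

    // Halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];  // one partial sum per block
}

// Host helper (hypothetical): copies data to the GPU, launches the kernel,
// and adds up the per-block partial sums on the CPU.
long long sumOnGpu(const std::vector<int> &data)
{
    int n = static_cast<int>(data.size());
    if (n == 0) return 0;
    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    int *dIn = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kThreadsPerBlock>>>(dIn, dSums, n);

    // This copy waits for the kernel to finish before reading the results.
    std::vector<int> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);

    long long total = 0;
    for (int s : sums) total += s;
    return total;
}
```

Calling `sumOnGpu` from `main` with a vector of ones should return the vector's length. The tile takes 256 * 4 bytes = 1KB, well under the 48KB limit mentioned above.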
Limitations:
Fermi GPUs (compute capability 2.0) have (nearly) full support
for C++, but member functions cannot be virtual.
Texture rendering is not supported (not a concern for us).
Double precision supports only the round-to-nearest-even and chop
(round-toward-zero) rounding modes.
The bus bandwidth and latency between the CPU and the GPU may
be a bottleneck.
Threads should run in groups of at least 32 (one warp) for best
performance, with the total number of threads in the
thousands. Branches in the program code do not impact performance
significantly, provided that all 32 threads of a group take
the same execution path.
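
To make the last point concrete, here is a small sketch that contrasts a branch decided per group of 32 threads with a branch decided per thread. The kernel names, the `runKernel` helper and the arithmetic in the branches are invented for illustration, and error checking is omitted. It assumes one-dimensional blocks whose size is a multiple of 32, so that 32 consecutive global indices belong to the same group. For branches this simple the compiler may well generate predicated code, so a measurable difference is not guaranteed; the sketch only shows the two access patterns.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // group size quoted in the text above

// Branch decided per group: all 32 threads of a group take the same path.
__global__ void branchPerWarp(int *out, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid >= n) return;
    int warpId = gid / kWarpSize;  // same value for 32 consecutive threads
    if (warpId % 2 == 0)
        out[gid] = gid * 2;
    else
        out[gid] = gid + 1;
}

// Branch decided per thread: neighbouring threads in a group take different
// paths, so the hardware may have to execute both paths one after the other.
__global__ void branchPerThread(int *out, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid >= n) return;
    if (gid % 2 == 0)
        out[gid] = gid * 2;
    else
        out[gid] = gid + 1;
}

using Kernel = void (*)(int *, int);

// Host helper (hypothetical): launches `kernel` over n > 0 elements with a
// block size that is a multiple of 32 and returns the results.
std::vector<int> runKernel(Kernel kernel, int n)
{
    const int threads = 4 * kWarpSize;  // 128 threads per block
    const int blocks = (n + threads - 1) / threads;

    int *dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(int));
    kernel<<<blocks, threads>>>(dOut, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return out;
}
```

Timing `runKernel(branchPerWarp, n)` against `runKernel(branchPerThread, n)` for a large `n` (for example with `cudaEvent` timers) is one way to check whether the divergent version is slower on a given card.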