The Cache and Memory Hierarchy
- Registers
- L1+Shared Memory
- Local Memory
Local memory is a region of off-chip device memory that is private to each
thread. Despite its name, it resides in the same device DRAM as global memory;
on Fermi, local memory accesses are cached in L1 and L2. It is intended to hold
"spilled" registers. An automatic variable normally sits in a register, with the
following exceptions, where the compiler may place it in local memory instead:
(1) arrays that the compiler cannot determine are indexed with constant
quantities; (2) large arrays or structures that would take up too much register
space; and (3) any variable, when the kernel needs more registers per thread
than are available (at most 63 per thread on Fermi). This last case is known as
register spilling.
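The dynamic-indexing exception described above can be illustrated with a small CUDA sketch. It assumes a CUDA toolkit and a CUDA-capable GPU; the kernel `dynamicIndexKernel` and the helper `runDynamicIndex` are hypothetical names chosen for this example. The per-thread array `table` is read with an index known only at run time, so the compiler is likely to place it in local memory, although the final decision depends on the compiler version and optimization level.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread builds a small private table, then reads one entry using an
// index that is only known at run time. Registers cannot be indexed
// dynamically, so the compiler is likely (not guaranteed) to put `table`
// in per-thread local memory.
__global__ void dynamicIndexKernel(const int *idx, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int table[16];
    for (int j = 0; j < 16; ++j) {
        table[j] = j * j;          // filled using compile-time indices
    }
    int k = idx[i] & 15;           // run-time index in [0, 15]
    out[i] = table[k];             // dynamic indexing: local-memory candidate
}

// Hypothetical host helper: copies `n` indices to the device, runs the
// kernel, and copies the looked-up squares back into `host_out`.
void runDynamicIndex(const int *host_idx, int *host_out, int n) {
    if (n <= 0) return;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_idx = nullptr;
    int *d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_idx, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_out, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_idx, host_idx, bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    dynamicIndexKernel<<<blocks, threads>>>(d_idx, d_out, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_idx);
    cudaFree(d_out);
}
```

Compiling with `nvcc -Xptxas -v` reports, per kernel, the registers used and any stack frame and spill bytes, which is a practical way to confirm whether local memory is actually used. Lowering the register budget with `-maxrregcount` can also be used to provoke register spilling deliberately.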
- L2 Cache
A 768 KB L2 cache, shared among the 16 SMs, handles all loads and stores to and from global memory, including copies to and from the CPU host and texture requests. The L2 cache subsystem also implements atomic operations, which control access to data that must be shared across thread blocks or even across kernels.
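A common use of these atomics is a histogram whose counters are updated by threads in many different thread blocks. The sketch below assumes a CUDA toolkit and GPU; `histogram16Kernel` and `histogramOnGpu` are hypothetical names, and binning by the low four bits of each byte is an arbitrary choice for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Threads in different blocks increment the same 16 counters in global
// memory. atomicAdd makes each read-modify-write indivisible, so no update
// is lost even though blocks cannot synchronize with one another.
__global__ void histogram16Kernel(const unsigned char *data, int n,
                                  unsigned int *bins) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        atomicAdd(&bins[data[i] & 15], 1u);   // bucket = low 4 bits
    }
}

// Hypothetical host helper: returns the 16-bin histogram of `data`.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char> &data) {
    std::vector<unsigned int> bins(16, 0);
    int n = static_cast<int>(data.size());
    if (n == 0) return bins;

    unsigned char *d_data = nullptr;
    unsigned int *d_bins = nullptr;
    size_t binBytes = 16 * sizeof(unsigned int);

    cudaError_t err = cudaMalloc(&d_data, static_cast<size_t>(n));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_bins, binBytes);
    assert(err == cudaSuccess);
    err = cudaMemset(d_bins, 0, binBytes);              // counters start at zero
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_data, data.data(), static_cast<size_t>(n),
                     cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 64) blocks = 64;                       // grid-stride loop covers the rest
    histogram16Kernel<<<blocks, threads>>>(d_data, n, d_bins);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    err = cudaMemcpy(bins.data(), d_bins, binBytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_data);
    cudaFree(d_bins);
    return bins;
}
```

A plain `bins[b]++` here would be a race between blocks and could lose updates; `atomicAdd` avoids that at the cost of serializing conflicting updates to the same counter.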
- Global Memory
Global memory is accessible to all threads and to the host (CPU). On Fermi, accessing it is slow compared with shared memory, with a long latency of roughly 400-800 cycles.
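Because global memory is both host-visible and slow, a typical pattern is to copy data from the host into global memory, have each thread read its element once into shared memory, and do the repeated work on chip. The following block-wise sum sketches that pattern. It assumes a CUDA toolkit and GPU, a block size of 256 (`kBlock`, which must be a power of two for this tree reduction), and hypothetical names `blockSum` and `sumOnGpu`; combining the per-block partial sums on the CPU is a simplification.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // assumed block size (power of two)

// Each thread performs a single global-memory load; the tree reduction
// then runs entirely in fast on-chip shared memory.
__global__ void blockSum(const int *in, int *partial, int n) {
    __shared__ int s[kBlock];
    int tid = threadIdx.x;
    int i = blockIdx.x * kBlock + tid;
    s[tid] = (i < n) ? in[i] : 0;      // one global load, padded with 0
    __syncthreads();

    for (int stride = kBlock / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];   // one global store per block
}

// Hypothetical host helper: copies `h` into global memory, launches the
// kernel, copies the per-block partial sums back, and adds them on the CPU.
long long sumOnGpu(const std::vector<int> &h) {
    int n = static_cast<int>(h.size());
    if (n == 0) return 0;
    int blocks = (n + kBlock - 1) / kBlock;
    int *d_in = nullptr;
    int *d_partial = nullptr;

    cudaError_t err = cudaMalloc(&d_in, n * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_partial, blocks * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_in, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    blockSum<<<blocks, kBlock>>>(d_in, d_partial, n);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<int> partial(blocks);
    err = cudaMemcpy(partial.data(), d_partial, blocks * sizeof(int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_in);
    cudaFree(d_partial);

    long long total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Partial sums are stored as `int`, so very large inputs would need a wider accumulator type on the device as well.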
With a configurable, error-protected memory hierarchy and support for languages such as C++ and FORTRAN, Fermi is the first computing architecture to deliver such a high level of double-precision floating-point performance from a single chip. As a result, Fermi is the world's first full-featured GPU computing architecture.