An Overview Of Fermi Architecture


What is Fermi?

Fermi is the codename of a graphics processing unit (GPU) microarchitecture developed by Nvidia, first released to retail in April 2010 as the successor to the Tesla microarchitecture. It was the primary microarchitecture of the GeForce 400 and GeForce 500 series. Kepler succeeded it, and Fermi was used alongside Kepler in the GeForce 600, 700, and 800 series, in the latter two only in mobile GPUs. In the workstation market, Fermi was used in the Quadro x000 series, Quadro NVS models, and Nvidia Tesla computing modules. All desktop Fermi GPUs were manufactured on a 40 nm process; mobile Fermi GPUs were manufactured on 40 nm and 28 nm. Fermi is the oldest Nvidia microarchitecture to receive support for Microsoft's rendering API Direct3D 12 (at feature level 11).

An overview of Fermi Architecture:

The first GPU built on the Fermi architecture has 3.0 billion transistors and up to 512 CUDA cores. A CUDA core executes one floating-point or integer instruction per clock for a thread. The 512 CUDA cores are organized into 16 SMs of 32 cores each. The GPU has six 64-bit memory partitions, for a 384-bit memory interface, and supports up to 6 GB of GDDR5 DRAM. A host interface connects the GPU to the CPU over PCI-Express. The GigaThread global scheduler distributes thread blocks to the SM thread schedulers.
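
The headline figures above follow from simple multiplication. As a quick sanity check, here is a minimal Python sketch; the constant and function names are made up for illustration and are not part of any Nvidia tool or API.

```python
# Figures taken from the paragraph above; names are illustrative only.
SM_COUNT = 16                # streaming multiprocessors on the chip
CORES_PER_SM = 32            # CUDA cores in each SM
MEMORY_PARTITIONS = 6        # number of memory partitions
PARTITION_WIDTH_BITS = 64    # width of each partition in bits


def total_cuda_cores(sm_count, cores_per_sm):
    """Total CUDA cores = number of SMs times cores per SM."""
    return sm_count * cores_per_sm


def memory_interface_bits(partitions, partition_width_bits):
    """Memory interface width = number of partitions times partition width."""
    return partitions * partition_width_bits


print(total_cuda_cores(SM_COUNT, CORES_PER_SM))
print(memory_interface_bits(MEMORY_PARTITIONS, PARTITION_WIDTH_BITS))
```

The two printed values should match the 512 cores and the 384-bit interface quoted above.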




  • Third-Generation Streaming Multiprocessor: The third-generation SM introduces several architectural changes that make it not only more powerful but also more programmable and more efficient.

  • 16 Load/Store Units: Each SM has 16 load/store units, allowing source and destination addresses to be calculated for sixteen threads per clock.
  • Four Special Function Units: Special Function Units (SFUs) execute transcendental instructions such as sine, cosine, reciprocal, and square root. Each SFU executes one instruction per thread, per clock, so a warp executes over eight clocks. The SFU pipeline is decoupled from the dispatch unit, so the dispatch unit can issue to other execution units while the SFU is occupied.
  • 512 High Performance CUDA cores: Each SM has 32 CUDA processors, a fourfold increase over earlier SM designs. Each CUDA processor contains a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Earlier GPUs used IEEE 754-1985 floating point arithmetic. The Fermi architecture implements the newer IEEE 754-2008 floating-point standard, which provides the fused multiply-add (FMA) instruction for both single and double precision arithmetic. FMA improves on a multiply-add (MAD) instruction by performing the multiplication and addition with a single final rounding step, with no loss of precision in the addition. FMA is also more accurate than performing the two operations separately.
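
Two points in the list above can be checked with a little arithmetic: how many clocks a warp needs on a given number of units, and why a single rounding step makes FMA more accurate. The following is a rough Python sketch, not Fermi code. It assumes the standard CUDA warp size of 32 threads, and it emulates FMA with exact rational arithmetic on ordinary Python doubles. The helper names are hypothetical, and `mad_separate` only models "multiply, round, add, round"; it is not a model of any specific GPU MAD instruction.

```python
from fractions import Fraction

WARP_SIZE = 32  # assumption: standard CUDA warp size


def clocks_per_warp(units_per_sm, warp_size=WARP_SIZE):
    """Clocks to serve a whole warp when each unit handles one thread per clock."""
    return warp_size // units_per_sm


def mad_separate(a, b, c):
    """Multiply and add as two operations, rounding after each one."""
    product = a * b       # first rounding: product rounded to a double
    return product + c    # second rounding: sum rounded to a double


def fma_emulated(a, b, c):
    """Compute a*b + c exactly, then round once (emulation, not a hardware FMA)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding so far
    return float(exact)                              # the single final rounding


# Inputs chosen so that the exact product, 1 - 2**-60, does not fit in a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(clocks_per_warp(4))         # four SFUs per SM
print(mad_separate(a, b, c))      # the tiny term is lost when the product is rounded
print(fma_emulated(a, b, c))      # the tiny term survives the single rounding
```

With these inputs the separately rounded version returns exactly zero, while the single-rounding version keeps the small term of -2^-60. The same `clocks_per_warp` helper applied to the 16 load/store units gives two clocks per warp.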
