Nvidia GeForce GTX 280 and 260
GTX 200 Architecture Features
The chip powering these two new graphics cards, called the GTX 200 chip, is an absolute monster of a processor. Nvidia proudly proclaims that it's the biggest processor TSMC (Nvidia's primary chip fabrication partner) has ever built. It's not just a clocked-up or expanded version of the G92 chip powering Nvidia's most recent high-end cards, but a totally new architecture.
The basic layout of the GTX 280 looks like this: you have a geometry shader processing unit, a vertex shader, and setup/raster units. These feed into the unified shader array of no less than 240 stream processing units. That's 10 blocks of 24 shader units, with each block made up of three groups of eight.
An L2 texture cache sits between these units and the memory interface units, typically referred to as “ROPs” or render back-ends. There are eight “blocks” of those with four per block, 32 in total, which is double the ROP power of the G92.
Each stream processor includes a register file twice the size of those in the stream processors of earlier Nvidia chips, along with a floating point compute unit, an integer compute unit, and a move/compare unit. There's also a double-precision floating point unit (IEEE compliant), which is useful for some GP-GPU tasks but not particularly handy for graphics.
Call that configuration one processing “core,” if you will. Eight of them share a 16KB block of local shared memory. Three of those eight-processor groups share the same bank of L1 cache, forming a 24-unit processing “block.” Each block has eight texture address/filter units associated with it. The GTX 280 chip has 10 of these blocks, for a total of 240 stream processors and 80 texture units.
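To make the arithmetic of this hierarchy explicit, here is a small Python sketch that derives the GTX 280's unit counts from the per-group and per-block figures given above. The class and constant names are hypothetical and serve only as illustration. The model simply multiplies out the counts: 8 processors per group, 3 groups per block, 8 texture units per block, and ROP partitions of 4 render back-ends each.

```python
from dataclasses import dataclass

# Hypothetical constants taken from the article's description of the chip.
SP_PER_GROUP = 8        # stream processors sharing one 16KB shared-memory block
GROUPS_PER_BLOCK = 3    # eight-processor groups sharing one L1 cache bank
TEX_PER_BLOCK = 8       # texture address/filter units per 24-unit block
ROPS_PER_PARTITION = 4  # render back-ends per ROP "block"


@dataclass(frozen=True)
class GpuConfig:
    blocks: int          # number of enabled 24-processor blocks
    rop_partitions: int  # number of enabled ROP blocks

    @property
    def stream_processors(self) -> int:
        return self.blocks * GROUPS_PER_BLOCK * SP_PER_GROUP

    @property
    def texture_units(self) -> int:
        return self.blocks * TEX_PER_BLOCK

    @property
    def rops(self) -> int:
        return self.rop_partitions * ROPS_PER_PARTITION


# Fully enabled GTX 280: 10 shader blocks, 8 ROP blocks.
gtx280 = GpuConfig(blocks=10, rop_partitions=8)
print(gtx280.stream_processors, gtx280.texture_units, gtx280.rops)
# prints: 240 80 32
```

This is only a counting model of the layout described here. It says nothing about clock speeds or real throughput.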
In the GeForce GTX 280 products, all of these functional elements are enabled. In the GeForce GTX 260 products, some of the units are disabled. These are typically called “salvage chips,” where GPUs that had some defects can have certain parts disabled and still function well as lower-performing parts, allowing Nvidia to effectively use some of the “bad” chips on a wafer.
In this case, the GeForce GTX 260 cards have two blocks of stream processors disabled, for a total of 192 stream processors and 64 texture mapping/filtering units. One of the eight memory access/ROP units is also disabled, leaving 28 render back-ends instead of 32.
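The salvage-chip arithmetic can be checked the same way. The Python sketch below assumes that units are disabled in whole blocks, as the article describes, so each disabled shader block removes 24 stream processors and 8 texture units, and each disabled ROP block removes 4 render back-ends. The function name `enabled_units` is a hypothetical helper.

```python
# Hypothetical per-unit sizes, following the block layout described in this article.
SP_PER_BLOCK = 24       # 3 groups of 8 stream processors
TEX_PER_BLOCK = 8       # texture address/filter units per block
ROPS_PER_PARTITION = 4  # render back-ends per ROP block

FULL_BLOCKS = 10
FULL_ROP_PARTITIONS = 8


def enabled_units(blocks_off: int, rop_partitions_off: int) -> dict:
    """Count the units left after disabling whole shader and ROP blocks."""
    blocks_on = FULL_BLOCKS - blocks_off
    rops_on = FULL_ROP_PARTITIONS - rop_partitions_off
    return {
        "stream_processors": blocks_on * SP_PER_BLOCK,
        "texture_units": blocks_on * TEX_PER_BLOCK,
        "rops": rops_on * ROPS_PER_PARTITION,
    }


gtx280 = enabled_units(0, 0)
gtx260 = enabled_units(2, 1)  # two shader blocks and one ROP block disabled
print(gtx280)  # {'stream_processors': 240, 'texture_units': 80, 'rops': 32}
print(gtx260)  # {'stream_processors': 192, 'texture_units': 64, 'rops': 28}
```

Disabling one of the eight four-unit ROP blocks leaves 7 × 4 = 28 render back-ends.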
Whether you consider the GTX 200 GPU an extension of the G92 class is a matter of perspective. Certainly some of the capabilities are similar, and there is no added support for features present in DirectX 10.1, for instance. On the other hand, there are some significant tweaks to the design.
The scheduler has been upgraded to handle more threads at once, which it needs to do to keep all those stream processing units busy. We already mentioned the larger register file and support for double-precision floating point math. Instruction co-issuing in the stream processors is now more efficient. The texture filter units use a new scheduler that Nvidia claims is 22% more efficient than the one in the G92 chip.
The blend rate of each ROP unit is doubled compared to previous-generation chips. Geometry shading performance, a real sore spot for the G80 and G92 generation GPUs, has been massively improved in the GTX 200 GPU.