RSX compared to other Nvidia graphics processors


Hawk
11-27-2006, 09:02 PM
A friend of mine e-mailed me some kind of comparison of graphics chips and asked me whether it could be right. It suggests that RSX could be more powerful than a 7900 GTX.
Die size:
RSX=240 mm²
G70 (7800 GTX)=334 mm² (110 nm)
G71 (7900 GTX)=196 mm²
G73 (7600 GT)=126 mm²

Manufacturing process
RSX 90 nm
G70 110 nm
G71 90 nm
G73 90 nm

Transistor count:
RSX 300M+
G70 300M
G71 278M
G73 177M

Memory connection:
RSX 128 bit GDDR3 (650-700 MHz) and 128 bit XDR (3.2 GHz)
G70 256 bit GDDR3 (600 MHz)
G71 256 bit GDDR3 (800 MHz)
G73 128 bit GDDR3 (700 MHz)

Memory bandwidth:
RSX 22.4 GB/s + 35 GB/s = 57.4 GB/s
G70 38.4 GB/s
G71 51.2 GB/s
G73 22.4 GB/s
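The GDDR3 figures above follow directly from bus width and memory clock. A minimal Python sketch, assuming the usual double-data-rate rule (two transfers per clock) and decimal gigabytes (10^9 bytes); the configurations are the forum figures from the table, not official specs:

```python
def ddr_bandwidth_gbs(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak one-direction bandwidth in GB/s (10**9 bytes/s) of a DDR-type bus.

    Assumes two transfers per clock (double data rate), as with GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * 2
    return bytes_per_transfer * transfers_per_second / 1e9


# (bus width in bits, memory clock in MHz) as listed in the table above.
CONFIGS = {
    "G70 256-bit @ 600 MHz": (256, 600),
    "G71 256-bit @ 800 MHz": (256, 800),
    "G73 128-bit @ 700 MHz": (128, 700),
    "RSX GDDR3 128-bit @ 650 MHz": (128, 650),
    "RSX GDDR3 128-bit @ 700 MHz": (128, 700),
}

for name, (width, clock) in CONFIGS.items():
    print(f"{name}: {ddr_bandwidth_gbs(width, clock):.1f} GB/s")
```

The 22.4 GB/s RSX figure corresponds to a 700 MHz clock; the same 128-bit bus at 650 MHz gives 20.8 GB/s.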

Is there something else we can compare, or is something here wrong?

Links:
http://www.beyond3d.com/index.php
RSX 11.11.06
http://www.beyond3d.com/previews/nvidia/g70/index.php?p=01#chip
http://www.beyond3d.com/previews/nvidia/g71/index.php?p=02
http://www.beyond3d.com/previews/nvidia/g73/index.php?p=01#arch

cornholio12
11-27-2006, 09:14 PM
MILR is that you?

rsx is a 7995

Fats
11-27-2006, 09:38 PM
Wonder where MILR is...

cpiasminc
11-28-2006, 02:06 AM
RSX 128 bit GDDR3 (650-700 MHz) and 128 bit XDR (3.2 GHz)
I'm pretty sure it's 650 MHz, not 650-700. And XDR is 64-bit, not 128.

RSX 22.4 GB/s + 35 GB/s = 57.4 GB/s
At 650 MHz, the bandwidth is 20.8 GB/sec. Another thing to point out is that the FlexIO link is a point-to-point bus architecture, which means the figures given are often the sum of both directions (in + out), because the link doesn't necessarily have to be symmetrical. XDR and GDDR-3, on the other hand, are memory bus architectures, so the speeds quoted are usually just the bandwidth in one direction (both directions being equal).

You might notice that the XDR is quoted at 25.6 GB/sec, quite a bit less than 35 GB/sec, which wouldn't make much sense until you think of the 35 GB/sec of the FlexIO link as 20 one way and 15 the other, while XDR supports 25.6 each way (read/write).
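Putting the two conventions side by side makes the mismatch explicit. A small Python sketch, treating XDR's 3.2 GHz figure as the per-pin data rate on the 64-bit interface stated above, and using the 20/15 GB/s FlexIO split quoted in this post; the variable names are illustrative:

```python
def peak_bandwidth_gbs(width_bits: int, gbits_per_pin: float) -> float:
    """Peak bandwidth in one direction: pins x per-pin data rate / 8 bits per byte."""
    return width_bits * gbits_per_pin / 8


# XDR as described above: 64 bits wide at 3.2 Gb/s per pin, quoted per direction.
xdr_per_direction = peak_bandwidth_gbs(64, 3.2)

# FlexIO link to RSX as quoted above: 20 GB/s one way, 15 GB/s the other.
flexio_a, flexio_b = 20.0, 15.0
flexio_headline = flexio_a + flexio_b  # summed, as point-to-point links usually are

print(f"XDR: {xdr_per_direction:.1f} GB/s per direction")
print(f"FlexIO: {flexio_headline:.1f} GB/s summed, "
      f"but at most {max(flexio_a, flexio_b):.1f} GB/s in any one direction")
```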

Of course, designating these things differently isn't a totally bad thing, because with memory you're usually only concerned with one direction of data flow at a time, so unidirectional bandwidth is what you care about. With device-to-device bandwidth, on the other hand, you're often dealing with both.

But as far as your figures are concerned, you might want to denote total bandwidths so that numbers are all put in the same perspective.

i.e.
Memory bandwidth:
RSX 41.6 GB/s (20.8 each way) + 35 GB/s (20 in, 15 out) = 76.6 GB/s
G70 76.8 GB/s (38.4 each way)
G71 102.4 GB/s (51.2 each way)
G73 44.8 GB/s (22.4 each way)

While it is a bit twisted, it is at least measuring the same thing for all cases.

Another way of stating it might be:
Memory bandwidth:
RSX 20.8 GB/s + 20 GB/s = 40.8 GB/s in,
20.8 GB/s + 15 GB/s = 35.8 GB/s out
G70 38.4 GB/s in/out
G71 51.2 GB/s in/out
G73 22.4 GB/s in/out
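Both layouts can be generated from the same per-link numbers. A minimal Python sketch of the "twisted" bookkeeping used above, assuming GDDR3 is counted at its full rate in each direction; as the post says, this is only for putting the numbers on the same footing, not a claim that both directions are usable at once:

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    in_gbs: float   # bandwidth into the GPU, GB/s
    out_gbs: float  # bandwidth out of the GPU, GB/s


def totals(links):
    """Return (in, out, in + out) summed over all links of one chip."""
    total_in = sum(link.in_gbs for link in links)
    total_out = sum(link.out_gbs for link in links)
    return total_in, total_out, total_in + total_out


# GDDR3 counted "each way"; FlexIO at its quoted 20 in / 15 out split.
CHIPS = {
    "RSX": [Link("GDDR3", 20.8, 20.8), Link("FlexIO", 20.0, 15.0)],
    "G70": [Link("GDDR3", 38.4, 38.4)],
    "G71": [Link("GDDR3", 51.2, 51.2)],
    "G73": [Link("GDDR3", 22.4, 22.4)],
}

for chip, links in CHIPS.items():
    t_in, t_out, t_all = totals(links)
    print(f"{chip}: {t_in:.1f} in, {t_out:.1f} out, {t_all:.1f} total")
```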

Zer0-Sum
11-28-2006, 06:27 AM
Are there any OFFICIAL specs on the RSX? I know it's real, because it's in the PS3, but really, why keep the spec sheet a secret? To me, NVidia's and Sony's desire to keep it a secret is more intriguing than the spec sheet itself, though I would like to see it. Alas, no one can show the info...

Smokey
11-28-2006, 06:20 PM
yeah where the hell is MILR?

Heinrich4
11-28-2006, 07:21 PM
Thanx a lot for information.

Cpiasminc, what would be the general performance impact of RSX adding a post-vertex-transform cache and compression caches, plus 96KB of cache per shader pipe (G70 has 48KB and G71 64KB), compared to G71? (Both for performance with GDDR3 only, and with GDDR3 at 20.8GB/sec + FlexIO/XDRAM for textures, etc.)

antuk15
11-28-2006, 08:23 PM
RSX has a BIGGER die than ANY of the PC 7x00 chips, as discussed at Beyond3D.

F089/H
11-28-2006, 10:55 PM
Cp has got the right idea... although you don't need me to tell you that!

cpiasminc
11-28-2006, 11:30 PM
RSX has a BIGGER die than ANY of the PC 7x00 chips, as discussed at Beyond3D.
Relatively speaking, yes, but there are older PC 7xxx chips built on larger process nodes, so they take up more area even with fewer transistors. RSX is only physically larger than the other G70 variants at the same 90 nm process node. You might be confusing transistor count with physical size.
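Separating physical size from transistor count is just a division. A rough Python sketch using the die sizes, process nodes, and transistor counts from the table at the top of the thread (forum figures; "300M+" for RSX is taken as 300M), plus an idealized shrink that assumes area scales with the square of the feature size, which real shrinks rarely achieve:

```python
# (die area in mm^2, process in nm, transistors in millions) from the table
# at the top of the thread; forum figures, not official specifications.
CHIPS = {
    "RSX": (240, 90, 300),
    "G70": (334, 110, 300),
    "G71": (196, 90, 278),
    "G73": (126, 90, 177),
}


def density(area_mm2: float, transistors_m: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_m / area_mm2


def ideal_shrink(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """First-order area estimate after a process shrink (area ~ feature size squared)."""
    return area_mm2 * (to_nm / from_nm) ** 2


for name, (area, node, transistors) in CHIPS.items():
    print(f"{name}: {node} nm, {area} mm^2, "
          f"{density(area, transistors):.2f} M transistors/mm^2")

print(f"G70 ideally shrunk to 90 nm: {ideal_shrink(334, 110, 90):.0f} mm^2")
```

Even under this idealized shrink, a 90 nm G70 would come out smaller than the 240 mm² listed for RSX, while the 110 nm G70 is the largest die in the table.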

Cpiasminc, what would be the general performance impact of RSX adding a post-vertex-transform cache and compression caches, plus 96KB of cache per shader pipe (G70 has 48KB and G71 64KB), compared to G71? (Both for performance with GDDR3 only, and with GDDR3 at 20.8GB/sec + FlexIO/XDRAM for textures, etc.)
As far as post-transform caches go, if anything, larger caches make the performance harder to predict. But the main point is that you can accommodate left-overs from longer trilists before the cache starts dropping things out, and if you can avoid reprocessing verts, the overall net throughput goes up. When your GPU's vertex processing power is utterly put to shame by your CPU's, it is something of a necessity. How much it goes up is completely a matter of chance (it always is), but of course, if your trilists get much longer than what can fit in the cache, it doesn't do anything for you.
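The "longer than what can fit in the cache" effect shows up clearly in a toy simulation. A minimal Python sketch, assuming a FIFO replacement policy (a hit does not refresh an entry) and a hypothetical 20x20-quad grid emitted row by row; the thread gives no actual RSX cache size or policy:

```python
from collections import deque


def acmr(indices, cache_size):
    """Average cache miss ratio: vertices transformed per triangle.

    Simulates a FIFO post-transform cache: a miss pushes the vertex in,
    evicting the oldest entry when full; a hit leaves the order unchanged.
    """
    cache = deque(maxlen=cache_size)
    misses = 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
    return misses / (len(indices) // 3)


def grid_indices(cols, rows):
    """Indexed triangle list for a cols x rows grid of quads, emitted row by row."""
    stride = cols + 1
    indices = []
    for r in range(rows):
        for c in range(cols):
            a = r * stride + c   # top-left
            b = a + 1            # top-right
            d = a + stride       # bottom-left
            e = d + 1            # bottom-right
            indices += [a, d, b, b, d, e]
    return indices


mesh = grid_indices(20, 20)  # 441 vertices, 800 triangles
for size in (3, 8, 16, 24, 32, 64):
    print(f"cache {size:2d}: {acmr(mesh, size):.3f} vertices per triangle")
```

In this toy case, any cache too small to keep the previous row alive until it is reused (3, 8, or 16 entries) costs 1.05 vertices per triangle, while 64 entries reach the mesh's floor of 441/800 ≈ 0.55 because every vertex is transformed exactly once.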

The texture caches are just for hiding latency. While the latency to GDDR is really quite good, the latency to XDR is obviously much, much higher if you were to read textures from main memory or render to a main-memory rendertarget. So really, their main purpose is to even out the difference so RSX performs normally under any conditions. Of course, if you never use main-memory texturing or rendertargets, it's a net gain, but nothing to write home about.
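One standard way to reason about why higher latency needs more buffering is Little's law: to keep a link busy, bandwidth times latency worth of data has to be requested ahead of use. A minimal sketch; the latency values are hypothetical placeholders, not measured RSX figures, and real texture caches do more than hold in-flight requests:

```python
def bytes_in_flight(bandwidth_gbs: float, latency_ns: float) -> float:
    """Little's law: data that must be outstanding to sustain a bandwidth at a latency.

    GB/s * ns = (1e9 bytes/s) * (1e-9 s) = bytes.
    """
    return bandwidth_gbs * latency_ns


# Hypothetical latencies for illustration only; the thread gives no figures.
LOCAL_GDDR3_NS = 300
MAIN_MEMORY_VIA_FLEXIO_NS = 1000

print(f"GDDR3 at 20.8 GB/s: "
      f"{bytes_in_flight(20.8, LOCAL_GDDR3_NS):.0f} bytes outstanding")
print(f"XDR via FlexIO at 20 GB/s: "
      f"{bytes_in_flight(20.0, MAIN_MEMORY_VIA_FLEXIO_NS):.0f} bytes outstanding")
```

Whatever the real latencies are, the buffering needed grows in proportion to them, which is why the slower path to main memory is the one that benefits from extra cache.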

MainMan
11-29-2006, 12:00 AM
Post-transform caching win is not totally a matter of chance. Ever since nvidia chips cached the last few vertices/edges it's been a good idea to preprocess indexed triangle lists for better reuse. Also, if you know the number of cached vertices and replacement strategy it can be used to optimize subdivision of cylindrical skins etc. for optimal vertex reuse.

cpiasminc
11-29-2006, 01:45 AM
Also, if you know the number of cached vertices and replacement strategy it can be used to optimize subdivision of cylindrical skins etc. for optimal vertex reuse.
Again, only if you can get away with that. It's really quite rare that you'll run across a typical content library for one game where even a significant percentage of meshes are ones for which that's feasible (significant enough that it's worth pursuing). That's what I meant by "matter of chance." It's certainly not something you'll ever get good results from through automation alone. You get results, but there's never been a single method that comes close to impressing me (admittedly, I'm not easily impressed) when given a non-trivial case. Doing it by hand invariably gets the best results, but it demands a lot from the artists, and that ultimately adds up.

Heinrich4
11-29-2006, 12:56 PM
Thanx a lot for the information cpiasminc.

But do you still estimate RSX's performance as similar to a GeForce 7900 GT when it uses FlexIO?

And without the FlexIO bandwidth (counting only the 20.8 GB/sec of GDDR3), would its general/overall performance be something like a GeForce 7600 GT, or a 7800 GT at best (although I know that in video game consoles, GPU performance is optimized, etc.)?

(And would Xenos/R-500/C1 be similar to an R-520/Radeon X1800 XT in shader performance, MSAA, etc. at 720p resolutions?)

Hawk
11-29-2006, 11:07 PM
Thanks for the info, CPI. I'm just wondering about this memory bandwidth in/out issue. I have tried to find this info all over the net, but this is the best thing I have found:
http://www.hothardware.com/viewarticle.aspx

Memory Bandwidth Calculations:
GeForce 6200s with TurboCache that have only a single memory chip, are limited to only a 32-bit memory bus. So, at 350MHz, this means a 16MB card has 2.8GB/s of bandwidth available to local memory. The 32MB card, because it is equipped with two chips, however, has a 64-bit memory bus for a total of 5.6GB/s of memory bandwidth to local memory. But because the nature of NVIDIA's TurboCache technology allows these cards to render directly to system memory as well, they also can take full advantage of the 8GB/s of bandwidth offered up by a PCI Express X16 graphics slot.

The article says 8 GB/s of memory bandwidth, and when you look at this, it shows that it's 4 GB/s in and 4 GB/s out:
http://www.hothardware.com/viewarticle.aspx?page=3&articleid=614

Could it be that the G71's total memory bandwidth is 51.2 GB/s, which makes 25.6 GB/s in and 25.6 GB/s out? Could you tell me how you arrived at a total memory bandwidth of 102.4 GB/s for the G71, or is it just something you have to know? Maybe I'm missing something, or I just don't understand.

Hopefully you have a link to that information.

cpiasminc
11-30-2006, 01:49 AM
I'm just wondering about this memory bandwidth in/out issue.
Again, just an example of how a point-to-point bus is counted as total bandwidth both read and write summed together, and memory bandwidth isn't.

The main thing you have to realize is that point-to-point buses are not necessarily symmetrical. AGP, for instance, gave you high bandwidth from chipset->graphics card, but utterly dismal bandwidth going the other way.

You might notice in the wording of that particular article that they were somewhat careful in saying that a PCI-E 16x slot provides up to 8 GB/sec (4 each way, which can be simultaneous, hence why it's summed), but when talking about local memory bandwidth, they said that a 350 MHz DDR link gave them 2.8 GB/sec to local memory ("from" is also valid, but not both at once).
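The article's numbers fit this reading. A Python sketch using the clocks and bus widths quoted from HotHardware; the PCI Express 1.x signaling rate (2.5 GT/s per lane with 8b/10b encoding, so 0.25 GB/s per lane per direction) is an added assumption, not something the article states:

```python
def ddr_local_gbs(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak local-memory bandwidth in GB/s: one direction at a time, two transfers per clock."""
    return bus_width_bits / 8 * clock_mhz * 2 / 1000


def pcie1_gbs_per_direction(lanes: int) -> float:
    """PCI Express 1.x: 2.5 GT/s per lane, 8b/10b encoding, 8 bits per byte."""
    return lanes * 2.5 * (8 / 10) / 8


local_16mb = ddr_local_gbs(32, 350)      # single memory chip, 32-bit bus
local_32mb = ddr_local_gbs(64, 350)      # two chips, 64-bit bus
pcie_each_way = pcie1_gbs_per_direction(16)
pcie_summed = 2 * pcie_each_way          # both directions can run at once, so it gets summed

print(f"16MB card local: {local_16mb:.1f} GB/s (to OR from memory)")
print(f"32MB card local: {local_32mb:.1f} GB/s (to OR from memory)")
print(f"PCIe x16: {pcie_each_way:.1f} GB/s each way, {pcie_summed:.1f} GB/s summed")
```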

You simply have to remember that a memory architecture, by nature, must be bidirectional, but not necessarily full duplex (note that I'm talking about actual RAM architectures, not something like a small buffer you fill with data that's meant to be streamed off somewhere). I'm sure there are asymmetrical memory architectures out there, but I believe they would be explicitly stated as having different bandwidths each way. More importantly, when you're dealing with point-to-point bus architectures, your concerns are a little different than with memory. Bus links like FlexIO or HyperTransport are likely to be used for tons of simultaneous transfers, so it's fair to think of the aggregate bandwidth. With memory, the scope of concern is usually individual requests or streams, and an individual operation will only go in one direction.

Could it be that the G71's total memory bandwidth is 51.2 GB/s, which makes 25.6 GB/s in and 25.6 GB/s out? Could you tell me how you arrived at a total memory bandwidth of 102.4 GB/s for the G71, or is it just something you have to know? Maybe I'm missing something, or I just don't understand.

Hopefully you have a link to that information.
Well, it's mainly a matter of convention, but that's not quite what I meant. I admit I expressed it badly, though I tried to give multiple ways of organizing the data and referred to the first set as "twisted" to get the point across.

I was just referring to expressing everything either as total sums or as separate read/write bandwidths, since having quantities added together when they're not really measuring the same thing bothers me (GDDR bandwidth being measured differently than FlexIO is a bit off). I would have less of an issue if this weren't related to a single device working in multiple memory pools simultaneously (as opposed to, say, a CPU accessing memory as well as streaming data off to the GPU, which is a case where the two bandwidths aren't really worth summing at any point anyway).

Unless GDDR-3 is dual-ported (which I doubt, but people are allowed to do that if they feel it necessary), you can't really get bandwidth both ways at the same time, so the unidirectional bandwidth is the total bandwidth you can get. XDR does allow for simultaneous read/write, according to Rambus, but because it's a serial bus, I doubt this is done via a full-duplex layout (I could be wrong on this; I've never really bothered to look it up). So the 20.8 GB/sec is the bandwidth that RSX has to its local memory. It's just adding 20.8 to the 35 for FlexIO that I didn't care for, because with one you can go both ways simultaneously, but with the other, you can't.
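A small model makes the half- versus full-duplex distinction concrete. A minimal Python sketch, assuming (as argued above) that the GDDR3 bus is half duplex and that the FlexIO link's two directions (20 and 15 GB/s, as quoted earlier in the thread) run independently; `in_fraction` is a hypothetical traffic mix between 0 and 1:

```python
from dataclasses import dataclass


@dataclass
class Bus:
    name: str
    in_peak: float     # GB/s in one direction
    out_peak: float    # GB/s in the other direction
    full_duplex: bool  # True if both directions can run at the same time


def max_combined_gbs(bus: Bus, in_fraction: float) -> float:
    """Highest in + out throughput when in_fraction of the traffic goes 'in'."""
    out_fraction = 1.0 - in_fraction
    if bus.full_duplex:
        # Each direction is limited only by its own peak.
        limits = []
        if in_fraction > 0:
            limits.append(bus.in_peak / in_fraction)
        if out_fraction > 0:
            limits.append(bus.out_peak / out_fraction)
        return min(limits)
    # Half duplex: both directions share the same bus time.
    return 1.0 / (in_fraction / bus.in_peak + out_fraction / bus.out_peak)


gddr3 = Bus("RSX GDDR3", 20.8, 20.8, full_duplex=False)
flexio = Bus("FlexIO", 20.0, 15.0, full_duplex=True)

for frac in (0.0, 0.5, 20 / 35, 1.0):
    print(f"in share {frac:.2f}: GDDR3 {max_combined_gbs(gddr3, frac):.1f}, "
          f"FlexIO {max_combined_gbs(flexio, frac):.1f} GB/s")
```

Under these assumptions the half-duplex bus never exceeds 20.8 GB/s combined, whatever the mix, while the full-duplex link reaches its 35 GB/s headline only when the traffic split matches its own 20/15 split (at a 50/50 mix it tops out at 30 GB/s).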

MainMan
12-02-2006, 11:39 PM
It's worth optimizing a bit when you are reusing a mesh over and over with different textures, say a general character skin or an object you instantiate a lot. Actually, I coded a quickie that did best-effort vertex reordering, and I hazily remember the numbers going from an average of ~2 new vertices per triangle in a ~1500-poly character mesh output by Maya to something like ~1.3 new vertices per triangle on the optimized skin. That was assuming the GFX chip cached the last 3 vertices/edges. Rendering speed went up a bit... I didn't measure it exactly because the texture was insanely high-res and not yet mip-mapped at the R&D stage. Anyway, with a PS3-sized cache and a similarly detailed mesh, I'm confident simple pre-processing heuristics can get you somewhere on the low side between 1/(vertex reuse) and 1.3 new vertices of work per triangle.
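Best-effort reordering of this kind can be sketched in a few dozen lines. This is not the code described in the post (it isn't shown); it's a simple greedy heuristic under stated assumptions: a FIFO post-transform cache, a hypothetical shuffled 12x12 grid standing in for badly ordered exporter output, and an O(n²) search that is only practical for small meshes:

```python
import random
from collections import deque


def acmr(triangles, cache_size):
    """Vertices transformed per triangle, simulating a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:
                misses += 1
                cache.append(v)
    return misses / len(triangles)


def greedy_reorder(triangles, cache_size):
    """Always emit the remaining triangle with the most vertices already in the
    simulated cache (ties go to the earliest one). O(n^2): small meshes only."""
    remaining = list(triangles)
    cache = deque(maxlen=cache_size)
    ordered = []
    while remaining:
        best = max(range(len(remaining)),
                   key=lambda i: sum(v in cache for v in remaining[i]))
        tri = remaining.pop(best)
        for v in tri:
            if v not in cache:
                cache.append(v)
        ordered.append(tri)
    return ordered


def grid_triangles(cols, rows):
    """Triangles of a cols x rows grid of quads, as vertex-index triples."""
    stride = cols + 1
    tris = []
    for r in range(rows):
        for c in range(cols):
            a = r * stride + c
            tris += [(a, a + stride, a + 1), (a + 1, a + stride, a + stride + 1)]
    return tris


rng = random.Random(0)
tris = grid_triangles(12, 12)
rng.shuffle(tris)  # stand-in for a poorly ordered export

for size in (3, 16):
    before = acmr(tris, size)
    after = acmr(greedy_reorder(tris, size), size)
    print(f"cache {size:2d}: {before:.2f} -> {after:.2f} vertices per triangle")
```

The heuristic only looks one triangle ahead, so it is a starting point rather than an optimum, but it shows why knowing the cache size and replacement policy matters: the simulated cache inside the reorderer has to match the hardware's.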

cpiasminc
12-03-2006, 07:56 PM
Actually, I coded a quickie that did best-effort vertex reordering, and I hazily remember the numbers going from an average of ~2 new vertices per triangle in a ~1500-poly character mesh output by Maya to something like ~1.3 new vertices per triangle on the optimized skin.
A starting point of 2 vertices per triangle on a character sounds pretty scary to me. I'd expect that of trees or something, but for a character, it sounds like your starting point was pretty dismal. In my case, I tend to see an average pretty close to 1.33 vertices per triangle to begin with, and that's without optimizations on the part of the exporter or efforts on the part of the artists. Granted, a lot of that is just due to mesh complexity (5000 polys is pretty average for a character), which leaves a lot more room for vertex sharing. I've seen more exceptional cases when artists optimized meshes by hand (we have some artists who are specifically "technically-inclined" for such purposes), and things get really interesting then.
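For a sense of where the floor is: if every vertex is transformed exactly once, the cost per triangle is just the vertex-to-triangle ratio V/F. For a closed, manifold triangle mesh of genus g, Euler's formula gives a rough value (seams and hard edges that split vertices push the real number up):

```latex
\begin{aligned}
V - E + F &= 2 - 2g, \qquad E = \tfrac{3}{2}F
  \quad \text{(3 edges per triangle, each edge shared by 2 triangles)} \\
V &= \tfrac{1}{2}F + 2 - 2g \\
\frac{V}{F} &\approx \frac{1}{2} \quad \text{for large } F
\end{aligned}
```

Equivalently, with each vertex shared by about six triangles, V/F is about 3/6. Figures like 1.3 or 1.33 are therefore still well above the ideal, which leaves room for reordering to pay off.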

MainMan
12-11-2006, 12:35 PM
The recipe for a vertex mess was as follows. The artists made a low-poly mesh, refined it, designed details for something like 5000-10000 polys, simplified it, refined some pieces again, cut it in half vertically, and mirrored one half to get a symmetrical skin. Hard edges multiplied vertex instances due to different normals, and texture changes further degraded vertex sharing, especially with stupid old versions of DirectX where you had to pack everything per vertex.
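The hard-edge effect is easy to quantify on the simplest possible mesh. A small Python sketch, assuming vertices are keyed by their full attribute tuple (position plus normal), as in the pack-everything-per-vertex setup described above: a flat-shaded cube needs three times as many vertices as a smooth one. UV seams split vertices the same way.

```python
from itertools import product


def cube_faces():
    """The six faces of a unit cube as (face normal, four corner positions)."""
    faces = []
    for axis in range(3):
        for side in (0, 1):
            normal = [0, 0, 0]
            normal[axis] = 1 if side else -1
            corners = [p for p in product((0, 1), repeat=3) if p[axis] == side]
            faces.append((tuple(normal), corners))
    return faces


faces = cube_faces()

# Smooth shading: one vertex per position, shared by every face that touches it.
shared = {pos for _, corners in faces for pos in corners}

# Hard edges: a vertex is the (position, normal) pair, so each corner is
# split into one copy per face that touches it.
split = {(pos, normal) for normal, corners in faces for pos in corners}

print(len(shared), len(split))  # 8 24
```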