The memory bandwidth of RSX does not bug me anymore!


Crossbar
04-30-2006, 10:44 AM
After reading these posts by nAo at B3D, I am no longer that concerned about the RSX memory bandwidth. The RSX obviously handles the narrower memory bus (compared to G70, G71) by rendering certain pixels to a local store and by somehow using compression for HDR and AA, besides the FlexIO path to the XDR memory.
If they didn't have EDRAM they'd need a 256bit bus or two buses to make any sort of high def rendering practical.
I'm not sure I agree with this..;)
Opaque pixels are likely to be processed with long shaders, hence most of the available per pixel bandwidth would be used to fetch textures, not color + z.
Tiling is interesting, but I've yet to see real evidence that it actually reduces transistor counts at a given performance level.
Just to make it clear, I was not advocating any kind of tiled rendering GPU, I was thinking that a small on chip tile cache would use far fewer transistors than n MBytes of edram.
I'd like to use those transistors for more ALUs..
I also believe that next gen consoles CPUs might be very good at tiling geometry ;)
Basically one would render all the opaque pixels as in current GPUs (that tile cache I'm talking about would not even be used in this case) and all the transparent pixels in tile order.
A relatively cheap way to achieve near-to-theoretical fillrate without devoting tons of transistors to edram.
I think it's way too early to be declaring MS's or Sony's graphics chip choices bad or good, we have to wait and see.
I'm not advocating Xenos nor RSX, I was just expressing an opinion on a hypothetical GPU :)

Marco

Empirically, if that was the case, I'd expect to see a much lower hit for antialiasing than what we see in games on the PC (where cards have twice the BW per ROP/shader unit), especially in scenes where there aren't many transparent pixels on the screen.
I have completely different empirical data..but you know, closed platforms are different ;)
I dunno, I just don't see how the bandwidth usage per pixel will stay the same if we want to advance graphics. Indeed, math ops will continue to go up, but to say the number of pixels drawn and textures accessed (for a given resolution) will decrease or stay constant is dreaming, IMHO. Smoke/fog/fire/dust/fur/grass will always look better with more pixels.
That's definitely true, but I'm not saying edram is not useful, I'm saying I can live without it and I'd prefer to spend the same amount of transistors on more imo useful features.
Post processing is used more nowadays, and it needs plenty of bandwidth
though on Xenos edram is not helpful here..
I kept thinking they just wrote horrible code, but seeing it again and again makes me think they must be doing something useful that gobbles the power.
...
Regarding bandwidth, IHV's obviously make their decisions for a reason, and stripped down value cards with half the bus width show notable performance drops too.
Gimme more compression..:)
I was speaking from a console dev perspective; in the next 2 or 3 years you will be surprised by what these half-bus GPUs can do.. no doubt about it :)

Marco
It will be very interesting to get the details about this. As nAo seems quite relaxed about talking about these things, some information under NDA will probably be disclosed soon.

The details so far really follow the Sony tradition of using the transistors as smartly as possible; I am happy to see this. ;-)

Applefiend
04-30-2006, 11:15 AM
I honestly don't believe Sony would bang out a budget GPU after spending all that time and money on Cell, only to cripple it. That's the assertion.

Sony are a very smart company with hardware, just as Nintendo are a very smart company from a game design standpoint, and you know... Microsoft are good with system software, forcing the competition out of business and have deep pockets. Unfortunately now, possibly after E3 and maybe up to a year after PS3 release it's all a bit of a faith based initiative though. :D

Darkon
04-30-2006, 12:38 PM
i'm still wondering to what extent RSX is exactly modified

Viano
04-30-2006, 12:42 PM
i'm still wondering to what extent RSX is exactly modified

love.

overclocked
04-30-2006, 12:45 PM
I think we will see a PS4 and Xbox720, then they will merge or something..

xbdestroya
04-30-2006, 02:29 PM
Ok this is the third time these nAo quotes have been posted Crossbar, just for the record. ;)

Anyway even though I believe that there is *something* different/special about the cache in RSX, I don't think we can 100% assume that nAo is speaking to the RSX here specifically. It certainly seems like he might be, but he does say the discussion is centered on a hypothetical GPU. Since nAo is prone both to 'nudge nudge, wink wink' style information sharing *and* just pure architectural hypothesization/analysis, it's hard to know which one it is in this case - nAo talking about RSX or nAo stating eDRAM alternatives independently of the reality.

overclocked
04-30-2006, 02:54 PM
I dont think this should be a "subforum" for b3d, dont know what you think about that XB? I had an idea earlier that i will take behind the scenes with you about making some tech-related stuff easy to come by.

woundingchaney
04-30-2006, 02:57 PM
I dont think this should be a "subforum" for b3d, dont know what you think about that XB? I had an idea earlier that i will take behind the scenes with you about making some tech-related stuff easy to come by.
Unfortunately many sites have become a sub forum for B3D.

version
04-30-2006, 03:05 PM
ps2's vram bandwidth was 50GB/s
resolution on ps3 will be 8 times bigger (in kutaragi's world 32 times, because 2*1920*1080 and 120 fps, damn)
for rsx need 300GB/s - 1 Terabyte/s

possibility: tilecache or edram
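
The scaling argument in this post can be checked with a rough calculation. This is only a sketch: the 640x448 PS2 output resolution and the 60 fps baseline are assumptions that the post does not state, and `pixel_rate` is an illustrative name.

```python
def pixel_rate(width, height, fps, views=1):
    """Pixels that must be produced per second."""
    return width * height * fps * views

# Assumption: a typical PS2 title renders 640x448 at 60 fps.
ps2 = pixel_rate(640, 448, 60)
ps3_1080p = pixel_rate(1920, 1080, 60)
# The "Kutaragi" case from the post: two 1080p outputs at 120 fps.
ps3_dual = pixel_rate(1920, 1080, 120, views=2)

PS2_VRAM_GBPS = 50  # figure quoted in the post
for label, rate in [("1080p60", ps3_1080p), ("2x1080p120", ps3_dual)]:
    scale = rate / ps2
    print(f"{label}: {scale:.1f}x the pixels, "
          f"about {PS2_VRAM_GBPS * scale:.0f} GB/s if bandwidth scaled linearly")
```

Linear scaling is crude, but it lands in the same order of magnitude as the 300 GB/s to 1 TB/s range given above.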

rog27
04-30-2006, 04:15 PM
I love when pc devs/journalists/hardware people get their metaphorical asses handed to them by console devs, especially when said pc people assume they know as much as console people in the know. PC people don't seem to grasp closed box systems and how, though they're a bit more cumbersome to work with, they are worthy long-term investments when you fully wrap your head around them.

xbdestroya
04-30-2006, 04:21 PM
I dont think this should be a "subforum" for b3d, dont know what you think about that XB? I had an idea earlier that i will take behind the scenes with you about making some tech-related stuff easy to come by.

Feel free to PM me with any ideas, because I'm always open to ideas! :)

I have mixed feelings on this particular matter, but at the end of the day I think there can be no denying that the forums share a sort of inherent link if only because so many of us are members on both. I mean in this thread alone we have Crossbar the thread starter, myself the forum moderator, you the annoyed party, and Version the version - all B3D members! Even going beyond B3D (BB3D?), we've got members here on the Gamespot forums, on the IGN forums, on the GAF forums, on this and that German forum, etc etc... and certainly I enjoy it when they post something that I otherwise would not have seen. I think it all leads to these forums being one of the 'quickest' sources of PS3 information, regardless of where it originates. Not only that, but I frequently enjoy the different perspectives and opinions brought up in discussion here on such topics.

Anyway I know where you're coming from though in being annoyed at sometimes reading the same thing in multiple forums, in multiple threads, in multiple posts.

PS - And then Rog posted while I was writing this, so another member!

By the way Rog I agree with you, console devs - or I guess more specifically devs that had to go through PS2's trial by fire - seem more prepared to get their hands dirty with understanding the architectures and thinking outside of the PC 'paradigm' than do a lot of other devs in the gaming industry. nAo himself though was not a game developer until last year; even before that point, however, I think it was clear that he was someone ready to think very outside the box and something of a visionary as to the potential applications of these architectures.

overclocked
04-30-2006, 05:12 PM
Feel free to PM me with any ideas, because I'm always open to ideas! :)

I have mixed feelings on this particular matter, but at the end of the day I think there can be no denying that the forums share a sort of inherent link if only because so many of us are members on both. I mean in this thread alone we have Crossbar the thread starter, myself the forum moderator, you the annoyed party, and Version the version - all B3D members! Even going beyond B3D (BB3D?), we've got members here on the Gamespot forums, on the IGN forums, on the GAF forums, on this and that German forum, etc etc... and certainly I enjoy it when they post something that I otherwise would not have seen. I think it all leads to these forums being one of the 'quickest' sources of PS3 information, regardless of where it originates. Not only that, but I frequently enjoy the different perspectives and opinions brought up in discussion here on such topics.

Anyway I know where you're coming from though in being annoyed at sometimes reading the same thing in multiple forums, in multiple threads, in multiple posts.

PS - And then Rog posted while I was writing this, so another member!

By the way Rog I agree with you, console devs - or I guess more specifically devs that had to go through PS2's trial by fire - seem more prepared to get their hands dirty with understanding the architectures and thinking outside of the PC 'paradigm' than do a lot of other devs in the gaming industry. nAo himself though was not a game developer until last year; even before that point, however, I think it was clear that he was someone ready to think very outside the box and something of a visionary as to the potential applications of these architectures.

Yeah, will definitely do. It's as you say, it's very much the same cross-reading. I answer you in a thread on B3D, then here, then a new thread is started here which I think is new; it gets confusing and time-consuming.
There must be ways to structure things a little better from my and certainly your and others' point of view :thumbl:

Killing Moon
04-30-2006, 05:40 PM
Sony are a very smart company with hardware, just as Nintendo are a very smart company from a game design standpoint, and you know...
Not to throw the thread off subject, but I don't think that was a good analogy.

version
04-30-2006, 06:26 PM
Sony is able to sell low-cost hardware as though it were the best ....

Crossbar
04-30-2006, 09:01 PM
Sorry about the double post. I actually followed the "Watch Impress" thread a couple of pages back, but I honestly couldn't see that nAo's post was the focus of the discussion.

Anyway, what I really wanted to do was kick an old thread on this topic, where we actually discussed some of these solutions, but it had expired.:shrug:

@overclocked: I have kind of mixed feelings about this kind of "sub forum" post, but as this forum tries to collect relevant information about PS3 from all over the world, I guess it's inevitable that we will sometimes pick up subjects and posts from B3D, as some relevant information pops up there from time to time. :)

LiquidEagle
04-30-2006, 09:09 PM
I haven't been to B3D but I think this forum has a personality & vibe it could definitely call its own :-p

btw Crossbar, a good way to not let the RSX's mem bandwidth bug you is to keep yourself ignorant of facts & such. The bandwidth never bugged me, 'cause I never knew what difference it makes! :laugh:

xbdestroya
04-30-2006, 09:11 PM
Sorry about the double post. I actually followed the "Watch Impress" thread a couple of pages back, but I honestly couldn't see that nAo's post was the focus of the discussion.

Anyway, what I really wanted to do was kick an old thread on this topic, where we actually discussed some of these solutions, but it had expired.:shrug:


Oh that old RSX discussion expired? I hate how these threads do that. I completely understand though you wanting to start a new discussion on it, if only because a lot of the other thread has been consumed with screenshot comparisons and Rev/Wii talk of late.

Well for my part I'll say between nAo's comments the past 48 hours and Barbarian's comments a couple of months ago, I'm definitely feeling a little bit more calm on the bandwidth issue, even if we don't really have explicit confirmation yet.

Crossbar
04-30-2006, 09:47 PM
btw Crossbar, a good way to not let the RSX's mem bandwidth bug you is to keep yourself ignorant of facts & such. The bandwidth never bugged me, 'cause I never knew what difference it makes! :laugh:
Thanks, I will keep that in mind next time something irrelevant starts to bug me. :aimhappy:

Crossbar
04-30-2006, 09:55 PM
Well for my part I'll say between nAo's comments the past 48 hours and Barbarian's comments a couple of months ago, I'm definitely feeling a little bit more calm on the bandwidth issue, even if we don't really have explicit confirmation yet.
Yeah, the size of the texture cache Barbarian alluded to remains secret, but according to nAo the tile cache he wants to use doesn't need to be larger than 64x64 pixels; make that 128-bit HDR and it is about 65 kB. I wonder if these caches are parts of a common larger local store or if it's divided into separate stores.

Annoying that the information only leads to more questions.:dazed:
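
The 65 kB figure for a 64x64 tile follows from simple arithmetic. A minimal sketch, assuming one 128-bit (16-byte) colour value per pixel and no separate Z or multisample storage; `tile_bytes` is just an illustrative name:

```python
def tile_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Storage needed for one tile of colour data, in bytes."""
    return width_px * height_px * bits_per_pixel // 8

size = tile_bytes(64, 64, 128)
print(size, "bytes")        # 65536 bytes
print(size / 1000, "kB")    # 65.536 kB
print(size / 1024, "KiB")   # 64.0 KiB
```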

version
04-30-2006, 10:08 PM
i see 2 possible choices:

1. edram, minimum 16MB, this is 120 million transistors, and a relatively small gpu with 200 million transistors, similar to xenos

2. a "big" cache for textures and tiles, 2-4MB, 16-51 million transistors, + logic

which cheaper? better?

Crossbar
05-01-2006, 05:12 AM
i see 2 possible choices:

1. edram, minimum 16MB, this is 120 million transistors, and a relatively small gpu with 200 million transistors, similar to xenos

2. a "big" cache for textures and tiles, 2-4MB, 16-51 million transistors, + logic

which cheaper? better?
If you are not going for EDRAM you are going for SRAM (which is considerably faster, and you don't need some esoteric process for it), and that is usually 6 transistors per cell; it can be 4, but that is rarely used today in CPUs, and then you need some address logic of course.

My guess is that it will be a < 512 kB SRAM cache.
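
To put rough numbers on the eDRAM versus SRAM trade-off, here is a back-of-the-envelope sketch. It assumes one transistor per eDRAM bit and six per SRAM bit, and counts only the storage cells (no address logic, sense amplifiers or tags), so real figures would be higher. The capacities are the ones floated in this thread, not confirmed hardware specs.

```python
MIB = 1024 * 1024  # bytes per MiB

def cell_transistors(capacity_bytes: int, transistors_per_bit: int) -> int:
    """Transistors in the storage cells alone (no decoders, sense amps, tags)."""
    return capacity_bytes * 8 * transistors_per_bit

# Assumption: eDRAM uses one transistor (plus a capacitor) per bit, SRAM six.
for label, size, per_bit in [
    ("16 MiB eDRAM", 16 * MIB, 1),
    ("4 MiB SRAM", 4 * MIB, 6),
    ("2 MiB SRAM", 2 * MIB, 6),
    ("512 KiB SRAM", MIB // 2, 6),
]:
    millions = cell_transistors(size, per_bit) / 1e6
    print(f"{label}: {millions:.1f} million transistors")
```

Under these assumptions 512 KiB of 6T SRAM costs about 25 million transistors, while 2-4 MiB of it would cost roughly 100-200 million, which is in the same range as 16 MiB of eDRAM at about 134 million.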

8_Bit
05-01-2006, 12:37 PM
Good info here. Thanks for this thread.

I hadn't even thought of the possibilities of the potential of some sort of cache on the RSX GPU. It makes sense.

overclocked
05-01-2006, 02:19 PM
i see 2 possible choices:

1. edram, minimum 16MB, this is 120 million transistors, and a relatively small gpu with 200 million transistors, similar to xenos

2. a "big" cache for textures and tiles, 2-4MB, 16-51 million transistors, + logic

which cheaper? better?

Can't help but like your posts..
They just "pop" out.. And then silence.:hugegrin:

rog27
05-01-2006, 03:36 PM
It's already been mentioned over at B3D that going with embedded texture/tiling cache would have been a much better use of transistors than the Edram option.

And we pretty much know NVidia/Sony opted for this route with the RSX, as Barbarian has alluded to in the past.

version
05-02-2006, 04:25 AM
http://web.axelero.hu/varga1973/rsx.JPG

version
05-02-2006, 04:39 AM
http://patft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=sony&s2=superconnect&OS=sony+AND+superconnect&RS=sony+AND+superconnect

yoshaw
05-02-2006, 04:46 AM
http://patft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=sony&s2=superconnect&OS=sony+AND+superconnect&RS=sony+AND+superconnect

Can you explain that in layman's terms? And also, which company does that patent benefit more than the other?

ddaryl
05-02-2006, 12:22 PM
I think we will see a PS4 and Xbox720, then they will merge or something..

merge... Sony and MS on a single game console... It will never happen !

Why would Sony want to shrink their profit margin to accommodate MS ?

gablar16
05-02-2006, 01:18 PM
On being a sub forum for B3D:

I absolutely love PSInext. I've really learned a lot in it and every day I learn more. I've learned so much, in fact, that I can begin to understand the much more technical jargon in B3D. I usually browse both forums and when they are having a technical discussion at B3D that's too complicated, PSInext usually breaks it down for me. I think the interaction between the two forums is vital.

On using the LS for tiling:

Is it even possible to use it that way? I would love to hear the opinion of the techies if anyone has any. Either way it's very exciting times for technology fans everywhere.

xbdestroya
05-02-2006, 04:09 PM
On being a sub forum for B3D:

I absolutely love PSInext. I've really learned a lot in it and every day I learn more. I've learned so much, in fact, that I can begin to understand the much more technical jargon in B3D. I usually browse both forums and when they are having a technical discussion at B3D that's too complicated, PSInext usually breaks it down for me. I think the interaction between the two forums is vital.

That's very true Gablar, I feel similarly about the roles the forums can play.

EDIT: We'll just go with the below. ;)

cpiasminc
05-02-2006, 06:44 PM
On using the LS for tiling:

Is it even possible to use it that way? I would love to hear the opinion of the techies if anyone has any. Either way it's very exciting times for technology fans everywhere.
Hmmm... if you're referring to things like framebuffer post-processing across several SPEs, then yeah, you'll basically have to construct mini-tiles to fit within the SRAMs.

If you're referring to using a small tile cache and committing results to a framebuffer in VRAM, then that's certainly possible. Essentially, it's not so far removed from what Xenos does (when tiling, that is), it's just that with Xenos, the framebuffer space and the tile cache are in the same eDRAM.

If you're keeping a main framebuffer out in external "VRAM," the benefits are overall less than with a full eDRAM/SRAM framebuffer (because Xenos' eDRAM can't fit any type of framebuffer larger than 32bits, it cannot normally commit tiles to the framebuffer in full-quality HDR). Just note that whatever amount of space you've got, that cache must be used for at least 2 tiles. Must be 2 because you want to be able to actively render to one tile while the other is being committed out to the main framebuffer.
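
To make the two-tile requirement concrete, here is a rough sketch of how large a square tile fits when the cache is split between one tile being rendered and one being committed. The 128 KiB cache size and the pixel formats are hypothetical examples, not known RSX figures, and `max_tile_edge` is an illustrative name.

```python
import math

def max_tile_edge(cache_bytes: int, bytes_per_pixel: int, buffers: int = 2) -> int:
    """Largest square tile edge (in pixels) when the cache holds `buffers` tiles."""
    pixels_per_tile = cache_bytes // (buffers * bytes_per_pixel)
    return math.isqrt(pixels_per_tile)

print(max_tile_edge(128 * 1024, 16))  # 128-bit HDR colour -> 64
print(max_tile_edge(128 * 1024, 4))   # 32-bit colour -> 128
```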

Ideally you break up the geometry perfectly (which is in fact the main difficulty with tiling no matter the platform) so that you move from tile to tile and all the geometry for a given tile is sent together. You don't want to jump from tile to tile, especially not when alpha-blending since alpha blending needs to read back from the buffer. Any geometry that crosses tile borders needs to be processed multiple times, but that cost is relatively small. The smaller the tile, the bigger the pain in the neck this gets.
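
The geometry sorting described here can be sketched as a simple CPU-side binning pass. This is an illustration only, not how any particular console does it: it assumes screen-space triangles, a 64-pixel tile edge, and a conservative bounding-box test, and all names are hypothetical.

```python
from collections import defaultdict

TILE = 64  # assumed tile edge in pixels

def bin_triangles(triangles, width, height, tile=TILE):
    """Map (tile_x, tile_y) -> indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        # The box is conservative: it may include tiles the triangle only skirts.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins

triangles = [
    [(5, 5), (20, 10), (10, 30)],    # fits inside tile (0, 0)
    [(60, 60), (70, 62), (62, 70)],  # straddles four tiles
]
bins = bin_triangles(triangles, 1280, 720)
for key in sorted(bins):
    print(key, bins[key])
```

The second triangle lands in four bins, which is the "processed multiple times" cost mentioned above; shrinking the tile makes such border crossings more frequent.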

Overall, the main advantage includes the fact that you have less latency and more bandwidth when working with a tile. The bandwidth strain on a commit is basically no big improvement, but the more you fill in one pass over a tile, the better, as it would mean fewer framebuffer writes.

Various lower-tier GPU manufacturers have attempted this before on the PC with limited success. But they at least showed that low-power, low-spec graphics cards that could be sold for dirt cheap could be within 10-15% below the top of the line at the time. As resolutions increased though, I think the scaling started to falter. Having more and more resolution often meant more and more tiles because cache capacities wouldn't scale as quickly... and that meant that performance would dip if you didn't explicitly optimize your geometry delivery for the tile alignment.
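
The point about tile counts growing with resolution is easy to quantify. A small sketch, assuming a fixed 64x64 tile (the tile size is an assumption carried over from earlier in the thread):

```python
import math

def tile_count(width: int, height: int, tile: int = 64) -> int:
    """Number of tile x tile pixel tiles needed to cover the screen."""
    return math.ceil(width / tile) * math.ceil(height / tile)

for w, h in [(640, 480), (1280, 720), (1920, 1080)]:
    print(f"{w}x{h}: {tile_count(w, h)} tiles")
```

With the cache size held constant, going from 640x480 to 1920x1080 raises the tile count from 80 to 510, so there are far more tile borders for geometry to cross.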

Crossbar
05-02-2006, 08:43 PM
Interesting Cp!!! I have a few questions on this.
Hmmm... if you're referring to things like framebuffer post-processing across several SPEs, then yeah, you'll basically have to construct mini-tiles to fit within the SRAMs.
Would it be possible to render tiles directly to the SPEs' LS and let the SPEs assemble the result into the final frame buffer in the GDDR3 memory? If possible, what kind of benefits would that provide?

If you're referring to using a small tile cache and committing results to a framebuffer in VRAM, then that's certainly possible. Essentially, it's not so far removed from what Xenos does (when tiling, that is), it's just that with Xenos, the framebuffer space and the tile cache are in the same eDRAM.

If you're keeping a main framebuffer out in external "VRAM," the benefits are overall less than with a full eDRAM/SRAM framebuffer (because Xenos' eDRAM can't fit any type of framebuffer larger than 32bits, it cannot normally commit tiles to the framebuffer in full-quality HDR). Just note that whatever amount of space you've got, that cache must be used for at least 2 tiles. Must be 2 because you want to be able to actively render to one tile while the other is being committed out to the main framebuffer.
nAo specifically said that it was not tiling similar to the 360 he had in mind, but just rendering "fast relatively simple transparent pixels" to the tile cache, which provides "huge fill rate", and having the CPU tile the transparent geometry. What do you make of that?
Ideally you break up the geometry perfectly (which is in fact the main difficulty with tiling no matter the platform) so that you move from tile to tile and all the geometry for a given tile is sent together. You don't want to jump from tile to tile, especially not when alpha-blending since alpha blending needs to read back from the buffer. Any geometry that crosses tile borders needs to be processed multiple times, but that cost is relatively small. The smaller the tile, the bigger the pain in the neck this gets.

Overall, the main advantage includes the fact that you have less latency and more bandwidth when working with a tile. The bandwidth strain on a commit is basically no big improvement, but the more you fill in one pass over a tile, the better, as it would mean fewer framebuffer writes.
nAo is also talking about compression in connection with the discussion of bandwidth requirements for MSAA. Would it be possible to have compressed data in the frame buffer? From what you said earlier about all compression technologies reducing quality I find this hard to believe. Do you have a clue what he has in mind?

cpiasminc
05-02-2006, 10:17 PM
Would it be possible to render tiles directly to the SPEs' LS and let the SPEs assemble the result into the final frame buffer in the GDDR3 memory? If possible, what kind of benefits would that provide?
Hmmm... the possibility side of things depends a lot on the way the SPE LSes are addressed (which is more of an OS-specific thing). There's really not much point in having the SPEs stitch together tiles. While they could theoretically do it, a small problem with that is that the SPEs won't really have any obvious way of knowing *when* to commit a tile. About the only real advantage I can see is just that you can have a lot more working tiles (i.e. 5 working tiles while the 6th is being committed to the framebuffer). But I think the latency disadvantages and the fact that you have to idle the SPEs during rendering so that the GPU can take its time to fill all the pixels bound for a given tile far outweigh any advantages, though.

nAo specifically said that it was not tiling similar to the 360 he had in mind, but just rendering "fast relatively simple transparent pixels" to the tile cache, which provides "huge fill rate", and having the CPU tile the transparent geometry. What do you make of that?
Yeah, he's basically referring to the fact that the opaque pixels are the ones that are costly on pixel shaders (these are usually the ones that see the loads of dynamic lighting and per-pixel lord-knows-what), but not so much on bandwidth because they don't vary so much on a sample-level and they are simply a direct write-to-framebuffer.

The transparent pixels tend to be light on the pixel shading, but they're heavy on fillrate -- partly because they would be overdraw-dependent, and partly because blending is a read+write operation. What he's referring to is the idea of letting the tilecache be used specifically for those transparent pixels that need the kind of fillrate that a local SRAM can give you. It's simply a cheaper (from the hardware side) way of getting that near-theoretical fillrate. It would also mean that on the shader-expensive stuff, you wouldn't bother yourself with worrying about tiling... just on the shader-cheap fillrate-expensive stuff.
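
The read+write cost of blending can be illustrated with a rough traffic estimate. This is a sketch under loud assumptions: 1280x720, 32-bit colour, 60 fps, ten full-screen transparent layers, and no Z traffic, compression or caching. None of these numbers come from the thread.

```python
def framebuffer_traffic(width, height, layers, bytes_per_pixel, fps, blended):
    """Colour-buffer traffic in bytes per second for `layers` full-screen passes."""
    # An opaque write only stores the new colour; a blend must read the old one first.
    accesses_per_pixel = 2 if blended else 1
    return width * height * layers * bytes_per_pixel * accesses_per_pixel * fps

opaque = framebuffer_traffic(1280, 720, 1, 4, 60, blended=False)
smoke = framebuffer_traffic(1280, 720, 10, 4, 60, blended=True)
print(f"opaque: {opaque / 1e9:.2f} GB/s, transparent: {smoke / 1e9:.2f} GB/s")
```

Even with these modest inputs the blended layers generate twenty times the colour traffic of the single opaque pass, which is the traffic a small on-chip tile cache would keep off the external bus.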

nAo is also talking about compression in connection with the discussion of bandwidth requirements for MSAA. Would it be possible to have compressed data in the frame buffer? From what you said earlier about all compression technologies reducing quality I find this hard to believe. Do you have a clue what he has in mind?
I think he has something different in mind from the likes of DXT3/4 and JPEG. While JPEG artifacts on frames of video may not be as noticeable, on textures they generally are, mainly because of what is static and what is dynamic. Since textures aren't really changing, you get the time to notice problems, whereas with framebuffer compression, the errors are there for a fraction of a second.

Either way, my first guess would be that he's thinking more along the lines of HDR framebuffer formats. NAO32 is nice for instance, but it would be all the sweeter if it was natively supported on the hardware, which would mean there wouldn't be the need for all the converting back and forth so that alpha blends would be possible. SGI had been using it for years for their HDR renders so that they could get the quality they needed in a compact format.

NPR throws a few wrinkles into the problem, though. The closer to photoreal an image is, the better it will compress using conventional techniques. With a luminance-chrominance format like Luv or YCbCr, you can take advantage of the low variance of chrominance to store those channels subsampled. With a photograph, it's often hard to tell apart 24 bpp from 9 bpp using this sort of technique. But it can suck on NPR (e.g. toon-shaded, aEmber ;), etc. )
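
The 24 bpp versus 9 bpp comparison is consistent with storing full-resolution luminance and one pair of chrominance samples per 4x4 pixel block. That block size is an assumption made here to reproduce the number; the post does not state it.

```python
def bits_per_pixel(luma_bits, chroma_bits, block_w, block_h):
    """Average bits per pixel when two chroma channels are stored once per block."""
    return luma_bits + 2 * chroma_bits / (block_w * block_h)

print(bits_per_pixel(8, 8, 1, 1))  # 24.0, no subsampling
print(bits_per_pixel(8, 8, 2, 2))  # 12.0, one chroma pair per 2x2 block
print(bits_per_pixel(8, 8, 4, 4))  # 9.0, one chroma pair per 4x4 block
```

Flat toon-shaded regions separated by hard colour edges are exactly where a shared chroma sample per block is most likely to bleed visibly, which matches the caveat about NPR.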

gablar16
05-03-2006, 12:12 AM
That's very true Gablar, I feel similarly about the roles the forums can play.

EDIT: We'll just go with the below. ;)


Thanks XBD, this is a great forum thanks in part to great mods like yourself.

CPI and Crossbar, thanks for the tech info. I won't pretend I understood everything you said but I think I get the gist of it. It is theoretically possible to do it but it just wouldn't work like in the 360. It would be optimized for certain effects (opaque pixels??).