Ray Tracing on Cell: Paper, Slides, PS3-Specific Info
8_Bit
10-18-2006, 04:10 AM
I was following up on something I read on NeoGAF, supposedly attributed to IBM Cell guy Barry Minor (it turned out not to be from him, but it did lead to some interesting info from Mr. Minor on procedurally-generated content nonetheless), which led me to wonder if there was anything new out about ray tracing on the Cell.
Turns out, the IEEE had a conference in Sept. all about the future of interactive ray tracing, and a paper (by the SCI Institute at the University of Utah's Carsten Benthin, et al.) was presented specifically about ray tracing on the Cell Processor, including extrapolated benchmarks of the PS3-Cell running their ray tracing scheme: ("[A single SPE in] Cell is 4-8 times faster than a [single-core] x86 CPU") ...and a whole lot of other info, some of which is way over my head, even though I have a pretty decent grasp on most of these concepts.
For those of you with no knowledge of what ray tracing is, I'm afraid I don't have the time or energy to go into a detailed explanation of even the basics, (though that information can easily be found in many places via a web search) but it is safe to say that ray tracing is one of the "holy grails" of advanced graphics technology, and what separates "Toy Story-like" graphics from actual "Toy Story" graphics. Total ray tracing of graphics in real-time on consoles is probably still a concept for the future, but partial use of ray tracing for specific objects in a scene or for specific effects now seems to be within the grasp of PS3's advanced architecture.
A few notes on their setup: they are running on a twin-Cell Mercury blade, not the PS3. They extrapolated their PS3 benchmarks based on what they know from working down to the single SPE level though, so their rough numbers should be accurate. Their conclusion of the benchmarks: "We have shown how to efficiently map the ray tracing algorithm to the Cell processor, with the result that a single SPE achieves roughly the same traversal performance as the fastest known x86-based systems, and using all of a Cell’s SPEs yields nearly an order of magnitude higher traversal performance than on an Opteron."
A few other points of interest:
They note that one of the more significant problems right now in their development on the Mercury system is shading, concluding, "The remaining bottleneck is shading," but noting earlier in the paper that "Ideally, the Cell would be used as a ray tracing processor only, with shading being done on a GPU. In a Playstation 3, for example, GPU and Cell have a high-bandwidth connection, and sending rays back and forth would be feasible. In that setup, the GPU could do what it’s best at—shading—and the Cell would only trace rays. Since we currently do not have a Playstation 3, yet, we have to temporarily realize the shading on the Cell," and also, "As a Cell will be used as the CPU for the PlayStation 3, a direct high-bandwidth connection to the PS3’s GPU will exist. If a sufficiently high ray traversal performance could be achieved and the shading could entirely be done on the GPU, [which, from comments from other developers we've heard appears to be the case] ray traced effects could finally be delivered to commodity game consoles."
They try to look into potential future iterations of the Cell Processor, stating "As the SPEs are exclusively designed for high clock rates, we can expect future versions of the Cell processor to have a higher clock rate and an increased number of SPEs. Even the current generation of SPEs has been reported to run stable at 5.2 GHz[4], so we can expect a great performance boost from future generations." In the slides, they envision a future cell with:
">64 SPEs @ 3.2 GHz ~ 1.6 TFlops
–Architectural improvements, e.g.
larger local store, new instructions, better branch prediction, higher memory bandwidth,... "
...so the Cell we may see in the future looks like an absolute monster!
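For anyone who wants to sanity-check that slide figure, here is a rough back-of-the-envelope sketch in Python. It assumes each SPE retires one 4-wide single-precision fused multiply-add per cycle (counted as 8 flops per cycle), which is the usual way peak SPE throughput is quoted; the function name is made up for illustration, and real code will not hit peak.

```python
def peak_gflops(num_spes, clock_ghz, flops_per_cycle=8):
    """Theoretical single-precision peak in GFlops.

    flops_per_cycle=8 is an assumption: one 4-wide SIMD
    multiply-add per cycle, counted as 2 flops per lane.
    """
    return num_spes * clock_ghz * flops_per_cycle


# The hypothetical future Cell from the slides: 64 SPEs at 3.2 GHz.
future = peak_gflops(64, 3.2)
print(f"64 SPEs @ 3.2 GHz: about {future / 1000:.1f} TFlops peak")

# One SPE at the same clock, for comparison.
print(f"1 SPE @ 3.2 GHz: about {peak_gflops(1, 3.2):.1f} GFlops peak")
```

Under that assumption the 64-SPE part lands right around the "~1.6 TFlops" the slide quotes, so the slide number looks like a plain peak-rate extrapolation rather than a measured result.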
For those of you that are into this sort of thing, I present these links for your perusal and consideration:
The paper:
http://www.sci.utah.edu/~wald/Publications/2006///Cell/download//cell.pdf
The slides:
http://www.sci.utah.edu/~wald/RT06/papers/cell_vortrag_final.pdf
Interesting stuff here, to be sure. For those of you who like to dig deeper into this highly-technical stuff, enjoy.
Siraris
10-18-2006, 04:43 AM
Fascinating stuff. Although what's the point of "shading" the rays? You use the rays to calculate lighting on objects and then shade those objects accordingly.
Coded-Dude
10-18-2006, 05:21 AM
excellent read (now I have to visit the links)
thanks for posting this stuff......+ rep
Garfunkel
10-18-2006, 05:23 AM
i look forward to a nice bright future for cell, do you think we will see partial ray tracing in future games on ps3?
Nice find.
I think I remember reading somewhere that the clouds in Warhawk are done via raytracing. I could be wrong though. Although I of course cannot tell if they were simply by looking at them, they looked pretty nice when I saw the game.
Cell is indeed a beast.
Siraris
10-18-2006, 05:52 AM
The clouds in Warhawk are done with Raycasting, which is not actually Raytracing. It's a less powerful form of tracing.
xbdestroya
10-18-2006, 06:00 AM
Thanks for the slides 8 Bit.
I have to say though, that they don't really provide us with any information we didn't already have with regard to Cell and raytracing. You give a good synopsis of what they cover in your post however.
Frankly, I'll be interested to see what the architectural revisions for Cell end up being as time goes on. That they occur at all would be a testament to the architecture's success...
We've got the improved DP-performance variant supposedly in the works at IBM, so I guess that's phase 1. And Toshiba's working on their stuff as well, centered mainly around the embedded space (as discussed in a very recent thread).
I'm honestly more interested in seeing how *many* variants and revisions of Cell we actually get rather than just the straight-line evolution that may occur in the context of HP-computing. In that vein though (HPC), I think the directions and trends assumed by the Utah guys are all on the right track. I'd question the branching build-out on the SPEs though and wonder instead about more robust PPU replacements (perhaps OOE).
Lekko
10-18-2006, 06:00 AM
i look forward to a nice bright future for cell, do you think we will see partial ray tracing in future games on ps3?
Warhawk already apparently uses raytracing on the clouds, although we're not sure exactly what capacity or what method they are using to accomplish that. It's more than likely a form of raytracing, but not quite the kind you hear about in articles.
Lekko
10-18-2006, 06:39 AM
huh... interesting bit of news here: In the 'results' slide, there are three cell numbers: Single-cell, Dual-cell, and PS3-cell. Now the Single-cell and PS3-cell should be about the same, but it shows that the PS3 cell outperforms the single-cell in every benchmark.
I already figured that the Cell in the PS3 would be different from the ones in the servers, knowing that the PS3 version would have one SPE deactivated, and one dedicated to the OS (which they are probably using all eight anyways). But given that, then PS3 cell should be evenly matched with the single-cell from a server workstation, right? What's the PS3 version juicing up on? I thought the PS3 Cell was the same as the server one.... how is it more powerful?
Siraris
10-18-2006, 06:50 AM
huh... interesting bit of news here: In the 'results' slide, there are three cell numbers: Single-cell, Dual-cell, and PS3-cell. Now the Single-cell and PS3-cell should be about the same, but it shows that the PS3 cell outperforms the single-cell in every benchmark.
I already figured that the Cell in the PS3 would be different from the ones in the servers, knowing that the PS3 version would have one SPE deactivated, and one dedicated to the OS (which they are probably using all eight anyways). But given that, then PS3 cell should be evenly matched with the single-cell from a server workstation, right? What's the PS3 version juicing up on? I thought the PS3 Cell was the same as the server one.... how is it more powerful?
It's not interesting, the PS3 cell is clocked 800 MHz faster than the single cell.
Lekko
10-18-2006, 07:08 AM
It's not interesting, the PS3 cell is clocked 800 MHz faster than the single cell.
Oh, so the PS3 cell is 'juicing up' on MHz. Heh, I read too fast, thanks for clearing that up, I completely forgot about the clock speed difference.
yoshaw
10-18-2006, 08:06 AM
Off-topic: anyone got an idea what the heck this slide means when it says 4 processors for PS3? Maybe a simple answer on whether it means anything for the future of Cell or its present incarnation in PS3? Again, 4 processors? Is it adding the GPU into the equation too? I'm confused.
http://xs208.xs.to/xs208/06423/FutureArchitecture.jpg
^No, I'm not sure if it was in one of the papers posted by 8-bit. Here's the link.
http://www.graphicshardware.org/presentations/pharr-keynote-gh06.pdf
frosty
10-18-2006, 09:20 AM
Um... PS3 has 4 processors? vertex and fragment I imagine are the GPU, unless there is something we don't know?
Rubbernek
10-18-2006, 10:46 AM
http://www.sciam.com/article.cfm?chanID=sa006&colID=1&articleID=000637F9-3815-14C0-AFE483414B7F4945
A Great Leap in Graphics
The quality of 3-D computer graphics is poised for a quantum jump forward, thanks to speedier ways to simulate the flight of light
By W. Wayt Gibbs
For those of us who frittered our formative years away blasting blocky space invaders, video games today can widen the eyes and slacken the jaw. The primitive pixelated ape of Donkey Kong has evolved into a three-dimensional King Kong of startling detail. Some newer Xbox 360 games render their lead characters from an intricate mesh of more than 20,000 polygons, each tiny patch drawn dozens of times a second with its own subtle texture, shading and gloss.
Beyond the booming game industry, the evolution of graphics has lifted interactive software for design, engineering, architecture, medical imaging and scientific visualization to new heights of performance. Much of the credit belongs to advances in graphics processing units (GPUs), the microchips at the heart of computer video cards that transform 3-D scenes into 2-D frames at speeds faster than a trigger twitch. As the rendering capabilities of GPUs soared, so did the revenues of ATI, NVIDIA and Intel, which make the most popular models....continued at Scientific American Digital
Anyone have access to this?
"speedier ways to simulate the flight of light" sounds like there could be some raytracing/casting info in there.
liver_kick
10-18-2006, 11:14 AM
Offtopic, Anyone got idea, what the heck this slide means when it says 4 processors for PS3. Again, 4 processors? Is it adding the GPU in the equation too? I'm confused.
Yeah Frosty has it right. Vertex and Fragment (aka Pixel Shader) refer to the GPU and PPU/SPU obviously refer to Cell. More accurately they're categorizing PS3's four "types" of processors. The slides seem to be basically discussing the various hardware architectures (PC and console) and how they impact graphics processing in different ways.
frosty
10-18-2006, 11:22 AM
Well, I would have thought that too, but I've heard each pixel and vertex shader "pipeline" referred to as its own individual core to the GPU, so one would think RSX would in effect be 24 processors. And, it groups all the SPUs into one processor as well. Why?
liver_kick
10-18-2006, 12:23 PM
Well, I would have thought that too, but I've heard each pixel and vertex shader "pipeline" referred to as its own individual core to the GPU, so one would think RSX would in effect be 24 processors. And, it groups all the SPUs into one processor as well. Why?
The slide is categorizing the type of core/processor (not the number of cores themselves). So SPU counts as one type/characteristic, Vertex another etc. The focus is architecture and how it impacts graphics programming.
xbdestroya
10-18-2006, 02:35 PM
Right, Liver Kick's got it.
The slide is alluding to a possible future push in CPU/GPU architecture being discussed right now, in which things may go back onto a single die. The difference though, is that many envision that these single dies will still have a multitude of specialized execution units, whereas these guys are saying maybe four 'types' is too many. In that vein, you could go to an all-SPE model and go unified pipes and you'd be down to two 'types,' and you could play around with different combinations from there.
I think the eventual Cell in PS4 may very well look more like what was envisioned in the original Broadband Engine patent, only with NVidia tech where the 'Visualizers' were before... and speaking of which, things may look very different just in general depending on what happens in the CPU business between these companies in the next five years (will nvidia be bought out?) or if the GPU makes massive GPGPU gains (perhaps the next chip will be *all* GPU?)
F089/H
10-18-2006, 03:23 PM
Oh, so the PS3 cell is 'juicing up' on mhz. Heh, I read too fast, thanks for clearing that up, I completely forgot about the clock speed difference.
Actually, anyone remember when they were first doing Cell yield tests and got a processor up to 4.6 GHz, but they ended up toning it down to 4.0 GHz, then to 3.6 GHz, and just settled on 3.2 due to heat and energy consumption issues? They said they could do 4.6 GHz, but seeing the magnitude they said it's like 7 x .2, what the hey.
yoshaw
10-18-2006, 03:45 PM
Thanks, frosty, liverkick and Xb. Your answers were very helpful.
to me the slide suggests that ps3 should have been made with a more powerful cell chip, and a less powerful gpu. As it states more cpu/spu is required and less rasterization, or gpu vertex work, vertex to 2d in other words, is required. I don't know if they are suggesting that having a shader unit in the gpu is redundant, as rsx is 10x faster at shaders than the other chip.
cpiasminc
10-18-2006, 07:06 PM
Although what's the point of "shading" the rays? You use the rays to calculate lighting on objects and then shade those objects accordingly.
They're not talking about shading rays... they're talking about the fact that a primary ray accounts for one screen-space sample, and that sample need be shaded. Since it's a raytracer, they've got pixels in the outer loop anyway, so they're suggesting running the inner loop (geometry) on the CPU, and send shading information off to the GPU.
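To make the loop structure concrete, here is a minimal Python sketch of that split. It is only an illustration, not the paper's code: the scene is a list of spheres, the camera is orthographic, and `trace` / `shade` are hypothetical names. `trace` stands in for the part that would run on Cell (pixels in the outer loop, geometry in the inner loop), and `shade` stands in for the pass that would be handed off to the GPU.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0 else None


def trace(width, height, spheres):
    """Traversal pass: one primary ray per pixel, nearest hit only."""
    samples = []
    direction = (0.0, 0.0, -1.0)  # orthographic camera looking down -z
    for y in range(height):              # outer loops: pixels
        for x in range(width):
            origin = (x + 0.5 - width / 2, y + 0.5 - height / 2, 0.0)
            nearest = None
            for center, radius in spheres:   # inner loop: geometry
                t = hit_sphere(origin, direction, center, radius)
                if t is not None and (nearest is None or t < nearest):
                    nearest = t
            samples.append(nearest)
    return samples


def shade(samples):
    """Shading pass: turn each screen-space sample into a grey value."""
    return [0.0 if t is None else 1.0 / (1.0 + t) for t in samples]


scene = [((0.0, 0.0, -5.0), 1.0)]
image = shade(trace(4, 4, scene))
print(sum(1 for v in image if v > 0), "of", len(image), "pixels hit the sphere")
```

The point of the sketch is only that `trace` produces one sample per primary ray and `shade` consumes those samples independently, which is why the two passes can live on different processors.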
Warhawk already apparently uses raytracing on the clouds, although we're not sure exactly what capacity or what method they are using to accomplish that. It's more than likely a form of raytracing, but not quite the kind you hear about in articles.
My best guess, FWICT, is that they do raycasting on a per-particle basis. The clouds are basically collections of non-moving particles. Every particle texture has no color variance, only alpha variance. They light each particle by casting a ray from it to the lightsources to see how many other particles in the cloud it hit (with each particle modeled as a sphere) and the amount of length the ray has inside each particle to estimate how the light has attenuated due to partial scattering against other cloud particles. The number of rays to cast per cloud per light is thus, pretty small -- a few hundred or so.
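A rough Python sketch of that guess, for illustration only. Nothing here comes from the Warhawk team: the exponential falloff, the `density` constant and the function names are all assumptions. Each particle is a sphere of the same radius, one ray goes from the particle toward the light, and the length of that ray lying inside the other particles is summed up to estimate attenuation.

```python
import math


def chord_length(origin, direction, center, radius):
    """Length of a unit-direction ray (t >= 0) that lies inside a sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc <= 0:
        return 0.0                      # ray misses (or just grazes) the sphere
    root = math.sqrt(disc)
    t_near = max(-b - root, 0.0)        # clamp: ignore anything behind the origin
    t_far = -b + root
    return max(t_far - t_near, 0.0)


def light_particles(particles, radius, light_dir, density=1.0):
    """Brightness in [0, 1] per particle for one directional light.

    light_dir is a unit vector pointing from the cloud toward the light.
    exp(-density * path) is an assumed falloff, not a known detail.
    """
    brightness = []
    for i, p in enumerate(particles):
        path = 0.0
        for j, q in enumerate(particles):
            if i != j:
                path += chord_length(p, light_dir, q, radius)
        brightness.append(math.exp(-density * path))
    return brightness


# Three particles stacked along z, light straight "up" the z axis.
cloud = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 0.0, 4.0)]
print(light_particles(cloud, radius=0.5, light_dir=(0.0, 0.0, 1.0)))
```

With one ray per particle per light, a cloud of a few hundred particles needs only a few hundred rays, which matches the small ray count described above; the pairwise inner loop is the naive version and would want a spatial structure for bigger clouds.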
to me the slide suggests that ps3 should have been made with a more powerful cell chip, and a less powerful gpu. As it states more cpu/spu is required and less rasterization, or gpu vertex work, vertex to 2d in other words, is required. I don't know if they are suggesting that having a shader unit in the gpu is redundant, as rsx is 10x faster at shaders than the other chip.
I don't know how you arrived at that. The kinds of things they're talking about are problems for 2026, not 2006. PS3 has to be designed for the here and now, and raytracing architectures are not anywhere near ready. And while they don't talk about it much, the thing is that as you scale up on parallelism in the CPU, you're demanding more of the memory architecture since memory access patterns are horrifically incoherent -- even 128 SPEs isn't enough granularity to get the job done with today's memory architectures, and the rate of evolution of RAM, for lack of a better word, is piss-poor compared to CPUs and GPUs. That's why they talk of putting the shading pass on the GPU.
Moreover, a CPU has more work to do than rendering, so it would do well to have a lot more ports to memory and an enormous memory bus. Such memory architectures exist... in machines that cost over $8 million... and they're far from tightly and gluelessly integrated, so the latencies are on the order of dozens of microseconds, which for a 3.2 GHz processor might as well be an eternity.
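To put "an eternity" into numbers, a tiny Python sketch. The 24-microsecond figure is just an assumed stand-in for "dozens of microseconds", and the function name is made up.

```python
def stall_cycles(latency_us, clock_ghz=3.2):
    """Clock cycles that tick by while one memory access is outstanding."""
    # 1 GHz is 1000 cycles per microsecond.
    return latency_us * clock_ghz * 1000


# 24 us is an assumed example value, not a measured latency.
print(f"{stall_cycles(24):,.0f} cycles per access at 3.2 GHz")
```

That works out to tens of thousands of cycles in which a core could have been doing arithmetic, per access, unless there is other independent work to switch to.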
If anything, for the here and now, RSX and Xenos are not beefy enough in many aspects. In fact, I probably wouldn't be satisfied unless we had both of them put together with 4x the eDRAM.
makeitlookreal
10-18-2006, 08:01 PM
CPI,
What kind of memory would you prefer to be in the current PS3 and on the other hand what kind of memory would you prefer to be in the PS4?
<cut for brevity>
I don't know how you arrived at that. The kinds of things they're talking about are problems for 2026, not 2006. PS3 has to be designed for the here and now, and raytracing architectures are not anywhere near ready. And while they don't talk about it much, the thing is that as you scale up on parallelism in the CPU, you're demanding more of the memory architecture since memory access patterns are horrifically incoherent -- even 128 SPEs isn't enough granularity to get the job done with today's memory architectures, and the rate of evolution of RAM, for lack of a better word, is piss-poor compared to CPUs and GPUs. That's why they talk of putting the shading pass on the GPU.
Moreover, a CPU has more work to do than rendering, so it would do well to have a lot more ports to memory and an enormous memory bus. Such memory architectures exist... in machines that cost over $8 million... and they're far from tightly and gluelessly integrated, so the latencies are on the order of dozens of microseconds, which for a 3.2 GHz processor might as well be an eternity.
If anything, for the here and now, RSX and Xenos are not beefy enough in many aspects. In fact, I probably wouldn't be satisfied unless we had both of them put together with 4x the eDRAM.
How did I arrive at that? I will explain. The slide specifically states there are 'too many' cpus in ps3, so I inferred from this that the chips in the system were not balanced. If you are telling me the slide stating 'are probably too many' means that in the year 2026 spus lack in 'granularity' and memory will force machines in 2026 to cost over $8 million, I cannot argue with you. I did read the entire pdf, but of course when I referred specifically to the one slide, it was because it specified ps3, so I made the assumption he was talking about ps3. I hope that helps.
cpiasminc
10-18-2006, 10:11 PM
How did I arrive at that? I will explain. The slide specifically states there are 'too many' cpus in ps3, so I deferred from this the chips in the system were not balanced.
Again, I think it was said a few times, but they were talking about the heterogeneity vs. homogeneity argument, saying that 4 different types of processing units is probably too many for what they need, and again, they're confining the analysis to their own range of problems. Also, for raytracing, they have little need of per-vertex operations or geometry operations since the raytracer demands full scene data access, which is something that no rasterizer GPU will ever do, that makes the vertex shaders on the GPU utterly useless for raytracing.
You'll notice that the ideal they have in mind is more raw number of units, but the only GPU unit they had a use for was the pixel shader units. Which is again to say, that they want more granularity because they're essentially processing pixels on both ends, and there's a lot of pixels and a lot of memory accesses associated with each one, so that means more contexts are needed to cover up the latencies. Something that GPUs have, but CPUs don't.
If it were to come down to a question of balance for normal rendering pipelines, then yeah, there's an imbalance. The vertex processing power on the GPU is far from overwhelming. In fact, next-gen in general, the GPU side hasn't improved that much in its data-based limits in relation to how much CPU power has. Fillrate, for instance, is less than double that of the predecessors, but you've got a hell of a lot more pixels to fill. The vertex units just don't move data through the pipe fast enough to keep up with the CPU in any of the next-gen machines.
If you say the slide saying the chips in PS3 'are probably too many' means that in the year 2026 spus lack in 'granularity' and memory will force machines in 2026 to cost over $8 million, I cannot argue with you.
Ummm... well, when I was talking about granularity, I was talking about how small-scale the thread contexts can afford to be. With only 7 SPUs that have no SMT of their own and several million primary ray samples to take, that doesn't amount to a lot of granularity. Hence why, for rasterization purposes, Cell is just fine for processing vertices because you can encapsulate all vertex processing in small chunks. You can't really do that with pixels. There's a lot of memory accesses, and you need more threads (i.e. finer-grain parallelism) to cover up those access latencies. That's one of the limiting factors with raytracing scaling up all the way to things like Monte Carlo raytracing in realtime because you end up memory access limited in the end even if you've got 100 TFLOPS of computing power to blow.
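A simple way to see the granularity argument is a toy latency-hiding model, sketched here in Python. All the numbers are assumed for illustration (2 million primary rays, 50 cycles of work per memory access, 400 cycles of latency), and the function names are hypothetical. The model just says: while one context waits on memory, you need enough other contexts with work ready to fill the gap.

```python
import math


def rays_per_unit(num_rays, num_units):
    """Primary rays each processing unit must handle per frame."""
    return math.ceil(num_rays / num_units)


def contexts_to_hide_latency(latency_cycles, work_cycles_per_access):
    """Independent contexts needed so the unit never sits idle.

    One context is computing; the rest cover the outstanding access.
    """
    return 1 + math.ceil(latency_cycles / work_cycles_per_access)


# Assumed example numbers, not measurements.
print(rays_per_unit(2_000_000, 7), "rays per SPU per frame")
print(contexts_to_hide_latency(400, 50), "contexts in flight per SPU")
```

Since an SPU has no hardware threads of its own, those "contexts" would have to be managed by hand, for example by keeping several DMA transfers in flight and switching between ray packets in software. The longer the latency relative to the work per access, the more of them you need.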
And when I was talking about $8,000,000 machines, I was talking about supercomputing layouts that have wide and heavily multiported memory architectures, which is conducive to having lots of independent thread contexts, since it allows simultaneity of memory access, though it does demand some flagging devices to "dirty"-mark blocks which aren't used in read-only operations (C++ and its const_cast be damned).
What kind of memory would you prefer to be in the current PS3 and on the other hand what kind of memory would you prefer to be in the PS4?
As far as *type* of memory, I have nothing against XDR. I just think 2 coherent channels isn't enough. I would prefer at least 4 independent channels so that SPEs can access pools on demand a little more readily and the access queues get shorter, and the size of those local stores becomes less of a factor. I would also like to see SPE Local store blocks be shared in the same virtual address space. Again, though, this is more of a problem when you get into trivially parallel problems like raytracing where the job queue approach is basically ideal for the problem, which isn't always the case. For more peer-thread or subtask thread oriented problems, I think the memory architecture as is is okay... Though, bandwidth and capacity are always welcomed ad nauseam.
Lekko
10-18-2006, 10:40 PM
Question Cpi: wouldn't procedural texturing help out incredibly well in this situation? With procedural textures, your memory footprint for textures would shrink enough that it could fit far easier in the overall memory access bandwidth, as well as increase the 'computation-to-memory access ratio'. One of their main concerns with a homogeneous raytracing engine on the Cell is the need to access the entire scene data, which would consist of (I assume) geometry data, texture data, and lighting data. Is there anything I am really missing?
Geometry you can compress a bit with simpler geometric models which smooth out with subsurface division, textures you can compress with procedural texturing, and light sources, well.. I don't see them as being that large in memory.
Although I'm pretty sure that if this engine did run well on Cell, you really wouldn't have a game since you don't have any AI or physics work being done.... well, here's to an open-source raytracing PS3 screensaver!
8_Bit
10-18-2006, 10:46 PM
I have to say though, that they don't really provide us with any information we didn't already have with regard to Cell and raytracing. You give a good synopsis of what they cover in your post however.
Uh, thanks...sorta. With all due respect, I think your basically calling this all a rehash is a bit unfair, however.
Most importantly, perhaps, I wasn't aware of any work on ray tracing on the Cell outside of Barry Minor's team's work at IBM, and can't find anything further suggesting such on the "series of tubes" (that just makes me laugh, sorry) known as the interwebs either. So I thought this paper, presented by someone besides Minor or IBM at a conference devoted to the subject, was a little more significant than you do, even if only as proof of further academic study and of the ray tracing community's continued interest in the Cell.
Oh, well, we'll have to agree to disagree on that small point.
Thanks anyway for the props on my synopsis.
:happy:
8_Bit
10-18-2006, 11:03 PM
Question Cpi: wouldn't procedural texturing help out incredibly well in this situation? With procedural textures, your memory footprint for textures would shrink enough that it could fit far easier in the overall memory access bandwidth, as well as increase the 'computation-to-memory access ratio'. One of their main concerns with a homogeneous raytracing engine on the Cell is the need to access the entire scene data, which would consist of (I assume) geometry data, texture data, and lighting data. Is there anything I am really missing?
Geometry you can compress a bit with simpler geometric models which smooth out with subsurface division, textures you can compress with procedural texturing, and light sources, well.. I don't see them as being that large in memory.
Although I'm pretty sure that if this engine did run well on Cell, you really wouldn't have a game since you don't have any AI or physics work being done.... well, here's to an open-source raytracing PS3 screensaver!
I'll try my best to answer this one.
With procedural textures, your memory required for input (the memory that would be required for the fetch and store of the texture data) is reduced, but the overall memory required to do the procedural generation of the texture is increased dramatically so this would do the opposite of what you want it to do, if your goal is to save RAM.
Procedural generation is great if you are short on disc space, or want to reduce the labor requirements of programming, or want to reduce load time from a disc, but it requires more RAM than you'd likely desire, certainly for this generation of consoles, and likely in the future as well, given the slower pace of memory chip development.
cpiasminc
10-18-2006, 11:58 PM
With procedural textures, your memory required for input (the memory that would be required for the fetch and store of the texture data) is reduced, but the overall memory required to do the procedural generation of the texture is increased dramatically so this would do the opposite of what you want it to do, if your goal is to save RAM.
More or less. Though if it's running on Cell, you have the ability to run more complex algorithms than you could on any GPU, and you're in one memory space, and if it's cheap enough that you can afford to calculate it explicitly per pixel, then it doesn't take up much memory. The same technically applies to procedurally texturing on the GPU, but since they run at lower clock speeds, each instruction is more costly. It all boils down to the fact that you have to render a lot of pixels, so you can assume that however many cycles the procedure takes up, you're going to have to multiply by 10 million.
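The "multiply by 10 million" point is easy to put into numbers. A small Python sketch follows, where the 100-cycle procedure cost and the core counts are assumed example values and the function name is made up; it ignores SIMD, memory stalls and everything else the processor has to do.

```python
def frame_time_ms(cycles_per_pixel, pixels=10_000_000, clock_ghz=3.2, cores=1):
    """Milliseconds to run a per-pixel procedure over every pixel once."""
    total_cycles = cycles_per_pixel * pixels
    return total_cycles / (clock_ghz * 1e9 * cores) * 1e3


# Assumed example: a 100-cycle procedural texture evaluation.
print(f"1 core:  {frame_time_ms(100):.1f} ms")
print(f"6 cores: {frame_time_ms(100, cores=6):.1f} ms")
```

Even a fairly cheap procedure eats a big chunk of a frame once it is evaluated at every pixel, which is the trade-off being described: you save memory only if you can afford to pay the cycles each time.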
If you don't have the luxury of calculating at each pixel (or you have to get the results from one memory pool to another), well... so much for saving memory. And yeah, compressing it on the spot is something of a PITN. But even if you do that, the best you can do is taking up the same amount of RAM as any other texture.
In addition, I'd add that in all cases where procedural generation is mentioned in literature, its expressive power is overblown by several orders of magnitude. It's really not generic enough to cover more than a minuscule percentage of all texturing problems, and probably never will be.
xbdestroya
10-19-2006, 01:06 AM
So I thought this paper...was a little more significant than you do.
Yes. :smoke:
Raytracing just comes up a lot here is all, I'm not taking it out on you specifically. I agree that this is one of the few papers to emerge on the matter outside of Barry's team, but at the same time they're not really telling us anything Barry himself hasn't.
Anyway... again I definitely think for those who want to raytrace and have the money to drop, a couple of Cell blades is the way to go. I honestly think Mercury is selling to the industrial design space... them or IBM... so I have no doubt that soon (if not already) Cell will in fact be utilized in this manner for 'practical' purposes.
*************************************
On the procedural stuff, Cell is going to be the most practically suited architecture to it presently out there in 'gaming,' but the problems with procedural generation lie elsewhere - it's just not a 'quality' solution at this time. And in terms of Cell, it's not for lack of power. It's just something that needs refinement from the art pipe all the way down.
More on that soon actually, as I'm in the midst of another interview that touches on exactly this topic. (albeit briefly)
Again, I think it was said a few times, but they were talking about the heterogeneity vs. homogeneity argument, saying that 4 different types of processing units is probably too many for what they need, and again, they're confining the analysis to their own range of problems. Also, for raytracing, they have little need of per-vertex operations or geometry operations since the raytracer demands full scene data access, which is something that no rasterizer GPU will ever do, that makes the vertex shaders on the GPU utterly useless for raytracing.
You'll notice that the ideal they have in mind is more raw number of units, but the only GPU unit they had a use for was the pixel shader units. Which is again to say, that they want more granularity because they're essentially processing pixels on both ends, and there's a lot of pixels and a lot of memory accesses associated with each one, so that means more contexts are needed to cover up the latencies. Something that GPUs have, but CPUs don't.
If it were to come down to a question of balance for normal rendering pipelines, then yeah, there's an imbalance. The vertex processing power on the GPU is far from overwhelming. In fact, next-gen in general, the GPU side hasn't improved that much in its data-based limits in relation to how much CPU power has. Fillrate, for instance, is less than double that of the predecessors, but you've got a hell of a lot more pixels to fill. The vertex units just don't move data through the pipe fast enough to keep up with the CPU in any of the next-gen machines.
Ummm... well, when I was talking about granularity, I was talking about how small-scale the thread contexts can afford to be. With only 7 SPUs that have no SMT of their own and several million primary ray samples to take, that doesn't amount to a lot of granularity. Hence why, for rasterization purposes, Cell is just fine for processing vertices because you can encapsulate all vertex processing in small chunks. You can't really do that with pixels. There's a lot of memory accesses, and you need more threads (i.e. finer-grain parallelism) to cover up those access latencies. That's one of the limiting factors with raytracing scaling up all the way to things like Monte Carlo raytracing in realtime because you end up memory access limited in the end even if you've got 100 TFLOPS of computing power to blow.
And when I was talking about $8,000,000 machines, I was talking about supercomputing layouts that have wide and heavily multiported memory architectures, which is conducive to having lots of independent thread contexts, since it allows simultaneity of memory access, though it does demand some flagging devices to "dirty"-mark blocks which aren't used in read-only operations (C++ and its const_cast be damned).
As far as *type* of memory, I have nothing against XDR. I just think 2 coherent channels isn't enough. I would prefer at least 4 independent channels so that SPEs can access pools on demand a little more readily and the access queues get shorter, and the size of those local stores becomes less of a factor. I would also like to see SPE Local store blocks be shared in the same virtual address space. Again, though, this is more of a problem when you get into trivially parallel problems like raytracing where the job queue approach is basically ideal for the problem, which isn't always the case. For more peer-thread or subtask thread oriented problems, I think the memory architecture as is is okay... Though, bandwidth and capacity are always welcomed ad nauseam.
I am sure we would all like to see much more ray-tracing using current level technology. Although I can assure you that you did not need to explain to me why ray-tracing does not exist on any affordable piece of equipment, I can say I am glad you were able to give it some thought. Of course, knowing that fully ray-traced games are not possible on any conveniently obtainable equipment, it is good to know the shaders are there, as we all can conclude ray-tracing will not be convenient in the home for many years.