NEC claims to develop new technique for software compilation and auto-threading
xbdestroya
12-21-2005, 02:56 PM
I'm not sure how much discussion this can spur without more details being known, but thought that people would want a heads-up on something that may *eventually* make programming for these processors a little easier.
...Parallelization with conventional multi-processor technology requires the manual modification of application source programs. Manual labor increases the development and verification cost for software development, which is in turn made more complex by the growing size and complexity of the software itself. Therefore, multi-processor technology, which can automatically parallelize application programs without manual modification, has been long sought after in this field.
The new technology of NEC is a compiler that can be implemented on a field-programmable gate array (FPGA) and that handles parallelization better than software compilers designed for the same type of work. NEC claims that an application that was tailored for execution on a 4 processor machine manually for 4 months runs only 95% faster compared to the same application without optimizations on a computer with 1 processor, whereas the same application parallelized automatically in less than 3 minutes gives 183% performance gain over single-processor machine.
The distinctive feature of the new technology is the ability of the automatic parallelizing compiler that utilizes profile (execution history) information to aggressively exploit parallelization patterns, which are effective for accelerating the speed of application programs. In addition, although the parallelization is speculative, the speculation is almost always completely accurate, according to NEC. The speculation hardware works as a safety net by handling any rare misses, guaranteeing the correctness of the execution. This ensures that the compiler is not conservative in decisions concerned with these cases, resulting in an increase in the amount of parallelism exploited. The parallelism exploitation is supported by the speculative execution hardware that realizes efficient handling of detection of incorrect execution orders caused by the parallel execution of the program parts, cancellation of the incorrectly executed part, and re-execution of it...
Article (http://www.xbitlabs.com/news/other/display/20051220214003.html)
cpiasminc
12-21-2005, 05:34 PM
Doesn't say much about the languages supported and what sort of augmentation is necessary to get it working nicely. There's no such thing as a magic trick that grabs parallelism out of nowhere and just makes things run like they were made for MP.
For instance, if it's a Scheme compiler or something, well... extracting parallelism out of that isn't really anything new. A lot of Scheme compilers will do it without you knowing, and there's no way you can make a game in Scheme (unless it's text-based). Cilk-like constructs added to C++? Well, that's not so bad, but you do have to augment the codebase with little markers for viable spots to spawn off as another job or thread.
Nerve-Damage
12-21-2005, 06:09 PM
Well I (we) got our answer from the one over at B3D.......
xbdestroya
12-21-2005, 06:19 PM
Well I (we) got our answer from the one over at B3D.......
Well there wasn't really a question being asked to have answered. :smoke:
But One did find some more info out of Japan and NEC that I was hoping for.
It turns out to be more like Intel's Mitosis - in that it is hardware specific - than I had originally hoped, but for the rest of the crew here, here's the press release out of NEC:
NEC Develops Multicore Processor Technology Enabling Automatic Parallelization of Application Programs
- Dramatically reduces software development time & cost of multicore processors -
***** For immediate use December 19, 2005
Tokyo, December 19, 2005 --- NEC Corporation today announced that it has succeeded in the development of multicore processor technology capable of performing automatic parallelization of application programs, without modifying them.
Key features of the multicore processor technology
(1) An automatic parallelizing compiler, capable of effective extraction of parallelism from an application program utilizing its profile information (1*).
(2) An additional instruction-set, designed to minimize parallelization overheads.
(3) Processor architecture, which efficiently handles speculative execution (2*).
(4) Implementation realized by a simple extension to conventional processors.
The distinctive feature of this new technology is the ability of the automatic parallelizing compiler that utilizes profile information to aggressively exploit parallelization patterns, which are effective for accelerating the speed of application programs. In addition, although the parallelization is speculative, the speculation is almost always completely accurate. The speculation hardware works as a safety net by handling any rare misses, guaranteeing the correctness of the execution. This ensures that the compiler is not conservative in decisions concerned with these cases, resulting in an increase in the amount of parallelism exploited. The parallelism exploitation is supported by the speculative execution hardware that realizes efficient handling of detection of incorrect execution orders caused by the parallel execution of the program parts, cancellation of the incorrectly executed part, and re-execution of it. Moreover, the parallelization process can be performed in a practical period of time.
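The detect / cancel / re-execute cycle described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions: NEC's mechanism is in hardware and its details are not given in the release, so the names here (`TrackedView`, `run_speculatively`, the example tasks) are hypothetical, and the "parallel" phase is simulated one task at a time to keep the result deterministic.

```python
from typing import Callable, Dict, List, Set, Tuple

Memory = Dict[str, int]
# A task reads through a tracked view and returns the writes it wants to make.
Task = Callable[["TrackedView"], Dict[str, int]]


class TrackedView:
    """Read-only view of memory that records which keys were read."""

    def __init__(self, memory: Memory) -> None:
        self._memory = memory
        self.reads: Set[str] = set()

    def __getitem__(self, key: str) -> int:
        self.reads.add(key)
        return self._memory[key]


def run_speculatively(tasks: List[Task], memory: Memory) -> Tuple[Memory, int]:
    """Run tasks as if in parallel, commit in program order, redo on conflict.

    Returns the final memory and the number of re-executions.
    """
    snapshot = dict(memory)  # state that every speculative task starts from
    speculative = []
    for task in tasks:  # conceptually, these all run at the same time
        view = TrackedView(snapshot)
        writes = task(view)
        speculative.append((task, view.reads, writes))

    committed = dict(memory)
    dirty: Set[str] = set()  # keys written by tasks that already committed
    reexecutions = 0
    for task, reads, writes in speculative:
        if reads & dirty:  # detection: the task read a value that is now stale
            reexecutions += 1  # cancellation: its speculative writes are dropped
            writes = task(TrackedView(committed))  # re-execution on fresh state
        committed.update(writes)
        dirty.update(writes)
    return committed, reexecutions


def independent(view: TrackedView) -> Dict[str, int]:
    return {"a": view["x"] + 1}


def dependent(view: TrackedView) -> Dict[str, int]:
    return {"b": view["a"] * 2}  # reads what the first task writes


result, redone = run_speculatively([independent, dependent], {"x": 1, "a": 0, "b": 0})
print(result, redone)
```

In this toy run the second task is the "rare miss": it speculatively reads `a` before the first task has written it, so it is cancelled and run again, and the final state matches what sequential execution would give.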
In an increasingly networked society, the need for enhanced functionality and performance of terminals such as mobile phones and information appliances, while maintaining a low level of power consumption, is growing. Recently, many system-on-chips (SoCs) employing multicore and multiprocessor technology have been introduced practically to meet this expanding demand. This technology deploys multiple processor cores on a chip and effectively utilizes these multiple resources by parallelizing application programs. However, parallelization with conventional multiprocessor technology requires the manual modification of application source programs. Manual labor increases the development and verification cost for software development, which is in turn made more complex by the growing size and complexity of the software itself. Therefore, multiprocessor technology, which can automatically parallelize application programs without manual modification, has been long sought after in this field. However, nobody has succeeded in bringing automatic parallelization technology to a practical stage to date.
NEC believes that its automatic parallelization technology is the first to be brought to a stage of practical use. This is supported by the fact that NEC has succeeded in operating this technology on a field-programmable gate array (FPGA). Moreover, its implementation has confirmed that only a marginal hardware extension is required and that application program speed is actually accelerated.
The newly developed technology realizes automatic parallelization of application programs and a dramatic reduction in time and cost of parallelization. In addition, we have observed cases where automatic parallelization accelerates the speed of programs at a greater rate than that of manual parallelization. For example, one test showed that manual parallelization of an application program took four months of time with one person carrying out the task, however, automatic parallelization cut this time to just three minutes with no manual labor involved at all. In addition, the application program that has been parallelized manually runs 1.95 times faster with four processors than the original application program running with one processor. However, the application program that has been parallelized automatically runs 2.83 times faster with four processors, which indicates that automatic parallelization achieves greater acceleration than manual parallelization. This shows that automatic parallelization facilitates development of software with high functionality and performance through multicore and multiprocessor technology, at lower cost over a shorter time frame. This will lead to the provision of terminals such as cellular phones and information appliances with enhanced functionality and performance.
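As a quick check on the figures above, the "1.95 times" and "2.83 times" in the release are the same numbers the news article reports as "95% faster" and "183% performance gain". The sketch below does that conversion and also computes parallel efficiency (speedup divided by processor count), which is a standard measure but not one the release itself quotes; the function names are made up for illustration.

```python
def percent_faster(speedup: float) -> float:
    """Convert an 'N times faster' speedup into a 'percent faster' figure."""
    return (speedup - 1.0) * 100.0


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / processors


PROCESSORS = 4  # both results in the release use four processors

for label, speedup in (("manual", 1.95), ("automatic", 2.83)):
    print(
        f"{label}: {speedup:.2f}x = {percent_faster(speedup):.0f}% faster, "
        f"efficiency {parallel_efficiency(speedup, PROCESSORS):.0%}"
    )
```

So on four processors the hand-parallelized version reaches just under half of the ideal 4x, and the automatic version about 71% of it.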
NEC will continue to advance the research and development of its multicore processor technology toward early release of products incorporating it.
<About NEC Corporation >
NEC Corporation (NASDAQ: NIPNY) (FTSE: 6701q.l) is one of the world's leading providers of Internet, broadband network and enterprise business solutions dedicated to meeting the specialized needs of its diverse and global base of customers. Ranked as one of the world's top patent-producing companies, NEC delivers tailored solutions in the key fields of computer, networking and electron devices, by integrating its technical strengths in IT and Networks, and by providing advanced semiconductor solutions through NEC Electronics Corporation. The NEC Group employs more than 140,000 people worldwide and had net sales of 4,855 billion yen (approx. $45.4 billion) in the fiscal year ended March 2005. For additional information, please visit the NEC home page at: http://www.nec.com
* Newsroom: http://www.nec.co.jp/press/en/
***
overclocked
12-21-2005, 07:32 PM
These things are interesting; I always liked the idea of something like it.
But talking about reality "now" and Cell coding, relying on compilers will not work. It would be great if a compiler could get you 50% of the performance of hand-optimized code, but as always it's up to the low-level coders if you want the most.
Saibo
12-21-2005, 10:13 PM
These things are interesting; I always liked the idea of something like it.
But talking about reality "now" and Cell coding, relying on compilers will not work. It would be great if a compiler could get you 50% of the performance of hand-optimized code, but as always it's up to the low-level coders if you want the most.
It's a bit disappointing. Yesterday I was reading about the various Cell programming models from IBM. Wouldn't it be easier to write in a parallel programming language than to use something like C/C++? I was reading about Cilk and NESL, but since I'm not even a novice programmer yet, I don't know which language is better for that sort of thing (the Cell architecture).
What I want to do is buy 5-6 PS3s and create a program/demo to run on them in parallel.
http://www.cs.cmu.edu/~scandal/nesl.html
cpiasminc
12-21-2005, 10:28 PM
Between NESL and Cilk, I'm kind of partial to Cilk, but I'm also somewhat biased in that I was largely raised on imperative languages, and so I'm much more accustomed to them -- where NESL is somewhat derived from ML, Cilk is derived from C. In turn, that also means that bindings for a lot of standard functionality like UI and rendering can be tied into Cilk quite nicely. What Cilk lacks that it really needs is a C++ counterpart.
However, I like the level of granularity that NESL can achieve that Cilk really can't (not without a lot of work). NESL can thread down to single data-element level, though it abstracts a lot of that behind hidden doors. Cilk is a lot more explicit in that you have to mark a function as a sort of "job" and then say you want to spawn a call to it as a thread.
All the same, neither of them are that well-suited to peer thread SMP type of architectures like 360. They're more suited to "job" level granularity which is more the domain of CELL's SPEs.
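To give a feel for the explicit "mark it as a job, spawn it, then wait" style described above, here is a rough analogue in Python rather than Cilk itself. Cilk's own keywords are `cilk`, `spawn` and `sync`; the sketch only imitates the spawn/sync shape with ordinary threads. `parallel_sum` and `CUTOFF` are hypothetical names, and because of CPython's global interpreter lock this shows the structure, not a real speedup.

```python
import threading
from typing import List, Sequence

CUTOFF = 1_000  # below this size just run serially (hypothetical tuning value)


def parallel_sum(values: Sequence[int]) -> int:
    """Divide-and-conquer sum written in a spawn/sync style."""
    if len(values) <= CUTOFF:
        return sum(values)

    mid = len(values) // 2
    left_result: List[int] = []

    # "spawn": the left half is handed off as a separate job.
    child = threading.Thread(
        target=lambda: left_result.append(parallel_sum(values[:mid]))
    )
    child.start()

    # The parent keeps working on the right half in the meantime.
    right = parallel_sum(values[mid:])

    # "sync": wait for the spawned job before using its result.
    child.join()
    return left_result[0] + right


print(parallel_sum(list(range(10_000))))
```

The granularity point from the post shows up in `CUTOFF`: the programmer decides how small a job is worth spawning, which is exactly the kind of decision NESL hides.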
Saibo
12-22-2005, 03:22 AM
Between NESL and Cilk, I'm kind of partial to Cilk, but I'm also somewhat biased in that I was largely raised on imperative languages, and so I'm much more accustomed to them -- where NESL is somewhat derived from ML, Cilk is derived from C. In turn, that also means that bindings for a lot of standard functionality like UI and rendering can be tied into Cilk quite nicely. What Cilk lacks that it really needs is a C++ counterpart.
However, I like the level of granularity that NESL can achieve that Cilk really can't (not without a lot of work). NESL can thread down to single data-element level, though it abstracts a lot of that behind hidden doors. Cilk is a lot more explicit in that you have to mark a function as a sort of "job" and then say you want to spawn a call to it as a thread.
All the same, neither of them are that well-suited to peer thread SMP type of architectures like 360. They're more suited to "job" level granularity which is more the domain of CELL's SPEs.
I guess Cilk would be the better choice, since I'm learning C right now. Crazy programming question:
Would it be possible/feasible to create a game engine that worked in parallel across multiple PS3s, say 5?
Something like:
PS3(1) = process sound dynamic
PS3(2) = rendering (GI )
PS3(3) = Physics dynamic
PS3(4) = Shader
PS3(5) = AI
I would assume that if one could create an engine that ran in parallel on a single PS3, it wouldn't be much tougher to spread that out over multiple PS3s?
overclocked
12-22-2005, 10:21 AM
It's a bit disappointing. Yesterday I was reading about the various Cell programming models from IBM. Wouldn't it be easier to write in a parallel programming language than to use something like C/C++? I was reading about Cilk and NESL, but since I'm not even a novice programmer yet, I don't know which language is better for that sort of thing (the Cell architecture).
What I want to do is buy 5-6 PS3s and create a program/demo to run on them in parallel.
http://www.cs.cmu.edu/~scandal/nesl.html
I'm not a good coder at all and didn't like it; I mostly do pre-art and sometimes modelling, so I'm not good at answering language questions. :shrug:
But I talked with a good friend of mine and he was quite thrilled about Cell and multicores in general. People are different, of course, but he just loves to work in ASM and have TOTAL control over everything.
Btw I promise to FedEx you a bottle of Swedish vodka if you have 5-6 PS3s on launch day! Then if you build a tech demo I'll FedEx ten! :rockon:
:cheers:
cpiasminc
12-23-2005, 12:14 AM
Would it be possible/feasible to create a game engine that worked in parallel across multiple PS3s, say 5?
Something like:
PS3(1) = process sound dynamic
PS3(2) = rendering (GI )
PS3(3) = Physics dynamic
PS3(4) = Shader
PS3(5) = AI
I would assume that if one could create an engine that ran in parallel on a single PS3, it wouldn't be much tougher to spread that out over multiple PS3s?
It's feasible as long as there is enough work to fill up more resources than a single CELL can provide, and programmers don't have to provide the glue logic. The glue logic for a cluster is no simple task at all, but if the hardware provides it (e.g. most supercomputing clusters), or the OS/API provides it, then it's not a big deal.
The only difference is that it wouldn't be divided up at as broad a level as you're suggesting. Every PS3 would be taking part in everything and just be sent individual sub tasks of relatively small blocks of code, but code that gets repeated a lot. The real difficult part of it is that with multiple PS3s comes multiple PPEs, which means some capability for peer thread processing, beyond just the "job queue" approach you'd take with the SPEs. That's the main difference between parallel programming on CELL vs. XeCPU.
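The "job queue" approach mentioned above can be sketched as a pool of workers that each keep pulling small, independent jobs off a shared queue until it is empty. This is a generic Python illustration, not Cell SDK code: `run_job_queue` is a made-up name, the worker count is an arbitrary stand-in for a set of SPEs, and the squaring jobs stand in for "small blocks of code that get repeated a lot".

```python
import queue
import threading
from typing import Callable, List, Optional


def run_job_queue(
    jobs: List[Callable[[], int]], worker_count: int = 6
) -> List[Optional[int]]:
    """Run small independent jobs on a fixed pool of workers."""
    pending: queue.Queue = queue.Queue()
    results: List[Optional[int]] = [None] * len(jobs)

    # Every job is queued up front, tagged with the slot for its result.
    for index, job in enumerate(jobs):
        pending.put((index, job))

    def worker() -> None:
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do, so this worker retires
            results[index] = job()  # each job writes only its own slot

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Many tiny jobs of the same kind.
squares = run_job_queue([lambda i=i: i * i for i in range(20)])
print(squares)
```

No worker owns a whole subsystem like "sound" or "AI"; whichever worker is free takes the next job, which is the contrast with the per-console split proposed in the question.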
However, it will never happen for gaming. The speed at which you need response is too high. While you can get away with a little bit, it won't scale to 5 PS3s while still maintaining 60 fps -- the network hardware would have too much latency to keep up.
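The latency argument can be made concrete with some back-of-the-envelope arithmetic. At 60 fps each frame has 1000 / 60, or roughly 16.7 ms, and every network exchange that other work has to wait on comes straight out of that. The round-trip time and the number of exchanges per frame below are purely assumed figures for illustration, not measurements of any PS3 network hardware, and the sketch assumes the waits cannot be overlapped with computation.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def compute_time_left_ms(
    fps: float, round_trip_ms: float, exchanges_per_frame: int
) -> float:
    """Frame time left after waiting on serialized network exchanges."""
    return frame_budget_ms(fps) - round_trip_ms * exchanges_per_frame


ASSUMED_ROUND_TRIP_MS = 0.5  # hypothetical machine-to-machine round trip
ASSUMED_EXCHANGES = 10       # hypothetical dependent exchanges per frame

budget = frame_budget_ms(60)
left = compute_time_left_ms(60, ASSUMED_ROUND_TRIP_MS, ASSUMED_EXCHANGES)
print(f"frame budget: {budget:.1f} ms, left after network waits: {left:.1f} ms")
```

With these made-up numbers nearly a third of the frame is spent waiting, and the share grows with every extra machine that has to be kept in step.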
Crossbar
12-23-2005, 08:48 AM
I'm not sure how much discussion this can spur without more details being known, but thought that people would want a heads-up on something that may *eventually* make programming for these processors a little easier.
Article (http://www.xbitlabs.com/news/other/display/20051220214003.html)
A special compiler for a special cpu-architecture that for one specific application gives a better result than some manual optimisation. There are a lot of unknown variables in there. What kind of language does the compiler support, what does the cpu look like, what was the application and how good was the guy doing the optimisations?
It's great though that it is a commercial company that markets this cpu + compiler; these kinds of tools are often academic mumbo-jumbo that is too esoteric for any practical use. I've seen a few pass by.
It would be really interesting to see how it really works, because this is a really complex task: dividing a single thread into multiple threads and making sure there are no cross dependencies between the threads that will mess up the result. Doing that for branch-heavy game logic is more or less undoable. My guess is that this CPU has a small overhead for generating lightweight threads, and that the compiler mainly scans the code for loops doing massive calculations, forks each loop into lightweight threads, divides the work among the threads, then synchronises the results and continues as a single thread afterwards.
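The guess in the last sentence (fork a heavy loop into threads, divide the work, synchronise, continue single-threaded) is the classic fork-join loop transformation. A minimal hand-written sketch of what such a compiler would be producing, assuming the loop iterations are independent, could look like this. `parallel_loop` is a hypothetical helper, and this says nothing about what NEC's compiler actually emits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def parallel_loop(
    body: Callable[[int], int], items: Sequence[int], threads: int = 4
) -> List[int]:
    """Apply body to every item by splitting the loop across threads."""
    if not items:
        return []

    # Divide the iteration space into one contiguous chunk per thread.
    chunk = -(-len(items) // threads)  # ceiling division
    chunks = [items[start:start + chunk] for start in range(0, len(items), chunk)]

    def run_chunk(part: Sequence[int]) -> List[int]:
        return [body(item) for item in part]

    # Fork: each chunk runs on its own thread; map() keeps the chunk order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(run_chunk, chunks))

    # Join: execution is single-threaded again and the results are merged.
    return [value for part in partials for value in part]


print(parallel_loop(lambda x: x * x, list(range(10))))
```

The hard part raised in the post is hidden in the assumption: the sketch is only correct because no iteration depends on another, and proving that (or speculating on it) is the compiler's real job.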
I don't think we will see any similar compiler for the coming console generation, because recompiling single-threaded code into multiple threads can never beat writing the code for multiple threads right from the start. I also think that the overhead for creating lightweight threads on the PPE is quite big, not to mention the SPEs, and that limits the benefits from parallelising small pieces of code.