I did some basic benchmarking on the PS3 against a P4 2.8 GHz, on both the PPU and the SPU, and well, my P4 beats the PS3 in every test. Only when using 7 or more threads on the SPUs does the PS3 beat my P4, and then only by a small margin. My Pentium D 3.4 GHz would kill the PS3 again.
What are good things to test on a PS3? I tested the vector types a bit, but a multiplication of vec_float4 takes four times as long as a multiplication of float, so why should I use the PS3 vector types?
Keep in mind that a videogame console with even that much raw computing power is quite an achievement on its own. Typically more effort is put into the graphics hardware and the CPU just limps along.
Perhaps you're seeing the lack of maturity in the PS3 toolchain, or maybe the code could be better optimized to work around some of the quirks, like the lack of hardware branch prediction on the SPUs. Still, that memcpy speed is interesting. Keep at it - I'm interested in why it is performing like that.
Neels wrote: I did some basic benchmarking on the PS3 against a P4 2.8 GHz, on both the PPU and the SPU, and well, my P4 beats the PS3 in every test. Only when using 7 or more threads on the SPUs does the PS3 beat my P4, and then only by a small margin. My Pentium D 3.4 GHz would kill the PS3 again.
Could you post the benchmarking code and some info about the compiler, the P4's OS and the compiler settings you used? Did you specify "-O2" on the PS3, and did you use a 64-bit version of GCC and a version > 4?
If you get a speedup when using more threads than there are SPUs available, then there is another bottleneck besides raw computation power. Run some simpler tests, like some vector multiplications.
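A simple vector multiplication test along those lines could look like the sketch below. It is not Neels' benchmark. The names mul_scalar, mul_vector, time_runs and the vfloat4 type are made up for illustration, and vfloat4 uses the GCC/Clang vector extension as a stand-in for the SPU's vec_float4 so the code also builds on a PC.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for vec_float4: four packed floats, declared
   with the GCC/Clang vector extension (an assumption, not the SPU type). */
typedef float vfloat4 __attribute__((vector_size(16)));

typedef void (*mul_fn)(float *, const float *, const float *, size_t);

/* Scalar version: one product per loop iteration. */
void mul_scalar(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

/* Vector version: four products per loop iteration.
   n must be a multiple of 4. */
void mul_vector(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        vfloat4 va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* load 4 floats, any alignment */
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vb;                    /* element-wise multiply */
        memcpy(out + i, &vr, sizeof vr); /* store 4 results */
    }
}

/* Runs fn reps times and returns the CPU time used, in seconds. */
double time_runs(mul_fn fn, float *out, const float *a, const float *b,
                 size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        fn(out, a, b, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_runs once with mul_scalar and once with mul_vector on the same arrays compares equal amounts of work. One vector multiply produces four products, so timing a single vec_float4 multiply against a single float multiply is not a like-for-like comparison.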
The whole function is measured, but when doing only one memcpy the time is 6 ms on my P4, so I don't care.
The P4 is the following machine:
CPU: Intel Pentium 4 521, 2.8 GHz, 1 MB cache
RAM: no-name DDR1, 2 GB
Mainboard: ASRock 775i65GV
OS: Windows XP Professional SP2
Benchmark code compiled with Intel C++ Compiler 8.0
The German c't magazine has recently tested the PS3's processor capabilities.
The PS3 reached a SPECint_2000 base score of 400, which is comparable to an Athlon at 1.33 GHz. They also wrote that neither the Cell nor the Xbox 360 processor is any good at single-threaded applications.
The Cell only excels at processing when using the SPUs, which are optimised for calculating single-precision floating-point numbers. With 6 active SPUs, the PS3 is twice as fast as a Core 2 Duo 6400. (Then again, the Core 2 easily beats the PS3 when it comes to double-precision.)
This might be an explanation for your test results, Neels.
@Neels:
You might find this thread on Beyond3D interesting, especially the posts from "inefficient". The people in that thread are also trying to benchmark the PS3 / PPU / SPEs, but they are already on page 9, so you might find some optimization ideas there that they have already worked out ;)
Btw, I don't understand how you're using memcpy on the SPE: what memory are you copying, and to where? Is it LS (Local Store) -> LS, XDR -> XDR, etc.?
What confuses me the most is that the SPEs have 256 KB of memory each, but you're copying 1 MB at once.
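To make the question concrete: a 1 MB copy cannot sit in a 256 KB Local Store in one piece, so on an SPE it would have to be streamed through a small buffer. The sketch below only illustrates that pattern and is not Neels' code. copy_via_staging and CHUNK_BYTES are made-up names, the 16 KB chunk size is an assumption taken from the largest single DMA transfer the Cell allows, and plain memcpy stands in for the DMA transfers so it runs on any machine.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 16 KB per transfer, the largest single DMA on the Cell. */
#define CHUNK_BYTES (16 * 1024)

/* Stand-in for a buffer living in the SPE's 256 KB Local Store. */
static unsigned char staging[CHUNK_BYTES];

/* Copies n bytes from src to dst through the staging buffer, one chunk
   at a time. Returns the number of chunks that were moved. */
size_t copy_via_staging(void *dst, const void *src, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t chunks = 0;

    while (n > 0) {
        size_t len = n < CHUNK_BYTES ? n : CHUNK_BYTES;
        memcpy(staging, s, len); /* on an SPE: DMA get, main memory -> LS */
        memcpy(d, staging, len); /* on an SPE: DMA put, LS -> main memory */
        s += len;
        d += len;
        n -= len;
        ++chunks;
    }
    return chunks;
}
```

With these numbers a 1 MB copy takes 64 round trips through the buffer, so a benchmark of it would mostly measure transfer overhead and memory bandwidth rather than SPU compute.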