FPU instruction latency!

brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

FPU instruction latency!

Post by brunocardoso »

Hi,

I'm curious about FPU instruction latency. On the ps2dev wiki
I found: "Sqrt (28 cycles), div(28 cycles), most others 1 cycle", while the
R4000 user manual (http://tinyurl.com/5qjlxt, page 174) gives
different values. So, if the R4000 FPU used in the PSP has different
latencies, where can I find the real values?

Thanks,
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Re: FPU instruction latency!

Post by crazyc »

The CPU in the PSP is not an R4000, and the figures in the wiki are for the VFPU, not the FPU.
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

That is very strange, since this info is listed under FPU, and there is also a separate topic about the VFPU... I really just need some real info.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

OK,

there are TWO coprocessors on the PSP:

- a standard MIPS FPU, which only deals with 32-bit floats

- a customized vector FPU (hence its name: VFPU)

They differ in two ways:
- different instruction sets
- different register sets

for standard FPU : cycles are probably the same as R4400's with a usual pitch of 1 cycle and a latency of several cycles.

for VFPU : cycles are totally different and specific to the PSP, because this VFPU is specific to the PSP. But its instructions also usually have a pitch of 1 cycle and a latency of several cycles.

I don't know about the FPU, but for the VFPU it is pretty hard to get one instruction issued per cycle, because there are often strong dependencies between an instruction and the next one (RAW hazards, etc.), which tend to make them run at the latency instead.
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

OK, but what about the wiki? The info about FPU latency there is wrong, right?
"for standard FPU : cycles are probably the same as R4400's with a usual pitch of 1 cycle and a latency of several cycles."
What do you mean by pitch?
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

It means that after the latency and with proper handling of dependencies, you get a result on every clock cycle. Sorry to sound crass, but if you don't even understand instruction timing and latencies, you don't need to be worrying about them. It's like trying to optimize a major C application after reading just the first chapter of "C For Dummies".
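
As a rough illustration of that statement (a sketch under idealized assumptions, not something measured on hardware): for n independent instructions of the same kind, the first result appears after the latency and each further result follows one pitch later. The function name and the example numbers are made up for illustration.

```python
def cycles_for_independent_ops(n, latency, pitch=1):
    """Cycles until the last of n independent, identical instructions
    delivers its result in an idealized in-order pipeline.

    latency: cycles from issue to result for one instruction.
    pitch:   cycles between successive issues (assumed default of 1).
    """
    if n <= 0:
        return 0
    # The last instruction issues (n - 1) * pitch cycles after the first,
    # then needs `latency` more cycles to produce its result.
    return (n - 1) * pitch + latency


if __name__ == "__main__":
    # Assumed values for illustration only: latency 4, pitch 1.
    for n in (1, 4, 100):
        print(n, cycles_for_independent_ops(n, latency=4))
```

With these assumed values, the cost per instruction approaches the pitch rather than the latency as n grows, which is why the pitch matters more for long runs of independent instructions.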
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

brunocardoso wrote: OK, but what about the wiki? The info about FPU latency there is wrong, right?
"for standard FPU : cycles are probably the same as R4400's with a usual pitch of 1 cycle and a latency of several cycles."
What do you mean by pitch?
Pitch is the minimum number of cycles after which the next instruction may start to execute. It is usually only 1 cycle when there is no dependency on the previous instruction. But that doesn't mean the instruction executes in only 1 cycle. The time until its result is available is given by the latency.

take this example :

Code:

add.s $f0, $f1, $f2
add.s $f3, $f4, $f5
add.s $f6, $f7, $f8
add.s $f9, $f10, $f11
as there is no dependency, the second add.s will "start" just 1 cycle after the first add.s, which means add.s has a pitch of 1 cycle (only sqrt.s and div.s have a pitch of 28 cycles; the others have a pitch of 1 cycle).

but if you take this example :

Code:

add.s $f0, $f1, $f2
add.s $f3, $f4, $f0
...
you can see the second add.s needs the result of the first add.s ($f0), so it has to wait for the first add.s to produce it. add.s has a latency of 4 cycles, which means the second add.s won't "start" 1 cycle after the first add.s but 4 cycles after it.

I think the latencies given by the R4000 manual are the same for PSP.
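
The two examples above can be replayed with a small model. This is only a sketch: it assumes a single in-order issue slot, tracks read-after-write dependencies only, and uses the add.s figures quoted in this thread (pitch 1, latency 4). The function and table names are made up for illustration and say nothing about how the real hardware is built.

```python
def issue_cycles(program, latency, pitch):
    """Return the cycle at which each instruction starts.

    program: list of (op, dest, src1, src2) tuples.
    latency: dict op -> cycles until the result register is readable.
    pitch:   dict op -> minimum cycles before the next instruction may start.
    """
    ready = {}       # register -> cycle at which its value is available
    next_issue = 0   # earliest cycle the next instruction may start
    starts = []
    for op, dest, src1, src2 in program:
        # Wait for the issue slot and for both source operands (RAW hazard).
        start = max(next_issue, ready.get(src1, 0), ready.get(src2, 0))
        starts.append(start)
        ready[dest] = start + latency[op]
        next_issue = start + pitch[op]
    return starts


# Figures quoted in this thread for add.s; treat them as assumptions.
LATENCY = {"add.s": 4}
PITCH = {"add.s": 1}

independent = [
    ("add.s", "$f0", "$f1", "$f2"),
    ("add.s", "$f3", "$f4", "$f5"),
    ("add.s", "$f6", "$f7", "$f8"),
    ("add.s", "$f9", "$f10", "$f11"),
]
dependent = [
    ("add.s", "$f0", "$f1", "$f2"),
    ("add.s", "$f3", "$f4", "$f0"),  # reads $f0 written just above
]

if __name__ == "__main__":
    print(issue_cycles(independent, LATENCY, PITCH))
    print(issue_cycles(dependent, LATENCY, PITCH))
```

In this model the independent sequence starts one instruction per cycle, while the dependent add.s is held back until $f0 is ready.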
Last edited by hlide on Mon Jun 09, 2008 7:22 am, edited 1 time in total.
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

Nice explanation! The "pitch" name makes pipeline discussions easier. ;)
Does the Allegrex FPU support double-precision instructions using aliased registers? By aliased I mean a double-precision instruction that uses two 32-bit registers for each operand and for the result (I know this is supported by the stock R4000 processor).

Thanks,
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

brunocardoso wrote: Does the Allegrex FPU support double-precision instructions using aliased registers? By aliased I mean a double-precision instruction that uses two 32-bit registers for each operand and for the result (I know this is supported by the stock R4000 processor).

Thanks,
Absolutely not. Allegrex has only 32-bit wide integer and float operations. Any attempt to use a 64-bit wide integer or float instruction raises a reserved instruction exception (unimplemented instruction). It would be great if the FPU could do 64-bit float operations this way, but it cannot.
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

Are 32-bit binary fixed-point instructions supported on the Allegrex FPU?

The MIPS R4000 User Manual says:

"In the instruction formats shown in Tables 6-9 through 6-12, the fmt
appended to the instruction opcode specifies the data format: S specifies
single-precision binary floating-point, D specifies double-precision binary
floating-point, W specifies 32-bit binary fixed-point, and L specifies 64-bit
(long) binary fixed-point."

As said before, D and L above aren't supported by Allegrex, but what about W?
Can I use an instruction like: cvt.w.s $f4, $f5 ?
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

all the 32-bit formats, which include .s and .w, are supported in Allegrex FPU. .d and .l are not.
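
For a backend or assembler front end, that rule could be expressed as a simple suffix check. This is a minimal sketch based only on what is said in this thread (.s and .w accepted, .d and .l rejected); the function name is made up, and it does not check whether the opcode itself exists on Allegrex.

```python
ALL_FMTS = {"s", "d", "w", "l"}   # fmt suffixes named in the R4000 manual quote
SUPPORTED_FMTS = {"s", "w"}       # 32-bit formats, supported per this thread


def allegrex_fpu_supports(mnemonic):
    """Return True if every fmt suffix in the mnemonic is a 32-bit format.

    Only the fmt suffixes are examined, e.g. "cvt.w.s" -> ["w", "s"].
    Other dotted parts (such as the condition in "c.eq.s") are ignored.
    """
    parts = mnemonic.lower().split(".")
    fmts = [p for p in parts[1:] if p in ALL_FMTS]
    if not fmts:
        raise ValueError("no fmt suffix found in %r" % mnemonic)
    return all(f in SUPPORTED_FMTS for f in fmts)


if __name__ == "__main__":
    for m in ("add.s", "cvt.w.s", "add.d", "cvt.l.s", "c.eq.s"):
        print(m, allegrex_fpu_supports(m))
```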

Is there any reason to know about all those details ?
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

Hi hlide,

I'm working on implementing support for the Allegrex core in the LLVM Mips
backend. LLVM, among other things, is a compiler infrastructure
(www.llvm.org) which provides aggressive optimizations. I have a lot of
cool long-term plans, like supporting intrinsics for the VFPU, etc.
But right now I'm improving the Mips backend, adding support for the FPU
and more.
It is likely that I will be poking around on this forum a lot ;)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

oh yes, I remember your name as I follow LLVM dev news.

I was very upset about the ARM llvm-gcc version, which doesn't seem to optimize very well (unless it is the Apple DevKit, which really sucks, plus the fact that you need an iMac to dev). For my iTouch, I used llvm-gcc 4.0 to compile yapse4all (a PSX emulator) but found the generated code weird and less optimized than I would expect. But well, it may be an issue only with ARM.

Yes, I can help you in this regard. I'm still interested in llvm-gcc, especially for the VFPU integration, which is really lacking in the current gcc.

Do you think I could access the MIPS backend so I can see how it works ?

BTW, you could also PM me if necessary.
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

Was it a long time ago that you tried the ARM backend?
Btw, llvm-gcc 4.0 is abandoned now; only 4.2 is maintained.
Yes, the Mips backend is being developed in LLVM trunk. To check it out:
http://tinyurl.com/6nf9la
The backend source is at: trunk/lib/Target/Mips
LLVM targets are developed as libraries inside LLVM; llvm-gcc
itself has nothing to do with targets, it only recognizes
the target triple and calls the target libraries at
cross-compile time (to compile the CRT and libgcc files).
Don't expect much for now, since it's at a highly experimental
stage and there is a lot of stuff not committed yet (but I often
merge stuff to the trunk, so it is quite up to date). Don't use
LLVM 2.3 to test Mips because it is broken there; use the
trunk instead. If you have any suggestions or comments for Allegrex,
let me know! (I'm also on #pspdev on IRC.)

Thanks,
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

This post http://forums.ps2dev.org/viewtopic.php?t=10471 gives you some details about Allegrex instructions. I made it a new topic so it is easier to find by search. It may help you sort out what you can use for your backend. It is not exhaustive, and the VFPU is still missing (there are hundreds of instructions!).
brunocardoso
Posts: 10
Joined: Sun Jun 01, 2008 12:20 pm

Post by brunocardoso »

Thank you again Hlide, this will help a lot!