FPU instruction latency!
-
- Posts: 10
- Joined: Sun Jun 01, 2008 12:20 pm
- Contact:
Hi,
I'm curious about FPU instruction latency. On the ps2dev wiki
I found: "Sqrt (28 cycles), div (28 cycles), most others 1 cycle", while the
R4000 user manual (http://tinyurl.com/5qjlxt, page 174) gives
different values. So, if the R4000 FPU used in the PSP has different
latencies, where can I obtain the real values?
Thanks,
Re: FPU instruction latency!
The CPU in the PSP is not an R4000, and the figures in the wiki are for the VFPU, not the FPU.
ok,
there are TWO coprocessors on the PSP:
- a standard MIPS FPU, which only deals with 32-bit floats
- a customized vector FPU (hence its name: VFPU)
They differ in two ways:
- different instruction sets
- different register sets
For the standard FPU: cycle counts are probably the same as the R4400's, with a usual pitch of 1 cycle and a latency of several cycles.
For the VFPU: cycle counts are totally different, because this VFPU is specific to the PSP. But its instructions also have a usual pitch of 1 cycle and a latency of several cycles.
I don't know about the FPU, but for the VFPU it is pretty hard to get an instruction to run in only 1 cycle, because there are strong dependencies between an instruction and the next one (RAW hazards, etc.), which tend to make each instruction cost its full latency.
It means that after the latency and with proper handling of dependencies, you get a result on every clock cycle. Sorry to sound crass, but if you don't even understand instruction timing and latencies, you don't need to be worrying about them. It's like trying to optimize a major C application after reading just the first chapter of "C For Dummies".
brunocardoso wrote:ok. but what about the wiki, the info about FPU latency there is wrong, right?
"for standard FPU : cycles are probably the same as R4400's with a usual pitch of 1 cycle and a latency of several cycles."
what do you mean by pitch?
Pitch is the minimum number of cycles after which the next instruction may start to execute. It is usually only 1 cycle when there is no dependency on the previous instruction. But that doesn't mean the instruction executes in only 1 cycle: the time until its result is available is given by the latency.
Take this example, where no instruction depends on an earlier result, so each one can start one pitch after the previous one:
    add.s $f0, $f1, $f2
    add.s $f3, $f4, $f5
    add.s $f6, $f7, $f8
    add.s $f9, $f10, $f11
But take this example, where the second add.s reads $f0 and so has to wait until the first one's latency has elapsed:
    add.s $f0, $f1, $f2
    add.s $f3, $f4, $f0
    ...
I think the latencies given by the R4000 manual are the same for PSP.
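The difference between pitch and latency in the two add.s examples above can be sketched with a toy timing model. This is only an illustration, not a description of the real Allegrex pipeline: the function name issue_cycles is made up for this sketch, it only models read-after-write stalls on an in-order pipeline, and the latency of 4 cycles is an assumed placeholder, not a measured PSP value.

```python
from typing import Dict, List, Tuple

# One instruction is (destination, source_a, source_b), like "add.s fd, fs, ft".
Instr = Tuple[str, str, str]


def issue_cycles(program: List[Instr], latency: int, pitch: int = 1) -> List[int]:
    """Return the start cycle of each instruction in a toy in-order pipeline.

    Hypothetical model: an instruction may start `pitch` cycles after the
    previous one, but must also wait until all of its source registers have
    been produced (`latency` cycles after their producer started).
    """
    ready: Dict[str, int] = {}  # register -> first cycle its value is readable
    starts: List[int] = []
    next_slot = 0               # earliest start allowed by the pitch alone
    for dest, *sources in program:
        # RAW hazard: stall until every source operand is available.
        start = max([next_slot] + [ready.get(reg, 0) for reg in sources])
        starts.append(start)
        ready[dest] = start + latency
        next_slot = start + pitch
    return starts


# First example from the post: four independent adds.
independent = [
    ("$f0", "$f1", "$f2"),
    ("$f3", "$f4", "$f5"),
    ("$f6", "$f7", "$f8"),
    ("$f9", "$f10", "$f11"),
]

# Second example from the post: the second add reads $f0.
dependent = [
    ("$f0", "$f1", "$f2"),
    ("$f3", "$f4", "$f0"),
]

ASSUMED_LATENCY = 4  # placeholder value, NOT a real Allegrex figure

print(issue_cycles(independent, ASSUMED_LATENCY))  # [0, 1, 2, 3]
print(issue_cycles(dependent, ASSUMED_LATENCY))    # [0, 4]
```

With independent instructions the start cycles advance by the pitch; with the dependency on $f0 the second instruction starts only once the assumed latency has passed. Placing independent instructions between the two dependent ones would fill those idle cycles.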
Last edited by hlide on Mon Jun 09, 2008 7:22 am, edited 1 time in total.
Nice explanation! The "pitch" name makes pipeline discussions easier. ;)
Does the Allegrex FPU support double-precision instructions using aliased registers? By aliased I mean a double-precision instruction that uses two 32-bit registers for each operand and for the result (I know this is supported by the standard R4000 processor).
Thanks,
brunocardoso wrote: Does the Allegrex FPU support double-precision instructions using aliased registers? By aliased I mean a double-precision instruction that uses two 32-bit registers for each operand and for the result (I know this is supported by the standard R4000 processor).
Thanks,
Absolutely not. Allegrex has only 32-bit-wide integer and float operations. Any attempt to use a 64-bit-wide integer or float instruction raises a reserved instruction exception (unimplemented instruction). It would be great if the FPU could do 64-bit float operations this way, but it cannot.
Are 32-bit binary fixed-point instructions supported on the Allegrex FPU?
The MIPS R4000 User Manual says:
"In the instruction formats shown in Tables 6-9 through 6-12, the fmt
appended to the instruction opcode specifies the data format: S specifies
single-precision binary floating-point, D specifies double-precision binary
floating-point, W specifies 32-bit binary fixed-point, and L specifies 64-bit
(long) binary fixed-point."
As said before, D and L are not supported by Allegrex, but what about W?
Can I use an instruction like: cvt.w.s $f4, $f5 ?
Hi hlide,
I'm working on adding support for the Allegrex core to the LLVM MIPS
backend. LLVM is, among other things, a compiler infrastructure
(www.llvm.org) that provides aggressive optimizations. I have a lot of
cool long-term plans, like supporting intrinsics for the VFPU, etc.
But right now I'm improving the MIPS backend, adding FPU support
and more.
I will likely be poking around on this forum a lot ;)
Oh yes, I remember your name, as I follow LLVM dev news.
I was very upset about the ARM llvm-gcc version, which doesn't seem to optimize very well (unless the problem is the Apple DevKit, which really sucks, plus the fact that you need an iMac to develop). For my iTouch, I used llvm-gcc 4.0 to compile yapse4all (a PSX emulator) but found the generated code weird and less optimized than I would expect. But well, it may be an issue only with ARM.
Yes, I can help you in this regard. I'm still interested in llvm-gcc, especially for VFPU integration, which is really lacking in the current gcc.
Do you think I could access the MIPS backend so I can see how it works?
BTW, you could also PM me if necessary.
Was it a long time ago that you tried the ARM backend?
Btw, llvm-gcc 4.0 is abandoned now; only 4.2 is maintained.
Yes, it is being developed in the LLVM trunk. To check it out:
http://tinyurl.com/6nf9la
The backend source is at: trunk/lib/Target/Mips
LLVM targets are developed as libraries inside LLVM, and llvm-gcc
does not have anything to do with targets; it only recognizes
the target triple and calls the target libraries at cross-compile
time (to compile the CRT and libgcc files).
Don't expect much for now, since it's at a highly experimental
stage and there is a lot of stuff not committed yet (but I often
merge stuff to the trunk, so it is quite up to date). Don't use
LLVM 2.3 to test MIPS because it is broken there; use the
trunk instead. If you have any suggestions or comments for Allegrex,
let me know! (I'm also on #pspdev on IRC.)
Thanks,
This post, http://forums.ps2dev.org/viewtopic.php?t=10471, gives some details about Allegrex instructions. I made it a new topic so it is easier to find by search. It may help you sort out what you can use for your backend. It is not exhaustive, and the VFPU is still missing (there are hundreds of instructions!).