VFPU diggins
Yeah, that looks a lot more reliable now. Also much nearer to my approximated cycles, so I'm happy :)
<Don't push the river, it flows.>
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki
Alexander Berl
!!! NEW INSTRUCTIONS !!!
MrMr[iCE] asked me whether there is an inverse instruction for vi2c.
To our surprise, mips-opc.c doesn't seem to have one. Why? Did they forget to add the inverse instruction, or is it unimplemented?
Digging in mips-opc.c, we have:
vi2uc.q --> 0xd03c8080
vi2c.q --> 0xd03d8080
vi2us.q --> 0xd03e8080
vi2s.q --> 0xd03f8080
and
vus2i.p --> 0xd03a0080
vs2i.p --> 0xd03b0080
vi2us.p --> 0xd03e0080
vi2s.p --> 0xd03f0080
Oh well, we can reason by analogy to find out whether vc2i.q and vuc2i.q really exist:
vi2us.p - vus2i.p = 0x00040000
vi2s.p - vs2i.p = 0x00040000
so let's check whether vc2i.s = vi2c.q - (vi2s.p - vs2i.p) = 0xd0398080
Code: Select all
asm volatile
(
"lv.q C000, %0\n"
"vi2c.q\tS000, C000\n"
".word 0xD0398081\n" // "vc2i.s\tC010, S000\n"
"sv.q\tC010, %0\n"
: "+m"(res2) : : "memory"
);
printf("%08x %08x %08x %08x\n", res2.x, res2.y, res2.z, res2.w);
BINGO! C010 == C000 as a result, so 0xD0398081 seems to act as vc2i.s C010, S000!!!
Now, why .s instead of .q? Because the source is a scalar VFPU register (vi2c.q has a vector VFPU register as source and a scalar VFPU register as target, so by analogy it makes sense to use .s here instead of .q).
OK, now let's check whether vuc2i.s = vi2uc.q - (vi2us.p - vus2i.p) = 0xd0388080
Code: Select all
asm volatile
(
"lv.q C000, %0\n"
"vi2uc.q\tS000, C000\n"
".word 0xD0388081\n" // "vuc2i.s\tC010, S000\n"
"sv.q\tC010, %0\n"
: "+m"(res2) : : "memory"
);
printf("%08x %08x %08x %08x\n", res2.x, res2.y, res2.z, res2.w);
hmmmm... C010 != C000
but we always get the same result, computed this way:
C010.x = (S000[7..0] * 0x01010101) >> 1;
C010.y = (S000[15..8] * 0x01010101) >> 1;
C010.z = (S000[23..16] * 0x01010101) >> 1;
C010.w = (S000[31..24] * 0x01010101) >> 1;
Anyway, if you want to do a complex calculation on an RGBA8888 value, you can do it this way:
vuc2i.s C000, S000
vi2f.q C000, C000, 23
... your calculation here
vf2iz.q C000, C000, 23
vi2uc.q S000, C000
although this instruction isn't the exact inverse of vi2uc.q.
So I now propose adding both instructions to psp-as, which doesn't recognize them.
As for vuc2i.s, if someone has an idea about what its name should be, I would like to hear it.
EDIT: corrected the fact that "uc2i" halves the result.
Last edited by hlide on Sun Mar 18, 2007 10:35 pm, edited 2 times in total.
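The opcode guessing in the post above is plain arithmetic on the encodings quoted from mips-opc.c: take a known pack/unpack pair, measure the distance between the two opcodes, and subtract that distance from the pack opcode whose inverse is missing. Here is a minimal C sketch of it. The helper names `guess_unpack_opcode` and `with_regs` are made up for illustration, and the register field positions are only inferred from the `.word` constants used in this thread, not taken from any official encoding table.

```c
#include <stdint.h>

/* Base encodings as quoted from mips-opc.c in the post above. */
#define OP_VI2UC_Q 0xd03c8080u
#define OP_VI2C_Q  0xd03d8080u
#define OP_VUS2I_P 0xd03a0080u
#define OP_VS2I_P  0xd03b0080u
#define OP_VI2US_P 0xd03e0080u
#define OP_VI2S_P  0xd03f0080u

/* Hypothetical helper: guess the opcode of a missing "unpack" instruction
 * from its "pack" counterpart, using a known pack/unpack pair as reference. */
static uint32_t guess_unpack_opcode(uint32_t pack,
                                    uint32_t ref_pack,
                                    uint32_t ref_unpack)
{
    return pack - (ref_pack - ref_unpack);
}

/* Hypothetical helper, ASSUMPTION inferred from the .word constants in this
 * thread: destination register number in bits 0..6, source in bits 8..14. */
static uint32_t with_regs(uint32_t base, unsigned vd, unsigned vs)
{
    return base | ((vs & 0x7fu) << 8) | (vd & 0x7fu);
}
```

With these, `guess_unpack_opcode(OP_VI2C_Q, OP_VI2S_P, OP_VS2I_P)` gives the vc2i.s candidate and `with_regs(candidate, 1, 0)` gives the word used in the first test.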
hummm some suggestions about what those functions do :
- vsbz.s sd, ss : may change the binary logarithmic scale of a floating point value to 0 ?
- sd = (-1)^(ss.s) x 2^0 x (1 + ss.m/2^23).
- sd = (-1)^(ss.s) x 2^N x (1 + ss.m/2^23) where N is given by st.
- sd = ss.e - 127.
- sd = (-1)^(ss.s) x 2^(N-127) x (1 + (ss.m % 2^N)/2^23) where N is given by imm.
Hmm, I've been playing around with vuc2i, and I've been getting slightly different results. Am I missing something?
Code: Select all
u32 col = 0x014080c0;
i4 res;
asm volatile
(
"lv.s S200, %0\n"
".word 0xd0388080 | (8<<8) | (40)\n" // vuc2i.s R200, S200
"sv.q\tR200, %1\n"
: "+m"(col), "+m"(res) : : "memory"
);
printf("Col: %08x\n", col);
printf("Res: %08x %08x %08x %08x\n", res.x, res.y, res.z, res.w);
Which results in:
Col: 014080c0
Res: 60606060 40404040 20202020 00808080
So it looks to me like vuc2i.s is actually doing:
Code: Select all
vuc2i.s vd.q, vs.s
{
vd.q[0] = (vs.s[0]( 0.. 7) * 0x01010101) >> 1;
vd.q[1] = (vs.s[0]( 8..15) * 0x01010101) >> 1;
vd.q[2] = (vs.s[0](16..23) * 0x01010101) >> 1;
vd.q[3] = (vs.s[0](24..31) * 0x01010101) >> 1;
}
I guess the final >>1 prevents the top bit from ever being set, which means that when you use vi2f.q (which is signed) your data correctly comes through as unsigned.
hlide wrote:anyway if you want to make a complex calculus on a RGBA8888, you can do it this way :
vuc2i.s C000, S000
vi2f.q C000, C000, 24
...
I think this actually needs to be:
vuc2i.s C000, S000
vi2f.q C000, C000, 23
(at least that works correctly for me - shifting by 24 gives me half-bright colours.)
StrmnNrmn
(Edit - fix code)
This is a little pedantic, but the pseudo-code on the first page seems to imply that the low-order bits are discarded:
Code: Select all
vi2f.q/t/p/s vd, vs, imm 1 0
{
for (i = 0; i < |q/t/p/s|; ++i)
vd[i] = (float)(vs[i] >> imm);
}
I think this would more accurately be described as:
Code: Select all
vi2f.q/t/p/s vd, vs, imm 1 0
{
for (i = 0; i < |q/t/p/s|; ++i)
vd[i] = (float)(vs[i]) / (float)(1<<imm);
}
i.e. the fractional bits are taken into account:
Code: Select all
u32 col = 0x00800000;
float res;
asm volatile
(
"lv.s S200, %0\n"
"vi2f.s S200, S200, 24\n"
"sv.s\tS200, %1\n"
: "+m"(col), "+m"(res) : : "memory"
);
printf("Col: %08x\n", col);
printf("Res: %f\n", res);
Prints out:
00800000
0.500000
StrmnNrmn
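The difference between the two readings of vi2f is easy to see on a host machine. This is a small C sketch of both variants; the function names are invented for the comparison, and only the second one agrees with the 0.500000 measured on hardware above.

```c
#include <stdint.h>

/* Reading implied by the first-page pseudo-code: shift first, so the low
 * imm bits are thrown away before the conversion. (Right-shifting a
 * negative value is implementation-defined in C; this sketch only uses
 * non-negative inputs.) */
static float vi2f_truncating(int32_t v, unsigned imm)
{
    return (float)(v >> imm);
}

/* Reading that matches the measurement: convert first, then divide by
 * 2^imm, so the low bits end up as the fractional part. */
static float vi2f_scaling(int32_t v, unsigned imm)
{
    return (float)v / (float)(1u << imm);
}
```

For the input 0x00800000 and imm 24, the truncating variant yields 0.0 while the scaling variant yields 0.5.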
StrmnNrmn wrote:Hmm, I've been playing around with vuc2i, and I've been getting slightly different results. Am I missing something?
Nope, this topic is quite old indeed and was never corrected.
If you look at all the posts in this topic, especially those from MrMr[iCE] and me, you will find that vuc2i does indeed behave as you say here.
For cycles, you can look at http://wiki.fx-world.org/doku.php?id=general:cycles. Details on instructions still need to be done.
EDIT: in fact, no. There isn't any post correcting vuc2i. Most corrections were made while discussing on IRC, not in this topic. Well, your post is welcome :P
EDIT2: I corrected my post where I found "vuc2i", because it is something I already knew but forgot to update in this post.
hlide wrote:For cycles, you can look at http://wiki.fx-world.org/doku.php?id=general:cycles. Details on instructions still need to be done.
That's an excellent resource. Somehow I've managed to overlook the URL while searching around on these forums. Thanks :)
nDEV wrote:Holy crap!
You guys ARE GENIUS...wtf is all this?! i dont understand anything :lol:
The main processor of the PSP has two coprocessors, an FPU and a VFPU.
The FPU is quite standard and is used for single-precision floating point computation. Every homebrew that uses C floats uses it. VFPU probably stands for Vector Floating Point Unit; it is a very powerful SIMD-like FPU. Very few homebrews use it, or only marginally, because gcc has no knowledge of it (only gas, the assembler, has), unlike the support gcc has for SSE or AltiVec.
The VFPU offers 128 single-precision float registers (instead of the FPU's 32), which can be accessed as:
- 8 non-overlapping 4x4 matrices
- 8 non-overlapping 3x3 or 16 overlapping 3x3 matrices
- 16 non-overlapping 2x2 matrices
Within each matrix, we can also access registers as a column or a row, or simply access a single register as an element of the matrix.
That's very powerful for computing with the matrices and vectors used in 2D and 3D. Quaternions are also easier and faster to compute with the VFPU.
SIMD = Single Instruction Multiple Data, see Wikipedia for an explanation.
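To picture the register layout described above, one can model the 128 registers as 8 matrices of 4x4 floats and read the same storage either column-wise or row-wise. The C sketch below is only a mental model: the type name, the function names and the [matrix][column][row] index order are choices made for this illustration and say nothing about the real register numbering.

```c
/* ASSUMPTION: the index order [matrix][column][row] is a modelling choice
 * for this sketch, not the hardware's documented register numbering. */
typedef struct {
    float reg[8][4][4]; /* 8 matrices x 4 columns x 4 rows = 128 registers */
} VfpuRegisterFile;

/* View one column of a 4x4 matrix as a 4-element vector. */
static void read_column(const VfpuRegisterFile *rf, int mtx, int col,
                        float out[4])
{
    for (int row = 0; row < 4; ++row)
        out[row] = rf->reg[mtx][col][row];
}

/* View one row of the same matrix as a 4-element vector. */
static void read_row(const VfpuRegisterFile *rf, int mtx, int row,
                     float out[4])
{
    for (int col = 0; col < 4; ++col)
        out[col] = rf->reg[mtx][col][row];
}
```

A single element written into a matrix then shows up both in the column view and in the row view that pass through it, which is the property that makes transposed access free on the VFPU.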
Thanks, that's way too advanced for me.
Anyway, impressive work!!
nice work, guys!
I'm trying to get up to speed on VFPU diggins and for now I'm collecting all the info on the VFPU that I can find. I've already dug through this forum.
Any other links you think would be useful? Yep, I've already been to wiki.fx-world.org :-)
P.S. In fact, I'm building (yet another) wiki with PSP info. I'll share the link later, when there is more stuff.
Freelance game industry veteran. 8]
LIST UPDATED
Looking at the POPS binary, I found a special use of the VCMOVF/T instruction:
Code: Select all
vcmovt.t vd, vs, 6
I was wondering what this number 6 meant. As a reminder:
0 : comparison on X component
1 : comparison on Y component
2 : comparison on Z component
3 : comparison on W component
4 : OR comparison on all components
5 : AND comparison on all components
The code using it seems to imply that we can set the components individually according to their own comparisons.
So:
Code: Select all
vcmovt.t vd, vs, 6
would be equivalent to:
Code: Select all
vcmovt.s vd[0], vs[0], 0
vcmovt.s vd[1], vs[1], 1
vcmovt.s vd[2], vs[2], 2
I thought I knew every bit of VFPU :)
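Here is how that guess about condition 6 could be written down as a host-side C model. It is only a sketch of the behaviour inferred from the POPS code in the post above: the `cc` array stands in for the six VFPU condition flags (X, Y, Z, W, OR, AND), and the function name and signature are invented for this example.

```c
#include <stdbool.h>

/* cc[0..3]: compare results for X, Y, Z, W; cc[4]: OR of them; cc[5]: AND.
 * imm 0..5 selects one flag that governs every component.
 * ASSUMPTION (inferred in this thread): imm 6 makes each component use its
 * own flag, so component i is moved only when cc[i] is true. */
static void vcmovt_model(float *vd, const float *vs, int n,
                         const bool cc[6], int imm)
{
    for (int i = 0; i < n; ++i) {
        bool take = (imm == 6) ? cc[i] : cc[imm];
        if (take)
            vd[i] = vs[i];
    }
}
```

Calling it with `n = 3` and `imm = 6` corresponds to the `vcmovt.t vd, vs, 6` case, i.e. three scalar conditional moves with conditions 0, 1 and 2.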
anmabagima wrote:Hi there,
this might be a stupid question from a noob: but what is the intention of this? Where and how do I usually use this in my PSP development project?
Regards
AnMaBaGiMa
Do your own research. >:(
Search this forum and use Google.
Code: Select all
int main(){
SetupCallbacks();
makeNiceGame();
sceKernelExitGame();
}
Hi,
thanks for this hint... I've done some research and found this topic. I searched for something like ASM to use in PSP development to speed things up. However, the results pointed me to this thread. But what is this telling me? What I've found out so far is that the PSP has its own dialect of assembler. It doesn't seem comparable to x86 assembler, where you access registers like AX, BX, ESI and so on. So I'm here wondering if someone could give me just a small hint in the right direction...
Thanks...
Search for "VFPU" since it's in the topic title.
I bet you don't have any coding experience since you are not even able to figure out what the topic title means.
anmabagima wrote:What I've found out so far is that the PSP do have it's own dialect of assembler. It seem not to be comparable with ix86 assembler where you accessing registers like AX, BX, ESI and all this stuff
The PSP uses a 32-bit MIPS processor. So in an assembler program you can use all MIPS commands and more: FPU and VFPU.
FPU = Floating Point Unit
VFPU = Video FPU => done by the GPU
So the VFPU commands are "ready to use" for maths: calculating a cosine with the VFPU is roughly 800% faster than a normal cosine (using libMaths).
And it can do a lot of matrix operations (rotate, translate, multiply...)
I'm French, and 15 years old, so my English is not good...
jojojoris wrote:Search for "VFPU" since it's in the topic title.
I bet you don't have any coding experience since you are not even able to figure out what the topic title means.
I could start arguing why I feel I already have some coding experience - take a look at http://www.anmabagima.de/
Look for the project page and you will find my C++ tutorial for DDraw on Windows.
However, my assumption was that we are in this forum to help each other, not to affront someone... Anyway... thanks to everyone else for the help...
dridri wrote:
anmabagima wrote:What I've found out so far is that the PSP do have it's own dialect of assembler. It seem not to be comparable with ix86 assembler where you accessing registers like AX, BX, ESI and all this stuff
The PSP uses a MIPS 32bit processor. So in assembler program you can use all MIPS commands and more: FPU and VFPU.
FPU = Floating Point Unit
VFPU = Video FPU => done by the GPU
So the VFPU commands are "ready to use" for maths: calculating a cosinus by using VFPU is +/- 800% faster than a normal cosinus (using libMaths).
And it can do a lot of matrices operations (rotate, translate, multiply...)
Uh... no. VFPU = VECTOR FPU, and it's another CPU coprocessor, just like the FPU. It just works on vectors instead of single values. The VFPU commands are CPU assembler commands specific to the Allegrex CPU inside the PSP.
Here's one place to start with the vfpu: http://forums.ps2dev.org/viewtopic.php?p=67320#67320