VFPU playground, code generation for gas-unsupported opcodes


mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

By VFPU assembler I mean gas. If you'd seen what it took to implement support for register operands, you'd have chickened out like I did :).

Oh, and games always run in user mode. Just as there's no way for us to get into the kernel once in user mode, official games can't either.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

holger wrote:
jsgf wrote: I would like to see a very simple, thin libvfpu which provides two things:
  1. a set of macros to make inline assembler access to the VFPU easy (like gcc/icc's xmmintrin.h for SSE)
yes, this would be nice, but requires some work in the toolchain, so that gcc knows how to schedule the VFPU registers.
It doesn't require it. It would be nice to have a constraint for VFPU registers, so that gcc can reorder the asm statements relative to other code while still maintaining dependencies properly, and doubly nice to have a gcc type for VFPU register variables, but neither is essential. A plain asm() with memory-use constraints should be enough.
jsgf wrote: 2. a simple, lightweight context-switching mechanism to allow multiple libraries to share the VFPU without stomping on each other
becomes obsolete with the above...
Perhaps. But that's quite a bit more work.
holger
Posts: 204
Joined: Thu Aug 18, 2005 10:57 am

Post by holger »

jsgf wrote:
holger wrote:
jsgf wrote: I would like to see a very simple, thin libvfpu which provides two things:
  1. a set of macros to make inline assembler access to the VFPU easy (like gcc/icc's xmmintrin.h for SSE)
yes, this would be nice, but requires some work in the toolchain, so that gcc knows how to schedule the VFPU registers.
It doesn't require it. It would be nice to have a constraint for VFPU registers, so that gcc can reorder the asm statements relative to other code while still maintaining dependencies properly, and doubly nice to have a gcc type for VFPU register variables, but neither is essential. A plain asm() with memory-use constraints should be enough.
you can implement this by wrapping all __asm__ volatile (cgen_asm()) macros with inline functions, but it would not be of much use -- the great thing about intrinsics is that you get rid of the load of register scheduling...
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

holger wrote:you can implement this by wrapping all __asm__ volatile (cgen_asm()) macros with inline functions, but it would not be of much use -- the great thing about intrinsics is that you get rid of the load of register scheduling...
Yep, I'm with you there. But since that will require a non-trivial amount of gcc hacking, it would be nice to have a workable substitute for now, if nothing else so we can get a feeling for how and where the VFPU is actually useful.
holger
Posts: 204
Joined: Thu Aug 18, 2005 10:57 am

Post by holger »

I fear that writing big sections of inline asm (macro-based for now; hopefully gas support comes soon), or dynamic macro-based code generation, is the only option right now.
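To make "macro-based" concrete, here is a minimal sketch of how an opcode macro can be turned into inline-asm text for an opcode gas does not know: the macro expands to a C constant expression, which is stringified into a `.word` directive that gas evaluates itself (gas accepts `|`, `<<` and parentheses in expressions). The names CGEN_STR, CGEN_STR2 and cgen_word are hypothetical; the real cgen_asm() in codegen.h may be written differently.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so the opcode macro is expanded first. */
#define CGEN_STR2(x) #x
#define CGEN_STR(x)  CGEN_STR2(x)

/* Hypothetical: emit one instruction word as assembler text. */
#define cgen_word(insn) ".word " CGEN_STR(insn) "\n"

/* Opcode macro in the style used in this thread. */
#define vnrcp_s(vfpu_rd, vfpu_rs) (0xd0180000 | ((vfpu_rs) << 8) | (vfpu_rd))

/* On the PSP this text would go inside __asm__ volatile(...);
 * on a host we can only inspect the generated string. */
static const char example_insn[] = cgen_word(vnrcp_s(1, 2));

/* Nonzero if s starts with ".word " and ends with a newline. */
static int is_word_directive(const char *s)
{
    size_t n = strlen(s);
    return n > 6 && strncmp(s, ".word ", 6) == 0 && s[n - 1] == '\n';
}
```

Because gas only ever sees `.word <constant expression>`, no assembler patch is needed; the cost is that gcc knows nothing about which VFPU registers the word touches.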
groepaz
Posts: 305
Joined: Thu Sep 01, 2005 7:44 am

Post by groepaz »

I have added the info from this thread to my doc; those who have messaged me before can get it from its previous location... let me know if there are any obvious errors :)
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

I've written up a reference that shows how the registers are mapped in the various single/pair/triple/quad modes. This should help a bit when trying to juggle all those matrices/vectors around the vfpu register space.

http://bradburn.net/mr.mr/vfpu.html
http://bradburn.net/mr.mr/vfpu2.html <-- this one is nice for a cheat sheet

EDIT: I've also added this to the wiki.
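The single-register names in those tables can also be turned into arithmetic. A minimal sketch, assuming the 7-bit layout column in bits 0-1, matrix in bits 2-4 and row in bits 5-6, so that S&lt;matrix&gt;&lt;column&gt;&lt;row&gt; maps to one number (check it against the tables above). vfpu_sreg and vfpu_sreg_split are hypothetical helper names, not codegen.h API.

```c
#include <assert.h>

/* Hypothetical: register number of single register S<mtx><col><row>. */
static unsigned vfpu_sreg(unsigned mtx, unsigned col, unsigned row)
{
    assert(mtx < 8 && col < 4 && row < 4);
    return (row << 5) | (mtx << 2) | col;
}

/* Hypothetical inverse: split a register number into matrix/column/row. */
static void vfpu_sreg_split(unsigned reg, unsigned *mtx, unsigned *col, unsigned *row)
{
    assert(reg < 128);
    *mtx = (reg >> 2) & 7;
    *col = reg & 3;
    *row = (reg >> 5) & 3;
}
```

Under this assumption the quad column C000 covers S000, S001, S002 and S003, i.e. registers 0, 32, 64 and 96, while register 1 is S010, the first element of the next column.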
holger
Posts: 204
Joined: Thu Aug 18, 2005 10:57 am

Post by holger »

Nice explanation! But the wiki seems to be down at the moment...
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

Heh, the wiki just links to the first page I pasted above. I'll do a proper entry for the wiki later.
sherpya
Posts: 61
Joined: Mon Oct 03, 2005 5:49 pm

Post by sherpya »

Why not add it directly to gas? Isn't that possible?
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

No, just very difficult to do... I'm not familiar with adding opcodes to gas, and I have no clue how to get gcc to schedule the register usage. That takes someone who really knows binutils and gcc. For now we'll stick to the macro stuff; it's much easier to use =)
nugi
Posts: 6
Joined: Sun Sep 11, 2005 3:31 am

Load 1- or 2-byte integers?

Post by nugi »

Yeah~ good job.

I'm wondering how to load/store 1- or 2-byte integers into a VFPU vector register in one go. For example, loading a 32-bit color value (RGBA) into the C000 register, and after some processing writing C000 back to memory (32 bits).

Is it possible? I can't find a way in the current codegen.h. hmm~~~
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

I have added very preliminary VFPU support to GUM now, just as a working example. To enable this support, remove the comment from

//#define GUM_USE_VFPU
in gumInternal.h, rebuild the library and set THREAD_ATTR_VFPU in the desired program. I have tested a few of the samples and they have all run fine.

Only the stack functions have been fixed so far; I intend to finish the rest of them tomorrow.

Thanks to holger and MrMr[iCE] for their work on this. libpspvgum from MrMr[iCE] was used as initial inspiration for this implementation.
GE Dominator
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

I've updated the wiki again; there's now information on loading/storing values in the VFPU.
holger
Posts: 204
Joined: Thu Aug 18, 2005 10:57 am

Post by holger »

Maybe a note about lvl.q/lvr.q/svl.q/svr.q would make sense, to ease unaligned loads/stores. Their semantics are similar to the unaligned word loads/stores (lwl/lwr/swl/swr).
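The address arithmetic behind that hint can be sketched as follows: a 4-float vector at a word-aligned but not 16-byte-aligned address straddles two aligned 16-byte blocks. By analogy with lwl/lwr, an lvl.q/lvr.q pair would each cover one of them; that pairing, and the struct and function names, are assumptions for illustration, so check the exact lane behaviour on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of an unaligned quad access. */
struct quad_split {
    uintptr_t lo_block; /* aligned 16-byte block holding the first float */
    uintptr_t hi_block; /* following block, or 0 if the access is aligned */
    unsigned  word;     /* index (0..3) of the first float inside lo_block */
};

static struct quad_split split_quad_address(uintptr_t addr)
{
    struct quad_split s;
    assert((addr & 3) == 0);               /* floats must be word aligned */
    s.lo_block = addr & ~(uintptr_t)15;
    s.word = (unsigned)((addr & 15) >> 2);
    s.hi_block = s.word ? s.lo_block + 16 : 0;
    return s;
}
```

When word is 0 the address is already 16-byte aligned and a plain lv.q/sv.q suffices.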
curlyfuzz
Posts: 1
Joined: Mon Oct 24, 2005 6:41 am

Post by curlyfuzz »

MrMr[iCE] wrote: and here comes another run of ops I've tested:

/*
+-----------------------------------------+--+--------------+-+--------------+
|31                                    16 |15| 14         8 |7| 6         0  |
+-----------------------------------------+--+--------------+-+--------------+
| opcode 0xd0180000 (s)                   | 0| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd0180080 (p)                   | 0| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
| opcode 0xd0188000 (t)                   | 1| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd0188080 (q)                   | 1| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
+-----------------------------------------+--+--------------+-+--------------+

	NegativeReciprocal.Single/Pair/Triple/Quad

	vnrcp.s  %vfpu_rd, %vfpu_rs   ; calculate negative reciprocal
	vnrcp.p  %vfpu_rd, %vfpu_rs   ; calculate negative reciprocal
	vnrcp.t  %vfpu_rd, %vfpu_rs   ; calculate negative reciprocal
	vnrcp.q  %vfpu_rd, %vfpu_rs   ; calculate negative reciprocal

	%vfpu_rd:   VFPU Vector Target Register ([s|p|t|q]reg 0..127)
	%vfpu_rs:   VFPU Vector Source Register ([s|p|t|q]reg 0..127)

	vfpu_regs[%vfpu_rd] <- -1/vfpu_regs[%vfpu_rs]
*/

/* macro arguments parenthesized so expressions like (a | b) encode correctly */
#define vnrcp_s(vfpu_rd, vfpu_rs) (0xd0180000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnrcp_p(vfpu_rd, vfpu_rs) (0xd0180080 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnrcp_t(vfpu_rd, vfpu_rs) (0xd0188000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnrcp_q(vfpu_rd, vfpu_rs) (0xd0188080 | ((vfpu_rs) << 8) | (vfpu_rd))


/*
+-----------------------------------------+--+--------------+-+--------------+
|31                                    16 |15| 14         8 |7| 6         0  |
+-----------------------------------------+--+--------------+-+--------------+
| opcode 0xd01a0000 (s)                   | 0| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd01a0080 (p)                   | 0| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
| opcode 0xd01a8000 (t)                   | 1| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd01a8080 (q)                   | 1| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
+-----------------------------------------+--+--------------+-+--------------+

	NegativeSin.Single/Pair/Triple/Quad

	vnsin.s  %vfpu_rd, %vfpu_rs   ; calculate negative sine
	vnsin.p  %vfpu_rd, %vfpu_rs   ; calculate negative sine
	vnsin.t  %vfpu_rd, %vfpu_rs   ; calculate negative sine
	vnsin.q  %vfpu_rd, %vfpu_rs   ; calculate negative sine

	%vfpu_rd:   VFPU Vector Target Register ([s|p|t|q]reg 0..127)
	%vfpu_rs:   VFPU Vector Source Register ([s|p|t|q]reg 0..127)

	vfpu_regs[%vfpu_rd] <- -sin(vfpu_regs[%vfpu_rs])
*/

/* macro arguments parenthesized so expressions like (a | b) encode correctly */
#define vnsin_s(vfpu_rd, vfpu_rs) (0xd01a0000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnsin_p(vfpu_rd, vfpu_rs) (0xd01a0080 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnsin_t(vfpu_rd, vfpu_rs) (0xd01a8000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vnsin_q(vfpu_rd, vfpu_rs) (0xd01a8080 | ((vfpu_rs) << 8) | (vfpu_rd))


/*
+-----------------------------------------+--+--------------+-+--------------+
|31                                    16 |15| 14         8 |7| 6         0  |
+-----------------------------------------+--+--------------+-+--------------+
| opcode 0xd01c0000 (s)                   | 0| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd01c0080 (p)                   | 0| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
| opcode 0xd01c8000 (t)                   | 1| vfpu_rs[6-0] |0| vfpu_rd[6-0] |
| opcode 0xd01c8080 (q)                   | 1| vfpu_rs[6-0] |1| vfpu_rd[6-0] |
+-----------------------------------------+--+--------------+-+--------------+

	ReciprocalExp2.Single/Pair/Triple/Quad

	vrexp2.s  %vfpu_rd, %vfpu_rs   ; calculate 1/(2^y)
	vrexp2.p  %vfpu_rd, %vfpu_rs   ; calculate 1/(2^y)
	vrexp2.t  %vfpu_rd, %vfpu_rs   ; calculate 1/(2^y)
	vrexp2.q  %vfpu_rd, %vfpu_rs   ; calculate 1/(2^y)

	%vfpu_rd:   VFPU Vector Target Register ([s|p|t|q]reg 0..127)
	%vfpu_rs:   VFPU Vector Source Register ([s|p|t|q]reg 0..127)

	vfpu_regs[%vfpu_rd] <- 1/exp2(vfpu_regs[%vfpu_rs])
*/

/* macro arguments parenthesized so expressions like (a | b) encode correctly */
#define vrexp2_s(vfpu_rd, vfpu_rs) (0xd01c0000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vrexp2_p(vfpu_rd, vfpu_rs) (0xd01c0080 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vrexp2_t(vfpu_rd, vfpu_rs) (0xd01c8000 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vrexp2_q(vfpu_rd, vfpu_rs) (0xd01c8080 | ((vfpu_rs) << 8) | (vfpu_rd))
These opcodes seem to be the same as those for reciprocal/sin/exp2, but with the flag 0x00080000 ORed into the opcode (meaning: negate the input register before the calculation). Maybe this is a more general feature?
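The guess is at least arithmetically consistent: if the flag means "negate the input", then -1/x = 1/(-x), -sin(x) = sin(-x) and 1/2^x = 2^(-x) all hold. The sketch below only does the bit arithmetic on the single-form opcodes from the tables above; the macro and function names are hypothetical, and the cleared values (0xd0100000, 0xd0120000, 0xd0140000) would be the plain vrcp/vsin/vexp2 encodings only if the guess is right.

```c
#include <assert.h>
#include <stdint.h>

/* Flag from the observation above: 0x00080000. */
#define VFPU_NEG_FLAG 0x00080000u

/* Single (.s) opcodes taken from the tables above. */
#define OP_VNRCP_S  0xd0180000u
#define OP_VNSIN_S  0xd01a0000u
#define OP_VREXP2_S 0xd01c0000u

/* Opcode with the suspected "negate input" flag cleared. */
static uint32_t without_neg_flag(uint32_t op)
{
    return op & ~VFPU_NEG_FLAG;
}

/* Index of the lowest set bit, to show which bit the flag occupies. */
static int lowest_bit(uint32_t v)
{
    int i = 0;
    assert(v != 0);
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}
```

lowest_bit(VFPU_NEG_FLAG) is 19, so the flag sits well below the bits 24-26 area.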
MrMr[iCE]
Posts: 43
Joined: Mon Oct 03, 2005 4:55 pm

Post by MrMr[iCE] »

Honestly, I don't know. I'm using the opcode list in binutils; this is what you would see if these opcodes were found in a binary with psp-objdump.
groepaz
Posts: 305
Joined: Thu Sep 01, 2005 7:44 am

Post by groepaz »

bits 24-26 seem to be more of an "extended opcode" field, not directly related to a specific feature...

edit: doh... 0x00080000 isn't in bits 24-26 (it's bit 19) :=P so there is indeed a small chance that what you say is true :)
jonny
Posts: 351
Joined: Thu Sep 22, 2005 5:46 pm

Post by jonny »

nugi
Posts: 6
Joined: Sun Sep 11, 2005 3:31 am

Thread down?

Post by nugi »

No more opcodes from this thread?

Here are some more opcodes from my test code... no documentation, sorry~
I don't think there's much point in using gas to assemble VFPU code. The macro approach was sufficient for me and helped me a lot. Thank you~
I hope this thread doesn't get closed for lack of contributions!

int to short.
#define vi2s_p(vfpu_rd,vfpu_rs) (0xd03f0080 | ((vfpu_rs) << 8) | (vfpu_rd))
#define vi2s_q(vfpu_rd,vfpu_rs) (0xd03f8080 | ((vfpu_rs) << 8) | (vfpu_rd))

int to unsigned char.
#define vi2uc_q(vfpu_rd,vfpu_rs) (0xd03c8080 | ((vfpu_rs) << 8) | (vfpu_rd))

int to float.
#define vi2f_s(vfpu_rd,vfpu_rs,scale) (0xd2800000 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vi2f_p(vfpu_rd,vfpu_rs,scale) (0xd2800080 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vi2f_t(vfpu_rd,vfpu_rs,scale) (0xd2808000 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vi2f_q(vfpu_rd,vfpu_rs,scale) (0xd2808080 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))

float to int, round to nearest.
#define vf2in_s(vfpu_rd,vfpu_rs,scale) (0xd2000000 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vf2in_p(vfpu_rd,vfpu_rs,scale) (0xd2000080 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vf2in_t(vfpu_rd,vfpu_rs,scale) (0xd2008000 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))
#define vf2in_q(vfpu_rd,vfpu_rs,scale) (0xd2008080 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))

There are also vf2iz, vf2iu and vf2id instructions with different rounding methods.

Some of the instructions I haven't used myself may be wrong.
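A checked version of the conversion macros above, as a sketch: vfpu_cvt and vi2f_ref are hypothetical names. It assumes the scale field occupies bits 16-20 (5 bits, 0..31) and that vi2f divides the integer by 2^scale; neither is confirmed in this thread, so verify against real hardware before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Quad-form base opcodes from the macros above. */
#define OP_VF2IN_Q 0xd2008080u
#define OP_VI2F_Q  0xd2808080u

/* Macro exactly as posted above, for comparison. */
#define vf2in_q(vfpu_rd,vfpu_rs,scale) (0xd2008080 | ((scale) << 16) | ((vfpu_rs) << 8) | (vfpu_rd))

/* Range-checked encoder: rejects register numbers >= 128 and scales
 * that would spill out of the assumed 5-bit field. */
static uint32_t vfpu_cvt(uint32_t op, unsigned rd, unsigned rs, unsigned scale)
{
    assert(rd < 128 && rs < 128);
    assert(scale < 32);
    return op | ((uint32_t)scale << 16) | ((uint32_t)rs << 8) | rd;
}

/* Host-side model of the assumed vi2f behaviour: i / 2^scale. */
static float vi2f_ref(int32_t i, unsigned scale)
{
    assert(scale < 32);
    return (float)i / (float)(1u << scale);
}
```

Unlike the plain macros, vfpu_cvt catches a scale of 32 or more, which would otherwise silently corrupt the opcode bits.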
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Hello. I have some problems with lv_q. For example, if I do this:

float vfpu_add(float f1, float f2)
{
   vfpu_vars[0] = f1;
   vfpu_vars[1] = f2;
   register void *ptr __asm ("a0") = vfpu_vars;
   __asm__ volatile (
      cgen_asm(lv_q(0, 0, R_a0, 0))
      cgen_asm(vadd_s(124, 0, 1))
      cgen_asm(sv_q(31, 0 * 4, R_a0, 0))
   : "=r"(ptr) : "r"(ptr) : "memory");
   return vfpu_vars[0];
}
This doesn't work as expected (i.e. add f1 and f2 and return the result). It seems that f1 is not loaded: if I load something else into register 1 beforehand, it keeps that value after lv_q.
Instead, something like this will work:

float vfpu_add(float f1, float f2)
{
   vfpu_vars[0] = f1;
   vfpu_vars[1] = f2;
   register void *ptr __asm ("a0") = vfpu_vars;
   __asm__ volatile (
      cgen_asm(lv_s(0, 0, R_a0, 0))
      cgen_asm(lv_s(1, 1, R_a0, 0))
      cgen_asm(vadd_s(124, 0, 1))
      cgen_asm(sv_q(31, 0 * 4, R_a0, 0))
   : "=r"(ptr) : "r"(ptr) : "memory");
   return vfpu_vars[0];
}
But the .h file says:

    lv.q %vfpu_rt, offset(%base)

   %fpu_rt:   VFPU Vector Target Register (column0-31/row32-63)
But here fpu_rt seems to be just the register number... so maybe it's my fault (I'm a real beginner), or there is something I really didn't understand. Could someone help me, please?
Thanks in advance ^^
Sorry for my bad English
Oldschool library for PSP - PC version released
nugi
Posts: 6
Joined: Sun Sep 11, 2005 3:31 am

Hmm....

Post by nugi »

Some information and suggestions:
1. Before you start: the final address must be 16-byte aligned for the quad version (q) and 4-byte aligned for the single version (s).
2. Use the Q_C000-style register names defined in codegen.h rather than raw register numbers.
3. Example: lv_q(Q_C000, 0, R_a0)
loads 16 bytes of data (4 floats) into the Q_C000 register from the address held in R_a0 (== vfpu_vars).
That is, if float vfpu_vars[4] = {100, 101, 102, 103}, then S_S000=100, S_S001=101... As noted, vfpu_vars must be 16-byte aligned.
4. A simple test program (newvfpu.c?) will help a lot! Find it in this thread and use it.
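To satisfy point 1 with gcc, the buffer can be declared with an alignment attribute. A minimal host-side sketch; is_aligned and fill_example are hypothetical helpers, and psp-gcc is assumed to accept the same GCC attribute. Filled as in point 3, lv_q(Q_C000, 0, R_a0) would then put 100..103 into S_S000..S_S003.

```c
#include <assert.h>
#include <stdint.h>

/* GCC attribute forcing the 16-byte alignment lv.q/sv.q need. */
static float vfpu_vars[4] __attribute__((aligned(16)));

/* Hypothetical: nonzero if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Fill the buffer with the values from point 3: 100, 101, 102, 103. */
static void fill_example(void)
{
    for (int i = 0; i < 4; i++)
        vfpu_vars[i] = 100.0f + (float)i;
}
```

For Brunni's first snippet this also suggests naming the second vadd_s operand with the codegen.h constant for S001 (S_S001) rather than the raw number 1, since the raw number may denote a different register.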