Inline VFPU help

starman2049
Posts: 75
Joined: Mon Sep 19, 2005 5:41 am

Post by starman2049 »

I am trying to replace the sqrt() function using the codegen.h approach and have written the following, but it has a bug. I am new to inline assembler and can't get it to work (it just returns the value I pass in):

Code: Select all

float mysqrt(float val)
{
   register float a=val;
   register float b;

   __asm__ volatile (
      cgen_asm(vsqrt_s(R_a0, R_a1))
      :"=r"(b)
      :"r"(a), "r"(b)
   );
   return (b);
}
Can anyone lead me out of the dark on how to do this correctly?
popcornx
Posts: 6
Joined: Tue Mar 14, 2006 7:56 am

Post by popcornx »

Hmm, I'm a MIPS n00b too, but I'll still try to help:

Code: Select all

float mysqrt(float val)
{
   int b=0;

   register float par1 asm($16)=val; //parameter 1
   register float par2 asm($17)=2; //parameter 2

   asm (
      :
      :"r"(par1),"r"(par2) // insert val into reg $16 & 2 into $17
   );

   asm("move $a0,$16");//move the first parameter
   asm("move $a1,$17");//move the second parameter
  asm("jal vsqrt");
  asm("sw $ra,($18");

  register float retv asm($18); //return value
     asm (
      :"=r"(retv)
      :
   );
   return (retv);
} 
I'm a n00b, so I don't know if this works or not... I hope it does!
If not, you can ask Skylark or Fanjita for help; I think they will respond.
chp
Posts: 313
Joined: Wed Jun 23, 2004 7:16 am

Post by chp »

Actually, since you're trying to use the VFPU, I wonder whether you can use the general-purpose registers for this at all... Something like this should work (note that the VFPU instructions are supported in the current binutils, so there is no need for codegen):

Code: Select all

float mysqrt(float val)
{
        float ret;

        __asm__ volatile (
                "mtv %1, S000\n"
                "vsqrt.s S001, S000\n"
                "mfv %0, S001\n"
        : "=r"(ret) : "r"(val) );

        return ret;
}
This should work (I looked at the output and it looked ok), but I don't have access to hardware to test it with right now.
GE Dominator
starman2049

Post by starman2049 »

Can't get anywhere with this. I'm on the September '05 toolchain and SDK, so VFPU support is not built in and I had to stick with the codegen.h approach. Unfortunately, codegen.h has no mtv/mfv, and I spent some time trying to add those opcodes, to little avail.

Then I decided to update my toolchain and grabbed the latest "famous" version from oopo.net. After wrestling with that for a bit, I found that the VFPU support appears to be only in the SVN version so far.

I've never been able to check out anything from svn.ps2dev.net; it keeps saying that the hostname is invalid. I tried opening TCP port 3690 on my router, but I still can't reach it.

I tried the beta toolchain from oopo.net, but it failed during installation, so I went back to the latest "famous" version.

Is there a userid/pw that is needed for svn.pspdev.org?
Oobles
Site Admin
Posts: 347
Joined: Sat Jan 17, 2004 9:49 am
Location: Melbourne, Australia

Post by Oobles »

No user ID or password is required for svn.ps2dev.org. Please make sure you have the correct URL. In your last message you referred to it as svn.ps2ev.net and svn.pspdev.org, both of which are wrong.

If you have a restrictive firewall, you might not be able to access the server. Let me know if you have any more problems. I believe someone did create a mirror of the Subversion repository that can be accessed over HTTP; you should be able to find it by searching the forums.

David. aka Oobles.
dot_blank
Posts: 498
Joined: Wed Sep 28, 2005 8:47 am
Location: Brasil

Post by dot_blank »

Tried it, and the following works:

user@host$ svn co svn://ps2dev.org/psp/trunk/pspsdk

Future updates are then just:

user@host$ cd pspsdk
user@host$ svn up
10011011 00101010 11010111 10001001 10111010
starman2049

Post by starman2049 »

Well, I still can't get into SVN, but I was able to move my sqrt(), sin(), cos(), etc. over to the VFPU and got a very nice speed-up, so thank you to everyone who helped with this!!
starman2049

Post by starman2049 »

These are probably implemented in newlib or elsewhere already, but if you have the latest toolchain you should be able to write your own math functions that are MUCH faster. Note that you have to add "| PSP_THREAD_ATTR_VFPU" to the attributes in your PSP_MAIN_THREAD_ATTR() definition in main.c.

Code: Select all

float mysqrtf(float val)
{
	float ret;

	__asm__ volatile (
		"mtv %1, S000\n"
		"vsqrt.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
}

Code: Select all

float mysinf(float val)
	{
	float ret; 

	val *= 0.6366197f; // 2/pi: convert radians to quarter-turns (units of 90 degrees), which vsin.s expects

	__asm__ volatile (
		"mtv %1, S000\n"
		"vsin.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

Code: Select all

float mycosf(float val)
	{
	float ret; 

	val *= 0.6366197f; // 2/pi: convert radians to quarter-turns (units of 90 degrees), which vcos.s expects

	__asm__ volatile (
		"mtv %1, S000\n"
		"vcos.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}
dot_blank

Post by dot_blank »

Well, these seem to work quite well :)
I think the toolchain would benefit from a VFPU math library, something along the lines of

Code: Select all

#include <vmath.h>
and then one would simply use them like this:

Code: Select all

vcos(2*c_wave);
vinf(value);
//etc...
What do the mods think?
chp

Post by chp »

Just as a little side note, the sin/cos functions can be rewritten to do the radian conversion on the VFPU:

Code: Select all

float mysinf(float val)
   {
   float ret;

   __asm__ volatile (
      "mtv %1, S000\n"
      "vcst.s S002, VFPU_2_PI\n"
      "vmul.s S001, S000, S002\n"
      "vsin.s S000, S001\n" // or vcos.s
      "mfv %0, S000\n"
      : "=r"(ret) : "r"(val));

   return ret;
   }
And if you use vrot.q instead of vsin.s, you can get the sine and cosine computed at the same time. Something like this should do it:

Code: Select all

void vsincosf(float angle, ScePspFVector4* result)
{
 __asm__ volatile (
    "mtv %1, S000\n"
    "vcst.s S001, VFPU_2_PI\n"
    "vmul.s S002, S000, S001\n"
    "vrot.q C010, S002, [s, c, 0, 0]\n"
    "usv.q C010, 0 + %0\n"
    : "+m"(*result) : "r"(angle));
}
starman2049

Post by starman2049 »

Code: Select all

float myacosf(float val)
	{
	float ret;

	// acos(x) = pi/2 - asin(x); vasin.s returns asin in quarter-turns,
	// so compute 1 - vasin(x) and then scale by pi/2 to get radians
	__asm__ volatile (
		"mtv %1, S000\n"
		"vasin.s S001, S000\n"
		"vone.s S002\n"
		"vsub.s S000, S002, S001\n"
		"vcst.s S002, VFPU_PI_2\n"
		"vmul.s S001, S000, S002\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

float myasinf(float val)
	{
	float ret;

	// vasin.s returns asin in quarter-turns; scale by pi/2 to get radians
	__asm__ volatile (
		"mtv %1, S000\n"
		"vasin.s S001, S000\n"
		"vcst.s S002, VFPU_PI_2\n"
		"vmul.s S000, S001, S002\n"
		"mfv %0, S000\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

starman2049

Post by starman2049 »

Here is an applyMatrix routine. This is about 4 times faster than in software.

Code: Select all

// NOTE: v0, m0, v1 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
// v0 = m0 * v1 (each vdot.q is one row of m0 dotted with v1)
void myApplyMatrix(FVECTOR v0, FMATRIX m0, FVECTOR v1)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"lv.q	R100, 0x0(%2)\n"

	"vdot.q	S200, R000, R100\n"
	"vdot.q	S210, R001, R100\n"
	"vdot.q	S220, R002, R100\n"
	"vdot.q	S230, R003, R100\n"
	"sv.q	R200, 0x0(%0)\n"
	: : "r" (v0) , "r" (m0) ,"r" (v1)
	: "memory"); // the asm reads and writes memory through these pointers
	}
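For comparison, and for checking the VFPU output off-device, here is a plain-C sketch of what myApplyMatrix computes: each output component is the dot product of one row of the row-major matrix with the input vector, matching the four vdot.q instructions. The thread does not show how FVECTOR and FMATRIX are defined, so this sketch assumes plain arrays of 4 and 16 floats, and apply_matrix_ref is a hypothetical name:

```c
/* Plain-C reference for myApplyMatrix: out = M * in, with M stored
   row-major as 16 floats (row i occupies m[4*i] .. m[4*i + 3]). */
void apply_matrix_ref(float out[4], const float m[16], const float in[4])
{
    for (int row = 0; row < 4; row++) {
        /* One vdot.q per row in the VFPU version. */
        out[row] = m[4 * row + 0] * in[0]
                 + m[4 * row + 1] * in[1]
                 + m[4 * row + 2] * in[2]
                 + m[4 * row + 3] * in[3];
    }
}
```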
starman2049

Post by starman2049 »

Here is a Matrix Multiply routine. This is about 10 times faster than in software:

Code: Select all

// NOTE: m0, m1, m2 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
void myMulMatrix(FMATRIX m2, FMATRIX m0, FMATRIX m1)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"lv.q	R100, 0x0(%2)\n"
	"lv.q	R101, 0x10(%2)\n"
	"lv.q	R102, 0x20(%2)\n"
	"lv.q	R103, 0x30(%2)\n"

	"vmmul.q	M200, M000, M100\n"

	"sv.q	R200, 0x0(%0)\n"
	"sv.q	R201, 0x10(%0)\n"
	"sv.q	R202, 0x20(%0)\n"
	"sv.q	R203, 0x30(%0)\n"
	: : "r" (m2) , "r" (m0) ,"r" (m1)
	: "memory"); // the asm reads and writes memory through these pointers
	}
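A plain-C reference for a row-major 4×4 product is handy for validating myMulMatrix on hardware. The sketch computes out = a × b in the usual row-major sense; since vmmul.q applies its operands according to the VFPU's own row/column register conventions, it is worth comparing the assembly's output against both m0 × m1 and m1 × m0 to confirm which order it actually produces. mul_matrix_ref is a hypothetical name:

```c
/* Plain-C reference: out = a * b for row-major 4x4 matrices
   (element (r, c) stored at index 4*r + c). out must not alias a or b. */
void mul_matrix_ref(float out[16], const float a[16], const float b[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[4 * r + k] * b[4 * k + c];
            out[4 * r + c] = sum;
        }
    }
}
```

Because matrix multiplication is not commutative, a pair of non-commuting test matrices (for example two different shears) will tell the two operand orders apart.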
starman2049

Post by starman2049 »

Here is a matrix copy routine. This is about 6 times faster than in software:

Code: Select all

// NOTE: m0, m1 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
void myCopyMatrix(FMATRIX m1, FMATRIX m0)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"sv.q	R000, 0x0(%0)\n"
	"sv.q	R001, 0x10(%0)\n"
	"sv.q	R002, 0x20(%0)\n"
	"sv.q	R003, 0x30(%0)\n"
	: : "r" (m1) , "r" (m0)
	: "memory"); // the asm reads and writes memory through these pointers
	}
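All three matrix routines require 16-byte-aligned arguments because lv.q and sv.q move a full quadword at once. A minimal sketch of declaring storage that satisfies this with GCC's aligned attribute (the PSP toolchain is GCC-based), plus a helper to check a pointer at runtime; Matrix16, Vector4 and is_aligned16 are hypothetical names, not part of the code above:

```c
#include <stdint.h>

/* Hypothetical 16-byte-aligned storage types for the lv.q / sv.q routines. */
typedef struct { float m[16]; } __attribute__((aligned(16))) Matrix16;
typedef struct { float v[4]; } __attribute__((aligned(16))) Vector4;

/* Returns 1 if p sits on a 16-byte boundary, as lv.q/sv.q require. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Wrapping the arrays in structs keeps the alignment attached to the type, so locals, globals and struct members declared with these types all get it automatically.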
Psilocybeing
Posts: 3
Joined: Thu May 04, 2006 7:26 am

Post by Psilocybeing »

Very nice. Sine/cosine functions are used heavily in one of my projects, so these will come in very handy. Thanks :)
chp

Post by chp »

starman2049 wrote:Here is an applyMatrix routine. This is about 4 times faster than in software.
<snip>
You should take a look at vtfm3/vtfm4 instead of using vdot; they should execute even faster. Example:

Code: Select all

void myApplyMatrix(FVECTOR v0, FMATRIX m0, FVECTOR v1)
   {
   __asm__ volatile (
   "lv.q   R000, 0x0(%1)\n"
   "lv.q   R001, 0x10(%1)\n"
   "lv.q   R002, 0x20(%1)\n"
   "lv.q   R003, 0x30(%1)\n"

   "lv.q   R100, 0x0(%2)\n"

   "vtfm4.q R200, E000, R100\n"
   "sv.q   R200, 0x0(%0)\n"
   : : "r" (v0) , "r" (m0) ,"r" (v1)
   : "memory"); // the asm reads and writes memory through these pointers
   }
I do, however, think most of the time is spent loading the matrix, so you should perhaps change your code to multiply more than one vertex per call (for example, an array of vertices). The overhead you would then get when you need to multiply just one is minimal compared to the opposite situation. And don't forget that you can let the GE do all this work if you just intend to render it. :)
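The batching idea can be sketched in plain C: the matrix is read once and then applied to every vertex in the array, which is the loop shape a VFPU version would follow (the four lv.q loads for the matrix outside the loop, then lv.q / vtfm4.q / sv.q per vertex). This host version only illustrates that structure; transform_vertices is a hypothetical name, and the matrix is row-major as in the routines above:

```c
#include <stddef.h>

/* Hypothetical batched transform: for each of `count` vertices (4 floats
   each, packed back to back), out_v = M * in_v with M row-major
   (element (r, c) at index 4*r + c). out must not overlap in. */
void transform_vertices(float *out, const float *m, const float *in,
                        size_t count)
{
    /* Read the matrix once up front; in a VFPU version this is where the
       four lv.q loads would go, outside the per-vertex loop. */
    float mat[16];
    for (int i = 0; i < 16; i++)
        mat[i] = m[i];

    for (size_t v = 0; v < count; v++) {
        const float *src = in + 4 * v;
        float *dst = out + 4 * v;
        for (int row = 0; row < 4; row++) {
            dst[row] = mat[4 * row + 0] * src[0]
                     + mat[4 * row + 1] * src[1]
                     + mat[4 * row + 2] * src[2]
                     + mat[4 * row + 3] * src[3];
        }
    }
}
```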
starman2049

Post by starman2049 »

Here's a much faster atan2 routine. It isn't in VFPU form (yet), but I thought I would post it now and update it later. It is about 10 times faster than the default atan2 routine. PLEASE NOTE: this is a low-order approximation with a maximum error of roughly 0.07 radians (about 4 degrees), so only use it where coarse precision is acceptable. I use it 50 to 100 times per frame in some levels, so it was a big boost for me (8 fps!)

Since the VFPU has asin in silicon (vasin.s) and there is a known identity between atan and asin, this could be done in other ways, and to higher order.

I'm personally hoping chp has a fancy matrix approach for this one :)

Code: Select all

#include <math.h>

float myatan2f(float y, float x)
	{
	float angle;
	float coeff_1 = 3.141592654f/4.0f;
	float coeff_2 = 3.0f*coeff_1;
	float abs_y = fabsf(y) + 0.00000001f;      // kludge to prevent 0/0 condition
	float r;

	if (x >= 0.0f)
		{
		r = (x - abs_y) / (x + abs_y);
		angle = coeff_1 - coeff_1 * r;
		}
	else
		{
		r = (x + abs_y) / (abs_y - x);
		angle = coeff_2 - coeff_1 * r;
		}

	if (y < 0.0f)
		return(-angle);     // negate if in quad III or IV
	else
		return(angle);
	}
