Inline VFPU help

starman2049 · Post by **starman2049** » Sun Apr 23, 2006 3:50 pm

I am trying to replace the sqrt() function using the codegen.h approach and have written the following, but it has a bug. I am new to inline assembler and can't get the following to work (it just returns the value I pass it).

Code: Select all

float mysqrt(float val)
{
   register float a=val;
   register float b;

   __asm__ volatile (
      cgen_asm(vsqrt_s(R_a0, R_a1))
      :"=r"(b)
      :"r"(a), "r"(b)
   );
   return (b);
}

Can anyone lead me out of the dark on how to do this correctly?

popcornx · Post by **popcornx** » Mon Apr 24, 2006 4:58 pm

hmm I'm a MIPS n00b too but I'll still try to help

Code: Select all

float mysqrt(float val)
{
   int b=0;

   register float par1 asm($16)=val; // parameter 1
   register float par2 asm($17)=2; // parameter 2

   asm (
      :
      :"r"(par1),"r"(par2) // insert val into reg $16 and 2 into $17
   );

   asm("move $a0,$16"); // move the first parameter
   asm("move $a1,$17"); // move the second parameter
   asm("jal vsqrt");
   asm("sw $ra,($18)");

   register float retv asm($18); // return value
   asm (
      :"=r"(retv)
      :
   );
   return (retv);
}

I'm a n00b so I dunno if this works or not...I hope it does!!!
if not you can ask skylark or fanjita for help. I think they will respond.

chp · Post by **chp** » Mon Apr 24, 2006 6:05 pm

Actually, since you're trying to use the VFPU, I wonder if you can even use the general registers for this... Something like this should work: (note that VFPU is supported in the current binutils, so no need for codegen)

Code: Select all

float mysqrt(float val)
{
        float ret;

        __asm__ volatile (
                "mtv %1, S000\n"
                "vsqrt.s S001, S000\n"
                "mfv %0, S001\n"
        : "=r"(ret) : "r"(val) );

        return ret;
}

This should work (I looked at the output and it looked ok), but I don't have access to hardware to test it with right now.

starman2049 · Post by **starman2049** » Thu Apr 27, 2006 8:22 am

Can't get anywhere with this. I'm on the September '05 toolchain and SDK, so the VFPU stuff is not built in and I had to stick with the codegen.h approach. Unfortunately there is no mtv/mfv, and I spent some time trying to add those opcodes to little avail.

Then I decided to update my toolchain and grabbed the latest "famous" version from oopo.net. I wrestled with that for a bit, only to find that the VFPU stuff seems to be only in the SVN version so far.

I've never been able to checkout stuff from svn.ps2dev.net - it keeps saying that the hostname is invalid. I tried punching TCP/3690 open on my router but can't get to it.

I tried the beta toolchain from oopo.net, but it failed in the install so I went back to the latest "famous" version.

Is there a userid/pw that is needed for svn.pspdev.org?

Oobles · Post by **Oobles** » Thu Apr 27, 2006 9:13 am

No userid or password is required for svn.ps2dev.org. Please make sure you have the correct URL. In your last message you refer to it as svn.ps2dev.net and svn.pspdev.org, which are both wrong.

If you have a restrictive firewall you might not be able to access the server. Let me know if you have any more problems. I believe someone did create a mirror of subversion that can be accessed through HTTP. You should be able to find it on forums via search.

David. aka Oobles.

dot_blank · Post by **dot_blank** » Thu Apr 27, 2006 11:34 am

I tried it, and the following works:

user@host$ svn co svn://ps2dev.org/psp/trunk/pspsdk

then future updating is:

user@host$ cd pspsdk
user@host$ svn up

starman2049 · Post by **starman2049** » Thu Apr 27, 2006 7:12 pm

Well I still can't get into svn, but I was able to get my sqrt(), sin(), cos(), etc over to VFPU and got a very nice speed-up so thank you to everyone who helped with this!!

starman2049 · Post by **starman2049** » Fri Apr 28, 2006 11:58 am

These are probably done in newlib or elsewhere already, but if you have the latest toolchain you should be able to implement your own math functions that are MUCH faster. Note that you have to OR PSP_THREAD_ATTR_VFPU into your PSP_MAIN_THREAD_ATTR() definition in main.c (that is, add "| PSP_THREAD_ATTR_VFPU" to its argument).

Code: Select all

float mysqrtf(float val)
{
	float ret;

	__asm__ volatile (
		"mtv %1, S000\n"
		"vsqrt.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
}

Code: Select all

float mysinf(float val)
	{
	float ret;

	val *= 0.6366197f; // radians * 2/pi: convert to deg/90 (quarter turns)

	__asm__ volatile (
		"mtv %1, S000\n"
		"vsin.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

Code: Select all

float mycosf(float val)
	{
	float ret;

	val *= 0.6366197f; // radians * 2/pi: convert to deg/90 (quarter turns)

	__asm__ volatile (
		"mtv %1, S000\n"
		"vcos.s S001, S000\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

dot_blank · Post by **dot_blank** » Fri Apr 28, 2006 3:02 pm

well these seem to work quite well :)
i think the toolchain will do good with maybe a
vfpu math library ....something in the sense of

Code: Select all

#include <vmath.h>

and then one would simply use them as such

Code: Select all

vcos(2*c_wave);
vsin(value);
//etc...

what do the mods think?

chp · Post by **chp** » Fri Apr 28, 2006 7:21 pm

Just as a little sidenote, the sin/cos can be rewritten to

Code: Select all

float mysinf(float val)
   {
   float ret;

   __asm__ volatile (
      "mtv %1, S000\n"
      "vcst.s S002, VFPU_2_PI\n"
      "vmul.s S001, S000, S002\n"
      "vsin.s S000, S001\n" // or vcos.s
      "mfv %0, S000\n"
      : "=r"(ret) : "r"(val));

   return ret;
   }

And if you use something like vrot.q instead of vsin, you can get sine and cosine computed at the same time. Something like this should do it:

Code: Select all

void vsincosf(float angle, ScePspFVector4* result)
{
 __asm__ volatile (
    "mtv %1, S000\n"
    "vcst.s S001, VFPU_2_PI\n"
    "vmul.s S002, S000, S001\n"
    "vrot.q C010, S002, [s, c, 0, 0]\n"
    "usv.q C010, 0 + %0\n"
    : "+m"(*result) : "r"(angle));
}

starman2049 · Post by **starman2049** » Sat Apr 29, 2006 3:29 am

Code: Select all

float myacosf(float val)
	{
	float ret;

	__asm__ volatile (
		"mtv %1, S000\n"
		"vasin.s S001, S000\n"
		"vone.s S002\n"
		"vsub.s S000, S002, S001\n"
		"vcst.s S002, VFPU_PI_2\n"
		"vmul.s S001, S000, S002\n"
		"mfv %0, S001\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

float myasinf(float val)
	{
	float ret;

	__asm__ volatile (
		"mtv %1, S000\n"
		"vasin.s S001, S000\n"
		"vcst.s S002, VFPU_PI_2\n"
		"vmul.s S000, S001, S002\n"
		"mfv %0, S000\n"
		: "=r"(ret) : "r"(val));

	return ret;
	}

starman2049 · Post by **starman2049** » Thu May 04, 2006 9:03 am

Here is an applyMatrix routine. This is about 4 times faster than in software.

Code: Select all

// NOTE: v0, m0, v1 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
void myApplyMatrix(FVECTOR v0, FMATRIX m0, FVECTOR v1)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"lv.q	R100, 0x0(%2)\n"

	"vdot.q	S200, R000, R100\n"
	"vdot.q	S210, R001, R100\n"
	"vdot.q	S220, R002, R100\n"
	"vdot.q	S230, R003, R100\n"
	"sv.q	R200, 0x0(%0)\n"
	: : "r" (v0) , "r" (m0) ,"r" (v1)
	: "memory" ); // v0 is written through a pointer, so tell GCC memory changes
	}
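
To validate the VFPU routine above, a scalar reference is handy. The sketch below assumes the matrix is 16 floats in row-major order and that each output element is the dot product of one matrix row with the input vector, which is how the four vdot.q lines read; ref_apply_matrix is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = dot(row i of m, v), with m stored row-major as 16 floats. */
static void ref_apply_matrix(float out[4], const float m[16], const float v[4])
{
    float tmp[4];  /* temporary so out may alias v */

    assert(out != NULL && m != NULL && v != NULL);
    for (int i = 0; i < 4; i++) {
        tmp[i] = 0.0f;
        for (int j = 0; j < 4; j++)
            tmp[i] += m[4 * i + j] * v[j];
    }
    for (int i = 0; i < 4; i++)
        out[i] = tmp[i];
}
```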

starman2049 · Post by **starman2049** » Thu May 04, 2006 9:44 am

Here is a Matrix Multiply routine. This is about 10 times faster than in software:

Code: Select all

// NOTE: m0, m1, m2 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
void myMulMatrix(FMATRIX m2, FMATRIX m0, FMATRIX m1)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"lv.q	R100, 0x0(%2)\n"
	"lv.q	R101, 0x10(%2)\n"
	"lv.q	R102, 0x20(%2)\n"
	"lv.q	R103, 0x30(%2)\n"

	"vmmul.q	M200, M000, M100\n"

	"sv.q	R200, 0x0(%0)\n"
	"sv.q	R201, 0x10(%0)\n"
	"sv.q	R202, 0x20(%0)\n"
	"sv.q	R203, 0x30(%0)\n"
	: : "r" (m2) , "r" (m0) ,"r" (m1)
	: "memory" ); // m2 is written through a pointer, so tell GCC memory changes
	}
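
A scalar 4x4 multiply is useful for checking which operand order the VFPU routine actually produces. The sketch below computes out = a * b for row-major matrices; whether myMulMatrix() matches m0 * m1 or the opposite order is not something this thread confirms, so treat the comparison as a test to run on hardware. ref_mul_matrix is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* out = a * b for row-major 4x4 matrices stored as 16 floats each. */
static void ref_mul_matrix(float out[16], const float a[16], const float b[16])
{
    float tmp[16];  /* temporary so out may alias a or b */

    assert(out != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[4 * i + k] * b[4 * k + j];
            tmp[4 * i + j] = sum;
        }
    }
    for (int n = 0; n < 16; n++)
        out[n] = tmp[n];
}
```

Feeding both routines a pair of matrices that do not commute makes any transposition or swapped order show up immediately.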

starman2049 · Post by **starman2049** » Thu May 04, 2006 10:39 am

Here is a matrix copy routine. This is about 6 times faster than in software:

Code: Select all

// NOTE: m0, m1 must be 16 byte aligned!!
// NOTE: this is row-major matrix format
void myCopyMatrix(FMATRIX m1, FMATRIX m0)
	{
	__asm__ volatile (
	"lv.q	R000, 0x0(%1)\n"
	"lv.q	R001, 0x10(%1)\n"
	"lv.q	R002, 0x20(%1)\n"
	"lv.q	R003, 0x30(%1)\n"

	"sv.q	R000, 0x0(%0)\n"
	"sv.q	R001, 0x10(%0)\n"
	"sv.q	R002, 0x20(%0)\n"
	"sv.q	R003, 0x30(%0)\n"
	: : "r" (m1) , "r" (m0)
	: "memory" ); // m1 is written through a pointer, so tell GCC memory changes
	}
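
Since all of these routines require 16-byte alignment, one way to get it is to bake the alignment into the types and assert it in debug builds. This is only a sketch: the FVECTOR and FMATRIX definitions are not shown in this thread, so AlignedVec4, AlignedMat4, is_aligned16 and copy_matrix_checked are hypothetical, and the attribute syntax is the GCC one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FVECTOR/FMATRIX, forced to 16-byte alignment. */
typedef struct { float v[4]; } __attribute__((aligned(16))) AlignedVec4;
typedef struct { float m[4][4]; } __attribute__((aligned(16))) AlignedMat4;

/* Returns non-zero when p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Checks the alignment precondition, then copies in plain C;
   a VFPU copy could take the place of the assignment. */
static void copy_matrix_checked(AlignedMat4 *dst, const AlignedMat4 *src)
{
    assert(is_aligned16(dst) && is_aligned16(src));
    *dst = *src;
}
```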

Psilocybeing · Post by **Psilocybeing** » Thu May 04, 2006 11:27 am

Very nice, sin/cosine functions are used heavily in one of my projects, these will come in very handy. Thanks :)

chp · Post by **chp** » Thu May 04, 2006 9:47 pm

starman2049 wrote: Here is an applyMatrix routine. This is about 4 times faster than in software.
<snip>

You should take a look at vtfm3/4 instead of using vdot, it should execute even faster. Example:

Code: Select all

void myApplyMatrix(FVECTOR v0, FMATRIX m0, FVECTOR v1)
   {
   __asm__ volatile (
   "lv.q   R000, 0x0(%1)\n"
   "lv.q   R001, 0x10(%1)\n"
   "lv.q   R002, 0x20(%1)\n"
   "lv.q   R003, 0x30(%1)\n"

   "lv.q   R100, 0x0(%2)\n"

   "vtfm4.q R200, E000, R100\n"
   "sv.q   R200, 0x0(%0)\n"
   : : "r" (v0) , "r" (m0) ,"r" (v1)
   : "memory" ); // v0 is written through a pointer, so tell GCC memory changes
   }

I do however think most of the time is spent loading the matrix, so you should perhaps change your code to multiply more than one vertex per call (like using it for an array of vertices). The overhead you then would get when you need to multiply just one is minimal compared to the opposite situation. And don't forget that you can let the GE do all this job if you just intend to render it. :)
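
The batching idea could look roughly like this in plain C. It is a sketch of the interface shape only, not VFPU code: Vec4Batch and ref_apply_matrix_batch are hypothetical names, and the matrix is assumed to be 16 row-major floats. In a VFPU version the four lv.q matrix loads would move in front of the loop so they happen once per call instead of once per vertex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-float vector type; the thread's FVECTOR is not shown. */
typedef struct { float v[4]; } Vec4Batch;

/* Transforms count vectors by one row-major 4x4 matrix (16 floats). */
static void ref_apply_matrix_batch(Vec4Batch *out, const float m[16],
                                   const Vec4Batch *in, size_t count)
{
    assert(out != NULL && m != NULL && in != NULL);
    for (size_t n = 0; n < count; n++) {
        Vec4Batch tmp;  /* temporary so out may alias in */
        for (int i = 0; i < 4; i++) {
            tmp.v[i] = 0.0f;
            for (int j = 0; j < 4; j++)
                tmp.v[i] += m[4 * i + j] * in[n].v[j];
        }
        out[n] = tmp;
    }
}
```

Calling it with count set to 1 covers the single-vertex case, so only one entry point is needed.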

starman2049 · Post by **starman2049** » Thu May 18, 2006 2:05 pm

Here's a much faster atan2 routine. This isn't in VFPU format (yet), but I thought I would post it for now and update later. This is about 10 times faster than the default atan2 routine. PLEASE NOTE: this is a low-order approximation and should only be used when you need precision to a few digits. I use this 50 to 100 times per frame in some levels, so it was a big boost for me (8 fps!)

Since the VFPU has an asin() in silicon and there is a known identity between atan and asin this could be done other ways, and could be done to higher order.

I'm personally hoping chp has a fancy matrix approach for this one :)

Code: Select all

#include <math.h>

float myatan2f(float y, float x)
	{
	float angle;
	float coeff_1 = 3.141592654f/4.0f;
	float coeff_2 = 3.0f*coeff_1;
	float abs_y = fabsf(y) + 0.00000001f;      // kludge to prevent 0/0 condition
	float r;

	if (x >= 0.0f)
		{
		r = (x - abs_y) / (x + abs_y);
		angle = coeff_1 - coeff_1 * r;
		}
	else
		{
		r = (x + abs_y) / (abs_y - x);
		angle = coeff_2 - coeff_1 * r;
		}

	if (y < 0.0f)
		return(-angle);     // negate if in quad III or IV
	else
		return(angle);
	}