Conditionals with VFPU?

Discuss the development of new homebrew software, tools and libraries.

Moderators: cheriff, TyRaNiD

onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Conditionals with VFPU?

Post by onne »

Hi...
I've been doing basic VFPU stuff for my renderer... But now I'd need some additional complexity, ie. conditions...

Here's my visibility check function, but I'm not sure if the condition check for the division by zero (the separated part where I break the assembly and insert C++ in between...) could be done more efficiently or more nicely :)

...of course other optimization ideas are welcome as well.

Code: Select all

	...
	float length(0.0);

	__asm__ volatile (
		//Load Triangles
		"ulv.q   R100, 0x0(%1)\n" //vert1
		"ulv.q   R101, 0x0(%2)\n" //vert2
		"ulv.q   R102, 0x0(%3)\n" //vert3
		"ulv.q   R103, 0x0(%4)\n" //iViewVector

		//make vectors from vertices
		"vsub.t R100, R100, R101\n"
		"vsub.t R101, R101, R102\n"

		//cross to get normal
		"vcrsp.t R200, R100, R101\n"

		//length...:
		"vmul.t R100, R200, R200\n"
		"vadd.s S100, S100, S110\n"
		"vadd.s S100, S100, S120\n"		
//---------------------------------------------------
//Conditional start
//---------------------------------------------------
		"mfv %0, S100\n"

		: "=r" (length) : "r" (aVertex1), "r" (aVertex2), "r" (aVertex3), "r" (&iViewVector)
		);

		//CONDITION to prevent division by zero
		if(0.0 != length)
			{
//---------------------------------------------------
//Conditional end
//---------------------------------------------------
			__asm__ volatile (
				// 1/sqrt
				"vrsq.s S100, S100\n"

				//normalize
				"vscl.t R200, R200, S100\n"

				//Determine the angle between
				"vdot.t S200, R200, R103\n"

				//Return
				"usv.s   S200, 0x0(%0)\n"

				: :  "r" (aReturnValue)
				);
			}
		else
			{
			*aReturnValue = 0.0;
			}
		...
siberianstar
Posts: 70
Joined: Thu Jun 22, 2006 9:24 pm

Post by siberianstar »

That's going to be slower than not using vfpu. It's better you use vfpu there only to calc rsq.
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

Ok... hmm.

So are you saying there are no comparison operators with which I could check if the content of S100 (in this case) is 0, and branch accordingly?

Yep, if that is the case, then it must be better off doing the part before the "Condition start" without VFPU.

thanks.
swetland
Posts: 5
Joined: Sun Dec 31, 2006 3:06 am
Location: Mountain View, CA

Post by swetland »

MIPS has reference manuals for their architectures available online:
http://www.mips.com/products/resource_l ... ecture.php

You do have to fill out a little form to get a login before you can download them, but it's pretty painless.
These two documents will probably answer most of your MIPS assembly questions:

MIPS32® Architecture for Programmers Volume I: Introduction to the MIPS32® Architecture (.pdf)
MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set (.pdf)

Brian
Tinnus
Posts: 67
Joined: Sat Jul 29, 2006 1:12 am

Post by Tinnus »

Yeah, since you depend solely on a S*** register you can just mfv it to a GPR and do the branching there with regular MIPS assembly. mfv takes 1 cycle IIRC and shouldn't be a speed problem...

edit: although gcc is probably already doing that for you. Check the generated .s file to see what the compiler produced...
Let's see what the PSP reserves... well, I'd say anything is better than Palm OS.
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

Thanks, everyone... yep, I dug up the R4400 uman book... makes it easier. heh.

Although, as you said Tinnus, gcc is already doing a good job on this simple case.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

onne wrote:Ok... hmm.

So are you saying there are no comparison operators with which I could check if the content of S100 (in this case) is 0, and branch accordingly?

Yep, if that is the case, then it must be better off doing the part before the "Condition start" without VFPU.

thanks.
There ARE VFPU comparison operators. It is just that we (Raphael and I) haven't really tried them to discover which cc flag is used for each comparison operator.

comparisons are :

Code: Select all


vcmp.q/t/p/s cn, vt, vs // where cn == EQ/NE/LE/LT/GE/GT
{
  CC[cc] = BOPcn(vt, vs); // we guess it is something like that
}

vcmp.q/t/p/s cn, vs // where cn == EZ/EN/EI/ES/NZ/NN/NI/NS
{
  CC[cc] = UOPcn(vs); // we guess it is something like that
}

vcmp.q/t/p/s cn // where cn == TR/FL
{
  CC[cc] = cn == TR; // we guess it is something like that
}


bvf cc, label <=> exec next insn; if (CC[cc] == 0) goto label;
bvt cc, label <=> exec next insn; if (CC[cc] != 0) goto label;
bvfl cc, label <=> if (CC[cc] == 0) { exec next insn; goto label; }
bvtl cc, label <=> if (CC[cc] != 0) { exec next insn; goto label; }

example :

vcmp.s LT, s000, s001 -> CC[0] = s000 < s001
bvt 0, 0f -> if (CC[0] == 0) s000 += s001;
vnop
vadd.s s000, s001, s000
0:
...

b(ranch)v(ector)f(alse)
b(ranch)v(ector)t(rue)
b(ranch)v(ector)f(alse)l(ikely)
b(ranch)v(ector)t(rue)l(ikely)
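For anyone tripping over the delay slot, here is a small host-side model of the four branches exactly as they are described in the listing above. It is only a sketch of those semantics and does not run on the VFPU; the names `vbranch`, `outcome` and `model_branch` are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of bvf/bvt/bvfl/bvtl as described in this post. */
typedef enum { BVF, BVT, BVFL, BVTL } vbranch;

typedef struct {
    bool taken;               /* does control go to the label? */
    bool delay_slot_executed; /* does the instruction after the branch run? */
} outcome;

static outcome model_branch(vbranch op, bool cc_bit)
{
    bool want_true = (op == BVT || op == BVTL);  /* branch on CC[cc] != 0 */
    bool likely    = (op == BVFL || op == BVTL); /* "likely" variants */
    outcome o;

    o.taken = (cc_bit == want_true);
    /* Plain branches always run the delay slot; "likely" branches
       run it only when the branch is taken. */
    o.delay_slot_executed = likely ? o.taken : true;
    return o;
}
```

So with `bvt` the instruction after the branch runs on both paths, while with `bvtl` it runs only on the taken path.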
I must admit that the link between cc and cn (condition name) is unclear to me.

LT, LE seem to use cc = 0
EQ seems to use cc = 5
I'll let you test them :

Code: Select all

vzero.s s003
vcmp.s EQ, s000, s003
bvt 5, 0f
vnop
... // your computation when s000 != 0

0:
... // your computation when s000 == 0
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

Hlide, thanks for your information... yep, it is logical to have some comparison operators for vfpu as well.

Oh, and the piece of code you suggested works nicely... thanks again. I guess one can get pretty far with the mentioned EQ, LT and LE anyway. If I have time I can try to examine other cc's...
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

ok, i think i found out how to use vfpu cc :

we have a VFPU control register ($131, VFPU_CC) which contains the last comparison flags which seems to work this way :

VFPU_CC:
bit 0 : comparison on component X is true (works with vcmp.q/t/p/s)
bit 1 : comparison on component Y is true (only works with vcmp.q/t/p)
bit 2 : comparison on component Z is true (only works with vcmp.q/t)
bit 3 : comparison on component W is true (only works with vcmp.q)
bit 4 : bit 0 == 1 or bit 1 == 1 or bit 2 == 1 or bit 3 == 1
bit 5 : bit 0 == 1 and bit 1 == 1 and bit 2 == 1 and bit 3 == 1
bit 6-31 : 0
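A host-side sketch of that bit layout, for checking which cc index to branch on. It assumes the "any" and "all" bits are computed over only the components that were actually compared (1 for .s up to 4 for .q), which is a guess and not taken from official documentation; `model_vfpu_cc` is a hypothetical name.

```c
#include <stdint.h>

/* cmp[i] != 0 means the comparison was true for component i (X, Y, Z, W).
   n is the number of compared components: 1 (.s), 2 (.p), 3 (.t), 4 (.q). */
static uint32_t model_vfpu_cc(const int cmp[4], int n)
{
    uint32_t cc = 0;
    int any = 0;
    int all = 1;

    for (int i = 0; i < n; i++) {
        if (cmp[i]) {
            cc |= 1u << i; /* bits 0..3: per-component result */
            any = 1;
        } else {
            all = 0;
        }
    }
    if (any) cc |= 1u << 4; /* bit 4: at least one component true */
    if (all) cc |= 1u << 5; /* bit 5: every compared component true */
    return cc;
}
```

For example, a .q compare that is true only on X gives 0x11 in this model (bits 0 and 4), so `bvt 4` and `bvt 0` would branch but `bvt 5` would not.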

so my guess (i didn't try it but i'm quite sure about it):

if you want to check whether all components of r000 and r001 are equal, use cc = 5 :

vcmp.q EQ, r000, r001
...
bvf 5, skip
...

if you want to check whether at least one component of r000 and r001 is equal, use cc = 4 :

vcmp.q EQ, r000, r001
...
bvf 4, skip
...


if you want to check whether component X of r000 and r001 is equal regardless of the other components, use cc = 0 :

vcmp.q EQ, r000, r001
...
bvf 0, skip
...

if you want to check whether component Y of r000 and r001 is equal regardless of the other components, use cc = 1 :

vcmp.q EQ, r000, r001
...
bvf 1, skip
...

etc.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

onne wrote:Hlide, thanks for your information... yep, it is logical to have some comparison operators for vfpu as well.

Oh, and the piece of code you suggested works nicely... thanks again. I guess one can get pretty far with the mentioned EQ, LT and LE anyway. If I have time I can try to examine other cc's...
thx

i think what you need is probably the following one indeed :

Code: Select all

__asm__ volatile (
...
"vcmp.s EZ, s100\n" // CC[0] = (s100 == 0.0)
"bvtl 0, 0f\n" // branch taken when s100 == 0.0
"sw $0, %5\n" // delay slot, runs only when taken: *aReturnValue = 0.0
... // your computation when s100 != 0.0
"usv.s s200, %5\n"
"0:\n"
... "m"(*aReturnValue));
or

Code: Select all

__asm__ volatile (
...
"vcmp.s LT, s100[|x|], s003\n" // CC[0] = (|s100| < epsilon)
"bvtl 0, 0f\n" // branch taken when |s100| < epsilon
"sw $0, %5\n" // delay slot, runs only when taken: *aReturnValue = 0.0
... // your computation when |s100| >= epsilon
"usv.s s200, %5\n"
"0:\n"
... "m"(*aReturnValue));
where s003 must contain a float representing epsilon.
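A plain C sketch of the same epsilon test, handy as a host-side reference when checking the VFPU result. It takes the absolute value by clearing the sign bit, which is what the `[|x|]` operand prefix amounts to; `abs_by_bits` and `too_small` are hypothetical helper names, and the epsilon value is up to you.

```c
#include <stdint.h>
#include <string.h>

/* |x| computed by clearing bit 31 of the raw float pattern. */
static float abs_by_bits(float x)
{
    uint32_t u;

    memcpy(&u, &x, sizeof u);
    u &= 0x7FFFFFFFu;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Nonzero when x is too close to zero to be used as a divisor. */
static int too_small(float x, float epsilon)
{
    return abs_by_bits(x) < epsilon;
}
```

If `too_small(length_squared, epsilon)` is nonzero, the caller would store 0.0 and skip the normalization, just like the branch above.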
Last edited by hlide on Fri Jan 05, 2007 11:40 pm, edited 2 times in total.
siberianstar
Posts: 70
Joined: Thu Jun 22, 2006 9:24 pm

Post by siberianstar »

this is very useful
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

Wow, nice findings! Yep, it only seemed to work since I had set the other values to zero as well... :) And, yes I did some calc before the conditional jump as well...

Now, today I realized (after debugging the VFPU register values in different phases) that instead of having values going to _zero_ I seem to get values going to _infinity_... d'oh!

It could be though that I have a failure in my dataset... have to check that one too.



EDIT:
ok... I had some problems with my data, but now it seems to work better. But I keep wondering what all the possible condition names mean. I tried to test them myself, but some of them seem weird. Any ideas for the ones with question marks?
  • FL - false
    TR - true
    EQ - equals
    NE - not equal
    LT - less than
    LE - less or equal
    GE - greater or equal
    GT - greater than
    EZ - vs == 0
    NZ - vs != 0
    EN - ??? false ???
    NN - ??? true ???
    EI - vs == Inf
    NI - vs != Inf
    ES - same as EI (or...???)
    NS - same as NI (or...???)
EDIT: See hlide's better explanation about the conditions below...
Last edited by onne on Sat Jan 06, 2007 12:22 am, edited 1 time in total.
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

And here is the working piece of code...
(although I couldn't get the bvtl to work as I wanted... ie. I would have liked to replace the "vnop" with "vzero.s S530")

Code: Select all

	__asm__ volatile (
		//Load Triangles
		"ulv.q   R400, 0x0(%1)\n" //vert1
		"ulv.q   R401, 0x0(%2)\n" //vert2
		"ulv.q   R402, 0x0(%3)\n" //vert3
		"ulv.q   R403, 0x0(%4)\n" //iViewVector

		//make vectors from vertices
		"vsub.t R500, R400, R401\n"
		"vsub.t R501, R401, R402\n"

		//cross to get normal
		"vcrsp.t R500, R500, R501\n"

		//calculate 1/length
		"vmul.t R400, R500, R500\n"
		"vadd.s S400, S400, S410\n"
		"vadd.s S400, S400, S420\n"
		"vrsq.s S400, S400\n"

		//Is valid result? (ie. not NaN or Inf)
		"vzero.s S530\n"
		"vcmp.s EI, S400\n"
		"bvt 0, 0f\n"
		"vnop\n"

		//normalize with 1/length
		"vscl.t R500, R500, S400\n"

		//Determine the angle between
		"vdot.t S530, R500, R403\n"

		//Return
		"0:\n"
		"usv.s   S530, 0x00(%0)\n"

		: : "r" (aReturnValue), "r" (aVertex1), "r" (aVertex2), "r" (aVertex3), "r" (&iViewVector)
		);
Last edited by onne on Thu Jan 04, 2007 9:33 pm, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

bvtl doesn't work !?

you can at least replace vnop with vzero.s S530

and use bvt if bvtl doesn't work (computation is not what you expect ?)

Code: Select all

	__asm__ volatile (
		//Load Triangles
		"ulv.q   R400, 0x0(%1)\n" //vert1
		"ulv.q   R401, 0x0(%2)\n" //vert2
		"ulv.q   R402, 0x0(%3)\n" //vert3
		"ulv.q   R403, 0x0(%4)\n" //iViewVector

		//make vectors from vertices
		"vsub.t R500, R400, R401\n"
		"vsub.t R501, R401, R402\n"

		//cross to get normal
		"vcrsp.t R500, R500, R501\n"

		//calculate 1/length
		"vmul.t R400, R500, R500\n"
		"vadd.s S400, S400, S410\n"
		"vadd.s S400, S400, S420\n"
		"vrsq.s S400, S400\n"

		//Is valid result? (ie. not NaN or Inf)
		"vcmp.s EI, S400\n"
		"bvt 0, 0f\n"
		"vzero.s S530\n"

		//normalize with 1/length
		"vscl.t R500, R500, S400\n"

		//Determine the angle between
		"vdot.t S530, R500, R403\n"

		//Return
		"0:\n"
		"usv.s   S530, 0x00(%0)\n"

		: : "r" (aReturnValue), "r" (aVertex1), "r" (aVertex2), "r" (aVertex3), "r" (&iViewVector)
		);
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

Well, in my case it seems that there is no instruction executed after either branch...

This is how I tested:

1. I give as input three identical vectors -> S400 will get a NaN value

Code: Select all

v1: [-0.184323, 0.000000, 0.730595]
v2: [-0.184323, 0.000000, 0.730595]
v3: [-0.184323, 0.000000, 0.730595]
2. I use this code, where the expected result of the output (S530) is 1.0 if the instruction after the branch gets executed... With both 'bvt' and 'bvtl', I still get 0.0 as the result. (Also, if the comparison is not done at all, it gives a VFPU exception when scaling with the NaN value.)

Code: Select all

		//Is valid result? (ie. not NaN or Inf)
		"vzero.s S530\n"

		"vcmp.s EI, S400\n"
		"bvt 0, 0f\n"
		"vone.s S530\n"

...at least I cannot see anything wrong there, and in fact removing the "vnop" doesn't change my functionality either... It is quite weird though.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Code: Select all

vfpu_inf:
.long   0x7f800000
vfpu_nan:
.long   0x7fffffff

.global me_test_vfpu
me_test_vfpu:
        lui             at, %hi(vfpu_inf)
        lv.s            S000, %lo(vfpu_inf)(at)
        lv.s            S001, %lo(vfpu_nan)(at)

        vcmp.s          EI, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x00(a0)

        vcmp.s          NI, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x01(a0)

        vcmp.s          EI, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x02(a0)

        vcmp.s          NI, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x03(a0)

        vcmp.s          EN, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x04(a0)

        vcmp.s          NN, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x05(a0)

        vcmp.s          EN, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x06(a0)

        vcmp.s          NN, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x07(a0)

        vcmp.s          ES, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x08(a0)

        vcmp.s          NS, S000
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x09(a0)

        vcmp.s          ES, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x0a(a0)

        vcmp.s          NS, S001
        mfvc            v0, $131
        vnop
        andi            v0, v0, 1
        sb              v0, 0x0b(a0)

        jr              ra
        nop
results :

Code: Select all

                Inf Inf NaN NaN
EI NI EI NI ->  01  01  00  00
EN NN EN NN ->  00  01  01  01
ES NS ES NS ->  00  01  00  01
if someone can understand the meaning of EI, NI, EN, NN, ES and NS...

Note : wakeup Raphael !!!

EDIT:

ES and NS may be for plus/minus sign (bit S)
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

That test result seems weird... I am at least capable of testing if a value is NaN or not with EI and NI.

ie. the following code returns me 1, and if I comment out the vrsq, I will get 0, and the opposite happens with NI

Code: Select all

void CondTest( float* aTestResult )
	{
	__asm__ volatile (
		"vzero.s S400\n"
		"vrsq.s S400, S400\n" // 1/sqrt(0) = NaN

		"vone.s S500\n"
		"vcmp.s EI, S400\n"
		"vnop\n"
		"bvtl 0, 0f\n"
		"vnop\n"
		"vzero.s S500\n"

		"0:\n"
		"usv.s   S500, 0x0(%0)\n"
		: : "r" (aTestResult)
		);
	}
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

by the way, shouldn't 1/sqrt(0) be +inf and not NaN ?

sqrt(0) = 0, no ? so 1/sqrt(0) -> 1/0 -> +inf
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

That's what I thought as well, but at least when I do printf("result: %f", conditional );, I get NaN... And also I do get Inf's as well, when such occurs. :) Seems like some of the basic stuff doesn't hold with vfpu maths. (for example as I said above about the conditions, NaN==NaN... even though it shouldn't )
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

IEEE Standard 754 states :
Storage Layout

IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored.

Code: Select all

+--------+-----------+------------+------+
|  Sign  | Exponent  |  Fraction  | Bias |
+--------+-----------+------------+------+
| 1 [31] | 8 [30-23] | 23 [22-00] | 127  |
+--------+-----------+------------+------+
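A small sketch of pulling those three fields out of a raw 32-bit pattern, such as the hexadecimal values dumped from VFPU registers in this thread. `f32_fields` and `split_f32` are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t sign;     /* bit 31 */
    uint32_t exponent; /* bits 30-23, biased by 127 */
    uint32_t fraction; /* bits 22-0 */
} f32_fields;

static f32_fields split_f32(uint32_t bits)
{
    f32_fields f;

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 0x7F800000 this gives sign 0, exponent 0xFF and fraction 0.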
Infinity

The values +infinity and -infinity are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.

Not A Number

The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN's are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).

A QNaN is a NaN with the most significant fraction bit set. QNaN's propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined.

An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN's can be handy to assign to uninitialized variables to trap premature usage.

Semantically, QNaN's denote indeterminate operations, while SNaN's denote invalid operations.
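The quiet/signalling distinction as quoted above can be sketched on raw bit patterns like this. The code follows the convention in the quoted text (most significant fraction bit set means quiet); whether the VFPU follows it is a separate question. `nan_kind` and `classify_nan` are hypothetical names.

```c
#include <stdint.h>

enum nan_kind { NOT_NAN, SIGNALLING_NAN, QUIET_NAN };

static enum nan_kind classify_nan(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    /* A NaN needs an all-ones exponent and a non-zero fraction;
       an all-ones exponent with a zero fraction is an infinity. */
    if (exponent != 0xFFu || fraction == 0)
        return NOT_NAN;

    /* Bit 22 is the most significant fraction bit. */
    return (fraction & 0x400000u) ? QUIET_NAN : SIGNALLING_NAN;
}
```

Under this convention 0x7FFFFFFF classifies as quiet and 0x7F800001 as signalling.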

Special Operations

Operations on special numbers are well-defined by IEEE. In the simplest case, any operation with a NaN yields a NaN result. Other operations are as follows:

Code: Select all

+---------------------+---------+
|Operation            | Result  |
+---------------------+---------+
|        n ÷ ±Infinity|    0    |
|±Infinity × ±Infinity|±Infinity|
| ±nonzero ÷ 0        |±Infinity|
| Infinity + Infinity | Infinity| 
|       ±0 ÷ ±0       |   NaN   |
| Infinity - Infinity |   NaN   |
|±Infinity ÷ ±Infinity|   NaN   |
|±Infinity × 0        |   NaN   |
+---------------------+---------+
Summary

To sum up, the following are the corresponding values for a given representation:

Code: Select all

                       Float Values (b = bias)
+------+-----------------+---------------+----------------------------+
| Sign |    Exponent (e) |  Fraction (f) |           Value            |
+------+-----------------+---------------+----------------------------+
|  0   |      00..00     |    00..00     |            +0              |
+------+-----------------+---------------+----------------------------+
|  0   |      00..00     |    00..01     | Positive Denormalized Real |
|      |                 |      :        |                            |
|      |                 |    11..11     |       0.f × 2^(-b+1)       |
+------+-----------------+---------------+----------------------------+
|  0   |      00..01     |               |  Positive Normalized Real  |
|      |        :        |    XX..XX     |                            |
|      |      11..10     |               |        1.f × 2^(e-b)       |
+------+-----------------+---------------+----------------------------+
|  0   |      11..11     |    00..00     |         +Infinity          |
+------+-----------------+---------------+----------------------------+
|  1   |      00..00     |    00..00     |            -0              |
+------+-----------------+---------------+----------------------------+
|  1   |      00..00     |    00..01     | Negative Denormalized Real |
|      |                 |      :        |                            |
|      |                 |    11..11     |      -0.f × 2^(-b+1)       |
+------+-----------------+---------------+----------------------------+
|  1   |      00..01     |               |  Negative Normalized Real  |
|      |        :        |    XX..XX     |                            |
|      |      11..10     |               |       -1.f × 2^(e-b)       |
+------+-----------------+---------------+----------------------------+
|  1   |      11..11     |    00..00     |         -Infinity          |
+------+-----------------+---------------+----------------------------+
|  X   |      11..11     |    00..01     |                            |
|      |                 |      :        |            SNaN            |
|      |                 |    01..11     |                            |
+------+-----------------+---------------+----------------------------+
|  X   |      11..11     |    10..00     |                            |
|      |                 |      :        |            QNaN            |
|      |                 |    11..11     |                            |
+------+-----------------+---------------+----------------------------+
by the way,

vzero.s S000; vrcp.s S000, S000 -> S000 = 1/0
vzero.s S001; vrsq.s S001, S001 -> S001 = 1/sqrt(0)

both give me 0x7F800000 which is +Infinity according to ieee754.

EDIT: my previous code seems to be wrong for an unknown reason (?). A new version using "bvtl" instead of "mfvc" seems to work better... we'll see.
Last edited by hlide on Fri Jan 05, 2007 9:07 pm, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

okay, this new code gives more reliable results :

Code: Select all

me_vfpu_qnan:
.long   0x7fffffff # QNaN
.global me_test_vfpu
me_test_vfpu:
        lui              at, %hi(me_vfpu_qnan)

        vzero.s         S000
        lv.s            S001, %lo(me_vfpu_qnan)(at)

        sv.s            S000, 0x00(a0)

        vrcp.s          S000,S000
        vrsq.s          S001,S001

        sv.s            S000, 0x10(a0)
        sv.s            S001, 0x14(a0)

        li              v0, 1

        vcmp.s          EI, S000
        bvtl            0, 0f
        sb              v0, 0x00(a0)

0:      vcmp.s          NI, S000
        bvtl            0, 0f
        sb              v0, 0x01(a0)

0:      vcmp.s          EI, S001
        bvtl            0, 0f
        sb              v0, 0x02(a0)

0:      vcmp.s          NI, S001
        bvtl            0, 0f
        sb              v0, 0x03(a0)

0:      vcmp.s          EN, S000
        bvtl            0, 0f
        sb              v0, 0x04(a0)

0:      vcmp.s          NN, S000
        bvtl            0, 0f
        sb              v0, 0x05(a0)

0:      vcmp.s          EN, S001
        bvtl            0, 0f
        sb              v0, 0x06(a0)

0:      vcmp.s          NN, S001
        bvtl            0, 0f
        sb              v0, 0x07(a0)

0:      vcmp.s          ES, S000
        bvtl            0, 0f
        sb              v0, 0x08(a0)

0:      vcmp.s          NS, S000
        bvtl            0, 0f
        sb              v0, 0x09(a0)

0:      vcmp.s          ES, S001
        bvtl            0, 0f
        sb              v0, 0x0a(a0)

0:      vcmp.s          NS, S001
        bvtl            0, 0f
        sb              v0, 0x0b(a0)

0:      jr              ra
        nop
the result is the following :

Code: Select all

00000000 01 00 00 01 -> S000 is an inf, S001 is not an inf
00000004 00 01 01 00 -> S000 is not a nan, S001 is a nan 
00000008 01 00 01 00 -> S000 is special, S001 is special

00000010 7F800000 -> S000 is +inf (result of 1/0)
00000014 7F800001 -> S001 was initially stored as 7FFFFFFF, which vrsq.s apparently transformed into 7F800001. Does the VFPU handle both QNaN and SNaN? If not, what is the default behaviour of a VFPU NaN (Quiet or Signalling)?
  
well I'm tremendously relieved now about the results.

I guess it is clear now about the meaning of Ex/Nx :

EZ : is zero ?
EI : is Inf ?
EN : is NaN ?
ES : is Special (Inf or NaN) ? (well i guess it must be something like this)

NZ : isn't zero ?
NI : isn't Inf ?
NN : isn't NaN ?
NS : isn't Special (Inf or NaN) ? (well i guess it must be something like this)
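The Ex conditions can be modelled on raw bit patterns roughly like this. It is a sketch built from the test results in this thread, with ES read as "exponent is all ones" (the guess above), and not a statement of what the hardware does internally. The `cond_*` names are made up; the Nx conditions are simply the negations.

```c
#include <stdint.h>

/* EZ: value is +0.0 or -0.0 (sign bit ignored). */
static int cond_ez(uint32_t b) { return (b & 0x7FFFFFFFu) == 0; }

/* EI: all-ones exponent with a zero fraction, either sign. */
static int cond_ei(uint32_t b) { return (b & 0x7FFFFFFFu) == 0x7F800000u; }

/* EN: all-ones exponent with a non-zero fraction. */
static int cond_en(uint32_t b) { return (b & 0x7FFFFFFFu) > 0x7F800000u; }

/* ES: all-ones exponent, that is Inf or NaN (assumed meaning). */
static int cond_es(uint32_t b) { return (b & 0x7F800000u) == 0x7F800000u; }
```

Fed with the two dumped values, 7F800000 gives EI and ES true with EN false, and 7F800001 gives EN and ES true with EI false, which matches the result table above.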
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

VFPU seems to treat SNaN as QNaN, since 0x7f800001 is normally an SNaN (that is, an exception occurs if we try to use it in an operation). Tested with :

Code: Select all

lui    at, %hi(vfpu_qnan)
lv.s   S001, %lo(vfpu_qnan)(at) # S001 = QNaN
vrsq.s S001, S001 # S001 = 1/sqrt(QNaN) = SNaN
vadd.s S001, S001, S001 # S001 = SNaN + SNaN = SNaN
No exception occurs, so SNaN are treated as QNaN.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Another test :

Code: Select all


S000 = +Inf;
S001 = NaN;

# +Inf = +Inf ?
vcmp.s EQ, S000, S000 -> true

# +Inf <> +Inf ?
vcmp.s NE, S000, S000 -> false

# NaN = NaN ?
vcmp.s EQ, S001, S001 -> false

# NaN <> NaN ?
vcmp.s NE, S001, S001 -> true

# +Inf = NaN ?
vcmp.s EQ, S000, S001 -> false

# +Inf <> NaN ?
vcmp.s NE, S000, S001 -> true

onne : you shouldn't rely on printf for the right value, but on the hexadecimal value of the float.
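A minimal way to get at the hexadecimal value of a float in C without going through printf's %f. `float_bits` and `float_hex` are hypothetical helper names, and `memcpy` is used so the reinterpretation stays well-defined.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Raw 32-bit pattern of a float. */
static uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}

/* Writes the pattern as 8 uppercase hex digits; out needs 9 bytes. */
static void float_hex(float f, char out[9])
{
    snprintf(out, 9, "%08lX", (unsigned long)float_bits(f));
}
```

Store the VFPU result to a float, pass it to `float_hex`, and compare the string against the patterns discussed in this thread, such as 7F800000 for +Inf.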
onne
Posts: 24
Joined: Tue Aug 29, 2006 11:54 pm

Post by onne »

ok... thanks... Then I must have been getting the Inf when I thought of getting NaN... Weird with the printf though.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

printf : %f fetches a double rather than a single float (the float argument is promoted to double), so you may expect a bad display.

and yes printf is wrong :

0x7ff0000000000000LL -> +Inf (double) -> displays NaN !?
0x7ff0000000000001LL -> NaN (double) -> displays 0.0...0 !?
0x3ff0000000000001LL -> 1.0 (double) -> displays 1.0...0 (correct)
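To check which double pattern printf should be handed for a given float, the promotion can be reproduced by hand. This is a sketch assuming IEEE 754 float and double on the machine where it runs; `float_from_bits` and `promoted_bits` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from a raw 32-bit pattern. */
static float float_from_bits(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof f);
    return f;
}

/* Raw 64-bit pattern of the double that a float becomes when it is
   passed through "..." (same conversion as the default promotion). */
static uint64_t promoted_bits(float f)
{
    double d = f;
    uint64_t u;

    memcpy(&u, &d, sizeof u);
    return u;
}
```

A float +Inf (7F800000) should promote to 7FF0000000000000, so if that pattern prints as NaN, the problem is on the printf side rather than in the VFPU result.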
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

I don't have a PSP handy, but that could be a newlib bug; you could check what fpclassify(), isnan(), isinf() return.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

jimparis wrote:I don't have a PSP handy, but that could be a newlib bug; you could check what fpclassify(), isnan(), isinf() return.
i got it :

Code: Select all

hli@HLIWORLD /d/game console/psp/meccode$ make kxploit
psp-gcc -I. -I/d/devkitPro/devkitPSP/psp/sdk/include -Os -G0 -Wall -fsingle-precision-constant -funswitch-loops fbranch-target-load-optimize2 -msingle-float -mhard-float  -L. -L/d/devkitPro/devkitPSP/psp/sdk/lib  main.o mestub.o  -lpspdebug -lpspdisplay -lpspge -lpspctrl -lpspsdk -lc -lpspnet -lpspnet_inet -lpspnet_apctl -lpspnet_resolver -lpsputility -lpspuser lpspkernel -o meccode.elf
main.o: In function `main':
main.c:(.text+0x708): undefined reference to `__fpclassifyd'
collect2: ld returned 1 exit status
make: *** [meccode.elf] Error 1
and yes i'm using devkitpro

EDIT: i dunno if pspdevkit uses newlib.

well, i guess i should try to install cygwin... some megabytes to add...