Questions about mips asm/inline asm and other stuff
Posted: Sat Oct 27, 2007 8:30 am
OK, I know this is not strictly PS2 development (it is more MIPS GCC specific), but right now I'm not sure whether these "problems" are more connected to the PS2 than I think.
Anyway, first question. Consider some simple code that fetches a word from a memory address which is not always aligned (in fact, it will mostly be unaligned).
The problem is that with no optimizations and with debugging info added (compiled with "ee-gcc -D_EE -G0 -g"), this code gives me TLB load exceptions at the lwr (bad address). Checking the assembly made the reason obvious. The code has been compiled into (simplified):
So, as I wrote, it's obvious why it leads to an exception: the lwl modifies the register, which is then used again as the base by the lwr, and that shouldn't happen unless specified. The funny part is that it doesn't happen when compiled with "-O2" or "-O3" (separate registers are used for the base and the destination). Am I doing something wrong? Is this normal behavior? The same happens with other instructions (lw, lh, lb, etc.).
I've come up with something like this to work around the problem when debugging.
But this one is less efficient: even with "-O3" it uses an additional daddu and always stores the result on the stack, then loads it again, whereas the previous version kept the value in a register across the whole function.
Well, I think I can have two versions of this simple code (one for debug, one for optimized compilation)... but it doesn't seem like a good idea.
Now, I have to say I'm not very good at MIPS assembly (I have some experience with the PSX), nor at GCC-specific stuff (I mostly program for Win32, using MSVS most of the time, and lately Visual C# more than anything else).
Second question: what is the cost of lwl/lwr and swl/swr? Sure, I can run my own tests, but maybe someone already has (I didn't find specific data). What I mean is, maybe it's just more efficient to use other methods. In this sample, the destination/source is not always word aligned but is always halfword aligned, so it's quite easy to load/store it another way, like:
The third question is:
Do you know of an efficient way to convert 16-bit (5:5:5:1) textures into 32-bit (8:8:8:8) and vice versa? I've been able to come up with something like this (for 16b->32b; for the other direction, just reverse the order and pack instead of ext):
It converts four pixels at a time, so it's a little more efficient than what I had been doing until now, and it gets even better with unrolling up to four times (16 pixels at a time; more gives no gain)... but it's still not what I would expect.
Thank you in advance.
Code: Select all
void fetch_and_do_something(u32 * srcdst)
{
    u32 tmpDest;
    __asm__ (
        "lwl %0, 3(%1)\n\t"
        "lwr %0, 0(%1)\n\t"
        :"=r"(tmpDest)
        :"r"(srcdst)
    );
    //do something here with tmpDest
    //...
    //store tmpDest to the same location using similar syntax (swl/swr)
}
Code: Select all
#standard sp/fp stuff
sd $a0, srcdst_stack($sp)
lw $v0, srcdst_stack($sp)
lwl $v0, 3($v0)
lwr $v0, 0($v0)
Code: Select all
void fetch_and_do_something(u32 * srcdst)
{
    u32 tmpDest;
    __asm__ (
        "lwl $t9, 3(%1)\n\t"
        "lwr $t9, 0(%1)\n\t"
        "sw $t9, %0\n\t"
        :"=m"(tmpDest)
        :"r"(srcdst)
        :"t9"
    );
    //do something here with tmpDest
    //...
    //store tmpDest to the same location using similar syntax (swl/swr)
}
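One possible alternative to the `$t9` workaround, sketched under the assumption that the exception comes from GCC assigning `%0` and `%1` to the same register (as in the `lwl $v0, 3($v0)` listing): GCC's early-clobber modifier `&` on the output constraint tells the compiler that the output is written before the inputs are finished with, so it must not share a register with any input. The helper name `load_u32_unaligned` is hypothetical, the offsets 3/0 assume a little-endian target, and the `memcpy` fallback is only there so the sketch also builds on non-MIPS hosts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): load a 32-bit word
 * from an address that may be unaligned. */
static inline uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
#if defined(__mips__) && defined(__MIPSEL__)
    /* "=&r" marks the output as early-clobber, so GCC may not place it in
     * the same register as the input pointer. The "memory" clobber tells
     * GCC that the asm reads memory it cannot see from the operands.
     * Assumption: little-endian MIPS, hence lwl at offset 3, lwr at 0. */
    __asm__ (
        "lwl %0, 3(%1)\n\t"
        "lwr %0, 0(%1)\n\t"
        : "=&r"(value)
        : "r"(src)
        : "memory");
#else
    /* Portable fallback for hosts without lwl/lwr. */
    memcpy(&value, src, sizeof value);
#endif
    return value;
}
```

With the early-clobber output the value can stay in a register, so the extra store to the stack in the `"=m"` variant should not be needed; whether that holds for a particular ee-gcc version is worth checking in the generated assembly.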
Code: Select all
tmpDest = (u32)((u16*)srcdst)[0] | ((u32)((u16*)srcdst)[1] << 16);
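For comparison, the halfword-based access can be written as a load/store pair. This is a minimal sketch, assuming the pointer is at least 2-byte aligned and that the low halfword comes first in memory (little-endian layout, as in the expression above); the helper names are hypothetical. Each halfword is widened to 32 bits before shifting so the result does not depend on integer promotion to a signed `int`.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the original post). Both assume that p is
 * halfword aligned and that the low halfword is stored first. */
static inline uint32_t load_u32_halfwords(const uint16_t *p)
{
    /* Widen before shifting so the shift is done on an unsigned 32-bit value. */
    return (uint32_t)p[0] | ((uint32_t)p[1] << 16);
}

static inline void store_u32_halfwords(uint16_t *p, uint32_t value)
{
    p[0] = (uint16_t)(value & 0xffffu); /* low halfword */
    p[1] = (uint16_t)(value >> 16);     /* high halfword */
}
```

This costs two loads plus a shift and an OR per word, so whether it beats lwl/lwr on the EE would still need to be measured.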
Code: Select all
u64 sec;
u128 tempColor;
u16 texture16bit[size]; //64b aligned
u32 texture32bit[size]; //128b aligned
for(pixel=0;pixel<size;pixel+=4)
{
    //fetch four 16-bit pixels at once
    sec=*(u64*)&texture16bit[pixel];
    __asm__(
        //results go to %0, so the input operand %1 is never modified
        "pexcw %0, %1\n\t"
        "pexch %0, %0\n\t"
        "pext5 %0, %0\n\t"
        :"=r"(tempColor):"r"(sec)
    );
    //store four 32-bit pixels at once
    *(u128*)&texture32bit[pixel]=tempColor;
}
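A plain C reference for the same conversion can be handy for checking the output of the vectorized loop. This is only a sketch: the function names are hypothetical, and the bit layout is an assumption meant to mirror pext5 as I understand it (each 5-bit field moved to the top five bits of its byte with the low three bits left zero, and the 1-bit field moved to bit 31).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helpers (not from the original post).
 * Assumed layout: 16-bit pixel = [1 | 5 | 5 | 5], 32-bit pixel = [8 | 8 | 8 | 8]. */
static inline uint32_t expand_5551(uint16_t p)
{
    uint32_t c0 = (uint32_t)(p & 0x1f);         /* bits 0..4   */
    uint32_t c1 = (uint32_t)((p >> 5) & 0x1f);  /* bits 5..9   */
    uint32_t c2 = (uint32_t)((p >> 10) & 0x1f); /* bits 10..14 */
    uint32_t a  = (uint32_t)(p >> 15);          /* bit 15      */
    return (c0 << 3) | (c1 << 11) | (c2 << 19) | (a << 31);
}

/* Reverse direction: keep the top five bits of each byte and bit 31. */
static inline uint16_t pack_5551(uint32_t c)
{
    return (uint16_t)(((c >> 3) & 0x1f)
                    | (((c >> 11) & 0x1f) << 5)
                    | (((c >> 19) & 0x1f) << 10)
                    | (((c >> 31) & 0x1) << 15));
}

/* Convert count pixels one at a time. */
static void expand_5551_array(const uint16_t *src, uint32_t *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        dst[i] = expand_5551(src[i]);
}
```

Comparing `expand_5551_array` against the pexcw/pexch/pext5 loop on a small test texture should show quickly whether the shuffles put each pixel where pext5 expects it.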