Anyway, first question. Let's consider some simple code that fetches a word from a memory address which is not always aligned (and will in fact mostly be unaligned).
Code:
void fetch_and_do_something(u32 *srcdst)
{
    u32 tmpDest;
    __asm__ (
        "lwl %0, 3(%1)\n\t"   // lwl+lwr pair performs an unaligned 32-bit load
        "lwr %0, 0(%1)\n\t"
        : "=r"(tmpDest)
        : "r"(srcdst)
    );
    //do something here with tmpDest
    //...
    //store tmpDest back to the same location using similar syntax (swl/swr)
}
Compiled without optimization (i.e. for debugging) it turns into something like this, where $v0 ends up holding both the pointer and the loaded data, so the lwl clobbers the address before the lwr gets to use it:
Code:
# standard sp/fp stuff
sd  $a0, srcdst_stack($sp)
lw  $v0, srcdst_stack($sp)    # pointer reloaded into $v0
lwl $v0, 3($v0)               # $v0 now holds data instead of the pointer...
lwr $v0, 0($v0)               # ...so this reads from a bogus address
I've come up with something like this to work around the problem when debugging.
Code:
void fetch_and_do_something(u32 *srcdst)
{
    u32 tmpDest;
    __asm__ (
        "lwl $t9, 3(%1)\n\t"   // use $t9 as a fixed scratch register...
        "lwr $t9, 0(%1)\n\t"   // ...so the base register %1 stays intact
        "sw  $t9, %0\n\t"
        : "=m"(tmpDest)
        : "r"(srcdst)
        : "t9"
    );
    //do something here with tmpDest
    //...
    //store tmpDest back to the same location using similar syntax (swl/swr)
}
Well, I suppose I could keep two versions of this simple code (one for debug builds, one for optimized builds)... but that doesn't seem like a good idea.
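Another alternative I'm wondering about (untested, so treat it as a rough sketch) is to keep a single version but mark the output as earlyclobber with "=&r". As far as I understand, the '&' modifier tells GCC that the output is written before the inputs are consumed, so it shouldn't allocate the same register for %0 and %1 even without optimization:
Code:
void fetch_and_do_something(u32 *srcdst)
{
    u32 tmpDest;
    __asm__ (
        "lwl %0, 3(%1)\n\t"
        "lwr %0, 0(%1)\n\t"
        : "=&r"(tmpDest)   // '&' (earlyclobber): %0 must not share a register with %1
        : "r"(srcdst)
    );
    //do something here with tmpDest
}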
Now, I have to say I'm not very good at MIPS assembly (I have some experience with the PSX), nor at GCC-specific stuff (I mostly program for Win32, using MSVS most of the time, and recently Visual C# more than anything else).
Second question: what is the cost of lwl/lwr and swl/swr? Sure, I can run my own tests, but maybe someone already has (I didn't find any specific data). What I mean is, maybe it's simply more efficient to use other methods. In this sample the destination/source is not always word aligned, but it is always halfword aligned, so it's quite easy to load/store it the other way, like:
Code:
tmpDest = (u32)(*(u16 *)srcdst) | ((u32)(*((u16 *)srcdst + 1)) << 16);
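Spelled out as small helpers (just a sketch: it assumes the EE's little-endian byte order and that the pointer really is halfword aligned; the function names are made up), that would be something like:
Code:
// hypothetical helpers for halfword-aligned (but not word-aligned) 32-bit access
static inline u32 load_u32_halfaligned(const u16 *p)
{
    return (u32)p[0] | ((u32)p[1] << 16);   // low halfword first (little-endian)
}

static inline void store_u32_halfaligned(u16 *p, u32 value)
{
    p[0] = (u16)(value & 0xffff);
    p[1] = (u16)(value >> 16);
}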
Do you know of an efficient way to convert 16-bit (5:5:5:1) textures into 32-bit (8:8:8:8) and vice versa? I've been able to come up with something like this (for 16b->32b; for the other direction, just reverse the order and pack instead of extend):
Code:
u64 sec;
u128 tempColor;
int pixel;
u16 texture16bit[size];   // 64-bit aligned
u32 texture32bit[size];   // 128-bit aligned

for (pixel = 0; pixel < size; pixel += 4)
{
    sec = *(u64 *)&texture16bit[pixel];   // grab four 16-bit texels at once
    __asm__ (
        "pexcw %1, %1\n\t"   // spread the four halfwords across...
        "pexch %1, %1\n\t"   // ...the four words of the 128-bit register
        "pext5 %0, %1\n\t"   // expand each 1:5:5:5 halfword to 8:8:8:8
        : "=r"(tempColor), "+r"(sec)   // "+r": the asm modifies sec in place
    );
    *(u128 *)&texture32bit[pixel] = tempColor;
}
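For comparison, and mostly so I have something to check the MMI version against, a plain scalar fallback for the 1:5:5:5 -> 8:8:8:8 expansion would look roughly like this. It assumes the usual PSMCT16 layout (R in bits 0-4, A in bit 15) and mirrors what pext5 does, i.e. each 5-bit field is just shifted into the top of its byte rather than rescaled; the function name is made up:
Code:
// hypothetical scalar reference for one texel; slow, but handy for verifying output
static inline u32 rgba5551_to_rgba8888(u16 c)
{
    u32 r = ((u32)(c         & 0x1f)) << 3;   // bits 0-4   -> bits 3-7
    u32 g = ((u32)((c >> 5)  & 0x1f)) << 3;   // bits 5-9   -> bits 11-15 (after << 8)
    u32 b = ((u32)((c >> 10) & 0x1f)) << 3;   // bits 10-14 -> bits 19-23 (after << 16)
    u32 a = ((u32)(c >> 15)) << 7;            // bit 15     -> bit 31     (after << 24)
    return r | (g << 8) | (b << 16) | (a << 24);
}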
Thank you in advance.