automatic alignment of stack vars for proper vfpu
Hi
If you have tried to use the VFPU, you will have noticed that every variable it references must be 16-byte aligned. For global variables this works fine with gcc's __attribute__((aligned(16))), as ScePspFVector4 does in psptypes.h.
But if you create temporary variables on the stack that are accessed with VFPU instructions, the app may raise an exception, because gcc does not guarantee that those locals are 16-byte aligned.
I found a solution to this problem: compile the source files with -mpreferred-stack-boundary=4. This forces gcc to align stack temporaries to 16 bytes (2^4) instead of the default 8 bytes.
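Before handing a stack temporary to lv.q/sv.q, it can help to check its alignment in debug builds. The following is a minimal host-side sketch, assuming gcc: Vec4 is an illustrative stand-in for ScePspFVector4, and is_aligned16 / stack_temp_is_vfpu_safe are hypothetical helpers, not PSPSDK functions. On a compiler that honours aligned(16) for locals the check passes; on a toolchain that only keeps the stack 8-byte aligned, this is the kind of check that could fail.

```c
#include <stdint.h>

/* Illustrative stand-in for ScePspFVector4 (the real type is in psptypes.h):
   four floats, requested to be 16-byte aligned. */
typedef struct {
    float x, y, z, w;
} __attribute__((aligned(16))) Vec4;

/* Returns 1 if p lies on a 16-byte boundary, as lv.q/sv.q require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical caller: creates a stack temporary that VFPU code would use
   and reports whether its address is actually 16-byte aligned. */
static int stack_temp_is_vfpu_safe(void)
{
    Vec4 tmp = { 1.0f, 2.0f, 3.0f, 4.0f };
    return is_aligned16(&tmp);
}
```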
yabadabo
Re: automatic alignment of stack vars for proper vfpu
yabadabo wrote:
"I found a solution to this problem: compile the source files with -mpreferred-stack-boundary=4. This forces gcc to align stack temporaries to 16 bytes instead of the default 8 bytes."
Well, this is a recent addition to psp-gcc: I added it because the option previously only took effect in the i386 gcc.
However, be very careful not to mix code compiled with and without this option. If a function compiled without it calls an external function compiled with it, you may end up with a misaligned stack pointer, because psp-gcc does not enforce the alignment at function entry: it assumes the incoming stack pointer is already correctly aligned and only allocates locals in a way that keeps it aligned.
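The failure mode described here is plain arithmetic: a callee built with a 16-byte preferred stack boundary keeps its frame size a multiple of 16 and trusts the incoming stack pointer, so it never rounds it itself. The sketch below models that with integers; the function names, addresses, frame size and offset are made-up illustrative values, not code that psp-gcc actually emits.

```c
#include <stdint.h>

/* Model of a callee compiled with -mpreferred-stack-boundary=4:
   it subtracts a frame size that is a multiple of 16 and places a
   16-byte local at a 16-aligned offset from the new sp, assuming the
   incoming sp was already 16-aligned. No realignment is performed. */
static uint32_t local_addr_in_callee(uint32_t incoming_sp,
                                     uint32_t frame_size,   /* multiple of 16 */
                                     uint32_t local_offset) /* multiple of 16 */
{
    uint32_t sp = incoming_sp - frame_size;
    return sp + local_offset;
}

/* Distance from the previous 16-byte boundary; 0 means lv.q/sv.q-safe. */
static uint32_t misalignment16(uint32_t addr)
{
    return addr & 15u;
}
```

With a 16-aligned incoming sp of 0x09FFFF00, a 32-byte frame and the local at offset 16, the local lands at 0x09FFFEF0, which is fine. If the caller was built with the default 8-byte boundary and passes 0x09FFFEF8, the same callee puts the local at 0x09FFFEE8, 8 bytes off a 16-byte boundary.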
Just use the defines from gumInternal.h:
Code:
// these macros are because GCC cannot handle aligned matrices declared on the stack
#define GUM_ALIGNED_MATRIX() (ScePspFMatrix4*)((((unsigned int)alloca(sizeof(ScePspFMatrix4)+64)) + 63) & ~63)
#define GUM_ALIGNED_VECTOR() (ScePspFVector4*)((((unsigned int)alloca(sizeof(ScePspFVector4)+64)) + 63) & ~63)
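These macros over-allocate with alloca() and round the address up with (p + 63) & ~63; the slack is what guarantees the rounded pointer still lies inside the block. A parameterised variant is sketched below, assuming gcc and an <alloca.h> header (glibc and newlib provide one; alloca is not ISO C). ALIGNED_ALLOCA and align_up are hypothetical names; the sketch uses uintptr_t instead of unsigned int so it also works on 64-bit hosts, and it has to stay a macro because alloca() memory belongs to the frame of the function that calls it.

```c
#include <stdint.h>
#include <alloca.h> /* not ISO C; provided by glibc and newlib */

/* Round p up to the next multiple of align (align must be a power of two). */
static inline uintptr_t align_up(uintptr_t p, uintptr_t align)
{
    return (p + align - 1) & ~(align - 1);
}

/* Hypothetical generalisation of GUM_ALIGNED_MATRIX/GUM_ALIGNED_VECTOR:
   reserve sizeof(type) + align - 1 bytes in the caller's frame, then
   round up. align - 1 bytes of slack are enough, since rounding moves
   the pointer forward by at most align - 1. It must be a macro: inside
   a helper function the alloca() block would be freed on return. */
#define ALIGNED_ALLOCA(type, align) \
    ((type *)align_up((uintptr_t)alloca(sizeof(type) + (align) - 1), (align)))
```

With this, ALIGNED_ALLOCA(ScePspFMatrix4, 16) would behave like GUM_ALIGNED_MATRIX() with 16-byte rather than 64-byte alignment.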
J.F. wrote:
"Just use the defines from gumInternal.h:" (GUM_ALIGNED_MATRIX() and GUM_ALIGNED_VECTOR(), quoted above)
alloca allocates from the heap (unless gcc treats it as a builtin and replaces it with a stack allocation, but I'm unsure about that).
hlide wrote:
"alloca allocates from the heap (unless gcc treats it as a builtin and replaces it with a stack allocation, but I'm unsure about that)."
Wrong. From the description of alloca():
"The alloca function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca returns to its caller."
Since I wrote those macros, I think I know what they were supposed to do. :)
Aligning to 16 instead of 64 should be enough, though. There was an argument about aligning matrices back when the macros were written, and I decided to go the safe route when doing the first VFPU work. They haven't been updated since then.
GE Dominator
chp wrote:
"Wrong. From the description of alloca(): The alloca function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca returns to its caller. Since I wrote those macros, I think I know what they were supposed to do. :)"
Well, the very first implementation of alloca I saw allocated space on the heap, in such a way that successive calls to alloca would free memory blocks that had been allocated in callees. But I believed I had seen gcc rewrite it as a builtin function to force the allocation onto the stack. Probably any compliant C compiler now has to handle alloca so that it allocates directly on the stack, or it may be a GCC exception.
I must check it.
Last edited by hlide on Fri Apr 13, 2007 4:26 am, edited 1 time in total.
chp wrote:
"The alloca function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca returns to its caller."
Could you please tell me how a C function can allocate space on the stack without that space being freed when the function itself exits? The only way I can see is for alloca() to be a builtin: gcc handles it by inserting code that allocates the space on the stack and frees it at the exit of the function that called alloca().
Oh well, even if I do it with a macro like this, it doesn't work at all, because tmp goes out of scope before I can use its pointer:
#define alloca(size) (void *)({ char tmp[size]; &tmp; })
So could you explain how to write a plain C alloca function that allocates space on the stack?
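A genuine alloca() cannot be an ordinary C function, since anything a function places in its own frame is released when it returns; the macro above fails for the same reason, because tmp's lifetime ends with the ({ ... }) block. One workaround, offered only as a sketch and not as what this thread settled on, applies when the size is known at compile time: declare the over-sized buffer directly in the caller's block and let the macro compute the aligned pointer inside it. DECLARE_ALIGNED_BYTES and aligned_temp_roundtrip are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: declares a raw byte buffer in the *current* scope
   and a pointer `name` to its first `align`-aligned byte, leaving at
   least `size` usable bytes. The buffer is an ordinary local of the
   enclosing block, so it stays valid until that block ends.
   `align` must be a power of two. */
#define DECLARE_ALIGNED_BYTES(name, align, size)                        \
    unsigned char name##_raw[(size) + (align) - 1];                      \
    unsigned char *name = name##_raw +                                   \
        (((uintptr_t)0 - (uintptr_t)name##_raw) & ((uintptr_t)(align) - 1))

/* Example user: copies a 4-float vector into an aligned temporary (where
   VFPU code could load it with lv.q), then copies it back out. memcpy is
   used so the byte buffer is never accessed through a float lvalue. */
static int aligned_temp_roundtrip(void)
{
    const float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float out[4];
    DECLARE_ALIGNED_BYTES(v, 16, sizeof in);

    memcpy(v, in, sizeof in);
    memcpy(out, v, sizeof out);
    return ((uintptr_t)v & 15u) == 0 && out[3] == 4.0f;
}
```

Unlike alloca(), this only works for sizes fixed at compile time, and it still relies on the enclosing function's frame rather than any compiler alignment guarantee.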
chp wrote:
"Aligning to 16 instead of 64 should be enough, though. There was an argument about aligning matrices back when the macros were written, and I decided to go the safe route when doing the first VFPU work. They haven't been updated since then."
16 is enough, since that is the minimum alignment required by the lv.q/sv.q instructions. The only benefit I see in 64-byte alignment is that a matrix then fits exactly in one cache line (64 bytes wide) instead of straddling two when misaligned.
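The cache-line argument can be checked with a little arithmetic: a 64-byte ScePspFMatrix4 (16 floats) that is only 16-byte aligned touches two 64-byte lines unless its address also happens to be a multiple of 64. The helper below counts the lines an object touches; lines_touched is an illustrative name, and the 64-byte line size is the figure quoted above.

```c
#include <stdint.h>

/* Number of cache lines of `line` bytes touched by an object of `size`
   bytes (size > 0) that starts at address `addr`. */
static unsigned lines_touched(uintptr_t addr, uintptr_t size, uintptr_t line)
{
    uintptr_t first = addr / line;
    uintptr_t last = (addr + size - 1) / line;
    return (unsigned)(last - first + 1);
}
```

For example, a matrix at 0x08900040 touches one line, while the same matrix at 0x08900050 (16-byte aligned, not 64-byte aligned) touches two. A single 16-byte vector at 0x08900050 still fits in one line.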
Yes, there are versions of alloca() for platforms that cannot take memory from the stack; they emulate the functionality by recording the current stack location and allocating from the heap. But that is not the original intention of the function.
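Such an emulation can be modelled in a few lines. In the simplified, hypothetical model below (not the source of any real emulation, which typically derives the depth from the address of a local variable), each heap block records the call depth at which it was requested, and every new call first frees the blocks requested from deeper levels than the current one. The depth is passed in explicitly so the behaviour is deterministic: two requests from the same depth both stay alive, and only a call from a shallower level, such as alloca(0), reclaims them.

```c
#include <stdlib.h>

/* Header in front of each heap block handed out by the emulation. */
struct block {
    struct block *next;
    int depth;          /* call depth at allocation time (larger = deeper) */
};

static struct block *live = NULL;

/* Simplified model of a heap-based alloca emulation:
   1. free every block requested from a deeper level than `depth`;
   2. unless size is 0, allocate a new block tagged with `depth`. */
static void *emulated_alloca(size_t size, int depth)
{
    struct block **pp = &live;
    while (*pp) {
        if ((*pp)->depth > depth) {          /* that deeper caller has returned */
            struct block *dead = *pp;
            *pp = dead->next;
            free(dead);
        } else {
            pp = &(*pp)->next;
        }
    }
    if (size == 0)
        return NULL;                         /* alloca(0): cleanup only */
    struct block *b = malloc(sizeof *b + size);
    if (!b)
        return NULL;
    b->depth = depth;
    b->next = live;
    live = b;
    return b + 1;                            /* user memory follows the header */
}

/* Number of blocks the emulation is still holding. */
static int live_blocks(void)
{
    int n = 0;
    for (struct block *b = live; b; b = b->next)
        ++n;
    return n;
}
```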
These functions also break the behaviour of the original definition, because they do not free memory on return but at the next alloca() call, and even then they do not release that memory if you are at the same or a deeper stack level.
Example:
Code:
#include <stdio.h>
#include <alloca.h>

char* call1(void);
char* call2(void);

int main(void)
{
    char* a;
    char* b;
    a = call1();
    b = call2();
    /* only the addresses are compared; neither buffer is dereferenced */
    printf("a: %p b: %p diff: %td\n", (void*)a, (void*)b, b - a);
    return 0;
}

char* call1(void)
{
    char* buf = alloca(2 << 16); /* 128 KiB */
    return buf; /* dangling with a real alloca(); used only for its address */
}

char* call2(void)
{
    char* buf = alloca(2 << 16);
    return buf;
}
With a proper implementation that actually allocates from the stack, the program output will be:
a: 0xbfddca50 b: 0xbfddca50 diff: 0
But with the C emulation of alloca(), the output will be this (confirmed):
a: 0xb7e55010 b: 0xb7e34010 diff: -135168
As you can see, if you happen to always allocate from the same level, you will end up leaking memory on every call. They have a "solution" for this, which is to call alloca(0) at a higher level in the program, but it is not documented for the function itself, only in the source of the emulation.
A worse case would be something like this:
Code:
/* in main() above, after a = call1(); with "int i;" declared */
for (i = 0; i < 100; ++i)
{
    char* b;
    b = call2();
    printf("a: %p b: %p diff: %td\n", (void*)a, (void*)b, b - a);
}
Aaaaanyway, it's not that important. :)