what is the -G0 gcc option for?
Hi guys,
I've been trying to find out from the code what the -G0 option is for, and I can't find it anywhere. Can someone explain why almost all makefiles use it in the CFLAGS variable?
Hi! :)
I don't really know what difference it makes in the executable, but from the gcc man page:
Code: Select all
-G num
Put global and static objects less than or equal to num bytes into the small
data or bss sections instead of the normal data or bss sections. The default
value of num is 8. The -msdata option must be set to one of sdata or use for
this option to have any effect.
All modules should be compiled with the same -G num value. Compiling with
different values of num may or may not work; if it doesn’t the linker will give an
error message---incorrect code will not be generated.
Ciaooo
Sakya
The -G option can change how your application references global vars. Say you want to load the value of some global variable: you would ordinarily have to spend two instructions loading the high and low halfwords of the address into a register, and then dereference that to get your number.
With gp-relative addressing, the compiler knows your global is a fixed (and small) offset from some location, the gp-base, and keeps this base in the same register always (not sure what convention says this is on MIPS). Then all you need to do to get a global is a const-offset load from the GP register.
So accessing globals will take 1 instruction instead of 3, at the cost of one less scratch register for doing normal stuff.
Also, all code needs to be compiled with the same setting if you want to be able to link it, as each unit makes assumptions about how to access vars that may be extern'd from a different unit, and things break if the units have different ideas about how to access them.
The -G option simply says how small in bytes a variable must be before it is moved to the sdata section, and hence becomes accessible via gp-relative loads. Setting it to 0 effectively disables this altogether.
Damn, I need a decent signature!
Information bump:
I've noticed a bug (more like instability) when -G0 is on and one is using the VFPU:
Code: Select all
void vfpu_ortho_matrix(ScePspFMatrix4 *m, float left, float right, float bottom, float top, float near, float far) {
    __asm__ volatile (
        "vmidt.q M100\n" // set M100 to identity
        "mtv %2, S000\n" // C000 = [right, ?, ?, ]
        "mtv %4, S001\n" // C000 = [right, top, ?, ]
        "mtv %6, S002\n" // C000 = [right, top, far ]
        "mtv %1, S010\n" // C010 = [left, ?, ?, ]
        "mtv %3, S011\n" // C010 = [left, bottom, ?, ]
        "mtv %5, S012\n" // C010 = [left, bottom, near]
        "vsub.t C020, C000, C010\n" // C020 = [ dx, dy, dz]
        "vrcp.t C020, C020\n" // C020 = [1/dx, 1/dy, 1/dz]
        "vmul.s S100, S100[2], S020\n" // S100 = m->x.x = 2.0 / dx
        "vmul.s S111, S111[2], S021\n" // S111 = m->y.y = 2.0 / dy
        "vmul.s S122, S122[2], S022[-x]\n" // S122 = m->z.z = -2.0 / dz
        "vsub.t C130, C000[-x,-y,-z], C010\n" // C130 = m->w[x, y, z] = [-(right+left), -(top+bottom), -(far+near)]
        // we do vsub here since -(a+b) => (-1*a) + (-1*b) => -a - b
        "vmul.t C130, C130, C020\n" // C130 = [-(right+left)/dx, -(top+bottom)/dy, -(far+near)/dz]
        "sv.q C100, 0 + %0\n"
        "sv.q C110, 16 + %0\n"
        "sv.q C120, 32 + %0\n"
        "sv.q C130, 48 + %0\n"
        :"=m"(*m) : "r"(left), "r"(right), "r"(bottom), "r"(top), "r"(near), "r"(far));
}
The PSP crashes with this:
Code: Select all
Exception - Address store
Thread ID - 0x046E4001
Th Name - user_main
Module ID - 0x046EC875
Mod Name - Untitled
EPC - 0x08826B50
Cause - 0x10000014
BadVAddr - 0x08947A38
Status - 0x60088613
zr:0x00000000 at:0x2008FF00 v0:0x00000000 v1:0x43F00000
a0:0x08947A38 a1:0x43880000 a2:0x00000000 a3:0xC1200000
t0:0x00000000 t1:0x00000002 t2:0x00000000 t3:0xBD400000
t4:0x09FBFD88 t5:0x00001E04 t6:0x08824F68 t7:0x20088600
s0:0x00000012 s1:0x09FBFE34 s2:0x00000001 s3:0x09FBFEE0
s4:0x00000012 s5:0x00000013 s6:0xDEADBEEF s7:0xDEADBEEF
t8:0x880A0000 t9:0x00000000 k0:0x09FBFF00 k1:0x00000000
gp:0x08845570 sp:0x09FBFD98 fp:0x09FBFD98 ra:0x0881BD0C
0x08826B50: 0xF8840000 '....' - sv.q C100, 0($a0)
The VFPU expects the parameters to be globals
-G0 screws this up, causing the VFPU to fail
I suggest that -G0 be left off if you're using the VFPU
Without -G0 it runs fine
The only problem is that my parameter *was* global =o
*EDIT*
Problem solved:
1) The matrix I was passing to the ortho function IS global
2) It's located at 0x8947A38
3) 0x8947A38 is NOT divisible by 16
4) So I add __attribute__((aligned(16)))
5) Taddah, now it works
Moral of the story?
Keep -G0 on,
and use __attribute__((aligned(16))) when passing to VFPU
-G0 ensures 2byte aligned, but the VFPU needs 16
-G0 has nothing at all to do with alignment of data. All -G# does is put data elements <= # in size into the small data segment. -G0 means nothing goes into the small data segment. -G8 means anything 8 bytes or smaller goes into the small data segment. In either case, alignment is not guaranteed to be anything other than the natural alignment (e.g., 4 bytes for int) unless you specify the alignment yourself (as you discovered).
GCC 4.x seems to do a lousy job of handling the small data segment, which is why you normally need -G0. You can change that on a file-by-file basis in the makefile. Some apps will perform a little better with stuff in the small data segment.
This issue is well known. It can be handled with the option I added to psp-gcc, which raises the minimum stack alignment boundary to 16 bytes; that way you can also use local objects instead of global objects as your parameters. Funnily enough, I forgot its name; perhaps Heimdall can remember it, because I gave him the necessary patch :). Regarding the VFPU, you must always use __attribute__((aligned(16))) anyway.