-pg gprof profiling library added to libc
Posted: Tue Oct 04, 2005 8:51 am
Hey,
Committed at revision 1222 is my latest patch, -pg profiling. Supports both caller->callee graph profiling, and sampling.
In order to get your code profiled, just add -pg to your EE_CFLAGS variable in your Makefile. That's all there's to it. But please read on before using this. Like on any other platform, the generated output file is only written when main() leaves. Either by normal exit, or by calling exit(). Which means you will have to terminate your program to get the output. Also, the output is always written to host:gmon.out.
The implementation of sampling uses either TIM0 or TIM1. By default it is TIM0, but it is changeable by a #define at the beginning of mcount_internal.c. Please make sure you're not using the same timer as mcount.
An example is found under libc/samples/prof. It's a simple example(/test) that shows a function called loops_10_times being called 10 times(duh), and that function calls nested only 5 times. Another function, sleeping_beauty is called, and it waits around 3 seconds worth of hsync (see output below, 2.96 seconds).
One last thing, if you want to profile the entire libc, go into its Makefile and look for a commented "-pg".
Running the sample will yield the following output (portion of it):
Hope somebody finds this useful. I have found many debug functions being called in the code that I'm working on, so, you never know :P
-- gawd 2005-10-03
[/i]
Committed at revision 1222 is my latest patch, -pg profiling. Supports both caller->callee graph profiling, and sampling.
In order to get your code profiled, just add -pg to your EE_CFLAGS variable in your Makefile. That's all there's to it. But please read on before using this. Like on any other platform, the generated output file is only written when main() leaves. Either by normal exit, or by calling exit(). Which means you will have to terminate your program to get the output. Also, the output is always written to host:gmon.out.
The implementation of sampling uses either TIM0 or TIM1. By default it is TIM0, but it is changeable by a #define at the beginning of mcount_internal.c. Please make sure you're not using the same timer as mcount.
An example is found under libc/samples/prof. It's a simple example(/test) that shows a function called loops_10_times being called 10 times(duh), and that function calls nested only 5 times. Another function, sleeping_beauty is called, and it waits around 3 seconds worth of hsync (see output below, 2.96 seconds).
One last thing, if you want to profile the entire libc, go into its Makefile and look for a commented "-pg".
Running the sample will yield the following output (portion of it):
Code: Select all
Flat profile:
Each sample counts as 0.001 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
100.00 2.96 2.96 1 2.96 2.96 sleeping_beauty
0.00 2.96 0.00 10 0.00 0.00 loops_10_times
0.00 2.96 0.00 5 0.00 0.00 nested
0.00 2.96 0.00 4 0.00 0.00 SifAddCmdHandler
0.00 2.96 0.00 3 0.00 0.00 DIntr
0.00 2.96 0.00 3 0.00 0.00 EIntr
0.00 2.96 0.00 1 0.00 0.00 EnableDmac
0.00 2.96 0.00 1 0.00 0.00 SifInitCmd
0.00 2.96 0.00 1 0.00 0.00 SifInitRpc
0.00 2.96 0.00 1 0.00 0.00 SifSendCmd
0.00 2.96 0.00 1 0.00 0.00 _SifSendCmd
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_alloc_init
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_args_parse
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_libc_deinit
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_libc_init
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_stdio_init
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_stdlib_deinit
0.00 2.96 0.00 1 0.00 0.00 _ps2sdk_stdlib_init
0.00 2.96 0.00 1 0.00 0.00 chdir
0.00 2.96 0.00 1 0.00 0.00 cleanup
0.00 2.96 0.00 1 0.00 2.96 main
Hope somebody finds this useful. I have found many debug functions being called in the code that I'm working on, so, you never know :P
-- gawd 2005-10-03
[/i]