GCC and cache misses
Hello,
I have a very big problem: my project (MasterBoy, an emulator for two consoles) seems to have reached a "critical size". Whatever I modify or add can, and most likely will, make performance drop by as much as 100%. It's usually about 50%, but that is still enormous.
Until now I could work around this problem by moving things here and there, changing some compiler options and trying to save memory where I could, but now I seem to be out of options: it's slow as hell whatever I do.
Of course, because of this I can't continue adding things or even fixing bugs, so I'm asking whether there's a way to overcome this problem.
I've seen that GCC reorganizes and aligns data, often in chunks larger than 8k, which just kills that 2-way cache :(
Also, the CPU core, which has 60k of code (I know it's too much, but it's a jump table), is very efficient in a "good" build (i.e. when I get lucky) but very bad otherwise (sometimes it's nearly as slow as running the code from an uncached address). The problem is that since I never touch the CPU core anymore, the things in there should not be reorganized. I don't know why there is so much randomness :/
I've tried a lot of options like -fno-align-functions and -fno-reorder-blocks, and even if they improve performance for a build or two, changing something in the code is likely to produce something even worse than before. Also, as these flags are applied to the compiler, not the linker, I don't think they're what I'm looking for.
Thanks for any suggestions!
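To make the "chunks larger than 8k kill the 2-way cache" point concrete, here is a minimal sketch of how addresses map to cache sets. The geometry is an assumption, not something stated in this thread: a 16 KB, 2-way cache with 64-byte lines, which gives 8 KB per way. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (the thread only says "2-way" and "8k"):
   16 KB cache, 2 ways, 64-byte lines -> 8 KB per way, 128 sets. */
enum {
    CACHE_SIZE = 16 * 1024,
    CACHE_WAYS = 2,
    LINE_SIZE  = 64,
    WAY_SIZE   = CACHE_SIZE / CACHE_WAYS,  /* 8192 */
    NUM_SETS   = WAY_SIZE / LINE_SIZE      /* 128 */
};

/* Index of the set that an address maps to. */
static unsigned cache_set(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_SETS;
}

/* Count how many of the n addresses land in the same set as addrs[0]
   (addrs[0] itself included). Assumes each address lies in a different
   cache line. If the result exceeds CACHE_WAYS, those lines cannot all
   stay cached at the same time. */
static unsigned lines_in_same_set(const uint32_t *addrs, unsigned n)
{
    assert(n > 0);
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++)
        if (cache_set(addrs[i]) == cache_set(addrs[0]))
            count++;
    return count;
}
```

Under these assumptions, any two addresses that are a multiple of 8 KB apart share a set, so three hot spots spaced 8 KB apart keep evicting each other even though the cache as a whole is far from full. That would fit a build getting slower just because something moved.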
Sorry for my bad english
Oldschool library for PSP - PC version released
That doesn't help: -Os produces significantly slower code :(
(I'm using -O3 right now; -O2 does help for "bad builds" but is significantly slower on "good builds".)
J.F. wrote:
Well, the MIPS CPU has a cache prefetch instruction. You could try using it with inline assembly at the start of certain key functions to make sure the data it needs will be in the cache. The hard part, of course, is deciding where to put it, and what data to prefetch.

To be pedantic, you don't have to use inline assembly. Try __builtin_allegrex_cache(int op, int addr). Fill is op 0x1e, and if you want to prevent some data from being evicted, fill and lock is op 0x1f.
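A rough sketch of how that builtin might be wrapped to touch every cache line of a buffer. Only the builtin name, its (int op, int addr) signature and the two op codes come from the post above. The 64-byte line size, the __psp__ macro check, and the names cache_op and prefetch_range are assumptions made for illustration. Off the PSP the op is replaced by a stub that only counts calls, so the loop can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE       64    /* assumed line size, not from the thread */
#define OP_FILL          0x1e  /* "fill", as given in the post above */
#define OP_FILL_AND_LOCK 0x1f  /* "fill and lock", as given in the post above */

#if defined(__psp__)  /* assumption: macro defined by the PSP toolchain */
/* A macro keeps the op a compile-time constant for the builtin. */
#define cache_op(op, p) __builtin_allegrex_cache((op), (int)(uintptr_t)(p))
#else
/* Host stand-in: does nothing except count how often it was called. */
static unsigned cache_op_calls;
static void cache_op(int op, const void *p)
{
    (void)op;
    (void)p;
    cache_op_calls++;
}
#endif

/* Issue one fill op per cache line overlapping [start, start + len).
   Returns the number of lines touched. */
static size_t prefetch_range(const void *start, size_t len)
{
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);  /* power of two */
    uintptr_t first = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;
    size_t lines = 0;

    for (uintptr_t a = first; a < end; a += CACHE_LINE) {
        cache_op(OP_FILL, (const void *)a);
        lines++;
    }
    return lines;
}
```

Which cache the op acts on and how much gets locked per call are not answered here, so check the MIPS manual before relying on OP_FILL_AND_LOCK.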
crazyc wrote:
To be pedantic, you don't have to use inline assembly. Try __builtin_allegrex_cache(int op, int addr). Fill is op 0x1e, and if you want to prevent some data from being evicted, fill and lock is op 0x1f.

Cool! Thanks for the tip. I didn't realize the PSP C compiler had a built-in op for the cache. Any other allegrex specific built-ins?
J.F. wrote:
Cool! Thanks for the tip. I didn't realize the PSP C compiler had a built-in op for the cache. Any other allegrex specific built-ins?

Well, min, max, bitrev, madd (integer multiply-add), msub (integer multiply-subtract), wsbh (word swap bytes by halfword), wsbw (word swap bytes by word), clo (count leading ones), clz (count leading zeros), cto, ctz (these are implemented as a bitrev, cl[oz] pair), sync, ceil_w_s, floor_w_s, trunc_w_s, and round_w_s. They are listed in the psp-gcc patch.
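To illustrate the "bitrev, cl[oz] pair" remark: counting trailing bits is the same as reversing the word and counting leading bits. The sketch below is plain portable C and does not use the allegrex builtins (their exact names are in the psp-gcc patch and are not guessed here). The 32 suffixed function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of all 32 bits. */
static uint32_t bitrev32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Count leading zeros; returns 32 for x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones: leading zeros of the complement. */
static unsigned clo32(uint32_t x) { return clz32(~x); }

/* Trailing counts as a bitrev + cl[oz] pair. */
static unsigned ctz32(uint32_t x) { return clz32(bitrev32(x)); }
static unsigned cto32(uint32_t x) { return clo32(bitrev32(x)); }
```

For example, ctz32(8) reverses 0x00000008 to 0x10000000, which has three leading zeros.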
crazyc wrote:
Well, min, max, bitrev, madd (integer multiply-add), msub (integer multiply-subtract), wsbh (word swap bytes by halfword), wsbw (word swap bytes by word), clo (count leading ones), clz (count leading zeros), cto, ctz (these are implemented as a bitrev, cl[oz] pair), sync, ceil_w_s, floor_w_s, trunc_w_s, and round_w_s. They are listed in the psp-gcc patch.

Thanks. Some of these would be very useful when optimizing for the PSP.
crazyc wrote:
To be pedantic, you don't have to use inline assembly. Try __builtin_allegrex_cache(int op, int addr). Fill is op 0x1e, and if you want to prevent some data from being evicted, fill and lock is op 0x1f.

Thank you, but can I get documentation somewhere on how to use it?
Does this affect the I or D-cache? If I fill and/or lock a specified address, how much data is locked?
Thanks
Brunni wrote:
Thank you, but can I get documentation somewhere on how to use it?
Does this affect the I or D-cache? If I fill and/or lock a specified address, how much data is locked?
Thanks

It's just giving you direct access to the assembly language cache instruction. Look in any MIPS programming manual for an explanation (pg 93 of the MIPS Vol 2 manual PDF).
Brunni wrote:
Thank you, but can I get documentation somewhere on how to use it?
Does this affect the I or D-cache? If I fill and/or lock a specified address, how much data is locked?
Thanks

Before you start inserting cache ops, you should probably profile your app with the psp profiler to check the D-cache and I-cache miss rates.