GCC and cache misses

Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

Hello,

I have a very big problem: my project (MasterBoy, an emulator for two consoles) seems to have reached a "critical size". Whatever I modify or add can, and most likely will, make performance drop by as much as 100%. It's usually about 50%, but that's still enormous.

Until now I could work around this problem by moving things here and there, changing some compiler options and saving memory where I could, but now I seem to be out of options: it's slow as hell whatever I do.

Of course, because of this I can't keep adding things or even fixing bugs, so I'm asking: is there a way to overcome this problem?

I've seen that GCC reorganizes and aligns data, often in chunks larger than 8k, which just kills that 2-way cache :(

Also, the CPU core, which has 60k of code (I know it's too much, but it's a jump table), is very efficient in a "good" build (i.e. when I get lucky) but very bad otherwise (sometimes it's nearly as slow as running the code from an uncached address). The problem is that since I never touch the CPU core anymore, nothing in there should get reorganized. I don't know why there's so much randomness :/

I've tried a lot of options like -fno-align-functions and -fno-reorder-blocks, and even if they improve performance for a build or two, changing something in the code is likely to produce something even worse than before. Also, as these flags are applied to the compiler, not the linker, I don't think they're what I'm looking for.

Thanks for any suggestions!
Sorry for my bad english
Oldschool library for PSP - PC version released
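To make the "chunks larger than 8k versus a 2-way cache" concern concrete, here is a minimal sketch of how addresses map to cache sets. The geometry (16 KiB, 2-way, 64-byte lines) is an assumption and is not stated anywhere in the thread, and the names cache_set and same_set_count are made up for illustration. Under that assumption each way covers 8 KiB, so hot addresses spaced a multiple of 8 KiB apart all compete for the same two slots.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (not stated in the thread): 16 KiB, 2-way, 64-byte lines. */
enum {
    CACHE_SIZE = 16 * 1024,
    CACHE_WAYS = 2,
    LINE_SIZE  = 64,
    WAY_SIZE   = CACHE_SIZE / CACHE_WAYS,  /* 8 KiB per way */
    NUM_SETS   = WAY_SIZE / LINE_SIZE      /* 128 sets */
};

/* Set index that a given address maps to. */
unsigned cache_set(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_SETS;
}

/* Count how many of the n addresses land in the same set as addrs[0].
   If the result exceeds CACHE_WAYS, they cannot all stay resident. */
unsigned same_set_count(const uint32_t *addrs, unsigned n)
{
    assert(n > 0);
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++) {
        if (cache_set(addrs[i]) == cache_set(addrs[0]))
            count++;
    }
    return count;
}
```

With the assumed geometry, 0x08900000, 0x08902000 and 0x08904000 all map to set 0, so a loop that touches all three would keep evicting one of them. A relink that shifts one of those addresses by even a single line changes which addresses collide, which may be one reason unrelated changes make a build "good" or "bad".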
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

What about -Os?
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

It doesn't help, -Os will produce significantly slower code :(
(I'm using -O3 right now and -O2 does help for "bad builds" but is significantly slower on "good builds")
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

Well, the MIPS CPU has a cache prefetch instruction. You could try using it with inline assembly at the start of certain key functions to make sure the data it needs will be in the cache. The hard part, of course, is deciding where to put it, and what data to prefetch.
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

To be pedantic, you don't have to use inline assembly. Try __builtin_allegrex_cache(int op, int addr). Fill is op 0x1e, and if you want to prevent some data from being evicted, fill and lock is op 0x1f.
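A rough sketch of how the fill op might be applied to a whole buffer at the start of a key function. Several things here are assumptions: the 64-byte line size, the USE_ALLEGREX_BUILTINS switch (a hypothetical macro you would define yourself when building with psp-gcc), and the helper names touch_line and prefetch_range. Only the builtin's name, its (int op, int addr) shape and the 0x1e op code come from the post above. On other targets the sketch falls back to the generic __builtin_prefetch so it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed line size; check the hardware docs */
#define OP_FILL    0x1e  /* "fill", per the post above */

/* USE_ALLEGREX_BUILTINS is a hypothetical switch: define it only when
   building with a psp-gcc that provides __builtin_allegrex_cache. */
static inline void touch_line(const void *p)
{
#ifdef USE_ALLEGREX_BUILTINS
    __builtin_allegrex_cache(OP_FILL, (int)(uintptr_t)p);
#else
    __builtin_prefetch(p);  /* generic GCC/Clang builtin, host fallback */
#endif
}

/* Request every cache line overlapping [data, data + len).
   Returns the number of lines requested. */
size_t prefetch_range(const void *data, size_t len)
{
    assert(data != NULL);
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t first = (uintptr_t)data & mask;
    uintptr_t last  = ((uintptr_t)data + len - 1) & mask;
    size_t lines    = (size_t)((last - first) / CACHE_LINE) + 1;

    for (size_t i = 0; i < lines; i++)
        touch_line((const void *)(first + i * CACHE_LINE));
    return lines;
}
```

Each call acts on one line at a time, so a table of N bytes needs roughly N / 64 calls under the assumed line size. Whether this actually helps is something to measure, not assume.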
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

Cool! Thanks for the tip. I didn't realize the PSP C compiler had a built-in op for the cache. Any other Allegrex-specific built-ins?
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Well, min, max, bitrev, madd (integer multiply-add), msub (integer multiply-subtract), wsbh (word swap bytes by halfword), wsbw (word swap bytes by word), clo (count leading ones), clz (count leading zeros), cto, ctz (these are implemented as a bitrev, cl[oz] pair), sync, ceil_w_s, floor_w_s, trunc_w_s, and round_w_s. They are listed in the psp-gcc patch.
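For readers unfamiliar with a few of these operations, here are plain C reference versions of what they compute, as I understand them. The ref_ names are made up for illustration and are not the built-ins' real names (those are in the psp-gcc patch). The sketch also shows the "bitrev, cl[oz] pair" trick mentioned for cto and ctz.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits (bitrev). */
uint32_t ref_bitrev(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Swap the bytes inside each halfword: 0xAABBCCDD -> 0xBBAADDCC. */
uint32_t ref_wsbh(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* Reverse all four bytes of the word: 0xAABBCCDD -> 0xDDCCBBAA. */
uint32_t ref_wsbw(uint32_t x)
{
    uint32_t h = ref_wsbh(x);
    return (h << 16) | (h >> 16);
}

/* Count leading zeros; returns 32 for x == 0. */
unsigned ref_clz(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones is clz of the complement. */
unsigned ref_clo(uint32_t x) { return ref_clz(~x); }

/* Trailing counts: reverse the bits, then count from the top. */
unsigned ref_ctz(uint32_t x) { return ref_clz(ref_bitrev(x)); }
unsigned ref_cto(uint32_t x) { return ref_clo(ref_bitrev(x)); }

/* Small self-check showing a few expected values. */
void ref_selfcheck(void)
{
    assert(ref_wsbh(0x11223344u) == 0x22114433u);
    assert(ref_wsbw(0x11223344u) == 0x44332211u);
    assert(ref_ctz(8u) == 3);
    assert(ref_cto(7u) == 3);
}
```

The loops are only there to pin down the semantics. The point of the built-ins is that each of these becomes one instruction, or two for cto and ctz.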
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

Thanks. Some of those would be very useful when optimizing for the PSP.
Brunni
Posts: 186
Joined: Sat Oct 08, 2005 10:27 pm

Post by Brunni »

crazyc wrote:To be pedantic, you don't have to use inline assembly. Try __builtin_allegrex_cache(int op, int addr). Fill is op 0x1e, and if you want to prevent some data from being evicted, fill and lock is op 0x1f.
Thank you, but can I get documentation somewhere on how to use it?
Does this affect the I-cache or the D-cache? If I fill and/or lock a specified address, how much data is locked?
Thanks
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

It's just giving you direct access to the assembly language cache instruction. Look in any MIPS programming manual for an explanation (pg 93 of the MIPS Vol 2 manual PDF).
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Brunni wrote:Thank you, but can I get documentation somewhere on how to use it?
Does this affect the I-cache or the D-cache? If I fill and/or lock a specified address, how much data is locked?
Thanks
Before you start inserting cache ops, you should probably profile your app with the PSP profiler to check the D-cache and I-cache miss rates.