-Os should be used instead of -O2 (or -O3)

Inferno_Intelligence
Posts: 3
Joined: Fri Aug 12, 2005 8:29 am

Post by Inferno_Intelligence »

Hey guys, I raised this on IRC a day or two ago but figured I'd post it here too.
Since storage space is usually limited on the PSP, executable size should be an important issue for all PSP developers.
To help minimize size, I suggest all developers start using the -Os optimization flag.
From the gcc man page:
-Os Optimize for size. -Os enables all -O2 optimizations
that do not typically increase code size. It also
performs further optimizations designed to reduce code
size.

-Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops
-falign-labels -freorder-blocks -fprefetch-loop-arrays
Normally a few K for an executable doesn't matter, but when you only have a couple of megs to a gig of space, every byte counts. I think space is going to be a bigger issue than runtime on the PSP, so optimizing for space makes sense.

Anyway, I'd be interested in what others think of this suggestion and I'd also like to see other suggestions on how to save space. Maybe next we can look into encouraging people to compress their data all the time, maybe through some sort of serialization that has compression built in?
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Not good enough. The speed optimizations gained by -O2 far outweigh the minor size optimizations potentially gained by -Os. -O2 is the de facto standard and we're sticking with it (as far as PSPSDK goes).

There are much better ways to get the size of an executable down, including stripping it (which you get if you use PSPSDK's build.mak). You can also remove -G0 (or explicitly set -G8), which turns on $GP-relative addressing (single-instruction variable loads), which has just been fixed in newlib.

I've also heard that people are working on packers.
remleduff
Posts: 11
Joined: Sun Jul 24, 2005 3:29 am

Post by remleduff »

Trying to save memory stick space by compiler optimizations is pretty silly.

But...

I am planning to benchmark the speed of code compiled with -Os vs. -O2 in any event.

The PSP doesn't have a very large cache on the processor, so savings in code size cause fewer cache misses and often speed up code more than many of the -O2 tricks. Loop unrolling, for instance, is pretty much a thing of the past for modern processors.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

Sony recommends using -Os on the PS2 because of the code-size decrease and the corresponding improvement in I-cache hit rate (the presentation I saw said it was a guaranteed 10% performance improvement). This tradeoff may or may not apply to the PSP because of the different memory architecture, but it's definitely worth testing.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

No, their recommendation was for code that was profiled to run faster with -Os, not for all the code in the project. Of course you will get some pieces of code that run faster with -Os because they fit better in the I-cache. There are other tricks to make better use of the PSP's (and PS2's) minuscule I-cache than just passing -Os everywhere.

Overall, -Os just doesn't cut it when you're going for speed.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

http://www.scee.sony.co.uk/sceesite/fil ... eWeGot.pdf, page 26, is pretty unequivocal about using -Os on the PS2. Most of its recommendations look fairly applicable to the PSP.
mrbrown
Site Admin
Posts: 1537
Joined: Sat Jan 17, 2004 11:24 am

Post by mrbrown »

Hahaha :) Oh, man. Let me clue you in on something. SCEE uses CodeWarrior everywhere internally. I'm 99% certain that the one page you pointed out refers to CW and not GCC, especially since they didn't mention any "small code" vs. "fast code" optimizations by name, and they state "10% in a mouse click". That means what they said should be taken with a grain of salt for any non-CW compiler (not endorsing CW, but I'm sure its "small code" setting means something different than -Os). I've never seen that PDF you pointed out, but I'd be skeptical of a one-page blurb with no specifics.

Anyway, there's another white paper, or newsgroup post (can't remember which) where Lionel Lemarie (Hikey of PS2 Linux fame, also from SCEE) talks about his experiences with setting "small code" in the compiler. Unless my memory is really shot, he says that "small code" is useful in key areas of code (that you must benchmark yourself) but he couldn't get good results across the whole of the project using it. His mention of -Os may have just been that it was what to pick for GCC, so he was probably using CW as well.

I'd be very interested in seeing some of your benchmarks of -Os vs. -O2 on the PSP.
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

That presentation is by Lionel Lemarie (page 3).
matkeupon
Posts: 26
Joined: Sat Jul 02, 2005 10:58 pm

Post by matkeupon »

Personally, I compiled the latest version of SmashGpsp with -Os. In my case there is absolutely no visible difference in speed between the two settings, and I gain 8 KB (210 KB with -Os, 218 KB with -O2).

As the size difference is also very small, I'd say the two settings are quite similar.
cwbowron
Posts: 76
Joined: Fri May 06, 2005 4:22 am
Location: East Lansing, MI

Post by cwbowron »

I did a small test with pspChess. I compiled the app-level stuff using -Os and -O2; the library was the standard library build. No background music. I tested using an 8 ply search and a 9 ply search, 3 runs per test, on the first move for black in reply to a2a4.

-Os (984,232 bytes)
8 ply: 50s, 53s, 50s => 51s average
9 ply: 5:45, 5:41, 5:27 => 5:38 average

-O2 (992,984 bytes)
8 ply: 45s, 44s, 45s => 45s average
9 ply: 4:37, 5:23, 4:44 => 4:55 average

-O3 (1,028,644 bytes)
8 ply: 47s, 47s, 50s => 48s average
9 ply: 5:13, 5:16, 4:34 => 5:01 average

Looks like about a 10-15% increase in speed using -O2 compared to -Os.

EDIT: added -O3 results
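
For anyone who wants to recheck the averages and the percentage above, here is a minimal Python sketch that recomputes them from the posted numbers. The timings and byte counts are copied from this post. The helper names (`to_seconds`, `mean_seconds`, `slowdown_vs_o2`, `size_change_vs_o2`) are made up for illustration and are not part of pspChess or PSPSDK.

```python
# Timings and sizes are copied from the post above; all names are illustrative.

def to_seconds(t):
    """Convert a timing written as '45s' or '5:45' into whole seconds."""
    if t.endswith("s"):
        return int(t[:-1])
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

def mean_seconds(runs):
    """Unrounded average of a list of timing strings, in seconds."""
    return sum(to_seconds(r) for r in runs) / len(runs)

results = {
    "-Os": {"bytes": 984232,
            "8 ply": ["50s", "53s", "50s"],
            "9 ply": ["5:45", "5:41", "5:27"]},
    "-O2": {"bytes": 992984,
            "8 ply": ["45s", "44s", "45s"],
            "9 ply": ["4:37", "5:23", "4:44"]},
    "-O3": {"bytes": 1028644,
            "8 ply": ["47s", "47s", "50s"],
            "9 ply": ["5:13", "5:16", "4:34"]},
}

def slowdown_vs_o2(flag, depth):
    """Extra run time of `flag` relative to -O2 at a search depth, in percent."""
    base = mean_seconds(results["-O2"][depth])
    return (mean_seconds(results[flag][depth]) / base - 1) * 100

def size_change_vs_o2(flag):
    """Executable size of `flag` relative to -O2, in percent."""
    base = results["-O2"]["bytes"]
    return (results[flag]["bytes"] / base - 1) * 100

for flag in ("-Os", "-O3"):
    for depth in ("8 ply", "9 ply"):
        print(f"{flag} {depth}: {slowdown_vs_o2(flag, depth):+.1f}% time vs -O2")
    print(f"{flag} size: {size_change_vs_o2(flag):+.2f}% vs -O2")
```

On these numbers, -Os takes roughly 14% longer than -O2 at both depths while saving under 1% in executable size. The 9 ply runs range from 4:37 to 5:23 within -O2 alone, so three runs per setting is a small sample.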