Profiling, optimizing and general education
Just a little background: I'm a developer looking for a new project, and I decided to see whether I could optimize snes9x TYL (a SNES emulator). I've read just about every publicly available document on MIPS32 assembly and dozens of articles/papers/wikis on PSP development. I've set up an environment in Ubuntu and managed to build snes9x. I'm still learning, though, and I'd like the advice of some of you professionals so I can start off in the right direction.
With that said I know there is a gprof tool for profiling PSP apps. That's my next step in figuring out how I can tighten the wheels on this emulator. Once I've identified the bottlenecks to the best of my ability I'll either be rearranging algorithms or writing faster code.
Where I'm looking for information is in writing faster code. I'd always rather write modules in C than hack around in assembly, but the aim of my project is breakneck speed (or at least 100% speed). At what point does assembly become the best option? Is there ever a time to second-guess the compiler, or does assembly make sense in certain situations? What have you experienced with your own applications?
Also if you have any advice about profiling I'd love to hear it.
Muchas gracias,
snow
			
			
					Last edited by snow on Sun Dec 16, 2007 12:13 pm, edited 1 time in total.
									
			
									
Well, you could profile your code and then look at the assembly listing of those CPU-time-eating functions. Compilers have become very good over the last couple of years, but sometimes they generate bloated code. With some common sense and x86 assembly knowledge you could write faster code.
Also, you could inline small, frequently called functions, but I assume you already know that. I've never written an emulator before, but maybe you could look at how you could use MMX or SSE instructions? Maybe something that takes the PSP several instructions could be done with SSE in one or two instructions. But like I said, I don't know how the emulator works, so I could be wrong here.
By the way, what do you need to profile psp-apps? I'd like to be able to profile my app too :)
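To make the inlining suggestion concrete, here is a minimal C sketch. The helper `clamp5` and the loop `brighten_row` are made-up names for illustration and are not taken from snes9x; the pixel layout (three 5-bit components packed into 16 bits) is likewise an assumption. Marking the small helper `static inline` lets the compiler paste its body into the hot loop instead of emitting a call per component.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a colour component to the 5-bit range 0..31.
   'static inline' is a hint that the body may be pasted into each caller,
   which avoids call/return overhead inside a hot loop. */
static inline uint16_t clamp5(int v)
{
    if (v < 0)  return 0;
    if (v > 31) return 31;
    return (uint16_t)v;
}

/* Hypothetical hot loop: brighten a row of 15-bit pixels
   (assumed layout: three 5-bit components in bits 0-4, 5-9, 10-14). */
void brighten_row(uint16_t *row, int n, int delta)
{
    assert(row != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        int c0 = (row[i] & 0x1F) + delta;
        int c1 = ((row[i] >> 5) & 0x1F) + delta;
        int c2 = ((row[i] >> 10) & 0x1F) + delta;
        row[i] = (uint16_t)(clamp5(c0) | (clamp5(c1) << 5) | (clamp5(c2) << 10));
    }
}
```

Whether the compiler actually inlines the helper depends on the optimization level, so it is worth checking the generated assembly for the loop rather than assuming it did.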
			
			
									
									
I have written a videogame in x86 assembly before, but the MIPS32 ISA (while similar to other assembly languages) is a different creature. I think MMX and SSE are x86 extensions and aren't available on MIPS, although I appreciate the suggestions.
I haven't had a chance to poke around with gprof, but that seems to be the tool included with the PSP SDK for profiling code. Once I've tried it out I'll report back. I haven't found a lot of information on the forum. When I try to call it on the command line it complains of a missing file, so I still have to tinker.
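Until gprof cooperates, one fallback is hand-rolled timing around suspected hot spots. The following is only a rough sketch using the standard C `clock()` function; the `Timer` type and the `timer_*` names are invented for this example, and it assumes `clock()` has usable resolution on the target, which would need checking on the PSP.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for one named region of code. */
typedef struct {
    const char *name;
    clock_t     total;    /* accumulated processor time, in clock ticks */
    unsigned    calls;    /* how many times the region was entered */
    clock_t     started;  /* timestamp taken by timer_begin */
} Timer;

static void timer_begin(Timer *t)
{
    t->started = clock();
    assert(t->started != (clock_t)-1);  /* clock() reports failure as -1 */
}

static void timer_end(Timer *t)
{
    t->total += clock() - t->started;
    t->calls++;
}

static void timer_report(const Timer *t)
{
    printf("%-12s calls=%u total=%.3f s\n", t->name, t->calls,
           (double)t->total / CLOCKS_PER_SEC);
}
```

Usage would be to wrap a suspect call, e.g. `timer_begin(&t); render_frame(); timer_end(&t);` (where `render_frame` stands in for whatever is being measured), and call `timer_report(&t)` on exit. This only measures what you already suspect, so it complements a real profiler rather than replacing it.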
			
			
									
									
Ah, my apologies.  That's exactly what I wrote and it's not what I had meant.  Didn't mean to crush anyone's hopes.  That would be impressive if I was only modifying one of my many PSP emulators  ;-P  I got so wound up in framing my question to elicit the best response that I did the exact opposite.
I'm picking up snes9x TYL to see if I can optimize it (assuming it isn't already optimised). The original authors aren't responding to emails so I'm going to re-learn what they already know. So I had intended to profile it to build some metrics before I dug in. Pending the results I'm either juggling algorithms or writing assembly.
			
			
									
									
snow wrote:
Ah, my apologies. That's exactly what I wrote and it's not what I had meant. Didn't mean to crush anyone's hopes. That would be impressive if I was only modifying one of my many PSP emulators ;-P I got so wound up in framing my question to elicit the best response that I did the exact opposite.
I'm picking up snes9x TYL to see if I can optimize it (assuming it isn't already optimised). The original authors aren't responding to emails so I'm going to re-learn what they already know. So I had intended to profile it to build some metrics before I dug in. Pending the results I'm either juggling algorithms or writing assembly.

Keep this news away from noobs until you get somewhere. Just a word of advice.
snow wrote:
Ah, my apologies. That's exactly what I wrote and it's not what I had meant. Didn't mean to crush anyone's hopes. That would be impressive if I was only modifying one of my many PSP emulators ;-P I got so wound up in framing my question to elicit the best response that I did the exact opposite.
I'm picking up snes9x TYL to see if I can optimize it (assuming it isn't already optimised). The original authors aren't responding to emails so I'm going to re-learn what they already know. So I had intended to profile it to build some metrics before I dug in. Pending the results I'm either juggling algorithms or writing assembly.

Cool. I THOUGHT you meant an emulator running on the PSP rather than the other thing. Look at the x86 ZSNES asm code included with SNES9x. That will show you what parts were converted to assembly on the x86 for speed. You'll find those files in the i386 directory of an official SNES9x source archive.
Nice. Looking at the latest code highlights the fact that the snes9x PSP port is working off a 5-year-old build (1.39 as opposed to 1.51). I'm wondering if I should bring it up to the latest build. It would probably do a lot for my education. It definitely won't be easy, given that diffs show these files as night and day.
Thanks again for the help.
			
			
									
									
I just wanted to chime in with my experience of optimising Daedalus for the PSP.
The only reason Daedalus is anywhere close to being playable is from having a good dynamic recompiler. The PSP port wasn't really a case of optimising any high-level algorithms, or even converting any specific code to assembly (although admittedly I had some small success from using the VFPU for TnL/clipping code). You can spend as much time as you like optimising the emulation of individual opcodes into MIPS assembly, but while you're still interpreting individual instructions, most of your CPU cycles are going to be taken up just fetching and decoding, rather than doing actual 'work'.
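To illustrate the fetch/decode point, here is a minimal interpreter loop in C for a toy CPU. The opcodes and the `ToyCpu` type are invented for this sketch and have nothing to do with the SNES's 65816 or with Daedalus's code. Note how every guest instruction pays for a memory fetch, a `switch` dispatch and a loop branch, while the "work" is a single assignment or add.

```c
#include <assert.h>
#include <stdint.h>

/* Toy guest CPU with invented opcodes (NOT the SNES 65816). */
enum { OP_HALT = 0, OP_LOAD_IMM = 1, OP_ADD_IMM = 2 };

typedef struct {
    uint8_t  a;   /* accumulator */
    uint16_t pc;  /* program counter */
} ToyCpu;

/* Classic interpreter: fetch, decode, execute, repeat.
   Returns the number of guest instructions executed. */
unsigned toy_interpret(ToyCpu *cpu, const uint8_t *mem)
{
    unsigned executed = 0;
    for (;;) {
        uint8_t op = mem[cpu->pc++];            /* fetch */
        executed++;
        switch (op) {                           /* decode */
        case OP_LOAD_IMM:                       /* execute: a = imm */
            cpu->a = mem[cpu->pc++];
            break;
        case OP_ADD_IMM:                        /* execute: a += imm */
            cpu->a = (uint8_t)(cpu->a + mem[cpu->pc++]);
            break;
        case OP_HALT:
            return executed;
        default:
            assert(!"unknown opcode");
            return executed;
        }
    }
}
```

Hand-tuning the bodies of the `case` labels in assembly leaves the fetch and dispatch overhead untouched, which is the limitation described above.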
I spent about a year re-writing Daedalus's dynarec so that it worked well on the PSP (and even now I'd say that it's only about halfway to where I want it to be). Daedalus typically spends a small fraction of its time in statically compiled code (i.e. Daedalus.elf) - by far most of the time is spent in code which is generated dynamically at runtime. What's particularly hard about optimising in this kind of situation is that it's not really about making your code generation fast, it's about making your generated code fast. Unfortunately that means you can't rely on the compiler to help you out at all here. Even profiling tools aren't much use, because the generated code is totally dependent on which rom is being emulated.
If I was looking at optimising an existing emulator for the PSP, I'd start by seeing what it was doing for code generation. If it's purely an interpreting emulator, adding dynarec will give you by far the biggest gain. If it's already got a dynarec core, I'd start by looking at the quality of the generated code to see how it could be improved.
Of course, I know very little about emulating the snes, so my advice might be totally void :)
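The translate-once, run-many idea behind a dynarec can be sketched portably in C. This is only a stand-in: instead of emitting native MIPS code it pre-decodes a block of guest instructions into an array of function pointers, so it is closer to a cached (threaded) interpreter than to a real recompiler. All names and opcodes here are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy guest CPU with invented opcodes (NOT the SNES 65816). */
enum { OP_HALT = 0, OP_LOAD_IMM = 1, OP_ADD_IMM = 2 };
typedef struct { uint8_t a; } ToyCpu;

/* One pre-decoded guest instruction: a handler plus its operand. */
typedef void (*Handler)(ToyCpu *cpu, uint8_t operand);
typedef struct { Handler fn; uint8_t operand; } DecodedOp;

static void do_load(ToyCpu *cpu, uint8_t v) { cpu->a = v; }
static void do_add(ToyCpu *cpu, uint8_t v)  { cpu->a = (uint8_t)(cpu->a + v); }

#define MAX_OPS 64
typedef struct { DecodedOp ops[MAX_OPS]; size_t count; } Block;

/* "Translate" once: decode guest bytes up to HALT into a Block.
   Returns 0 on success, -1 on an unknown opcode, truncated input
   or a block that does not fit. */
int block_translate(Block *b, const uint8_t *mem, size_t len)
{
    size_t pc = 0;
    b->count = 0;
    while (pc < len && mem[pc] != OP_HALT) {
        Handler fn;
        switch (mem[pc]) {
        case OP_LOAD_IMM: fn = do_load; break;
        case OP_ADD_IMM:  fn = do_add;  break;
        default:          return -1;
        }
        if (pc + 1 >= len || b->count == MAX_OPS)
            return -1;
        b->ops[b->count].fn = fn;
        b->ops[b->count].operand = mem[pc + 1];
        b->count++;
        pc += 2;
    }
    return 0;
}

/* Run as often as needed: no fetch or decode, only the work. */
void block_run(const Block *b, ToyCpu *cpu)
{
    assert(b->count <= MAX_OPS);
    for (size_t i = 0; i < b->count; i++)
        b->ops[i].fn(cpu, b->ops[i].operand);
}
```

A real dynarec goes further by replacing the function-pointer calls with generated native instructions, and that generated code is where the optimisation effort described above is spent.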
			
			
									
									
I haven't dug into the internals yet, but I think it's a static binary. I've put dynarec on the table as an option, but I'm going to try to avoid redesigning any of the core, as my understanding of the SNES and its many sub-processors is nothing beyond 40-50 white papers and technical specs. In another month or two I ought to have a build, so I can make a better guesstimate then (time permitting...work + kids == no time for PSP dev). Until then I'll be hanging out with the search page and Eclipse.
As far as building fast generated code goes, I imagine that profiling is your only route. While the generated code is entirely dependent on the ROM you're executing, you can't escape the fact that you'd need to account for every ROM anyway. But I speak out of ignorance of Daedalus's internal workings, so I can't really say. Maybe some day when I find the time I could take a look ;)
EDIT: Thanks again to all for the feedback. This is exactly what I was looking for.
			
			
									
									
snow wrote:
As far as building fast, generated code I imagine that profiling is your only route. While the generated code is entirely dependent on the ROM you're executing you can't escape the fact that you'd need to account for every ROM anyways. But I speak out of ignorance of Daedalus's internal workings so I can't really say. Maybe some day when I find the time I could take a look ;)

Feel free to dig through Daedalus's source for anything that might be useful. It's not the tidiest of code bases, but it's slowly getting more organised (slowly as in over a period of several years) :)
Certainly, if you decide to look at dynarec then Daedalus has a lot of handy code for assembling MIPS code on the fly, e.g. AssemblyWriterPSP.cpp, which you're welcome to use.
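For a feel of what assembling MIPS code on the fly involves, here is a small C sketch that encodes a few MIPS32 instructions into a buffer. It is not Daedalus's AssemblyWriterPSP interface; `CodeBuf` and the `emit_*` names are invented for this example. Only the instruction encodings (I-type and R-type field layouts, and the opcode/funct values for `addiu`, `addu` and `jr`) come from the MIPS32 instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MIPS register numbers used in the example. */
enum { REG_ZERO = 0, REG_V0 = 2, REG_A0 = 4, REG_RA = 31 };

/* Hypothetical code buffer (illustrative names, not Daedalus's API). */
typedef struct { uint32_t *code; size_t count, capacity; } CodeBuf;

static void emit(CodeBuf *b, uint32_t insn)
{
    assert(b->count < b->capacity);
    b->code[b->count++] = insn;
}

/* I-type layout: opcode(6) rs(5) rt(5) immediate(16). */
static uint32_t enc_i(uint32_t op, uint32_t rs, uint32_t rt, uint16_t imm)
{
    return (op << 26) | (rs << 21) | (rt << 16) | imm;
}

/* R-type layout with opcode 0: rs(5) rt(5) rd(5) shamt(5) funct(6). */
static uint32_t enc_r(uint32_t rs, uint32_t rt, uint32_t rd,
                      uint32_t shamt, uint32_t funct)
{
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* addiu rt, rs, imm  (opcode 0x09) */
void emit_addiu(CodeBuf *b, uint32_t rt, uint32_t rs, int16_t imm)
{
    emit(b, enc_i(0x09, rs, rt, (uint16_t)imm));
}

/* addu rd, rs, rt  (funct 0x21) */
void emit_addu(CodeBuf *b, uint32_t rd, uint32_t rs, uint32_t rt)
{
    emit(b, enc_r(rs, rt, rd, 0, 0x21));
}

/* jr rs  (funct 0x08) */
void emit_jr(CodeBuf *b, uint32_t rs)
{
    emit(b, enc_r(rs, 0, 0, 0, 0x08));
}

/* nop is the all-zero word (sll $zero, $zero, 0). */
void emit_nop(CodeBuf *b)
{
    emit(b, 0);
}
```

A generated function that returns its argument plus one would be `emit_addiu(&b, REG_V0, REG_A0, 1); emit_jr(&b, REG_RA); emit_nop(&b);`, with the `nop` filling the branch delay slot after `jr`. On real hardware the data cache must also be written back and the instruction cache invalidated before jumping into freshly written code; that step is platform-specific and omitted here.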

