Some suggestions

Discuss using and improving Lua and the Lua Player specific to the PSP.

Moderators: Shine, Insert_witty_name

nonarKitten
Posts: 4
Joined: Sat Aug 20, 2005 3:11 pm
Location: Calgary Canada
Contact:

Some suggestions

Post by nonarKitten »

I don't know how much overhead there is in Lua itself (i.e., how much optimizing the critical C functions would actually help), but skimming through the code I can see a lot of places where cycles are being burnt.

1. Because you're using both VRAM and system RAM for graphics, you have to needlessly check every time whether the operation is being performed against the screen (in VRAM) or against something else (in system RAM). Considering there is 2MB of VRAM, isn't it a little restrictive to use it exclusively for the ~256KB frame buffer? It would be a lot faster to use VRAM exclusively, and just raise an error if the user tries to load more graphics than memory permits.

Either that, or go the AMOS/Darkbasic way and start making more libraries that use and manage VRAM themselves (like sprites with collision detection and scrolling tilemapped playfields).

2. Every single graphics command, including pixel(), does bounds checking. While this is the safest practice for novice programmers, it adds a lot of overhead (e.g., compare filling the screen with fillScreenRect versus doing it a pixel at a time within Lua). Perhaps Blitz-like FastWritePixel and FastReadPixel functions could be made that skip the check and only draw to the screen (to also avoid the overhead of the VRAM-or-RAM test).

3. Unless perfectly optimized, MUL and DIV should be avoided in the inner loop of drawing functions. For example, in fillScreenRect, instead of computing PSP_LINE_SIZE * y for each pixel, you can precompute the number of pixels skipped per line (the line size minus the rect width) and add it at the end of each row.

Code: Select all

void fillScreenRect(u16 color, int x0, int y0, int width, int height)
{
	sceKernelDcacheWritebackInvalidateAll();
	if (!initialized) return;
	u16* vram = getVramDrawBuffer();
	int x, y;
	int xSkip, screenPos;
	xSkip = PSP_LINE_SIZE - width;
	screenPos = (PSP_LINE_SIZE * y0) + x0;

	for (y = 0; y < height; y++) {
		for (x = 0; x < width; x++) {
			vram[screenPos++] = color;
		}
		screenPos += xSkip;
	}
	sceKernelDcacheWritebackInvalidateAll();
}
Also <<1 is much faster than *2. Should test this, but it may be faster to do this for Y:

Code: Select all

((y << 9) - (y << 5))
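A quick way to see what the shift expression above computes: (y << 9) - (y << 5) is 512*y - 32*y, i.e. y * 480, the visible screen width. A 512-pixel line stride (the value used for PSP_LINE_SIZE_FP later in this thread) would be just y << 9. The helpers below are a plain-C sketch with hypothetical names, not Lua Player code, and whether either form beats the compiler's own multiply on the PSP is untested here.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of Lua Player. */

/* y * 480 using shifts: 512*y - 32*y. */
uint32_t mul480_shift(uint32_t y)
{
    return (y << 9) - (y << 5);
}

/* y * 512 using a single shift (512 == 1 << 9). */
uint32_t mul512_shift(uint32_t y)
{
    return y << 9;
}
```

Comparing both helpers against an ordinary multiply for every row from 0 to 271 is enough to confirm the identities before timing anything.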
Regarding drawLine, the fastest Bresenham code I know of is this. (Note: neither of these has actually been tested on the PSP, but they should work):

Code: Select all

    public void lineFast(int x0, int y0, int x1, int y1, Color color)
    {
        int pix = color.getRGB();
        int dy = y1 - y0;
        int dx = x1 - x0;
        int stepx, stepy;

        if (dy < 0) { dy = -dy;  stepy = -raster.width; } else { stepy = raster.width; }
        if (dx < 0) { dx = -dx;  stepx = -1; } else { stepx = 1; }
        dy <<= 1;
        dx <<= 1;

        y0 *= raster.width;
        y1 *= raster.width;
        raster.pixel[x0+y0] = pix;
        if (dx > dy) {
            int fraction = dy - (dx >> 1);
            while (x0 != x1) {
                if (fraction >= 0) {
                    y0 += stepy;
                    fraction -= dx;
                }
                x0 += stepx;
                fraction += dy;
                raster.pixel[x0+y0] = pix;
            }
        } else {
            int fraction = dx - (dy >> 1);
            while (y0 != y1) {
                if (fraction >= 0) {
                    x0 += stepx;
                    fraction -= dy;
                }
                y0 += stepy;
                fraction += dx;
                raster.pixel[x0+y0] = pix;
            }
        }
    }
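Since the listing above is Java, here is a rough C port of the same routine as a sketch. It assumes a 16-bit pixel buffer and a caller-supplied row stride; the name line_fast and its parameters are made up for illustration and are not Lua Player's drawLine. Like the original, it does no clipping.

```c
#include <stdint.h>

/* Draws a line into a 16-bit pixel buffer whose rows are `stride` pixels apart.
 * No clipping: the caller must keep both endpoints inside the buffer. */
void line_fast(uint16_t *pixels, int stride,
               int x0, int y0, int x1, int y1, uint16_t color)
{
    int dy = y1 - y0;
    int dx = x1 - x0;
    int stepx, stepy;

    if (dy < 0) { dy = -dy; stepy = -stride; } else { stepy = stride; }
    if (dx < 0) { dx = -dx; stepx = -1; } else { stepx = 1; }
    dy <<= 1;
    dx <<= 1;

    y0 *= stride;   /* from here on y0 and y1 are row offsets, not row numbers */
    y1 *= stride;
    pixels[x0 + y0] = color;

    if (dx > dy) {
        /* x is the major axis: step x every iteration, y only when the error overflows */
        int fraction = dy - (dx >> 1);
        while (x0 != x1) {
            if (fraction >= 0) { y0 += stepy; fraction -= dx; }
            x0 += stepx;
            fraction += dy;
            pixels[x0 + y0] = color;
        }
    } else {
        /* y is the major axis */
        int fraction = dx - (dy >> 1);
        while (y0 != y1) {
            if (fraction >= 0) { x0 += stepx; fraction -= dy; }
            y0 += stepy;
            fraction += dx;
            pixels[x0 + y0] = color;
        }
    }
}
```

Only additions, shifts and compares remain inside the loops; the two multiplies by the stride happen once, before drawing starts.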
4. How about poke- and peek-like functions (even if it's just for the screen)? If I wanted to, say, fill the screen with random dots, it'd be faster to do 'for i=vramDrawBuffer,vramDrawBuffer+130559 do; poke(i, random(0,65535)); end'. This, of course, still depends on how fast random() is...

5. Does Luaplayer support Lua binary modules? It may be an interesting way to go, since different modules could then be created for different needs. I just don't know if the PSP's DRM will stop this possibility. For example:

ok, str = loadmodule("luadebuggfx") loads standard Lua graphics routines

ok, str = loadmodule("luafastgfx") loads graphics routines that are VRAM only and have all bounds (x<0) checking removed for speed.
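
Suggestion 2 above (a checked and an unchecked pixel writer side by side) could look roughly like this in plain C. The Surface struct and both function names are hypothetical and only illustrate the trade-off; Lua Player's real Image struct and pixel function differ.

```c
#include <stdint.h>

/* Hypothetical surface description, for illustration only. */
typedef struct {
    int width;       /* visible pixels per row */
    int height;      /* number of rows */
    int stride;      /* pixels between the starts of consecutive rows */
    uint16_t *data;  /* pixel storage, at least stride * height entries */
} Surface;

/* Safe variant: ignores out-of-range coordinates. Returns 1 if a pixel was written. */
int put_pixel_checked(Surface *s, int x, int y, uint16_t color)
{
    if (x < 0 || y < 0 || x >= s->width || y >= s->height) return 0;
    s->data[y * s->stride + x] = color;
    return 1;
}

/* Fast variant: no checks at all. The caller must guarantee that
 * 0 <= x < width and 0 <= y < height, or memory will be corrupted. */
void put_pixel_fast(Surface *s, int x, int y, uint16_t color)
{
    s->data[y * s->stride + x] = color;
}
```

The fast variant only pays off if the per-call overhead outside the C function is small, which is worth measuring before committing to two APIs.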
chaos
Posts: 135
Joined: Sun Apr 10, 2005 5:05 pm

Post by chaos »

some great suggestions, makes me want to compile my own version of luaplayer. :)
Chaosmachine Studios: High Quality Homebrew.
Shine
Posts: 728
Joined: Fri Dec 03, 2004 12:10 pm
Location: Germany

Post by Shine »

1. Yes, using VRAM is a good idea, feel free to enhance the graphics functions to cache images in VRAM :-)

2. The checks for valid x and y positions are very fast compared to the conversion from double to int when calling the function from Lua (every number in Lua is a double), so removing them would not be much faster.

3. I've optimized it, but it looks like it is only faster when accessing main memory. When accessing VRAM it is not faster, perhaps because VRAM is not cached, so the CPU mainly waits for memory access and the calculation doesn't hurt. But accessing main memory is now twice as fast.

Your draw line function is not much faster, about 15%, but I've included it.

4. poke and peek would compromise security, so I won't add them. And they would not be faster, because the double numbers have to be converted the same way as in pixel(x, y, color).

5. loadmodule could compromise security, too.
nonarKitten
Posts: 4
Joined: Sat Aug 20, 2005 3:11 pm
Location: Calgary Canada
Contact:

Post by nonarKitten »

I'd love to try compiling my own version for experiment, but haven't quite figured out how to do so on my Mac just yet.

Edit: I've read somewhere that double-precision really hurts on the MIPS3 architecture. It might be beneficial to go to single, if possible.
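
One thing worth checking before switching lua_Number to single precision is whether pixel coordinates and buffer offsets survive the conversion. Assuming float is an IEEE-754 32-bit value (as it is on common platforms), every integer up to 2^24 = 16,777,216 is represented exactly, which comfortably covers a 480x272 screen (130,560 pixels). A tiny C sketch, with a hypothetical helper name:

```c
/* Returns 1 if n survives an int -> float -> int round trip unchanged.
 * Hypothetical helper for illustration; assumes IEEE-754 single precision. */
int float_roundtrip_exact(int n)
{
    volatile float f = (float)n;  /* volatile forces a real single-precision store */
    return (int)f == n;
}
```

Values above 2^24 start losing their low bits, so large counters or timestamps kept in Lua numbers would be the first thing to break, not screen coordinates.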

While integers might be better for some things, fp mul and div are faster than integer mul and div. Where they're used may affect speed (for example, it may be better to multiply the Y by PSP_LINE_SIZE as a float before converting to an integer), e.g. (my C memory is rather rusty, so I can't say if this is correct or not):

Code: Select all

(in lua.h)
/* type of numbers in Lua */
#ifndef LUA_NUMBER
typedef float lua_Number;   /* C has no "single" type; float is single precision */
#else
typedef LUA_NUMBER lua_Number;
#endif

(elsewhere...)

#define PSP_LINE_SIZE_FP 512.0f
/* line stride * height; 480 * 272 (130560) would cut off the bottom rows,
   because xpos below is computed with the 512-pixel stride */
#define SCREEN_AREA (512 * 272)

static u16 *VramDrawBuffer;   /* cached draw-buffer pointer, refreshed on every flip */

void flipScreen() {
	if (!initialized) return;
	sceGuSwapBuffers();
	sceDisplaySetFrameBuf(VramDrawBuffer, PSP_LINE_SIZE, 1, 1);
	dispBufferNumber ^= 1;
	VramDrawBuffer = getVramDrawBuffer();
}

void initGraphics() {
	...
	VramDrawBuffer = getVramDrawBuffer();
	...
}

typedef struct
{
	int textureWidth; 
	lua_Number textureWidth_fp; // same as above, in floating point
	int imageWidth;  // the image width
	int imageHeight;
	int imageArea; // textureWidth * imageHeight precomputed (xpos uses the texture stride)
	u16* data;
} Image;

static int Image_pixel (lua_State *L) {
	int argc = lua_gettop(L);
	if(argc != 3 && argc != 4) return luaL_error(L, "Image:pixel(x, y, [color]) takes two or three arguments, and must be called with a colon.");
	SETDEST
	// Stack index 1 is the image itself (colon call), so x, y, color are at 2, 3, 4.
	lua_Number ypos = luaL_checknumber(L, 3);
	ypos *= (dest)?dest->textureWidth_fp:PSP_LINE_SIZE_FP;

	int x = luaL_checkint(L, 2);
	int xpos = x + (int)ypos;
	if(x<0 || xpos<0) return 0;

	int color = (argc == 4)?*toColor(L, 4):0;

	if(dest) {
		if (x<dest->imageWidth && xpos<dest->imageArea) {
			if(argc==3) {
				*pushColor(L) = dest->data[xpos];
				return 1;
			} else {
				dest->data[xpos]=color;
				return 0;
			}
		}

	} else {
		if (x<SCREEN_WIDTH && xpos<SCREEN_AREA) {
			if(argc==3) {
				*pushColor(L) = VramDrawBuffer[xpos];
				return 1;
			} else {
				VramDrawBuffer[xpos] = color;
				return 0;
			}
		}
	}
	return 0; // out of range: nothing read or written
}
P.S. I'm rather obsessed with speeding up the pixel functions to make starfields smooth. Anyway, that's enough for now, back to trying to get the toolchain working...
"Knowledge without wisdom is like a bunch of books strapped on the back of an ass."
nevyn
Posts: 136
Joined: Sun Jul 31, 2005 5:05 pm
Location: Sweden
Contact:

Post by nevyn »

nonarKitten wrote:I'd love to try compiling my own version for experiment, but haven't quite figured out how to do so on my Mac just yet.
Just open up the Terminal and type the following:
curl -O http://www.oopo.net/consoledev/files/ps ... 050801.tgz
tar -xzvf psptoolchain-20050801.tgz
cd psptoolchain
sudo ./toolchain.sh
mkdir -p ~/Projects/PSP && cd ~/Projects/PSP
svn checkout svn://svn.pspdev.org/pspware/trunk/LuaPlayer

That should work. You'll need the Developer Tools installed, of course... And svn. And wget from http://macosx.forked.net/download.php?j ... .1.pkg.tgz . (Don't mess with fink or darwinports, unless you /like/ torture.) IM me if that still doesn't work :)
nonarKitten
Posts: 4
Joined: Sat Aug 20, 2005 3:11 pm
Location: Calgary Canada
Contact:

Post by nonarKitten »

Yeehaw! Got it working - now I can profile all my wacky ideas before bothering anyone. Thanks! Svn is the key - it gets everything I need all at once instead of compile-what-am-I-missing-go-get-and-recompile...
Shine
Posts: 728
Joined: Fri Dec 03, 2004 12:10 pm
Location: Germany

Post by Shine »

nonarKitten wrote:I've read somewhere that double-precision really hurts on the MIPS3 architecture. It might be beneficial to go to single, if possible.
Yes, you are right. I've changed it, and calculating 10000 sin values and adding them up is now 4.3 times faster than before.
nonarKitten wrote: While integers might be better for some things, fp mul and div are faster than integer mul and div. Where they're used may affect speed (for example it may be better to multiply the Y with PSP_LINE_SIZE as a float before converting to an integer.) e.g., (my C memory is rather rusty, so I can't say if this is correct or not).
I don't think that a floating point mul is faster than an integer mul, but if you measure it, I'll change it.
nonarKitten wrote: P.S. I'm rather obsessed with speeding up the pixel functions to make starfields smooth.
I don't know why, but it looks like the conversion from float to int for the pixel function is much faster now, or perhaps the new PSPSDK and compiler changes are faster. With version 0.9 I could display fewer than 30 stars in one vsync, but with the new version more than 200 stars are no problem. I've ported my PS2 starfield code (which was originally written by Sjeep) for testing:

Code: Select all

size = 200
zMax = 5
speed = 0.1

width = 480
height = 272

starfield = {}
math.randomseed(os.time())

function createStar(i)
	starfield[i] = {}
	starfield[i].x = math.random(2*width) - width
	starfield[i].y = math.random(2*height) - height
	starfield[i].z = zMax
end

for i = 1, size do
	createStar(i)
	starfield[i].z = math.random(zMax)
end

white = Color.new(255, 255, 255)
black = Color.new(0, 0, 0)

while true do
	screen:clear(black)
	for i = 1, size do
		starfield[i].z = starfield[i].z - speed
		if starfield[i].z < speed then createStar(i) end
		x = width / 2 + starfield[i].x / starfield[i].z
		y = height / 2 + starfield[i].y / starfield[i].z
		if x < 0 or y < 0 or x >= width or y >= height then
			createStar(i)
		else
			screen:pixel(x, y, white)
		end
	end
	screen:print(272, 264, "Starfield for PSP by Shine", white)
	screen.waitVblankStart()
	screen.flip()
	if Controls.read():start() then break end
end
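
The core of the starfield above is a perspective divide: each star's x and y are divided by its depth z and offset to the screen centre, so stars drift outward as z shrinks. Here is a minimal C sketch of just that step, for anyone wanting to time it outside Lua. The Star struct and project_star are hypothetical names, not part of Lua Player.

```c
/* Screen size used by the Lua script. */
#define WIDTH  480
#define HEIGHT 272

/* Hypothetical star record mirroring the Lua table fields x, y, z. */
typedef struct { float x, y, z; } Star;

/* Projects a star to screen coordinates.
 * Returns 1 and fills *sx, *sy if the star is on screen, 0 otherwise.
 * Assumes s->z > 0, as the Lua loop guarantees by respawning stars first. */
int project_star(const Star *s, int *sx, int *sy)
{
    float px = WIDTH / 2 + s->x / s->z;
    float py = HEIGHT / 2 + s->y / s->z;
    if (px < 0 || py < 0 || px >= WIDTH || py >= HEIGHT) return 0;
    *sx = (int)px;
    *sy = (int)py;
    return 1;
}
```

A return value of 0 corresponds to the branch where the Lua loop calls createStar(i) again instead of drawing.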