Help with converting direct vram writing to GU.

SamuraiX · Post by **SamuraiX** » Thu Aug 24, 2006 1:30 pm

I'm a complete newbie when it comes to video programming. I have been able to display images by writing directly to vram. However, I would like to get even better performance, and I really have no clue how. Would the people here help me out on my quest to use the graphics unit properly?

Here is my original code that writes to vram for my game:

Code: Select all

int video_copy_screen(s_screen* src)
{
	char *sp;
	Color *dp;

	int width, height;
	
	// Determine width and height
	width = screen_w;
	if(width > src->width) width = src->width;
	height = screen_h;
	if(height > src->height) height = src->height;

	if(!width || !height) return 0;

	// Copy to linear video ram
	sp = src->data;
	dp = getVramDisplayBuffer()+((SCREEN_WIDTH-screen_w)/2)+((SCREEN_HEIGHT-screen_h)/2)*LINESIZE;
	do{
		int x;
		for(x=0;x<width;x++) {
			dp[x] = palette[((int)(sp[x])) & 0xFF];
		}
		sp += src->width;
		dp += SCREEN_PITCH;
	}while(--height);

	if(pspFpsEnabled) getFPS();
	return 1;
}

And here was my attempt at converting the code to the GU:

Code: Select all

int video_copy_screen(s_screen* src)
{
	char *sp;
	Color *dp;
	int width, height;

	Image* image = (Image*) malloc(sizeof(Image));
	if (!image) return 0;

	// Determine width and height
	width = screen_w;
	if(width > src->width) width = src->width;
	height = screen_h;
	if(height > src->height) height = src->height;

	if(!width || !height) return 0;

	image->imageWidth = width;
	image->imageHeight = height;
	image->textureWidth = getNextPower2(width);
	image->textureHeight = getNextPower2(height);
	
	image->data = (Color*) memalign(16, image->textureWidth * image->textureHeight * sizeof(Color));
	if (!image->data) return 0;
	memset(image->data, 0, image->textureWidth * image->textureHeight * sizeof(Color));

	sp = src->data;
	dp = image->data + ((SCREEN_WIDTH-screen_w)/2)+((SCREEN_HEIGHT-screen_h)/2) * image->textureWidth;
	do{
		int x;
		for(x=0;x<width;x++) {
			dp[x] = palette[sp[x]];
		}
		sp += src->width;
		dp += image->textureWidth;
	}while(--height);

	if(pspFpsEnabled) getFPS();
	blitImageToScreen(0, 0, 480, 272, image, 0, 0);
	flipScreen();
	freeImage(image);
	return 1;
}

When writing to vram I get about 60~70 fps. With the new code I only get about 38 fps.

Can someone guide me in properly using the GU to get better performance than the original code?

blitImageToScreen, getNextPower2, flipScreen and freeImage are based on graphics.c from luaplayer.

s_screen is a struct that contains two ints for width and height and a char array for the data.
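
For readers who don't have graphics.c at hand: getNextPower2 is not reproduced in this thread. As a rough sketch of what a helper like that usually does (the name `next_pow2` and this implementation are assumptions, not the luaplayer code), it rounds a dimension up to the next power of two, which is why a 480x272 image ends up in a 512x512 texture:

```c
#include <assert.h>

/* Hypothetical helper: smallest power of two that is >= n (for n >= 1). */
unsigned int next_pow2(unsigned int n)
{
    unsigned int p = 1;
    assert(n >= 1 && n <= 0x80000000u);  /* result must fit in 32 bits */
    while (p < n)
        p <<= 1;
    return p;
}
```

With this, `next_pow2(480)` and `next_pow2(272)` both give 512.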

Jim · Post by **Jim** » Thu Aug 24, 2006 5:18 pm

Not really surprised. You're copying the entire frame from your palettised texture into a 32bit texture, then blitting that, instead of just copying it once. Plus you've added a bunch of dynamic memory allocation too.

To get the real speed you should store the palettised version at the right size in vram, and blit vram->vram.

Jim
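
To illustrate only the dynamic allocation point: a minimal sketch, assuming a fixed 512x512 destination texture, where the buffer is created once and reused every frame. The names `frame_tex` and `expand_indexed` are made up for this example and are not from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_W 512   /* texture width in pixels (power of two) */
#define TEX_H 512   /* texture height in pixels (power of two) */

/* Allocated once for the lifetime of the program, so the per-frame path
   never calls malloc/memalign/free. 16-byte alignment mirrors what the
   attempt above requested from memalign. */
_Alignas(16) uint32_t frame_tex[TEX_W * TEX_H];

/* Expand an 8-bit indexed image into frame_tex in a single pass.
   src_pitch is the source row length in bytes. */
void expand_indexed(const uint8_t *src, int src_pitch, int w, int h,
                    const uint32_t *pal)
{
    int x, y;
    assert(w > 0 && w <= TEX_W && h > 0 && h <= TEX_H);
    for (y = 0; y < h; y++) {
        uint32_t *dst = frame_tex + (size_t)y * TEX_W;
        for (x = 0; x < w; x++)
            dst[x] = pal[src[x]];   /* palette lookup per pixel */
        src += src_pitch;
    }
}
```

This removes the per-frame malloc/free, but it still does the CPU-side unpack that the reply above argues against, so it is only half of the fix.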

SamuraiX · Post by **SamuraiX** » Fri Aug 25, 2006 2:47 am

Jim wrote:To get the real speed you should store the palettised version at the right size in vram, and blit vram->vram.

OK then. So what I would need to do is create two vram pointers (using some GU function for sp, dp) and palettize src->data into the vram pointer, then use some GU function (like sceGuCopyImage) to blit from that vram pointer to vram?

Code: Select all


	sp = src->data;
	dp = ((SCREEN_WIDTH-screen_w)/2)+((SCREEN_HEIGHT-screen_h)/2)*LINESIZE;
	do{
		int x;
		for(x=0;x<width;x++) {
			dp[x] = palette[((int)(src->data[x])) & 0xFF];
		}
		src->data += src->width;
		dp += SCREEN_PITCH;
	}while(--height);

Thank you for your reply and help, Jim.

Jim · Post by **Jim** » Fri Aug 25, 2006 8:21 am

It looks like your source data is a palette index texture with a 32bit palette lookup table. You'd have to check, but I'm sure the PSP can handle this kind of format natively, so you don't need to do the unpacking yourself.

Jim

SamuraiX · Post by **SamuraiX** » Fri Aug 25, 2006 12:42 pm

You are correct, but the palette table is only 8-bit and I've been unsuccessful in getting the image to display natively without unpacking the data.

However, I've progressed a bit in trying the method you stated above: copying the main data without any dynamic memory allocation and blitting ram->vram.

Code: Select all

int video_copy_screen(s_screen* src)
{
   char *sp;
   char *dp;
   int width, height;

   // Determine width and height
   width = screen_w;
   if(width > src->width) width = src->width;
   height = screen_h;
   if(height > src->height) height = src->height;

   if(!width || !height) return 0;

   sp = src->data;
   dp = src->data + 512 * 272 * 2;
   
   int x,y; 

   for(y=0; y<height; y++){
      for(x=0;x<width;x++) {
         dp[x+512*272*2] = palette[sp[x]];
      }
      sp += src->width;
      dp += image->textureWidth;
   }

   if(pspFpsEnabled) getFPS();

   Color* vram = getVramDrawBuffer();
   sceKernelDcacheWritebackInvalidateAll();
   guStart();
   sceGuCopyImage(GU_PSM_8888, 0, 0, width, height, src->textureWidth, dp, 0, 0, LINESIZE, vram);
   sceGuFinish();
   sceGuSync(0,0);
   flipScreen();
   return 1;
}

Lastly, here is the image that is being displayed on the PSP.

SamuraiX · Post by **SamuraiX** » Sat Aug 26, 2006 10:54 am

OK... starting from scratch, I'm able to display my image perfectly! I found out how to use the GU functions by referencing the DoomPSP video implementation. But I'm not sure whether all the functions I'm using are necessary...

The src->data that I'm using is an 8-bit texture, and I think one way to increase performance is to lower the bit depth from 32 to 8. But I'm not sure what to do next...

Code: Select all

int video_copy_screen(s_screen* src)
{
	int width, height;
	
	// Determine width and height
	width = screen_w;
	if(width > src->width) width = src->width;
	height = screen_h;
	if(height > src->height) height = src->height;

	if(!width || !height) return 0;

	if(pspFpsEnabled) getFPS();

	sceKernelDcacheWritebackAll();
	sceGuStart(0,list);
	sceGuClearColor(0xff000000);
	sceGuClearDepth(0);
	sceGuClear(GU_COLOR_BUFFER_BIT|GU_DEPTH_BUFFER_BIT);
 
	sceGuClutMode(GU_PSM_8888,0,0xff,0); // 32-bit palette
	sceGuClutLoad((32),palette); // upload 32*8 entries (256)

	sceGuTexMode(GU_PSM_T8,0,0,0);  
	sceGuTexImage(0,512,512,width, src->data);
	sceGuTexFunc(GU_TFX_REPLACE,0);
	sceGuTexFilter(GU_LINEAR,GU_LINEAR);
	sceGuTexOffset(0,0);
	sceGuAmbientColor(0xffffffff);

	// render sprite

	sceGuColor(0xffffffff);
	struct Vertex *vertices = (struct Vertex*)sceGuGetMemory(2 * sizeof(struct Vertex));
	vertices[0].u = 0;
	vertices[0].v = 0;
	vertices[0].x = 0;
	vertices[0].y = 0;
	vertices[0].z = 0;
	vertices[1].u = width; 
	vertices[1].v = height;
	vertices[1].x = SCREEN_WIDTH; 
	vertices[1].y = SCREEN_HEIGHT; 
	vertices[1].z = 0;
	sceGuDrawArray(GU_SPRITES,GU_TEXTURE_32BITF|GU_VERTEX_32BITF|GU_TRANSFORM_2D,2,0,vertices);

	sceGuFinish();
	sceGuSync(0,0);
	
	sceGuSwapBuffers();
	
	return 1;
}

However, the performance is still lacking: I'm getting 50 fps at best. So my next step is to allocate my struct (screen) in video memory and blit from there to see if things speed up. Any recommendations? Thank you very much for your help so far, Jim!

Jim · Post by **Jim** » Sat Aug 26, 2006 5:12 pm

It's this 'src->data' that needs to be in vram for max speed. Unless it's changing dynamically, copy it into vram first, once only. Ideally swizzle it too.

Jim

SamuraiX · Post by **SamuraiX** » Sat Aug 26, 2006 8:04 pm

Jim, I want to start off by thanking you for all your help. And I have good news!

Previously the most I could get at 222 MHz was 35~40 fps, and at 333 MHz a solid 60 (these numbers would dip depending on the mod being used). But now I'm getting 105 fps and 165 fps at the respective CPU speeds. I never imagined using the GPU would make such a difference!

Just in case you were wondering where all this work was going... It's for my Beats of Rage/OpenBoR port.

Lastly, I tried to swizzle as well, but my image would look off horizontally... Not sure why this was happening.

Jim · Post by **Jim** » Sun Aug 27, 2006 12:08 pm

http://wiki.ps2dev.org/psp:ge_faq
You just need to make sure your textures are a multiple of 16 bytes wide and 8 rows high. Swizzled graphics are far faster than normal ones.
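
As a rough illustration of that block layout (a byte-wise sketch, not the optimized routine you will find elsewhere; the function name `swizzle` and its signature are made up here), a linear image is rearranged so that each 16-byte by 8-row block is stored contiguously:

```c
#include <assert.h>
#include <string.h>

/* Reorder a linear texture into 16-byte x 8-row blocks.
   width_bytes is the row length in bytes, not pixels. */
void swizzle(unsigned char *out, const unsigned char *in,
             unsigned int width_bytes, unsigned int height)
{
    unsigned int bx, by, row;
    /* The size rule quoted above: 16-byte columns, 8-row bands. */
    assert(width_bytes % 16 == 0 && height % 8 == 0);
    for (by = 0; by < height / 8; by++) {
        for (bx = 0; bx < width_bytes / 16; bx++) {
            /* Top-left byte of this block in the linear image. */
            const unsigned char *block = in + (by * 8) * width_bytes + bx * 16;
            for (row = 0; row < 8; row++) {
                memcpy(out, block + row * width_bytes, 16);
                out += 16;
            }
        }
    }
}
```

Note that the width is in bytes: for an 8-bit texture that equals the pixel width, but for a 32-bit texture it is four times the pixel width. The texture also has to be flagged as swizzled through the last argument of sceGuTexMode, otherwise the GE will read the rearranged data as if it were linear.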

Glad to hear things are moving along :D

Jim

SamuraiX · Post by **SamuraiX** » Mon Aug 28, 2006 8:27 am

I have tried to swizzle the graphics, and I'm seeing around 7~10 fps less than with them not swizzled.

chp · Post by **chp** » Mon Aug 28, 2006 4:28 pm

Briefly looking at your code I see that you copy the entire screen in one single sprite. This is not good for the GE cache as it has to refill many more times than you want it to. Try splitting the copy into slices of 32 source-pixels each. Take a look at the blit-sample in pspsdk if you need more information. With 8-bit source data, you should be able to hit around 1000 fps if going vram->vram or 500 fps from ram (not swizzled).
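
A minimal sketch of the slicing arithmetic, independent of the GU calls (the `Slice` type and `make_slices` are hypothetical names for this example; the real thing is in the pspsdk blit sample): the image is cut into vertical strips of at most 32 source pixels, and each strip becomes one sprite.

```c
#include <assert.h>

#define SLICE_W 32  /* source pixels per slice, as suggested above */

typedef struct {
    int u0, u1;  /* source column range [u0, u1) */
} Slice;

/* Fill `out` with the column ranges for an image `width` pixels wide.
   Returns the number of slices written (at most max_slices). */
int make_slices(int width, Slice *out, int max_slices)
{
    int n = 0, j = 0;
    assert(width >= 0 && max_slices >= 0);
    while (j < width && n < max_slices) {
        int w = SLICE_W;
        if (j + w > width)
            w = width - j;   /* last slice may be narrower */
        out[n].u0 = j;
        out[n].u1 = j + w;
        n++;
        j += w;
    }
    return n;
}
```

A 480-pixel-wide image gives 15 slices, so the cost is 15 small sprites per blit instead of one large one.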

SamuraiX · Post by **SamuraiX** » Tue Aug 29, 2006 6:52 am

I should have updated the post with the new code before stating 150 fps. But yeah, I was amazed how slicing could give such a boost (went from 50 fps to 150).

Now I have two questions. The first: should I move each slice into vram, since the whole image is too big to fit into vram?

The second question: for the life of me I can't figure out why I can't change GU_TEXTURE_32BITF | GU_VERTEX_32BITF to GU_TEXTURE_16BIT | GU_VERTEX_16BIT. When I do, all I see is a blank screen. Is it because I have not initialized the right settings?

Here is the initialization code:

Code: Select all


#define FRAMEBUFFER_SIZE (LINESIZE*SCREEN_HEIGHT*4)

void initGraphics()
{
        dispBufferNumber = 0;

        sceGuInit();

        guStart();
        sceGuDrawBuffer(GU_PSM_8888, (void*)FRAMEBUFFER_SIZE, LINESIZE);
        sceGuDispBuffer(SCREEN_WIDTH, SCREEN_HEIGHT, (void*)0, LINESIZE);
        sceGuClear(GU_COLOR_BUFFER_BIT | GU_DEPTH_BUFFER_BIT);
        sceGuDepthBuffer((void*) (FRAMEBUFFER_SIZE*2), LINESIZE);
        sceGuOffset(2048 - (SCREEN_WIDTH / 2), 2048 - (SCREEN_HEIGHT / 2));
        sceGuViewport(2048, 2048, SCREEN_WIDTH, SCREEN_HEIGHT);
        sceGuDepthRange(0xc350, 0x2710);
        sceGuScissor(0, 0, SCREEN_WIDTH, SCREEN_HEIGHT);
        sceGuEnable(GU_SCISSOR_TEST);
        sceGuAlphaFunc(GU_GREATER, 0, 0xff);
        sceGuEnable(GU_ALPHA_TEST);
        sceGuDepthFunc(GU_GEQUAL);
        sceGuEnable(GU_DEPTH_TEST);
        sceGuFrontFace(GU_CW);
        sceGuShadeModel(GU_SMOOTH);
        sceGuEnable(GU_CULL_FACE);
        sceGuEnable(GU_TEXTURE_2D);
        sceGuEnable(GU_CLIP_PLANES);
        sceGuTexMode(GU_PSM_8888, 0, 0, 0);
        sceGuTexFunc(GU_TFX_REPLACE, GU_TCC_RGBA);
        sceGuTexFilter(GU_NEAREST, GU_NEAREST);
        sceGuAmbientColor(0xffffffff);
        sceGuEnable(GU_BLEND);
        sceGuBlendFunc(GU_ADD, GU_SRC_ALPHA, GU_ONE_MINUS_SRC_ALPHA, 0, 0);
        sceGuFinish();
        sceGuSync(0, 0);

        sceDisplayWaitVblankStart();
        sceGuDisplay(GU_TRUE);
        initialized = 1;
}

Here is the blit function that I use now, with 32-pixel slices:

Code: Select all

void blitAlphaImageToScreen(int sx, int sy, s_screen* source, int dx, int dy)
{
        if (!initialized) return;

        sceKernelDcacheWritebackInvalidateAll();
        guStart();

        sceGuClutMode(GU_PSM_8888,0,0xff,0); // 32-bit palette
        sceGuClutLoad((32),palette); // upload 32*8 entries (256)

        sceGuTexMode(GU_PSM_T8,0,0,0);  
        sceGuTexImage(0,512,512,source->width, source->data);
        sceGuTexFunc(GU_TFX_REPLACE,GU_TCC_RGB);
        
        int j = 0;
        while (j < source->width) {
                Vertex* vertices = (Vertex*) sceGuGetMemory(2 * sizeof(Vertex));
                int sliceWidth = 32;
                if (j + sliceWidth > source->width) sliceWidth = source->width - j;
                vertices[0].u = sx + j;
                vertices[0].v = sy;
                vertices[0].x = dx + j;
                vertices[0].y = dy;
                vertices[0].z = 0;
                vertices[1].u = sx + j + sliceWidth;
                vertices[1].v = sy + source->height;
                vertices[1].x = dx + j + sliceWidth;
                vertices[1].y = dy + source->height;
                vertices[1].z = 0;
                sceGuDrawArray(GU_SPRITES, GU_TEXTURE_32BITF | GU_VERTEX_32BITF | GU_TRANSFORM_2D, 2, 0, vertices);
                j += sliceWidth;
        }

        sceGuFinish();
        sceGuSync(0, 0);
}

Thank you again for all your help!

Aion · Post by **Aion** » Tue Aug 29, 2006 9:51 am

Did you change the declaration of the "Vertex" structure to match the 16-bit vertex format?

And did you want to change the vertex color mode, or the vertex coordinates? Because "GU_VERTEX_32BITF" is used for coordinates, while something like "GU_COLOR_8888" is used for the color mode of the vertex.

SamuraiX · Post by **SamuraiX** » Tue Aug 29, 2006 11:23 am

Aion wrote: Did you change the declaration of the "Vertex" structure to match the 16-bit vertex format?

And did you want to change the vertex color mode, or the vertex coordinates? Because "GU_VERTEX_32BITF" is used for coordinates, while something like "GU_COLOR_8888" is used for the color mode of the vertex.

I didn't know that the vertex needs to be changed for 16-bit. But it does make sense, as I'm using...

Code: Select all

typedef struct
{
	float u,v;
	float x,y,z;
} Vertex;

Which must be for 32-bit mode. While I'm assuming...

Code: Select all

typedef struct
{
    unsigned short u, v;
    short x, y, z;
} Vertex;

Must be for 16-bit mode.

And yes, Vertex would be used for coordinates.

Thank you, Aion, for pointing this out!

**Updated** Yep that did it! But the performance increase wasn't much.

Aion · Post by **Aion** » Tue Aug 29, 2006 11:38 am

I'm not 100% certain, but it seems that using 16-bit fixed-point coordinates wouldn't increase performance greatly, since you do not have that many vertices to transfer each frame.

The point of going from 32 bits to 16 bits is to reduce memory transfer. So it would give a significant gain for vertex/texture color because of the amount of data involved, but not for the vertices, since in your case there are so few.
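
To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes the two Vertex layouts posted above and the 32-pixel slicing with two vertices per sprite; the type and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The two layouts from the post above, renamed so both can coexist. */
typedef struct { float u, v; float x, y, z; } VertexF32;
typedef struct { unsigned short u, v; short x, y, z; } VertexS16;

/* Bytes of vertex data for one full-width blit drawn as 32-pixel slices,
   two vertices per GU_SPRITES slice. */
size_t vertex_bytes_per_blit(int width, size_t vertex_size)
{
    int slices;
    assert(width > 0);
    slices = (width + 31) / 32;   /* round up to whole slices */
    return (size_t)slices * 2 * vertex_size;
}
```

For a 480-pixel-wide image that works out to about 600 bytes of float vertices versus 300 bytes of 16-bit vertices per blit, while the 8-bit texture itself is 480 * 272 = 130560 bytes. Halving the vertex data saves a few hundred bytes out of well over a hundred kilobytes.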

Btw, changing the vertex coordinates to 16 bits means they are now 1:15 fixed-point values (1 integer bit, 15 fractional bits).

Btw, why do you wish to have such a high refresh rate? Usually we lock at 60, since the PSP screen only refreshes 60 times per second, and we wait until it is not currently drawing to avoid graphical glitches (imagine that you have a red screen, then in the middle of the screen refresh you change it to blue: you'll end up with a red top half and a blue bottom half for a split second).

SamuraiX · Post by **SamuraiX** » Tue Aug 29, 2006 12:00 pm

Well, the goal was to increase the fps on my port. Previously I could attain 60 fps, but it would drop as low as 15 fps depending on how many objects were on screen.

I'm trying to reduce how often it drops. As for graphical glitches, there are none, surprisingly. But I found that if I lock the refresh rate to 60 fps (sceDisplayWaitVblankStart after each blit), there are times when the performance is worse than writing directly to vram!

However, I've never written any code for GPU processing before the PSP (except writing directly to vram). I appreciate all the help everyone has given. This has been a great learning experience!

Hopefully my questions sounded intelligent, to say the least. ;)

Jim · Post by **Jim** » Tue Aug 29, 2006 5:41 pm

sceDisplayWaitVblankStart after each blit

Surely you mean 'just before every call to sceGuSwapBuffers'? You definitely don't want to call that function after every blit!

Jim

Aion · Post by **Aion** » Tue Aug 29, 2006 10:51 pm

Yeah, after each buffer swap. I was thinking of it in terms of being done blitting to the backbuffer and then swapping :)

Tinnus · Post by **Tinnus** » Wed Aug 30, 2006 12:09 am

Not AFTER each buffer swap.

It should be BEFORE each buffer swap.

Aion · Post by **Aion** » Wed Aug 30, 2006 12:57 am

*sigh*

Sorry for the semantics: between the end of the backbuffer blitting and the swapping.

I did code it right :P

SamuraiX · Post by **SamuraiX** » Wed Aug 30, 2006 2:12 am

Jim wrote:
sceDisplayWaitVblankStart after each blit
Surely you mean 'just before every call to sceGuSwapBuffers'? You definitely don't want to call that function after every blit!

Jim

That's what I meant. I should have been more clear. Originally I placed it after every blit, just to see that it impacted the performance greatly (makes sense). I then re-ordered to the following...

First is sceDisplayWaitVblankStart
Second is sceGuSwapBuffers
Third is call blit function.

This way I will always blit to the back buffer, then on the next go-around it will wait for vblank and swap, then write to the back buffer again.

Tinnus · Post by **Tinnus** » Wed Aug 30, 2006 7:06 am

I think that could potentially cause problems like a 1-frame delay in the display. You should do:

- blit to the backbuffer
- WaitVBlankStart
- SwapBuffers
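
The order above can be sketched as follows. The three functions here are stand-ins so the sequence can be checked off-device; on the PSP they would be the blit routine, sceDisplayWaitVblankStart() and sceGuSwapBuffers(). All names in this block are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Records the order of calls as a string of letters. */
char call_log[64];

void log_call(char c)
{
    size_t n = strlen(call_log);
    assert(n + 1 < sizeof call_log);  /* keep room for the terminator */
    call_log[n] = c;
    call_log[n + 1] = '\0';
}

void blit_to_backbuffer(void) { log_call('B'); }  /* hypothetical stub */
void wait_vblank_start(void)  { log_call('V'); }  /* hypothetical stub */
void swap_buffers(void)       { log_call('S'); }  /* hypothetical stub */

/* One frame: draw, then wait for vblank, then flip. */
void present_frame(void)
{
    blit_to_backbuffer();
    wait_vblank_start();
    swap_buffers();
}
```

Each frame is drawn and then shown at the next vblank, rather than sitting in the back buffer until the following pass through the loop.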

SamuraiX · Post by **SamuraiX** » Fri Sep 01, 2006 5:32 am

I just wanted to thank all of you for your help. Everything is running great and fast!

Thank You... Jim!!!, Aion, chp and Tinnus

This thread can be closed now.