Fast stretching advice.


HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Hi,

I want to add fast image stretching to my emulator. I have thought of using the GU: rendering to a texture and then putting it on a quad. Does anybody know if this is the fastest way to do this?


Thanks in advance,
HexDump.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Can't think of a better way, esp if you already use the GU for some of the drawing.
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

No, I don't use it for anything right now, but I'm moving to it anyway. Thanks for the answer, Raphael, I only wanted to be sure.


HexDump.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Well, that's no problem either. If you (software) render to any screen buffer in system ram, you can use sceGuCopyImage to transfer it to VRAM quickly (150MB/s) and then use it as a texture for a quad.
Going for (mostly) complete hardware rendering and doing render-to-textures will probably be the fastest way though.

Good luck :)
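
To put the quoted 150MB/s figure into perspective, here is a rough back-of-the-envelope sketch of how long one upload would take at that rate. The function name `copy_time_us` is hypothetical, and it assumes "MB" means 10^6 bytes and that the transfer runs at a constant rate, which real hardware will not do exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated time in microseconds to move one image at a given rate.
   Assumption: "MB" means 10^6 bytes; 150 MB/s is the figure quoted above. */
uint32_t copy_time_us(uint32_t width, uint32_t height,
                      uint32_t bytes_per_pixel, uint32_t mb_per_s)
{
    assert(mb_per_s > 0);
    uint64_t bytes = (uint64_t)width * height * bytes_per_pixel;
    /* bytes / (mb_per_s * 10^6 bytes/s) seconds
       is the same as bytes / mb_per_s microseconds (rounded down). */
    return (uint32_t)(bytes / mb_per_s);
}
```

Under these assumptions, `copy_time_us(512, 512, 4, 150)` comes out at roughly 7 ms for a 512x512 32bit texture, and `copy_time_us(480, 272, 2, 150)` at under 2 ms for a 16bit screen-sized buffer.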
JoAl
Posts: 7
Joined: Mon Aug 15, 2005 5:11 am

Post by JoAl »

Raphael wrote: Going for (mostly) complete hardware rendering and doing render-to-textures will probably be the fastest way though.
Raphael, what do you mean by "mostly complete hardware rendering"?
Could you point me to some source code or a sample?
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

I was referring to HexDump's emulator, as it most likely only uses software rendering of tile sets and/or sprites at the moment (depending on what his emulator is). Drawing these tiles with textured quads using the GU was what I was calling "(mostly) complete hardware rendering" in his case (because the tiles themselves will have to be generated in software anyway).

As for general GU usage, just take a look at the SDK samples.
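
For the stretching part of the original question, the only thing the CPU has to compute for a textured quad is its two corners. Below is a minimal sketch of that calculation, preserving the aspect ratio and centering the image. The `SpriteVertex` layout and the name `make_stretch_quad` are hypothetical; the layout assumes 16bit texture and screen coordinates, which is how two-vertex sprites are typically fed to `sceGuDrawArray` with `GU_SPRITES`, but check the SDK samples for the exact vertex format flags.

```c
#include <assert.h>

/* Hypothetical vertex layout: 16bit texture coordinates, then 16bit
   screen position. Two such vertices describe one axis-aligned quad. */
typedef struct {
    unsigned short u, v;
    short x, y, z;
} SpriteVertex;

/* Fill out[0] (top-left) and out[1] (bottom-right) so that a src_w x src_h
   texture is stretched as large as possible into dst_w x dst_h without
   changing its aspect ratio, centered on the screen. */
void make_stretch_quad(SpriteVertex out[2], int src_w, int src_h,
                       int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

    int out_w, out_h;
    /* Compare dst_w/src_w with dst_h/src_h without floating point. */
    if ((long)dst_w * src_h <= (long)dst_h * src_w) {
        out_w = dst_w;                               /* width is the limit */
        out_h = (int)((long)src_h * dst_w / src_w);
    } else {
        out_h = dst_h;                               /* height is the limit */
        out_w = (int)((long)src_w * dst_h / src_h);
    }

    int x0 = (dst_w - out_w) / 2;
    int y0 = (dst_h - out_h) / 2;

    out[0] = (SpriteVertex){ 0, 0, (short)x0, (short)y0, 0 };
    out[1] = (SpriteVertex){ (unsigned short)src_w, (unsigned short)src_h,
                             (short)(x0 + out_w), (short)(y0 + out_h), 0 };
}
```

For a 256x224 source on the 480x272 screen, this gives a 310x272 image starting at x = 85. The scaling itself is then done by the GU when it draws the quad.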
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Good point Raphael, I will try that.

Edited: Raphael, have you done any speed tests comparing writing an image to VRAM directly (writing in a loop) against writing it to RAM first and then copying it to VRAM with the function you told me about? Do you know which is faster?

Thanks again,
HexDump.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

I just made a little test program to check on that, and it seems that writing directly to VRAM is a little more than twice as slow as using sceGuCopyImage when doing only straight copies of a 512x512x32bit texture from system RAM to VRAM.
That's no real-world performance measure, though, and there are further downsides to the direct copy:

1) you are using the CPU to copy the data from system ram to VRAM, which will lessen the CPU cycles available for other stuff

2) the copy also is asynchronous, meaning that while the copy is in progress, nothing else can be done. For sceGuCopyImage it's different: you can call the function at the start of your inner loop (calling sceGuFinish() directly afterwards, to get the command sent) and put a whole bunch of other code after it before you finally call sceGuSync(0,0) to wait for the blit to finish completely. The code and the copy will then run at the same time. For example, you could let the GU render the last tile while you upload the next one and at the same time do some calculations on the CPU.

3) you can only write to 32bit boundaries, meaning you can only copy whole ints and won't be able to write single bytes or words, and your VRAM pointer needs to be aligned correctly (which it should be anyway). This is no problem for 32bit images, nor for images whose width is a multiple of 4 (or 8 for 4bit images), but a major pain in the ass if you want to copy small 4 or 8bit images to destination offsets which don't fit the alignment (e.g. x coordinates that are not a multiple of 8 for 4bit). With sceGuCopyImage you can at least also copy 16bit images without any problem.

So better stick with using sceGuCopyImage when possible :)
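
The alignment rule in 3) can be expressed as a small check. This is only a sketch of the arithmetic described above, with a hypothetical function name, and it assumes every row of the destination starts on a 32bit boundary.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a horizontal span starting at pixel x and covering
   `width` pixels begins and ends on a 32bit boundary, so it could be
   copied using whole 32bit writes only.
   Assumption: each destination row itself starts 32bit-aligned. */
bool span_is_word_aligned(unsigned x, unsigned width, unsigned bits_per_pixel)
{
    assert(bits_per_pixel == 4 || bits_per_pixel == 8 ||
           bits_per_pixel == 16 || bits_per_pixel == 32);

    unsigned start_bit   = x * bits_per_pixel;
    unsigned length_bits = width * bits_per_pixel;

    return start_bit % 32 == 0 && length_bits % 32 == 0;
}
```

A 4bit image placed at x = 8 with a width of 16 passes the check, while the same image at x = 4 does not, which is the "non multiple-of-8" case mentioned above. 32bit images pass for any x and width.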
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Thanks again Raphael, just a correction: I think you wanted to say "synchronous" instead of "asynchronous".


Thanks in advance,
HexDump.
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Raphael, another thing is confusing me.

You wrote this:

"Well, that's no problem either. If you (software) render to any screen buffer in system ram, you can use sceGuCopyImage to transfer it to VRAM quickly (150MB/s) and then use it as a texture for a quad.
Going for (mostly) complete hardware rendering and doing render-to-textures will probably be the fastest way though. "


Errr, what do you mean by system RAM? From what I remember, you can map a RAM address (usually 0x4000000) to be the starting position of the screen. So everything is in RAM, isn't it? I think you don't have any specific VRAM memory. Sorry if this is a dumb question, but I have been away for 4 months...

Thanks in advance,
HexDump.
BlackDiamond
Posts: 16
Joined: Sat Jul 02, 2005 7:31 pm
Location: Paris, FRANCE

Post by BlackDiamond »

0x4000000 *is* the VRAM start address. It's part of the GE and *outside* the 32MB of system RAM (0x08000000 - 0x09ffffff).
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

HexDump wrote: Thanks again Raphael, just a correction: I think you wanted to say "synchronous" instead of "asynchronous".
No, it's really asynchronous in this case. Just search on Google for "asynchronous DMA" and you'll know exactly what it means :)
A synchronous transfer would be doing it with the CPU (direct copy).
HexDump wrote: Errr, what do you mean by system RAM? From what I remember, you can map a RAM address (usually 0x4000000) to be the starting position of the screen. So everything is in RAM, isn't it? I think you don't have any specific VRAM memory.
No, there is a 2MB eDRAM attached to the GU, which functions as dedicated VRAM. Its start address, as BlackDiamond said, is 0x4000000, and it reaches up to 0x4200000.
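
The two address ranges mentioned in this thread can be summed up in a small classifier. It is a sketch that uses only the numbers quoted above (2MB of VRAM starting at 0x04000000, 32MB of system RAM at 0x08000000 - 0x09ffffff); the names `MemRegion`, `classify_address` and `vram_offset` are hypothetical, and any other address ranges the hardware may have are simply reported as "other".

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_OTHER, REGION_VRAM, REGION_SYSTEM_RAM } MemRegion;

/* Ranges as quoted in the thread: VRAM is [0x04000000, 0x04200000),
   system RAM is [0x08000000, 0x09ffffff]. */
MemRegion classify_address(uint32_t addr)
{
    if (addr >= 0x04000000u && addr < 0x04200000u)
        return REGION_VRAM;
    if (addr >= 0x08000000u && addr <= 0x09ffffffu)
        return REGION_SYSTEM_RAM;
    return REGION_OTHER;
}

/* Byte offset of a VRAM address from the start of VRAM. */
uint32_t vram_offset(uint32_t addr)
{
    assert(classify_address(addr) == REGION_VRAM);
    return addr - 0x04000000u;
}
```

So a screen buffer at 0x4000000 lives in VRAM, while a buffer allocated by normal means in a homebrew program would fall into the system RAM range and would have to be transferred before the GU can use it as a VRAM texture.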
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Hehe ok, I misunderstood you :).

Greetings,
HexDump.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Ah hell, fell for my own error, it really should read synchronous in 2) since I was referring to the direct copy first.

That's what happens when you write a complete post about your findings, just to notice at the last minute that you mixed up the facts in the first place and have to switch good & bad in your write-up :(
HexDump
Posts: 70
Joined: Tue Jun 07, 2005 9:18 pm

Post by HexDump »

Hi Raphael,


No one is perfect :).

Thanks,
HexDump.