Slow PNG encode? Memory stick issue?

bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Hi,

I am new to homebrewing on the PSP. I have written code that takes a snapshot of the current display and saves it as a PNG.

I have compiled libpng 1.2.8 and zlib 1.2.3 without any modification to the source code.

When I take a snapshot it takes approx. 10 seconds to finish the job. I thought the PSP was faster than this.

I believe access to the memory stick is slowing things down. I did some debugging, tracing to a file each time a scan line gets processed by libpng. The log function is simple:

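void Log(const char *s)
{
    /* Reopen and close on every call so the text reaches the file even if we crash later. */
    FILE *fp = fopen("trace.txt", "a");
    if (fp == NULL)
        return;
    fputs(s, fp);
    fclose(fp);
}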

The encoding process takes 2 and a half minutes when the logging is turned on!!

Is the memory stick really that slow? Any suggestions on how to speed up the encoding process?

Thanx
Fanjita
Posts: 217
Joined: Wed Sep 28, 2005 9:31 am

Post by Fanjita »

It is quite slow, but not usually that slow. What seems to hurt the speed the most is doing lots of small individual writes - especially if you close the file in between.

Tracing by closing and reopening the file is painfully slow - unfortunately it seems to be necessary if you want to be sure the file is flushed, in case of a crash.
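
A possible middle ground, as a sketch only: keep the trace file open and call fflush() after every line instead of reopening it. This is plain standard C; whether fflush() alone pushes the data all the way to the memory stick on the PSP is an assumption that would need testing, and trace_open, trace_line and trace_close are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper names; not part of the PSPSDK. */
static FILE *g_trace = NULL;

/* Open the trace file once, in append mode. Returns 0 on success, -1 on failure. */
int trace_open(const char *path)
{
    g_trace = fopen(path, "a");
    return g_trace != NULL ? 0 : -1;
}

/* Write one line and flush stdio's buffer immediately.
   Returns 0 on success, -1 if the file is not open or a write fails. */
int trace_line(const char *s)
{
    assert(s != NULL);
    if (g_trace == NULL)
        return -1;
    if (fputs(s, g_trace) == EOF || fputc('\n', g_trace) == EOF)
        return -1;
    return fflush(g_trace) == 0 ? 0 : -1;
}

/* Close the trace file; safe to call when nothing is open. */
void trace_close(void)
{
    if (g_trace != NULL) {
        fclose(g_trace);
        g_trace = NULL;
    }
}
```

If a crash still loses the last lines with this approach, the close-and-reopen version remains the safer (and slower) fallback.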

To speed up your app, you could try writing the PNG file data to memory first, rather than to file. Assuming that you're using libpng, this is fairly easy to do: png_set_write_fn() lets you register your own implementation of the writing function, which can write to memory rather than to disk.
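
A minimal sketch of the "memory first" idea, assuming a simple growable byte buffer is enough. MemSink, memsink_append and memsink_free are made-up names for illustration, not libpng API, and the 4096-byte starting size is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; start from a zero-initialised struct. */
typedef struct {
    unsigned char *data;
    size_t size;      /* bytes used */
    size_t capacity;  /* bytes allocated */
} MemSink;

/* Append len bytes, doubling the allocation as needed.
   Returns 0 on success, -1 if memory runs out (existing contents are kept). */
int memsink_append(MemSink *sink, const void *bytes, size_t len)
{
    assert(sink != NULL);
    if (len == 0)
        return 0;
    if (len > sink->capacity - sink->size) {
        size_t new_cap = sink->capacity ? sink->capacity : 4096;
        unsigned char *p;
        while (new_cap - sink->size < len) {
            if (new_cap > (size_t)-1 / 2)
                return -1;              /* doubling would overflow */
            new_cap *= 2;
        }
        p = (unsigned char *)realloc(sink->data, new_cap);
        if (p == NULL)
            return -1;
        sink->data = p;
        sink->capacity = new_cap;
    }
    memcpy(sink->data + sink->size, bytes, len);
    sink->size += len;
    return 0;
}

/* Release the buffer and reset the sink to its empty state. */
void memsink_free(MemSink *sink)
{
    free(sink->data);
    sink->data = NULL;
    sink->size = sink->capacity = 0;
}
```

With libpng, the write callback registered through png_set_write_fn() would fetch the MemSink pointer with png_get_io_ptr() and pass its data and length arguments to memsink_append(); once encoding is done, the whole buffer can go to the memory stick in a single write.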

You might also look into the async file I/O functions - e.g. sceIoWriteAsync, as these let your program continue execution before the I/O is complete.

Finally, you could try doing stuff like disabling interrupts, or copying the screen to a buffer before encoding it, to try to reduce any visual glitches from screen updates before you finish encoding.
Got a v2.0-v2.80 firmware PSP? Download the eLoader here to run homebrew on it!
The PSP Homebrew Database needs you!
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Fanjita,

Thanks for your tips.

Too bad logging like this is so slow, it makes logging useless, because I want to know what was the last thing that was executed before crash, not what was the last thing that was unbuffered from stream. :(

I'll try writing the PNG file to memory first and then write it to the stream in one go.

Bulb
nullp01nter
Posts: 26
Joined: Wed Jan 04, 2006 7:40 am
Location: Saxony/Germany

Post by nullp01nter »

@bulb:
You may want to try PSPLINK to have proper logging support. You either have to set up a Wifi network for this or get (or build) a SIO to PC converter. You can then use printf() to print logging statements to stdout and they will appear on your serial console or telnet. Have a look here: http://forums.ps2dev.org/viewtopic.php?t=3834

Thoralt
Last edited by nullp01nter on Fri Jan 20, 2006 9:04 am, edited 1 time in total.
TyRaNiD
Posts: 907
Joined: Sun Jan 18, 2004 12:23 am

Post by TyRaNiD »

And psplink has an inbuilt screenshot command (admittedly in .bmp only, but that is what graphics converters are for :P).

At any rate, it will be because you are writing in small chunks: the kernel's io functions do almost zero buffering. It takes maybe a second or so to write out an uncompressed bitmap (nearly 400k) from psplink, which builds it in memory first and fires off the entire lot to the ms.
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Nullpointer and Tyranid,

Thanks for your tips. I have read the discussion about PSPLink and I think the stuff is cool. I will probably switch to wifi logging (mainly because I lack electronics knowledge to build a special serial cable and because I already have a router :).

Anyway, I thought access to memory stick would outperform WIFI or serial access.

As I said, first I have to encode the PNG in memory and then write the whole chunk to the memory stick. If file access is really such a burden on the system, then writing compressed data is the better thing to do (PNG vs BMP).
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

I've now done a simple test that just writes 100 Kbytes to the memory stick. I have written it:

a. in one single block and
b. one byte at a time (using fwrite() 100000 times)

Both methods took nearly the same time. The problem is that they take about 40 seconds to complete! This is 2.5 Kb/sec.
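
For anyone wanting to repeat the test, a rough sketch of the two pieces involved: the rate arithmetic (100000 bytes in 40 seconds, with 1 KB taken as 1000 bytes) and a loop that writes a buffer in pieces of a chosen size. kb_per_sec and write_in_chunks are hypothetical helper names; the timing itself is left to the caller and is not shown.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput in KB/s, taking 1 KB = 1000 bytes as in the post above. */
double kb_per_sec(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1000.0 / seconds;
}

/* Write total bytes from buf to f in pieces of at most chunk bytes.
   chunk == total gives the "one single block" case, chunk == 1 the
   "one byte at a time" case. Returns the number of bytes written. */
size_t write_in_chunks(FILE *f, const unsigned char *buf, size_t total, size_t chunk)
{
    size_t done = 0;
    assert(f != NULL && chunk > 0);
    while (done < total) {
        size_t n = total - done < chunk ? total - done : chunk;
        size_t w = fwrite(buf + done, 1, n, f);
        done += w;
        if (w != n)
            break;   /* short write: stop and report what got through */
    }
    return done;
}
```

kb_per_sec(100000, 40.0) gives 2.5, matching the figure above.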

Is this rate OK or is there something wrong with my memory stick or (hopefully not) with my PSP?

If this is the normal rate, then I guess I should better move my file I/O to different thread (assuming that main thread can run, when worker thread is blocked with I/O operation). Currently I only use one thread. Could this be my sole problem?

Thanx
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Everyone,

I was looking through some example code and no one uses another thread for file IO, so this must not be an issue.

I experimented with another memory stick, which proved faster: doing 100 Kb writes in 7 seconds, instead of 40 seconds. Still way too slow. Interesting note: both memory sticks are from Sony and have 1 Gb capacity.

I was using slower memory stick in further examination.

Original code giving 2.5 Kb/s rate:

u8 buffer[100000];
FILE *f = fopen("test.bin", "wb");
fwrite(buffer, 100000, 1, f);
fclose(f);

Modified code giving 150 Kb/s rate:

u8 buffer[100000];
int f = sceIoOpen("test.bin", PSP_O_WRONLY | PSP_O_CREAT, 0777);
sceIoWrite(f, buffer, 100000);
sceIoClose(f);

Seems using Sony's file IO functions directly gives much better performance than C runtime library. This seems strange, because CRT is just a wrapper around Sony's file IO. Seems like a bug of CRT. Sadly I can't check the sources, because I am using devkitpro R6, which comes precompiled.

Newly released MSTest v1.0 also gives such transfer rate, with various buffer sizes it was possible to squeeze it to max 170 Kb/s.

I am still not completely satisfied though, because USB copy to the memory stick gives me around 2500 Kb/s. However, I have run out of ideas for how to get a better transfer rate.

I would really like to hear your experience with write transfer rate.
Fanjita
Posts: 217
Joined: Wed Sep 28, 2005 9:31 am

Post by Fanjita »

Don't forget that the USB transfer is buffered asynchronously in RAM - if you're doing one single file write, you will get large transfer speeds.

You'll notice the true speed if you try writing to multiple files, it seems to wait for the async completion before moving between files.
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Fanjita,

Are you saying that IO Async equivalents are synthetically faster? That Async function first copies everything in RAM, returns immediately and in the background flushes that buffer?

Sadly I can't test Async functions, because my wife is currently hooked with Lumines. :)

Though I have some doubts. USB transfer could be reporting fake transfer rates, but when copying very large files (100 Mb and more) I don't see a drop in the transfer rate, and there should be one, because there is no memory in the PSP to hold such a large buffer.
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Ok, I have played with the IO async functions and they do not work any faster. Unless I call sceIoWaitAsync() the data is not written to the memory stick at all. When it is written to the memory stick, it takes the same amount of time as the equivalent sync function does. Funny thing is that the file is nowhere to be found, even though the memory stick was accessed and I called sceIoWaitAsync() before sceIoCloseAsync().

So, I still wonder how come the USB transfer rate is 2500 Kb/sec, but my code and any other program only achieves 170 Kb/sec. USB is more than 14 times faster! I know the memory stick can't be as slow as 170 Kb/sec.
jimparis
Posts: 1145
Joined: Fri Jun 10, 2005 4:21 am
Location: Boston

Post by jimparis »

I can easily get 15MB/s on mine using read(). Wasn't interested in writing when I ran this test. benchmark.txt
weltall
Posts: 310
Joined: Fri Feb 20, 2004 1:56 am

Post by weltall »

why don't you try sandisk? i noticed some time ago that sony's memory sticks are slower
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

I can't afford to buy Sandisk just to try things out. :)

Anyway, I don't have an issue with having Sony's slower memory stick, the issues here are:

1. Why is CRT IO up to 70 times slower than Sony's IO? Seems like a bug in CRT code.

2. Why are Sony's IO routines 14 times slower than using USB transfer? Is this deliberate action of Sony? Developers would not need fast transfer rates for writing and reading saves, so did Sony choose to put some waits in these routines?

What I would really like to see is someone comparing the USB transfer rate with the transfer rate of their own code.
weltall
Posts: 310
Joined: Fri Feb 20, 2004 1:56 am

Post by weltall »

for the second, that could also be caused by buffering on the pc side
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

weltall,

No buffering: if I transfer something like a 100 Mb file and then stop the USB connection, the file is completely transferred to the PSP. If I were to write a 100 Mb file using sceIoWrite() I would have to wait for an eternity. If the transfer is 14 times slower you can see the difference!

TyRaNiD,

I have just run your (great) PSPLink and experimented with USB copy that your (again great:) application provides. I've got 1700 - 1800 Kb/sec transfer rate, which is slower than usual USB copy (2500 Kb/sec), but still way faster than my sceIoWrite(). What's the catch here? Is sceIoWrite() performing faster if the buffer is in specific memory location? Do you do anything special when copying files via USB?

BTW, if I read from PSP, the USB transfer rate from your program is max 5800 Kb/sec, which is exactly the same as via normal USB transfer.
jonny
Posts: 351
Joined: Thu Sep 22, 2005 5:46 pm

Post by jonny »

you could try to make sure data in memory is aligned to 64 bytes (this makes a difference on reading, i'm sure it makes on writing too)
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

jonny,

Good point! Can you tell me how? I was looking around for an __attribute or #pragma, but can't find anything useful.

Thanx!
nullp01nter
Posts: 26
Joined: Wed Jan 04, 2006 7:40 am
Location: Saxony/Germany

Post by nullp01nter »

Try __attribute__((aligned(64)))

Thoralt
jonny
Posts: 351
Joined: Thu Sep 22, 2005 5:46 pm

Post by jonny »

void *p = memalign(64, size);

if you want to allocate it
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

Alright!!! This did the trick!

Now I've got a 1900 Kb/sec transfer rate if using sceIoWrite() and a 165 Kb/sec transfer rate if using fwrite().

Can anyone else check if memory alignment makes a difference? Could be that you ought to have slow memory stick to spot the difference. :)

Thanx for the tip jonny!
patpsp
Posts: 31
Joined: Tue Oct 25, 2005 5:24 pm

Post by patpsp »

nullp01nter wrote: Try __attribute__((aligned(64)))

Thoralt

Where do you put that code?
nullp01nter
Posts: 26
Joined: Wed Jan 04, 2006 7:40 am
Location: Saxony/Germany

Post by nullp01nter »

Normally you use this if you declare your data in your code, e. g.:

unsigned char __attribute__((aligned(16))) ucData[256];
This reserves a chunk of memory (statically) which is aligned at 16 bytes.

Thoralt
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

patpsp,

Something like:

u8 buffer[SIZE] __attribute__((aligned(64)));
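
Putting the static and the allocated variants side by side, as a sketch: it assumes a GCC-style compiler and a C library that provides memalign() in <malloc.h> (non-standard, though newlib and glibc both have it). is_io_aligned and alloc_io_buffer are hypothetical names, and the 4096-byte size is arbitrary.

```c
#include <assert.h>
#include <malloc.h>   /* memalign(): non-standard */
#include <stdint.h>
#include <stdlib.h>

#define IO_ALIGN 64   /* alignment suggested earlier in the thread */

/* Statically allocated, 64-byte-aligned buffer (GCC attribute syntax). */
static unsigned char io_buffer[4096] __attribute__((aligned(IO_ALIGN)));

/* Returns 1 if p sits on an IO_ALIGN boundary, 0 otherwise. */
int is_io_aligned(const void *p)
{
    return ((uintptr_t)p % IO_ALIGN) == 0;
}

/* Heap version: returns a 64-byte-aligned block, or NULL on failure.
   Release it with free(). */
void *alloc_io_buffer(size_t size)
{
    assert(size > 0);
    return memalign(IO_ALIGN, size);
}
```

A quick is_io_aligned() check on the pointer just before the write call is a cheap way to confirm the buffer really is where you expect it to be.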
patpsp
Posts: 31
Joined: Tue Oct 25, 2005 5:24 pm

Post by patpsp »

sorry to post again here, but
i think the problem is not solved

bulb's tests revealed that sceIoWrite() gives better performance than fwrite(). Is that normal? Does someone know why, or is it an unknown bug?
jonny
Posts: 351
Joined: Thu Sep 22, 2005 5:46 pm
Contact:

Post by jonny »

i think fwrite is buffered, so there is no control over the internal buffer used (and the performance drop could be due to non-optimal buffer alignment or, maybe, non-optimal buffer size).

the c open/write functions (fcntl.h) should give the same performance as sceIoWrite
patpsp
Posts: 31
Joined: Tue Oct 25, 2005 5:24 pm

Post by patpsp »

thanks for your answer jonny

the fact is libpng uses fread and fwrite, so we would load our png textures faster if we either rewrite libpng with write/read instead of fwrite/fread, or find a way to implement fwrite/fread with the same efficiency
jsgf
Posts: 254
Joined: Tue Jul 12, 2005 11:02 am

Post by jsgf »

You can use setvbuf() to control stdio's buffering, including buffer size and policy.
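
A small sketch of what that could look like in standard C, assuming the C library in use actually implements setvbuf(). open_buffered is a hypothetical helper name and the buffer size is whatever the caller picks.

```c
#include <assert.h>
#include <stdio.h>

/* Open path for binary writing with a fully buffered stream of buf_size bytes.
   setvbuf() has to be called before any other operation on the stream.
   Passing NULL as the buffer lets the C library allocate it itself.
   Returns NULL if the file cannot be opened or the buffer cannot be set. */
FILE *open_buffered(const char *path, size_t buf_size)
{
    FILE *f;
    assert(path != NULL && buf_size > 0);
    f = fopen(path, "wb");
    if (f == NULL)
        return NULL;
    if (setvbuf(f, NULL, _IOFBF, buf_size) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

Passing your own aligned array instead of NULL gives control over where the buffer lives as well as how big it is; that array then has to stay valid until the stream is closed.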
bulb
Posts: 50
Joined: Thu Jan 19, 2006 10:59 pm

Post by bulb »

patpsp,

If your only concern is the use of libpng you can set the library to use your own write function via png_set_write_fn(), where you can finetune your output to memory stick.

Best regards
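
One way such a tuned write function could be organised, as a sketch under assumptions: small writes are collected in a 64-byte-aligned staging block and handed on only in large pieces. The sink is a plain function pointer standing in for the real output call (for example a thin wrapper around sceIoWrite()); StagedWriter, staged_write, staged_flush and counting_sink are made-up names, and the 64 KB block size is a guess to be tuned by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_SIZE 65536   /* assumed block size; tune by measurement */

/* Stand-in for the real output call. Must return the number of bytes written. */
typedef size_t (*sink_fn)(void *ctx, const void *data, size_t len);

typedef struct {
    unsigned char buf[STAGE_SIZE] __attribute__((aligned(64)));
    size_t used;
    sink_fn sink;
    void *ctx;
} StagedWriter;

/* Push whatever is staged to the sink. Returns 0 on success, -1 on a short write. */
int staged_flush(StagedWriter *w)
{
    size_t n = w->used;
    w->used = 0;
    if (n == 0)
        return 0;
    return w->sink(w->ctx, w->buf, n) == n ? 0 : -1;
}

/* Collect small writes; the sink only sees full blocks until the final flush. */
int staged_write(StagedWriter *w, const void *data, size_t len)
{
    const unsigned char *src = (const unsigned char *)data;
    assert(w != NULL && w->sink != NULL);
    while (len > 0) {
        size_t room = STAGE_SIZE - w->used;
        size_t n = len < room ? len : room;
        memcpy(w->buf + w->used, src, n);
        w->used += n;
        src += n;
        len -= n;
        if (w->used == STAGE_SIZE && staged_flush(w) != 0)
            return -1;
    }
    return 0;
}

/* Example sink for trying the code out: it only tallies calls and bytes. */
typedef struct { size_t calls, bytes; } SinkStats;

size_t counting_sink(void *ctx, const void *data, size_t len)
{
    SinkStats *s = (SinkStats *)ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
    return len;
}
```

The callback given to png_set_write_fn() would then simply forward its data and length to staged_write(), with one staged_flush() after the image is finished. Declare the StagedWriter statically (or allocate it with memalign) so the aligned attribute on buf is honoured.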
patpsp
Posts: 31
Joined: Tue Oct 25, 2005 5:24 pm

Post by patpsp »

bulb -> ok, i have looked at libpng, and as you said, only two functions use fread / fwrite:

- pngwio.c :

#if !defined(PNG_NO_STDIO)
/* This is the function that does the actual writing of data.  If you are
   not writing to a standard C stream, you should create a replacement
   write_data function and use it at run time with png_set_write_fn(), rather
   than changing the library. */
#ifndef USE_FAR_KEYWORD
void PNGAPI
png_default_write_data(png_structp png_ptr, png_bytep data, png_size_t length)
{
   png_uint_32 check;

#if defined(_WIN32_WCE)
   if ( !WriteFile((HANDLE)(png_ptr->io_ptr), data, length, &check, NULL) )
      check = 0;
#else
   check = fwrite(data, 1, length, (png_FILE_p)(png_ptr->io_ptr));
#endif
   if (check != length)
      png_error(png_ptr, "Write Error");
}
#else
/* this is the model-independent version. Since the standard I/O library
   can't handle far buffers in the medium and small models, we have to copy
   the data.
*/

#define NEAR_BUF_SIZE 1024
#define MIN(a,b) (a <= b ? a : b)


void PNGAPI
png_default_write_data(png_structp png_ptr, png_bytep data, png_size_t length)
{
   png_uint_32 check;
   png_byte *near_data;  /* Needs to be "png_byte *" instead of "png_bytep" */
   png_FILE_p io_ptr;

   /* Check if data really is near. If so, use usual code. */
   near_data = (png_byte *)CVT_PTR_NOCHECK(data);
   io_ptr = (png_FILE_p)CVT_PTR(png_ptr->io_ptr);
   if ((png_bytep)near_data == data)
   {
#if defined(_WIN32_WCE)
      if ( !WriteFile(io_ptr, near_data, length, &check, NULL) )
         check = 0;
#else
      check = fwrite(near_data, 1, length, io_ptr);
#endif
   }
   else
   {
      png_byte buf[NEAR_BUF_SIZE];
      png_size_t written, remaining, err;
      check = 0;
      remaining = length;
      do
      {
         written = MIN(NEAR_BUF_SIZE, remaining);
         png_memcpy(buf, data, written); /* copy far buffer to near buffer */
#if defined(_WIN32_WCE)
         if ( !WriteFile(io_ptr, buf, written, &err, NULL) )
            err = 0;
#else
         err = fwrite(buf, 1, written, io_ptr);
#endif
         if (err != written)
            break;
         else
            check += err;
         data += written;
         remaining -= written;
      }
      while (remaining != 0);
   }
   if (check != length)
      png_error(png_ptr, "Write Error");
}
#endif
#endif
- pngrio.c :

#if !defined(PNG_NO_STDIO)
/* This is the function that does the actual reading of data.  If you are
   not reading from a standard C stream, you should create a replacement
   read_data function and use it at run time with png_set_read_fn(), rather
   than changing the library. */
#ifndef USE_FAR_KEYWORD
void PNGAPI
png_default_read_data(png_structp png_ptr, png_bytep data, png_size_t length)
{
   png_size_t check;

   /* fread() returns 0 on error, so it is OK to store this in a png_size_t
    * instead of an int, which is what fread() actually returns.
    */
#if defined(_WIN32_WCE)
   if ( !ReadFile((HANDLE)(png_ptr->io_ptr), data, length, &check, NULL) )
      check = 0;
#else
   check = (png_size_t)fread(data, (png_size_t)1, length,
      (png_FILE_p)png_ptr->io_ptr);
#endif

   if (check != length)
      png_error(png_ptr, "Read Error");
}
#else
/* this is the model-independent version. Since the standard I/O library
   can't handle far buffers in the medium and small models, we have to copy
   the data.
*/

#define NEAR_BUF_SIZE 1024
#define MIN(a,b) (a <= b ? a : b)

static void /* PRIVATE */
png_default_read_data(png_structp png_ptr, png_bytep data, png_size_t length)
{
   int check;
   png_byte *n_data;
   png_FILE_p io_ptr;

   /* Check if data really is near. If so, use usual code. */
   n_data = (png_byte *)CVT_PTR_NOCHECK(data);
   io_ptr = (png_FILE_p)CVT_PTR(png_ptr->io_ptr);
   if ((png_bytep)n_data == data)
   {
#if defined(_WIN32_WCE)
      if ( !ReadFile((HANDLE)(png_ptr->io_ptr), data, length, &check, NULL) )
         check = 0;
#else
      check = fread(n_data, 1, length, io_ptr);
#endif
   }
   else
   {
      png_byte buf[NEAR_BUF_SIZE];
      png_size_t read, remaining, err;
      check = 0;
      remaining = length;
      do
      {
         read = MIN(NEAR_BUF_SIZE, remaining);
#if defined(_WIN32_WCE)
         if ( !ReadFile((HANDLE)(io_ptr), buf, read, &err, NULL) )
            err = 0;
#else
         err = fread(buf, (png_size_t)1, read, io_ptr);
#endif
         png_memcpy(data, buf, read); /* copy far buffer to near buffer */
         if(err != read)
            break;
         else
            check += err;
         data += read;
         remaining -= read;
      }
      while (remaining != 0);
   }
   if ((png_uint_32)check != (png_uint_32)length)
      png_error(png_ptr, "read Error");
}
#endif
#endif
Ok, so I have to try to modify these to use sceIoWrite() / sceIoRead().
Plus the png_FILE_p structure.

Also, do I have to modify malloc in libpng as well?

- pngmem.c :

/* Allocate memory.  For reasonable files, size should never exceed
   64K.  However, zlib may allocate more then 64K if you don't tell
   it not to.  See zconf.h and png.h for more information.  zlib does
   need to allocate exactly 64K, so whatever you call here must
   have the ability to do that. */

png_voidp PNGAPI
png_malloc(png_structp png_ptr, png_uint_32 size)
{
   png_voidp ret;

#ifdef PNG_USER_MEM_SUPPORTED
   if (png_ptr == NULL || size == 0)
      return (NULL);

   if(png_ptr->malloc_fn != NULL)
       ret = ((png_voidp)(*(png_ptr->malloc_fn))(png_ptr, (png_size_t)size));
   else
       ret = (png_malloc_default(png_ptr, size));
   if (ret == NULL && (png_ptr->flags&PNG_FLAG_MALLOC_NULL_MEM_OK) == 0)
       png_error(png_ptr, "Out of Memory!");
   return (ret);
}

png_voidp PNGAPI
png_malloc_default(png_structp png_ptr, png_uint_32 size)
{
   png_voidp ret;
#endif /* PNG_USER_MEM_SUPPORTED */

   if (png_ptr == NULL || size == 0)
      return (NULL);

#ifdef PNG_MAX_MALLOC_64K
   if (size > (png_uint_32)65536L)
   {
#ifndef PNG_USER_MEM_SUPPORTED
      if(png_ptr->flags&PNG_FLAG_MALLOC_NULL_MEM_OK) == 0)
         png_error(png_ptr, "Cannot Allocate > 64K");
      else
#endif
         return NULL;
   }
#endif

 /* Check for overflow */
#if defined(__TURBOC__) && !defined(__FLAT__)
 if (size != (unsigned long)size)
   ret = NULL;
 else
   ret = farmalloc(size);
#else
# if defined(_MSC_VER) && defined(MAXSEG_64K)
 if (size != (unsigned long)size)
   ret = NULL;
 else
   ret = halloc(size, 1);
# else
 if (size != (size_t)size)
   ret = NULL;
 else
   ret = malloc((size_t)size);
# endif
#endif

#ifndef PNG_USER_MEM_SUPPORTED
   if (ret == NULL && (png_ptr->flags&PNG_FLAG_MALLOC_NULL_MEM_OK) == 0)
      png_error(png_ptr, "Out of Memory");
#endif

   return (ret);
}
The line: ret = malloc((size_t)size);
has to be changed to memalign(64, (size_t)size), doesn't it?

jsgf -> i looked for setvbuf in pspsdk and i found this in pspsdk/src/lib/LIB.status :
Stdio:
-----

stdin, stdout, stderr, ok. Maybe some specific ps2 function to switch stderr
to SIO could be an idea.

Also, should have buffering...

remove - missing
rename - missing
tmp* - missing
fclose - ok
fflush - ok (memory card)
fcloseall - ok
fopen - ok
freopen - missing
fdopen - ok
setbuf - missing
setvbuf - missing
So I suppose that this function is not yet implemented, is it?

Thanks for your help, i'm not on my personal computer so i can't test for the moment.