I'm currently using the simple method below, but I don't think it is fast enough. I'm hoping for a faster method, or some sceGU hardware function that can do the conversion.
Code:
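for (m = 0; m < 16; m++)
{
    vp = &pic_buffer[ii][begin_m + m][begin_n];   /* start of this output row */
    mm = m >> 1;                                  /* chroma row (chroma is subsampled by 2) */
    mn = (m >> 3) * 2;                            /* luma blocks 0-1 for the top half, 2-3 for the bottom */
    for (n = 0; n < 16; n++)
    {
        /* clipped luma sample from the matching 8x8 block */
        Y = iclpp[dct_recon[mn + (n >> 3)][m & 7][n & 7]];
        nn = n >> 1;                              /* chroma column */
        /* blocks 4 and 5 hold the two chroma planes */
        *(vp++) = R_table[Y][dct_recon[5][mm][nn]];
        *(vp++) = iclpp[Y - G_table[dct_recon[4][mm][nn]][dct_recon[5][mm][nn]]];
        *(vp++) = B_table[Y][dct_recon[4][mm][nn]];
    }
}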
R_table, G_table and B_table are lookup tables that replace the per-pixel calculations.
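For reference, here is a minimal sketch of how tables of this shape might be filled. It is not the code used above, only an illustration under some assumptions: BT.601 full-range coefficients in 16.16 fixed point, chroma samples stored as 0..255 centred on 128, and block 4 = Cb, block 5 = Cr. The names init_yuv_tables and clip255 are made up for the example.

```c
#include <stdint.h>

/* Assumed layouts, matching how the loop indexes the tables. */
static uint8_t R_table[256][256];   /* [Y][Cr]  -> clipped red   */
static uint8_t B_table[256][256];   /* [Y][Cb]  -> clipped blue  */
static int16_t G_table[256][256];   /* [Cb][Cr] -> amount to subtract from Y */

/* Hypothetical helper: clamp an int to 0..255. */
static uint8_t clip255(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical initialiser: fill all three tables once at startup. */
void init_yuv_tables(void)
{
    int a, b;
    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            int c = b - 128;   /* second index, centred on zero */

            /* R = Y + 1.402 * (Cr - 128); 1.402 * 65536 ~= 91881 */
            R_table[a][b] = clip255(a + (91881 * c) / 65536);

            /* B = Y + 1.772 * (Cb - 128); 1.772 * 65536 ~= 116130 */
            B_table[a][b] = clip255(a + (116130 * c) / 65536);

            /* G offset = 0.344 * (Cb - 128) + 0.714 * (Cr - 128);
               here a is Cb and b is Cr. Left unclipped because the
               loop clips Y - offset itself. */
            G_table[a][b] = (int16_t)((22554 * (a - 128) + 46802 * c) / 65536);
        }
    }
}
```

With tables built this way, the green term stays unclipped in the table and is subtracted from Y and clipped in the loop, which is what the iclpp[Y - G_table[...][...]] line does.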
Does anyone know how to do YUV2RGB faster?
The niftiest way would be similar to how I do the Mac 24-bit refresh, which converts Mac 24-bit video to PSP 24-bit video: do multiple passes with custom palettes. Just look at the refresh24() code in Basilisk II to see what I mean. This would use the GPU, so it'd be pretty fast while leaving the CPU free to do whatever.