Hi everyone,
Up until now I've used GU_PSM_T8 textures with a CLUT. I'm wondering if it's possible to use these textures without a CLUT, and if so, how is the value interpreted? Is it simply a grayscale value 0-255, or is there color packing? What about T4?
Thanks
GU_PSM_T8
T4 should be a 4-bit-indexed CLUT format. That means you have 16 possible colors, specified in a LUT the usual way. Long ago, in a desperate attempt to use the GE for a sort of GPGPU, I tried some tricks with the CLUT... Using Tx textures without providing a valid CLUT led to a random (or entirely black) screen. I see no point in doing so; if you want grayscale, just generate a grayscale palette on the fly. The real mystery is GU_PSM_T32... take a look at http://forums.ps2dev.org/viewtopic.php? ... b2d8c9e506
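A minimal sketch of "generate it on the fly": fill a CLUT with a gray ramp so that a T8 texel value v shows up as gray level v (and a T4 nibble is stretched over 0..255). The function names are hypothetical. On the PSP the table would typically be uploaded with something like sceGuClutMode(GU_PSM_8888, 0, 0xff, 0) followed by sceGuClutLoad(256 / 8, clut) (the count is in blocks of 8 entries), from a 16-byte-aligned buffer that has been written back from the data cache; treat those call details as assumptions to check against pspgu.h.

```c
#include <assert.h>
#include <stdint.h>

/* 256-entry GU_PSM_8888 CLUT for GU_PSM_T8: entry v = opaque gray v.
   Entries are 0xAABBGGRR; with R == G == B the channel order is moot. */
static void make_gray_clut8(uint32_t clut[256])
{
    for (uint32_t i = 0; i < 256; i++)
        clut[i] = 0xFF000000u | (i << 16) | (i << 8) | i;
}

/* 16-entry CLUT for GU_PSM_T4: index 0..15 stretched to 0..255
   (i * 17, so 0x0 -> 0x00 and 0xF -> 0xFF). */
static void make_gray_clut4(uint32_t clut[16])
{
    for (uint32_t i = 0; i < 16; i++) {
        uint32_t g = i * 17;
        clut[i] = 0xFF000000u | (g << 16) | (g << 8) | g;
    }
}
```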
No mystery on GU_PSM_T32 or GU_PSM_T16. The GPU fetches 16 or 32 bits at a time instead of 4 or 8, then the palette is addressed according to the shift and mask specified for the texture palette operation. I use this in B2 to accelerate the conversion of 15 and 24 bit Mac video into the proper PSP video data. It's not mysterious - just not well documented. You need to look at examples... like my refresh routines in Basilisk II.
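A CPU model of the shift-and-mask lookup described above, assuming the index is computed as (texel >> shift) & mask (in pspsdk these are, as far as I know, the second and third arguments of sceGuClutMode; the CLUT start offset is ignored here). The RGB555-to-8888 conversion is a hypothetical illustration of how a 16-bit texel can be split into per-channel lookups combined by additive blending, not the actual Basilisk II code; the bit layout (red in bits 14..10) and the little-endian 0xAABBGGRR output are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* How a T16/T32 texel becomes a CLUT index (model, start offset = 0). */
static uint32_t clut_index(uint32_t texel, unsigned shift, unsigned mask)
{
    assert(shift < 32 && mask <= 0xFF);
    return (texel >> shift) & mask;
}

/* Expand a 5-bit channel to 8 bits (31 -> 255). */
static uint32_t expand5(uint32_t v) { return (v << 3) | (v >> 2); }

/* Hypothetical 3-pass conversion: each pass looks up one 5-bit field in a
   32-entry palette that places the expanded value in its own byte; the
   bytes never overlap, so adding the passes cannot carry between channels.
   Palettes are rebuilt per call only to keep the model short. */
static uint32_t rgb555_to_8888(uint16_t texel)
{
    uint32_t pal_r[32], pal_g[32], pal_b[32];
    for (uint32_t i = 0; i < 32; i++) {
        pal_r[i] = expand5(i);        /* R -> byte 0 */
        pal_g[i] = expand5(i) << 8;   /* G -> byte 1 */
        pal_b[i] = expand5(i) << 16;  /* B -> byte 2 */
    }
    uint32_t out = 0xFF000000u;       /* opaque alpha */
    out += pal_r[clut_index(texel, 10, 0x1F)];
    out += pal_g[clut_index(texel, 5, 0x1F)];
    out += pal_b[clut_index(texel, 0, 0x1F)];
    return out;
}
```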
Have you benchmarked how long each pass takes? This could possibly be used to accelerate unchained VGA to linear texture conversion, but it would be pointless if it is slower than the dcache-unfriendly method of splitting up the planar data when a VRAM write occurs.
I haven't done any specific benchmarking, but it's faster than using the CPU. Let's see how you would convert the VGA data:
Set the 8-bit palette to 8888 format so that you convert XX to 000000XX. Do a pass over VGA plane 0 with no adding. Then set the palette to convert XX to 0000XX00 and do a pass over plane 1 with adding. Repeat for planes 2 and 3 with the appropriate palettes. Then finally do a normal blit using the texture just built and the real palette. That would work. Overall, you'd be doing 4 smaller blits and one big one (equivalent to two big ones), so I'd say it would be pretty fast. It shouldn't be too hard to make that routine and try it. I'd have the palettes preset from the start so you could just upload them without building them on the fly... every little bit of time saved helps.
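A CPU model of the four plane passes above, useful for checking the palette values before touching the GE. It assumes unchained VGA puts pixel x in plane x % 4 at offset x / 4, a little-endian 32-bit render target, and that additive blending behaves like plain addition on non-overlapping bytes. Function names are hypothetical, and whether the fourth byte (the target's alpha channel) is really combined this way on hardware is an assumption this model does not check. The final blit through the real palette is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Palette k maps XX to XX << (8*k): plane 0 -> 0x000000XX,
   plane 1 -> 0x0000XX00, ... Built once at startup. */
static uint32_t plane_pal[4][256];

static void init_plane_palettes(void)
{
    for (uint32_t k = 0; k < 4; k++)
        for (uint32_t v = 0; v < 256; v++)
            plane_pal[k][v] = v << (8 * k);
}

/* Each plane holds n bytes (1/4 of the screen). Pass 0 writes, passes
   1-3 add. Viewed as bytes, target then holds the 4*n linear pixels,
   ready to be used as a T8 texture with the real palette. */
static void planar_to_linear(const uint8_t *const planes[4], size_t n,
                             uint32_t *target)
{
    for (size_t i = 0; i < n; i++)
        target[i] = plane_pal[0][planes[0][i]];        /* no blending */
    for (size_t k = 1; k < 4; k++)
        for (size_t i = 0; i < n; i++)
            target[i] += plane_pal[k][planes[k][i]];   /* additive */
}
```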
This is pretty much what I was thinking based on what I saw in your Basilisk drawing code. The fact that it takes 5 passes per frame is why I was wondering about the performance of doing multiple full-frame adds. Looks like I've got some testing to do.
When you look at the total, it's really only doing two passes. The first four passes are each over only 1/4 of the screen data. So it does 4 × 1/4 to assemble the data into a form that you can then do one full pass over.
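The pass count above as arithmetic, for a W × H 8-bit screen where each plane holds WH/4 bytes. This counts texels drawn; each plane pass also writes 32-bit pixels and reads them back for the add, so in bytes moved the plane passes cost somewhat more than a quarter each.

```latex
\underbrace{4 \times \frac{WH}{4}}_{\text{plane passes}} + \underbrace{WH}_{\text{final blit}} = 2\,WH
\qquad \text{e.g. } 320 \times 200:\ 2 \times 64000 = 128000 \text{ texels per frame}
```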