GU_PSM_T8

uberjack · Post by **uberjack** » Tue Aug 05, 2008 3:47 pm

Hi everyone,
Up until now I've used GU_PSM_T8 textures with a CLUT. I'm wondering if it's possible to use these textures without a CLUT, and if so, how is the value interpreted? Is it simply a grayscale value 0-255, or is there color packing? What about T4?

Thanks

jean · Post by **jean** » Tue Aug 05, 2008 6:03 pm

T4 should be a 4-bit-addressed CLUT. That means you have 16 possible colors specified in a LUT the usual way. Long ago, in a desperate attempt to use the GE for something like GPGPU, I tried some tricks with the CLUT... using Tx textures without providing a valid CLUT led to a random (or completely black) screen. I see no point in doing so... if you want grayscale, just generate it on the fly. The real mystery is GU_PSM_T32... take a look at http://forums.ps2dev.org/viewtopic.php? ... b2d8c9e506
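
A minimal sketch of the "generate it on the fly" idea, in plain C so it can be tried off-device. The function names are made up for illustration. It assumes 32-bit palette entries laid out as 0xAABBGGRR and that the even texel of a T4 pair sits in the low nibble; check both against your own setup before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a 256-entry 32-bit palette with a grayscale ramp.
   Assumed entry layout: 0xAABBGGRR, fully opaque. */
void make_gray_clut(uint32_t clut[256])
{
    for (uint32_t i = 0; i < 256; i++)
        clut[i] = 0xFF000000u | (i << 16) | (i << 8) | i;
}

/* Read the 4-bit palette index of texel x from a packed T4 row.
   Assumption: the even texel is stored in the low nibble of each byte. */
unsigned t4_index(const uint8_t *row, size_t x)
{
    assert(row != NULL);
    uint8_t b = row[x >> 1];
    return (x & 1) ? (unsigned)(b >> 4) : (unsigned)(b & 0x0F);
}
```

With a ramp like this loaded as the CLUT, a T8 texture behaves like a plain 0-255 grayscale image. For T4 you would only load 16 entries.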

J.F. · Post by **J.F.** » Tue Aug 05, 2008 6:10 pm

jean wrote:T4 should be a 4-bit-addressed CLUT. That means you have 16 possible colors specified in a LUT the usual way. Long ago, in a desperate attempt to use the GE for something like GPGPU, I tried some tricks with the CLUT... using Tx textures without providing a valid CLUT led to a random (or completely black) screen. I see no point in doing so... if you want grayscale, just generate it on the fly. The real mystery is GU_PSM_T32... take a look at http://forums.ps2dev.org/viewtopic.php? ... b2d8c9e506

No mystery on GU_PSM_T32 or GU_PSM_T16. The GPU fetches 16 or 32 bits at a time instead of 4 or 8, then the palette is addressed according to the shift and mask specified for the texture palette operation. I use this in B2 to accelerate the conversion of 15 and 24 bit Mac video into the proper PSP video data. It's not mysterious - just not well documented. You need to look at examples... like my refresh routines in Basilisk II.
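
To make the shift/mask step concrete, here is a small software model in C. It is only a sketch and not the Basilisk II code. The helper names are hypothetical, and the 15-bit source layout (R in bits 10-14, G in bits 5-9, B in bits 0-4, native byte order) and the 0xAABBGGRR output layout are assumptions. `shift` and `mask` stand for the values you would hand to sceGuClutMode; the palette offset the hardware can also apply is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Palette index selection: shift the fetched texel right, then mask it. */
unsigned clut_index(uint32_t texel, unsigned shift, unsigned mask)
{
    assert(shift < 32);
    return (texel >> shift) & mask;
}

/* Build a 32-entry palette that expands a 5-bit channel to 8 bits and
   places it at bit position out_shift of the output pixel. */
void make_expand5_clut(uint32_t clut[32], unsigned out_shift)
{
    assert(out_shift <= 24);
    for (uint32_t i = 0; i < 32; i++) {
        uint32_t v8 = (i << 3) | (i >> 2);   /* replicate top bits: 31 -> 255 */
        clut[i] = v8 << out_shift;
    }
}

/* Convert one 15-bit texel with three lookups, one per channel.
   On hardware each lookup would be its own pass over the texture,
   combined by blending; here the results are simply OR-ed together. */
uint32_t convert_1555(uint16_t texel,
                      const uint32_t r_clut[32],
                      const uint32_t g_clut[32],
                      const uint32_t b_clut[32])
{
    return 0xFF000000u
         | r_clut[clut_index(texel, 10, 0x1F)]
         | g_clut[clut_index(texel, 5, 0x1F)]
         | b_clut[clut_index(texel, 0, 0x1F)];
}
```

The point is that the same texture data is read several times, each time with a different shift and a different small palette, so no CPU work is needed per pixel.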

jean · Post by **jean** » Tue Aug 05, 2008 6:14 pm

Wow....many thanks, guy! I'll take a look...

uberjack · Post by **uberjack** » Wed Aug 06, 2008 1:42 am

Thanks!

Xfacter · Post by **Xfacter** » Wed Aug 06, 2008 4:11 am

You can also use it to do fullscreen effects by creating a clut for whatever the effect is, then shifting/masking and blending for each color. FuncLib has a module for effects like this, so you can check that out if you want.
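
A rough C model of that kind of effect, for illustration only (this is not FuncLib's code, and the function name is made up). It assumes 0xAABBGGRR pixels and treats the effect as a 256-entry curve applied to each color channel in its own pass, with the passes combined by adding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Apply a per-channel remap (e.g. inversion or gamma) to a buffer of
   0xAABBGGRR pixels, modelled as three palette passes. */
void apply_channel_effect(const uint32_t *src, uint32_t *dst, size_t count,
                          const uint8_t curve[256])
{
    assert(src != dst);
    for (size_t i = 0; i < count; i++)
        dst[i] = 0xFF000000u;                        /* start from opaque black */

    for (unsigned shift = 0; shift <= 16; shift += 8) {      /* R, G, B passes */
        for (size_t i = 0; i < count; i++) {
            unsigned index = (src[i] >> shift) & 0xFF;        /* shift/mask step */
            uint32_t entry = (uint32_t)curve[index] << shift; /* CLUT entry for this pass */
            dst[i] += entry;                                  /* additive blend */
        }
    }
}
```

Each pass only touches its own 8 bits of the destination, so the additions never carry into a neighbouring channel.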

crazyc · Post by **crazyc** » Thu Aug 07, 2008 1:15 am

J.F. wrote:No mystery on GU_PSM_T32 or GU_PSM_T16. The GPU fetches 16 or 32 bits at a time instead of 4 or 8, then the palette is addressed according to the shift and mask specified for the texture palette operation. I use this in B2 to accelerate the conversion of 15 and 24 bit Mac video into the proper PSP video data.

Have you benchmarked how long each pass takes? This could possibly be used to accelerate unchained VGA to linear texture conversion, but would be pointless if it is slower than the dcache-unfriendly method of splitting up the planar data when a VRAM write occurs.

J.F. · Post by **J.F.** » Thu Aug 07, 2008 3:46 am

crazyc wrote:
J.F. wrote:No mystery on GU_PSM_T32 or GU_PSM_T16. The GPU fetches 16 or 32 bits at a time instead of 4 or 8, then the palette is addressed according to the shift and mask specified for the texture palette operation. I use this in B2 to accelerate the conversion of 15 and 24 bit Mac video into the proper PSP video data.
Have you benchmarked how long each pass takes? This could possibly be used to accelerate unchained VGA to linear texture conversion, but would be pointless if it is slower than the dcache-unfriendly method of splitting up the planar data when a VRAM write occurs.

I haven't done any specific benchmarking, but it's faster than using the CPU. Let's see... how you would convert the VGA data...

Set the palette for 8-bit to 8888 so that you convert XX to 000000XX. Do a pass over VGA plane 0 with no adding. Then set the palette to convert XX to 0000XX00 and do a pass over plane 1 with adding. Repeat for planes 2 and 3 with the appropriate palette. Then finally do a normal blit using the texture just built and the real palette. That would work. Overall, you'd be doing 4 smaller blits and one big one (equivalent to two big ones), so I'd say it would be pretty fast. It shouldn't be too hard to make that routine and try it. I'd have the palettes already preset from the start so you could just upload them without making them on the fly... every little bit of time saved helps.
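
The passes described above can be modelled on the CPU like this. It is an untested-on-hardware sketch in plain C with hypothetical function names. It assumes unchained VGA stores pixel x in plane x & 3 at offset x >> 2, and that byte k of each packed 32-bit word ends up as pixel 4*i + k when the word buffer is read back as 8-bit texels (little-endian, as on the PSP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One merge pass: look up each byte of a plane in the pass palette
   (index XX -> XX << (8 * plane)) and combine it with the 32-bit target. */
void merge_plane(uint32_t *packed, const uint8_t *plane_data,
                 size_t words, unsigned plane)
{
    assert(plane < 4);
    for (size_t i = 0; i < words; i++) {
        uint32_t entry = (uint32_t)plane_data[i] << (8 * plane);
        if (plane == 0)
            packed[i] = entry;     /* first pass: no adding */
        else
            packed[i] += entry;    /* later passes: additive blend */
    }
}

/* Final pass: treat each packed word as four 8-bit indices and resolve
   them through the real 256-entry palette. */
void resolve_palette(const uint32_t *packed, size_t words,
                     const uint32_t palette[256], uint32_t *out)
{
    for (size_t i = 0; i < words; i++)
        for (unsigned k = 0; k < 4; k++)
            out[4 * i + k] = palette[(packed[i] >> (8 * k)) & 0xFF];
}
```

Calling `merge_plane` once per plane (0 to 3) and then `resolve_palette` mirrors the four small blits followed by the one big blit.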

crazyc · Post by **crazyc** » Thu Aug 07, 2008 4:12 am

J.F. wrote:Set the palette for 8-bit to 8888 so that you convert XX to 000000XX. Do a pass over VGA plane 0 with no adding. Then set the palette to convert XX to 0000XX00 and do a pass over plane 1 with adding. Repeat for planes 2 and 3 with the appropriate palette. Then finally do a normal blit using the texture just built and the real palette. That would work. Overall, you'd be doing 4 smaller blits and one big one (equivalent to two big ones), so I'd say it would be pretty fast. It shouldn't be too hard to make that routine and try it. I'd have the palettes already preset from the start so you could just upload them without making them on the fly... every little bit of time saved helps.

This is pretty much what I was thinking based on what I saw in your Basilisk drawing code. The fact that it takes 5 passes per frame is why I was wondering about the performance of doing multiple full-frame adds. Looks like I've got some testing to do.

J.F. · Post by **J.F.** » Thu Aug 07, 2008 4:49 am

crazyc wrote:
J.F. wrote:Set the palette for 8-bit to 8888 so that you convert XX to 000000XX. Do a pass over VGA plane 0 with no adding. Then set the palette to convert XX to 0000XX00 and do a pass over plane 1 with adding. Repeat for planes 2 and 3 with the appropriate palette. Then finally do a normal blit using the texture just built and the real palette. That would work. Overall, you'd be doing 4 smaller blits and one big one (equivalent to two big ones), so I'd say it would be pretty fast. It shouldn't be too hard to make that routine and try it. I'd have the palettes already preset from the start so you could just upload them without making them on the fly... every little bit of time saved helps.
This is pretty much what I was thinking based on what I saw in your Basilisk drawing code. The fact that it takes 5 passes per frame is why I was wondering about the performance of doing multiple full-frame adds. Looks like I've got some testing to do.

When you look at the total, it's really only doing two passes. The first four passes are each only over 1/4 of the screen data. So it does 4 × 1/4 to put the data together into a form that you can then do one full pass over.
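
As a quick sanity check of that count, here is the arithmetic as a tiny C function. The name is hypothetical and it only counts source texels read, not actual GE time, which nobody in this thread has measured.

```c
#include <assert.h>
#include <stddef.h>

/* Source texels read by the four plane passes plus the final full pass. */
size_t texels_read(size_t width, size_t height)
{
    assert(width % 4 == 0);
    size_t full = width * height;   /* one full-screen pass */
    size_t plane = full / 4;        /* each VGA plane holds a quarter of the pixels */
    return 4 * plane + full;        /* equals 2 * full */
}
```

For a 320x200 screen that is 4 × 16000 + 64000 = 128000 texels, the same as two full passes.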