Hardware accelerated swizzling

jsgf · Post by **jsgf** » Thu Dec 01, 2005 9:24 am

UPDATE: I looked more closely at swizzling, and found that the heart of it is a rotate-by-3 of a bitfield in the middle of the offset; I've put the details up at http://wiki.ps2dev.org/psp:ge_faq. swizzle^7 == unswizzle is only true for 256-byte wide textures; other sizes need a different number of swizzle operations to reach the inverse, depending on the size of the bitfield. Basically, it's the number of times you need to do a rol3 to equal a single ror3.
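As a rough model of the rol3/ror3 counting described in the update, here is a minimal Python sketch. The function names, the `lo` bit position and the field widths are illustrative assumptions, not taken from the wiki page. The field width is left as a free parameter because the thread does not spell out how it maps to texture size. An 8-bit field happens to reproduce the 7-and-8 counts quoted in this thread.

```python
def rol(value, amount, width):
    """Rotate the low `width` bits of `value` left by `amount`."""
    amount %= width
    mask = (1 << width) - 1
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def rotate_field(offset, lo, width, amount=3):
    """Rotate only the `width`-bit field that starts at bit `lo` of `offset`.

    `lo` and `width` are hypothetical parameters for this sketch.
    """
    mask = ((1 << width) - 1) << lo
    field = (offset & mask) >> lo
    return (offset & ~mask) | (rol(field, amount, width) << lo)


def rotations_to_invert(width, amount=3):
    """Smallest k >= 1 such that k left-rotations equal one right-rotation."""
    k = 1
    while (k * amount) % width != (-amount) % width:
        k += 1
    return k


# With an (assumed) 8-bit field, seven rol3 steps equal one ror3.
print(rotations_to_invert(8))  # 7
```

Other field widths give other counts, for example `rotations_to_invert(6)` is 1, since rotating a 6-bit field by 3 is its own inverse.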

I've been playing around with how to get the hardware to do swizzling. There doesn't seem to be any direct way to do it, but the hardware does, obviously, do unswizzling.

Unfortunately swizzling isn't an inverse of itself, so:

Code:

unswizzle(unswizzle(tex)) != tex

But it does have a cycle after 8 applications, so

Code:

unswizzle^8(tex) == tex

This means that

Code:

unswizzle^7(tex) == swizzle(tex)
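The same reasoning can be checked on any fixed shuffle: if applying it N times restores the data, then N-1 applications equal its inverse. The sketch below uses a stand-in permutation (a rotate-by-3 of an 8-bit index, chosen only because it has a cycle of 8); it is not the GE's actual unswizzle, and all names are hypothetical.

```python
def apply(perm, data):
    """Move data[i] to position perm[i]."""
    out = [None] * len(data)
    for src, dst in enumerate(perm):
        out[dst] = data[src]
    return out


def inverse(perm):
    """Build the permutation that undoes `perm`."""
    inv = [0] * len(perm)
    for src, dst in enumerate(perm):
        inv[dst] = src
    return inv


def cycle_length(perm, data):
    """Count applications of `perm` needed to restore `data`."""
    current = apply(perm, data)
    count = 1
    while current != data:
        current = apply(perm, current)
        count += 1
    return count


# Stand-in shuffle (an assumption for illustration): rol3 on an 8-bit index.
shuffle = [((i << 3) | (i >> 5)) & 0xFF for i in range(256)]

tex = list(range(256))
print(cycle_length(shuffle, tex))  # 8

# Seven applications give the same result as one application of the inverse.
seven = tex
for _ in range(7):
    seven = apply(shuffle, seven)
print(seven == apply(inverse(shuffle), tex))  # True
```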

This means that you could use a render-to-texture operation to bounce between two or more buffers to do the swizzling operations, but at 7 bandwidth-heavy operations it isn't a trivial decision; it might be worthwhile since it offloads the CPU from doing the work, but it greatly increases the GE workload. Whether this matters depends a lot on where the bottlenecks are, and whether swizzling is all that expensive.

It would be nice to see if there's some way to do it more efficiently. I was wondering if playing with the dimensions or some other simple modification might allow the process to converge more quickly. What it really needs is someone to sit down and look at the swizzling transform more closely and work out the details; I've been meaning to do it, but I wonder if someone has already looked at it...

f_bohmann · Post by **f_bohmann** » Thu Dec 01, 2005 10:05 pm

I am not totally sure why you would want to do that, since for realtime-generated textures (emu screen etc.) just displaying them unswizzled should be a lot faster than uploading them, swizzling them and then displaying them. And for non-realtime data... well, you would of course pre-swizzle them in your toolchain and not on the PSP itself.

jsgf · Post by **jsgf** » Fri Dec 02, 2005 11:50 am

Well, swizzling is pretty cheap all around; it's more a matter of trying to squeeze as much as possible out of the system. If swizzling really does give a large performance improvement, then any texture which is used more than N times will benefit, even if it's dynamic. If the texture is dynamic and generated with render-to-texture, then doing the swizzle entirely on the GE side is a huge win, simply because it avoids a pipeline stall.

Bugger, I notice that I posted this in the wrong forum; it's meant to be in PSP Software Development. Any chance of someone moving it?

cheriff · Post by **cheriff** » Fri Dec 02, 2005 12:12 pm

But wouldn't getting the GE to process the entire texture to generate the swizzled version just cause the same tex-cache misses that rendering from a non-swizzled texture would cause? Only with the additional penalty of the generation pass?
Possibly even more so, since I'm not sure swizzling is an operation that can occur in place.

starman2049 · Post by **starman2049** » Fri Dec 02, 2005 3:37 pm

Did you actually verify that unswizzling a texture 8 times will get you back to the original? That's pretty cool.

jsgf · Post by **jsgf** » Sat Dec 03, 2005 4:15 am

cheriff wrote: But wouldn't getting the GE to process the entire texture to generate the swizzled version just cause the same tex-cache misses that rendering from a non-swizzled texture would cause? Only with the additional penalty of the generation pass?
Possibly even more so, since I'm not sure swizzling is an operation that can occur in place.

No, obviously swizzling the texture and using it just once is less efficient than just using the texture. But if you use the texture multiple times, it may be worth spending effort on swizzling it, because the savings on each use amortize the swizzling cost. The tradeoff is between how expensive it is to swizzle vs the savings on each use.
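To make the amortization concrete, here is a small break-even sketch. The cost units and the example numbers are made up for illustration; nothing here is a measured PSP figure.

```python
import math


def break_even_uses(swizzle_cost, saving_per_use):
    """Smallest number of uses at which total savings cover the swizzle cost.

    Both arguments are in the same arbitrary unit (hypothetical values).
    Returns None if swizzling saves nothing per use.
    """
    if saving_per_use <= 0:
        return None
    return math.ceil(swizzle_cost / saving_per_use)


# Example with invented numbers: 7 passes costing 1.0 unit each,
# and each later use of the swizzled texture saving 0.5 units.
print(break_even_uses(7 * 1.0, 0.5))  # 14
```

If the texture is used fewer times than that before it changes, the swizzle pass costs more than it saves.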

If you have a dynamic texture which is generated by the GE (i.e., render to texture), then using the CPU to swizzle would be very expensive: aside from the cost of reading the pixels back over the bus, it would also require stalling the GE pipeline. Using the GE to do the swizzling, even if the mechanism is relatively expensive, may be a win simply because it 1) leaves the CPU free and 2) is pipelined.

If the dynamic texture is generated by the CPU, it probably makes more sense to swizzle in the CPU. In fact, because swizzled address generation is pretty simple, you might be better off just generating your texture directly in swizzled form (and it might even be an improvement because of the improved cache locality).
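As an illustration of generating a texture directly in swizzled form, here is a minimal sketch. It assumes the commonly described layout of blocks 16 bytes wide by 8 rows tall, stored one after another. That layout, the function names and the `texel` callback are assumptions for this example and are not stated in this thread.

```python
BLOCK_W = 16  # bytes per block row (assumed layout)
BLOCK_H = 8   # rows per block (assumed layout)


def swizzled_offset(x, y, width_bytes):
    """Byte offset of byte column x, row y in a swizzled texture.

    `width_bytes` is the row width in bytes and must be a multiple of 16.
    """
    blocks_per_row = width_bytes // BLOCK_W
    block_index = (y // BLOCK_H) * blocks_per_row + (x // BLOCK_W)
    within_block = (y % BLOCK_H) * BLOCK_W + (x % BLOCK_W)
    return block_index * (BLOCK_W * BLOCK_H) + within_block


def fill_swizzled(width_bytes, height, texel):
    """Write texel(x, y) values straight into swizzled order.

    `texel` is a hypothetical callback returning a byte value (0..255).
    """
    out = bytearray(width_bytes * height)
    for y in range(height):
        for x in range(width_bytes):
            out[swizzled_offset(x, y, width_bytes)] = texel(x, y)
    return out


# A 32-byte-wide, 16-row texture filled with a simple pattern.
tex = fill_swizzled(32, 16, lambda x, y: (x + y) & 0xFF)
```

Because each 16x8 block ends up contiguous, a generator that walks the texture block by block would also write memory sequentially.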

jsgf · Post by **jsgf** » Sat Dec 03, 2005 4:16 am

starman2049 wrote: Did you actually verify that unswizzling a texture 8 times will get you back to the original? That's pretty cool.

I haven't tried it on the actual hardware, only with chp's swizzling sample code. It makes sense that it should repeat after some cycle, because it just shuffles the bytes around in a deterministic way.

CyberBill · Post by **CyberBill** » Sat Dec 03, 2005 3:41 pm

How do you get that you can't unswizzle a texture in one function?

It's just moving bytes around, just undo it.

jsgf · Post by **jsgf** » Sat Dec 03, 2005 4:06 pm

CyberBill wrote: How do you get that you can't unswizzle a texture in one function?

It's just moving bytes around, just undo it.

You missed the point. I'm talking about doing it with the GE hardware, not the CPU. The GE can only unswizzle textures; it doesn't seem to have a way to swizzle directly, but 7 unswizzles is the same as 1 swizzle. Or you could use many tiny copy operations, but that's lots of commands in the command stream...

groepaz · Post by **groepaz** » Sat Dec 03, 2005 8:14 pm

While working on pspinside I noticed that if you read back the video RAM from base address 0x04200000, it appears to be in some strange shuffled format; judging from the visual appearance, it could actually be swizzled data. I haven't investigated that any further yet, but maybe someone else wants to try (and also see what happens if you write to 0x04200000 and read back from the usual address).

jsgf · Post by **jsgf** » Sat Dec 03, 2005 8:26 pm

Hey, that's really interesting. It might be swizzled, or it might be the linearized version of the depth buffer (if you read it back directly, it is rearranged in some way which is similar to, but not the same as, swizzling).