UPDATE: I looked more closely at swizzling, and found that the heart of it is a rotate-by-3 of a bitfield in the middle of the offset; I've put the details up at http://wiki.ps2dev.org/psp:ge_faq.
swizzle^7 == unswizzle is only true for 256-byte wide textures; other sizes need a different number of swizzle operations to reach the inverse, depending on the size of the bitfield. Basically, it's the number of times you need to do a rol3 to equal a single ror3.
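To make the rol3/ror3 counting concrete, here is a minimal sketch in C. The names rol_field and rol3_steps_for_ror3 are made up for illustration, and the field width is left as a parameter because it depends on the texture width; nothing here is taken from the hardware or from the wiki page.

```c
#include <stdint.h>

/* Rotate the low `bits` bits of v left by r positions (1 <= bits <= 32).
   Hypothetical helper, written only for this illustration. */
uint32_t rol_field(uint32_t v, unsigned bits, unsigned r)
{
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    v &= mask;
    r %= bits;
    if (r == 0)
        return v;
    return ((v << r) | (v >> (bits - r))) & mask;
}

/* Smallest k >= 1 such that k rotate-left-by-3 steps on a `bits`-wide
   field equal one rotate-right-by-3. Returns 0 only if bits is 0. */
unsigned rol3_steps_for_ror3(unsigned bits)
{
    if (bits == 0)
        return 0;
    /* One ror3 is the same as a left rotation by (bits - 3) mod bits. */
    unsigned target = (bits - (3 % bits)) % bits;
    for (unsigned k = 1; k <= bits; ++k) {
        /* k left rotations by 3 add up to a left rotation by 3k mod bits. */
        if ((3 * k) % bits == target)
            return k;
    }
    return 0; /* not reached for bits >= 1 */
}
```

For an 8-bit field rol3_steps_for_ror3(8) gives 7, which is the count quoted above; other field widths give other counts.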
I've been playing around with how to get the hardware to do swizzling. There doesn't seem to be any direct way to do it, but the hardware does, obviously, do unswizzling.
Unfortunately unswizzling isn't its own inverse, so:
Code:
unswizzle(unswizzle(tex)) != tex
But it does have a cycle after 8 applications, so unswizzle^8(tex) == tex. This means that
Code:
unswizzle^7(tex) == swizzle(tex)
This means that you could use a render-to-texture operation to bounce between two or more buffers to do the swizzling operations, but at 7 bandwidth-heavy operations it isn't a trivial decision; it might be worthwhile since it offloads the CPU from doing the work, but it greatly increases the GE workload. Whether this matters depends a lot on where the bottlenecks are, and whether swizzling is all that expensive.
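The cycle identities above can be checked on the CPU by working on byte offsets instead of real texture data. This is only a sketch: it assumes the commonly described layout of 16-byte by 8-row blocks, a row width that is a power of two and at least 16 bytes, and it only models a single 8-row strip. The function names are made up for illustration.

```c
#include <stdint.h>

/* Linear byte offset -> swizzled byte offset.
   Assumption: blocks are 16 bytes wide and 8 rows tall. */
uint32_t swizzle_offset(uint32_t off, uint32_t width_bytes)
{
    uint32_t x = off % width_bytes;
    uint32_t y = off / width_bytes;
    uint32_t blocks_per_row = width_bytes / 16;
    uint32_t block = (y / 8) * blocks_per_row + (x / 16);
    return block * 128 + (y % 8) * 16 + (x % 16);
}

/* Swizzled byte offset -> linear byte offset (inverse of the above). */
uint32_t unswizzle_offset(uint32_t off, uint32_t width_bytes)
{
    uint32_t blocks_per_row = width_bytes / 16;
    uint32_t block = off / 128;
    uint32_t y = (block / blocks_per_row) * 8 + (off % 128) / 16;
    uint32_t x = (block % blocks_per_row) * 16 + (off % 16);
    return y * width_bytes + x;
}

/* Number of swizzle applications after which every offset in one
   8-row strip is back where it started. */
unsigned swizzle_cycle_length(uint32_t width_bytes)
{
    uint32_t n = width_bytes * 8; /* bytes in one 8-row strip */
    unsigned k = 0;
    int identity;
    do {
        ++k;
        identity = 1;
        for (uint32_t off = 0; off < n && identity; ++off) {
            uint32_t p = off;
            for (unsigned i = 0; i < k; ++i)
                p = swizzle_offset(p, width_bytes);
            if (p != off)
                identity = 0;
        }
    } while (!identity);
    return k;
}
```

In this model a 512-byte row gives swizzle_cycle_length(512) == 8, so seven unswizzles land on the same offsets as one swizzle; other row widths give other cycle lengths, as the update at the top says.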
It would be nice to see if there's some way to do it more efficiently. I was wondering if playing with the dimensions or some other simple modification might allow the process to converge more quickly. What it really needs is someone to sit down and look at the swizzling transform more closely and work out the details; I've been meaning to do it, but I wonder if someone has already looked at it...