64-byte vb alignment required when using -lpspgum_vfpu?
Hi all,
I'm using the VFPU to help transform my vertices. At the very beginning I found my sprites blinking, and I thought it was because I didn't wait for Vblank. It got worse as more and more sprites were transformed/rendered: they blinked more frequently, sometimes disappeared, or were even mangled by mis-transformed vertex coords and bad UV coords. After that, I realized it has nothing to do with Vblank.
I don't think this is a cache-related topic, 'cause I always update my vertex buffer through an uncached pointer, and I never flush the cache after initialization. I don't think this is a VFPU-alignment topic either :p, 'cause I saw that pspgum_vfpu.c always uses 'ulv/usv' to manipulate the in/out matrices, even though my matrices are allocated on the stack, which means they may not be aligned.
And, most importantly, I get exactly what I want when linking with -lpspgum rather than -lpspgum_vfpu, using 16-byte-aligned vb/ibs. Everything runs perfectly when I stick to the CPU/FPU (well... except for the FPS). When I switch to the VFPU, I have to align the buffers to 64 bytes for them to render (or transform?) correctly.
Well, I finally got this fixed by memalign()'ing my vertex buffers and index buffers to 64 bytes, but I don't know what's happening. Any ideas? Thanks in advance.
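In case it helps, this is roughly the allocation pattern I ended up with (a minimal sketch; the Vertex layout, VERTEX_COUNT, and the names are just placeholders for my real setup):

Code:
#include <malloc.h>
#include <stdint.h>

typedef struct { float u, v; float x, y, z; } Vertex;  /* placeholder format */

#define VERTEX_COUNT 1024  /* placeholder size */

static Vertex *vb;           /* cached address returned by memalign() */
static Vertex *vb_uncached;  /* alias used for all CPU writes */

void init_vb(void)
{
    /* 64 bytes is the dcache line size; aligning to it keeps the buffer
       from sharing a cache line with neighbouring allocations. */
    vb = memalign(64, VERTEX_COUNT * sizeof(Vertex));

    /* Setting bit 30 of the address gives the uncached mirror of the
       same physical memory. */
    vb_uncached = (Vertex *)((uintptr_t)vb | 0x40000000);
}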
Okay, just getting the silly questions out of the way early. :) You never know what someone will forget. Next silly question - you do have PSP_THREAD_ATTR_VFPU in the module header, right? And any threads that also use the VFPU?
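For reference, those flags live in the module header macros; something like this (the module name is just an example):

Code:
#include <pspkernel.h>

PSP_MODULE_INFO("myapp", 0, 1, 1);
/* Give the main thread VFPU access - and remember to OR in
   PSP_THREAD_ATTR_VFPU for any other thread that uses the VFPU too. */
PSP_MAIN_THREAD_ATTR(PSP_THREAD_ATTR_USER | PSP_THREAD_ATTR_VFPU);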
You said you write to uncached memory - did you invalidate the cache for that region (or flush/invalidate the entire cache)? If a region is cached, accessing that region through the uncached portion of the map will STILL use the cache until it's been invalidated.
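In other words, something along these lines before touching the uncached alias (a sketch; the function name is mine):

Code:
#include <psputils.h>

void prepare_uncached_region(void *buf, unsigned int size)
{
    /* Write back any dirty lines and drop them from the dcache, so
       later uncached accesses really hit RAM instead of stale lines. */
    sceKernelDcacheWritebackInvalidateRange(buf, size);
    /* sceKernelDcacheWritebackInvalidateAll() is the sledgehammer
       version if you'd rather not track individual ranges. */
}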
J.F. wrote: Okay, just getting the silly questions out of the way early. :) You never know what someone will forget. Next silly question - you do have PSP_THREAD_ATTR_VFPU in the module header, right? And any threads that also use the VFPU?
Well, I think I will never forget to specify the PSP_THREAD_ATTR_VFPU flag when I use the VFPU. If I do, my PSP will crash-to-power-off to remind me :p The other thread is just sleeping there, waiting to serve the exit callback. And that's all the threads I have.
J.F. wrote: You said you write to uncached memory - did you invalidate the cache for that region (or flush/invalidate the entire cache)?
The only trick I played with the cache is sceKernelDcacheWritebackInvalidateAll(); I called it after all the initialization. After that I forgot about the cache. I never read from my vertex buffers, I only update them through uncached pointers.
J.F. wrote: If a region is cached, accessing that region through the uncached portion of the map will STILL use the cache until it's been invalidated.
This is what I didn't know before. Thanks for the information, J.F.
Yes, I think you get the point, J.F.
I've tried using cached pointers and then flushing the cache explicitly before the sceGumDrawArray() calls, and that works fine even with 16-byte-aligned buffers.
So... according to what you said in the previous post:
J.F. wrote: If a region is cached, accessing that region through the uncached portion of the map will STILL use the cache until it's been invalidated.
...this really means a few bytes at the beginning/end of my vertex buffer can end up cached if I access the memory just a few bytes ahead of/behind the vb block through a cached pointer, because those bytes may share the same cache lines, right? And that would make my subsequent update operations actually update nothing but the cache.
Now I think I should not only align the start address of my vertex buffers to a 64-byte boundary, but also round the buffer sizes up to a multiple of 64.
So in the end this really is a cache-related topic :p
Thanks J.F.
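If anyone wants the concrete version, the allocation would then look something like this (a sketch; the rounding macro is my own):

Code:
#include <malloc.h>

/* Round a size up to a whole number of 64-byte cache lines so the
   buffer never shares a line with a neighbouring allocation. */
#define ALIGN64(n) (((n) + 63u) & ~63u)

void *alloc_vb(unsigned int bytes)
{
    /* 64-byte-aligned start address AND a 64-byte-multiple size. */
    return memalign(64, ALIGN64(bytes));
}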
J.F. wrote: If a region is cached, accessing that region through the uncached portion of the map will STILL use the cache until it's been invalidated.
It shouldn't. What will happen is the cached data will eventually get written back as usual, and the changes you made to uncached memory would be overwritten and lost.
jimparis wrote: It shouldn't. What will happen is the cached data will eventually get written back as usual, and the changes you made to uncached memory would be overwritten and lost.
Do you have a source for that? It's not typical behavior for CPUs. Caches are a shortcut that comes before any external access, and they are physically mapped. Cached and uncached are the same physical address, which then shortcuts to the cache while it still holds valid data. Every CPU I've ever worked on has this problem, which is why they all require you to invalidate the cache before using uncached memory.
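To make the hazard concrete, this is the kind of thing that goes wrong (illustrative only; not tied to either explanation of the mechanism):

Code:
#include <stdint.h>

void alias_hazard(int *buf)
{
    int *cached   = buf;
    int *uncached = (int *)((uintptr_t)buf | 0x40000000);

    cached[0]   = 1;  /* lands in a dcache line, which is now dirty */
    uncached[0] = 2;  /* intended to go straight to RAM */

    /* Whether the uncached access is serviced by the still-valid line,
       or the dirty line gets written back over it later, the value in
       memory is not the one you expect - hence: invalidate first. */
}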
If you access a memory location through a cached segment and then try to access it through an uncached segment, two situations can occur:
1) a writeback occurs before you access it through the uncached segment: you'll read the same value, and if you write a new value, that's okay too.
2) a writeback never occurs before you access it through the uncached segment: you'll read a different (stale) value, and if you write a new value, it will be clobbered by the old value from the cached segment once a writeback finally happens.
The point is: be sure never to mix cached and uncached accesses.
Note that uncached accesses are good only for write-only purposes.
And you probably need a "sync" at the end to be sure the write buffer has updated real memory as well.
EDIT: I wouldn't trust "ulv.q". I would use four "lv.s" or an aligned "lv.q" instead.
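For those last two points, that would look roughly like this (a sketch, assuming pspsdk types):

Code:
#include <psptypes.h>

/* Force 16-byte alignment so VFPU code can use the aligned
   lv.q/sv.q forms instead of ulv.q/usv.q. */
static ScePspFMatrix4 model __attribute__((aligned(16)));

static inline void drain_write_buffer(void)
{
    /* "sync" drains the CPU write buffer, so earlier uncached stores
       have actually reached memory before the GE reads them. */
    __asm__ volatile ("sync");
}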