MMU for dynamic VRAM usage

Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

I just wrote a little MMU for dynamically allocating and freeing VRAM for the GU in the familiar malloc/free style. This will be essential when trying to create a good dynamic texture manager.

The code is based on the malloc implementation in the SDK and I tried to keep the coding style the same.

Currently it provides functions for allocating with 16-byte alignment (valloc), freeing (vfree), getting the amount of free VRAM (vmemavail) and getting the largest free block (vlargestblock), as well as an easy way to convert GU-relative pointers to absolute ones and vice versa.
valloc always returns absolute pointers, so if you want to allocate your framebuffer/backbuffer/zbuffer and submit them to the GU with sceGuDrawBuffer or similar, you have to convert them to relative pointers with vrelptr( ptr ). Other than that, there's no limitation.

Usage:
- extract the zip to your application dir
- edit your makefile and add valloc.o to the OBJS
- #include "valloc.h" in your application
- use it

Enjoy.
Raphael

valloc.zip
71M
Posts: 122
Joined: Tue Jun 21, 2005 5:28 am
Location: London

Post by 71M »

Very useful indeed!
Thanks for sharing that with everyone.

Cheers,
71M
subbie
Posts: 122
Joined: Thu May 05, 2005 4:14 am

Post by subbie »

Yay!! I so could use this on both my projects. Thanks for sharing. :D
dot_blank
Posts: 498
Joined: Wed Sep 28, 2005 8:47 am
Location: Brasil

Post by dot_blank »

thank you very much ...this works a dream :)

can you elaborate a bit with these below

Code:

// Return an absolute pointer useable by CPU
void* vCPUPointer( void* ptr );

// Returns an absolute pointer useable by CPU
void* valloc( size_t size );
i think the pspsdk would greatly benefit with
these great VRAM functions
10011011 00101010 11010111 10001001 10111010
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

dot_blank wrote: can you elaborate a bit with these below

Code:

// Return an absolute pointer useable by CPU
void* vCPUPointer( void* ptr );

// Returns an absolute pointer useable by CPU
void* valloc( size_t size );
Well, vCPUPointer is just one of the two conversion functions. It converts any VRAM pointer to an absolute pointer, so that you can write to VRAM directly, just like with any system RAM pointer. Since valloc already returns absolute pointers, this is mostly for convenience: if you aren't sure whether a pointer is absolute, you can use it to make sure (i.e. it won't change an already absolute pointer).
I also added two macros, vabsptr() and vrelptr(), for the vCPUPointer and vGUPointer functions, since the original naming scheme didn't quite fit with the rest.
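To make the two conversions concrete, here is a minimal sketch on plain integers so it can be tried off-device. It assumes VRAM is mapped at 0x04000000 and is 2MB in size, and it ignores the uncached address mirror. The names sketch_absptr and sketch_relptr are made up for this example; the real functions in valloc.h may well be implemented differently.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VRAM_BASE 0x04000000u  /* assumed CPU-visible start of VRAM */
#define SKETCH_VRAM_SIZE 0x00200000u  /* assumed size: 2MB */

/* Relative (GU) offset -> absolute (CPU) address.
   OR-ing the base in leaves an already absolute address unchanged. */
static uint32_t sketch_absptr(uint32_t addr)
{
    return addr | SKETCH_VRAM_BASE;
}

/* Absolute (CPU) address -> relative (GU) offset.
   Masking the base out leaves an already relative offset unchanged. */
static uint32_t sketch_relptr(uint32_t addr)
{
    uint32_t rel = addr & ~SKETCH_VRAM_BASE;
    assert(rel < SKETCH_VRAM_SIZE);  /* must still point into VRAM */
    return rel;
}
```

Both functions are idempotent, which is the "it won't change an already absolute pointer" property described above.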
dot_blank wrote: i think the pspsdk would greatly benefit with
these great VRAM functions
Would be cool to have them integrated there, yes :)
However, I'm not sure yet whether I should first add vrealloc & vcalloc for completeness. A VRAM compaction routine would also be nice in case the VRAM gets too fragmented over time (but this could simply be avoided by good organization of the valloc and vfree calls).
dot_blank
Posts: 498
Joined: Wed Sep 28, 2005 8:47 am
Location: Brasil

Post by dot_blank »

thanx for clearing that up ...i also agree
for completeness sake to have also

vrealloc + vcalloc

...also even vmemchr, vmemcmp, vmemcpy
vmemmove, vmemset just for completeness sake ;)
10011011 00101010 11010111 10001001 10111010
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

ok, ok, got that :P
sandberg
Posts: 90
Joined: Wed Oct 05, 2005 1:25 am
Location: Denmark

Post by sandberg »

Nice work :) How about an interface for setting an offset into the VRAM, from which the textures are allocated ?

That would be nice, so that you could mark the first xxx bytes as unavailable for textures and use them for the frame and z buffers.

Just some simple vinit function taking a single parameter, which is a number of bytes, that you could add to your __valloc_vram_base pointer.
Br, Sandberg
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

I also thought about implementing some kind of static allocation function, but I decided against it because it would just confuse people about when to use which function, while the same thing can already be achieved with the one function provided. Just allocate your frame, back and z buffers once at start-up and never free them while your program runs. This space won't be available for subsequent allocations, no matter what.
The same works for allocating static texture space.

Here's a code snippet of how I use valloc for initializing the GU:

Code:

	fbp0 = vrelptr(valloc( FRAME_SIZE ));
	fbp1 = vrelptr(valloc( FRAME_SIZE ));
	zbp = vrelptr(valloc( ZBUF_SIZE ));

	sceGuInit();

	sceGuStart(GU_DIRECT,list);
	sceGuDrawBuffer(PIXEL_FORMAT,(void*)fbp0,BUF_WIDTH);
	sceGuDispBuffer(SCR_WIDTH,SCR_HEIGHT,(void*)fbp1,BUF_WIDTH);
	sceGuDepthBuffer((void*)zbp,BUF_WIDTH);
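As a rough worked example of how much VRAM those three valloc calls take, the sketch below does the arithmetic. The post does not show how FRAME_SIZE and ZBUF_SIZE are defined, so the 512-pixel buffer width, the 272 lines and the 2MB total are assumptions, and all SK_/sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* All values below are assumptions for illustration; the post does not
   show how its FRAME_SIZE and ZBUF_SIZE are defined. */
#define SK_BUF_WIDTH   512u   /* buffer stride in pixels */
#define SK_SCR_HEIGHT  272u   /* visible lines */
#define SK_VRAM_SIZE   (2u * 1024u * 1024u)

/* Bytes needed for one buffer with the given bytes per pixel. */
static size_t sk_buffer_bytes(size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return SK_BUF_WIDTH * SK_SCR_HEIGHT * bytes_per_pixel;
}

/* VRAM left for textures after two frame buffers and one depth buffer. */
static size_t sk_vram_left(size_t color_bpp, size_t depth_bpp)
{
    size_t used = 2 * sk_buffer_bytes(color_bpp) + sk_buffer_bytes(depth_bpp);
    assert(used <= SK_VRAM_SIZE);
    return SK_VRAM_SIZE - used;
}
```

Under these assumptions, 32-bit colour with a 16-bit depth buffer uses 1392640 bytes and leaves 704512 bytes (688KB) for everything allocated later.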
Scienthsine
Posts: 6
Joined: Mon Mar 20, 2006 11:36 pm

Post by Scienthsine »

Raphael wrote: Also a vram compaction routine would be nice, if the vram got fragmented too much over time (but this could simply be avoided by good organization of the valloc and vfree calls).
I haven't written any code that uses graphics yet, but I was thinking about this last night. My idea for dealing with this is a simple routine that, when called, moves/combines all free VRAM blocks 'downward'. This is done kinda bubble-sortish by finding the 'top' free VRAM block, swapping it with the allocated block below it, and combining it with a free block if there's one 'next' to it. The function would take a true or false parameter that determines whether it should 'defrag' all the free blocks in one call or just do one step. This would allow it to be called once per frame by the user to keep the VRAM unfragmented without any huge loss of cycles all at once.

The only problem with this is that when you move your VRAM allocs around, you invalidate your pointers. My way of dealing with this is to return pointers to my VRAM alloc structure, which always contains an up-to-date pointer to its current VRAM alloc. Your display, draw, etc. buffers would need to be allocated first, however, because I don't think they would like being moved at all. You would want to make sure your current display list has finished before calling the defrag function, and display lists stored for later would probably not like this either.

But like I said, I've yet to actually program anything visual, as I'm still looking for good info. I've been at it for 3 days now, and I think I might be up to making something tonight :D So gimme your ideas.

Ohh, two questions that I have yet to find answers to: What's the total amount of VRAM? I suppose there's a define for it somewhere? What's swizzling? What's it for?
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Scienthsine wrote: Ohh two questions that I have yet to be able to find: What's the total amount of vRam? I suppose theirs a define of it somewhere? What's swizzling? What's it for?
First: VRAM is 2MB in size, and the SDK has a function returning this value (sceGeEdramGetSize). But since it's constant all over the world, I just used a define :)
Second: Swizzling is a way of ordering the texture pixels so that cache misses on access are minimized (and speed thus greatly increased). It's a kind of 'blocking' of the texture.
For more detailed information, visit the wiki, or just create a checkerboard texture where each block is 16 bytes wide and 8 rows high (i.e. 8x8 pixels at 16bpp), swizzle it with the function from the samples and draw it with swizzling disabled (you should see straight lines of white and black instead of blocks - theoretically ;)).
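The 'blocking' can be sketched in a few lines. This is a slow, readable version written for illustration, not the optimized routine from the SDK samples. It assumes the block layout described above (16 bytes wide, 8 rows high, blocks stored one after another in row-major order); sk_swizzle is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reorder a linear image into 16-byte x 8-row blocks.
   width_bytes must be a multiple of 16 and height a multiple of 8. */
static void sk_swizzle(uint8_t *dst, const uint8_t *src,
                       size_t width_bytes, size_t height)
{
    assert(width_bytes % 16 == 0 && height % 8 == 0);
    size_t blocks_per_row = width_bytes / 16;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width_bytes; ++x) {
            /* Which 128-byte block this byte belongs to. */
            size_t block = (y / 8) * blocks_per_row + (x / 16);
            /* Position of the byte inside that block. */
            size_t inside = (y % 8) * 16 + (x % 16);
            dst[block * 128 + inside] = src[y * width_bytes + x];
        }
    }
}
```

After swizzling, the 8 rows of one block sit next to each other in memory, which is why nearby texels tend to share cache lines.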


And for the compaction routine: You brought up one problem I hadn't thought of yet - the invalidation of the pointers (foolish me :|). Returning pointers to the allocation structure is a good idea, but it would not make the pointer usable 'as is', as I wanted, and that's something I don't want to change (it was already a hassle to have to add the conversion functions).
One idea that comes to my mind is a routine that defrags just one single block, which the user identifies by a pointer he allocated before; the pointer is then updated and returned. This would also fit into your "single step defragmentation" strategy. However, it would still lack a "compactall" routine unless you ask the user to provide a linked list of the pointers he allocated - nothing I'd want to do with such low-level routines.

However, I can't think of any other usable solution to this atm, other than leaving compaction altogether to the overlying program. That will be good enough for texture managers, which may want to design their allocation strategy so that it avoids fragmentation as far as possible anyway.

The algorithm I'd use is pretty simple (and most likely equivalent to yours): Just traverse the linked list from bottom (the vram_tail pointer in my routines) to top (the vram_head pointer), each time moving a block backward to 'connect' it to the previous one (i.e. prev_offs+prev_size=curr_offs), so that all free space ends up at the top of VRAM. This would also fit my single-pointer defragmentation function and would be fast if the block to be defragmented is already in place.
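A minimal sketch of that traversal, assuming the blocks are kept in an array sorted by offset instead of the linked list in valloc.c, and using a plain byte array in place of VRAM. The sk_block type and sk_compact function are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bookkeeping entry; the real list in valloc.c may differ. */
typedef struct {
    size_t offs;  /* start of the block, relative to the VRAM base */
    size_t size;  /* length of the block in bytes */
} sk_block;

/* Slide every block down so it touches its predecessor.
   blocks[] must be sorted by ascending offs. Returns the first free
   offset afterwards, i.e. where the single remaining gap starts. */
static size_t sk_compact(uint8_t *vram, sk_block *blocks, size_t count)
{
    size_t next = 0;  /* end of the previous block after compaction */

    for (size_t i = 0; i < count; ++i) {
        assert(blocks[i].offs >= next);  /* sorted, non-overlapping */
        if (blocks[i].offs != next) {
            /* Source and destination may overlap, so use memmove. */
            memmove(vram + next, vram + blocks[i].offs, blocks[i].size);
            blocks[i].offs = next;
        }
        next += blocks[i].size;
    }
    return next;
}
```

Blocks that already touch their predecessor are skipped without any copying, which matches the "fast if already in place" remark above. On real hardware the copy would have to wait until the GU is no longer using the data.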

Hope to hear from you; chatting about ideas and strategies is something I often miss :)
PeterM
Posts: 125
Joined: Sat Dec 31, 2005 7:25 pm
Location: Edinburgh, UK

Post by PeterM »

If there was an automatic compaction scheme you'd need the user to get called back about it so they can update their pointers. i.e.:

Code:

vcompact(MyCallBack, myUserData);

...

void MyCallBack(const void* oldPointer, void* newPointer, void* userData)
{
    // traverse my textures, find a match for the old pointer then update it.
}
Pete
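To show what the application side of such a callback could look like, here is a small sketch. It follows the callback signature proposed above; everything else (the sk_texture record, the table, and sk_report_move as a stand-in for the allocator calling back) is hypothetical and not part of valloc.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type following the signature proposed above. */
typedef void (*sk_move_cb)(const void *old_ptr, void *new_ptr, void *user);

/* Hypothetical application-side texture record. */
typedef struct {
    void *pixels;  /* current location of the texture data */
} sk_texture;

typedef struct {
    sk_texture *items;
    size_t count;
} sk_texture_table;

/* Application callback: find the texture that used old_ptr and repoint it. */
static void sk_on_move(const void *old_ptr, void *new_ptr, void *user)
{
    sk_texture_table *table = user;
    assert(table != NULL);
    for (size_t i = 0; i < table->count; ++i) {
        if (table->items[i].pixels == old_ptr) {
            table->items[i].pixels = new_ptr;
            return;
        }
    }
}

/* Stand-in for the allocator side: report one relocation to the caller. */
static void sk_report_move(void *old_ptr, void *new_ptr,
                           sk_move_cb cb, void *user)
{
    if (cb != NULL && old_ptr != new_ptr)
        cb(old_ptr, new_ptr, user);
}
```

The linear search per moved block is the extra overhead of this scheme; a texture manager with its own lookup structure could do better.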
__count
Posts: 22
Joined: Thu Mar 23, 2006 8:40 pm

Post by __count »

Raphael wrote: However I can't think of any other useable solution to this atm, other than leaving compacting alltogether to the overlaying program, which will be good enough for texture managers, which anyway may want to do their allocation strategy in a way that avoids defragmentation to the most.
Agreed. I think an MMU shouldn't do discrete defragmentation, because not every application will cause significant fragmentation. For instance, if I have a simple game with a fixed number of textures that are all the same size, I don't want my MMU to create unnecessary overhead.

Texture managers are in a perfect position to do defragmentation. The MMU should offer some realloc function at most.

edit: and thanks for sharing your code! :)
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

PeterM wrote:If there was an automatic compaction scheme you'd need the user to get called back about it so they can update their pointers. i.e.:

Code:

vcompact(MyCallBack, myUserData);

...

void MyCallBack(const void* oldPointer, void* newPointer, void* userData)
{
    // traverse my textures, find a match for the old pointer then update it.
}
Pete
Good idea, but I dislike callbacks, especially for a low-level MMU. Also, this would lead to even more overhead on the defragmentation process.

__count wrote:Agreed. I think a MMU shouldn't do discrete defragmentation, because not every application will cause significant fragmentation. For instance, if I have a simple game with a fixed amount of textures that are all the same size I don't want my MMU to create unnecessary overhead.
That's right, but with a simple "vdefragblock(void* ptr)" which just moves the block behind ptr back if there's free space before it, that's no problem, as it's entirely the user's choice whether to call this function or not.
However, on second thought I found that this solution still isn't the best: if the pointers are passed to this function from highest to lowest (the worst case), there will be nearly no defragmentation done, because the empty space is just moved up by one block each time. So total defragmentation will only be finished after n such complete traversals, leading to n² complexity. :/
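The effect of the call order is easy to reproduce off-device. The sketch below tracks offsets only (no data is moved) and counts how many full passes are needed until nothing moves any more. sk_span, sk_defrag_block and sk_passes_until_compact are hypothetical names, and the blocks live in an array sorted by offset rather than in the linked list valloc uses.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets only; blocks[] is sorted by ascending offs and has no overlaps. */
typedef struct { size_t offs, size; } sk_span;

/* Move block i back against its predecessor. Returns 1 if it moved. */
static int sk_defrag_block(sk_span *blocks, size_t i)
{
    size_t prev_end = (i == 0) ? 0 : blocks[i - 1].offs + blocks[i - 1].size;
    assert(blocks[i].offs >= prev_end);
    if (blocks[i].offs == prev_end)
        return 0;
    blocks[i].offs = prev_end;
    return 1;
}

/* Count full passes until nothing moves, visiting blocks either from the
   lowest address up (ascending != 0) or from the highest down. */
static size_t sk_passes_until_compact(sk_span *blocks, size_t count,
                                      int ascending)
{
    size_t passes = 0;
    for (;;) {
        int moved = 0;
        for (size_t k = 0; k < count; ++k) {
            size_t i = ascending ? k : count - 1 - k;
            moved |= sk_defrag_block(blocks, i);
        }
        if (!moved)
            return passes;
        ++passes;
    }
}
```

With three 10-byte blocks at offsets 10, 20 and 30 (a single gap at the bottom), visiting them lowest-first compacts everything in one pass, while highest-first needs three passes because the gap only climbs by one block per pass.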
__count wrote:Texture managers are in a perfect position to do defragmentation. The MMU should offer some realloc function at most.
I really think this is the best solution, and since I'm working on a texture manager anyhow, I'll maybe just release the source too when it's done.

As for the realloc - that would be handy, but could be messy to implement reasonably. The basic approach of the C standard library realloc won't work that well, because with the limited space a realloc that needs the old and the new block at the same time will most likely fail. Thus it would be necessary to back up the data to system RAM first, free the pointer and then allocate a new one to copy the data back to (this could be done asynchronously by the GU, so that's not the problem). However, this is a bad idea, because afaik the GU can't write to system RAM via sceGuCopyImage (I have to try though), and then it would be necessary to use the CPU for the VRAM reads - leading to crap performance.
My best idea would be to just join the allocated space with any free space before it and move the data backward (implementing the vdefragblock() from above with the same drawback, but since realloc as such isn't made for defragmentation that's still ok), and only if that's not possible try to find enough free space elsewhere (leading to the C standard library approach).
Gonna try that :)
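One possible reading of that strategy, sketched as a pure "where should the block go" decision without any copying. It assumes an offset-sorted array of blocks, treats the free space directly behind the block as usable too, and ignores the 16-byte alignment; sk_extent, SK_NO_SPACE and sk_plan_realloc are hypothetical names, not the vrealloc that was released later in the thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t offs, size; } sk_extent;

#define SK_NO_SPACE ((size_t)-1)

/* Decide where block i should live after resizing to new_size.
   blocks[] is sorted by ascending offs; vram_size is the total size.
   Returns the new offset, or SK_NO_SPACE if nothing fits. */
static size_t sk_plan_realloc(const sk_extent *blocks, size_t count,
                              size_t i, size_t new_size, size_t vram_size)
{
    assert(i < count);
    size_t prev_end = (i == 0) ? 0 : blocks[i - 1].offs + blocks[i - 1].size;
    size_t next_start = (i + 1 < count) ? blocks[i + 1].offs : vram_size;

    /* First choice: the block joined with the free space around it,
       moved back against its predecessor. */
    if (new_size <= next_start - prev_end)
        return prev_end;

    /* Fallback: first gap elsewhere that is big enough. Gap k is the
       free space in front of block k (k == count: behind the last one). */
    size_t gap_start = 0;
    for (size_t k = 0; k <= count; ++k) {
        size_t gap_end = (k < count) ? blocks[k].offs : vram_size;
        if (k != i && k != i + 1 && new_size <= gap_end - gap_start)
            return gap_start;
        if (k < count)
            gap_start = blocks[k].offs + blocks[k].size;
    }
    return SK_NO_SPACE;
}
```

If the returned offset is lower than the old one, the data has to be moved backward with an overlap-safe copy; in the fallback case the old block can only be freed after the copy has finished.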
__count wrote:edit: and thanks for sharing your code! :)
No problem - sharing makes the world a better place :)
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

I finished a vrealloc implementation using sceGuCopyImage. However, it requires sceGuInit() to have been called before any call to vrealloc (shouldn't be a problem - after all it's made for programs that use the GU), and you need to call sceGuSync before the reallocated space is used, to make sure the data transfer has finished.

I also fixed a very small bug which prevented the last byte of VRAM from being allocated. Here's the patch for it:

Code:

27c27
< #define VRAM_BASE ((unsigned int)sceGeEdramGetAddr())
---
> #define VRAM_BASE 0x04000000
88c88
< 	if (((u32)head->ptr+head->size+size)<VRAM_SIZE) {
---
> 	if (((u32)head->ptr+head->size+size)<=VRAM_SIZE) {
And here's the updated complete sources with vrealloc:
SOURCE

EDIT - NOTE: The current approach of _vram_fit_size() is flawed and will miss some bytes if the texture size is not a multiple of 1024.
Last edited by Raphael on Sat Mar 25, 2006 6:51 am, edited 1 time in total.
Scienthsine
Posts: 6
Joined: Mon Mar 20, 2006 11:36 pm

Post by Scienthsine »

I finally got around to taking a good look at that code of yours, Raphael. Why don't you just keep track of free VRAM with a global, subtract from it on alloc, and add back to it on free? It would allow the value to be checked without having to traverse the list. Also, you could then add a quick 'fail' return in the alloc function if the requested memory is greater than the total free. (Of course, if it's fragmented enough that the required memory isn't all in one block, then you'll still end up with a traversal, and fail.)
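A minimal sketch of that bookkeeping, assuming a 2MB VRAM total. All sk_ names are hypothetical; in the real allocator the two note functions would simply be a line each inside valloc and vfree.

```c
#include <assert.h>
#include <stddef.h>

#define SK_VRAM_TOTAL (2u * 1024u * 1024u)  /* assumed: 2MB of VRAM */

/* Running total of unallocated bytes, updated on every alloc and free. */
static size_t sk_free_bytes = SK_VRAM_TOTAL;

/* Returns 1 if a request can be rejected without walking the block list. */
static int sk_quick_fail(size_t size)
{
    return size > sk_free_bytes;
}

/* Call after a successful allocation of size bytes. */
static void sk_note_alloc(size_t size)
{
    assert(size <= sk_free_bytes);
    sk_free_bytes -= size;
}

/* Call after freeing a block of size bytes. */
static void sk_note_free(size_t size)
{
    assert(sk_free_bytes + size <= SK_VRAM_TOTAL);
    sk_free_bytes += size;
}

/* O(1) replacement for a list-walking "memory available" query. */
static size_t sk_memavail(void)
{
    return sk_free_bytes;
}
```

The quick fail only catches requests larger than the total free space; a request that passes it can still fail later because no single free block is big enough.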

As for the memory defrag stuff, I plan to have a function like I mentioned above. It would allow for small amounts of defragmentation every so often, if the user uses it. It's completely optional really, not using the function would be as if it didn't exist. But my method for keeping up with the pointers requires that the actual pointer is 'managed' by another struct, which is a bit troublesome I suppose.

I'll be around, hopefully more once I get more familiar with everything. Just wrote a simple RLE image format to use, though it's nothing compared to png's size, I felt like making one :p
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Good idea there. I didn't think of optimizing the vmemavail function (the other functions won't benefit from it, except for that additional 'quick fail' in valloc) because I didn't expect it to be called very often - most likely only once or twice in a program, to show some usage statistics for debugging purposes or whatever.
Won't hurt to implement that though.

For your defragmentation, either stick with a callback function, or you could also try working with pointers to pointers. Your application would have to use them correctly though, but it would have to do so with your pointers to structures too.
I imagine something like this:

Code:

void** valloc(size_t size) {
...
return &new_ptr;
}

void vfree(void** ptr) {
...
while (*ptr!=cur->ptr) {
...
}
}
This way your defrag could change the actual addresses behind the pointers, but the pointers themselves won't change, so the application won't have to care about it.
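A slightly fuller sketch of the pointer-to-pointer idea, under the assumption that the handle is the address of the ptr field inside the allocator's own list node, so it stays valid for as long as the node exists (returning the address of a local variable would dangle). The node layout and all sk_ names are hypothetical, and the actual VRAM search is left out: the caller passes in the address the block was placed at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node. The application keeps &node->ptr as its handle. */
typedef struct sk_node {
    void *ptr;              /* current address of the block */
    size_t size;
    struct sk_node *next;
} sk_node;

static sk_node *sk_head = NULL;

/* Create a node for a block at addr and return a stable handle to it. */
static void **sk_handle_alloc(void *addr, size_t size)
{
    sk_node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->ptr = addr;
    node->size = size;
    node->next = sk_head;
    sk_head = node;
    return &node->ptr;
}

/* Free by handle: find the node whose ptr field the handle points at. */
static void sk_handle_free(void **handle)
{
    for (sk_node **link = &sk_head; *link != NULL; link = &(*link)->next) {
        if (&(*link)->ptr == handle) {
            sk_node *dead = *link;
            *link = dead->next;
            free(dead);
            return;
        }
    }
}

/* What a defrag step would do: change the address behind the handle. */
static void sk_relocate(void **handle, void *new_addr)
{
    assert(handle != NULL);
    *handle = new_addr;
}
```

The application then always dereferences the handle (*handle) right before use instead of caching the address, which is the "use them correctly" requirement mentioned above.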