Media Engine?
-
- Posts: 339
- Joined: Thu Sep 29, 2005 4:19 pm
Here's something people may enjoy - I did a PRX for the MediaEngine. It's mostly based on the code in the previous page for the ME part, and KeyCleaner for the EBOOT/PRX parts. It works fine on my Slim PSP with 3.60 M33.
This doesn't yet have any attributions in the files. Before an "official" release, I wanted to see what you all think I should put in there. Pmpmod is GPL, so this may need to be as well (the prx part, at least).
http://groups.google.com/group/chilly-w ... -v1.0a.zip
Comments and suggestions welcome.
J.F. wrote: That would be nice. I could work it into the next revision of the prx.

Well, for starters here's some dcache functions.
Code: Select all
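/* Load the tag for a cache index (cache op 0x10), then read CP0 $28/$29 into lo/hi. */
#define load_tag(index, hi, lo) __builtin_allegrex_cache(0x10, index); \
    __asm__ volatile ("mfc0 %0, $28\nmfc0 %1, $29\n":"=r"(lo), "=r"(hi));

/* Write lo/hi to CP0 $28/$29, then store the tag for a cache index (cache op 0x11). */
#define store_tag(index, hi, lo) __asm__ (".set push\n" \
    ".set noreorder\n" \
    "mtc0 %0, $28\n" \
    "mtc0 %1, $29\n" \
    ".set pop\n" \
    ::"r"(lo),"r"(hi)); \
    __builtin_allegrex_cache(0x11, index);

/* Write back the lines from addr to addr+size, stepping one 64-byte line at a time. */
void dcache_wb_range(void *addr, int size) {
    int i, j = (int)addr;
    for(i = j; i < size+j; i += 64)
        __builtin_allegrex_cache(0x1a, i);
}

/* Write back and invalidate by index across the whole cache. */
void dcache_wbinv_all() {
    int i;
    for(i = 0; i < 8192; i += 64)
        __builtin_allegrex_cache(0x14, i);
}

/* Invalidate (without writing back) the lines from addr to addr+size. */
void dcache_inv_range(void *addr, int size) {
    int i, j = (int)addr;
    for(i = j; i < size+j; i += 64)
        __builtin_allegrex_cache(0x19, i);
}

/* Write back every line whose tag has bit 20 set, rebuilding the address from the tag. */
void dcache_wb_all() {
    int i, hi, lo;
    for(i = 0; i < 8192; i += 64) {
        load_tag(i, hi, lo);
        if(hi&(1<<20)) __builtin_allegrex_cache(0x1a, ((hi<<13) | i));
        if(lo&(1<<20)) __builtin_allegrex_cache(0x1a, ((lo<<13) | i));
    }
}

/* Invalidate the whole cache without writing anything back. */
void dcache_inv_all() {
    int i = 0; /* start at index 0 so the store_tag below has a defined operand */
    store_tag(i, 0, 0); /* zero the tag registers before the loop */
    for(i = 0; i < 8192; i += 64) {
        __builtin_allegrex_cache(0x13, i);
        __builtin_allegrex_cache(0x11, i);
    }
}

/* Write back and invalidate the lines from addr to addr+size. */
void dcache_wbinv_range(void *addr, int size) {
    int i, j = (int)addr;
    for(i = j; i < size+j; i += 64)
        __builtin_allegrex_cache(0x1b, i);
}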
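One thing to watch with the range functions above: they start at addr and step by 64, so when addr is not aligned to a line boundary, the last line touched by the range can be skipped. Below is a small host-side sketch (plain C, no PSP intrinsics) of rounding a range out to whole lines. The 64-byte line size is taken from the loops above; cache_line_span and naive_line_count are made-up names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* line size assumed from the "i += 64" loops above */

/* Hypothetical helper: round [addr, addr+size) out to whole cache lines.
 * Stores the address of the first line in *first and returns the line count. */
static unsigned cache_line_span(uintptr_t addr, unsigned size, uintptr_t *first)
{
    uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t start, end;

    assert(first != 0);
    start = addr & mask;                        /* round the start down */
    *first = start;
    if (size == 0)
        return 0;
    end = (addr + size + CACHE_LINE - 1) & mask; /* round the end up */
    return (unsigned)((end - start) / CACHE_LINE);
}

/* Number of cache ops issued by the unaligned loop
 * "for (i = addr; i < addr + size; i += 64)". */
static unsigned naive_line_count(uintptr_t addr, unsigned size)
{
    unsigned n = 0;
    uintptr_t i;
    for (i = addr; i < addr + size; i += CACHE_LINE)
        n++;
    return n;
}
```

For a 64-byte range starting at offset 0x30, the naive loop issues one op while the range actually straddles two lines. On the PSP the loop body would then issue the cache op once per line, starting at the address returned in *first.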
:)
J.F. wrote: Thanks. Good cache routines can improve the speed of certain things quite a bit.
crazyc wrote: The FlushME function confuses me. Unless I'm missing something, it looks like you are flushing the SC.

You're right. I goofed on that one. That's another reason I posted here before releasing. Another goof - you don't need the PRX for setting the signals. SignalME should be a macro in the program. So I have some goofs to fix as well as some extra stuff to add as you post.
Thanks for the heads up!
Okay, here's my latest. It has some stuff taken out of the kernel mode prx that didn't need to be there in the first place and shuffled into a .c file and .h file you include with the project. The kernel mode prx also takes into account 3.71, so it should work on any 3.xx firmware. It works well on my 3.71 M33 Slim.
The example is a bit more substantial than the first test - it uses the ME to color cycle the screen. The code for this test is based on some old(ish) code I found here on using a framebuffer in the main memory, so this demonstrates a number of things - putting the framebuffer into main memory, getting the info about the framebuffer, setting the debug screenbase so you can still use the debug printf function, and using the ME to draw to the display.
You'll notice I use the non-cached address of the framebuffer on the ME. One thing I learned when I made the first Amiga PPC port of DOOM - don't draw to cached buffers. You flood the cache and slow down the rest of the program. Drawing to non-cached memory means the difference between 10 FPS and 100 FPS... at least on a PPC, so I figure it's probably about the same on the MIPS. :)
http://groups.google.com/group/chilly-w ... est-v1.zip
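For anyone wondering what "the non-cached address" means in practice: as far as I know, on the PSP the uncached view of a memory address is the same address with bit 30 (0x40000000) set. A minimal sketch under that assumption, written as plain C so it can be checked on a PC. The helper names and the example address are made up for illustration and are not from the PSPSDK.

```c
#include <assert.h>
#include <stdint.h>

#define UNCACHED_BIT 0x40000000u  /* assumption: bit 30 selects the uncached mirror */

/* Hypothetical helpers; these names are not PSPSDK functions. */
static uint32_t to_uncached(uint32_t addr) { return addr | UNCACHED_BIT; }
static uint32_t to_cached(uint32_t addr)   { return addr & ~UNCACHED_BIT; }
static int      is_uncached(uint32_t addr) { return (addr & UNCACHED_BIT) != 0; }

/* Usage: convert an example framebuffer address both ways. */
static void uncached_demo(void)
{
    uint32_t fb = 0x08800000u;              /* example address only */
    uint32_t fb_nc = to_uncached(fb);

    assert(is_uncached(fb_nc));
    assert(!is_uncached(fb));
    assert(to_cached(fb_nc) == fb);
}
```

The point of the conversion is that drawing through the uncached pointer bypasses the data cache, so the framebuffer writes do not evict the rest of the program's working set.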
Only this link works for me:
http://chilly-willys-ice-flow.googlegro ... est-v1.zip
edit: WTF, now the original one works... oh well :p
Thanks :)
jas0nuk wrote: Only this link works for me:
http://chilly-willys-ice-flow.googlegro ... est-v1.zip
edit: WTF, now the original one works... oh well :p
Thanks :)

Probably some delay in Google updating all the group links. I have no idea how their web server software works - I just upload the file and it supplies a link. :)
J.F.: In your DisplayTest sample (thanks for this by the way - it's excellent), the prx has the following module declaration:
Code: Select all
PSP_MODULE_INFO("MediaEngine", 0x1006, VERS, REVS);
I appreciate that the 0x1000 flag is for kernel mode. What does 0x6 signify? I can't find any documentation for this.
StrmnNrmn wrote: J.F.: In your DisplayTest sample (thanks for this by the way - it's excellent), the prx has the following module declaration:
Code: Select all
PSP_MODULE_INFO("MediaEngine", 0x1006, VERS, REVS);
I appreciate that the 0x1000 flag is for kernel mode. What does 0x6 signify? I can't find any documentation for this.

http://forums.ps2dev.org/viewtopic.php?p=57753#57753 :)
StrmnNrmn, please read my reply (no. 6) to your Media Engine blog entry :p
jas0nuk wrote: http://forums.ps2dev.org/viewtopic.php?p=57753#57753 :)

Awesome - that's exactly the page I failed to find while searching :) I guess 0x1006 is kernel mode/load/start then?

jas0nuk wrote: StrmnNrmn, please read my reply (no. 6) to your Media Engine blog entry :p

Will do - thanks :)
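To make the 0x1006 question concrete, here is a rough sketch of splitting the value into bits. The bit values below are what I recall from PSPSDK's pspmoduleinfo.h (NO_STOP = 0x1, SINGLE_LOAD = 0x2, SINGLE_START = 0x4, KERNEL = 0x1000), so treat them as an assumption and check the header. The EX_ names and describe_module_attr are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Attribute bits as recalled from PSPSDK's pspmoduleinfo.h (assumption).
 * The EX_ prefix marks these as example names, not the real header's. */
enum {
    EX_MODULE_NO_STOP      = 0x0001,
    EX_MODULE_SINGLE_LOAD  = 0x0002,
    EX_MODULE_SINGLE_START = 0x0004,
    EX_MODULE_KERNEL       = 0x1000
};

/* Writes a space-separated list of the set bits into buf and returns buf. */
static const char *describe_module_attr(unsigned attr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s%s%s%s",
             (attr & EX_MODULE_KERNEL)       ? "kernel "       : "",
             (attr & EX_MODULE_SINGLE_LOAD)  ? "single-load "  : "",
             (attr & EX_MODULE_SINGLE_START) ? "single-start " : "",
             (attr & EX_MODULE_NO_STOP)      ? "no-stop "      : "");
    return buf;
}
```

Under those assumed values, 0x1006 breaks down as kernel | single-load | single-start, which matches the "kernel mode/load/start" guess above.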
By the way, the latest incarnation of the MediaEngine prx is with the source for SNES9X_TYL 0.4.2 ME for 3xx. You can find an arc here:
http://chilly-willys-ice-flow.googlegro ... 3x-src.zip
It has all the routines that don't need to be in the prx in a separate .c file, and has the ability to invalidate/wbinv the caches on entry/exit of the ME function.
When using the ME on the Slim, remember that you cannot try to change the cpu clock once the ME has been activated or the PSP will freeze.
I was thinking about how to call kernel functions from the ME. I warn you now that I am not the best at understanding low level hardware or assembly, but I try my best. The following could be wrong, totally off, or just plain stupid. Tell me what you think.
The ME program could be a separate elf with a copy of the SDK with special wrapper functions (described below) and its own version of malloc, free, etc. It would link to ME versions of these libraries. Any library like libmikmod would not need to be changed (just compiled and linked with the ME version of PSPSDK) and to the normal homebrew programmer everything would look the same.
The following memory (obviously needs some adjusting) would not be cached, would be available to both the main CPU and the ME, and would be outside the heap for both the main CPU and the ME (or set dynamically from a malloc or static variable on the main CPU and passed to some init function). The start address of 0x13370000 is just for the example and would not be the real location. I did not take into account the different memory locations for the main CPU and the ME when I wrote the example code below, so bear with me.
Code: Select all
0x13370000 Main Call Triggered
0x13370001 ME Call Triggered
0x13370100 Main Function Index
0x13370104 Main Function Return Value
0x13370108 Main Function Arg 1
0x1337010C Main Function Arg 2
0x13370110 Main Function Arg 3
0x13370200 ME Function Index
0x13370204 ME Function Arg 1
The process would go like this:
ME code calls the function sceIoOpen from the ME version of PSPSDK as seen below.
This adds the function to our shared memory and waits for the main CPU to finish with it.
The main CPU, either via an interrupt or a function called from the main loop, checks if it needs to call a kernel function. If it does, it looks up the function index in an array of function pointers and calls that function (handle_sceIoOpen).
The function handle_sceIoOpen (or similar) would pass the proper arguments and set the return value. Then, when the kernel function returns, it would signal the ME to wake up and continue.
Calling a function on the ME (for example you might want to load a song, play, stop, etc) would work the same way but in reverse and with the ME function area in the above example memory map.
Code: Select all
#define FUNC_sceIoOpen 0
#define FUNC_sceIoClose 1

// ME-side wrapper: marshals the call into shared memory and waits for the main CPU
SceUID sceIoOpen(const char *file, int flags, SceMode mode)
{
    // Push the function
    (*((unsigned int*)0x13370100)) = FUNC_sceIoOpen;
    // Set the first argument past the return value
    char *current_arg = (char*)(0x13370104 + sizeof(SceUID));
    // Push the arguments
    (*((const char**)current_arg)) = file - 0x40000000; // Adjust the pointer for the main CPU memory location
    current_arg += sizeof(const char*);
    (*((int*)current_arg)) = flags;
    current_arg += sizeof(int);
    (*((SceMode*)current_arg)) = mode;
    // Trigger the function and wait (byte write, so the ME flag at 0x13370001 is not overwritten)
    (*((unsigned char*)0x13370000)) = 1;
    waitForMainReturn();
    return (*((SceUID*)0x13370104));
}
///////////////////////////////////////////////////
// Main-CPU-side handler: unpacks the arguments, calls the real function, stores the result
void handle_sceIoOpen()
{
    (*((SceUID*)0x13370104)) = sceIoOpen(*((const char**)0x13370108), *((int*)0x1337010C), *((SceMode*)0x13370110));
    signalMeMainCallDone();
}
The above is a little ugly and could be made easier on the eyes with macros. It might be a little slow, so you wouldn't want to call a lot of kernel functions on the ME. One way you might be able to reduce calls is loading a file into memory (if there is room) and having the library (for example libmikmod or libpng) use the copy in memory, so all the I/O calls would be done on the main CPU without the need to pass back and forth between the main CPU and the ME.
Am I way off? Is there a better way? I might try to code up an example of this if it sounds good (emphasis on might and try). I still need to look over the code posted here a little more and figure out some specifics. I bet someone has already come up with an idea like this and has it in the works... anyway, how does it sound :)
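The mailbox idea in this post can be tried out on a PC before touching the hardware. The sketch below simulates both sides in one thread: an ordinary struct stands in for the shared uncached region, and two stand-in handlers take the place of kernel calls like sceIoOpen. Every name here (Mailbox, me_post, main_poll, fake_open, fake_close) is hypothetical, and real hardware would additionally need the address translation and cache handling discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mailbox; on real hardware this would sit in uncached shared memory. */
typedef struct {
    volatile uint32_t  pending;  /* 1 while a call is waiting for the main CPU */
    volatile uint32_t  func;     /* index into the handler table */
    volatile intptr_t  ret;      /* return value written by the main CPU */
    volatile uintptr_t args[3];  /* up to three word-sized arguments */
} Mailbox;

enum { FUNC_FAKE_OPEN = 0, FUNC_FAKE_CLOSE = 1, FUNC_COUNT };

typedef intptr_t (*Handler)(const volatile uintptr_t *args);

/* Stand-ins for kernel calls: "open" just returns the path length. */
static intptr_t fake_open(const volatile uintptr_t *a)  { return (intptr_t)strlen((const char *)a[0]); }
static intptr_t fake_close(const volatile uintptr_t *a) { (void)a; return 0; }

static const Handler g_handlers[FUNC_COUNT] = { fake_open, fake_close };
static Mailbox g_box;  /* stand-in for the shared region */

/* "ME side": post a request. Returns 0 if a call is already pending. */
static int me_post(uint32_t func, uintptr_t a0, uintptr_t a1, uintptr_t a2)
{
    assert(func < FUNC_COUNT);
    if (g_box.pending)
        return 0;
    g_box.func = func;
    g_box.args[0] = a0;
    g_box.args[1] = a1;
    g_box.args[2] = a2;
    g_box.pending = 1;  /* set last so the other side sees complete arguments */
    return 1;
}

/* "Main CPU side": poll once and service a pending call. Returns 1 if one ran. */
static int main_poll(void)
{
    uint32_t f;
    if (!g_box.pending)
        return 0;
    f = g_box.func;
    g_box.ret = (f < FUNC_COUNT) ? g_handlers[f](g_box.args) : -1;
    g_box.pending = 0;  /* clearing the flag signals completion */
    return 1;
}
```

On the real system the ME side would spin (or sleep) until pending drops back to 0 and then read ret. Keeping the function index and arguments in one struct also avoids the hand-computed offsets in the code above.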
-
- Posts: 110
- Joined: Tue Feb 27, 2007 9:43 pm
- Contact:
The ME firmware has many of these libs already. It would be better to try to make use of them instead of creating your own functions.
In <2.50, the ME fw is gzipped inside mebooter.prx. From 2.50+ they are stored as encrypted images in kd/resources.
I've worked out a few of these libs and their locations already, like the ME equivalent of the sysreg lib, and also the cache functions.
J.F. wrote: By the way, the latest incarnation of the MediaEngine prx is with the source for SNES9X_TYL 0.4.2 ME for 3xx. You can find an arc here:
http://chilly-willys-ice-flow.googlegro ... 3x-src.zip
...

J.F./crazyc: I've found something very unusual going on with the cache invalidation functions called from me_loop(). Specifically, I've found that when I set postcache_len to -1 (to cause a call to dcache_wbinv_all()), this doesn't seem to be correctly writing back the data cache, i.e. it's as if the function isn't being called.
What's very confusing is that if I call dcache_wbinv_all() twice in me_loop, everything works as expected.
Can anyone verify that this implementation is correct?
Code: Select all
void dcache_wbinv_all() {
    int i;
    for(i = 0; i < 8192; i += 64)
        __builtin_allegrex_cache(0x14, i);
}
StrmnNrmn wrote: J.F./crazyc: I've found something very unusual going on with the cache invalidation functions called from me_loop(). Specifically, I've found that when I set postcache_len to -1 (to cause a call to dcache_wbinv_all()), this doesn't seem to be correctly writing back the data cache, i.e. it's as if the function isn't being called.
What's very confusing is that if I call dcache_wbinv_all() twice in me_loop, everything works as expected.
Can anyone verify that this implementation is correct?
Code: Select all
void dcache_wbinv_all() {
    int i;
    for(i = 0; i < 8192; i += 64)
        __builtin_allegrex_cache(0x14, i);
}
Looking at the MIPS reference manual, it looks like 0x14 corresponds to a 'Fill' operation on the instruction cache? Shouldn't it be 0x15 (i.e. I'd expect the lowest 2 bits to be 0x1 which corresponds to an operation on the data cache...)

No, it's definitely 0x14 in the PSP kernel. Interestingly, in sceKernelDcacheWritebackInvalidateAll, Sony does the cache operation twice in each iteration of the loop, just like you suggest.
The cache routines were from a previous post in this thread, so they may need some work. If the routine needs to be called twice, the change should be made. That might be why there are sometimes issues with the sound on the SNES emu using the ME. If you find any other problems, be sure to post on it. Eventually, we'll have a nice ME prx that's (hopefully) bug-free.
Thinking about the threads I've read on the PSP cache, it seems that most cache commands are run twice, supposedly because the MIPS cache is two-way set associative, meaning you have to do the command twice to affect both sets. So it would make sense that you would either have to call the function twice, or do it like this:
Code: Select all
void dcache_wbinv_all() {
    int i;
    for(i = 0; i < 8192; i += 64) {
        __builtin_allegrex_cache(0x14, i);
        __builtin_allegrex_cache(0x14, i);
    }
}
which is how you normally see PSP cache code.
J.F. wrote: Thinking about the threads I've read on the PSP cache, it seems that most cache commands are run twice, supposedly because the MIPS cache is two-way set associative, meaning you have to do the command twice to affect both sets.

That shouldn't be the case as 0x14 is index writeback invalidate and the PSP has 128 (8192/64) total cache lines with 64 in each way. Each index should refer to an individual cache line.
crazyc wrote: That shouldn't be the case as 0x14 is index writeback invalidate and the PSP has 128 (8192/64) total cache lines with 64 in each way. Each index should refer to an individual cache line.

Isn't the cache 16kb? Hence it would be 16384/64 = 256 lines with 128 each way. And since it's index writeback there are only 128 indices, so the function used for(i = 0; i < 8192; i += 64).
<Don't push the river, it flows.>
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki
Alexander Berl
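The two readings of the cache geometry above are easy to compare with a bit of arithmetic. This is only a sketch of the numbers quoted in the thread (8 KB vs 16 KB, 64-byte lines, two ways); it does not settle which one the hardware actually uses, and the struct and function names are made up for the example.

```c
#include <assert.h>

/* Geometry of a set-associative cache (illustrative type, not from any SDK). */
typedef struct {
    unsigned total_bytes;
    unsigned line_bytes;
    unsigned ways;
} CacheGeom;

/* Total number of lines in the cache. */
static unsigned cache_lines(CacheGeom g)
{
    assert(g.line_bytes != 0);
    return g.total_bytes / g.line_bytes;
}

/* Number of sets, i.e. lines per way. */
static unsigned cache_sets(CacheGeom g)
{
    assert(g.ways != 0);
    return cache_lines(g) / g.ways;
}

/* Bytes covered by one way: the address range an index loop walks
 * when each index selects a set rather than a single line. */
static unsigned cache_way_bytes(CacheGeom g)
{
    assert(g.ways != 0);
    return g.total_bytes / g.ways;
}
```

With 16384 bytes, 64-byte lines and two ways this gives 256 lines, 128 sets and 8192 bytes per way, so a loop over 0..8191 in steps of 64 visits each set once, which would explain why an index operation has to be issued twice to reach both ways. With 8192 bytes total the same loop would instead visit 128 distinct lines once each.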