Media Engine?
Hello
Sorry to bump this topic. I'm trying to continue researching the Media Engine, but I'm not a very good programmer (the programming itself is fine, but I'm very bad with makefiles, gcc switches, the Unix command line and so on, so it's very difficult for me to work with this SDK).
I would like to know how far the research has gone: has anything more been done since executing a loop on the ME? I have searched the Internet but found nothing; it seems the idea has been abandoned (too complicated?).
From what I have seen, it's not yet possible to make it execute C code directly from the project, as GCC generates absolute jumps (j), which make the ME execute code elsewhere (main memory or cache), and the program crashes shortly after (maybe when the cache is flushed). I should relocate the code (maybe the .text section?) to 0xbfc00040, but I don't know how to do that.
I tried crazyc's libme, but only after a lot of compilation problems (I also had to convert ./build to a standard Makefile because it must create a PBP file) did it compile. The ELF is loaded (pspMeLoadExec returns 0), but it doesn't work: the ASM code at 0xbfc00040 runs correctly, then jumps to k0 (883000f8), and then the ME seems to crash. Maybe the ELF format generated by my compiler version is different? I can't even tell what it executes, since I have no debugger.
Any info about the ME (or at least how to use melib) would be much appreciated, thanks ^^
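Editor's note on the absolute-jump problem described above: a MIPS `j` instruction stores the low 28 bits of its target in the instruction word and takes the top 4 bits from the address it executes at, so the target does not move when the code is copied somewhere else. A minimal sketch of that arithmetic follows; the helper names and the example addresses (0x08900100 as a link-time target, 0xBFC00048 as the place the copy runs) are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Encode `j target`: opcode 2 in the top 6 bits, then the 26-bit word index. */
static uint32_t mips_encode_j(uint32_t target)
{
    return (2u << 26) | ((target >> 2) & 0x03FFFFFFu);
}

/* Address a `j` actually reaches when it executes at `pc`: the top 4 bits
 * come from the delay-slot address (pc + 4), the rest from the instruction. */
static uint32_t mips_j_target(uint32_t pc, uint32_t insn)
{
    return ((pc + 4u) & 0xF0000000u) | ((insn & 0x03FFFFFFu) << 2);
}
```

Under these assumptions, a jump linked for 0x08900100 still reaches 0x08900100 when run in place, but reaches 0xB8900100 when the same instruction word is executed from 0xBFC00048. The jump does not follow the copied code, which is consistent with the crash described in the post.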
Okay thanks.
I wrote my own loader, which is simpler: it just resets the ME and loads a main routine for it. Then it can execute code from RAM just like the main processor (or at least I think it should be okay...)
But can you tell me whether you got any further with the ME?
[Edit] I got some more code to work, but the lack of kernel functions / VRAM access is annoying and severely limits what we can do with it :(
The base Media Engine implementation accesses the kernel, no? So shouldn't it be possible to use it here too?
Yes, I noticed that too, but it's not VRAM. According to Sony's block diagram, the ME has access to VRAM through the main bus.
It might be a stupid question since I know nothing about the PSP kernel, but might it be possible to call OS functions (sceKernel...) manually (i.e. without going through the ME kernel), like a set of ROM calls on low-spec systems?
Even if the kernel were reentrant or had locks at every entry point, any call that touched any MMIO would crash.
Communication between the two cores could be implemented via shared mailbox memory in kernel space or SRAM (a kind of pseudo-RPC, e.g. using NIDs; the ME could then execute kernel calls indirectly through the main core). With a simple scheduler added on the ME core, general POSIX threading might work and be useful in general applications; the PSP would then appear as a dual-processor system...
Could do that, but note the CPUs aren't cache coherent, so you wouldn't want to do it much.
Writes to the mailbox would need to be uncached, sure (a minimal protocol could, e.g., transfer only the register set, including the PC of the called function; this would also work bidirectionally). It's also likely that there is an interrupt line connecting the cores, which could be used for notification...
To be sure, the communication interrupt handler could also flush caches. Nevertheless... it sounds like a fair amount of work; we're basically talking about a mini-OS.
mrbrown wrote: Er, there's already RPCs to the ME. Consult your local NID list.
May be worth an investigation. Transferring the register set of the calling function has the charm of an extremely trivial implementation and minimal overhead: no NIDs need to be resolved, and register sets are saved/restored on interrupts anyway, so there's no need to write much new code.
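Editor's note: the mailbox idea discussed in the posts above (pass the function address plus argument registers, let the other core run the call) can be sketched on a host machine as below. This is not the existing ME RPC that mrbrown mentions and not anyone's actual implementation; `me_mailbox`, `mailbox_post`, `mailbox_service` and `demo_sum` are hypothetical names. On real hardware the structure would have to live at an uncached address (or be flushed explicitly), and the polling could be replaced by the inter-core interrupt the thread speculates about.

```c
#include <stdint.h>

/* A call is described by its entry point and four argument registers. */
typedef uint32_t (*me_rpc_fn)(uint32_t, uint32_t, uint32_t, uint32_t);

typedef struct {
    volatile uint32_t pending;  /* 1 while a request waits to be serviced */
    me_rpc_fn volatile pc;      /* function to run on the other core */
    volatile uint32_t args[4];  /* stand-ins for $a0..$a3 */
    volatile uint32_t result;   /* stand-in for $v0 */
} me_mailbox;

/* Requesting core: returns 0 if the request was posted, -1 if busy. */
static int mailbox_post(me_mailbox *mb, me_rpc_fn fn,
                        uint32_t a0, uint32_t a1, uint32_t a2, uint32_t a3)
{
    if (mb->pending)
        return -1;
    mb->pc = fn;
    mb->args[0] = a0;
    mb->args[1] = a1;
    mb->args[2] = a2;
    mb->args[3] = a3;
    mb->pending = 1;            /* publish last, after the payload */
    return 0;
}

/* Servicing core: runs at most one request; returns 1 if one was run. */
static int mailbox_service(me_mailbox *mb)
{
    if (!mb->pending)
        return 0;
    mb->result = mb->pc(mb->args[0], mb->args[1], mb->args[2], mb->args[3]);
    mb->pending = 0;            /* signals completion to the requester */
    return 1;
}

/* Hypothetical callee used only to exercise the sketch. */
static uint32_t demo_sum(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return a + b + c + d;
}
```

The single `pending` flag keeps the protocol to one outstanding call at a time, which matches the caution in the thread that the two CPUs are not cache coherent and the channel should not be used heavily.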
crazyc wrote: sceSysregInterruptToOther probably triggers an interrupt on the ME. Also cop0 $22 appears to be connected between the two.
It looks like $22 is 0 on the system core processor and != 0 on the Media Engine processor. Look at the exception vector at 0xBFC00000; the code is like this:
-----------8<------------
$c0c6 = $v0;
$v0 = $c0r22; //processor detection
if (0 != $v0) goto 0xBFC00040; //exception vector for ME
call $c0c9;
----------->8------------
They use $22 for processor detection, as $PrID is probably the same on both.
crazyc wrote: BTW, there is something mapped at 0x04000000, where the VRAM is on the main CPU, but I'm not sure what it is.
Referring to:
http://pc.watch.impress.co.jp/docs/2004 ... gai_3a.gif
It could be the 2 MB sub-memory in the block diagram, but I'm not sure. What is certain is that it's not the same memory that the main processor accesses.
It seems that this eDRAM is faster than the main (32 MB) DDR. If we could relocate into it a program whose main loop is too big to fit in the I-cache, we might get a speed improvement (but maybe I'm totally wrong).
What do you think? I don't know whether faster RAM would improve things much, or whether it could even be used as normal memory... (it's 512 bits wide, right?)
The local RAM is at 0x80000000; in my library I use it for heap and stack, and that seems to be what Sony does too. What's at 0x04000000 is probably MMIO, but all I know for sure is that it's not RAM.
Okay thx
I made some attempts to get it working and synchronized with the main CPU, but I'm having some serious problems. At first I was invalidating the D-cache each time I finished writing information to variables that the ME should use, but that slows things down quite a lot.
So I replaced this method with writes to uncached addresses. Although I haven't found this documented anywhere, I assumed that I could also add 0x40000000 to bypass the cache for RAM variables (0x80000000). I did a simple synchronization:
Code:
    [... shared variables ...]
    typedef unsigned int BOOL;
    volatile BOOL bSync;
    // Assumption: OR-ing 0x40000000 into an address gives its uncached alias
    #define GetUncachedPtr(address) ((void*)((u32)(address)|0x40000000))
    [... main code ...]
    int main(void) {
        volatile BOOL *pbSync;
        pbSync = GetUncachedPtr(&bSync);
        MediaEngine_Boot(MediaEngine_Main); //Boots the ME to our routine
        *pbSync = 1;        // raise the flag for the ME
        while (*pbSync);    // wait until the ME clears it
        printf("Execution done correctly");
        return 0;
    }
    [... media engine code ...]
    void MediaEngine_Main() {
        volatile BOOL *pbSync;
        pbSync = GetUncachedPtr(&bSync);
        //Message handling loop
        while(1) {
            if (*pbSync)
                *pbSync = 0;    // acknowledge the main CPU
        }
    }
This works. But when it becomes more complex, it doesn't work anymore. For example, if I add some code (which uses ONLY local variables) after the *pbSync=1 in the main routine, sometimes *pbSync isn't modified (or only much later), as if it were held in cache; the framerate then drops to 1 fps because the ME doesn't see pbSync immediately. The same thing happens if I do:
Code:
    [media engine code]
    void MediaEngine_Main() {
        volatile BOOL *pbSync;
        volatile int *pglobalCounter;
        pbSync = GetUncachedPtr(&bSync);
        pglobalCounter = GetUncachedPtr(&globalCounter);
        //Message handling loop
        while(1) {
            if (*pbSync) {
                i = *pglobalCounter;    // 'i' should mirror globalCounter
                [Some more code, that accesses global backbuffer]
                (*pglobalCounter) = (*pglobalCounter) + 1;
            }
        }
    }
I do not access pglobalCounter anywhere else, so it should work: 'i' should be a counter mirroring globalCounter. But it isn't! It gets stuck, though not if I remove the additional code between the two instructions... I really don't know what's happening. It seems to be caused by conflicting cache flushes on the ME and the main CPU, but it shouldn't be, as I access ONLY uncached variables from the Media Engine. It doesn't seem to be completely uncached, though...
I worked on it for 4 whole days but didn't find any solution. Here is the code: http://infernobox.dyndns.org/brunni/PSP_MediaEngine.zip
If you have any suggestions that could help me, I would really appreciate it, thanks. You can modify and use my code if you want, but if you find a solution, I would really appreciate it if you explained it to me. :)
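Editor's note on the uncached-address assumption in the post above: in the standard MIPS32 segment layout, the uncached alias of a kseg0 address (0x80000000 to 0x9FFFFFFF) is its kseg1 counterpart, obtained by OR-ing in 0x20000000; OR-ing 0x40000000 into 0x80000000 yields 0xC0000000, which lies in a different segment. Whether the ME follows the standard layout exactly is an assumption here, and nothing in the thread confirms that this is the cause of the problem. The helper `uncached_alias` is hypothetical, and its low-address branch simply reuses the 0x40000000 convention that the post itself assumes.

```c
#include <stdint.h>

#define KSEG1_BIT          0x20000000u  /* standard MIPS32: kseg0 -> kseg1 */
#define LOW_UNCACHED_BIT   0x40000000u  /* convention assumed in the post */

/* Return a presumed uncached alias for `addr` (assumptions as noted above). */
static uint32_t uncached_alias(uint32_t addr)
{
    /* kseg0 (top three bits 100): the uncached mirror is kseg1 */
    if ((addr & 0xE0000000u) == 0x80000000u)
        return addr | KSEG1_BIT;
    /* low addresses: apply the 0x40000000 bit used by GetUncachedPtr */
    if (addr < 0x40000000u)
        return addr | LOW_UNCACHED_BIT;
    /* anything else is returned unchanged */
    return addr;
}
```

With these assumptions, 0x88300000 maps to 0xA8300000 and 0x08900000 maps to 0x48900000, while an address such as 0xBFC00040, already in kseg1, is left alone.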
Running C code on the ME
OK, I went ahead and merged code from a few of the posts here and made an example that lets you easily run C code on the ME (with the usual limitations on kernel calls, etc.).
The file is here: http://www.stashbox.org/uploads/1129672725/meccode.zip
It's based on the SDK's basic ME example.
Sorry for the n00b question: is the ME running at the same speed as the main CPU?
And does scePowerSetClockFrequency affect the ME too?
(I'm trying to figure out how much speed I can gain by using it in parallel with the main CPU for integer YUV-to-RGB color space conversion and integer IDCT.)
EDIT: forgive me, I've got a color space conversion running ~2x ^^ (and I have my answers)
xdeadbeef & all the others, thanks for the code
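Editor's note: the poster's conversion code is not shown, so the following is only a generic sketch of an integer YUV-to-RGB conversion using the common 8-bit fixed-point BT.601 coefficients. The function names and the packed YUV 4:4:4 layout are assumptions made for illustration. Converting a pixel range `[first, last)` is one simple way to hand part of a frame to each CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale an 8.8 fixed-point value down to 0..255, clamping out-of-range. */
static uint8_t clamp_shift8(int32_t v)
{
    if (v < 0)
        return 0;               /* clamp before shifting a negative value */
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Convert one pixel (video-range Y 16..235, U/V centered on 128). */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
{
    int32_t c = (int32_t)y - 16;
    int32_t d = (int32_t)u - 128;
    int32_t e = (int32_t)v - 128;

    rgb[0] = clamp_shift8(298 * c + 409 * e + 128);            /* R */
    rgb[1] = clamp_shift8(298 * c - 100 * d - 208 * e + 128);  /* G */
    rgb[2] = clamp_shift8(298 * c + 516 * d + 128);            /* B */
}

/* Convert pixels first..last-1 of a packed YUV 4:4:4 buffer. */
static void yuv444_to_rgb_range(const uint8_t *yuv, uint8_t *rgb,
                                size_t first, size_t last)
{
    for (size_t i = first; i < last; i++)
        yuv_to_rgb(yuv[3 * i], yuv[3 * i + 1], yuv[3 * i + 2], &rgb[3 * i]);
}
```

In a two-CPU setup, one core could take the first half of the range and the other the second half, with the output buffer written back or accessed uncached as discussed earlier in the thread.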