ME library - a new project for a more elaborate ME library


hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Ok,

A long time ago, I did some research and testing on the ME: one of my goals was to learn how to use the ME interrupt to allow better communication between ME and SC.

Right now, there is no real improvement on ME side, so I think it could be a good idea to set a new project where some skilled people can work together on it and improve it tremendously.

For those who are interested in participating in the development of this library, we need an SVN workspace. I don't know if the ps2dev admins can set one up here, or whether I should set one up at www.assembla.com/www.sourceforge.org/etc. and invite those who are interested.

Expected features :
- kernel prx (similar to JF's MediaEngine.prx)
- a mini kernel where exceptions and irqs on ME are handled (at least no Black Screen Of Death)
- can run code in user mode on ME (may or may not be of any interest)
- syscalls usable on ME, not as equivalents of SC syscalls but as a means to execute ME kernel functions when running code in user mode
- special ME interrupt is used to sync exchanges between SC and ME

Right now, I'm writing asm code for generic exception/irq/syscall handling so we could run a permanent mini kernel on ME.

So who is interested?
Last edited by hlide on Sun Jan 27, 2008 4:53 am, edited 1 time in total.
KickinAezz
Posts: 328
Joined: Sun Jun 03, 2007 10:05 pm

Post by KickinAezz »

I know what ME is.
What is SC?
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

KickinAezz wrote:I know what ME is.
What is SC?
SC = System Control, that is the main CPU with VFPU
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

Count me in. Any chance for better ME support is something I'm highly motivated toward.
fungos
Posts: 41
Joined: Wed Oct 31, 2007 10:43 am
Location: cwb br

Post by fungos »

I'm not a skilled psp developer, but if any extra coding is needed and wanted I can help.
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

Cool!! Really. Unfortunately I don't think I could be of help since, with the ME hardware, I'm still experimenting with the basics... However, from what I can see (and given my experience with the ME I could well be mistaken), I don't think that making ME code capable of calling usermode code will be of interest. Instead, the capability to execute a bunch of "kernel" functions (designed ad hoc for the ME) and a better communication system with the SC will make a huge difference. Since the ME is not in charge of handling basic interrupts or whatsoever, it's not very bad if it cycles in a tight loop doing nothing but waiting for someone to feed it a function pointer (like in the color cycle demo), but I suspect that the power drain of such a solution will be significant. So, utilizing interrupts to handle asynchronous "delegation of service" from SC to ME would be great.
SilverSpring
Posts: 110
Joined: Tue Feb 27, 2007 9:43 pm

Post by SilverSpring »

hlide wrote:
So who is interested?
Absolutely :)

Btw, SC == System Control, are you sure about this one? Is it documented anywhere? Because normally, SCE uses "System Control" to designate the syscon chip, not the main cpu.

ME == Media Engine, SCE lists this one plenty of times, but SC I've never really seen what it really stood for, apart from it being used in various nids:


Code: Select all

sceSysregScResetEnable
sceDmacplusSc2MeInit
sceDmacplusMe2ScInit
sceDmacplusSc128Init
scePowerLimitScCpuClock
scePowerLimitScBusClock
etc.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

SilverSpring wrote:Btw, SC == System Control, are you sure about this one? Is it documented anywhere? Because normally, SCE uses "System Control" to designate the syscon chip, not the main cpu.

ME == Media Engine, SCE lists this one plenty of times, but SC I've never really seen what it really stood for, apart from it being used in various nids:

in fact ME is not really a cpu :)

My hint :

ME stands for Media Engine, that is, all the hardware system under the control of the second cpu. Whereas I think SC stands for System Control(ler?), that is all the hardware system under the control of the main cpu.

By extension, we can say ME cpu and SC cpu as a way to distinguish which cpu we speak about.

Satisfied ? ;)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

jean wrote:However, from what I can see (and given my experience with the ME I could well be mistaken), I don't think that making ME code capable of calling usermode code will be of interest. Instead, the capability to execute a bunch of "kernel" functions (designed ad hoc for the ME) and a better communication system with the SC will make a huge difference. Since the ME is not in charge of handling basic interrupts or whatsoever, it's not very bad if it cycles in a tight loop doing nothing but waiting for someone to feed it a function pointer (like in the color cycle demo), but I suspect that the power drain of such a solution will be significant. So, utilizing interrupts to handle asynchronous "delegation of service" from SC to ME would be great.
OK, I guess there is no interest in user mode for the ME cpu, so I finally removed the overhead code that properly handles the kernel stack when an exception/irq/syscall interrupts user mode code. Since I have had no answer about SVN access here, I created a pspme workspace on www.assembla.com. Those who want to be members need to register there and give me their assembla nickname so I can invite them.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Well, I think this workspace is public, so you can access it at: http://www.assembla.com/spaces/new_items/pspme but you don't have edit access.

I found some old sources I wrote a long time ago where I used Sysreg function to install an ME/SC sub-interrupt. Maybe we could reuse the same trick for this project. Apparently, when you raise a ME/SC interrupt from a cpu, both cpus execute the interrupt. It may be awkward, but i feel we can use this fact to allow the sender to synch with the receiver :

The sender acquires the ME/SC mutex and checks whether the receiver is waiting for a request: if so, the sender dequeues a request into the shared 0xBFC00xxx space and issues an interrupt. If not, the sender enqueues a request in a local FIFO and releases the ME/SC mutex.

When the sender and the receiver are in an irq callback, they check who owns the ME/SC mutex:

the sender always owns the mutex. The receiver knows it has to get the request stored in the 0xBFC00xxx space, enqueue it in its local FIFO list and set a flag telling the sender it is ready for the next request. The sender waits for the receiver flag (in the 0xBFC00xxx space too), then dequeues and stores the next request in the shared space, until no request remains in its local FIFO list. When the sender has no more requests, it releases the ME/SC mutex, and the receiver also knows it is over by checking this ME/SC mutex at the end of each request.

The SC processor would send an event to a thread so the latter can process the ME requests stored in a local FIFO list.

processor ME would probably have a special loop which executes the SC requests.

so we need 4 FIFO lists : 2 for SC, 2 for ME. One for in requests, one for out requests.
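
A minimal sketch of what one of these FIFO lists could look like, assuming a fixed-capacity ring of request pointers. The names `me_fifo_t`, `me_fifo_push`, `me_fifo_pop` and the capacity of 16 are invented for illustration and do not come from the pspme sources; the ME/SC mutex and any cache handling are deliberately left out.

```c
#include <stddef.h>

#define ME_FIFO_CAPACITY 16u  /* hypothetical; must be a power of two */

typedef struct {
    void *slot[ME_FIFO_CAPACITY];
    unsigned head;  /* index of the next slot to pop  */
    unsigned tail;  /* index of the next slot to push */
} me_fifo_t;

int me_fifo_is_empty(const me_fifo_t *f)
{
    return f->head == f->tail;
}

/* Returns 0 on success, -1 when the list is full. */
int me_fifo_push(me_fifo_t *f, void *req)
{
    if (f->tail - f->head == ME_FIFO_CAPACITY) return -1;
    f->slot[f->tail & (ME_FIFO_CAPACITY - 1u)] = req;
    f->tail++;
    return 0;
}

/* Returns the oldest request, or NULL when the list is empty. */
void *me_fifo_pop(me_fifo_t *f)
{
    void *req;
    if (me_fifo_is_empty(f)) return NULL;
    req = f->slot[f->head & (ME_FIFO_CAPACITY - 1u)];
    f->head++;
    return req;
}
```

With four such lists (in and out, on each processor), the local side only ever pushes to its "out" list and pops from its "in" list; the transfer through the shared space would sit between them.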

Please, if you can see some misconceptions or flaws, don't be afraid to address them.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Regarding sysreg, I did some reverse engineering and modified the code as I think it may work, for illustration:
sysreg can handle 32 sub-interrupts for the ME interrupt. So there is a special subintr descriptor to let the kernel handle some tasks.

In PSPSDK, I have "int sceKernelRegisterIntrHandler(int intno, int no, void *handler, void *arg1, void *arg2);" where :
  • intno - The interrupt number to register.
  • no - The queue number.
  • handler - Pointer to the handler.
  • arg1 - Unknown (probably a set of flags)
  • arg2 - Unknown (probably a common pointer)
no : queue number ???? what is it ?

I think arg1 is a data pointer to pass to an interrupt handler (that's why, in most functions in the following code, I replaced the access to the global data me_intr_data with an access through an argument, instead of nothing as in the original sysreg).

arg2 is the most difficult to determine, but I think I found out its purpose: it describes a small structure which seems to indicate that the intr uses the subintr mechanism. The structure is 3 words. The first word is the size of the structure itself (Sony really likes to do so), the second, which is 32 in sysreg, seems to be the max number of subintrs for this intr, and the last word seems to be a pointer to a subintr interface.

The structure which seems to be the subintr interface is 44-word long. The first word is the size of the structure itself and the others are null or point to functions which mimic an enable/disable/suspend/resume/request subintr (probably those functions in pspintrman.h like sceKernelEnableSubIntr, etc.)

Here is the code, heavily inspired by sysreg:

Code: Select all

#define SCE_ERROR_NOT_SUPPORTED	    (0x80000004)
#define SCE_ERROR_INVALID_INDEX     (0x80000102)
#define SCE_ERROR_INVALID_MODE      (0x80000107)
#define SCE_ERROR_INVALID_VALUE     (0x800001FE)

#define c0_read_cpuid() ({ u32 res; __asm__ __volatile__ ("mfc0 %0, $22" : "=r"(res)); res; })
#define rotrv(x, s) ({ u32 res; __asm__ __volatile__ ("rotrv %0, %1, %2" : "=r"(res) : "r"(x), "r"(s)); res; })
#define clz(x) ({ u32 res; __asm__ __volatile__ ("clz %0, %1" : "=r"(res) : "r"(x)); res; })

extern u32  me_enter_critical_session();
extern void me_leave_critical_session();
extern void me_interrupt();

typedef struct
{
    u32 mask;
    int bit;
}  me_intr_data_t;

typedef struct
{
    u32 size; /* hlide : methinks it contains the size of the structure itself */
    u32 res0[4];
    u32 (*func1)(me_intr_data_t *, int);
    u32 (*func2)(me_intr_data_t *, int);
    u32 (*func3)(me_intr_data_t *, int);
    u32 (*func4)(me_intr_data_t *, int, int *);
    u32 (*func5)(me_intr_data_t *, int, int);
    u32 (*func6)(me_intr_data_t *, int);
    u32 res1[1];
} me_subintr_interface_t;

typedef struct
{
    u32 size; /* hlide : methinks it contains the size of the structure itself */
    u32 nsubintr; /* hlide : methinks it contains the max number of subintr available */
    me_subintr_interface_t *interface; /* hlide : methinks it contains the pointer on an interface */
} me_subintr_desc_t;

me_intr_data_t me_intr_data;

int me_subintr_enable(int no)
{
    if (no > 31) return SCE_ERROR_INVALID_INDEX;

    u32 old = sceKernelCpuSuspendIntr();

    int res = (me_intr_data.mask>>no)&1;

    me_intr_data.mask |= (1<<no);

    sceKernelCpuResumeIntr(old);

    return res;
}

int me_subintr_disable(int no)
{
    if (no > 31) return SCE_ERROR_INVALID_INDEX;

    u32 old = sceKernelCpuSuspendIntr();

    int res = (me_intr_data.mask>>no)&1;

    me_intr_data.mask &= ~(1<<no);

    sceKernelCpuResumeIntr(old);

    return res;
}

int me_subintr_request(int cpu, int no)
{
    if ((no > 31) || (cpu > 1)) return SCE_ERROR_INVALID_INDEX;

    me_enter_critical_session();

    *(volatile u32 *)(0xbfc00400+(cpu<<2)) |= (1<<no);

    me_leave_critical_session();

    if (cpu != c0_read_cpuid()) me_interrupt();

    return 0;
}

int me_sc_intr_handler(me_intr_data_t *data, int no, me_subintr_desc_t *desc)
{
    int cpu = c0_read_cpuid();

    volatile u32 *mask = (volatile u32 *)0xbfc00400;

    if (mask[cpu] & data->mask)
    {
        int bit = (data->bit+31-clz(rotrv(mask[cpu], data->bit)))&31;

        me_enter_critical_session();

        ((volatile u32 *)0xbfc00400)[cpu] &= ~(1<<bit);

        me_leave_critical_session();

        data->bit++;
        data->bit &= 31;

        sceKernelCallSubIntrHandler(data, no, no, desc);
    }

    return -1;
}

static u32 subintr_set_flag(me_intr_data_t *data, int no)
{
    data->mask |= 1<<no;

    return 0;
}

static u32 subintr_clear_flag(me_intr_data_t *data, int no)
{
    data->mask &= ~(1<<no);

    return 0;
}

static u32 subintr_suspend(me_intr_data_t *data, int no, int *flag)
{
    if (flag) *flag = (data->mask>>no)&1;

    data->mask &= ~(1<<no);

    return 0;
}

static u32 subintr_resume(me_intr_data_t *data, int no, int flag)
{
    data->mask |= (flag<<no);

    return 0;
}

static u32 subintr_request(me_intr_data_t *data, int no)
{
    int mask = *(volatile u32 *)(0xbfc00400+(c0_read_cpuid()<<2))>>no;
    return mask & 1;
}

me_subintr_interface_t me_subintr_interface =
{
    sizeof(me_subintr_interface_t),
    0,
    0,
    0,
    0,
    subintr_clear_flag,
    subintr_set_flag,
    subintr_clear_flag,
    subintr_suspend,
    subintr_resume,
    subintr_request,
    0
};

me_subintr_desc_t me_subintr_desc =
{
    sizeof(me_subintr_desc_t),
    32,
    &me_subintr_interface
};

int me_sc_intr_start()
{
    sceKernelRegisterIntrHandler(
        31, /* intrno */
        1, /* ??? */
        &me_sc_intr_handler /* intr handler */,
        &me_intr_data /* probably a data to pass to intr handler */,
        &me_subintr_desc /* probably a pointer on a descriptor to tell how to handle subintr */
    );
    sceKernelEnableIntr(31);
    return 0;
}

int me_sc_intr_stop()
{
    sceKernelReleaseIntrHandler(31);
    return 0;
}
Regarding the project, I don't think I will use this subintr mechanism, but rather a simple intr mechanism, to keep this interrupt simple on the ME side.
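
The bit-selection expression in me_sc_intr_handler can be tried on a host machine with portable stand-ins for the rotrv and clz instructions. This is only a sketch for checking the arithmetic: `rotr32`, `clz32` and `pick_subintr` are illustrative names, not part of sysreg or the PSPSDK.

```c
#include <stdint.h>

/* Portable stand-in for the rotrv instruction (rotate right by s bits). */
uint32_t rotr32(uint32_t x, unsigned s)
{
    s &= 31u;
    return s ? (x >> s) | (x << (32u - s)) : x;
}

/* Portable stand-in for clz; returns 32 when x is zero. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32u && !(x & (UINT32_C(1) << (31u - n)))) n++;
    return n;
}

/* Same expression as in the handler; pending must be non-zero. */
unsigned pick_subintr(uint32_t pending, unsigned start)
{
    return (start + 31u - clz32(rotr32(pending, start))) & 31u;
}
```

Worked through by hand, the expression returns the first pending bit found when scanning downward from bit `start - 1` and wrapping around, so incrementing `data->bit` after each interrupt rotates which sub-interrupt gets served first.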
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

The "queue number" is probably a priority which is used when inserting the node in the linked list. For Amiga-folks, this is very recognizable. The Amiga kept a list header for each interrupt handler; when you added an interrupt routine, it was inserted into the list based on the priority, where higher priority nodes were linked before lower priority nodes. When the interrupt occurred, the main handler just stepped through the list knowing that it was already sorted by priority, calling each one until a return value told it to stop stepping through the list entries. So if you wanted your vertical blank routine to be executed before the graphics library vbint, you gave it a priority higher than the graphics library vbint (Amiga told you what it was so you could do that).
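
The Amiga-style scheme described above can be sketched as a priority-sorted singly linked list. This is a host-side illustration of the idea only, with invented names (`intr_node_t`, `intr_list_insert`, `intr_list_dispatch`); nothing here is known to match how the PSP kernel actually treats the queue number.

```c
#include <stddef.h>

typedef struct intr_node {
    struct intr_node *next;
    int priority;               /* higher value runs earlier */
    int (*handler)(void *arg);  /* non-zero return stops the chain */
    void *arg;
} intr_node_t;

/* Insert node before the first entry with a strictly lower priority. */
void intr_list_insert(intr_node_t **head, intr_node_t *node)
{
    while (*head && (*head)->priority >= node->priority)
        head = &(*head)->next;
    node->next = *head;
    *head = node;
}

/* Call handlers in list order until one returns non-zero.
   Returns how many handlers were called. */
int intr_list_dispatch(intr_node_t *head)
{
    int called = 0;
    for (; head; head = head->next) {
        called++;
        if (head->handler(head->arg)) break;
    }
    return called;
}

/* Sample handlers for trying the list out. */
int handler_continue(void *arg) { (void)arg; return 0; }
int handler_stop(void *arg)     { (void)arg; return 1; }
```

Because the list is kept sorted at insertion time, the dispatcher never has to compare priorities while the interrupt is being served.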
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

On SC side :

When the SC processor receives an interrupt because the ME processor wants to send a request or a result, I want to call a deferred callback so the interrupt handler can be as short as possible. I know a callback can only be executed in the thread where it was created. I probably need to create a kernel thread with a higher priority which would sleep through a call to sceKernelSleepThreadCB so the callback can be executed whenever notified.

Why callbacks? If the ME wants to make the SC processor execute some code, I cannot execute this function in the interrupt handler, so the only possibility seems to be a deferred callback in a kernel thread with the highest priority. Since the ME is running kernel code, I wonder if it makes sense to run kernel functions on the SC processor. Grrr, I really forgot the user/kernel separation on the SC processor.

A deferred callback will handle a list of requests to execute (after invalidating dcache in precache) on the SC processor.

Another deferred callback will handle a list of results (after invalidating dcache in postcache) on the SC processor.

I have a hard job sorting out how to handle results:

(1)- SC issues an asynchronous call without result : quite easy, ME never issues a result for those.

(2)- SC issues an asynchronous call with result : SC processor should use "res = me_poll/wait_result(rpc, &res);" or something like that to get the result. In case of "me_wait_result", it would be great to make the thread passively wait for the result. To wait, we probably need to save the id from "sceKernelGetThreadId" into rpc->thid then "sceKernelSleepThreadCB" itself so the deferred callback can wake up the right thread with sceKernelWakeupThread(rpc->thid).

(3)- SC issues a synchronous call with result : the same thing as (2) but there is a call "me_wait_result" just after "me_call_asynch" which sleeps the calling thread until a result arrives.

Any remarks ?
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

Could the ME set a semaphore for the SC? If so, I think that would be the way to handle this. No extra threads, no callbacks, just test/wait on a semaphore. Maybe have the SC set the semaphore in the exception handler if the ME can't set it directly.

If you want the SC to do something in response to the ME, you could then make a thread, the thread would then set up a semaphore, tell the ME to do something, then wait on the semaphore. If the app thinks these things should be a higher priority, the programmer can just use a higher priority when creating the thread. If you just wanted to do ME calls, then you don't need the extra thread, just the semaphore. We want to keep the overhead for the ME library down, and I think this would be smaller and cleaner.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

J.F. wrote:Could the ME set a semaphore for the SC? If so, I think that would be the way to handle this. No extra threads, no callbacks, just test/wait on a semaphore. Maybe have the SC set the semaphore in the exception handler if the ME can't set it directly.
what do you call a semaphore here ?
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

hlide wrote:
J.F. wrote:Could the ME set a semaphore for the SC? If so, I think that would be the way to handle this. No extra threads, no callbacks, just test/wait on a semaphore. Maybe have the SC set the semaphore in the exception handler if the ME can't set it directly.
what do you call a semaphore here ?
Well, the semaphores the sdk uses would be a start. :) But I'm not sure how those are implemented. We might need our own semaphores done with sc/ll if the sdk semaphores can't be used this way (can't use in an exception handler, for example).

So a program would call sceKernelCreateSema() to be used by ME calls (if multiple async calls, create more than one sema); it would then call the ME lib to start an async call with something like meCallAsync(func, arg, sema); then later the program polls or waits on the semaphore it passed to meCallAsync(). The ME would do the function and cause the exception when the result is ready. The exception handler for the SC would then do sceKernelSignalSema() on the semaphore (this is the part I'm not sure of... can sceKernelSignalSema() be used inside an exception handler). The program would be using either sceKernelPollSema() or sceKernelWaitSema() and see the semaphore set by the handler and know the ME code was done.

Very simple and straightforward, but as I said above, I'm not sure if we'll be able to use the sce semaphore routines. We might have to make our own int safe functions if the sce functions can't be used from an exception handler.
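
If the sce semaphore routines turn out not to be usable from a handler, a home-made semaphore could look roughly like the sketch below. It is a host-side illustration using C11 atomics in place of a hand-written ll/sc loop; `me_sema_t` and the three functions are invented names, and whether this maps cleanly onto the PSP toolchain and hardware has not been checked here.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int count;  /* number of pending signals */
} me_sema_t;           /* hypothetical type, not an SDK one */

void me_sema_init(me_sema_t *s, int initial)
{
    atomic_init(&s->count, initial);
}

/* Intended for the handler side: one atomic increment, never blocks. */
void me_sema_signal(me_sema_t *s)
{
    atomic_fetch_add(&s->count, 1);
}

/* Non-blocking take: returns 1 if a signal was consumed, 0 otherwise. */
int me_sema_poll(me_sema_t *s)
{
    int seen = atomic_load(&s->count);
    while (seen > 0) {
        /* On failure, seen is refreshed with the current count and we retry. */
        if (atomic_compare_exchange_weak(&s->count, &seen, seen - 1))
            return 1;
    }
    return 0;
}
```

This only covers the polling case; a passive wait would still need help from the thread manager, which is exactly the open question above.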
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

The general structure of such a library should be something like this:
1) registration of needed services (each "service" structure has an ID, a function pointer to the service implementor itself, a priority and maybe a little contextualization data) Once registration data is stored, no operation but those registered can be executed on ME: this will help reduce queue overhead
2) Initialization - bootstrap of the ME with code aware of our service collection
3) async calls only by ID...developers can define symbolic constants for readability and maintainability. This way each "service" can be initialized once and for all and readily called without per-call overhead.
4) when an async call is performed from SC to ME, the address of a memory block allocated for results and the address of a semaphore-like struct must be passed as well
5) once the request on SC is signalled to ME, flow continues so as to exploit parallelism
6) since results from ME are obviously to be used in SC code, somewhere in the SC code there must be a resync - a wait on the previously defined semaphore-like structure (are my results ready?). Once the semaphore conditions are satisfied, results can be retrieved at the data address specified in the service call.
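
Steps 1 and 3 above (registration, then calls by ID only) can be sketched as a small fixed table. Everything here is hypothetical: the table size, the `me_service_*` names and the handler signature are made up for illustration, priority is left out, and the call runs synchronously on the host rather than being queued to the ME.

```c
#include <stddef.h>

#define ME_MAX_SERVICES 8  /* hypothetical table size */

typedef int (*me_service_fn)(void *ctx, int arg);

typedef struct {
    me_service_fn fn;
    void *ctx;  /* the "little contextualization data" */
} me_service_t;

static me_service_t me_services[ME_MAX_SERVICES];

/* Register fn under id. Returns 0 on success, -1 on a bad or taken id. */
int me_service_register(unsigned id, me_service_fn fn, void *ctx)
{
    if (id >= ME_MAX_SERVICES || fn == NULL || me_services[id].fn != NULL)
        return -1;
    me_services[id].fn = fn;
    me_services[id].ctx = ctx;
    return 0;
}

/* Run the service registered under id and store its result.
   Returns 0 on success, -1 if nothing is registered under id. */
int me_service_call(unsigned id, int arg, int *result)
{
    if (id >= ME_MAX_SERVICES || me_services[id].fn == NULL) return -1;
    *result = me_services[id].fn(me_services[id].ctx, arg);
    return 0;
}

/* Example service with a symbolic id, as suggested in step 3. */
enum { SVC_DOUBLE = 0 };
int svc_double(void *ctx, int arg) { (void)ctx; return 2 * arg; }
```

Rejecting unregistered IDs at call time is what gives the "no operation but those registered" property from step 1.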
So, the only thing we still can't afford is the synchronization mechanism...
Even if I still don't know the ME hardware that much, I have worked a lot with asynchronous call logic. In another topic I said that mutexes (or semaphores...the basic concept is the same) are always poll-and-wait, i.e. ideally a loop waiting for a condition to become true. This is not completely true, because with thread conditional blocks (like semaphores), things are generally managed by the OS moving the instruction pointer around and not pointlessly cycling. If I had to implement a cross-cpu flag signalling system, the first try would be to poll a shared memory block, testing for some value; but done this way (as everyone sees) there's a waste of both cpus' power, and the whole thing is not very elegant. So, I think that sceKernelCreateSema() and sceKernelWaitSema() should be analyzed in order to create our cross-cpu, non-cpu-time-wasting semaphores.
A question: maybe you already said that and I'm missing something but...why are you speaking of shared-memory space? Can't ME address the entire memory??

PS: I'll have to go deeper into the ME structure...I have to do a 45' presentation on "Advanced Micro Architectures" and I think the PSP architecture is funny enough!!
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

J.F. wrote:Well, the semaphores the sdk uses would be a start. :) But I'm not sure how those are implemented. We might need our own semaphores done with sc/ll if the sdk semaphores can't be used this way (can't use in an exception handler, for example).

So a program would call sceKernelCreateSema() to be used by ME calls (if multiple async calls, create more than one sema); it would then call the ME lib to start an async call with something like meCallAsync(func, arg, sema); then later the program polls or waits on the semaphore it passed to meCallAsync(). The ME would do the function and cause the exception when the result is ready. The exception handler for the SC would then do sceKernelSignalSema() on the semaphore (this is the part I'm not sure of... can sceKernelSignalSema() be used inside an exception handler). The program would be using either sceKernelPollSema() or sceKernelWaitSema() and see the semaphore set by the handler and know the ME code was done.
Just have a look at my last commit, as I take into account several things you point out:

- just having SC processor to send requests through a requests FIFO list, the head and tail of which are shared between ME and SC processors and the access of which is protected through the ownership of ME/SC hardware mutex by SC processor. No interrupt is issued as ME processor would probably have a loop to scan new requests in the request FIFO list to execute them.

- just having ME processor to send results through a results FIFO list, the head and tail of which are also shared between ME and SC processors and the access of which is also protected through the ownership of ME/SC hardware mutex by ME processor. A ME interrupt is issued by ME processor while still owning the ME/SC hardware mutex so it can signal to SC processor new results to take into account (and so to let SC wakeup the threads waiting for a result)

Mainly, ME processor has no ME interrupt to handle (so it is masked for ME processor) and relies on a requests loop which scans a new request from the shared request FIFO list when SC processor is not owning the ME/SC hardware mutex and issues a ME interrupt to SC processor after getting the ownership of this hardware mutex and inserting a new result in the results FIFO list.

The code for the ME processor is still a work in progress.

NOTE:
The ME and SC processors have a shared memory which doesn't need a dcache invalidate/writeback to sync data; it lies between 0xbfc00000 and 0xbfc01000. I'm using 0xbfc00600-0xbfc00610 to store the head and tail of both FIFO lists I was speaking about.
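
As a rough picture of that 16-byte window, here is a sketch of how two head/tail pairs could be laid out from 0xbfc00600. Only the base address and the 16-byte footprint come from the note; the field order and the names `me_shared_fifo_t` and `me_shared_fifo_end` are guesses made for illustration, and pointers are stored as 32-bit words so the layout can be checked on any host.

```c
#include <stddef.h>
#include <stdint.h>

#define ME_SHARED_FIFO_BASE UINT32_C(0xbfc00600)  /* from the note */

/* Field order is a guess; only the 16-byte footprint is given. */
typedef struct {
    uint32_t req_head;  /* requests list: SC -> ME */
    uint32_t req_tail;
    uint32_t res_head;  /* results list: ME -> SC */
    uint32_t res_tail;
} me_shared_fifo_t;

/* Address of the first byte past the window. */
uint32_t me_shared_fifo_end(void)
{
    return ME_SHARED_FIFO_BASE + (uint32_t)sizeof(me_shared_fifo_t);
}
```

On the PSP itself the structure would be reached through a volatile pointer to the base address, since both processors update it behind each other's back.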
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

Note that the me_rpc_t structure has some similarities with J.F.'s me_struct structure, but instead of having a unique instance, you can have several instances, so they can be inserted in a request FIFO list without the need to wait for a result just after an rpc call. The result FIFO list will contain those rpc instances whose functions have been executed. Maybe we should also add a fast fixed-size rpc structure allocator to allow the SC to issue several asynchronous rpc calls without the need to wait for a result, except maybe for the last one, and without the need to allocate those rpc structures explicitly (also note they need to be aligned on a 64-byte boundary, as a cache line, since the ME and SC processors need to prefetch/flush them properly through me_dcache_prefetch_line/me_dcache_wb_line).
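
A fixed allocator of that kind could be as simple as a static pool with a bitmap. The sketch below is an assumption-laden illustration: `rpc_slot_t` is a 64-byte placeholder standing in for me_rpc_t (whose real fields are not shown in this thread), the pool size of 8 is arbitrary, and the functions are not interrupt-safe as written.

```c
#include <stddef.h>
#include <stdint.h>

#define RPC_POOL_SIZE 8u   /* hypothetical number of slots */
#define CACHE_LINE    64u  /* alignment named in the post */

/* Placeholder for me_rpc_t: one cache line, cache-line aligned. */
typedef struct {
    _Alignas(64) unsigned char payload[CACHE_LINE];
} rpc_slot_t;

static rpc_slot_t rpc_pool[RPC_POOL_SIZE];
static uint32_t   rpc_used;  /* bit i set means slot i is in use */

/* Returns a free slot, or NULL when the pool is exhausted. */
rpc_slot_t *rpc_alloc(void)
{
    unsigned i;
    for (i = 0; i < RPC_POOL_SIZE; i++) {
        if (!(rpc_used & (UINT32_C(1) << i))) {
            rpc_used |= UINT32_C(1) << i;
            return &rpc_pool[i];
        }
    }
    return NULL;
}

/* Gives a slot obtained from rpc_alloc back to the pool. */
void rpc_free(rpc_slot_t *slot)
{
    rpc_used &= ~(UINT32_C(1) << (unsigned)(slot - rpc_pool));
}
```

Keeping each slot exactly one cache line wide means a writeback or invalidate of one rpc never touches its neighbours in the pool.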
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

jean wrote:The general structure of such a library should be something like this:
1) registration of needed services (each "service" structure has an ID, a function pointer to the service implementor itself, a priority and maybe a little contextualization data) Once registration data is stored, no operation but those registered can be executed on ME: this will help reduce queue overhead
priority... I guess if we add it, we need to make the ME processor reorder requests rather than the SC processor. Right now, there is no queue overhead, as the queue is just a single list containing pointers to rpcs, which may be seen as ids as well.
One thing is sure: if you want to reduce overhead, you should make the ME processor process one big function instead of many small functions (in terms of time, of course). For instance, the ME processor would compute the physics of the objects for the next frame whereas the SC draws the objects of the current frame (just an example). If only the ME processor could have a VFPU too :/. I wouldn't bet the slim PSP adds a VFPU to the second processor.
jean wrote:A question: maybe you already said that and i'm missing something but...why are you speaking of shared-memory space? Can't ME address the entire memory??
The ME processor can address the main ram at the same addresses as the SC processor, but there is no cache snooping. At the same time, there is a small memory shared between the ME and SC processors between 0xbfc00000 and 0xbfc01000, and maybe between 0xbfc01000 and 0xbfc02000 too. I decided to use 0xbfc00600-0xbfc00610 to share the heads and tails of the FIFO lists between the ME and SC processors because we don't need to invalidate/writeback the dcache this way.

In fact, the original J.F.'s me_struct could have been in this space instead of being in uncached main memory.

Don't ask me about the access time for this shared memory, I dunno; it may be better or it may be worse. I guess we need to test it.
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

hlide wrote:If only ME processor can have VFPU too :/. I wouldn't bet the slim PSP adds a VFPU in the second processor.
One of us Slim owners should test that some time. :)
hlide wrote:At the same time, there is a small memory shared between ME and SC processors between 0xbfc00000 and 0xbfc01000 and maybe between 0xbfc01000 and 0xbfc02000 too. I decide to use 0xbfc00600-0xbfc00610 to share heads and tails of fifo lists between ME and SC processors because we don't need invalidate/writeback dcache this way.

In fact, the original J.F.'s me_struct could have been in this space instead of being in an uncached main memory.
I did not know that. Now I do. If I had known that, I'd probably have made a function for the ME lib I did that allocates the me_struct from that memory. I thought the memory used for the ME init was a small register area that could serve as limited pointers for starting the ME, but a small chunk of memory (larger than I was thinking) makes more sense.
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

Well, I don't think cache sync would be an issue if our processing on the ME is big enough...if we do image processing/sound generation/(put your beefy media task here) on the ME, and hence process banks of 300Kb at a time, cache invalidation would not be the bottleneck...I would never bother to run my computeMeanOfThreeNumbers(int a, int b, int c); on the ME :)))
Am I wrong?

PS: now that I'm playing around with the code, I see priority is not really needed but...
PPS: Yes, one big function would perform better than several little ones...but then the whole idea of an ME library spreading the use of the second processor would be pointless...What about setting priorities in the service registration process, before the ME bootstrap? We could prepare a "fixed-function pipeline" before entering critical loops... I used a similar trick in my wiiMote library
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

hlide wrote:I found some old sources I wrote a long time ago where I used Sysreg function to install an ME/SC sub-interrupt
Ok, please read this carefully:
I'm not analyzing sceKernelWaitSema or similar functions (as I said someone has to do :) ), but while writing down my presentation, I realized that a trick could be used to achieve a good result:
1) The ME can cycle endlessly waiting for something to do for now...maybe in future we can bother to sleep or wake up the CPU on demand (maybe we can simply lower the ME CPU speed to reduce power consumption: Sony's specs say that the ME can run from 1 to 333 MHz: let's set it to the lowest possible value and raise it just before a delegation request)
2) on the SC side we can set up the intr handler hlide was speaking of, to be fired by the ME and executed on the SC when a communication has to take place. For now, it DOESN'T MAKE SENSE to me for the ME to ask the SC to execute any other operation
3) In the intr handler, simply access a shared structure and WAKE the corresponding threads running on the SC
4) In SC code, simply delegate your work by signalling in a shared memory structure that something has to be done (function pointer, arguments, etc...), continue your SC-side processing and, at the moment you want to resync (=wait for results), SUSPEND THE THREAD.

This means that once an SC thread is put to sleep waiting for the processing to be done, it doesn't pointlessly cycle waiting for a lock to be released, and simply hands CPU time to other threads that don't need said results. Hence no CPU power is wasted (except on the ME...anyway, this is another story)

All this is based upon the assumption that we can somehow wake an arbitrary thread in an intr handler, and that the thread manager never chooses to rearrange thread states.

What do you think?? - jean

- EDIT -
OK, dudes, act as if I never spoke....I wrote the post yesterday but I'm only posting it now...in the meanwhile I was reading the sources from your SVN for the first time and you're already doing exactly what I thought...what a shame...

Great work...can I experiment with it (giving credit) in a project of mine?
Anyway, the right name should be critical seCTion...
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Here's some code you might want to adapt for your library.

Code: Select all

// I hope that any pending interrupt won't arrive until the instruction after the mtic
// and ic_status had better be 1
#define atomic_mtic_halt(ic_status)	__asm__(".set push\n"		\
					        ".set noreorder\n"	\
					        "mtic %0, $0\n"		\
					        ".word 0x70000000\n"	\
					        ".set pop\n"		\
					::"r"(ic_status))

#define atomic_check_queue_halt(cpu)  { int ic_status = pspSdkDisableInterrupts();	\
					if(head_ptr_uc(cpu) == tail_ptr_uc(cpu))	\
						atomic_mtic_halt(ic_status);		\
					else pspSdkEnableInterrupts(ic_status); }

Code: Select all

	.ent me_exc

me_exc:
	ctc0	$v0, $0			// stash $v0 in COP0 control register 0
	mfc0	$k0, CAUSE
	srl	$k0, $k0, 2
	andi	$k0, $k0, 0xf		// low 4 bits of CAUSE.ExcCode
	bne	$k0, $0, oops		// non-zero: not an interrupt
	nop

	lui	$k0, 0xBC30
	lw	$k1, 0($k0)		// read the word at 0xBC300000
	beq	$k1, $0, leave
	sw	$k1, 0($k0)		// (delay slot) write the same value back
	
	mfc0	$v0, EPC
	lw	$k1, 0($v0)		//skip forward 4 bytes if *EPC==HALT
	li	$k0, 0x70000000
	bne	$k0, $k1, leave
	addiu	$v0, 4			// (delay slot) $v0 = EPC + 4
	mtc0	$v0, EPC
leave:
	mfc0	$k0, STATUS
	srl	$k1, $k0, 8
	ins	$k0, $k1, 8, 8
	mtc0	$k0, STATUS
	cfc0	$v0, $0			// restore $v0
	eret
	nop

oops: // unhandled exception here

It enables the ME to be halted and woken up with an interrupt.
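
Why the handler inspects the instruction at EPC can be shown with a small portable C model. This is an illustration only: `resume_pc` and the word array are hypothetical and not part of any SDK, and the model ignores the check of the interrupt word that the real handler does first. If the interrupt lands exactly on the halt word `0x70000000`, returning there would put the ME to sleep after its wake-up had already been consumed, so execution resumes one instruction further on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALT_OPCODE 0x70000000u  /* the word emitted after mtic in the macro above */

/* Model of the EPC fix-up done in me_exc: given the code as an array of
   32-bit words and the index the exception would return to, pick the
   index execution should resume at. */
size_t resume_pc(const uint32_t *code, size_t count, size_t epc)
{
    assert(code != NULL && epc < count);
    if (code[epc] == HALT_OPCODE)
        return epc + 1;   /* skip the halt: the wake-up already happened */
    return epc;           /* any other instruction is resumed as usual */
}
```

Without the skip, a request queued between the queue check and the halt could leave the ME asleep with work pending until some later interrupt arrives.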
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

wow...so we can do something like

Code: Select all


static inline int me_call_async_(
    me_rpc_t *rpc,
    void *func,
    int arg0,
    int arg1,
    int arg2,
    int arg3,
    u32 precache_length,
    void *precache_address,
    u32 postcache_length,
    void *postcache_address
)
{
    if (!processesRunning) wakeUpME();
    processesRunning++;

    
    // etc etc...

    return 0;
}


int me_wait_result(me_rpc_t *rpc)
{
    rpc->thid = sceKernelGetThreadId();

    do
    {
        sceKernelSleepThreadCB();
    }
    while (!((volatile me_rpc_t *)rpc)->done);

    processesRunning--;
    if (!processesRunning) putMEToSleep();

    return 0;
}
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

@crazyc - nice code! Just what the ME needs... the ability to be woken when there's something to do. Busy looping is fine for debugging, but given that the PSP is a handheld, we eventually need to take power usage into consideration.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

@crazyc

Sony developers seem to insert 7 nops just after an mtic. I wonder if there is a good reason to do so... just for your information, Allegrex has a 7-stage pipeline.

There must be two instructions between an mtc0 that changes a register read by eret and the eret itself, not just one. Those registers are EPC, ErrorEPC and Status.

Your last code is really interesting, but I wonder if it is okay, as we don't know whether mtic has to be followed by 7 nops. Unless you have already seen it used that way in Sony code, of course :).

I thought about another possibility: using a syscall, a break or a software interrupt to execute a halt indirectly. Inside the handler no interrupt can occur, so we just enable interrupts in the cop0 Status and set the cop0 EPC to the address of a halt instruction. When eret runs, the halt would be executed unless an ME interrupt needs to be handled at that point; that handler should still look for the "halt" and skip it on exit, with interrupts disabled, so the ME can process the new requests.

@all
A thought: Allegrex apparently uses a write buffer for cached and uncached memory, so if you want to be sure the other processor reads the right values, you may also need to issue a "sync" after any change to shared values.
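
As a rough illustration of that ordering concern, here is a portable C11 sketch. It is not PSP code: `mailbox_t`, `publish` and `try_consume` are hypothetical names, and the C11 fences only stand in for what would be a MIPS `sync` on Allegrex. The payload is written first, then the fence, then the flag the other processor polls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    int payload;         /* data handed to the other processor */
    atomic_int ready;    /* 0 = empty, 1 = payload is valid */
} mailbox_t;

/* Writer: make the payload visible before raising the flag. */
void publish(mailbox_t *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_thread_fence(memory_order_release);  /* stand-in for "sync" */
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if a value was published, else 0. */
int try_consume(mailbox_t *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);  /* pairs with the release fence */
    *out = m->payload;
    atomic_store_explicit(&m->ready, 0, memory_order_release);
    return 1;
}
```

Whether a single `sync` after the payload write is enough on the real hardware is exactly the open question raised here; the sketch only shows where such a barrier would sit.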

@jean
This project being open source, I don't see any problem if you want to experiment with it. But be aware that this project is still a work in progress, totally experimental, incomplete and not even at alpha stage: I haven't tested it on any PSP so far, so there may be a lot of flaws.

By the way, you're right: I'll rename critical session to critical section.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

@jean

I don't think "putMEToSleep();" is necessary: the ME request loop can simply go to sleep by itself when there is no request to handle and no result to send to the SC, without the SC processor having to order the ME processor to sleep. The only requirement is that, while the ME is sleeping, it must be able to receive any ME interrupt signalled by the SC processor. While the ME is awake, ME interrupts are masked until the next sleep. That way the ME processor can send results to the SC processor by signalling an ME interrupt to it without being forced to run its own ME interrupt handler, which matters because of the way SONY messed this up (setting 0xBC000048 to 1 raises an ME interrupt on BOTH CPUs).
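
The loop described here could look roughly like the following portable C model. It is only a sketch under assumptions: `req_queue_t`, `sc_post`, `me_step` and the `ME_*` states are hypothetical, the ring buffer stands in for whatever shared queue the library ends up using, and the real halt and interrupt-mask handling is reduced to a returned state.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 8u   /* power of two so indices wrap with a mask */

typedef struct {
    int items[QUEUE_SIZE];
    unsigned head;       /* next slot the ME reads */
    unsigned tail;       /* next slot the SC writes */
} req_queue_t;

typedef enum { ME_AWAKE, ME_SLEEPING } me_state_t;

/* SC side: enqueue a request; returns 0 if the queue is full. */
int sc_post(req_queue_t *q, int req)
{
    assert(q != NULL);
    if (q->tail - q->head == QUEUE_SIZE)
        return 0;
    q->items[q->tail & (QUEUE_SIZE - 1u)] = req;
    q->tail++;
    return 1;
}

/* ME side: handle at most one request. When the queue is empty the ME
   decides on its own to sleep; no order from the SC is needed. */
me_state_t me_step(req_queue_t *q, int *handled)
{
    assert(q != NULL && handled != NULL);
    if (q->head == q->tail)
        return ME_SLEEPING;          /* would unmask the ME interrupt and halt */
    *handled = q->items[q->head & (QUEUE_SIZE - 1u)];
    q->head++;
    return ME_AWAKE;                 /* ME interrupt stays masked while working */
}
```

With this shape the SC only ever posts requests and signals the interrupt; the decision to sleep stays entirely on the ME side.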
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

hlide wrote:Your last code is really interesting but i wonder if it is okay as we don't know if mtic has to have 7 nops afterwards. Unless you already saw it used on Sony code, of course :).
I'm comfortable with this; the only issue would be if interrupts were never re-enabled, and from testing I know they are.
hlide wrote:Sony developers seems to insert 7 nops just after a mtic. I wonder if there is a good reason to do so... just for your information, Allegrex has a 7-stage pipeline.
Any instructions left in the pipeline had better be completed before the CPU halts; otherwise it would be ugly.
uberjack
Posts: 34
Joined: Tue Jul 17, 2007 9:09 am
Location: California, USA

Post by uberjack »

I'm definitely interested. The only thing preventing me from using the ME (no pun intended :) ) in any of my projects is the lack of documentation and the potential for a significant code rewrite.
I'd be more than happy to help, if necessary.