Lock-free algorithms and atomic primitives

Discuss the development of new homebrew software, tools and libraries.

Moderators: cheriff, TyRaNiD

Post Reply
StrmnNrmn
Posts: 46
Joined: Wed Feb 14, 2007 11:32 pm
Location: London, UK
Contact:

Lock-free algorithms and atomic primitives

Post by StrmnNrmn »

I've been doing a lot of work recently trying to get Daedalus's audio processing moved over to the Media Engine.

I've had a fair amount of success so far, but I still have some tricky issues to deal with regarding synchronisation with the ME. Specifically, the inability to use mutexes and semaphores between the two cores makes a number of things difficult to implement.

To start with I've re-written a couple of simple datastructures I was using to be lock-free. With a little work I've rewritten the ringbuffer I was using for audio playback as a single-reader, single-writer FIFO. This now works great, and works without any kind of kernel-level synchronisation between the SC* and ME.

It got me thinking about whether anyone else had spent any time working on lock-free data structures for the PSP. Initially I'm just thinking about execution on multiple threads on the SC, but hopefully this would lead to code that could be easily adapted to run on the ME too.

There are a few lock-free libraries around, so my first question is whether or not anyone has taken a look at porting these to the PSP? I did spend some time searching and couldn't find anything relevant.

Many of the lock-free algorithms require an atomic compare-and-swap primitive, so my second question was whether anyone had already implemented this? Again, I couldn't find anything relevant when searching the forums. There doesn't seem to be anything along these lines in the sdk. The closest I could find was hlide's post on using SYNC to provide some synchronisation between the two cores, but I' expecting the CAS primitive to be implemented using LL/SC). There was also this post from hlide talking about the 'SC/ME mutex'. What is this?

Edit Hmmm, it seems that LL/SC will fail when used with uncached or noncoherent memory locations. I'd anticipated using an uncached pointer to avoid cache coherency issues between the ME/SC, so it seems that this rules out using LL/SC on the ME? Maybe I'm missing the point, and LL/SC are designed to work across multiple cores with separate caches?

-StrmnNrmn

*I've picked up this terminology as shorthand for 'the main CPU', but I have no ideas what it means :) Anyone?
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Re: Lock-free algorithms and atomic primitives

Post by crazyc »

StrmnNrmn wrote:There was also this post from hlide talking about the 'SC/ME mutex'.
At address 0xbc100048 on both cpus, there is a word mmio register that can only be modified by one cpu at a time. If it is set to 1 then the sc holds it and it can't be modified by the me and if it's set to 2 visa versa. If it's set to 0 then it is unheld. Of course it's a problem on the slim as it can only be changed in kernel mode and the kernel function sceSysregSemaTryLock and sceSysregSemaUnlock are not exported to user mode.
Hmmm, it seems that LL/SC will fail when used with uncached or noncoherent memory locations.
I had assumed that LL/SC were unimplemented in allegrex as sony never intended for the ME to be used for anything but completely asynchronous media decoding.
StrmnNrmn
Posts: 46
Joined: Wed Feb 14, 2007 11:32 pm
Location: London, UK
Contact:

Re: Lock-free algorithms and atomic primitives

Post by StrmnNrmn »

crazyc wrote:
StrmnNrmn wrote:There was also this post from hlide talking about the 'SC/ME mutex'.
At address 0xbc100048 on both cpus, there is a word mmio register that can only be modified by one cpu at a time. If it is set to 1 then the sc holds it and it can't be modified by the me and if it's set to 2 visa versa. If it's set to 0 then it is unheld. Of course it's a problem on the slim as it can only be changed in kernel mode and the kernel function sceSysregSemaTryLock and sceSysregSemaUnlock are not exported to user mode.
I see. That actually sounds pretty useful - it's a bit of a shame it's problematic on the Slim.
crazyc wrote:
Hmmm, it seems that LL/SC will fail when used with uncached or noncoherent memory locations.
I had assumed that LL/SC were unimplemented in allegrex as sony never intended for the ME to be used for anything but completely asynchronous media decoding.
I've not tried them yet, but I assumed they were essential for implementing any atomic operations in user mode. How are sceKernelSignalSema etc implemented in the kernel? Do they rely on the fact that interrupts are disabled whilst in kernel mode?

Edit Just tried this, and I can confirm LL/SC work (on the SC at least :):

Code: Select all

_AtomicIncrement: 
	ll		$t1, 0($a0)
	add		$t2, $t1, 1
	or		$v0, $t2, $0		# Copy for return value
	sc		$t2, 0($a0)
	beq		$t2, 0, _AtomicIncrement
	nop
	jr		$ra
	nop

Code: Select all

extern "C" { u32 _AtomicIncrement( u32 * ptr ); } 
...
u32 val = 0;
printf( "Incrementing %d\n", _AtomicIncrement( &val ) );
printf( "Incrementing %d\n", _AtomicIncrement( &val ) );
printf( "Incrementing %d\n", _AtomicIncrement( &val ) );
printf( "Incrementing %d\n", _AtomicIncrement( &val ) );
printf( "Incrementing %d\n", _AtomicIncrement( &val ) );
printf( "Incremented value is %d\n", val );
Generates:

Code: Select all

Incrementing 1
Incrementing 2
Incrementing 3
Incrementing 4
Incrementing 5
Incremented value is 5
Of course, I really need to check that this works as expected with contention from other threads. That and figure out what to do about the ME..
Last edited by StrmnNrmn on Sat Dec 15, 2007 5:27 am, edited 1 time in total.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: Lock-free algorithms and atomic primitives

Post by hlide »

enjoy !

Code: Select all

int sceSysregSemaTryLock(void)
{
  _sync();
  v0 = _mfc0(22); // CPUId (0 -> SC / 1 -> ME)
  a2 = v0 + 1; // 1 -> SC / 2 -> ME
  _writew(a2, 0xbc100048); // try to acquire semaphore
  a3 = _readw(0xbc100048) & 3; // who owns semaphore ?
  return -((a2 ^ a3) != 0); // 0 if successful or -1 if not
}

void sceSysregSemaUnlock(void)
{
  _sync();
  _writew(0, 0xbc100048); // owner releases semaphore
  _sync();
}
i wonder if 0xbc00048 is only accessible in kernel mode... long time ago i didn't program on psp...
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

ll/sc pair works on a single processor (but Sony seems to be too lazy and prefers to use mfic/mtic pair). However i wouldn't expect them to work as a charm between two cores, so you still need the SC/ME mutex for this and 'sync' to flush the write buffer which Allegrex uses implicitely when storing data ( VFPU uses it explicitely through "vwb.q vr, ofs(rs)" ).
StrmnNrmn
Posts: 46
Joined: Wed Feb 14, 2007 11:32 pm
Location: London, UK
Contact:

Re: Lock-free algorithms and atomic primitives

Post by StrmnNrmn »

hlide wrote:enjoy !
...
Awesome -thanks.

I'm assuming from crazyc's comment that this is only callable from the SC whilst in kernel mode, so I'd need to implement it in a kernel mode prx? I guess it would make a very useful addition to the MediaEngine.prx I've been using (courtesy of J.F.).
hlide wrote:ll/sc pair works on a single processor (but Sony seems to be too lazy and prefers to use mfic/mtic pair). However i wouldn't expect them to work as a charm between two cores, so you still need the SC/ME mutex for this and 'sync' to flush the write buffer which Allegrex uses implicitely when storing data ( VFPU uses it explicitely through "vwb.q vr, ofs(rs)" ).
Agreed, I think it's going to be pretty messy. According to the MIPS R4k manual 'LL and SC instructions implicitly perform a SYNC', so this should be fine. However, the memory location must be cached for use with LL/SC, so I think the variable will need to live in a cache-aligned chunk of memory, and I'll have to insert some cache invalidation/writeback around the LL/SC call. Something like:

Code: Select all

struct Blah { int Val; char pad[60]; } 
__attribute__(aligned(64)) Blah gBlah;

while( sceSysrecSemaTryLock() );
dcache_inv_range( &gBlah, 64 );
_AtomicIncrement( &gBlah.Val );
dcache_wb_range( &gBlah, 64 );
sceSysrecSemaUnlock();
It's not looking all that pretty/efficient really :)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: Lock-free algorithms and atomic primitives

Post by hlide »

StrmnNrmn wrote:
hlide wrote:enjoy !
...
Awesome -thanks.

I'm assuming from crazyc's comment that this is only callable from the SC whilst in kernel mode, so I'd need to implement it in a kernel mode prx? I guess it would make a very useful addition to the MediaEngine.prx I've been using (courtesy of J.F.).
hlide wrote:ll/sc pair works on a single processor (but Sony seems to be too lazy and prefers to use mfic/mtic pair). However i wouldn't expect them to work as a charm between two cores, so you still need the SC/ME mutex for this and 'sync' to flush the write buffer which Allegrex uses implicitely when storing data ( VFPU uses it explicitely through "vwb.q vr, ofs(rs)" ).
Agreed, I think it's going to be pretty messy. According to the MIPS R4k manual 'LL and SC instructions implicitly perform a SYNC', so this should be fine. However, the memory location must be cached for use with LL/SC, so I think the variable will need to live in a cache-aligned chunk of memory, and I'll have to insert some cache invalidation/writeback around the LL/SC call. Something like:

Code: Select all

struct Blah { int Val; char pad[60]; } 
__attribute__(aligned(64)) Blah gBlah;

while( sceSysrecSemaTryLock() );
dcache_inv_range( &gBlah, 64 );
_AtomicIncrement( &gBlah.Val );
dcache_wb_range( &gBlah, 64 );
sceSysrecSemaUnlock();
It's not looking all that pretty/efficient really :)
sysreg is using some ranges like 0xbfc006xx, you may as well allocate them to have a fast and atomic counter since they are shared between SC and ME.
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Re: Lock-free algorithms and atomic primitives

Post by crazyc »

StrmnNrmn wrote:it's a bit of a shame it's problematic on the Slim.
Well, it's not that problematic as you can write a kernel module to do it for you. I guess I'm just annoyed that my own code has to be changed a lot as I wrote it so the same function can be shared between both cpus as hide's above and that becomes rather difficult in later psp kernels where it's not easy to create kernel threads.
According to the MIPS R4k manual 'LL and SC instructions implicitly perform a SYNC', so this should be fine. However, the memory location must be cached for use with LL/SC, so I think the variable will need to live in a cache-aligned chunk of memory, and I'll have to insert some cache invalidation/writeback around the LL/SC call.
I must be missing something. Why deal with LL/SC rather then disabling interrupts then writing an uncached pointer? After all LL/SC don't gain you anything wrt SC/ME synchronization.
*I've picked up this terminology as shorthand for 'the main CPU', but I have no ideas what it means :) Anyone?
The only place i've seen sc used to refer to the main cpu is in the function sceSysregScResetEnable.
StrmnNrmn
Posts: 46
Joined: Wed Feb 14, 2007 11:32 pm
Location: London, UK
Contact:

Re: Lock-free algorithms and atomic primitives

Post by StrmnNrmn »

crazyc wrote:
According to the MIPS R4k manual 'LL and SC instructions implicitly perform a SYNC', so this should be fine. However, the memory location must be cached for use with LL/SC, so I think the variable will need to live in a cache-aligned chunk of memory, and I'll have to insert some cache invalidation/writeback around the LL/SC call.
I must be missing something. Why deal with LL/SC rather then disabling interrupts then writing an uncached pointer? After all LL/SC don't gain you anything wrt SC/ME synchronization.
Well I'm primarily thinking about lock-free primitives on just the SC for now. LL/SC work fine in user-mode and can easily be inlined, so I'm guessing they're cheaper than calling out to a prx to call mfic/mtic. Also, disabling interrupts just to do an atomic increment (e.g. on a refcounted pointer) seems a bit heavy duty to me!

For synchronisation between the ME/SC I think you're right - mtic/mfic through an uncached pointer is probably the way to go.

I realised over the weekend that very little of my code which needs synchronisation actually needs to run on the ME. I have a 'JobManager' class which arbitrates access to the ME. Each 'job' that I queue up for the ME has an optional initialise/finalise function which executes on the SC, e.g. the worker thread in the JobManager essentially does this:

Code: Select all

while( job = GetJob() )
{
  job->Init();
  CallME( ...., job->Main, ... );
  job->Fini();
}
I can set things up so that any synchronisation is done in Init()/Fini(), which means that I only need synchronisation between threads running on the SC. Typically in Fini() I'm just doing something like 'gSomeFlags |= JOB_DONE', but I've had a couple of bugs because '|=' isn't atomic.
crazyc wrote:
*I've picked up this terminology as shorthand for 'the main CPU', but I have no ideas what it means :) Anyone?
The only place i've seen sc used to refer to the main cpu is in the function sceSysregScResetEnable.
Hmmm, mysterious :)
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: Lock-free algorithms and atomic primitives

Post by hlide »

StrmnNrmn wrote:
crazyc wrote:
*I've picked up this terminology as shorthand for 'the main CPU', but I have no ideas what it means :) Anyone?
The only place i've seen sc used to refer to the main cpu is in the function sceSysregScResetEnable.
Hmmm, mysterious :)
SC => SystemControl
ME => MediaEngine
StrmnNrmn
Posts: 46
Joined: Wed Feb 14, 2007 11:32 pm
Location: London, UK
Contact:

Re: Lock-free algorithms and atomic primitives

Post by StrmnNrmn »

hlide wrote:SC => SystemControl
ME => MediaEngine
Aww, you've ruined the mystery now! :)
brin_vg
Posts: 17
Joined: Wed Oct 03, 2007 5:43 pm
Location: A pineapple, under the sea
Contact:

Re: Lock-free algorithms and atomic primitives

Post by brin_vg »

StrmnNrmn wrote:
hlide wrote:SC => SystemControl
ME => MediaEngine
Aww, you've ruined the mystery now! :)
A little part of me was hoping it'd be something incredibly stupid :D

ME => Media Engine
Yep, yep, we know that...

SC => Smelly Cheese
Wh-... What? Bad Sony!
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Re: Lock-free algorithms and atomic primitives

Post by J.F. »

brin_vg wrote:
StrmnNrmn wrote:
hlide wrote:SC => SystemControl
ME => MediaEngine
Aww, you've ruined the mystery now! :)
A little part of me was hoping it'd be something incredibly stupid :D

ME => Media Engine
Yep, yep, we know that...

SC => Smelly Cheese
Wh-... What? Bad Sony!
Nah! It comes from early beta testers who would bang the malfunctioning systems on the table yelling "Stupid computer!"
:)
User avatar
Wally
Posts: 663
Joined: Mon Sep 26, 2005 11:25 am

Re: Lock-free algorithms and atomic primitives

Post by Wally »

J.F. wrote:
brin_vg wrote:
StrmnNrmn wrote: Aww, you've ruined the mystery now! :)
A little part of me was hoping it'd be something incredibly stupid :D
Ah well, mysteries are meant to be solved at some point. Hope all is working well :)
ME => Media Engine
Yep, yep, we know that...

SC => Smelly Cheese
Wh-... What? Bad Sony!
Nah! It comes from early beta testers who would bang the malfunctioning systems on the table yelling "Stupid computer!"
:)
It could also be Silent Corner :) or Super Clowns!

Ah well
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Re: Lock-free algorithms and atomic primitives

Post by hlide »

StrmnNrmn wrote:I realised over the weekend that very little of my code which needs synchronisation actually needs to run on the ME.
if so, a ll/sc pair should be enough as long as you don't need to access the atomic memory mutex or counter or what else on ME processor, right ?

I also really prefer this ll/sc pair to mfic/mtic pair to synchronize between threads.

But I have a question : I heard PSP firmware uses a cooperative-thread scheduler. How PSP firmware is running timer callback for instance ? in a special thread, in the interrupted thread ? or does it mean we should call a blocking function to have a chance to run this callback ?
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Re: Lock-free algorithms and atomic primitives

Post by crazyc »

StrmnNrmn wrote:Well I'm primarily thinking about lock-free primitives on just the SC for now. LL/SC work fine in user-mode and can easily be inlined, so I'm guessing they're cheaper than calling out to a prx to call mfic/mtic. Also, disabling interrupts just to do an atomic increment (e.g. on a refcounted pointer) seems a bit heavy duty to me!
Reading in and writing out a cache line seems heavy too though, especially since the code will have to block on the read.

Edit: I should check these things before replying. In fact reading and writing an uncached pointer takes the same number of clocks as invalidating, reloading and writing a cache line, ~65. I guess that isn't surprising. Disabling and enabling interrupts adds ~35 additional clocks. Also, it might be a good idea to figure out how to access the 0xbfc00xxx area from user mode as it only takes ~15 clocks to read and write uncached.
StrmnNrmn wrote:I realised over the weekend that very little of my code which needs synchronisation actually needs to run on the ME. I have a 'JobManager' class which arbitrates access to the ME. Each 'job' that I queue up for the ME has an optional initialise/finalise function which executes on the SC, e.g. the worker thread in the JobManager essentially does this:

Code: Select all

while( job = GetJob() )
{
  job->Init();
  CallME( ...., job->Main, ... );
  job->Fini();
}
I can set things up so that any synchronization is done in Init()/Fini(), which means that I only need synchronization between threads running on the SC. Typically in Fini() I'm just doing something like 'gSomeFlags |= JOB_DONE', but I've had a couple of bugs because '|=' isn't atomic.
I'm interested to see it when it's done. My code is somewhat different. I just have a queue that the ME moves through one by one doing each entry in order and if the ME has some finalization (or anything else that needs to be done i.e. cache synchronization) afterward it places an entry in another queue that the SC checks occasionally.
J.F.
Posts: 2906
Joined: Sun Feb 22, 2004 11:41 am

Post by J.F. »

A linked list of jobs would be more handy in some cases. A priority based queue would also be an idea. The Amiga used priority based linked lists for queues for a number of things, like interrupt handlers.

You could have a function to push a prioritized task onto the ME queue. Tasks of the same priority would naturally be FIFO.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

this file linux/include/asm-mips/atomic.h provides several atomic functions using LL/SC. It may help you.

However I'm a bit confused on what follows :

a. R4000 implementation when you perform LL :

Code: Select all

VAddr = sign_extend(offset) + GPR[base];

(PAddr, uncached) = translate_address(VAddr, DATA);

GPR[rt] = load_memory(uncached, WORD, PAddr, VAddr, DATA);

LLbit = 1;
b. R4000 implementation, when you perform SC :

Code: Select all

VAddr = sign_extend(offset) + GPR[base];

(PAddr, uncached) = translate_address(VAddr, DATA);

if (LLbit) store_memory (uncached, WORD, data, PAddr, VAddr, DATA);

GPR[rt] = LLbit;
Something is really wrong... there is a LLaddr in CP0 register (the physical address used in the last executed LL) but it doesn't seem to be updated by LL if we refer to LL implementation. And SC should really use LLaddr to check against the physical address given by the current SC. Besides, it looks as if LLbit is never reset once set to 1.

Also note this is a weak implementation (from Wikipedia) :
Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail.
EDIT:

if those implementations are right and LLaddr is really updated, it would mean a normal load or store operation should always update this LLaddr and reset LLbit : and so "another load or store operation between the two operations will cause the store-conditional to spuriously fail". But it doesn't work with the R4000 implementations.

Damned, those implementations make no sense :(((.


For me, the right implementations should be :

a. when you perform LL :

Code: Select all

VAddr = sign_extend(offset) + GPR[base];

(PAddr, uncached) = translate_address(VAddr, DATA);

LLaddr = PAddr;

GPR[rt] = load_memory(uncached, WORD, PAddr, VAddr, DATA);

LLbit = 1;
b. the right implementation, when you perform SC :

Code: Select all

VAddr = sign_extend(offset) + GPR[base];

(PAddr, uncached) = translate_address(VAddr, DATA);

if &#40;LLaddr != PAddr&#41; LLbit = 0; <-- the reason why LLbit needs to be reset

if &#40;LLbit&#41; store_memory &#40;uncached, WORD, data, PAddr, VAddr, DATA&#41;;

GPR&#91;rt&#93; = LLbit;

LLbit = 0; <-- may be optional to be able to issue several SC at the same physical address
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

Something is really wrong... there is a LLaddr in CP0 register (the physical address used in the last executed LL) but it doesn't seem to be updated by LL if we refer to LL implementation.
In the MIPS32 databook it says:
The LLAddr register contains relevant bits of the physical address read by the most recent Load Linked instruction. This register is implementation dependent and for diagnostic purposes only and serves no function during normal operation.
Hmmm, it seems that LL/SC will fail when used with uncached or noncoherent memory locations.
After some testing it looks like the only time sc will fail is when there is an exception between ll and sc no matter whether the address is cached or not. That works nicely as an alternative to disabling interrupts.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

crazyc wrote:
Something is really wrong... there is a LLaddr in CP0 register (the physical address used in the last executed LL) but it doesn't seem to be updated by LL if we refer to LL implementation.
In the MIPS32 databook it says:
The LLAddr register contains relevant bits of the physical address read by the most recent Load Linked instruction. This register is implementation dependent and for diagnostic purposes only and serves no function during normal operation.
ok, so it would mean LLAddr is not really used for checking purpose on SC execution.
Hmmm, it seems that LL/SC will fail when used with uncached or noncoherent memory locations.
After some testing it looks like the only time sc will fail is when there is an exception between ll and sc no matter whether the address is cached or not. That works nicely as an alternative to disabling interrupts.
If so, it means an exception always resets LLbit. This is necessary because if a context switch occurs and runs SC, it would be incoherent and LL/SC pairs would be unusable.

So, restrictions are :

- don't nest (LL1/LL2/SC2/SC1) or cross (LL1/LL2/SC1/SC2) in the same thread several LL/SC pairs because SC would be fooled by LLbit.

- don't call blocking functions (which can switch context) between LL and SC for the same reason.

- don't mix normal load and store operations with LL/SC at the same address, because it would disrupt the coherency :

Code: Select all

  // create a new top element
  ...
  element_t *old_element_ptr = *list_top;
  new_element_ptr->next = old_element_ptr; <-- wrong !!!

  ...
  // critical session &#58; update top_list
  do
  &#123;
    element_t *old_element_ptr = __asm_ll&#40;&list_top&#41;;
  &#125;
  while &#40;!__asm_sc&#40;new_element_ptr,  &list_top&#41;&#41;;
is wrong because old_pointer_ptr stored into new_element_ptr->next may be different and disrupt the chain list.

the right thing is :

Code: Select all

  // create a new top element
  ...

  // critical session &#58; update top_list
  do
  &#123;
    element_t *old_element_ptr = __asm_ll&#40;&list_top&#41;;

    // we assume a simple store at a different address wouldn't
    // disrupt LL/SC pair. Otherwise, we are in big trouble.
    new_element_ptr->next = old_element_ptr;
  &#125;
  while &#40;!__asm_sc&#40;new_element_ptr,  &list_top&#41;&#41;;
Last edited by hlide on Mon Dec 24, 2007 12:17 am, edited 1 time in total.
crazyc
Posts: 408
Joined: Fri Jun 17, 2005 10:13 am

Post by crazyc »

hlide wrote:- don't nest (LL1/LL2/SC2/SC1) or cross (LL1/LL2/SC1/SC2) in the same thread several LL/SC pairs because SC would be fooled by LLbit.
Both stores actually succeed here but if an exception is thrown between the first and second ll's, both stores still succeed. Almost like ll sets the ll_bit but sc doesn't clear it. Anyway, your advice stands.
hlide wrote:- don't call blocking functions (which can switch context) between LL and SC for the same reason.
Actually, any kernel function will trigger a syscall exception causing the store to fail.
hlide wrote:- don't mix normal load and store operations with LL/SC at the same address, because it would disrupt the coherency :
Definitely. The ll/sc circuitry does not detect changes to the memory at the load address.
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

crazyc wrote:Both stores actually succeed here but if an exception is thrown between the first and second ll's, both stores still succeed. Almost like ll sets the ll_bit but sc doesn't clear it.

Actually, any kernel function will trigger a syscall exception causing the store to fail.
Exactly, it is said from R4000 doc :
LLbit : Bit of state to specify synchronization instructions. Set by
_LL_, cleared by _ERET_ and _Invalidate_ and read by _SC_.
looking for "LLbit" in document shows me only LL, SC and ERET are reading/writing this bit (i don't know what they mean by _Invalidate_, cache operation ?).
Post Reply