uClinux on the PSP
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
uClinux on the PSP
Hey, I ported Xiptech's mipsnommu version of uClinux-2.4.19 to the PSP. The only hardware support it has currently is the headphone jack's serial port (used by the console and tty implementation), but it's a start!
Check it out here: http://df38.dot5hosting.com/~remember/chris/
It mounts a ramdisk as the root filesystem; the root disk image is linked in with the kernel. The disk image has a minimal userland including sh, ls, mkdir, echo, cat, basic stuff like that, built with uClibc. All the executables are statically linked.
The only way to use it is with some sort of serial port hardware like the one discussed here: http://forums.ps2dev.org/viewtopic.php?t=5234
Hopefully more people will help now and we can really turn this into something.
Good work for a start.
However, I had a general look at the source you provided and one thing puzzles me: there is no PSP-specific architecture directory. I think you should consider adding one to avoid polluting the generic one.
Just one example, cache operations: cacheops.h defines a lot of operation codes which don't match those of the PSP cache instruction. For instance, code 0x08 is 'hit invalidate icache' on PSP, not 'index store tag icache'.
Because the hardware side of the PSP (meaning the hardware registers) is largely unknown, it will be a big hassle to implement.
That said, we could alternatively consider running uClinux on the ME processor too (with the SC processor providing some device functionality to the ME processor to start).
Well, I'm cheering you on.
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
Dude, this is just to get the ball rolling. When you start a new port of the kernel, you typically hack up the most similar existing port to get started.
But you are right, ultimately my hacked up mipsnommu/simulator/ will be moved into mipsnommu/psp/. Of course if you've ever done anything with the kernel source, you'll know that the architecture-specific code does not stay entirely in arch/x/y/. So there is lots of code plunked in various places that needs to be wrapped in an #ifdef CONFIG_PSP.
But that is a mundane detail that we'll get around to eventually. It'll be easy to use a visual diff tool to see where the modifications were made, and where they may be better placed. Keep in mind this was very much a learning experience for me, so all kinds of bad design choices were made in the beginning. For example, I opted out of learning how to use the PSP cache by just linking the kernel to run in 0xAxxx,xxxx range, the uncached memory segment.
Thanks!
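For anyone following along, here is a rough sketch of what "linking in the 0xAxxx,xxxx range" means in address terms. It relies only on the standard MIPS32 kseg0/kseg1 layout; the helper names are made up for illustration and are not from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MIPS32 unmapped kernel segments. */
#define KSEG0_BASE 0x80000000u /* cached window onto physical memory   */
#define KSEG1_BASE 0xA0000000u /* uncached window onto the same memory */
#define PHYS_MASK  0x1FFFFFFFu /* low 29 bits are the physical address */

/* Hypothetical helper: strip the segment bits of a kseg0/kseg1 address. */
static uint32_t to_phys(uint32_t vaddr)
{
    assert(vaddr >= KSEG0_BASE && vaddr < 0xC0000000u);
    return vaddr & PHYS_MASK;
}

/* Same physical location, seen through the cached or uncached window. */
static uint32_t to_cached(uint32_t vaddr)   { return to_phys(vaddr) | KSEG0_BASE; }
static uint32_t to_uncached(uint32_t vaddr) { return to_phys(vaddr) | KSEG1_BASE; }

/* True when the address lies in kseg1 (top three bits are 101). */
static int is_uncached(uint32_t vaddr)
{
    return (vaddr & 0xE0000000u) == KSEG1_BASE;
}
```

So a kernel linked at 0xA8000000 and one linked at 0x88000000 sit at the same physical address; only the second goes through the caches.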
-
- Posts: 107
- Joined: Sat Jan 13, 2007 11:50 am
It looks impressive.
I'd like to know what you would use Linux on a PSP for?
I like the idea of "developing stuff just for the sake of it", but porting Linux is a huge task, so I guess you have an idea behind that.
Like completely replacing the XMB or whatever...
I don't know the PSP well enough right now to understand what the advantage would be of using a different OS (Linux) rather than using the standard firmware capabilities to develop, say, homebrew...
I'm not being very clear, because I think I'm trying to compare two things that cannot be compared, but imagine I want to develop an mp3 player. What would be the advantage of developing this mp3 player for linuxforpsp rather than developing it as a "standard" homebrew?
Or maybe that's not why you're doing this at all?
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
Well, mainly it was to learn about operating system kernels.
But there are other uses for bringing a standard OS platform to the PSP. It could perhaps enhance portability, meaning, it may be easier to port existing Linux applications to a PSP linux/libc runtime environment, rather than the XMB/psp-sdk environment.
Then again, maybe not!
All in all it just gives a few more development options that may or may not be pursued. I imagine developing an interesting keystroke entry mechanism [similar to the T9Word feature on my cell phone] for it, and running Gaim and Firefox.
Maybe eventually you could write a driver for the USB port on top; then it would be extremely easy to attach keyboards, mice and whatnot (with a simple rewiring to a hub or whatnot). It might be impossible, though.
FreePlay wrote:
Question:
With the system as you currently have it, is it possible to have primitive graphics support through direct VRAM manipulation, or is the VRAM no longer mapped under Linux (or some other problem)?
By the way, nice work :)
i'm not sure about this, you may need to setup the LCD controller so it can display VRAM.
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
framebuffer, cache, etc
I seem to be able to write pixels to the screen from Linux just by writing values to VRAM [0xA400,0000 range]. This is probably because the screen is already initialized when the bootloader program is loaded by the PSP. Even if we don't know how to change the video mode, we could make a framebuffer device that just supports that particular mode (or, in the bootloader, while the PSP OS is still operating, use SDK functions to set up the screen however we want it).
I don't really understand how the "framebuffer" abstraction works in terms of eliminating flickering and stuff, though. As I recall, in old PC programs I'd use "page flipping": draw to some non-visible portion of VRAM, and then "page flip" the video hardware to point to that portion of RAM once drawing was complete. Anyone who has any info on this subject should post it here. :)
Another thing: Hlide, you seem to know a little bit about the PSP cache. My first goal here in the new year is to get this kernel using the cache. Is there any online resource that points out the differences between the PSP cache and a standard MIPS R4000 cache? Where did you get your information? Thanks.
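To make the VRAM poking and the page-flipping idea concrete, here is a minimal sketch. It assumes the mode left behind by the bootloader is 480x272 with a 512-pixel line stride and 32-bit pixels; those numbers, and all the names, are assumptions for illustration. The step that would tell the display hardware to scan out the other page is exactly the part nobody has documented in this thread, so it is only a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mode: 480x272 visible, 512 pixels per line in memory. */
enum { FB_WIDTH = 480, FB_HEIGHT = 272, FB_STRIDE = 512 };

/* Write one 32-bit pixel into a framebuffer page. */
static void fb_put_pixel(volatile uint32_t *fb, int x, int y, uint32_t color)
{
    assert(x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT);
    fb[(size_t)y * FB_STRIDE + (size_t)x] = color;
}

/* Two pages: one being displayed, one being drawn into. */
struct fb_pages {
    volatile uint32_t *page[2];
    int visible; /* index of the page currently on screen */
};

/* The hidden page is the one to draw into. */
static volatile uint32_t *fb_back(const struct fb_pages *p)
{
    return p->page[p->visible ^ 1];
}

/* Swap roles once a frame is complete. */
static void fb_flip(struct fb_pages *p)
{
    /* On real hardware the display controller would be pointed at the
     * other page here; how to do that on the PSP is not covered above. */
    p->visible ^= 1;
}
```

On the PSP itself the first page would presumably be something like (volatile uint32_t *)0xA4000000, per the range mentioned above.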
Last edited by chrismulhearn on Tue Jan 16, 2007 3:41 am, edited 1 time in total.
Don't worry about the page flipping stuff. If I recall correctly from gc-linux, the Linux framebuffer drivers don't deal with it directly at all (I remember trying to put an efb->xfb copy into the page-flipping routine, which I couldn't find at all =P). That said, since the PSP has a common pixel color format, porting one of the existing drivers to the PSP should be rather trivial (except for the setup, that is, which isn't really needed for a start, like you said; you could always add it later).
Re: framebuffer, cache, etc
chrismulhearn wrote:
Another thing: Hlide, you seem to know a little bit about the PSP cache. My first goal here in the new millennium is to get this kernel using the cache. Is there any online resource that points out the differences between the PSP cache and a standard MIPS R4000 cache? Where did you get your information? Thanks.
I will provide you with a cacheops.h updated for Allegrex.
There is one point which bothers me: in the ME code (executed by the second processor, also known as the Media Engine processor) I can see 0x1 and 0x11 as cache operations (I don't know if they are really Index_Writeback_Inv_D and Hit_Invalidate_I like in the standard MIPS R4000 cache). So what to think about it? This code appears when you boot the ME processor (at address 0xBFC00040). As far as I know the ME processor is an Allegrex CPU (because it doesn't complain about m(t/f)ic, for instance) without a VFPU, so I would expect it to have the same cache operations as the SC processor.
Re: framebuffer, cache, etc
I'm fairly sure 0x01 is "icache index store tag" and 0x11 is "dcache index store tag" and the code in the ME init is just clearing the cache tags.
Here are the notes I took back when I first got code running on the ME. I'm not 100% sure, but they're probably accurate and the ops are most likely the same on the SC.
Code: Select all
#define IXILT 0x00 /* Icache index load tag */
#define IXIST 0x01 /* Icache index store tag */
#define IXIINV 0x03 /* Icache index invalidate */
#define IXHINV 0x08 /* Icache hit invalidate */
#define DXILT 0x10 /* Dcache index load tag */
#define DXIST 0x11 /* Dcache index store tag */
#define DXIINV 0x13 /* Dcache index invalidate */
#define DXIWBINV 0x14 /* Dcache index writeback invalidate */
#define DXHINV 0x19 /* Dcache hit invalidate */
#define DXHWB 0x1a /* Dcache hit writeback */
#define DXHWBINV 0x1b /* Dcache hit writeback invalidate */
#define TAG_SIZE 128
#define LINE_SIZE 64
GOAL : invalidate, writeback, create exclusive dirty, fill ?
NOTE : only 0x18 to 0x1F are tested
First test on ME processor :
- t0 = cycles spent on this cache operation
- t1 = cycles spent on a load instruction
- t2 = cycles spent on a store instruction
Code: Select all
.p2align 6
me_cache_line:
.long +0, -1, -2, -3
.long -1, -2, -3, -0
.long -2, -3, -0, -1
.long -3, -0, -1, -2
.macro test_cache i
li t1, 0x80000000
mtc0 t1, $11
li t0, 0
mtc0 t0, $9
sw t1, 0(at) # store the value
cache 0x1B, 0(at) # dcache hit writeback and invalidate
.p2align 6
mfc0 t0, $9 # time elapsed at t0
cache \i, 0(at) # operation code to test
mfc0 t1, $9 # cycles elapsed at t1
lw v0, 0(at) # read the value
mfc0 t2, $9 # cycles elapsed at t2
nor v0, zr, v0 # invert bits of the value
mfc0 t3, $9 # cycles elapsed at t3
sw v0, 0(at) # write the negated value back
mfc0 t4, $9 # cycles elapsed at t4
cache 0x1B, 0(at) # dcache hit writeback and invalidate
lw v0, 0(at)
subu t0, t1, t0
subu t1, t2, t1
subu t2, t4, t3
sw v0, 0x00(a0) # store the value
sw t0, 0x04(a0) # store time elapsed
sw t1, 0x08(a0) # store time elapsed
sw t2, 0x0C(a0) # store time elapsed
.endm
.global me_test_cache
me_test_cache:
jal me_enter_critical_session
nop
li v1, 0xa0000000
la at, me_cache_line
or a0, a0, v1
test_cache 0x18
addiu a0, a0, 16
test_cache 0x19
addiu a0, a0, 16
test_cache 0x1A
addiu a0, a0, 16
test_cache 0x1B
addiu a0, a0, 16
test_cache 0x1C
addiu a0, a0, 16
test_cache 0x1D
addiu a0, a0, 16
test_cache 0x1E
addiu a0, a0, 16
test_cache 0x1F
addiu a0, a0, 16
jal me_leave_critical_session
nop
0: b 0b
nop
results :
# 18 -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 19 -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
# 1A -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
# 1B -> v0 = 7FFFFFFF, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
# 1C -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1D -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1E -> v0 = 7FFFFFFF, t0 = 70 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1F -> v0 = 7FFFFFFF, t0 = 70 cycles, t1 = 2 cycles, t2 = 3 cycles
note : because a dcache writeback and invalidate is done just before, we cannot determine if the following one is invalidate/writeback
- t0 == 3 cycles ==> no state changing
- t0 == 4 cycles ==> change state : CREATE DIRTY EXCLUSIVE-like operation
- t0 == 70 cycles ==> change state, fetching data in main memory : FILL-like operation
- t1 == 2 cycles ==> no data fetching in main memory
- t1 == 41 cycles ==> data fetching in main memory
So :
- 18, 1C and 1D must be like CREATE DIRTY EXCLUSIVE operations.
- 1E and 1F must be like FILL operations
- t0 = cycles spent on this cache operation
- t1 = cycles spent on a load instruction
- t2 = cycles spent on a store instruction
Code: Select all
...
.macro test_cache i
li t1, 0x80000000
mtc0 t1, $11
li t0, 0
mtc0 t0, $9
sw t1, 0(at) # store the value
# REMOVE THIS CACHE INSN SO WE CAN DETERMINE THE NEXT ONE :
# cache 0x1B, 0(at) # dcache hit writeback and invalidate
.p2align 6
mfc0 t0, $9 # time elapsed at t0
cache \i, 0(at) # operation code to test
...
results :
# 18 -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 19 -> v0 = 80000000, t0 = 3 cycles, t1 = 41 cycles, t2 = 3 cycles
# 1A -> v0 = 7FFFFFFF, t0 = 7 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1B -> v0 = 7FFFFFFF, t0 = 7 cycles, t1 = 61 cycles, t2 = 3 cycles
# 1C -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1D -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1E -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
# 1F -> v0 = 7FFFFFFF, t0 = 4 cycles, t1 = 2 cycles, t2 = 3 cycles
- v0 == 80000000 ==> the "sw" of 7FFFFFFF from previous test_cache was not written back in main memory : HIT INVALIDATE operation
- t0 == 3 cycles ==> no state changing
- t0 == 4 cycles ==> change state : INVALIDATE/CREATE DIRTY EXCLUSIVE/FILL-like operation
- t0 == 7 cycles ==> change state : WRITEBACK-like operation
- t1 == 2 cycles ==> no data fetching in memory
- t1 == 41 cycles ==> data fetching in main memory
- t1 == 61 cycles ==> old data saving (?) + new data fetching in main memory
So :
- 18, 1C and 1D are like a CREATE DIRTY EXCLUSIVE operation
- 19 is a HIT INVALIDATE operation
- 1A is a HIT WRITEBACK operation
- 1B is a HIT WRITEBACK AND INVALIDATE operation
- 1E and 1F are a FILL operation because of the first test
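The conclusions of the two tests can be written down as a small lookup table, which may be handy when revising cacheops.h. This is only a restatement of the inferences above; the enum and function names are invented here, and the classification itself is still a guess for 18/1C/1D and 1E/1F.

```c
#include <assert.h>

/* Behaviour classes inferred from the timing tests; not documented by Sony. */
enum dcache_kind {
    DC_CREATE_DIRTY_LIKE,
    DC_HIT_INVALIDATE,
    DC_HIT_WRITEBACK,
    DC_HIT_WRITEBACK_INVALIDATE,
    DC_FILL_LIKE
};

/* Indexed by (op - 0x18); only ops 0x18..0x1F were tested. */
static const enum dcache_kind inferred_kind[8] = {
    /* 0x18 */ DC_CREATE_DIRTY_LIKE,
    /* 0x19 */ DC_HIT_INVALIDATE,
    /* 0x1A */ DC_HIT_WRITEBACK,
    /* 0x1B */ DC_HIT_WRITEBACK_INVALIDATE,
    /* 0x1C */ DC_CREATE_DIRTY_LIKE,
    /* 0x1D */ DC_CREATE_DIRTY_LIKE,
    /* 0x1E */ DC_FILL_LIKE,
    /* 0x1F */ DC_FILL_LIKE,
};

/* Look up the inferred class of a tested dcache op code. */
static enum dcache_kind dcache_op_kind(unsigned op)
{
    assert(op >= 0x18 && op <= 0x1F);
    return inferred_kind[op - 0x18];
}

/* True for the ops that appeared to push dirty data to main memory. */
static int op_writes_back(unsigned op)
{
    enum dcache_kind k = dcache_op_kind(op);
    return k == DC_HIT_WRITEBACK || k == DC_HIT_WRITEBACK_INVALIDATE;
}
```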
We still need to determine the difference between 18, 1C and 1D and also between 1E and 1F.
For 1F I now know for sure it is something like FILL AND LOCK, because I tested it some months ago and was able to check that we can use it to get a very small but fast memory which never writes back to main memory unless an explicit writeback operation is done. The cache line is locked until an explicit invalidate or unlock operation is done.
I suspect there is also a CREATE DIRTY EXCLUSIVE AND LOCK since it is an allocation for a cache line as FILL is.
in http://www.freepatentsonline.com/20010052053.html you can read some details about LOCK mechanism :
[0187] Cache
3 The CACHE instruction implements the following five operations: 0: Index Invalidate - Instruction Cache 1: Index Write-back Invalidate - Data Cache 5: Index Write-back Invalidate - Data Cache 9: Index Write-back - Data Cache 16: Hit Invalidate - Instruction Cache 17: Hit Invalidate - Data Cache 21: Hit Write-back Invalidate - Data Cache 25: Hit Write-back - Data Cache 28: Fill Lock - Instruction Cache 29: Fill Lock - Data Cache
[0188] The Fill Lock instructions are used to lock the instruction and data caches on a line by line basis. Each line can be locked by utilizing these instructions. The instruction and data caches are four way set associative, but software should guarantee that a maximum of three of the four lines in each set are locked. If all four lines become locked, then one of the lines will be automatically unlocked by hardware the first time a replacement is needed in that set.
I read more details about the LOCK mechanism for a MIPS architecture other than Allegrex, but I cannot find that PDF again because I came across it by accident and don't remember how. That document says that a locked cache line can be unlocked by an invalidate or unlock operation. But what is this unlock operation, or what are those unlock operations?
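As a way to picture the rule in paragraph [0188] of the patent, here is a toy model of one four-way set with lock bits. It describes the processor in the patent, not necessarily Allegrex, and which way the hardware unlocks when all four are locked is not stated, so way 0 is an arbitrary assumption. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { WAYS = 4 };

/* Lock bits for the four lines of one cache set. */
struct cache_set {
    bool locked[WAYS];
};

static int locked_count(const struct cache_set *s)
{
    int n = 0;
    for (int w = 0; w < WAYS; w++)
        n += s->locked[w] ? 1 : 0;
    return n;
}

/* Software-side guard: never lock more than three of the four ways. */
static bool try_lock_way(struct cache_set *s, int way)
{
    assert(way >= 0 && way < WAYS);
    if (!s->locked[way] && locked_count(s) >= WAYS - 1)
        return false; /* would leave no line free for replacement */
    s->locked[way] = true;
    return true;
}

/* Model of replacement: prefer an unlocked way; if none is left,
 * the hardware unlocks one (way 0 here is an assumption). */
static int pick_victim(struct cache_set *s)
{
    for (int w = 0; w < WAYS; w++)
        if (!s->locked[w])
            return w;
    s->locked[0] = false;
    return 0;
}
```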
Last edited by hlide on Tue Jan 16, 2007 4:26 am, edited 1 time in total.
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
cache
Wow, it seems I have a lot to learn about the MIPS cache. Everything I've ever worked with has had an architecturally invisible cache.
What is the basic concept here? When you try to read a memory location that isn't in the primary cache, does some exception get raised? What are you supposed to do in that situation, perform a CACHE operation that dumps a cache line to memory and replaces that cache line with the memory location you initially tried to read?
MIPS manuals seem to have very lengthy chapters about the cache, but they focus on coherency (which in my case is not an issue since I am using only the Allegrex cpu, my environment is therefore uni-processor) and it is hard to tell which parts are accomplished automagically and which parts are up to the programmer.
If anyone knows enough about this to post a simple example of an exception handler that allows use of the Allegrex icache and dcache in a uni-processor (as in, ignoring the ME) environment, it would be a really big help!
Basically you probably need just a portion:
- lock mechanism? Just forget about it; it seems Linux doesn't bother with it.
- since Allegrex has only one primary cache for each purpose (one for instructions and another for data), I don't think we can encounter any coherency problem, so no bother here.
- since Allegrex has no TLB, VCE (Virtual Coherency Exception, an exception raised when two cache entries happen to hold the same content because they have the same PA (Physical Address) but different VAs (Virtual Addresses) and the cache operation is in conflict, if I'm not wrong) should not happen.
Well, I would say there is in fact no reason to think there is an exception to handle for cache coherency, so you should be relieved.
Oh sorry, I forgot to answer your real question:
Normally this is transparent (that is, user code shouldn't need to use those cache instructions directly).
But still, there are cases when you do need to use them (indirectly, through the functions Linux provides):
1) when you store some instructions to main memory through the dcache and need to run them through the icache: DCACHE WRITEBACK INVALIDATE then ICACHE INVALIDATE at their addresses. Why? Because the icache can hold stale instructions at those addresses, so you need to invalidate them before running the right ones.
2) for DMA or peripheral (hardware outside the CPU) operations which need to deal with main memory: DCACHE WRITEBACK INVALIDATE.
As you can see there are essentially 2 operations:
- DCACHE WRITEBACK AND INVALIDATE
- ICACHE INVALIDATE
Linux can use CREATE_DIRTY_EXCLUSIVE as a kind of prefetch instruction for data writing. There is a define to enable it.
And they should mostly be used only in driver code.
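The two cases above can be sketched in C as follows. The per-line operations are passed in as function pointers so the ordering can be tried on a PC; on the PSP they would wrap the "cache" instruction with whatever op codes turn out to be right. The 64-byte line size comes from the ME notes earlier in the thread and is unverified, and every name here is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LINE_SIZE = 64 }; /* from the ME notes above; unverified */

typedef void (*line_op)(uintptr_t addr);

struct cache_ops {
    line_op dcache_wbinv_line; /* would issue dcache hit writeback invalidate */
    line_op icache_inv_line;   /* would issue icache hit invalidate */
};

/* Apply op to every cache line overlapping [start, start + len). */
static void for_each_line(uintptr_t start, uintptr_t len, line_op op)
{
    if (len == 0)
        return;
    uintptr_t end = start + len;
    for (uintptr_t a = start & ~(uintptr_t)(LINE_SIZE - 1); a < end; a += LINE_SIZE)
        op(a);
}

/* Case 1: code was just written through the dcache and is about to run. */
static void sync_code_range(const struct cache_ops *ops, uintptr_t start, uintptr_t len)
{
    assert(ops && ops->dcache_wbinv_line && ops->icache_inv_line);
    for_each_line(start, len, ops->dcache_wbinv_line); /* push new code to RAM */
    for_each_line(start, len, ops->icache_inv_line);   /* drop stale icache lines */
}

/* Case 2: a buffer is about to be handed to DMA or a peripheral. */
static void prepare_dma_range(const struct cache_ops *ops, uintptr_t start, uintptr_t len)
{
    assert(ops && ops->dcache_wbinv_line);
    for_each_line(start, len, ops->dcache_wbinv_line);
}

/* Host-side stand-ins that only record the order of operations. */
static char op_log[64];
static size_t op_log_len;

static void log_d(uintptr_t a)
{
    (void)a;
    if (op_log_len < sizeof op_log)
        op_log[op_log_len++] = 'D';
}

static void log_i(uintptr_t a)
{
    (void)a;
    if (op_log_len < sizeof op_log)
        op_log[op_log_len++] = 'I';
}
```

With the stand-ins, syncing 0x80 bytes starting at 0x1010 touches the three lines at 0x1000, 0x1040 and 0x1080: all three dcache operations first, then the three icache ones.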
To put it bluntly, you probably just need to provide an Allegrex-revised cacheops.h (with the right codes) and activate/deactivate some features (http://www.linux-mips.org/wiki/Cpu_features).
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
In my experience, it's very difficult to diagnose problems if you don't have a sound conceptual understanding of what's supposed to be going on. I was under the impression that there was an exception handling aspect to this, but you seem to imply the only time you really need to use the cache instructions is when you are explicitly doing something that will put the cache in an awkward state [for example, when writing instructions to memory, the icache may have already cached the instruction you are trying to overwrite, so without invalidating that cache, you won't see your new instruction].
about VCE :
In my opinion VCE is irrelevant for Allegrex: exceptions #14 and #31 don't really exist there (there is no TLB), and exception #31 is reused in software for unrecoverable errors (whereas VCE is not an unrecoverable error).
usually :
- VCEI : exception #14
- VCED : exception #31
but we have those :
Code: Select all
EXC_31_ERROR_handler(/* v1 */) /* (exceptionman:0x06c8) */
{
COP0CTRL.7=v1; /* save v1 in cc0.7 (GPR.v1) */
COP0CTRL.20=COP0STAT.13; /* save (Cause) in cc0.20 */
COP0CTRL.1=COP0STAT.30; /* save (ErrorEPC) in cc0.1 Error Exception Program Counter */
COP0CTRL.19=COP0STAT.12; /* save v1 (Status) in cc0.19 Status register */
exception_handler(31<<2); /* v0=0x007c default offset in table */
}
Code: Select all
void *ExceptionVectorTable[32] /* 8801ea00 (exceptionman) Exception Vector Table (32 Entries) */
{
/* 0 */ 88020F74 (interruptman:0x2274) /* IRQ (=default_irq_handler) */
/* 1 */ 8801D130 (hang)while(1);
/* 2 */ 8801D130 (hang)while(1);
/* 3 */ 8801D130 (hang)while(1);
/* 4 */ 8801D130 (hang)while(1);
/* 5 */ 8801D130 (hang)while(1);
/* 6 */ 8801D130 (hang)while(1);
/* 7 */ 8801D130 (hang)while(1);
/* 8 */ 88021E74 (interruptman:0x3174) /* syscall (=EXC_8_Syscall handler) */
/* 9 */ 8801D130 (hang)while(1);
/* 10 */ 8801D130 (hang)while(1);
/* 11 */ 8801D130 (hang)while(1);
/* 12 */ 8801D130 (hang)while(1);
/* 13 */ 8801D130 (hang)while(1);
/* 14 */ 8801D130 (hang)while(1);
/* 15 */ 8801D130 (hang)while(1);
/* 16 */ 8801D130 (hang)while(1);
/* 17 */ 8801D130 (hang)while(1);
/* 18 */ 8801D130 (hang)while(1);
/* 19 */ 8801D130 (hang)while(1);
/* 20 */ 8801D130 (hang)while(1);
/* 21 */ 8801D130 (hang)while(1);
/* 22 */ 8801D130 (hang)while(1);
/* 23 */ 8801D130 (hang)while(1);
/* 24 */ 8801D130 (hang)while(1); /* debug exception */
/* 25 */ 8801D130 (hang)while(1);
/* 26 */ 8801D130 (hang)while(1);
/* 27 */ 8801D130 (hang)while(1);
/* 28 */ 8801D130 (hang)while(1);
/* 29 */ 8801D130 (hang)while(1);
/* 30 */ 8801D130 (hang)while(1);
/* 31 */ 8801D370 (exceptionman:0x0c70) /* error, default (=default_error_handler) */
}
if so, there is no exception to handle cache trouble.
Last edited by hlide on Tue Jan 16, 2007 4:59 am, edited 1 time in total.
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
OK wow! I'm using the cache now (Linking to 0x8xxx,xxxx instead of 0xAxxx,xxxx) and it is amazingly faster, and it works!
But this is because at the moment, with no device drivers except the serial port tty [which explicitly uses uncached memory segment 0xAxxx,xxxx] I'm not doing anything where the cache could end up with incorrect values.
So now onto cacheops.h, etc.
It seems to be pretty messy and hard to understand in the lower levels of the cache code, with enough #ifdefs to make my head spin, so I'd rather just write my own cache flushing routines and map them to the _flush_cache_xxx stubs. In order to do this, I basically need to write:
void flush_dcache_all(void) {
    // data cache flush
    // perform all pending writebacks and invalidate the whole cache.
}
and
void flush_icache_all(void) {
    // instruction cache flush
    // invalidate the whole cache (the icache never holds dirty lines, so there are no writebacks to perform).
}
Now, it won't be as efficient as if I wrote the flush_page() and flush_range() functions, but it will work, and for now that's all I care about.
So..... how do I implement those? Hahaha.
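One possible shape for those two routines, as a sketch only: walk every line index once and issue an index-type cache op on it. The op codes (0x14 and 0x03) and the 64-byte line size are taken from the unverified ME notes earlier in the thread, the 16 KB cache size is an assumption, and it is also assumed that sweeping one cache size worth of addresses reaches every way. The cache instruction is passed in as a function pointer so the loop can be checked on a PC; the stub and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All of these are assumptions, see the lead-in. */
enum {
    CACHE_SIZE          = 16 * 1024,
    CACHE_LINE          = 64,
    OP_ICACHE_IDX_INV   = 0x03, /* "Icache index invalidate" in the notes */
    OP_DCACHE_IDX_WBINV = 0x14  /* "Dcache index writeback invalidate"    */
};

#define KSEG0 0x80000000u

/* Stand-in for the MIPS "cache op, 0(addr)" instruction. */
typedef void (*cache_insn)(unsigned op, uint32_t addr);

/* Issue one index op per cache line; returns the number of lines touched. */
static unsigned blast(cache_insn insn, unsigned op)
{
    unsigned n = 0;
    assert(insn);
    for (uint32_t a = KSEG0; a < KSEG0 + CACHE_SIZE; a += CACHE_LINE) {
        insn(op, a);
        n++;
    }
    return n;
}

/* Write back every dirty line, then invalidate the whole dcache. */
static unsigned flush_dcache_all(cache_insn insn)
{
    return blast(insn, OP_DCACHE_IDX_WBINV);
}

/* Invalidate the whole icache (nothing to write back). */
static unsigned flush_icache_all(cache_insn insn)
{
    return blast(insn, OP_ICACHE_IDX_INV);
}

/* Host-side stub that remembers the last operation issued. */
static unsigned last_op;
static uint32_t last_addr;

static void fake_cache_insn(unsigned op, uint32_t addr)
{
    last_op = op;
    last_addr = addr;
}
```

On the real CPU the op code has to be an immediate in the instruction, so each routine would more likely be its own loop around an inline asm statement rather than share a function pointer.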
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
Have you done kernel work in the past? Thanks for all the help. I've got a few more questions if you don't mind:
1. why does this work to invalidate the _whole_ instruction cache?
static inline void blast_icache(void)
{
    unsigned long start = KSEG0;
    unsigned long end = (start + icache_size);
    while (start < end) {
        cache_unroll(start, Index_Invalidate_I);
        start += ic_lsize;
    }
}
cache_unroll is defined as:
#define cache_unroll(base, op) \
    asm("cache %1, (%0)" \
        : \
        : "r" (base), /* that's %0 */ \
          "i" (op)    /* that's %1 */ \
    );
Is it because the "base" in this context actually refers to a particular cache line, rather than a particular physical address? In that case, why does it start at KSEG0 (0x8000,0000) instead of starting at 0?
In the r4000 manual, the meaning of this "index" value is based on the cache line length. Do we know what the cache line length is on the PSP?
From the r4000 manual: "For a primary cache of 2^(CACHEBITS) bytes with 2^(LINEBITS) bytes per tag, vAddr(bit CACHEBITS ... bit LINEBITS) specifies the block." (Note: I guess vAddr refers to what I am calling "base" in this discussion.)
Kind of odd notation here, but OK: for a 64 KB cache, CACHEBITS would equal 16 (that satisfies 2^CACHEBITS = 64K bytes).
So the most significant bit that is used in the "vAddr" field would be bit 16. Which suggests that using "KSEG0" as the "start" in that function up there is really no different than using "0" as the "start", since the upper bit that distinguishes KSEG0 from 0x00000000 is ignored by this cache operation anyway, unless you had a giiiiiiiiiiiiiiiiiiiiiiiiiigantic cache. Isn't that strange?
Anyways, moving on, the second half of this equation is knowing "LINEBITS". I don't know how big our cache lines are. Of course, if we don't know, I could just count up by 1's, and I'd be hitting the same index over and over (to be precise, I'd strike the same index 2^LINEBITS times in a row if I counted my "base" up by 1's).
Is any of this making sense to you? What are your thoughts?
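To make the index question concrete, here is a small host-side sketch (nothing PSP-specific, and not taken from the kernel source). It assumes a direct-mapped cache whose index field is address bits LINEBITS through CACHEBITS-1, which is my reading of the manual's notation rather than a quote from it. The helper names cache_index and bytes_on_same_index are made up for this illustration.

```c
#include <stdint.h>

#define KSEG0 0x80000000u

/* Line index selected by an index-type cache op on a direct-mapped cache of
 * 2^cachebits bytes with 2^linebits-byte lines (ASSUMED layout): address bits
 * [cachebits-1 : linebits]. Bits above and below that field are ignored. */
static uint32_t cache_index(uint32_t addr, unsigned cachebits, unsigned linebits)
{
    return (addr & ((1u << cachebits) - 1u)) >> linebits;
}

/* How many consecutive byte addresses, starting at addr, land on the same
 * index when the loop counter is stepped by 1 instead of by the line size.
 * The scan is bounded by the cache size so it always terminates. */
static uint32_t bytes_on_same_index(uint32_t addr, unsigned cachebits, unsigned linebits)
{
    uint32_t start = cache_index(addr, cachebits, linebits);
    uint32_t n = 0;
    while (n < (1u << cachebits) &&
           cache_index(addr + n, cachebits, linebits) == start)
        n++;
    return n;
}
```

With cachebits = 16 and linebits = 6, cache_index(KSEG0 + off, 16, 6) and cache_index(off, 16, 6) come out identical for any offset, because the KSEG0 bit sits far above the index field. Stepping a line-aligned base by 1 stays on one index for a full line's worth of bytes.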
chrismulhearn wrote: So the most significant bit that is used in the "vAddr" field would be bit 16. Which suggests that using "KSEG0" as the "start" in that function up there is really no different than using "0" as the "start", since the upper bit that distinguishes KSEG0 from 0x00000000 is ignored by this cache operation anyway, unless you had a giiiiiiiiiiiiiiiiiiiiiiiiiigantic cache. Isn't that strange?
I was curious too, so a few minutes of looking turned up this:
"I'm not sure what you mean by TLB translations required for hit cacheops.
If you mean the Index Writeback or Index Invalidate functions, note that you can (and should) use a kseg0 address to do this. This bypasses the TLB, while still giving you the index that you want. We simply OR the kseg0 base address into the index that we've calculated and use that as the argument to the CACHE instruction. There's actually words to this effect in the MIPS32/MIPS64 spec, but it is, perhaps, not clear enough."
Seems CPUs with an MMU save a TLB lookup this way.
chrismulhearn wrote: Anyways, moving on, the second half of this equation is knowing "LINEBITS". I don't know how big our cache lines are.
Cache lines are 64 bytes.
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
psp cache size + instructions
Awesome detective work, crazyc.
How big are the instruction + data caches on the PSP, by the way? Do we know for sure? Google seemed to turn up 32K I-cache, 64K D-cache.
Also, how did you figure out that:
0x03 = icache indexed invalidate
0x14 = dcache indexed writeback+invalidate
Those are actually the only two instructions I'm using right now, in flush_all_dcache() and flush_all_icache() functions that loop through the entire cache. I'll let you know if it works.
thanks for the help everyone
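For reference, the two loops described above could look roughly like this. It is only a sketch under stated assumptions: the two opcodes are the ones quoted in this post, the 64-byte line size is the figure given earlier in the thread, and the 64 KB size is a placeholder that has not been confirmed. The cache_op_fn hook and record_op are hypothetical names. On the PSP the hook would wrap the `cache` instruction, while here it is a function pointer so the loop bounds can be checked on any host.

```c
#include <stdint.h>

#define KSEG0                          0x80000000u
#define CACHE_LINE_SIZE                64u       /* line size reported in this thread */
#define CACHE_SIZE                     0x10000u  /* ASSUMPTION: 64 KB, not confirmed */
#define OP_ICACHE_INDEX_INVALIDATE     0x03u     /* opcode as quoted in the post */
#define OP_DCACHE_INDEX_WB_INVALIDATE  0x14u     /* opcode as quoted in the post */

/* Hypothetical hook standing in for the real `cache` instruction. */
typedef void (*cache_op_fn)(uint32_t op, uint32_t addr);

/* Issue one index-type op per line-sized step across the whole cache. */
static void flush_cache_by_index(cache_op_fn do_op, uint32_t op)
{
    for (uint32_t off = 0; off < CACHE_SIZE; off += CACHE_LINE_SIZE)
        do_op(op, KSEG0 + off);
}

static void flush_all_dcache(cache_op_fn do_op)
{
    flush_cache_by_index(do_op, OP_DCACHE_INDEX_WB_INVALIDATE);
}

static void flush_all_icache(cache_op_fn do_op)
{
    flush_cache_by_index(do_op, OP_ICACHE_INDEX_INVALIDATE);
}

/* Host-side stand-in for the hook: counts calls and remembers the last one. */
static uint32_t n_ops, last_op, last_addr;

static void record_op(uint32_t op, uint32_t addr)
{
    n_ops++;
    last_op = op;
    last_addr = addr;
}
```

Calling flush_all_dcache(record_op) on a PC just counts the operations that would be issued. If the real caches turn out to be smaller than the assumed size, the loop simply revisits the same indexes more than once, which is wasteful but should be harmless for index-type operations.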
-
- Posts: 80
- Joined: Wed Feb 22, 2006 4:43 am
cache flushing in uClinux
OK so I wrote code that assumed both caches were 64K. Now, I noticed that as soon as I started running kernel + programs in a cached memory segment (instead of 0xAxxx,xxxx, where _everything_ was uncached), weird things would happen every now and then when I invoked programs from the shell: programs would crash... the first time I ran them. But then the second time they'd work. Even though they were the exact same program being loaded into the exact same memory location over and over.
Naturally I thought "well, it's because I'm not flushing the caches. When I load the user program into memory, the kernel isn't flushing the icache, so who knows what could be lingering in there."
So I wrote my cache flushing functions, and noticed that the problem was still there. But I also noticed that when the Linux kernel would load an executable (and by "load" I mean copy it off the ramdisk into a different spot it allocated for it) it would ONLY flush the instruction cache. But if we don't flush the data cache, then some program code could still be sitting in the data cache, and the instruction cache would come along and scoop old program code out of RAM...
So I thought, "maybe I'll just explicitly flush the data cache first, any time anyone flushes the instruction cache." And that solved the problem.
Weird huh? Maybe the kernel is assuming there's some "coherency" between those two caches [because in this case, there definitely isn't]?
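Here is a toy model of what I think was going on. It is pure host-side C and not a model of the real Allegrex caches: one word of "RAM", one write-back D-cache line and one I-cache line that both map to it. All the names (struct toy, load_and_run and so on) are invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy machine: a single RAM word plus one D-cache and one I-cache line. */
struct toy {
    uint32_t ram;
    uint32_t dline; bool dvalid, ddirty;  /* write-back data cache line */
    uint32_t iline; bool ivalid;          /* instruction cache line */
};

/* A store lands in the D-cache only; RAM is updated on writeback. */
static void cpu_store(struct toy *t, uint32_t v)
{
    t->dline = v;
    t->dvalid = true;
    t->ddirty = true;
}

/* An instruction fetch refills from RAM, never from the D-cache. */
static uint32_t cpu_fetch(struct toy *t)
{
    if (!t->ivalid) {
        t->iline = t->ram;
        t->ivalid = true;
    }
    return t->iline;
}

/* Write back the line if it is dirty, then invalidate it. */
static void flush_dcache(struct toy *t)
{
    if (t->dvalid && t->ddirty)
        t->ram = t->dline;
    t->dvalid = false;
    t->ddirty = false;
}

static void flush_icache(struct toy *t)
{
    t->ivalid = false;
}

/* "Load" one program word, flush as requested, then execute it. */
static uint32_t load_and_run(struct toy *t, uint32_t code, bool flush_d_first)
{
    cpu_store(t, code);
    if (flush_d_first)
        flush_dcache(t);
    flush_icache(t);
    return cpu_fetch(t);
}
```

With flush_d_first set to false, the fetch returns whatever was in RAM before the copy, which is the stale-code case. With it set to true, the new word reaches RAM before the I-cache refills. In the real system the second run of a program may have worked simply because the dirty lines had been evicted to RAM by then, but that part is a guess.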
One thing I noticed (which is probably unlikely to matter) is that you need to be careful of aliasing between ksegX addresses and usegX ones. By that I mean if you write through a kseg address and read through a useg one (or vice versa) and do not flush the cache, you cannot be certain that the write will be reflected on the other side.
i.e. _sw(0x12345678, 0x88400000); x = _lw(0x08400000); even though this is the same physical address, x probably won't be set to the value you expect.
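The address arithmetic behind that example, as a small sketch. It uses the standard MIPS kseg0/kseg1 bases and assumes the user-segment address is used as the physical address directly, which is how the example above treats it. The helper names are made up for illustration.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* cached, unmapped */
#define KSEG1_BASE 0xA0000000u  /* uncached, unmapped */
#define PHYS_MASK  0x1FFFFFFFu  /* kseg0/kseg1 keep the low 29 bits */

/* Physical address behind a kseg0 or kseg1 virtual address. */
static uint32_t kseg_to_phys(uint32_t vaddr)
{
    return vaddr & PHYS_MASK;
}

/* Uncached (kseg1) view of a physical address. */
static uint32_t phys_to_kseg1(uint32_t paddr)
{
    return (paddr & PHYS_MASK) | KSEG1_BASE;
}
```

So kseg_to_phys(0x88400000) gives 0x08400000, the same memory the user-side load touches, and phys_to_kseg1() gives the 0xAxxx,xxxx alias that skips the cache altogether.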
chrismulhearn wrote: 1. why does this work to invalidate the _whole_ instruction cache?
Index-based cache operation: you don't give an address because you want to flush exactly that address, but to flush whatever addresses share its index in the cache lines: index = (address & INDEX_MASK) >> INDEX_SHIFT.
Address-based (HIT) cache operation: you do give an exact address, because you want to flush that address and not another address which has the same index in the cache lines: index = (address & INDEX_MASK) >> INDEX_SHIFT.
As far as I know, the PSP has a 16 KB ICACHE and a 16 KB DCACHE.
Since a cache line is 64 bytes long, you have a total of 256 blocks, which for a 2-way set-associative cache means a maximum of 128 indexes.
So for a global flush, you may only need to do 128 operations, one for each index, with addresses 0, 64, 128, ..., 16320.
The fact that it is KSEG0 or KUSEG shouldn't matter, because not all the bits of an address are taken into account; I'm pretty sure no more than 24 bits are used for the index and the tag. What is important here is the physical address, not the virtual address. Whatever its segment (mapped, cached unmapped, uncached unmapped), the physical address is the same, and I'm pretty sure the cache used by the Allegrex always handles a physical address.
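The geometry above can be checked with a few lines of C. This is only arithmetic on the figures given in this post (16 KB, 64-byte lines, 2 ways), none of which is verified against hardware, and the struct and helper names are invented for the sketch.

```c
#include <stdint.h>

/* Cache geometry in bytes and ways. The values used with it are the ones
 * claimed in this thread, not measured ones. */
struct cache_geom {
    uint32_t size;
    uint32_t line;
    uint32_t ways;
};

/* Total number of lines (blocks) in the cache. */
static uint32_t geom_lines(struct cache_geom g)
{
    return g.size / g.line;
}

/* Number of distinct indexes (sets). */
static uint32_t geom_sets(struct cache_geom g)
{
    return geom_lines(g) / g.ways;
}

/* Highest line-aligned offset when stepping through every line once. */
static uint32_t geom_last_offset(struct cache_geom g)
{
    return g.size - g.line;
}
```

For {16384, 64, 2} this gives 256 lines, 128 sets and a last offset of 16320. Note that the address list 0, 64, ..., 16320 has 256 entries, one per line rather than one per index, so stepping through all of it would only reach both ways if the way is selected by an address bit. That is a guess about the hardware, not something confirmed here.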
crazyc wrote: I was curious too, so a few minutes of looking turned up this:
"If you mean the Index Writeback or Index Invalidate functions, note that you can (and should) use a kseg0 address to do this. This bypasses the TLB, while still giving you the index that you want."
Seems CPUs with an MMU save a TLB lookup this way.
Of course! KUSEG is a MAPPED segment (that is, under TLB control) whereas KSEG0 is a CACHED UNMAPPED segment (not under TLB control).
Some MIPS CPUs with a TLB MMU index their caches partly by virtual address, if I'm not wrong.