Some questions about tlb

Posted: Tue Dec 18, 2007 2:58 am
by ffgriever
I've got some questions about the TLB on the PS2 (R5900). Well, some questions plus something I'm unsure of, so I'd be glad if someone corrected me where I'm wrong.

Normally, when data or an instruction is fetched, the DTLB/ITLB lookup happens in parallel with the JTLB lookup (this reduces the cost of falling back to the JTLB when a DTLB/ITLB miss occurs). If the DTLB/ITLB misses, the appropriate hit from the JTLB (if any) is available almost immediately; if there is none, a TLB miss exception occurs and the TLB refill takes place.

Now the questions:

The JTLB is quite simple: 48 pairs of 2 entries each (starting virtual address aligned to page_size*2), with a variable page size from 4kB to 16MB in "multiplication by 4" steps. The format of the entries is well described and pretty straightforward. The manuals say that the ITLB has 2 entries and the DTLB has 4 entries. Does that mean 2/4 pairs of 2 entries (so 4/8 entries), 1/2 pairs of 2 entries (so 2/4 entries), or 2/4 single entries (each a fixed 4kB page, as opposed to the JTLB's variable size)?

Directly tied to that question is the refilling of the ITLB/DTLB. It is said to be done "in pseudo-LRU manner (least recently used entry of least recently used half is filled)". If entries are stored in pairs as in the JTLB, that means in most cases the entire pair will be overwritten, because otherwise it would create false (and thus very bad) positives, right? The question is really what "half" means in the quotation above: is it the same as "pair", or is it simply one half of the ITLB/DTLB?
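
To make the pair layout concrete, here is a minimal C sketch of how I understand a paired entry: one virtual tag covering 2*page_size, with the even or odd half picked by the address. The names (tlb_pair, tlb_translate, page_size_ok) are made up for illustration, and the model deliberately ignores ASID, the global bit and cache attributes, so treat it as a simplification and not as the real register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one paired entry (hypothetical layout). */
typedef struct {
    uint32_t vbase;     /* virtual start, aligned to 2 * page_size */
    uint32_t page_size; /* 4 kB, 16 kB, 64 kB, ... up to 16 MB */
    uint32_t pbase[2];  /* physical base of the even [0] and odd [1] page */
    bool     valid[2];  /* per-half valid bit */
} tlb_pair;

/* Page sizes grow in "multiplication by 4" steps from 4 kB to 16 MB. */
bool page_size_ok(uint32_t size) {
    for (uint32_t s = 4096u; s <= (16u << 20); s <<= 2)
        if (s == size) return true;
    return false;
}

/* Returns true on a hit and stores the physical address in *paddr.
 * Returns false when no entry matches or the matching half is invalid. */
bool tlb_translate(const tlb_pair *tlb, size_t n, uint32_t vaddr, uint32_t *paddr) {
    for (size_t i = 0; i < n; i++) {
        const tlb_pair *e = &tlb[i];
        assert(page_size_ok(e->page_size));
        uint32_t span = 2u * e->page_size;
        assert(e->vbase % span == 0);
        /* Unsigned subtraction also rejects vaddr < vbase (it wraps high). */
        if (vaddr - e->vbase >= span) continue;
        unsigned odd = (vaddr - e->vbase) >= e->page_size;
        if (!e->valid[odd]) return false;
        *paddr = e->pbase[odd] + (vaddr & (e->page_size - 1u));
        return true;
    }
    return false;
}
```

With a single 2x256kB entry at virtual 0x00100000, an address in the first 256kB goes through pbase[0] and an address in the second 256kB goes through pbase[1]; anything outside the 512kB span is a miss.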

Regardless of the answer, there is one more thing I would like you to confirm. The way the JTLB/ITLB/DTLB are used means that multiple matches should not be allowed (the CPU allows it, but the result is undefined): even if one placed such entries in a specific order, that order could always be broken by an ITLB/DTLB hit, right? I'm asking because overlapping entries would reduce the number of JTLB replacements by conserving TLB entries. For example, mapping 8MB of which the first 512kB is write protected requires 7 double entries: 2x256kB write protected, 2x256kB, 2x256kB, 2x256kB, 2x1MB, 2x1MB, 2x1MB. With overlap it could be just 2x256kB write protected plus an overlapping 2x4MB, assuming both vaddr and paddr are properly aligned. But the second layout may lead to undefined results, mainly that the page meant to be protected by leaving the dirty bit clear will sometimes, or even most of the time, be writable without causing a TLB modified exception?
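
The entry count in that example can be checked with a small greedy calculation. This is only a sketch under my own assumptions: count_pairs is a made-up helper, it uses only full pairs (both halves mapped), and it assumes the physical address has the same alignment as the virtual one.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_PAGE (4u << 10)  /* 4 kB */
#define MAX_PAGE (16u << 20) /* 16 MB */

/* Greedy count of paired entries needed to cover [start, start + len)
 * without overlap. At each step it takes the largest page size whose
 * pair (2 * page) is aligned at 'start' and still fits in 'len'. */
unsigned count_pairs(uint32_t start, uint32_t len) {
    /* Smallest pair is 2 x 4 kB, so both values must be 8 kB multiples. */
    assert(start % (2u * MIN_PAGE) == 0 && len % (2u * MIN_PAGE) == 0);
    unsigned count = 0;
    while (len > 0) {
        uint32_t page = MAX_PAGE;
        while (start % (2u * page) != 0 || 2u * page > len)
            page >>= 2; /* sizes shrink in steps of 4 */
        start += 2u * page;
        len -= 2u * page;
        count++;
    }
    return count;
}
```

For the 8MB region starting at an 8MB-aligned address, count_pairs(0, 512kB) + count_pairs(512kB, 7.5MB) gives 1 + 6 = 7 entries, matching the list above, while the whole region without the protected part split out fits in a single 2x4MB pair.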

It's not that important to me; I'm mostly curious (my current implementation, where the TLB is reorganized from time to time, is efficient enough).

Thanks.

Posted: Tue Dec 18, 2007 9:01 am
by Mega Man
JTLB is not described in the EE manual. It is just called TLB lookup.

A TLB entry always consists of 1 virtual page number and up to 2 page frame numbers (odd and even). Whenever something is written about a TLB entry, a pair is always possible. This is a design issue (or mistake?).

DTLB/ITLB is only a cache. The cache can't rearrange anything. So you still have a pair of page frame numbers.

You can't add overlapping mappings, because this leads to strange exceptions. For example, you can access an address 2 times, but not a 3rd time. The 4th time it works again.

When you want to optimize something you should use the performance counters. There are counters for ITLB, DTLB and TLB misses.

Posted: Tue Dec 18, 2007 9:59 pm
by ffgriever
Mega Man wrote:JTLB is not described in the EE manual. It is just called TLB lookup.
True, but it's not about nomenclature; it's about usability and purpose. It's one joint table for both instruction and data fetches, so... Anyway, Sony's manuals are very vague when it comes to this part of the MMU. The attitude is basically "well, such a thing exists, but you don't really need it, and if you do need it, you surely know how to use it".
A TLB entry always consists of 1 virtual page number and up to 2 page frame numbers (odd and even). Whenever something is written about a TLB entry, a pair is always possible. This is a design issue (or mistake?).
I believe that everyone would rather have 96 separate entries than 48 double entries.
DTLB/ITLB is only a cache. The cache can't rearrange anything. So you still have a pair of page frame numbers.
I can't agree with you on this one. It's a cache, true... but the page size in DTLB/ITLB entries is always 4kB, whereas in the main table (JTLB) it can be one of many different sizes... So it is in fact rearranged.
You can't add overlapping mappings, because this leads to strange exceptions. For example, you can access an address 2 times, but not a 3rd time. The 4th time it works again.
I haven't even tried it yet... but I wasn't aware that the result would be that strange (I would rather expect the cache to play its role, hence the behavior I predicted in my previous post). Just curious what may cause such behavior, then.
When you want to optimize something you should use the performance counters. There are counters for ITLB, DTLB and TLB misses.
That's what I'm doing, changing things only when there is a need (it's also good to get the total miss % over all accesses, because the ITLB/DTLB is accessed even for unmapped areas; only the JTLB isn't).

But I'm not sure, because the Sony manuals are not enough, and the rest I've read in manuals for similar CPUs (some of them almost identical; I've even seen some of them cited here as good references, with minor changes).

Posted: Thu Dec 20, 2007 8:21 am
by Mega Man
ffgriever wrote:
DTLB/ITLB is only a cache. The cache can't rearrange anything. So you still have a pair of page frame numbers.
I can't agree with you on this one. It's a cache, true... but the page size in DTLB/ITLB entries is always 4kB, whereas in the main table (JTLB) it can be one of many different sizes... So it is in fact rearranged.
I didn't know that. I think it is possible to prove this with a simple test (access memory through only one big TLB entry, but from different 4kB pages, and look at the performance counter).
This would explain why you get so many DTLB/ITLB misses. I think it's not really possible to work with 2/4 ITLB/DTLB entries without a performance impact. Overlapping mappings will not help you increase performance, because you still have only 4kB pages.
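
As a rough idea of what that test should show if the 4kB claim is right, here is a small C model of a 4-entry DTLB holding single 4kB pages. It is only a sketch: dtlb_model, dtlb_access and walk_misses are made-up names, the replacement policy is plain LRU (the hardware is described as pseudo-LRU), and nothing here reads the real performance counters.

```c
#include <assert.h>
#include <stdint.h>

#define DTLB_ENTRIES 4
#define PAGE_SHIFT 12 /* assumed fixed 4 kB micro-TLB pages */

typedef struct {
    uint32_t vpn[DTLB_ENTRIES];
    uint32_t last_use[DTLB_ENTRIES]; /* 0 means the slot is empty */
    uint32_t clock;
    unsigned misses;
} dtlb_model;

/* One data access: a hit refreshes the entry, a miss evicts the LRU slot. */
void dtlb_access(dtlb_model *d, uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    int victim = 0;
    d->clock++;
    for (int i = 0; i < DTLB_ENTRIES; i++) {
        if (d->last_use[i] != 0 && d->vpn[i] == vpn) {
            d->last_use[i] = d->clock;
            return;
        }
        if (d->last_use[i] < d->last_use[victim]) victim = i;
    }
    d->misses++;
    d->vpn[victim] = vpn;
    d->last_use[victim] = d->clock;
}

/* Touch one address in each of 'pages' consecutive 4 kB pages, 'passes' times. */
unsigned walk_misses(unsigned pages, unsigned passes) {
    dtlb_model d = {0};
    for (unsigned p = 0; p < passes; p++)
        for (unsigned i = 0; i < pages; i++)
            dtlb_access(&d, (uint32_t)i << PAGE_SHIFT);
    return d.misses;
}
```

In this model, cycling over 4 pages misses only on the first pass, while cycling over 8 pages misses on every access, even though all 8 pages could sit inside one large JTLB entry. If the real DTLB miss counter behaves similarly on hardware, that would support the 4kB entry size.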