Translation Lookaside Buffers

The virtual-to-physical address translation operation sits on the critical path between the CPU and the cache. If every memory request emanating from the processor required one or more accesses to main memory (to read page table entries) in addition to the access that fetches the requested datum, then our processor would be extremely slow! So high-performance processors include a translation look-aside buffer, commonly abbreviated to TLB.

The translation lookaside buffer (TLB) is a cache for page table entries. It works in much the same way as the data cache: it stores recently accessed page table entries, and it relies on the same locality of reference.
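To make the idea concrete, here is a minimal sketch of a fully-associative TLB with LRU replacement. The entry count, page size, and replacement policy are illustrative assumptions, not any particular processor's design; in hardware the page-table walk on a miss would be one or more memory reads rather than a dictionary lookup.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 Kbyte pages


class TLB:
    """Toy fully-associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries=48):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:                   # TLB hit: no memory access needed
            self.map.move_to_end(vpn)         # refresh LRU position
            pfn = self.map[vpn]
        else:                                 # TLB miss: walk the page table
            pfn = page_table[vpn]             # in hardware, extra memory reads
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)  # evict least-recently-used entry
            self.map[vpn] = pfn
        return pfn * PAGE_SIZE + offset
```

A second access to the same page then finds its translation already cached, which is exactly the locality the TLB exploits.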

Since each TLB entry covers a whole page of physical memory (512 bytes to 8 Kbytes, commonly 4 Kbytes), a relatively small number of TLB entries covers a large amount of program memory. Some TLB sizes found in commercial processors are:


Processor      Date   TLB entries   Organisation
MIPS R4000     1992   48
MIPS R10000    1996   64            Fully associative
PowerPC 601    1993   Data: 256     2-way set-associative
                      Inst: 4       Fully associative
HP PA7100      1993   120

As with caches, many modern processors provide separate TLBs for the instruction and data streams. Early TLBs had just a handful of entries, and fully-associative organisations were common: the overhead in comparators and additional tag bits was relatively small and easily accommodated. For example, the PowerPC 601 provides only 4 fully-associative entries in its instruction TLB. As the number of transistors available to a designer has increased, larger TLBs with more entries have become feasible, but the benefit of full associativity has not justified the additional transistors, and set-associative organisations have become common. For example, the PowerPC 601 UTLB's 256 entries are arranged as a two-way set-associative cache.
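In a set-associative TLB, low-order bits of the virtual page number select a set and the remaining bits form a tag compared against every way in that set. The sketch below shows this split for a 256-entry, two-way organisation like the one described above; the 4 Kbyte page size is an assumption, and real hardware may index and tag differently.

```python
PAGE_SHIFT = 12                  # assumed 4 Kbyte pages -> 12 offset bits
NUM_ENTRIES = 256                # total TLB entries
WAYS = 2                         # two-way set-associative
NUM_SETS = NUM_ENTRIES // WAYS   # 128 sets -> 7 index bits


def split_vaddr(vaddr):
    """Split a virtual address into (tag, set index, page offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    index = vpn % NUM_SETS       # low VPN bits select the set
    tag = vpn // NUM_SETS        # remaining bits are compared in both ways
    return tag, index, offset
```

With 128 sets, only two comparators are needed per lookup instead of 256, which is the transistor saving that makes set-associative organisations attractive.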


The large coverage of main memory by each TLB entry means that TLB hit rates of 98% or more are readily achieved even with small TLBs. Spatial locality within the small number of words in a cache line already contributes significantly to high performance, so it is not surprising that locality within a page of, say, 4 Kbytes is high. On the other hand, a TLB miss has a large potential cost (several memory accesses to walk the page table and, if the page is not resident, execution of the page fault handler), so hit rates of this order are essential for good performance.
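A quick back-of-the-envelope calculation shows why such high hit rates matter. The cycle counts below are illustrative assumptions (one cycle for a hit, a 100-cycle page-table walk on a miss, no page fault), not measurements from any of the processors above.

```python
hit_rate = 0.98       # assumed TLB hit rate
hit_cost = 1          # cycles for a translation that hits in the TLB (assumed)
miss_penalty = 100    # extra cycles for a page-table walk on a miss (assumed)

# Average cycles per address translation
avg = hit_rate * hit_cost + (1 - hit_rate) * (hit_cost + miss_penalty)
print(avg)  # → 3.0
```

Even a 2% miss rate triples the average translation cost under these assumptions; at a 90% hit rate the average would exceed ten cycles, which is why small TLBs must still achieve hit rates of 98% or more.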

Processing Memory References

The diagram below summarises the operations that are performed on an address emitted from a CPU as it passes through the various system caches.