Translation Lookaside Buffers

The virtual-to-physical address translation operation sits on the critical path between the CPU and the cache. If every memory request emanating from the processor required one or more accesses to main memory (to read page table entries) in addition to the access that fetches the requested datum, then our processor would be extremely slow! So high-performance processors include a translation look-aside buffer, commonly abbreviated to TLB.

The translation lookaside buffer (TLB) is a cache for page table entries. It works in much the same way as the data cache: it stores recently accessed page table entries, and it relies on the same locality of reference.
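To make the idea concrete, here is a minimal sketch of a fully-associative TLB with LRU replacement. The entry count, page size, and replacement policy are illustrative assumptions, not any particular processor's design; in hardware the page-table walk on a miss would be one or more memory reads rather than a dictionary lookup.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 Kbyte pages


class TLB:
    """Toy fully-associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries=48):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:                   # TLB hit: no memory access needed
            self.map.move_to_end(vpn)         # refresh LRU position
            pfn = self.map[vpn]
        else:                                 # TLB miss: walk the page table
            pfn = page_table[vpn]             # in hardware, extra memory reads
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)  # evict least-recently-used entry
            self.map[vpn] = pfn
        return pfn * PAGE_SIZE + offset
```

A second access to the same page then finds its translation already cached, which is exactly the locality the TLB exploits.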

Since each TLB entry covers a whole page of physical memory (512 bytes to 8 Kbytes, commonly 4 Kbytes), a relatively small number of TLB entries covers a large amount of program memory. Some TLB sizes found in commercial processors are:


Processor      Date   TLB entries   Organisation
MIPS R4000     1992   48
MIPS R10000    1996   64            Fully associative
PowerPC 601    1993   Data: 256     2-way set-associative
                      Inst: 4       Fully associative
HP PA7100      1993   120

As with caches, many modern processors provide separate TLBs for the instruction and data streams. Early TLBs had just a handful of entries, and fully-associative organisations were common: the overhead in comparators and additional tag bits was relatively small and easily accommodated. For example, the PowerPC 601 provides only 4 fully-associative entries in its instruction TLB. As the number of transistors available to a designer has increased, larger TLBs with more entries have become feasible, but the benefit of full associativity has not justified the additional transistors, and set-associative organisations have become common. For example, the PowerPC 601 UTLB's 256 entries are arranged as a two-way set-associative cache.
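In a set-associative TLB, low-order bits of the virtual page number select a set and the remaining bits form a tag compared against every way in that set. The sketch below shows this split for a 256-entry, two-way organisation like the one described above; the 4 Kbyte page size is an assumption, and real hardware may index and tag differently.

```python
PAGE_SHIFT = 12                  # assumed 4 Kbyte pages -> 12 offset bits
NUM_ENTRIES = 256                # total TLB entries
WAYS = 2                         # two-way set-associative
NUM_SETS = NUM_ENTRIES // WAYS   # 128 sets -> 7 index bits


def split_vaddr(vaddr):
    """Split a virtual address into (tag, set index, page offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    index = vpn % NUM_SETS       # low VPN bits select the set
    tag = vpn // NUM_SETS        # remaining bits are compared in both ways
    return tag, index, offset
```

With 128 sets, only two comparators are needed per lookup instead of 256, which is the transistor saving that makes set-associative organisations attractive.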


The large coverage of main memory by each TLB entry means that TLB hit rates of 98% or more are readily achieved even with small TLBs. Spatial locality within the small number of words in a cache line already contributes significantly to high performance, so it is not surprising that locality within a page of, say, 4 Kbytes is high. On the other hand, a TLB miss has a large potential cost (several memory accesses to walk the page table and, if the page is not resident, execution of the page fault handler), so hit rates of this order are essential for good performance.
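A quick back-of-the-envelope calculation shows why such high hit rates matter. The cycle counts below are illustrative assumptions (one cycle for a hit, a 100-cycle page-table walk on a miss, no page fault), not measurements from any of the processors above.

```python
hit_rate = 0.98       # assumed TLB hit rate
hit_cost = 1          # cycles for a translation that hits in the TLB (assumed)
miss_penalty = 100    # extra cycles for a page-table walk on a miss (assumed)

# Average cycles per address translation
avg = hit_rate * hit_cost + (1 - hit_rate) * (hit_cost + miss_penalty)
print(avg)  # → 3.0
```

Even a 2% miss rate triples the average translation cost under these assumptions; at a 90% hit rate the average would exceed ten cycles, which is why small TLBs must still achieve hit rates of 98% or more.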

Processing Memory References

The diagram below summarises the operations that are performed on an address emitted from a CPU as it passes through the various system caches.