Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
More options
HP.com home
HP-UX Floating-Point Guide: HP 9000 Computers > Chapter 7 Performance Tuning

Cache Aliasing

» 

Technical documentation

Complete book in PDF
» Feedback
Content starts here

 » Table of Contents

 » Glossary

 » Index

HP 9000 systems employ high-speed cache memory to store the most recently used instructions and data. The location of instructions and data in the cache is a function of the n low-order bits of the address of the data or instruction. Consequently, instructions and data that have the same n low-order bits compete for the same cache space. Such competing objects are called cache aliases of each other.

Programs that contain loops in which two or more cache aliases are referenced will run more slowly than expected because the system must repeatedly swap aliased objects into and out of the cache.

Data cache aliasing can occur when two data objects that are far apart in virtual memory (so that their addresses have the same low-order bits but different high-order bits) are referenced in the same loop. Instruction cache aliasing, which is less likely to happen but has more serious effects, can occur when two routines invoked in a loop are aliases of each other. In this case, the invocation of the second routine forces the first routine out of the cache, and the next time through the loop the first routine pushes the second routine out of the cache.

Table 7-1 “Typical Instruction Cache Aliasing Situation” illustrates this situation using a 1K-byte cache example. If the cache addresses of subr1 and subr2 overlap significantly, instruction cache aliasing can produce a severe performance degradation.

Table 7-1 Typical Instruction Cache Aliasing Situation

Code

Description

Address

DO 10 I = 1,1000
CALL subr1
CALL subr2
10 CONTINUE
END

Main ­program loop

SUBROUTINE subr1
.
.
RETURN
END

First ­subroutine

P



P+L1
(L1 = length of
subr1 code)

SUBROUTINE subr2
.
.
RETURN
END

Second ­subroutine

Q



Q+L2
(L2 = length of
subr2 code)

 

Detecting problems with instruction cache aliases is an inexact process because the system does not keep records of cache hits and misses. You may notice, after making some modifications to one piece of code, that a seemingly unrelated routine runs more slowly than before. This can happen if your modifications change the virtual address map of your program and cause increased cache aliasing occurrences. Or you may suspect that a routine is running much more slowly than it should.

Fixing instruction cache problems is also difficult because you do not have much control over what virtual addresses are assigned to data and instructions. However, you can move routines around within a source module, and you can change the order in which object modules are joined by the linker. There are no hard and fast rules about how the linker orders various data and text sections, but it tends to place them in the same order in which you list them in the compiler or ld command line. You can take advantage of this fact by reordering the modules on the command line.

Data cache aliasing is a more common performance problem than instruction cache aliasing, but it is also easier to deal with. If you suspect that your application is experiencing data cache performance problems on scalar or small vectors of data, you can usually correct the situation by rearranging the order in which your variables are allocated in memory. In Fortran you can do this by reordering common block assignments.

If your application uses very large arrays of data, hundreds of kilobytes or megabytes in length, than chances are good that you will experience data cache aliasing problems no matter how your data arrays are allocated in memory, simply because the arrays are too big to fit in the data cache all at once. In this case, you may be able to improve performance by using the +Odataprefetch option, either alone or in conjunction with +Ovectorize. (See “Optimizing Your Program” for information about these options.)

Another way to improve performance for very large arrays is to increase the locality of references to your data. This technique, sometimes called ­tiling, requires selecting an algorithm that processes as much data as possible while the data is resident in the cache so as to minimize the number of times the data must be re-cached later. The routines in the BLAS library use tiling when they operate on large matrices of data. For example, in multiplying two large matrices, each matrix is cut into tiles, which are processed in pairs. The size of the tiles is determined at run time by the size of the matrices and the size of the data cache on the system executing the application.

Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© 1997 Hewlett-Packard Development Company, L.P.